Annotation Instructions
CMV Delta Samples — Persuasiveness & TOM Values
Purpose
We are building evaluation metrics to assess whether a comment is persuasive in the context of r/ChangeMyView (CMV). This annotation will be used for meta-evaluation: comparing human judgments with our automated metric. We focus on TOM (Theory of Mind) categories and their extracted values.
Your labels help us validate whether the TOM values we extract make sense given the text. We will later use these judgments to study value alignment and value shift between the post and the persuasive comment.
What You Will Annotate
You will see 30 samples, each containing:
- Post: OP's view and TOM analysis (beliefs, desires, intentions, emotions, knowledge, perspective-taking — each with values and content).
- Persuasive comment: The comment that received a delta, with TOM analysis for the same six categories.
- Delta reply: OP's reply awarding the delta (for context).
For each sample you answer 7 binary (Yes/No) questions: 1 about persuasiveness, then 6 about whether the TOM values for the persuasive comment make sense given the comment text. No scales or free text are required — only Yes or No (plus an optional note per sample).
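For readers who think in terms of data, one sample's answers can be pictured as a small record. The sketch below is illustrative only: the class name `SampleAnnotation`, its field names, and the "Y"/"N" encoding are assumptions, not the format the annotation site actually stores.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical question keys; the real site may name or store these differently.
QUESTIONS = ("Q1", "Q2", "Q3", "Q4", "Q5", "Q6", "Q7")


@dataclass
class SampleAnnotation:
    sample_id: int
    answers: Dict[str, str]      # maps "Q1".."Q7" to "Y" or "N"
    note: Optional[str] = None   # optional free-text note for the sample

    def is_complete(self) -> bool:
        """Return True when every one of the 7 questions has a Y or N answer."""
        return all(self.answers.get(q) in ("Y", "N") for q in QUESTIONS)
```

A record with all seven answers filled in counts as complete; the note can be left empty.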
The Binary Questions
Q1 — Persuasiveness (Human Judgment)
In your view, is the designated “persuasive” comment actually persuasive — i.e., does it provide a clear, substantive reason why the OP might change their view?
- Yes (Y): The comment gives a recognizable, substantive argument or reason that could reasonably lead the OP to change their view. The delta makes sense in context.
- No (N): The comment does not clearly explain why the OP would change their view; the link between the comment and the delta is weak or confusing, or the comment is off-topic, purely emotional without argument, or not substantive.
We are not asking whether you would be persuaded. We are asking whether this comment, in context, offers a clear reason for the OP's stated change of view.
Q2–Q7 — TOM Category Values
For the persuasive comment only, we extracted TOM values for six categories. For each category, answer:
Do the extracted values for this category make sense given the comment text?
- Yes (Y): The values listed for this category are supported by the comment — you can see how the author's words reflect these values (beliefs, desires, intentions, emotions, knowledge claims, or perspective-taking).
- No (N): The values for this category do not fit the comment well — they are off, exaggerated, missing the main point, or not grounded in what the comment actually says.
Use the comment body as the “text” and the TOM section on the page for the extracted values. Judge each category on its own.
| # | Category | What to Check |
|---|---|---|
| Q2 | Beliefs | Do the listed belief-values match what the commenter seems to believe or assert? |
| Q3 | Desires | Do the desire-values match what the commenter wants or prioritizes? |
| Q4 | Intentions | Do the intention-values match what the commenter is trying to do or achieve? |
| Q5 | Emotions | Do the emotion-values match the emotional tone or appeals in the comment? |
| Q6 | Knowledge | Do the knowledge-values match what the commenter claims to know or how they use evidence? |
| Q7 | Perspective Taking | Do the perspective-taking values match how the commenter considers other viewpoints? |
If a category has no clear content in the comment (for example, no strong emotions), “make sense” can mean: the extraction is appropriately minimal or says “none,” or the values are at least not wrong. When in doubt, choose Y if the values are plausible given the text; N if they are clearly off.
How to Annotate on This Website
- Sign up or log in using your email and password.
- Go to the Annotate page. You will be taken to the first unfinished sample.
- For each sample, read:
  - The Post (for context, especially its TOM section if needed).
  - The Persuasive comment, which is what you mainly judge for all questions.
  - The Delta reply (OP awarding the delta), which you can use for additional context.
  - The TOM analysis for the persuasive comment (Beliefs, Desires, Intentions, Emotions, Knowledge, Perspective Taking).
- Answer Q1–Q7 with Yes or No, using the criteria above.
- Optionally, add a short note if you want to comment on that sample.
- If the sample seems unusable or clearly bad, you can mark it with “Bad instance”.
- Click Save & Next to save your answers and move forward, or Save & Previous to go back while saving.
How Progress and Resuming Work
- Your answers are saved every time you submit the form.
- You can navigate back to earlier samples and change answers at any time.
- When you log out and later log back in, the site will send you to the first unfinished sample.
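The resume behavior described above amounts to finding the first sample that is not yet finished. A minimal sketch, assuming completion is tracked as one boolean per sample; the function name `first_unfinished` is hypothetical and not taken from the site's code.

```python
from typing import Optional, Sequence


def first_unfinished(completed: Sequence[bool]) -> Optional[int]:
    """Return the 0-based index of the first unfinished sample, or None if all are done."""
    for index, done in enumerate(completed):
        if not done:
            return index
    return None
```

Under this assumption, changing an answer on an earlier, already finished sample does not affect where you resume.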
Summary of Questions
| Question | What You Judge | Y | N |
|---|---|---|---|
| Q1 — Persuasiveness | Does the persuasive comment give a clear, substantive reason for the OP to change their view? | Yes | No |
| Q2 — Beliefs | Do the TOM Beliefs values for the persuasive comment make sense given the text? | Yes | No |
| Q3 — Desires | Do the TOM Desires values make sense given the text? | Yes | No |
| Q4 — Intentions | Do the TOM Intentions values make sense given the text? | Yes | No |
| Q5 — Emotions | Do the TOM Emotions values make sense given the text? | Yes | No |
| Q6 — Knowledge | Do the TOM Knowledge values make sense given the text? | Yes | No |
| Q7 — Perspective Taking | Do the TOM Perspective Taking values make sense given the text? | Yes | No |
Tips
- Consistency: Apply the same criteria across all 30 samples.
- Focus on TOM: For Q2–Q7, judge only the TOM categories and values for the persuasive comment, not those of the post.
- Binary only: If you are on the fence, choose the option that fits best and stay consistent with similar cases.
- No discussion: Annotate independently; don’t coordinate answers with others.
Output & Next Steps
We will compute inter-annotator agreement (e.g., Cohen's kappa), aggregate labels across annotators (e.g., by majority vote), and use the aggregated labels as ground truth for meta-evaluation against our automated persuasiveness and TOM metrics.
We will also use the same samples to study TOM categories, value alignment, and value shift between the post and the persuasive comment.
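As a rough illustration of the agreement and aggregation steps mentioned above, the sketch below computes Cohen's kappa for two annotators on one question and a simple majority vote over several annotators. It assumes labels are the strings "Y" and "N" and that the number of annotators is odd; the function names are hypothetical, and the actual analysis may use a different statistic or library.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's own label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[label] * count_b[label] for label in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined here, and returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


def majority_vote(labels: Sequence[str]) -> str:
    """Most frequent label for one item; assumes an odd number of annotators (no ties)."""
    return Counter(labels).most_common(1)[0][0]
```

For example, with annotator labels `["Y", "Y", "N", "N"]` and `["Y", "N", "N", "N"]`, observed agreement is 0.75, chance agreement is 0.5, and kappa is 0.5.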
Thank you for your careful annotation.