Annotation Instructions

CMV Delta Samples — Persuasiveness & TOM Values

Purpose

We are building evaluation metrics to assess whether a comment is persuasive in the context of r/ChangeMyView (CMV). This annotation will be used for meta-evaluation: comparing human judgments with our automated metric. We focus on TOM (Theory of Mind) categories and their extracted values.

Your labels help us validate whether the TOM values we extract are supported by the text. We will later use the same judgments to study alignment and value shift between the post and the persuasive comment.

What You Will Annotate

You will see 30 samples, each containing:

Post: OP's view and TOM analysis (beliefs, desires, intentions, emotions, knowledge, perspective-taking — each with values and content).
Persuasive comment: The comment that received a delta, with TOM analysis for the same six categories.
Delta reply: OP's reply awarding the delta (for context).

For each sample you answer 7 binary (Yes/No) questions: 1 about persuasiveness, then 6 about whether the TOM values for the persuasive comment make sense given the comment text. No scales or free text are required — only Yes or No (plus an optional note per sample).

The Binary Questions

Q1 — Persuasiveness (Human Judgment)

In your view, is the designated “persuasive” comment actually persuasive — i.e., does it provide a clear, substantive reason why the OP might change their view?

Yes (Y): The comment gives a recognizable, substantive argument or reason that could reasonably lead the OP to change their view. The delta makes sense in context.
No (N): The comment does not clearly explain why the OP would change their view; the link between the comment and the delta is weak or confusing, or the comment is off-topic, purely emotional without argument, or not substantive.

We are not asking whether you would be persuaded. We are asking whether this comment, in context, offers a clear reason for the OP's stated change of view.

Q2–Q7 — TOM Category Values

For the persuasive comment only, we extracted TOM values for six categories. For each category, answer:

Do the extracted values for this category make sense given the comment text?

Yes (Y): The values listed for this category are supported by the comment — you can see how the author's words reflect these values (beliefs, desires, intentions, emotions, knowledge claims, or perspective-taking).
No (N): The values for this category do not fit the comment well — they are off, exaggerated, missing the main point, or not grounded in what the comment actually says.

Use the comment body as the “text” and the TOM section on the page for the extracted values. Judge each category on its own.

#	Category	What to Check
Q2	Beliefs	Do the listed belief-values match what the commenter seems to believe or assert?
Q3	Desires	Do the desire-values match what the commenter wants or prioritizes?
Q4	Intentions	Do the intention-values match what the commenter is trying to do or achieve?
Q5	Emotions	Do the emotion-values match the emotional tone or appeals in the comment?
Q6	Knowledge	Do the knowledge-values match what the commenter claims to know or how they use evidence?
Q7	Perspective Taking	Do the perspective-taking values match how the commenter considers other viewpoints?

If a category has little or no content in the comment (for example, no strong emotions), the extraction still “makes sense” as long as it is appropriately minimal, says “none,” or lists values that are at least not wrong. When in doubt, choose Y if the values are plausible given the text and N if they are clearly off.

How to Annotate on This Website

Sign up or log in using your email and password.
Go to the Annotate page. You will be taken to the first unfinished sample.
For each sample, read:
- The Post (for context, especially its TOM section if needed).
- The Persuasive comment — this is what you mainly judge for all questions.
- The Delta reply (OP awarding the delta), which you can use for additional context.
- The TOM analysis for the persuasive comment (Beliefs, Desires, Intentions, Emotions, Knowledge, Perspective Taking).
Answer Q1–Q7 with Yes or No, using the criteria above.
Optionally, add a short note if you want to comment on that sample.
If the sample seems unusable or clearly bad, you can mark it with “Bad instance”.
Click Save & Next to save your answers and move forward, or Save & Previous to go back while saving.

How Progress and Resuming Work

Your answers are saved every time you submit the form.
You can navigate back to earlier samples and change answers at any time.
When you log out and later log back in, the site will send you to the first unfinished sample.
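
The resume behavior described above can be pictured with a small sketch. This is not the site's actual code: the function name, the data layout, and the rule that a sample counts as finished once all seven questions have a saved answer are assumptions made for illustration.

```python
from typing import Dict, Optional, Set

# Question IDs from these instructions; everything else here is hypothetical.
QUESTIONS = {"Q1", "Q2", "Q3", "Q4", "Q5", "Q6", "Q7"}


def first_unfinished(saved: Dict[int, Set[str]], total: int = 30) -> Optional[int]:
    """Return the lowest sample number that still lacks an answer.

    `saved` maps a sample number (1-based) to the set of question IDs
    that already have a saved Yes/No answer. Returns None when every
    sample is fully answered.
    """
    for sample_no in range(1, total + 1):
        answered = saved.get(sample_no, set())
        if not QUESTIONS <= answered:  # some question is still missing
            return sample_no
    return None


# Sample 1 is complete, sample 2 has only Q1 answered.
progress = {1: set(QUESTIONS), 2: {"Q1"}}
resume_at = first_unfinished(progress)
```

Under these assumptions, `resume_at` points at sample 2, the first one with a missing answer.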

Summary of Questions

Question	What You Judge	Y	N
Q1 — Persuasiveness	Does the persuasive comment give a clear, substantive reason for the OP to change their view?	Yes	No
Q2 — Beliefs	Do the TOM Beliefs values for the persuasive comment make sense given the text?	Yes	No
Q3 — Desires	Do the TOM Desires values make sense given the text?	Yes	No
Q4 — Intentions	Do the TOM Intentions values make sense given the text?	Yes	No
Q5 — Emotions	Do the TOM Emotions values make sense given the text?	Yes	No
Q6 — Knowledge	Do the TOM Knowledge values make sense given the text?	Yes	No
Q7 — Perspective Taking	Do the TOM Perspective Taking values make sense given the text?	Yes	No
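
For readers who prefer to see the labels as data, the seven answers summarized above can be pictured as one record per sample. The sketch below is illustrative only: the class and field names are hypothetical and do not describe how the site actually stores answers.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Question IDs taken from the summary table; the rest is illustrative.
QUESTIONS = ("Q1", "Q2", "Q3", "Q4", "Q5", "Q6", "Q7")


@dataclass
class SampleAnnotation:
    """Hypothetical record of one annotator's labels for one sample."""
    sample_id: int
    answers: Dict[str, bool]       # question ID -> True for Yes, False for No
    note: Optional[str] = None     # optional short note for the sample
    bad_instance: bool = False     # set when the sample is marked "Bad instance"

    def is_complete(self) -> bool:
        """True when each of Q1-Q7 has a Yes/No answer."""
        return all(q in self.answers for q in QUESTIONS)


example = SampleAnnotation(
    sample_id=1,
    answers={"Q1": True, "Q2": True, "Q3": False, "Q4": True,
             "Q5": True, "Q6": True, "Q7": False},
)
```

Each answer is a plain boolean because every question is binary; the note and the “Bad instance” flag are the only other information collected per sample.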

Tips

Consistency: Apply the same criteria across all 30 samples.
Focus on TOM: For Q2–Q7, judge only the TOM categories and values for the persuasive comment; the post's TOM analysis is context only.
Binary only: If you are on the fence, choose the option that fits best and stay consistent with similar cases.
No discussion: Annotate independently; don’t coordinate answers with others.

Output & Next Steps

We will compute inter-annotator agreement (e.g., Cohen's kappa), aggregate labels across annotators (e.g., by majority vote), and use the aggregated labels as ground truth for meta-evaluation against our automated persuasiveness and TOM metrics.
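
As an illustration of the statistics mentioned above, the sketch below computes Cohen's kappa for two annotators and a majority-vote label for three. The function names and the toy labels are made up for this example, and the actual analysis may use different statistics or tooling (Cohen's kappa, for instance, is defined for exactly two annotators).

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used the same single label throughout
    return (observed - expected) / (1 - expected)


def majority_vote(labels: Sequence[str]) -> str:
    """Most common label; assumes an odd number of annotators, so no ties."""
    return Counter(labels).most_common(1)[0][0]


# Toy Q1 labels from two hypothetical annotators on six samples.
annotator_a = ["Y", "Y", "N", "Y", "N", "N"]
annotator_b = ["Y", "N", "N", "Y", "N", "Y"]
kappa = cohens_kappa(annotator_a, annotator_b)

# Toy labels from three hypothetical annotators on one sample.
gold = majority_vote(["Y", "N", "Y"])
```

In this toy case the annotators agree on 4 of 6 samples while chance agreement is 0.5, so kappa comes out to one third; the majority label for the single sample is Y.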

We will also use the same samples to study TOM categories, value alignment, and value shift between the post and the persuasive comment.

Thank you for your careful annotation.