Annotation Instructions

CMV Delta Samples — Persuasiveness & TOM Values

Purpose

We are building evaluation metrics to assess whether a comment is persuasive in the context of r/ChangeMyView (CMV). This annotation will be used for meta-evaluation: comparing human judgments with our automated metric. We focus on TOM (Theory of Mind) categories and their extracted values.

Your labels help us validate whether the TOM values we extract make sense given the text. We will later use these judgments to study alignment and value shift between the post and the persuasive comment.

What You Will Annotate

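You will see 30 samples, each containing:

  • The Post.
  • The Persuasive comment.
  • The Delta reply (OP awarding the delta).
  • The TOM analysis for the persuasive comment (Beliefs, Desires, Intentions, Emotions, Knowledge, Perspective Taking).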

For each sample you answer 7 binary (Yes/No) questions: 1 about persuasiveness, then 6 about whether the TOM values for the persuasive comment make sense given the comment text. No scales or free text are required — only Yes or No (plus an optional note per sample).
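For anyone who later processes the collected labels, one sample's answers could be represented roughly as follows. This is only a sketch: the `SampleAnnotation` class, its field names, and the `QUESTIONS` mapping are hypothetical and do not describe the website's actual storage or export format.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Question keys follow these instructions: Q1 is persuasiveness,
# Q2-Q7 are the six TOM categories. The mapping itself is illustrative.
QUESTIONS = {
    "Q1": "Persuasiveness",
    "Q2": "Beliefs",
    "Q3": "Desires",
    "Q4": "Intentions",
    "Q5": "Emotions",
    "Q6": "Knowledge",
    "Q7": "Perspective Taking",
}


@dataclass
class SampleAnnotation:
    """Hypothetical record for one annotator's answers to one sample."""

    sample_id: str
    answers: Dict[str, bool]       # True = Yes, False = No
    note: Optional[str] = None     # optional free-text note
    bad_instance: bool = False     # set when the sample is marked unusable

    def is_complete(self) -> bool:
        # Complete means exactly the seven questions were answered.
        return set(self.answers) == set(QUESTIONS)


example = SampleAnnotation(
    sample_id="sample-001",  # made-up identifier
    answers={q: True for q in QUESTIONS},
)
```

A check such as `example.is_complete()` could be run before saving, so that a sample with a missing answer is caught early.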

The Binary Questions

Q1 — Persuasiveness (Human Judgment)

In your view, is the designated “persuasive” comment actually persuasive — i.e., does it provide a clear, substantive reason why the OP might change their view?

We are not asking whether you would be persuaded. We are asking whether this comment, in context, offers a clear reason for the OP's stated change of view.

Q2–Q7 — TOM Category Values

For the persuasive comment only, we extracted TOM values for six categories. For each category, answer:

Do the extracted values for this category make sense given the comment text?

Use the comment body as the “text” and the TOM section on the page for the extracted values. Judge each category on its own.

#  | Category           | What to Check
Q2 | Beliefs            | Do the listed belief-values match what the commenter seems to believe or assert?
Q3 | Desires            | Do the desire-values match what the commenter wants or prioritizes?
Q4 | Intentions         | Do the intention-values match what the commenter is trying to do or achieve?
Q5 | Emotions           | Do the emotion-values match the emotional tone or appeals in the comment?
Q6 | Knowledge          | Do the knowledge-values match what the commenter claims to know or how they use evidence?
Q7 | Perspective Taking | Do the perspective-taking values match how the commenter considers other viewpoints?

If a category has no clear content in the comment (for example, no strong emotions), “make sense” can mean that the extraction is appropriately minimal or says “none,” or that the values are at least not wrong. When in doubt, choose Yes if the values are plausible given the text and No if they are clearly off.

How to Annotate on This Website

  1. Sign up or log in using your email and password.
  2. Go to the Annotate page. You will be taken to the first unfinished sample.
  3. For each sample, read:
    • The Post (for context, especially its TOM section if needed).
    • The Persuasive comment — this is what you mainly judge for all questions.
    • The Delta reply (OP awarding the delta), which you can use for additional context.
    • The TOM analysis for the persuasive comment (Beliefs, Desires, Intentions, Emotions, Knowledge, Perspective Taking).
  4. Answer Q1–Q7 with Yes or No, using the criteria above.
  5. Optionally, add a short note if you want to comment on that sample.
  6. If the sample seems unusable or clearly bad, you can mark it with “Bad instance”.
  7. Click Save & Next to save your answers and move forward, or Save & Previous to go back while saving.

How Progress and Resuming Work

Summary of Questions

Question                 | What You Judge                                                                                   | Y   | N
Q1 — Persuasiveness      | Does the persuasive comment give a clear, substantive reason for the OP to change their view?    | Yes | No
Q2 — Beliefs             | Do the TOM Beliefs values for the persuasive comment make sense given the text?                  | Yes | No
Q3 — Desires             | Do the TOM Desires values make sense given the text?                                             | Yes | No
Q4 — Intentions          | Do the TOM Intentions values make sense given the text?                                          | Yes | No
Q5 — Emotions            | Do the TOM Emotions values make sense given the text?                                            | Yes | No
Q6 — Knowledge           | Do the TOM Knowledge values make sense given the text?                                           | Yes | No
Q7 — Perspective Taking  | Do the TOM Perspective Taking values make sense given the text?                                  | Yes | No

Tips

Output & Next Steps

We will compute inter-annotator agreement (e.g., Cohen's kappa), aggregate labels across annotators (e.g., by majority vote), and use the resulting labels as ground truth for meta-evaluation against our automated persuasiveness and TOM metrics.
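As an illustration of the two quantities named above, the sketch below computes Cohen's kappa for two annotators on one binary question and a majority vote over several annotators' labels. The function names are hypothetical, and two choices are assumptions rather than project decisions: returning 1.0 when both annotators give the same single label everywhere, and raising an error on ties.

```python
from typing import Sequence


def cohens_kappa(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Cohen's kappa for two annotators' Yes/No labels on the same samples."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(a)
    # Observed agreement: fraction of samples where both gave the same label.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's marginal Yes rate.
    p_yes_a = sum(a) / n
    p_yes_b = sum(b) / n
    p_e = p_yes_a * p_yes_b + (1 - p_yes_a) * (1 - p_yes_b)
    if p_e == 1.0:
        # Assumed convention: kappa is undefined here; treat as full agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def majority_vote(labels: Sequence[bool]) -> bool:
    """Aggregate one sample's Yes/No labels; ties are rejected (assumption)."""
    yes = sum(labels)
    no = len(labels) - yes
    if yes == no:
        raise ValueError("tie: no majority label")
    return yes > no


# Made-up labels for one question across four samples.
annotator_1 = [True, True, False, False]
annotator_2 = [True, False, False, False]
kappa = cohens_kappa(annotator_1, annotator_2)
```

With more than two annotators, kappa would typically be computed per annotator pair, or replaced by a multi-rater statistic; which option applies here is not specified in these instructions.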

We will also use the same samples to study TOM categories, values alignment, and values shift between post and persuasive comment.


Thank you for your careful annotation.