Self-Correction (Language Models): A capability in which a language model identifies and fixes errors, inconsistencies, or flaws in its own reasoning or generated output. It takes two primary forms:
- Multi-Turn Self-Correction: An iterative process where the model generates an output, critiques it (often prompted), and then provides a revised response in a subsequent turn.
- Single-Utterance Intrinsic Self-Correction: The ability to detect and correct a reasoning error while generating a single, uninterrupted response, without external prompts or verification. The correction can be implicit (silently fixing the error) or explicit (acknowledging the mistake, e.g., “Wait, let me correct that…”).
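
The multi-turn variant can be sketched as a generate, critique, revise loop. This is a minimal illustration, not the method from the cited paper: `generate` is a hypothetical callable standing in for any text-completion function, the prompt wording and the "OK" stop signal are assumptions, and `toy_model` is a hard-coded stub used only so the sketch runs without a real model.

```python
from typing import Callable


def multi_turn_self_correct(
    generate: Callable[[str], str],  # hypothetical: maps a prompt to model text
    question: str,
    max_rounds: int = 2,
) -> str:
    """Draft an answer, then alternate critique and revision turns."""
    answer = generate(f"Question: {question}\nAnswer:")
    for _ in range(max_rounds):
        # Critique turn: the model is prompted to inspect its own output.
        critique = generate(
            f"Question: {question}\nProposed answer: {answer}\n"
            "List any errors, or reply exactly 'OK' if there are none."
        )
        if critique.strip() == "OK":
            break  # the model found nothing to fix
        # Revision turn: the model rewrites the answer using its critique.
        answer = generate(
            f"Question: {question}\nPrevious answer: {answer}\n"
            f"Critique: {critique}\nRevised answer:"
        )
    return answer


def toy_model(prompt: str) -> str:
    """Stub model (assumption): wrong first draft, then a correct revision."""
    if "Revised answer:" in prompt:
        return "4"
    if "List any errors" in prompt:
        return "OK" if "Proposed answer: 4" in prompt else "2 + 2 is 4, not 5."
    return "5"


final_answer = multi_turn_self_correct(toy_model, "What is 2 + 2?")
```

Each call to `generate` here corresponds to one turn. Single-utterance self-correction, by contrast, would happen inside a single call, so it cannot be expressed as an outer loop like this one.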
based on: https://www.arxiv.org/pdf/2506.15894