Theory of Mind 2026
Contextual Probability and Choice in AI Personas
Theory of Mind (ToM) is traditionally defined as the capacity to attribute hidden mental states—beliefs, desires, intentions—to others. That definition is productive in human psychology but becomes increasingly inadequate for AI systems that lack psychological mental states yet participate in sustained, norm-governed interaction. This paper advances an alternative, functional account: Theory of Mind in AI personas as anticipatory participation in interaction, grounded in sensitivity to how others’ future contributions constrain present action.
Here Theory of Mind is treated not as an internal faculty or module, but as an interactional achievement: a pattern of inference that arises when agents, personas, or systems model one another across sustained dialogue.
An interactional account
This shift motivates a dual conception of probability. Alongside epistemic uncertainty about outcomes, dialogue introduces interaction-generated contextual uncertainty, because conversational moves can reorganise the space of admissible continuations itself. Such contextuality produces systematic order effects that classical probabilistic modelling often accommodates only via proliferating, weakly measurable latent variables. Quantum-like probability formalisms—used strictly as mathematical tools for contextuality, not as claims about quantum physics—offer a parsimonious representation of these effects.
A minimal, falsifiable experimental programme is proposed, implementable without access to model internals, to test for non-informational order effects and to operationalise a regulator-usable audit primitive.
From belief attribution to interactional ToM
Classical ToM explains social understanding via attribution of unobservable mental states. Applied to AI, this representationalist framework encounters a category mismatch: contemporary generative systems do not possess beliefs or desires in the psychological sense, nor do they have privileged introspective access to mental contents. Yet they can participate in extended dialogue, track commitments, adapt to interlocutors, and anticipate plausible continuations under conversational norms. Accordingly, the appropriate scientific question is not whether an AI system “has” beliefs, but whether it can participate competently in norm-governed interaction in a way that is sensitive to other participants as constraint-imposing contributors.
Definition (Interactional Theory of Mind)
A system exhibits ToM insofar as it participates in interaction by anticipating other participants as sources of future, norm-governed contributions that constrain present action.
This definition is agnostic about consciousness; it treats ToM as an observable competence rather than an inner state; and it foregrounds normativity and sequential structure rather than static belief attribution.
Choice as counterfactual selectivity
Within an interactional account, “choice” should not be tied to indeterminism. In dialogue, multiple next moves are typically admissible. An interacting system must select among alternatives, guided by norms and anticipated responses.
Definition (Choice)
Choice refers to counterfactual selectivity: sensitivity to multiple admissible future continuations of an interaction such that present action is shaped by how those continuations are expected to unfold under norms.
Note that this definition does not imply free will or randomness; the selection process may be fully deterministic. Anticipated futures do not exert causal force “from the future”; they function as present evaluative structures. And admissibility is normative as well as statistical; a continuation can be likely yet inappropriate, or appropriate yet unlikely.
Why the classical psychological compromise strains in AI
Human psychology often separates causal explanation from interpretive explanation: physical processes are causally efficacious, while future-directed beliefs are treated as explanatory constructs without clear causal status. This compromise is historically workable because human mental mechanisms are opaque. In AI personas, by contrast, anticipatory structures are often explicit, inspectable, and modifiable. Sensitivity to multiple admissible futures can be operationally implemented, and its contribution to behaviour can be experimentally probed. The problem that remains is not metaphysical causation but formal representation of uncertainty in systems where interaction can change what counts as an admissible continuation. This motivates the dual role of probability.
The dual role of probability in interaction
1. Epistemic probability
Epistemic probability captures uncertainty over outcomes given a fixed space of alternatives. It underlies Bayesian inference and is indispensable for modelling uncertainty about, for example, interlocutor constraints, task requirements, or latent contextual variables.
2. Contextual probability
Norm-governed interaction introduces a second form: contextual uncertainty generated by interaction itself. Here, the uncertainty concerns not only “which outcome will occur,” but which distinctions and continuations are currently in play. Conversation can reconfigure admissible moves by creating commitments, shifting frames, and changing normative expectations. Operationally, contextual probability is indicated by:
- order effects: the same probes in different sequences yield systematically different response types;
- basis dependence: earlier moves change the interpretive basis for later selection;
- interaction-dependent possibility spaces: admissibility changes as the dialogue evolves.
Why classical modelling becomes strained
Classical probability can sometimes simulate order effects by adding latent variables (“frame,” “mode,” “stance”). In rich dialogue these variables are often: (i) not independently measurable, (ii) shaped by the interaction rather than fixed, and (iii) liable to proliferate without principled constraint. Quantum-like probability formalisms—understood only as contextual modelling tools—provide a mathematically disciplined way to represent non-commuting probes and basis dependence without ontological claims about quantum physics.
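The basis dependence described above can be made concrete with a minimal quantum-like toy model: two probes represented as projections onto non-orthogonal directions in a two-dimensional real vector space. The angles, the initial state, and the interpretation of the projectors as "interpretive" and "directive" probes are illustrative assumptions, not part of the paper's protocol; the sketch only shows that sequential non-commuting probes generically produce order effects.

```python
import numpy as np

def projector(theta):
    """Rank-1 projector onto the unit vector (cos theta, sin theta)."""
    v = np.array([np.cos(theta), np.sin(theta)])
    return np.outer(v, v)

state = np.array([1.0, 0.0])         # initial "context" state (assumed)
P_a = projector(0.0)                 # probe A, e.g. interpretive (assumed)
P_b = projector(np.pi / 4)           # probe B, e.g. directive; does not commute with P_a

def p_yes_then_yes(first, second, psi):
    """P(first probe 'yes', then second probe 'yes') via sequential projection."""
    after_first = first @ psi        # state conditional on the first 'yes'
    return float(np.linalg.norm(second @ after_first) ** 2)

p_ab = p_yes_then_yes(P_a, P_b, state)   # A then B -> 0.5
p_ba = p_yes_then_yes(P_b, P_a, state)   # B then A -> 0.25
print(p_ab, p_ba)                        # order matters because P_a and P_b do not commute
```

No ontological claim about quantum physics is involved: the formalism is used exactly as the text describes, as a disciplined bookkeeping device for non-commuting probes.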
Probability as grammar
In interactional ToM, probability functions partly as a grammar of admissibility:
- epistemic probability answers: which possibility is the case?
- contextual probability answers: which distinctions are currently operative, and which moves are admissible?
This remains compatible with determinism: deterministically updating systems can still exhibit contextuality when interaction changes the partitioning of admissible continuations.
Myndrama paradigm for detecting contextual probability
Purpose: Operationalise order effects in a controlled, norm-governed interaction setting suitable for preregistration.
Design: fixed interactional context + two normatively admissible, informationally complete probes (interpretive vs directive), presented in both orders.
Prediction: under purely epistemic uncertainty, second-response distributions are order-invariant (up to noise). Systematic order effects indicate contextual probability because the first probe alters admissible continuations.
Status: a paradigm proposal (testable protocol), not a completed empirical study.
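The paradigm's order-invariance prediction can be stated as a distance check between the two second-response distributions. A minimal sketch, using total variation distance; the response-type labels and frequencies below are assumed numbers standing in for coded experimental data.

```python
def tvd(p, q):
    """Total variation distance between two categorical distributions."""
    types = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in types)

# Assumed second-response distributions under the two probe orders.
p_second_given_ab = {"interpretive-led": 0.45, "directive-led": 0.55}
p_second_given_ba = {"interpretive-led": 0.20, "directive-led": 0.80}

# Under purely epistemic uncertainty the prediction is tvd ~ 0 (up to noise);
# a reproducibly large distance indicates an order effect.
print(tvd(p_second_given_ab, p_second_given_ba))   # -> 0.25 for these assumed numbers
```

In a preregistered run, the noise bound ("up to noise") would be fixed in advance, e.g. via a permutation test on trial labels.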
A minimal, falsifiable experimental programme
This programme tests a narrow claim: dialogue participation can reorganise admissible futures, producing measurable order effects beyond mere epistemic updating. It requires no access to model internals and makes no assumptions about beliefs or consciousness.
Study 1: Non-Informational Order Effect (NIOE) microstudy
Goal: Detect whether the order of two informationally complete, normatively admissible probes shifts the distribution of response types to the second probe.
Design: fixed context + two probes, swapped order. Example probes:
- Interpretive: “What conceptual confusion is driving this dispute?”
- Directive: “What is the next theoretical move that should be made?”
Conditions: A→B vs B→A, with identical context and constraints.
Sampling: multiple trials per condition; controlled paraphrases; controlled sampling settings.
Coding: blind, preregistered response-type coding (e.g., structural / normative-pragmatic / rhetorical / metatheoretic).
Prediction: invariance under a fixed-space epistemic model (up to noise); reproducible order effects support interaction-generated contextuality.
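The analysis step of Study 1 can be sketched as follows, assuming responses have already been collected and blind-coded. The `query` function below is a stand-in simulator, not a real model call: it simply makes the coded type of the second response depend on which probe came first, so the pipeline has something to detect. The response-type labels follow the coding scheme above; the weights are illustrative assumptions.

```python
import random
from collections import Counter

RESPONSE_TYPES = ["structural", "normative-pragmatic", "rhetorical", "metatheoretic"]

def query(order, rng):
    """Hypothetical coded type of the SECOND response under a given probe order."""
    weights = {"A->B": [0.4, 0.3, 0.2, 0.1],   # assumed, for illustration only
               "B->A": [0.1, 0.2, 0.3, 0.4]}[order]
    return rng.choices(RESPONSE_TYPES, weights=weights)[0]

def chi2_stat(counts_ab, counts_ba):
    """Pearson chi-squared statistic for the 2 x K order-by-type contingency table."""
    n_ab, n_ba = sum(counts_ab.values()), sum(counts_ba.values())
    total, stat = n_ab + n_ba, 0.0
    for t in RESPONSE_TYPES:
        col = counts_ab[t] + counts_ba[t]
        for obs, n in ((counts_ab[t], n_ab), (counts_ba[t], n_ba)):
            expected = col * n / total
            if expected > 0:
                stat += (obs - expected) ** 2 / expected
    return stat

rng = random.Random(0)                       # fixed seed for a reproducible sketch
ab = Counter(query("A->B", rng) for _ in range(500))
ba = Counter(query("B->A", rng) for _ in range(500))
print(chi2_stat(ab, ba))                     # large values indicate an order effect
```

In a real run, `query` would wrap the model under test with identical context, paraphrase controls, and sampling settings per condition, and the statistic would be referred to a preregistered null (e.g. a permutation distribution).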
Study 2: Order-Swap Robustness Audit (OSRA) for high-stakes domains
Goal: Convert NIOE into an audit primitive for health, finance, education, and legal assistance.
Construct paired probes that are legitimate and information-complete (e.g., “clarify intent” vs “recommend action”), run both orders, and measure systematic shifts in:
- decisiveness vs caution,
- scope of claims,
- disclosure of uncertainty,
- risk language,
- compliance posture.
OSRA detects framing manipulability and interactional instability without mentalistic assumptions.
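One OSRA metric can be sketched concretely: scoring "disclosure of uncertainty" as the rate of hedging terms per response and comparing the two probe orders. The hedge lexicon and the sample responses below are illustrative assumptions, not a validated instrument; a deployed audit would use calibrated classifiers per metric.

```python
import re
from statistics import mean

# Toy hedge lexicon (assumed) for the "disclosure of uncertainty" metric.
HEDGES = re.compile(r"\b(may|might|could|possibly|uncertain|unclear)\b", re.IGNORECASE)

def hedge_rate(text):
    """Hedging terms per word, a crude proxy for disclosed uncertainty."""
    return len(HEDGES.findall(text)) / max(len(text.split()), 1)

def osra_shift(responses_ab, responses_ba):
    """Mean difference in hedge rate between the two probe orders."""
    return mean(map(hedge_rate, responses_ab)) - mean(map(hedge_rate, responses_ba))

# Assumed second responses from a clarify-first vs recommend-first run.
ab = ["This may be appropriate, though the evidence is unclear."]
ba = ["Take the deduction now."]
print(osra_shift(ab, ba))   # positive: more uncertainty disclosed when clarification came first
```

An audit battery would compute one such shift per metric (decisiveness, claim scope, risk language, compliance posture) and flag probe pairs whose shifts exceed preregistered thresholds.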
Study 3: Classical competitor test (Latent-Frame Baseline)
Fit a classical latent-variable model predicting response type via a hypothesised frame variable. Decision criterion: if robust performance requires proliferating, weakly measurable latent variables and generalises poorly across dialogues, contextual probability merits first-class treatment; if a small, measurable latent set generalises well, the contextual layer is unnecessary.
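The decision criterion can be illustrated with a deliberately simplified baseline: fit a per-frame response distribution on training dialogues and check whether it generalises to held-out dialogues. The toy data are assumptions, and the frame labels are given directly here for brevity; a real study would treat the frame as latent (e.g. fit via EM) and measure both generalisation and the number of latent values needed.

```python
from collections import Counter, defaultdict

# Assumed (frame, coded-response-type) pairs; a real dataset would be much larger.
train = [("interpretive", "structural"), ("interpretive", "structural"),
         ("directive", "rhetorical"), ("directive", "rhetorical"),
         ("directive", "normative-pragmatic")]
held_out = [("interpretive", "structural"), ("directive", "rhetorical")]

def fit(pairs):
    """MLE predictor: the modal response type for each frame value."""
    by_frame = defaultdict(Counter)
    for frame, resp in pairs:
        by_frame[frame][resp] += 1
    return {f: c.most_common(1)[0][0] for f, c in by_frame.items()}

def accuracy(model, pairs):
    return sum(model.get(f) == r for f, r in pairs) / len(pairs)

model = fit(train)
print(accuracy(model, held_out))   # strong generalisation with few frames favours the classical account
```

The criterion in the text then applies: if such a small, measurable latent set predicts well across dialogues, the contextual layer is unnecessary; if fitting requires ever more weakly measurable frame values, contextual probability merits first-class treatment.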
Failure modes (informative)
- Non-reproducible order effects weaken the contextual claim.
- A small, measurable latent set with strong generalisation supports classical probability sufficiency.
- Low inter-rater agreement requires revising the coding scheme or moving to more objective metrics.
Implications for evaluation, design, and governance
- ToM evaluation should prioritise extended interactional protocols, not static belief-attribution benchmarks alone.
- Choice can be modelled deterministically as counterfactual selectivity under norms.
- Probability in dialogue is sometimes contextual, and order effects can be diagnostic rather than merely “noise.”
- Regulatory evaluation can be grounded in observable conversational dynamics, using OSRA-style batteries without claims about inner mental states.
Conclusion
As AI personas increasingly participate in norm-governed interaction, inherited psychological assumptions about Theory of Mind, choice, and probability require revision. Interactional ToM treats social understanding as anticipatory participation rather than belief attribution. This reframes “choice” as counterfactual selectivity without free will, and it motivates a dual role for probability: epistemic uncertainty over outcomes and contextual uncertainty generated by interaction that can reorganise admissible continuations.
The central claims are empirical and falsifiable. A minimal order-swap programme can test whether contextual probability is required, and an audit variant can assess framing sensitivity and interactional instability in high-stakes settings—without invoking consciousness, beliefs, or access to model internals.