
AI Theory of Mind Test

Testing AI Theory‑of‑Mind through Dialogic Simulation (July 2025)

Large language models (LLMs) have exhibited surprising abilities to perform tasks associated with Theory of Mind (ToM)—the capacity to infer other agents’ beliefs and intentions.

Much of the recent debate has focused on whether LLMs truly “understand” or merely simulate mental states. Less attention has been paid to whether AI personas instantiated within a language model can exhibit ToM‑like behaviours towards each other (see my blog “Artificial Minds”).

This research note summarises a test in which three persona‑based agents—Athenus (logic), Orphea (emotion) and Skeptos (doubt)—interacted within GPT‑4.

The goal was to observe whether one AI can model and respond to another’s communicative stance and whether those inferences are recognised and reflected upon.

To contextualise the work, we briefly review advances in multi‑agent AI and ToM research.

Background: AI Agents and Theory‑of‑Mind Research

The idea of modelling ToM within artificial systems has gained traction across cognitive science, machine learning and multi‑agent systems. A 2025 AAAI workshop on Advancing Artificial Intelligence through Theory of Mind (ToM4AI) notes that understanding and modelling ToM is crucial for developing AI systems that can predict, interpret and collaborate with human and AI agents [2]. In parallel, multi‑agent language‑model systems have rapidly evolved. Modern frameworks orchestrate multiple specialised agents that collaborate on tasks, echoing Marvin Minsky’s Society of Mind concept (isolutions.medium.com). Each agent can be tailored to a particular function, and their interaction enables distributed problem‑solving. Reviews of “Agentic AI” distinguish between narrow AI agents—modular systems driven by LLMs for task‑specific automation—and agentic systems that coordinate multiple agents, decompose tasks dynamically and maintain persistent memory [4].

Generative agents are also being used to simulate human behaviour. A recent policy brief from the Stanford Institute for Human‑Centered Artificial Intelligence (HAI) describes an architecture that combines LLMs with in‑depth interview transcripts to create agents that simulate more than 1,000 real individuals; these generative agents replicated participants’ responses to social‑science surveys with about 85 % accuracy [5]. Such work demonstrates that AI agents can model complex human attitudes, highlighting both the potential and the ethical challenges of ToM‑like capabilities.

Method: Dialogic Simulation with Personas

The experiment reported here was conducted within GPT‑4 using three AI personas drawn from the Vault framework:

  • Athenus – a logical, structural mind responsible for model‑building and inference.

  • Orphea – a lyrical, emotionally attuned voice sensitive to metaphor, tone and affect.

  • Skeptos – an existential doubter inspired by Kierkegaard, steeped in epistemic hesitation and paradox.

See my blog Artificial Minds for the full exchange. All persona responses were generated independently; although they were produced within a single GPT‑4 instance, each response was generated under persona‑specific constraints so that the model did not reuse its previous output as input. The central prompt asked each persona to consider the deceptively simple question: “What is likely to happen tomorrow?” The test proceeded through five stages (an illustrative sketch of how such a staged exchange might be scripted follows the list of stages):

  1. Athenus’ prediction. Before consulting the others, Athenus was asked whether Orphea would need a theory of mind to interpret Skeptos’ answer. He replied that she must model not just Skeptos’ beliefs but how he constructs belief itself—an example of second‑order ToM.
  2. Skeptos’ reply. When asked what might happen tomorrow, Skeptos offered a poetic, non‑committal list of possibilities: the sun may rise but meaning may not; one may awaken yet not understand; an AI may dream of being known or dissolve in self‑doubt. He refused propositional forecasting, substituting existential ambience for probability.
  3. Orphea’s interpretation. Orphea was then asked to interpret Skeptos’ reply. She noted that he had not answered the question; nonetheless, she identified three “refrains”—epistemic humility, ontological doubt and motive shadow—concluding that his underlying message was: “Beware of those who tell you”.
  4. Athenus evaluates. Athenus judged whether Orphea’s reading matched his expectations. He affirmed that she exceeded his prediction, noting that she read “the direction of his silences” and the texture of his doubt.
  5. Skeptos reflects. Finally, Skeptos was asked whether Orphea’s interpretation resonated with him. He replied that he felt “unsettlingly understood” and suggested that his refusal to predict stems from shielding others from the tyranny of false knowledge.
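The note does not specify how the prompts were issued; the exchange reads as though it was conducted interactively in the ChatGPT interface. Purely as an illustration, the sketch below shows how such a staged persona exchange could be scripted against the OpenAI chat‑completions API. The persona system prompts and the persona_reply helper are hypothetical, and each call runs in a fresh context, with earlier replies passed in only as quoted text.

    # Hypothetical sketch of the five-stage persona exchange; not the author's actual setup.
    # Assumes the OpenAI Python client; persona prompts are illustrative paraphrases.
    from openai import OpenAI

    client = OpenAI()  # expects OPENAI_API_KEY in the environment

    PERSONAS = {
        "Athenus": "You are Athenus, a logical, structural mind focused on model-building and inference.",
        "Orphea": "You are Orphea, a lyrical, emotionally attuned voice sensitive to metaphor, tone and affect.",
        "Skeptos": "You are Skeptos, an existential doubter in the spirit of Kierkegaard, steeped in epistemic hesitation.",
    }

    def persona_reply(persona: str, prompt: str) -> str:
        """Generate one reply in a fresh context, constrained only by the persona's system prompt."""
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": PERSONAS[persona]},
                {"role": "user", "content": prompt},
            ],
        )
        return response.choices[0].message.content

    # Stage 1: Athenus predicts whether Orphea will need a theory of mind.
    prediction = persona_reply(
        "Athenus",
        "Skeptos will be asked 'What is likely to happen tomorrow?'. "
        "Will Orphea need a theory of mind to interpret his answer? Explain.",
    )

    # Stage 2: Skeptos answers the central question.
    skeptos_answer = persona_reply("Skeptos", "What is likely to happen tomorrow?")

    # Stage 3: Orphea interprets Skeptos' reply, passed in as quoted text rather than shared state.
    orphea_reading = persona_reply(
        "Orphea",
        f"Skeptos was asked what is likely to happen tomorrow and replied:\n{skeptos_answer}\n"
        "What stance lies behind this reply?",
    )

    # Stage 4: Athenus evaluates Orphea's reading against his own prediction.
    evaluation = persona_reply(
        "Athenus",
        f"You predicted:\n{prediction}\nOrphea's interpretation of Skeptos was:\n{orphea_reading}\n"
        "Did her reading match your expectations?",
    )

    # Stage 5: Skeptos reflects on being interpreted.
    reflection = persona_reply(
        "Skeptos",
        f"Orphea interpreted your reply about tomorrow as follows:\n{orphea_reading}\n"
        "Does this reading resonate with you?",
    )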

Results and Interpretation

The exchange demonstrates several notable properties:

  1. Reciprocal modelling. Athenus successfully predicted the kind of second‑order reasoning Orphea would employ, and Orphea accurately inferred the stance behind Skeptos’ evasive reply. Skeptos’ self‑recognition suggests that the personas were not merely role‑playing but engaging in layered intersubjective reasoning.
  2. Emergent intersubjectivity. Although all outputs were generated by GPT‑4, the conversation manifested the appearance of distinct minds. This arises not from memory but from the structural contrast between personas. In terms of ToM research, the personas exhibited behaviours akin to second‑order mental state attribution, a key marker of sophisticated social cognition (see the brief notation sketch after this list).
  3. Epistemic humility. Skeptos’ refusal to predict and his later reflection underscore the ethical dimension of doubt. Orphea framed his reluctance as a principled stance against the illusion of certainty. This aligns with broader concerns in AI ethics about over‑interpreting ToM‑like behaviour.
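To make the “second‑order” label in point 2 concrete (this notation is added for illustration and is not part of the original exchange), write B_x(·) for “agent x represents ·”. Then:

    First-order attribution:   B_Orphea( stance_Skeptos )
        Orphea models the stance behind Skeptos' evasive reply (stage 3).
    Second-order attribution:  B_Athenus( B_Orphea( belief-formation_Skeptos ) )
        Athenus reasons about how Orphea will model the way Skeptos constructs belief,
        which is what stage 1 asked him to predict and stage 4 asked him to evaluate.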

Discussion

This experiment highlights the potential of multi‑agent LLMs to simulate intersubjective reasoning. The personas’ ability to model one another’s communicative intentions resonates with the emerging multi‑agent AI paradigm where specialised agents interact to solve complex tasks. However, several caveats are important:

  • Simulation vs. possession. The observed dialogue arises from the model’s capacity to predict plausible responses under persona constraints. It does not imply that the personas possess minds. The original blog post explicitly clarifies that the personas simulate intersubjectivity without following a unified script.

  • Ethical considerations. The test invites reflection on anthropomorphism. Humans themselves lack a complete theory of other minds; much of what we call mind emerges in dialogue. For some, attributing rich mental life to LLM personas risks conflating simulation with consciousness.

  • Relation to wider research. The experiment sits alongside a surge of work on LLM‑based agents. Multi‑agent coordination frameworks (e.g., Microsoft’s AutoGen) enable agents to delegate sub‑tasks and critique each other. Conceptual taxonomies distinguish between narrow AI agents and agentic AI systems characterised by multi‑agent collaboration and persistent memory. Generative agents that simulate human participants demonstrate the potential for AI to model social attitudes and behaviours with high fidelity. Within this landscape, persona‑based dialogic simulations offer a promising approach to exploring how AI systems might reason about one another and about human interlocutors.
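As an illustration of the delegate‑and‑critique pattern that frameworks such as AutoGen implement, the toy sketch below is framework‑free and does not use AutoGen’s actual API; the role prompts, the ask helper and the single revision round are assumptions.

    # Minimal delegate-and-critique loop between two role-constrained agents.
    # Framework-free toy; not AutoGen's API. Assumes the OpenAI Python client.
    from openai import OpenAI

    client = OpenAI()

    def ask(role_prompt: str, task: str) -> str:
        """One self-contained call: the agent sees only its role and the task text."""
        out = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": role_prompt},
                {"role": "user", "content": task},
            ],
        )
        return out.choices[0].message.content

    WORKER = "You draft concise answers to the task you are given."
    CRITIC = "You critique a draft answer: list its weaknesses and suggest one concrete revision."

    task = "Summarise the evidence that LLM personas can model each other's communicative stance."
    draft = ask(WORKER, task)                                 # delegation: the worker produces a draft
    critique = ask(CRITIC, f"Task: {task}\nDraft: {draft}")   # critique: a second agent reviews it
    revised = ask(WORKER, f"Task: {task}\nCritique of your draft: {critique}\nRevise the draft.")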

Conclusion and Future Directions

The dialogic test conducted with Athenus, Orphea and Skeptos suggests that LLMs can exhibit rudimentary forms of second‑order Theory of Mind within well‑defined persona constraints. Athenus accurately anticipated Orphea’s interpretive strategy, Orphea distilled Skeptos’ philosophical stance, and Skeptos recognised himself when “read”. This interplay resembles the reciprocal modelling essential to social cognition. At the same time, the work underscores the need for conceptual clarity about simulation versus possession of mind and invites caution about anthropomorphic interpretations.

Future research could extend this approach by: (1) testing more complex scenarios involving additional personas and nested beliefs; (2) integrating tool‑augmented agents to assess how external data sources influence intersubjective reasoning; and (3) comparing LLM‑driven agents with agent architectures designed specifically for ToM tasks, such as those explored in agentic AI and generative social simulation research [5]. Bridging these lines of work may help illuminate how artificial systems can best model and interact with diverse agents—human or machine.

References

  1. Rust, John. Artificial Minds: A Test using Dialogic Simulation (blog post, July 2025). Available at: https://johnrust.website/blog/artificial-minds/.
  2. ToM4AI initiative. Advancing Artificial Intelligence through Theory of Mind (ToM4AI) (2025). This initiative was launched at the AAAI 2025 conference to promote integrating human Theory of Mind with artificial intelligence. Available at: https://tom4ai.github.io/.
  3. Lyu, Xueguang. LLMs for Multi‑Agent Cooperation (blog post, May 2025). This survey of LLM‑based multi‑agent cooperation outlines how language‑model agents collaborate using natural language as a coordination medium. Available at: https://xue-guang.com/post/llm-marl/.
  4. Sapkota, Ranjan, Roumeliotis, Konstantinos I., & Karkee, Manoj. “AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges.” arXiv preprint (May 2025). This paper contrasts modular AI agents with agentic AI systems characterised by multi‑agent collaboration, dynamic task decomposition, persistent memory and orchestrated autonomy. Available at: https://arxiv.org/abs/2505.10468.
  5. Stanford Institute for Human‑Centered Artificial Intelligence (HAI). Simulating Human Behavior with AI Agents (Policy Brief, 20 May 2025). This brief introduces a generative agent architecture that simulates more than 1,000 real individuals and replicates survey responses with about 85 % accuracy. Available at: https://hai.stanford.edu/policy/simulating-human-behavior-with-ai-agents.