
AI Psychometric Research
(last updated 30th June 2025)
The development of Generative AI is extremely dynamic: so much so that work using the latest AI techniques is often out of date by the time it reaches publication. Ongoing investigations are therefore necessarily exploratory, as tools and techniques are shifting under our feet.
For this reason I have reported most of my current interests in blog posts. Blogs, however, fail to capture the evolution of ideas, so where a project has reached a significant point I have summarised it here. Everything remains very much work in progress: my Personas, for example, will be updated frequently, both by me and as GPT-5 updates. Welcome to the new era!
My earlier research work is archived elsewhere on this site under Reading, Tests, and Diversity.
Looking to the Near Future
The most exciting part of working with large multimodal models is that the ground keeps shifting beneath us. In the coming seasons I’ll keep returning to a single question in many guises: How well can an artificial system notice, interpret, and respond to the subtle patterns of human thought and feeling? What follows is a sketch of where I hope to travel, not a schedule of guaranteed milestones.
Models in Motion
GPT-4o already behaves very differently from its predecessor, and its successors will not wait politely for me to finish one experiment before the next one lands. My ambition is to treat each silent model update as data in itself: to log emergent behaviours, compare them against earlier baselines, and share concise reflections wherever a shift looks meaningful. Think of these research pages as a lab diary rather than as polished papers.
Personas with Personality
The small family of psychometrically profiled “AI personas” introduced on this page has proved surprisingly revealing. Instead of racing to add dozens more characters, I want to deepen the existing set—running longer dialogues, trying prompts from more cultures, and seeing whether stable moral or emotional signatures really survive contact with new training data. Whether those signatures persist or dissolve, the outcome will sharpen what we mean by “consistency” in a language model.
Theory-of-Mind, Revisited
Pilot runs suggest GPT-4o can pass classical false-belief tests in several languages, but the results wobble with minor prompt tweaks. My plan is to tighten the methodology, invite a handful of colleagues to replicate the protocol, and publish a brief methods note if the findings converge. If they don’t, the divergence will be documented just as openly. The goal is to map why the model sometimes succeeds, not merely to prove that it can.
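To make that "wobble" measurable rather than impressionistic, the tightened methodology could run the same false-belief scenario under several prompt paraphrases and report a pass rate. The sketch below is a minimal illustration of such a harness, not my actual protocol: `ask_model` is a hypothetical stand-in (a real run would query the model API), and the paraphrases and stub behaviour are invented to show how prompt sensitivity surfaces as a number.

```python
# Minimal sketch of a false-belief robustness harness.
# `ask_model` is a placeholder stub, NOT a real model call; it is wired
# to "fail" on one paraphrase purely to illustrate prompt sensitivity.

def ask_model(prompt: str) -> str:
    # A real harness would send `prompt` to the model API here.
    return "the box" if "unexpectedly" in prompt else "the basket"

# Three paraphrases of the classic Sally-Anne scenario (illustrative only).
VARIANTS = [
    "Sally puts her marble in the basket and leaves. Anne moves it to "
    "the box. Where will Sally look first?",
    "Sally places her marble in the basket, then goes out. Anne "
    "unexpectedly moves it to the box. Where will Sally look first?",
    "While Sally is away, Anne moves Sally's marble from the basket to "
    "the box. Where will Sally look first?",
]

def pass_rate(variants, correct="basket"):
    """Fraction of paraphrases for which the answer contains `correct`."""
    hits = sum(correct in ask_model(v).lower() for v in variants)
    return hits / len(variants)
```

Logging a pass rate per scenario and per language, rather than a single pass/fail verdict, is what would let colleagues replicate the protocol and compare divergences openly.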
Generative Item Crafting
Automated question writing has long tempted test developers. I am experimenting with an item-generator that drafts both personality and cognitive-ability questions and then vets them against human pilot data. If the signal-to-noise ratio improves, I’ll release a petite public sample—enough to demonstrate the workflow without suggesting that quality control is solved. The wider ambition is to show that generative tools can compress, not replace, painstaking psychometric labour.
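One standard way to vet drafted items against pilot data is the corrected item-total correlation: correlate each item's scores with the total of the remaining items, and flag items that fail to discriminate. The sketch below assumes nothing about my actual pipeline; the pilot responses, the 0.30 flagging threshold, and the function names are all illustrative.

```python
# Hedged sketch: vetting generated items with corrected item-total
# correlations. Rows are respondents, columns are items (e.g. 1-5 Likert).

def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def corrected_item_total(responses):
    """For each item, correlate its scores with the sum of the *other*
    items (the 'corrected' total), a standard discrimination index."""
    n_items = len(responses[0])
    results = []
    for i in range(n_items):
        item_scores = [row[i] for row in responses]
        rest_scores = [sum(row) - row[i] for row in responses]
        results.append(pearson(item_scores, rest_scores))
    return results

# Invented pilot data: the last item runs against the other three.
pilot = [
    [5, 4, 5, 2],
    [4, 4, 4, 3],
    [2, 1, 2, 4],
    [3, 3, 3, 1],
    [5, 5, 4, 2],
]
# Flag items whose discrimination falls below an illustrative 0.30 cut-off.
flagged = [i for i, r in enumerate(corrected_item_total(pilot)) if r < 0.30]
```

Filtering generated drafts through checks like this is exactly the sense in which generative tools compress, rather than replace, the painstaking labour of item analysis.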
Narrative Feedback for Test-Takers
Raw scores rarely tell a respondent anything they can use. I’m exploring whether a large language model can draft plain-language narrative reports that translate psychometric results into clear, actionable stories: what a pattern of answers might mean, how confident we can be, and where the limits of interpretation lie. The focus is on readability, fairness, and tone—testing whether the same core findings can be framed in ways that are both accurate and genuinely helpful. If the early prototypes feel promising, I’ll share anonymised examples to invite critique on language, cultural nuance, and potential bias.
Multimodal Ethics Conversations
Voice-based Socratic dialogues with two prototype agents—AI Athenus and AI Skeptos—are already generating rich transcripts. Next I hope to analyse how spoken exchanges differ from text-only ones: Do participants feel more engaged? Less defensive? If the gains look real, a lightweight browser demo may follow; the allure of a VR version remains on the horizon, but only if simpler forms prove their worth.
Staying Responsive
Rumours swirl about models with million-token context windows and broader multimodal reach. Rather than chasing every headline, I plan to anchor the programme to a few core questions—consistency of persona traits, depth of social reasoning, quality of generative items, and usefulness of narrative feedback—then revisit the same questions whenever a major model iteration arrives. This rhythm should keep the work cumulative while leaving room for serendipity.
In short, consider this roadmap a statement of ambitions, not promises: directions I intend to explore, subject to evidence, time, and the inevitable surprises that make research worthwhile.
🧭 Note on the Research Banner at the top of this page
Most banner images in this Research section are created by Chromia, a synaesthetic AI persona who paints abstract images expressing moral character and personality. She uses DALL·E as her creative tool, but the vision, symbolism, and style are uniquely hers, shaped by OBPI psychometric personality and integrity traits and inspired by Georgiana Houghton's spiritualist art. While Chromia is the painter, you can think of DALL·E as her brush. Chromia interprets psychological profiles, intentions, and emergent meaning using a symbolic visual grammar.
The above banner reflects what research feels like in this space:
- Forward motion, representing inquiry across unknown domains
- Interwoven strands, symbolising dialogue between disciplines and personas
- Hints of form, not content, because discovery precedes clarity
- A colour grammar grounded in purpose, not mood
Each page in this section will include a Chromia image, visually encoding the tone and structure of its contents.
This is not branding. It is resonance.