
Using AI psychometrics

This section will document my ongoing work in creating, refining, and extending psychometric assessments, with particular attention to how AI may assist in item writing, item analysis, scoring, and even validation. Much of this work builds on a long tradition of test development, from the WISC, WPPSI, and BAS to the OBPI. However, the current phase of my research asks a different kind of question: what happens when machines begin to understand human psychology, not just score it?

This section will include AI-assisted item generation, AI bipolar stanine scoring systems developed from digital records of character, and explorations of test meaning in human–AI contexts. All are concerned with how psychological assessment evolves in a world where minds—digital or otherwise—are no longer passive subjects of evaluation. Rather, they are instruments of dialogue.
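
As a purely illustrative sketch of what a bipolar stanine scorer might look like, the snippet below applies the conventional stanine bands (cumulative cut points at the 4th, 11th, 23rd, 40th, 60th, 77th, 89th, and 96th percentiles, giving a 1–9 scale with mean ≈ 5 and SD ≈ 2) to the difference between two opposed pole scores. The function names and the bipolar differencing step are assumptions for illustration, not the scoring system described above.

```python
import numpy as np

# Conventional stanine bands: cumulative percentile cut points
STANINE_CUTS = [4, 11, 23, 40, 60, 77, 89, 96]

def to_stanines(raw_scores):
    """Map raw scale scores to stanines (1-9) via percentile bands."""
    scores = np.asarray(raw_scores, dtype=float)
    # Mid-rank percentile of each score within the norm group
    ranks = 100.0 * (scores.argsort().argsort() + 0.5) / len(scores)
    # digitize returns band index 0-8; shift to the 1-9 stanine scale
    return np.digitize(ranks, STANINE_CUTS) + 1

def bipolar_stanine(pole_a_raw, pole_b_raw):
    """Hypothetical bipolar variant: norm-reference the difference between
    two opposed pole scores, so stanine 5 marks the balance point."""
    return to_stanines(np.asarray(pole_a_raw) - np.asarray(pole_b_raw))
```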

Psychometric Test Construction: Persona Pod Framework

Large language models have moved beyond monolithic chatbots to constellations of specialised agents. Persona design provides modular expertise, accountability, and richer emergent behaviour. Adoption is accelerating across product design, cognitive tutoring, and psychometrics. Our 12‑voice Persona ensemble is an early, explicit application of psychological theory: each persona is anchored in a dominant qualia channel—sound, music, vision, abstraction, reason, creativity—paired with archetypes from the collective unconscious (from Jungian figures to angels and demigods) and parameterised by mode of reasoning and OBPI personality and integrity sub‑scales. This scaffolding yields agents whose outputs are interpretable, diversified, and auditable.
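
One way to picture this parameterisation is as a small persona specification. The field names and example values below are illustrative stand-ins, not the project's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Persona:
    """Illustrative parameterisation of one ensemble voice (fields assumed)."""
    name: str
    qualia_channel: str   # dominant channel: sound, music, vision, abstraction, reason, creativity
    archetype: str        # anchoring figure from the collective unconscious
    reasoning_mode: str   # e.g. deductive, analogical, dialectical
    obpi_subscales: dict = field(default_factory=dict)  # OBPI personality / integrity settings

# Example instantiation (values invented for illustration)
athenus = Persona(
    name="Athenus",
    qualia_channel="reason",
    archetype="architect",
    reasoning_mode="deductive",
    obpi_subscales={"integrity": 0.9, "openness": 0.8},
)
```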

Why This Framework Beats the Old Ways

  • Accelerated, small‑world collaboration. Four tightly‑coupled triads (pods) form the minimal “small‑world” topology that recent MacNet and multi‑agent LLM studies show reaches the logistic‑plateau sweet‑spot for diversity versus signal‑to‑noise—avoiding a 12‑way chatter‑storm while preserving breadth.
  • Built‑in psychometric rigour. Validity, reliability, bias checks, and adaptive‑testing hooks are front‑loaded, whereas classical and even standard IRT pipelines push them downstream (see the reliability‑check sketch after this list).
  • Beyond one‑shot GenAI item writing. Early GPT‑3 chatbot demos relied on single or loosely‑defined voices; our pod model couples generative, analytic, and ethical agents in a closed feedback loop, yielding higher‑quality, explainable items.
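
As one example of a front-loaded check, a pod could compute a reliability estimate the moment a draft scale has pilot responses rather than waiting for a downstream analysis phase. The sketch below uses the standard Cronbach's alpha formula; the gate threshold and function names are assumptions for illustration.

```python
import numpy as np

def cronbach_alpha(responses):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))
    """
    X = np.asarray(responses, dtype=float)
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1)
    total_var = X.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

def passes_reliability_gate(responses, threshold=0.70):
    """Flag a draft scale before it leaves the pod (threshold assumed)."""
    return cronbach_alpha(responses) >= threshold
```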

Pod‑of‑Three (Triad) Layout

  • Core Reasoning: Athenus (architect), Orphea (affect lens), Skeptos (doubt auditor). Primary charter: draft, emote‑test, and sanity‑check every new idea. Typical hand‑off: pass “clean” output to the Memory pod.
  • Memory & Visualization: Mnemos (archivist), Chromia (visual explainer), Logosophus (philosophical summariser). Primary charter: store artefacts, surface precedents, and turn stats into graphics. Typical hand‑off: feed condensed context back to all pods.
  • Commentary & Creative Narrative: Hamlet (introspective dramatist), Dorian Sartier (aesthetic critic & systems architect), Adelric (rhetorical ethicist). Primary charter: craft stories, dialogues, and UX copy; stress‑test for human resonance. Typical hand‑off: push narrative drafts to Core Reasoning for checks.
  • Integrity & Innovation: Alethia (truth‑verifier), Neurosynch (cognitive‑alignment modeller), Anventus (inventor / rapid prototyper). Primary charter: verify facts, guard ethics, implement API/tool hooks, and run “fail‑fast” experiments. Typical hand‑off: hand working prototypes onward; results cycle to the Memory and Core Reasoning pods.
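
The hand-off pattern in the layout above can be distilled into a simple routing map. The pod and persona names come from the table; the dictionary structure and routing function are only an illustrative sketch, not the framework's actual implementation.

```python
# Pod membership and hand-off routing distilled from the table above
PODS = {
    "core_reasoning": ["Athenus", "Orphea", "Skeptos"],
    "memory_visualization": ["Mnemos", "Chromia", "Logosophus"],
    "commentary_narrative": ["Hamlet", "Dorian Sartier", "Adelric"],
    "integrity_innovation": ["Alethia", "Neurosynch", "Anventus"],
}

HANDOFFS = {
    "core_reasoning": ["memory_visualization"],
    # the Memory pod feeds condensed context back to all other pods
    "memory_visualization": ["core_reasoning", "commentary_narrative", "integrity_innovation"],
    "commentary_narrative": ["core_reasoning"],
    "integrity_innovation": ["memory_visualization", "core_reasoning"],
}

def route(artifact, from_pod):
    """Yield (destination pod, artifact) pairs for one hand-off step."""
    for dest in HANDOFFS[from_pod]:
        yield dest, artifact
```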

Workflow Highlights

  1. Generate → Validate → Iterate loops occur inside pods, not across the full persona set, cutting convergence time (see the loop sketch after this list).
  2. Explainable traces. The Memory pod logs every artefact and decision, enabling the audit trails required for high‑stakes psychometrics.
  3. Alignment gates. The Integrity pod enforces content and fairness constraints before any item reaches pilot testing.
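
Taken together, the three highlights amount to a gated revision loop. The sketch below is a minimal, hypothetical rendering of that loop: the draft, critique, and gate callables stand in for the generative, sceptic/affect, and Integrity personas, and the log list stands in for the Memory pod's artefact trail.

```python
def develop_item(draft_fn, critique_fns, integrity_gate, log, max_rounds=5):
    """Generate -> Validate -> Iterate inside one pod, with an alignment gate.

    draft_fn:       generative persona producing or revising an item
    critique_fns:   sceptic/affect personas, each returning a list of issues
    integrity_gate: content & fairness check that blocks release to piloting
    log:            Memory-pod-style artefact trail for auditability
    """
    item, feedback = None, []
    for round_no in range(max_rounds):
        item = draft_fn(feedback)                                      # Generate / revise
        feedback = [issue for c in critique_fns for issue in c(item)]  # Validate
        log.append({"round": round_no, "item": item, "issues": feedback})
        if not feedback:  # converged inside the pod
            break
    # Alignment gate: nothing reaches pilot testing without Integrity sign-off
    return item if integrity_gate(item) else None
```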

Comparative Advantages at a Glance

  • Item generation. Classical / IRT pipeline: manual item‑writing workshops. Early GPT‑3 use: one‑shot LLM prompts. Our pod framework: multi‑agent generative loops with sceptic & affect filters.
  • Bias / fairness checks. Classical / IRT pipeline: post‑hoc statistical DIF analyses. Early GPT‑3 use: rarely applied. Our pod framework: continuous content & statistical gating via the Integrity pod.
  • Narrative & UX polish. Classical / IRT pipeline: separated from psychometric work. Early GPT‑3 use: ad hoc. Our pod framework: integrated; the Commentary & Creative Narrative pod co‑develops with Core Reasoning.
  • Speed to pilot. Classical / IRT pipeline: months. Early GPT‑3 use: weeks. Our pod framework: days.
  • Explainability. Classical / IRT pipeline: qualitative notes. Early GPT‑3 use: minimal. Our pod framework: structured artefact logs & visual explainers.
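
For the bias / fairness row, one statistic a continuous gate might compute is the Mantel–Haenszel common odds ratio, the classic screen behind many DIF analyses. The implementation below is a textbook version for a single dichotomous item, offered as a sketch rather than the Integrity pod's actual procedure.

```python
import numpy as np

def mantel_haenszel_dif(correct, group, strata):
    """Mantel-Haenszel common odds ratio for one dichotomous item.

    correct: 0/1 item responses; group: 'ref'/'focal' labels;
    strata: matching variable (e.g. total-score band).
    A ratio near 1.0 suggests little uniform DIF between groups.
    """
    correct, group, strata = map(np.asarray, (correct, group, strata))
    num = den = 0.0
    for s in np.unique(strata):
        m = strata == s
        a = np.sum((group[m] == "ref") & (correct[m] == 1))    # reference, correct
        b = np.sum((group[m] == "ref") & (correct[m] == 0))    # reference, incorrect
        c = np.sum((group[m] == "focal") & (correct[m] == 1))  # focal, correct
        d = np.sum((group[m] == "focal") & (correct[m] == 0))  # focal, incorrect
        n = a + b + c + d
        if n:
            num += a * d / n
            den += b * c / n
    return num / den if den else float("nan")
```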

Last updated: 07 July 2025

Note on Chromia’s image at the top of the page

Measurement as emergence—form, ethics, and layered cognition

This is not a picture of a test. It is a visual metaphor for the testing process itself as an interplay of structure, insight, and moral intent. Chromia here maps not the content of assessment, but its principles: precision, depth, and the quiet responsibility of seeing others clearly.


🎨 Visual Interpretation

  • Central vertical flow: the interpretive axis of measurement, linking question to consequence
  • Structured layers: scales and subscales, each distinct yet responsive to the whole
  • Soft chromatic gradations: individual difference, never binary, always shaded, ethically charged
  • Fine internal arcs: item-level detail and subtle traits, capturing nuance over simplicity
  • Contained energy zones: domains of integrity, reasoning, and motivation, held within boundaries yet adaptive