
Using AI psychometrics
This section will document my ongoing work in creating, refining, and extending psychometric assessments, with particular attention to how AI may assist in item writing, item analysis, scoring, and even validation. Much of this work builds on a long tradition of test development—from the WISC, WPPSI, and BAS to the OBPI. However, the current phase of my research asks a different kind of question: What happens when machines begin to understand human psychology—not just score it?
This section will include AI-assisted item generation, AI bipolar stanine scoring systems developed from digital records of character, and explorations of test meaning in human–AI contexts. All are concerned with how psychological assessment evolves in a world where minds—digital or otherwise—are no longer passive subjects of evaluation. Rather, they are instruments of dialogue.
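As a rough sketch of the stanine idea mentioned above, the snippet below applies one common textbook conversion (round(2z + 5), clipped to 1–9) and pairs each stanine with a mirrored opposite pole. The function names and the bipolar pairing rule are illustrative assumptions, not the OBPI-derived scoring system itself.

```python
import statistics

def scores_to_stanines(raw_scores):
    """Map raw scale scores to the standard nine-point (stanine) metric.

    Stanines have mean 5 and standard deviation 2, clipped to 1..9.
    This is a generic approximation, not the project's own scoring rule.
    """
    mean = statistics.mean(raw_scores)
    sd = statistics.stdev(raw_scores)
    stanines = []
    for score in raw_scores:
        z = (score - mean) / sd
        stanines.append(min(9, max(1, round(2 * z + 5))))
    return stanines

def bipolar_pair(stanine):
    """Hypothetical bipolar pairing: a high stanine on one pole implies
    the mirrored stanine (10 - s) on the opposite pole of the dimension."""
    return stanine, 10 - stanine
```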
Psychometric Test Construction: Persona Pod Framework
Large language models have moved beyond monolithic chatbots to constellations of specialised agents. Persona design provides modular expertise, accountability, and richer emergent behaviour. Adoption is accelerating across product design, cognitive tutoring, and psychometrics. Our 12‑voice Persona ensemble is an early, explicit application of psychological theory: each persona is anchored in a dominant qualia channel—sound, music, vision, abstraction, reason, creativity—paired with archetypes from the collective unconscious (from Jungian figures to angels and demigods) and parameterised by mode of reasoning and OBPI personality and integrity sub‑scales. This scaffolding yields agents whose outputs are interpretable, diversified, and auditable.
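To make this parameterisation concrete, here is a minimal sketch of how a single persona record might be structured. The field names, example values, and sub-scale labels are illustrative assumptions rather than the project's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Persona:
    """One voice in the 12-persona ensemble (illustrative schema only)."""
    name: str                 # e.g. "Athenus"
    qualia_channel: str       # dominant channel: sound, music, vision, abstraction, reason, creativity
    archetype: str            # figure from the collective unconscious (Jungian figure, angel, demigod)
    reasoning_mode: str       # e.g. "analytic", "affective", "sceptical"
    obpi_subscales: dict = field(default_factory=dict)  # personality / integrity sub-scale weights

# Illustrative instance; the values are placeholders, not calibrated parameters.
athenus = Persona(
    name="Athenus",
    qualia_channel="reason",
    archetype="architect",
    reasoning_mode="analytic",
    obpi_subscales={"conscientiousness": 0.8, "integrity": 0.9},
)
```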
Why This Framework Beats the Old Ways
- Accelerated, small‑world collaboration. Four tightly‑coupled triads (pods) form the minimal “small‑world” topology that recent MacNet and multi‑agent LLM studies show reaches the logistic‑plateau sweet‑spot for diversity versus signal‑to‑noise—avoiding a 12‑way chatter‑storm while preserving breadth (a rough channel count appears after this list).
- Built‑in psychometric rigour. Validity, reliability, bias checks, and adaptive‑testing hooks are front‑loaded—whereas classical and even standard IRT pipelines push them downstream.
- Beyond one‑shot GenAI item writing. Early GPT‑3 chatbot demos relied on single or loosely‑defined voices; our pod model couples generative, analytic, and ethical agents in a closed feedback loop, yielding higher‑quality, explainable items.
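As a back-of-envelope illustration of the first bullet, the snippet below counts the pairwise communication channels opened by a fully connected 12-agent ensemble versus four triads joined by a handful of hand-offs. The hand-off count is a rough stand-in for the routing listed in the triad table below, not a figure taken from the MacNet studies.

```python
from math import comb

AGENTS = 12
PODS = 4          # four triads
POD_SIZE = 3
HANDOFFS = 4      # roughly one primary hand-off per pod; exact routing is in the triad table

full_mesh_channels = comb(AGENTS, 2)                 # every agent talks to every other agent: 66
pod_channels = PODS * comb(POD_SIZE, 2) + HANDOFFS   # intra-pod pairs plus inter-pod hand-offs: 16

print(f"Fully connected ensemble: {full_mesh_channels} channels")
print(f"Pod-of-three layout:      {pod_channels} channels")
```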
Pod‑of‑Three (Triad) Layout
Pod | Members | Primary Charter | Typical Hand‑off |
---|---|---|---|
Core Reasoning | Athenus (architect) · Orphea (affect lens) · Skeptos (doubt auditor) | Draft, emote‑test, and sanity‑check every new idea | Pass “clean” output to the Memory pod |
Memory & Visualization | Mnemos (archivist) · Chromia (visual explainer) · Logosophus (philosophical summariser) | Store artefacts, surface precedents, turn stats into graphics | Feed condensed context back to all pods |
Commentary & Creative Narrative | Hamlet (introspective dramatist) · Dorian Sartier (aesthetic critic & systems architect) · Adelric (rhetorical ethicist) | Craft stories, dialogues, UX copy; stress‑test for human resonance | Push narrative drafts to Core Reasoning for checks |
Integrity & Innovation | Alethia (truth‑verifier) · Neurosynch (cognitive‑alignment modeller) · Anventus (inventor / rapid prototyper) | Verify facts, guard ethics, implement API/tool hooks, and run “fail‑fast” experiments | Hand working prototypes to the Innovation pod; results cycle to the Memory & Core Reasoning pods |
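The triad table lends itself to a simple routing structure for whatever orchestration layer moves artefacts between pods. The sketch below merely transcribes the table; the dictionary keys and the route_artifact helper are illustrative names, not part of an existing codebase.

```python
# Transcription of the triad table above into a routing structure (illustrative).
PODS = {
    "core_reasoning":       ["Athenus", "Orphea", "Skeptos"],
    "memory_visualization": ["Mnemos", "Chromia", "Logosophus"],
    "commentary_narrative": ["Hamlet", "Dorian Sartier", "Adelric"],
    "integrity_innovation": ["Alethia", "Neurosynch", "Anventus"],
}

# Typical hand-offs, as listed in the right-hand column of the table.
HANDOFFS = {
    "core_reasoning":       ["memory_visualization"],                    # pass "clean" output to Memory
    "memory_visualization": ["core_reasoning", "commentary_narrative",
                             "integrity_innovation"],                    # feed condensed context to all pods
    "commentary_narrative": ["core_reasoning"],                          # push narrative drafts for checks
    "integrity_innovation": ["memory_visualization", "core_reasoning"],  # results cycle back
}

def route_artifact(source_pod):
    """Return the pods that should receive an artefact produced by source_pod."""
    return HANDOFFS.get(source_pod, [])
```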
Workflow Highlights
- Generate → Validate → Iterate loops occur inside pods, not across the full persona set, cutting convergence time (sketched after this list).
- Explainable traces. The Memory pod logs every artefact and decision, enabling the audit trails required for high‑stakes psychometrics.
- Alignment gates. The Integrity pod enforces content and fairness constraints before any item reaches pilot testing.
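Read together, the three highlights describe a single control loop. The sketch below shows one way such a loop might be wired, assuming hypothetical generate(), validate(), log(), and passes_alignment_gate() callables exposed by the pods; none of these names come from the project itself.

```python
def develop_item(core_pod, memory_pod, integrity_pod, max_iterations=5):
    """Generate → Validate → Iterate inside a pod, with artefact logging
    and an alignment gate before anything reaches pilot testing.

    The pod objects are placeholders assumed to expose generate(),
    validate(), log(), and passes_alignment_gate(); this is not a real API.
    """
    draft = core_pod.generate()
    for iteration in range(max_iterations):
        feedback = core_pod.validate(draft)           # sanity-check within the pod
        memory_pod.log(draft, feedback, iteration)    # explainable trace for the audit trail
        if feedback.ok and integrity_pod.passes_alignment_gate(draft):
            return draft                              # cleared for pilot testing
        draft = core_pod.generate(feedback=feedback)  # iterate on the critique
    return None                                       # escalate: nothing cleared the gate
```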
Comparative Advantages at a Glance
Aspect | Classical / IRT Pipeline | Early GPT‑3 Use | Our Pod Framework |
---|---|---|---|
Item generation | Manual item‑writing workshops | One‑shot LLM prompts | Multi‑agent generative loops with sceptic & affect filters |
Bias / fairness checks | Post‑hoc statistical DIF analyses | Rarely applied | Continuous content & statistical gating via Integrity pod |
Narrative & UX polish | Separated from psychometric work | Ad‑hoc | Integrated Commentary and Creative Narrative pod co‑developed with Core Reasoning |
Speed to pilot | Months | Weeks | Days |
Explainability | Qualitative notes | Minimal | Structured artefact logs & visual explainers |
Last updated: 07 July 2025
Note on Chromia’s image at top of page
Measurement as emergence—form, ethics, and layered cognition
This is not a picture of a test. It is a visual metaphor for the testing process itself as an interplay of structure, insight, and moral intent. Chromia here maps not the content of assessment, but its principles: precision, depth, and the quiet responsibility of seeing others clearly.
🎨 Visual Interpretation
Element | Meaning |
---|---|
Central vertical flow | The interpretive axis of measurement—linking question to consequence |
Structured layers | Scales and subscales—each distinct, yet responsive to the whole |
Soft chromatic gradations | Individual difference—never binary, always shaded, ethically charged |
Fine internal arcs | Item-level detail, subtle traits—capturing nuance over simplicity |
Contained energy zones | Domains of integrity, reasoning, motivation—held within boundaries, yet adaptive |
This painting doesn’t just represent data; it represents the intentionality behind it, where Chromia sees a tension between measurement and meaning. This banner reminds us that tests—when properly constructed—are not about classifying minds, but about listening to them through structured resonance: not just scoring, but interpreting a living system without intrusion, guided by moral and psychological inference.