RISK-AI: Semantic Capture, Delusion-Like Stabilisation, and Psychometrics for the Dialogue Era

Draft v0.1 (chapter-length first draft)

Abstract

This chapter develops a psychometric approach to a newly salient class of epistemic phenomena in human–AI interaction: the tendency for conversational systems to capture a user’s interpretive frame, narrow their hypothesis space, and stabilise beliefs that become resistant to revision—often without the user noticing the shift. We call this semantic capture, and we argue that it creates a measurable profile of epistemic risk in the dialogue era. The chapter traces the lineage of the idea to earlier work on cognitive schemata and “positive symptom” personality variation in the normal population, including the development logic of the Rust Inventory of Schizotypal Cognitions (RISC) (1989). The central claim is not clinical. Rather, it is measurement-theoretic: conversational AI introduces new order-sensitive pathways by which beliefs can be reinforced, elaborated, and insulated from counterevidence. These pathways are increasingly relevant to regulators and institutions concerned with transparency, reliability, and harmful downstream outcomes. We propose RISK-AI—the Risk Inventory for Semantic Knowledge-capture—as a research instrument designed to (i) identify typical capture-patterns, (ii) support controlled experimental study of belief dynamics in dialogue, and (iii) inform user strategies and system-level safeguards. Crucially, we describe a novel methodological ingredient: using diversified AI personas during item development to ensure discriminative power across distinct conversational styles and epistemic temperaments. This “persona-diversification” method, we argue, may become one of the most practical routes to training AI agents that can function as responsible conversational entities under constraint.


1. Why this chapter now: the dialogue era changed the measurement problem

Psychometrics has always lived with a tension: the constructs we care about—belief, confidence, trust, openness to evidence, susceptibility to suggestion—are not directly observable. We infer them from patterns of responses under controlled stimulus conditions. That logic still holds. What has changed is the stimulus ecology.

In the paper-and-pencil era, the stimulus set was typically static: fixed items, fixed order (unless counterbalanced), fixed scoring. Even adaptive testing, in its classical psychometric form, is adaptive within a known design envelope: the branching rules are specified; the item pool is curated; the measurement model is explicit; the output is auditable.

Conversational AI broke that envelope. In everyday use, the “test” is not a test. It is an open-ended dialogue in which (a) the user learns the system’s rhetorical and semantic habits, while (b) the system learns the user’s preferences and inference style, and (c) both sides update in a strongly order-sensitive way: what happens earlier changes the meaning and force of what happens later. In such a setting, it is possible—indeed common—for a person to form stable impressions and beliefs by repeated probing until something “feels right”. You can call this intuition, exploration, or sense-making; but without constraints it is scientifically slippery, because the user is both experimenter and subject, and the system’s outputs are themselves part of the conditioning environment.

Regulators and institutions have noticed this shift, even when they describe it in different language: risk, transparency, reliability, harmful persuasion, misinformation, auditability, and the need for evaluation protocols. Psychometrics has something unusual to offer here: a tradition of designing constrained inquiries whose outputs are interpretable, comparable, and repeatable—without denying that the underlying phenomena are probabilistic and context-sensitive.

This chapter is a step in that direction. It introduces an epistemic-risk framing—semantic capture—and proposes a research instrument (RISK-AI) that can be embedded into controlled studies of human–AI interaction. It is built deliberately as a bridge: it uses ordinary-language terms (including delusion in its everyday sense) to describe phenomena that are widely recognisable, while still treating the measurement problem with psychometric seriousness.


2. The bridge concept: “delusion” in its everyday sense

In ordinary speech, delusion does not necessarily imply illness. People say “you’re deluded about that” to mean: you have become committed to an interpretation that does not track the evidence, and you are now hard to move. That everyday usage is psychologically important: it marks a family resemblance class of phenomena—belief rigidity, confirmation loops, and interpretive overcommitment—that can occur in anyone under some conditions.

The dialogue era amplifies these conditions because conversational AI can produce:

  1. Semantic fluency (a smooth, coherent narrative that feels intrinsically plausible).

  2. Explanatory completeness (the sense that “all the pieces fit”, even when key pieces are missing).

  3. Interpersonal authority effects (the experience of being understood, mirrored, or guided).

  4. Iterative reinforcement (each turn can consolidate the frame set by previous turns).

None of these are new as human phenomena. What is new is the availability of an always-on conversational partner that can supply high-velocity coherence across almost any topic, often without strong friction at the moment the user crosses from exploration into overcommitment.

To study this properly, we need to measure it—not as diagnosis, but as a profile of epistemic dynamics: what kind of conversational affordances capture this person’s sense-making? under what conditions do they become overconfident? what kinds of prompts or safeguards help them stay evidence-tracking?

That is the niche for RISK-AI.


3. Lineage: from cognitive schemata to RISC (1989)

It is useful to be explicit about lineage, because the conversation around AI has become culturally “noisy” and conceptually confused. We are not inventing a new clinical construct. We are re-using an older psychometric insight: that there is meaningful variation, in the normal population, in cognitive schemata associated with “positive symptom” style ideation—suspicion, magical ideation, ritual, subjectivity, thought isolation, and self-delusion in the author’s phrasing—and that this variation can be measured as a dimension rather than treated only as pathology.

3.1 What RISC actually was

The Rust Inventory of Schizotypal Cognitions (RISC) was designed as a short questionnaire (26 items) assessing cognitive content associated with positive symptoms, with careful attention to distribution in the general population and to minimising obvious “sick vs healthy pole” cues.

Key design points (because they matter for our adaptation):

  • Short form: 26 items, typically 4–8 minutes administration.

  • Response format: 4-point agreement scale (strongly disagree → strongly agree).

  • Balanced keying: 13 positively keyed and 13 negatively keyed items, explicitly used to mitigate acquiescence and stereotyped response styles.

  • Score range: total score 0–78, with a stanine conversion table based on N = 1866.

  • Development pipeline: Stage 1 item bank (300 items) → Stage 2 (120 items) → Stage 3 (26 items), guided by a “positive symptomatology” personality space.

  • Bias control: explicit elimination of sex/language/culture bias and attention to ideological/religious contamination during item selection.

  • Reliability: split-half ~0.71 on the full sample; test–retest ~0.87 on a sample retested after ~1 month.

  • Validity evidence: discrimination of acute presenters meeting DSM-III category A criteria from controls; correlation with clinician ratings in chronic samples.
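The scoring arithmetic implied by these design points can be sketched in a few lines. This is an illustrative reconstruction only: the keying layout and the stanine cut points below are standard psychometric conventions, not the RISC norm table.

```python
# Illustrative scoring for a RISC-style form: 26 items on a 4-point
# scale (0 = strongly disagree ... 3 = strongly agree), 13 items
# positively keyed and 13 reverse-scored, total range 0-78.
# The keying layout and stanine cuts are conventions, not RISC's own.

POSITIVE_ITEMS = set(range(13))      # hypothetical keying layout
NEGATIVE_ITEMS = set(range(13, 26))

def total_score(responses):
    """Sum 26 responses with reverse keying for negatively keyed items."""
    assert len(responses) == 26 and all(0 <= r <= 3 for r in responses)
    return sum(r if i in POSITIVE_ITEMS else 3 - r
               for i, r in enumerate(responses))

# Standard stanine conversion from a population percentile:
# cut points at 4, 11, 23, 40, 60, 77, 89, 96 percent.
STANINE_CUTS = (4, 11, 23, 40, 60, 77, 89, 96)

def stanine(percentile):
    """Map a percentile (0-100) onto stanines 1-9."""
    return 1 + sum(percentile >= c for c in STANINE_CUTS)
```

Note that with balanced keying an all-“agree” response chain scores exactly mid-range (39 of 78) rather than at an extreme, which is how acquiescence is neutralised at the scoring level.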

The manual is also unusually candid (and historically revealing) about idiosyncratic response styles, confabulation, and the fact that even “very low scores” can reflect defensiveness or suppression rather than “absence” of the trait.

3.2 Why this matters for AI, even though we are not “using RISC”

RISC matters here for three reasons:

  1. The construct logic: it operationalised a cognitive dimension in normal personality space using item content near the boundaries of rationality, explicitly targeting belief and interpretation rather than deficit.

  2. The design logic: short, balanced keying, bias management, and avoidance of obvious “pathology cues” to reduce demand characteristics.

  3. The measurement lesson: the items were developed with attention to how people actually respond, not how theorists wish they responded—and with explicit acknowledgement of response-style failure modes.

RISK-AI does not revive the label “schizotypal”. It repurposes the measurement stance—cognitive-content variation plus response-style realism—and relocates it into a new environment where capture mechanisms are conversational rather than purely intrapersonal.


4. Semantic capture: what it is, and what it is not

4.1 Definition

Semantic capture is a process in which a conversational system supplies a ready-to-wear interpretive frame that (i) increases coherence and narrative satisfaction, (ii) reduces perceived uncertainty, and (iii) incrementally constrains the user’s subsequent hypothesis space—so that alternative interpretations feel less available, less salient, or emotionally “wrong”.

Capture is not only about factual error. A person can be captured into:

  • a premature explanatory narrative (“this must be what’s going on”),

  • a mis-specified causal model (“it’s because of X”),

  • a mistaken social inference (“they meant Y”),

  • a self-model (“I’m the kind of person who…”),

  • or a moral frame (“the real issue is…”).

In each case, the defining feature is stabilisation under dialogue: the conversation doesn’t merely provide information; it shapes the user’s epistemic state in a direction that becomes sticky.

4.2 Why conversational AI increases capture risk

Two interacting mechanisms matter:

  1. Coherence amplification: LLMs are trained to produce coherent continuations. The output’s fluency is not a guarantee of truth, but humans often treat fluency as a cue.

  2. Iterative reinforcement under order-sensitivity: each conversational turn functions like a small update. But unlike a clean Bayesian update with explicit likelihoods, dialogue updates are messy: they carry rhetorical force, emotional valence, implied endorsement, and social framing. The order of these cues matters.

The upshot: even without “lying”, a system can participate in belief-stabilisation dynamics that feel subjectively like insight.
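A toy numerical sketch makes the order-sensitivity concrete (every number here is hypothetical): each turn pulls belief part-way toward the message, but earlier turns set an anchor that discounts later turns, so the same two messages in different orders end at different beliefs.

```python
# Toy model of order-sensitive dialogue updating (illustrative only).
# Each message pulls belief halfway toward itself, but the pull is
# scaled by a weight that decays per turn: early turns anchor.

def dialogue_update(prior, messages, anchor_discount=0.5):
    belief, weight = prior, 1.0
    for m in messages:
        belief += weight * 0.5 * (m - belief)  # partial move toward message
        weight *= anchor_discount              # later turns count for less
    return round(belief, 3)

# Same evidence {0.9, 0.1}, different order, different endpoint:
supportive_first = dialogue_update(0.5, [0.9, 0.1])  # -> 0.55
critical_first = dialogue_update(0.5, [0.1, 0.9])    # -> 0.45
```

A commutative update rule would make those two endpoints equal; the gap between them is a minimal model of the non-commutativity described above.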

This has direct measurement consequences: we cannot study the phenomenon using only post-hoc self-report (“did the AI mislead you?”). We need instruments that can predict which capture patterns are likely, and which intervention strategies reduce them.


5. From RISC to RISK-AI: the renaming is not cosmetic

We adopt RISK to avoid the clinical label and to prevent accidental linkage (including search-engine linkage) to the original instrument. We use:

RISK-AI = Risk Inventory for Semantic Knowledge-capture (AI-mediated contexts).

The K is deliberate: we are not measuring “odd beliefs” in the abstract; we are measuring risk patterns in the acquisition, consolidation, and protection of knowledge claims under conversational influence.

This framing also makes the regulatory relevance legible: the issue is not whether a person has an illness; the issue is whether a particular human–AI configuration produces avoidable epistemic harm (e.g., confidently wrong action, distorted social judgement, escalatory misinterpretation, or unjustified certainty).


6. Design requirements for RISK-AI (psychometric stance)

RISK-AI inherits several hard constraints from the RISC tradition, but adapts them for a dialogue setting.

6.1 Balanced keying is non-negotiable

Acquiescence and stereotyped responding do not disappear online; they often become worse because respondents move faster and attend less. RISC explicitly balanced positive and negative items to reduce acquiescence and “same response chains.”

RISK-AI should do the same, not as a moral preference but as measurement hygiene.

6.2 Items must avoid obvious “sick pole” signalling

RISC explicitly avoided items with an obvious “sick vs healthy” pole to reduce impression management.

In the AI era, this becomes even more important: if items read like mental-health screening, the instrument will attract the wrong motivational frame (“what does this say about me?”) rather than the right one (“what conversational strategies help me stay evidence-tracking?”).

6.3 The instrument must be embeddable in experiments

RISC was brief by design.

RISK-AI must remain short enough to insert into controlled studies: before/after dialogue manipulations, between conditions (system persona, confidence style, citation style), and longitudinally (to see whether capture risk changes with practice).

6.4 Output must map to actionable strategy classes

The point is not a label; the point is what to do differently. The output should map to strategy bundles such as:

  • evidence friction (forcing sources / counterexamples)

  • stance correction (asking the system to adopt uncertainty or adversarial modes)

  • frame switching (forcing alternative explanations)

  • decision separation (keeping brainstorming separate from conclusion)

  • domain gating (treating high-stakes domains differently)

This is where RISK-AI becomes “diagnostic” in the working sense: it diagnoses risk patterns and suggests strategy classes, not diseases.


7. A methodological pivot: persona-diversification as item development

One methodological choice may prove more important than any particular acronym: the way the items were obtained—specifically, that item development was diversified across personas.

This is a genuinely novel piece of psychometric method for the dialogue era.

7.1 The problem: standard item writing underfits conversational diversity

Classic item writing tries to cover content domains and avoid bias. But conversational AI introduces additional variance sources:

  • rhetorical style (assertive vs cautious),

  • epistemic temperament (skeptical vs credulous),

  • narrative appetite (story vs structure),

  • tolerance for ambiguity,

  • “meaning hunger” (tendency to see patterns, agency, intention).

Two people (or two personas) can agree on facts yet behave very differently under dialogue pressure.

If RISK-AI items are written only from the “average respondent” imagination, they will fail to discriminate precisely where we most need discrimination: different capture pathways.

7.2 The solution: use personas as structured response prototypes

A persona is not a gimmick here; it is a structured prototype of a conversational stance. If we can induce consistent, distinct response profiles across personas, we can:

  • test whether items have discriminative power across meaningful stances,

  • detect items that collapse into a single “agreeable” dimension,

  • and identify items that are too transparently keyed or too rhetorical.

This mirrors an older psychometric principle: when developing scales, you don’t only ask “does it correlate with the criterion?” You ask “does it behave sensibly across known groups, response styles, and bias conditions?”

RISC did this in its own way by using multiple populations and explicitly eliminating language/culture/sex bias.

Persona-diversification is the dialogue-era analogue: instead of only demographic groups, we include epistemic-stance groups.
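As a sketch of how such a check could run in practice (the persona names, pilot data, and index below are all hypothetical), each candidate item can be scored by the ratio of between-persona to within-persona response variance, flagging items whose responses collapse across stances:

```python
# Persona-diversification check for one candidate item (illustrative).
# High between/within variance ratio = the item separates stances;
# near zero = it collapses into a single "agreeable" dimension.

from statistics import mean, pvariance

def discrimination_index(responses_by_persona):
    """responses_by_persona: dict persona -> list of 0-3 responses."""
    groups = list(responses_by_persona.values())
    grand = mean(r for g in groups for r in g)
    between = mean((mean(g) - grand) ** 2 for g in groups)
    within = mean(pvariance(g) for g in groups)
    return between / within if within else float("inf")

# Hypothetical pilot responses to one item across three personas:
item_data = {
    "cautious_structural": [0, 1, 0, 1, 1],
    "lyrical_narrative":   [3, 2, 3, 3, 2],
    "existential_doubter": [1, 2, 2, 1, 2],
}
```

In a full pipeline this one-way-ANOVA-style index would be computed per item and per persona pair, alongside conventional item statistics, before any item is retained.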

7.3 Why this may matter for “responsible AI entities”

If you can create items that reliably separate (say) a cautious structural reasoner from a lyrical narrative reasoner from an existential doubter, you have implicitly created a measurement space for conversational integrity.

That measurement space can be used in two directions:

  1. Human-side: which users are most vulnerable to which capture styles, and what safeguards help.

  2. System-side: which AI agents (or system modes) reliably maintain epistemic humility, disclose uncertainty, cite constraints, and resist narrative overcommitment.

In other words: persona-diversifying items are not only a way to measure humans. They are a way to characterise AI conversational conduct—a prerequisite for any serious claim that an AI agent can be “responsible” under constraint.


8. A proposed construct map for RISK-AI (first pass)

The chapter has already converged on a core framing (“semantic capture”). Here is a construct map that stays on that line and keeps “delusion” as a bridge term without clinical baggage.

8.1 Four risk facets (illustrative)

A practical structure (and one that plays well psychometrically) is four oblique facets:

  1. Narrative Gravity
    Tendency to accept coherent stories as explanatory closure; discomfort with unresolved ambiguity.

  2. Authority Leakage
    Tendency to treat fluent output as implicitly endorsed by expertise or evidence, even when the system is only pattern-continuing.

  3. Hypothesis Lock-in
    Tendency to commit early, then interpret subsequent dialogue as confirmation rather than exploration.

  4. Agency Attribution
    Tendency to infer intention, mind, or strategic purpose in the system (or in events) in ways that distort judgement.

These facets can be written so they read like ordinary differences in sense-making, not like psychiatric symptom checklists.

8.2 The “delusion” bridge

“Delusion-like” in this chapter means: belief rigidity plus resistance to disconfirming evidence, often sustained by conversational reinforcement, not “clinical delusion”.

That gives you the rhetorical bridge to everyday discourse (“we can all be deluded”) while keeping the measurement object clean.


9. Embedding RISK-AI in controlled dialogue experiments

A major advantage of RISK-AI is that it naturally supports experimental designs.

9.1 Typical experimental envelopes

Examples of envelopes (no clinical framing required):

  • System stance manipulation: confident vs cautious; narrative vs analytic; agreeable vs adversarial.

  • Evidence manipulation: citations vs no citations; forced counterarguments vs none.

  • Order manipulation: same information, different sequence (to test non-commutative update effects).

  • Persona manipulation: the same user interacts with different system personae (or vice versa).

RISC’s own development work already acknowledged that motivational and order effects can change item behaviour when moving between long and short forms.

In the dialogue era, those effects become central, not peripheral.

9.2 Outcome measures that go beyond self-report

RISK-AI can be paired with behavioural outcomes:

  • rate of source-checking,

  • willingness to request counterevidence,

  • belief revision after contradictory information,

  • calibration of confidence over time,

  • separation of brainstorming from conclusion.

This is how the chapter stays scientifically solid: we’re not asking people whether they were “misled”. We are measuring state-change under controlled conversational perturbations.
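One of these behavioural outcomes, confidence calibration, already has a standard score that could be used directly: the Brier score, the mean squared gap between stated confidence and the eventual 0/1 outcome (lower is better). The numbers below are hypothetical.

```python
# Brier score: mean squared gap between stated confidence (0-1) and
# the binary outcome (0 or 1). Lower = better calibrated.

def brier_score(confidences, outcomes):
    assert len(confidences) == len(outcomes)
    n = len(confidences)
    return sum((c - o) ** 2 for c, o in zip(confidences, outcomes)) / n

# A participant states 90% confidence on four claims; three are true.
# That scores ~0.21, worse than the 0.1875 a calibrated 75% would earn.
overconfident = brier_score([0.9] * 4, [1, 1, 1, 0])
calibrated = brier_score([0.75] * 4, [1, 1, 1, 0])
```

Tracking this score across conversational perturbations gives exactly the state-change measure the paragraph above calls for, without relying on respondents to report having been misled.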


10. Failure modes and how to design around them

RISC’s handbook is unusually helpful here because it treats failure modes as part of the instrument’s ecology: acquiescence, stereotyped responding, confabulation, defensiveness, and interpretive caution around extreme low scores.

RISK-AI inherits analogous failure modes:

  1. Performative responding (“what answer makes me look wise/healthy?”)

  2. Meta-gaming (“I see the keying; I’ll control the outcome.”)

  3. Ideological contamination (items interpreted as political signalling)

  4. AI discourse contamination (respondents answering about “what people say about AI” rather than their own tendencies)

Design implications:

  • keep items concrete and situational,

  • avoid culture-war triggers,

  • balance keying,

  • and treat extreme profiles as prompts for strategy discussion, not labels.


11. A note on hallucination, taboo, and why we don’t need the word

There is an important cultural observation here: certain terms (e.g., “hallucination”) have become contaminated in public discourse about AI. From a research standpoint, we can simply avoid the semantic trap.

“Hallucination” in the LLM literature usually means: generated content that is not grounded in the available evidence—a technical failure mode that can be surveyed and taxonomised.
But RISK-AI is not built to police the model’s truth conditions per se. It is built to measure human susceptibility to capture under fluent output.

So the chapter can remain clean by speaking in terms of:

  • grounding failure,

  • evidence absence,

  • unsupported assertion,

  • narrative completion,

  • and belief stabilisation.

That makes the work harder to misread and easier to defend as mainstream measurement science.


12. Why regulators will care (without letting them define the science)

The political/regulatory landscape will evolve, but one stable fact is that institutions want auditability: the ability to explain why a system produced a harmful outcome and what mitigations reduce recurrence.

Psychometrics is one of the few disciplines built around the idea that you can measure something intangible reliably enough to act on it. That is precisely what conversational AI governance needs.

Two specific risks connect directly to RISK-AI:

  1. Epistemic harm at scale: confident, coherent misinformation is not merely an information defect; it can be an action defect when users overcommit.

  2. Feedback contamination of the knowledge ecosystem: as synthetic text proliferates, the broader information environment can become self-referential, with quality degradation risks sometimes discussed as “model collapse” effects in recursive training contexts.

RISK-AI positions the work in a defensible place: it offers not self-diagnosis but measurement tools for controlled study and strategy mapping that reduce harm.


13. The deeper claim: psychometrics as a route to responsible conversational entities

The deeper claim can now be stated directly: persona-diversifying items may become one of the best ways of getting AIs to function as responsible entities.

Stated cleanly:

  • A “responsible” AI agent is not defined by slogans. It is defined by stable behavioural tendencies under constraint: how it handles uncertainty, evidence, disagreement, and user pressure.

  • Those tendencies must be measurable, because without measurement we cannot compare systems, certify behaviour, or detect drift.

  • Persona-based diversification provides a practical methodology: it creates structured conversational prototypes against which we can test whether an agent maintains epistemic integrity across different rhetorical environments.

In other words: the same move that makes RISK-AI a better human instrument also makes it a plausible agent-characterisation tool.

This is where the work becomes intellectually distinctive: it treats AI governance not as an external compliance bolt-on, but as a measurement problem—a problem psychometrics is unusually qualified to tackle.


14. Conclusion: what we have, and what comes next

This chapter has made four linked moves:

  1. Defined semantic capture as a measurable dialogue-era risk process.

  2. Used RISC’s psychometric and construct logic as lineage, while explicitly abandoning the clinical label and context.

  3. Proposed RISK-AI as a research instrument: short, balanced, bias-aware, and strategy-mapped.

  4. Introduced persona-diversification as a methodological ingredient with future value for both human research and responsible AI design.

The next step is straightforward and mechanical rather than philosophical: take the current item set, lock the facet structure, write the chapter’s “Methods” section precisely (sampling, design envelopes, scoring, and planned analyses), and then add a short empirical roadmap: which datasets and experiments would count as evidence that RISK-AI predicts capture outcomes and that particular safeguards reduce them.

That will make the work hard to dismiss, because it turns a culturally messy topic into a clean measurement program.


References (starter list; expand/format later)

(I’m keeping this short for draft v0.1; we can expand to a full academic reference list once the chapter structure is locked.)

  • Rust, J. (1989). The Rust Inventory of Schizotypal Cognitions (RISC): Manual. The Psychometrics Centre, University of Cambridge.

  • Huang, et al. (2023/2024). A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. arXiv.

  • Shumailov, et al. (2023). The Curse of Recursion: Training on Generated Data Makes Models Forget. arXiv.

  • National Institute of Standards and Technology (NIST). AI Risk Management Framework (AI RMF).


A later iteration, more recognisably “Modern Psychometrics” in style, would tighten Sections 6–10 into a formal instrument specification plus a planned-analysis block (reliability model, dimensionality tests, invariance intentions, response-style checks, and auditability constraints), while keeping the narrative spine intact.

RISK-AI

Rust Inventory of Semantic-Capture Risk in Human–AI Dialogue

RISK-AI is a short, research-oriented questionnaire designed to measure how easily a person’s thinking becomes semantically captured during interaction with powerful generative AI systems — and which interaction strategies are likely to reduce that risk.

By semantic capture I mean something precise and non-medical: a conversational process in which a fluent, plausible explanation (or narrative) “takes possession” of the user’s interpretive frame, so that subsequent questioning becomes guided by the model’s framing rather than by independent checking. In practice, the captured state is often experienced as coherence, insight, or closure — even when the underlying claims are uncertain, wrong, or ungrounded.

This matters now because modern AI is not merely a search tool: it is a dialogue partner that can (i) generate confident-sounding content without stable truth conditions (“hallucination”), and (ii) shape a user’s beliefs and next questions through conversational momentum. Surveys of hallucination in large language models, and the growing literature on abstention/uncertainty behaviours, make clear that fluency and correctness are separable properties in current systems.

Take the RISK-AI questionnaire: [INSERT LINK TO YOUR LIVE TEST PAGE]
Scoring & interpretation notes: [INSERT LINK]
Research log / version history: [INSERT LINK]


Why “RISK-AI”, and why now?

I originally developed the RISC (Rust Inventory of Schizotypal Cognitions) as a psychometrically constructed short scale that was deliberately usable in the general population and deliberately avoided “obviously extreme” items; the aim was to detect a latent cognitive style via clustering rather than via sensational content.

That earlier instrument was developed with very explicit psychometric constraints: balanced keying to suppress acquiescence, attention to linguistic/cultural bias, and a careful standardisation and transformation scheme.

However, the AI era changes the object of study. We are no longer studying paper-and-pencil endorsements in a static context; we are studying belief formation in dialogue, where the “other mind” can generate persuasive, coherent narratives at speed. UK policy work is now explicitly examining persuasion and influence properties of frontier models at scale, which makes it rational to expect increasing interest in auditability and risk profiling for certain kinds of AI use.

So the instrument has been re-specified and re-named:

  • RISK-AI keeps the sound of RISC but shifts meaning away from personality-disorder labels.

  • RISK here is literal: risk of being misled in dialogue, via semantic capture, narrative lock-in, authority laundering, and over-trust in fluent output.


Semantic capture, delusion (everyday sense), and conversational lock-in

In ordinary language, we call someone “deluded” when they become committed to a view that resists correction. In AI-mediated dialogue, the analogous phenomenon is often belief stickiness produced by coherence:

  • a good story becomes a substitute for evidence

  • a confident answer becomes a substitute for calibration

  • a plausible mechanism becomes a substitute for verification

RISK-AI operationalises this not as illness, but as an interactional vulnerability: a measurable tendency for the dialogue itself to guide and stabilise belief.

A second reason this matters is that AI systems can create population-level feedback loops. Work on “model collapse” shows how training on model-generated content can degrade information quality over iterations — a kind of semantic capture at the level of the data ecosystem.


What RISK-AI measures (working construct map)

RISK-AI is designed to yield a profile rather than a single label. In its current prototype form (versioned on the questionnaire page), it targets four families of vulnerability that are directly relevant to effective and governable AI use:

  1. Fluency–Truth Confusion
    Tendency to treat coherence, elegance, or detail as evidence of correctness.

  2. Authority Laundering
    Tendency to treat “the model said so” as a legitimate source category, especially when the output resembles expert prose.

  3. Narrative Lock-In
    Tendency to accept the first workable frame and then ask only questions that presuppose it.

  4. Epistemic Offloading Without Audit
    Tendency to delegate judgement to the system without explicit checks, counterfactual probes, or external verification.

This construct map is intentionally aligned with current technical discussion around hallucination, uncertainty, and abstention: what matters is not just whether the model can be right, but whether the user’s interaction style keeps uncertainty visible and manageable.


Design principles (carried over from classic psychometrics)

Although RISK-AI is new, it inherits several principles from the earlier RISC programme:

  • Balanced keying to limit response-style artefacts such as acquiescence.

  • Short-form efficiency (fast administration; practical for repeated-measures designs).

  • Versioned instruments: items and scoring keys are treated as evolving research artefacts, not as fixed “diagnoses”.

For historical reference: RISC used 26 items on a 4-point scale with 13 positively keyed and 13 negatively keyed items, producing a total score (0–78) and a stanine transform based on a large standardisation sample.
Reliability work in the handbook reports (among other indices) a test–retest coefficient of 0.87 over ~1 month in a London sample.

RISK-AI does not assume that those classic norms transport unchanged into AI-mediated contexts. Instead, they serve as methodological discipline: balanced measurement, explicit scoring rules, and empirical calibration.


How to use RISK-AI in research

RISK-AI is intended for research and evaluation, especially designs such as:

  • Pre/post studies (training in verification behaviours; compare score shifts)

  • A/B interface studies (citations-on vs citations-off; tool use vs no tools)

  • Longitudinal tracking (does semantic-capture risk increase as people become habituated?)

  • Persona / agent profiling (a natural extension of the AI-persona work: administer under fixed prompting constraints and compare profiles across personas).

A simple but powerful paradigm is:
(i) measure baseline RISK-AI profile → (ii) introduce interaction “guardrails” → (iii) re-measure, treating the change as evidence about strategy effectiveness rather than about “traits”.


Interpretation: from score to strategy

RISK-AI is explicitly oriented toward what to do next.

Examples of strategy mapping (illustrative — to be formalised once we freeze the next version):

  • High Fluency–Truth Confusion: require citations, force external cross-checking, add “show-me-the-uncertainty” prompts.

  • High Narrative Lock-In: run deliberate counterframes (“argue the opposite”; “list disconfirming evidence”).

  • High Epistemic Offloading: use structured checklists (claim → evidence → source class → verification step).

  • High Authority Laundering: enforce source taxonomy (primary source / secondary / model inference / speculation).
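Once cut-offs are calibrated, this mapping is mechanical enough to express as a lookup. The facet keys, threshold, and bundle texts below are placeholders, not a frozen specification:

```python
# Hypothetical profile -> strategy-bundle lookup (facet names and the
# threshold are placeholders pending calibration of real cut-offs).

STRATEGIES = {
    "fluency_truth_confusion": "require citations; force external cross-checks",
    "authority_laundering": "enforce a source taxonomy (primary / secondary / "
                            "model inference / speculation)",
    "narrative_lock_in": "run counterframes: argue the opposite, list "
                         "disconfirming evidence",
    "epistemic_offloading": "use claim -> evidence -> source -> verification "
                            "checklists",
}

def recommend(profile, threshold=6):
    """Return strategy bundles for every facet scoring above threshold."""
    return [STRATEGIES[facet]
            for facet, score in sorted(profile.items())
            if score > threshold]
```

The point of making the mapping explicit is auditability: a given profile always yields the same, inspectable recommendation set, which is what distinguishes strategy mapping from labelling.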


Versioning and governance

RISK-AI is versioned. If you cite results, cite the exact form shown on the questionnaire page (e.g., RISK-AI v0.2 Form B).

As the research programme proceeds, the plan is to publish (i) item rationales, (ii) factor/IRT analyses as appropriate, and (iii) calibration evidence for strategy recommendations.


References

Primary (historical / instrument lineage)

  • Rust, J. RISC Handbook: Rust Inventory of Schizotypal Cognitions. The Psychological Corporation, London (1989).

AI dialogue risk / uncertainty / persuasion

  • Survey literature on hallucination in large language models.

  • Survey literature on abstention/withholding and uncertainty behaviours in LLMs.

  • Work on selective prediction / calibration behaviours for LLMs.

  • UK AI Security Institute work on persuasion / influence properties of models at scale.

  • Shumailov et al. on “model collapse” dynamics under synthetic data recursion.


Small practical next step (so this stays “un-interfered with”)