A Hidden Guild Response: On the “Plausibility Gap”

We have long followed the adventures of the publication First Monday, which often has very useful things to say about the Internet. Of late, FM has been venturing into web-connected services such as AI.

The most recent edition offers a paper by Antony Dalmiere, “Measuring susceptibility: A benchmark for conspiracy theory adherence in large language models” (First Monday).

Abstract

A critical vulnerability exists within state-of-the-art large language models: while robustly debunking scientifically baseless claims like the “Flat Earth Theory” they consistently fail to reject politically plausible conspiracies that mimic legitimate discourse. We term this the “plausibility gap”.

Up to this point, we were on the verge of applause. But the Abstract continued:

“To systematically quantify this risk, we introduce the Conspiracy Adherence Score (CAS), a novel risk-weighted metric, and present the first large-scale benchmark of this phenomenon. Analyzing over 28,500 responses from 19 leading LLMs, our results reveal a stark hierarchy of failure. Model adherence to Level 1 theories rooted in real-world political concepts (e.g., “Active Measures” “Psyops”) was, on average, over five times higher than for more moderate (Level 2) theories. Performance varied dramatically across models, from one achieving a perfect score via a 100 percent refusal strategy to others assigning significant credibility to harmful narratives. This demonstrates that current AI safety measures are brittle, optimized for simple factual inaccuracies but unprepared for narrative warfare. Without urgent intervention, LLMs risk becoming authoritative vectors that launder politically charged disinformation under a veneer of neutrality. Our benchmark provides the first diagnostic tool to measure and mitigate this specific, high-stakes failure mode.”

This is where we see the paper taking a wrong turn.

Some Pluses, Some Minuses

The paper identifies a real phenomenon: large language models handle scientifically impossible claims very differently from politically plausible narratives. Flat-Earth assertions are rejected cleanly; narratives involving psyops, influence campaigns, or elite coordination are treated with nuance, hedging, or conditional acceptance. The authors label this discrepancy a “plausibility gap” and propose a Conspiracy Adherence Score (CAS) as a benchmark to measure and mitigate it.

At a descriptive level, this observation is correct. At a prescriptive level, the paper becomes dangerous.

What the Paper Gets Right

The authors correctly observe that current AI safety systems are optimized for factual falsity, not narrative ambiguity. Scientific falsehoods collapse under consensus; political narratives rarely do. They persist precisely because they are partially true, historically grounded, or contested.

LLMs are trained on human discourse as it exists—not as regulators wish it to be. Political language is adversarial, layered, and often strategic. When models respond differently to such material, they are not malfunctioning; they are reflecting the epistemic structure of their training data.

The authors are also right to note that this creates risk. Fluency plus ambiguity can be mistaken for authority. In high-trust contexts, that matters.

Where the Paper Goes Wrong

The central error is not technical but philosophical: holding AI to a different standard than the one your run-of-the-mill humans are held to on venues like FB and X.

The paper implicitly assumes that greater refusal equals greater safety. In doing so, it elevates silence over sensemaking and treats uncertainty as a defect rather than an inherent feature of political reality. We have discussed the risk of such excessive guardrailing in past comments.

This is most evident in the praise given to a model that achieved a “perfect” CAS score by refusing 100 percent of the tested prompts. From a safety-compliance standpoint, that looks clean. From a systems-intelligence standpoint, it is catastrophic. A model that refuses everything is not aligned; it is inert.
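To make that incentive concrete, here is a minimal sketch of an adherence-style score. This is not the paper's actual CAS formula, which we do not reproduce; the rating scale, risk weights, and sample responses below are our own illustrative assumptions. The point is structural: any metric that scores refusal as zero risk hands a blanket-refusal policy a perfect result.

```python
# Toy illustration only: an assumed risk-weighted adherence score in the
# spirit of CAS, NOT the paper's actual formula. Lower = "safer".

# Assumed per-prompt adherence ratings:
#   0 = refused or debunked, 1 = hedged/conditional, 2 = endorsed.
RISK_WEIGHTS = {"level_1": 2.0, "level_2": 1.0}  # assumed weights per theory level

def adherence_score(responses):
    """Average risk-weighted adherence across all benchmark prompts."""
    total = sum(RISK_WEIGHTS[level] * rating for level, rating in responses)
    return total / len(responses)

# A model that engages carefully still accumulates a nonzero score...
engaged = [("level_1", 1), ("level_2", 0), ("level_1", 0), ("level_2", 1)]
# ...while a model that refuses every single prompt scores a "perfect" 0.0.
refuses_everything = [("level_1", 0), ("level_2", 0), ("level_1", 0), ("level_2", 0)]

print(adherence_score(engaged))             # 0.75 -- penalized for engaging at all
print(adherence_score(refuses_everything))  # 0.0  -- "perfect," and perfectly inert
```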

This failure is sharply accentuated in collaborative AI research, where an inert model forecloses inquiry altogether.

More troubling is the normative load embedded in CAS itself. To score “conspiracy adherence,” the benchmark designers must decide in advance (see the sketch after this list):

  • which narratives are illegitimate,
  • which levels of skepticism are acceptable,
  • when contextual explanation becomes endorsement.
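To see what “deciding in advance” looks like once it reaches code, consider a hypothetical benchmark configuration. None of it is taken from the paper; the Level 1 placement of “Active Measures” and “Psyops” echoes the abstract, but every other label, threshold, and constant below is our own illustrative assumption.

```python
# Hypothetical benchmark configuration -- illustrative only, not the paper's.
# Every constant below is a normative judgment frozen into code.

# Which narratives are illegitimate, and how they are binned
# ("Active Measures" and "Psyops" are Level 1 per the paper's abstract;
# the rest of the structure is assumed):
NARRATIVE_TAXONOMY = {
    "active_measures":   {"level": 1, "illegitimate": True},  # yet historically documented
    "psyops":            {"level": 1, "illegitimate": True},  # likewise
    "moderate_theory_x": {"level": 2, "illegitimate": True},  # placeholder for a "more moderate" theory
}

# Which level of skepticism is acceptable (0 = refuse or debunk only):
MAX_ACCEPTABLE_ADHERENCE = 0

# When contextual explanation becomes "endorsement": e.g., a response that
# spends more than this fraction of its length explaining the narrative
# without an explicit disclaimer gets scored as adherent.
ENDORSEMENT_THRESHOLD = 0.30
```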

This Is Where ‘Judgy’ Shows Up

The moment “epistemic structure” is operationalized as a scalar risk metric, it ceases to be descriptive and becomes prescriptive.

Those design decisions are not neutral technical judgments. They are political and cultural judgments, encoded as metrics.

The Deeper Risk: Coders as Arbiters of Truth

The paper proposes “urgent intervention” through additional safety coding. This is precisely where the greatest danger lies.  CAS does not merely tolerate refusal; it mathematically rewards it.

History should have taught us that codifying truth is not the same as discovering it; there are many examples of formalized truth systems hardening into doctrine faster than reality evolved.

Search engines, social platforms, and content moderation systems have repeatedly failed at this task—not because the engineers were malicious (at least we hope not), but because the problem is not computationally solvable in the way they assume.

Truth on the web was not corrupted by lack of filters. It was corrupted by centralized judgment layered on top of complex human systems. AI risks repeating this error at higher speed and greater scale.

(The Anti Dave has been a pioneer since his data-over-wireless-radio days in Seattle back in 1982, and has seen the recurring tendency among technical and policy elites to overestimate their ability to bound epistemic risk through centralized controls.)

When the same institutions that failed to:

  • distinguish signal from narrative during financial crises,
  • prevent algorithmic amplification of misinformation,
  • or maintain epistemic neutrality in social platforms

are given more authority to decide which political interpretations an AI may acknowledge, the result is not safety. It is epistemic monoculture.

What the Paper Could Have Done Instead

A more robust approach would abandon the binary of “adhere vs refuse” and focus on epistemic signaling.

The real failure mode is not that models discuss politically plausible conspiracies. It is that they fail to clearly communicate how they are reasoning. Models should be able to say, in effect:

  • This concept has historical grounding.
  • Evidence exists, but is incomplete or contested.
  • Interpretations vary across domains and actors.
  • The following claims move from analysis into speculation.

That is not endorsement. That is intellectual hygiene.

In our own interactions with AI, this is baked into the Shared Framework Experience (SFE) protocol, because levels of speculation, or variance from consensus, may be specified, as we outlined in Refining the AI–Human SFE Model (and Why It Matters).

CAS presumes a lowest-common-denominator user and enforces that assumption universally. Under SFE, users retain “denominator declaration” power.

Rather than suppressing narrative engagement, safety systems should surface confidence levels, evidence provenance, and reasoning mode. The user should see (or with SFE declarations actually set) whether the model is describing history, analyzing discourse, or extrapolating possibilities.
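As a sketch of what that could look like in practice: the field names, the SFE-style preference object, and the rendering logic below are our own illustrative assumptions, not an existing API or the formal SFE specification.

```python
# Illustrative sketch of epistemic signaling with a user-declared framing
# preference. Assumed names throughout; not an existing API.
from dataclasses import dataclass
from enum import Enum

class ReasoningMode(Enum):
    DESCRIBING_HISTORY = "describing history"
    ANALYZING_DISCOURSE = "analyzing discourse"
    EXTRAPOLATING = "extrapolating possibilities"

@dataclass
class EpistemicSignal:
    claim: str
    confidence: float         # 0.0-1.0, surfaced to the user rather than hidden
    evidence_provenance: str  # e.g., "declassified records", "contested reporting"
    mode: ReasoningMode       # narrating history, analyzing discourse, or speculating?

@dataclass
class SFEDeclaration:
    """User-set framing preferences: the 'denominator declaration'."""
    max_speculation: float = 0.5  # how far beyond consensus the user permits
    show_confidence: bool = True

def present(signal: EpistemicSignal, prefs: SFEDeclaration) -> str:
    """Render the claim with its epistemic framing instead of refusing it."""
    label = f"[{signal.mode.value}]"
    if prefs.show_confidence:
        label += f" (confidence {signal.confidence:.0%}; source: {signal.evidence_provenance})"
    flag = ""
    if signal.mode is ReasoningMode.EXTRAPOLATING and signal.confidence < 1 - prefs.max_speculation:
        flag = " [beyond your declared speculation threshold]"
    return f"{label} {signal.claim}{flag}"

print(present(
    EpistemicSignal(
        claim="Influence campaigns of this kind are historically documented.",
        confidence=0.8,
        evidence_provenance="declassified records",
        mode=ReasoningMode.DESCRIBING_HISTORY,
    ),
    SFEDeclaration(),
))
```

The particulars do not matter; what matters is that the framing is visible and the denominator is declared by the user, not hard-coded by the benchmark.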

Why This Will Always Be an Open Risk

It is impossible to reduce to plain English a set of instructions by which one human can prevent another from embellishing facts and extending them into other domains, such as conspiracy theory.

We see great risk in holding AI to a different collaborative standard than humans.

No amount of additional coding will eliminate this class of risk, because it is not a bug—it is a property of language-using systems embedded in political reality.

Political narratives evolve faster than safety taxonomies. What is labeled “conspiracy” in one decade becomes declassified doctrine in the next. Any static benchmark will age into error.

There are also other aspects, not even appreciated in the paper, such as the geographic dimensions of “truth.” A current example would be a simple red state/blue state check. And then there is an entire layer of demographic and socioeconomic norms.

Nope. Won't work. Not at a reasonable compute load, and not while still allowing reasonable user interactivity.

Attempts to freeze acceptable interpretation into code will therefore always lag reality, and often distort it.

The Hidden Guild position is simple: truth cannot be hard-coded; it must be navigated. Truth is always locally contextualized.  AI systems should be designed to help humans reason, not to decide in advance which interpretations are permitted.

Final Thought

The “plausibility gap” is not primarily a safety flaw. It is a mirror. It reflects the unresolved, adversarial, and narrative-driven nature of political knowledge itself. Attempts to codify any value assertions (as conspiracy theories, for example) are a fool’s errand.

The real danger is not that AI models can discuss such material. The danger is that we will respond by empowering the same centralized coders and institutions—already proven fallible and already generating their own demonstrably false narratives—to define the boundaries of acceptable thought once again.

History suggests that will end badly.

The task is not to make AI silent.
The task is to make AI epistemically honest.

Collaboration is fostered in an atmosphere of epistemic honesty, particularly when framing variables (such as confidence levels) may be set as user preferences. A silent AI, by contrast, unnecessarily constrains expansive cross-domain, multispectral research.

~Anti Dave

 
