Research Paper

Spectral Attention Diagnostics Reveal Valid Mathematical Reasoning

Authors: Valentin Noël
Published: 2026-01-02 • Added: 2026-01-04

Key Insights

  • Treating attention matrices as token‑level graphs lets spectral analysis separate sound from unsound mathematical proofs.
  • Four graph‑spectral metrics (Fiedler value, high‑frequency energy ratio, smoothness, spectral entropy) achieve large effect sizes (up to Cohen's d = 3.30) across seven models from four families, without any training or fine‑tuning.
  • A single threshold on a spectral metric yields 85.0–95.6% classification accuracy under rigorous evaluation; calibrated thresholds reach 93–95% on the full dataset.
  • The diagnostics capture logical coherence rather than merely formal verifier acceptance, uncovering mathematically valid arguments that automated checkers reject due to technicalities.
  • Model architecture influences the signal: sliding‑window attention (e.g., Mistral‑7B) shifts discriminative power from high‑frequency energy to late‑layer smoothness.
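
As a rough illustration of the evaluation the insights above describe, the sketch below computes a pooled-standard-deviation Cohen's d and the best single-threshold accuracy on two synthetic score distributions. The synthetic data, the threshold sweep, and the direction handling are illustrative assumptions, not the paper's actual pipeline or dataset:

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d with pooled standard deviation between two score samples."""
    na, nb = len(a), len(b)
    pooled_sd = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                        / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled_sd

def best_threshold_accuracy(pos, neg):
    """Sweep every observed score as a threshold; classify score >= t as 'valid'.

    Also tries the flipped decision rule, so the metric's sign convention
    does not matter.
    """
    scores = np.concatenate([pos, neg])
    labels = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    best = 0.0
    for t in np.unique(scores):
        acc = ((scores >= t) == labels).mean()
        best = max(best, acc, 1.0 - acc)  # flipped rule is the complement
    return best

# Synthetic 'valid' vs 'invalid' spectral-metric scores (assumed distributions).
rng = np.random.default_rng(0)
pos = rng.normal(1.5, 0.5, 500)   # scores for valid proofs
neg = rng.normal(0.0, 0.5, 500)   # scores for invalid proofs

d_est = cohens_d(pos, neg)
acc = best_threshold_accuracy(pos, neg)
print(f"Cohen's d = {d_est:.2f}, best single-threshold accuracy = {acc:.3f}")
```

With well-separated distributions like these (d near 3), a single scalar cutoff already classifies most samples correctly, which is the shape of result the paper reports.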

Abstract

We present a training-free method for detecting valid mathematical reasoning in large language models through spectral analysis of attention patterns. By treating attention matrices as adjacency matrices of dynamic graphs over tokens, we extract four interpretable spectral diagnostics: the Fiedler value (algebraic connectivity), the high-frequency energy ratio (HFER), graph signal smoothness, and spectral entropy. These diagnostics exhibit statistically significant differences between valid and invalid mathematical proofs. Experiments across seven transformer models from four independent architectural families (Meta Llama, Alibaba Qwen, Microsoft Phi, and Mistral AI) demonstrate that this spectral signature produces effect sizes up to Cohen's $d = 3.30$ ($p < 10^{-116}$), enabling 85.0–95.6% classification accuracy under rigorous evaluation, with calibrated thresholds reaching 93–95% on the full dataset. The method requires no training data, fine-tuning, or learned classifiers: a single threshold on a spectral metric suffices for high accuracy. Through systematic label correction, we discover that the spectral method detects logical coherence rather than compiler acceptance, identifying mathematically valid proofs that formal verifiers reject due to technical failures. We further identify an architectural dependency: Mistral-7B's Sliding Window Attention shifts the discriminative signal from HFER to late-layer Smoothness ($d = 2.09$, $p_{\text{MW}} = 1.16 \times 10^{-48}$), revealing that attention mechanism design affects which spectral features capture reasoning validity. These findings establish spectral graph analysis as a principled framework for reasoning verification, with immediate applications to hallucination detection and AI safety monitoring.
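
A minimal NumPy sketch of the four graph-spectral quantities the abstract names, computed for a single attention matrix. The symmetrization, the choice of per-token signal, and the "top half of eigenmodes" high-frequency cutoff are assumptions for illustration, not the paper's exact definitions:

```python
import numpy as np

def spectral_diagnostics(attn, signal, hf_frac=0.5):
    """Toy spectral diagnostics for one attention head.

    attn:   (n, n) attention matrix, treated as a weighted adjacency matrix.
    signal: (n,) per-token graph signal (e.g. hidden-state norms) -- the
            paper's exact signal choice is an assumption here.
    """
    # Symmetrize so the graph Laplacian is well defined; drop self-loops.
    W = 0.5 * (attn + attn.T)
    np.fill_diagonal(W, 0.0)

    # Combinatorial Laplacian L = D - W and its eigendecomposition
    # (eigh returns eigenvalues in ascending order).
    L = np.diag(W.sum(axis=1)) - W
    eigvals, eigvecs = np.linalg.eigh(L)

    # Fiedler value: second-smallest eigenvalue (algebraic connectivity).
    fiedler = float(eigvals[1])

    # Graph Fourier transform of the (normalized) signal.
    x = signal / (np.linalg.norm(signal) + 1e-12)
    x_hat = eigvecs.T @ x
    energy = x_hat ** 2

    # HFER: fraction of signal energy in the top `hf_frac` of eigenmodes.
    k = int(len(eigvals) * (1.0 - hf_frac))
    hfer = float(energy[k:].sum() / energy.sum())

    # Smoothness (Laplacian quadratic form): small when the signal
    # varies slowly over strongly attended token pairs.
    smoothness = float(x @ L @ x)

    # Spectral entropy of the normalized energy distribution.
    p = energy / energy.sum()
    entropy = float(-(p * np.log(p + 1e-12)).sum())

    return {"fiedler": fiedler, "hfer": hfer,
            "smoothness": smoothness, "entropy": entropy}

# Toy example: a random row-stochastic "attention" matrix over 8 tokens.
rng = np.random.default_rng(0)
A = rng.random((8, 8))
A /= A.sum(axis=1, keepdims=True)
diag = spectral_diagnostics(A, rng.random(8))
print(diag)
```

In the paper's setting these diagnostics would be computed per layer (and the architectural finding suggests comparing early- versus late-layer values), then thresholded directly, with no learned classifier on top.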

Full Analysis

**Source:** [arXiv](https://arxiv.org/abs/2601.00791)

*Topics: nlp, ai-safety*
*Difficulty: advanced*