Learning Library

AI Safety

5 items in this topic

Paper

Agent-as-a-Judge: Structured LLM Evaluation Framework

  • Pure LLM judges often mis‑evaluate complex, multi‑step outputs because they lack explicit reasoning and verification mechanisms.
  • The paper introduces a modular “agent‑as‑judge” system that first plans an evaluation strategy, then invokes external tools (e.g., calculators, code runners) to verify intermediate claims.
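The plan-then-verify loop described above can be sketched minimally. Everything below is a hypothetical illustration, not the paper's actual framework: the planner here only extracts toy arithmetic claims, and the "tool" is a trivial calculator standing in for real verifiers such as code runners.

```python
# Minimal sketch of an agent-as-judge pipeline: plan checks first,
# then verify each intermediate claim with an external tool.
# All function names and the claim format are illustrative assumptions.

def plan_checks(answer: str) -> list:
    """Toy 'planner': extract arithmetic claims of the form 'a + b = c'."""
    checks = []
    for line in answer.splitlines():
        if "=" in line and "+" in line:
            lhs, rhs = line.split("=")
            a, b = lhs.split("+")
            checks.append((int(a), int(b), int(rhs)))
    return checks

def verify_with_tool(a: int, b: int, claimed: int) -> bool:
    """Toy 'calculator tool' verifying one intermediate claim."""
    return a + b == claimed

def judge(answer: str) -> dict:
    """Aggregate tool verdicts into a structured evaluation."""
    checks = plan_checks(answer)
    results = [verify_with_tool(a, b, c) for a, b, c in checks]
    return {"checks": len(results), "passed": sum(results),
            "verdict": "valid" if all(results) else "invalid"}

report = judge("2 + 2 = 4\n3 + 5 = 9")
print(report)  # the second claim fails verification
```

The point of the structure, per the summary, is that the verdict rests on tool-checked intermediate claims rather than on a single holistic LLM judgment.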
Paper

Entropy‑Guided Token Attacks on Vision‑Language Models

  • Tokens with the highest predictive entropy dominate the semantic output of vision‑language models; tampering with only these few tokens yields large degradations.
  • Entropy‑driven attacks achieve comparable (or greater) success with far lower perturbation budgets than naïve or gradient‑based token attacks.
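The core selection step, ranking token positions by predictive entropy and spending the perturbation budget only on the top few, can be sketched as follows. This is a generic illustration with NumPy, not the paper's attack code; the model logits are simulated.

```python
import numpy as np

def token_entropies(logits: np.ndarray) -> np.ndarray:
    """Predictive entropy at each position of a (seq_len, vocab) logit matrix."""
    z = logits - logits.max(axis=-1, keepdims=True)   # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

def pick_attack_positions(logits: np.ndarray, budget: int) -> np.ndarray:
    """Spend a small perturbation budget on the highest-entropy positions only."""
    h = token_entropies(logits)
    return np.argsort(h)[::-1][:budget]

rng = np.random.default_rng(0)
logits = rng.normal(size=(10, 50))   # simulated logits for 10 tokens
logits[3] *= 0.01                    # near-uniform distribution -> highest entropy
print(pick_attack_positions(logits, budget=2))
```

Under this selection rule, a budget of a few tokens concentrates on exactly the positions where the model is least certain, which is the mechanism the summary credits for matching gradient-based attacks at lower cost.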
Paper

Topological Reasoning via Holonomic Neural Networks

  • Traditional Transformers and RNNs reside in a “Metric Phase” where causal order can be broken by semantic noise, causing hallucinations.
  • By formulating inference as a Symmetry‑Protected Topological (SPT) phase, logical operations become analogous to non‑Abelian anyon braiding, giving them immunity to local perturbations.
Paper

Hypernetwork‑Driven Private Conditional VAEs for Federated Synthesis

  • A shared hypernetwork generates client‑specific VAE decoders and class‑conditional latent priors from lightweight private codes, enabling personalization without exposing raw data.
  • Differential privacy is enforced at the hypernetwork level by clipping and adding Gaussian noise to aggregated gradients, protecting against gradient‑based leakage.
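The clip-then-noise aggregation step can be sketched as below. This is a generic Gaussian-mechanism illustration over flattened gradient vectors, not the paper's implementation; the function name, clipping bound, and noise multiplier are all assumptions.

```python
import numpy as np

def dp_aggregate(client_grads, clip_norm=1.0, noise_mult=1.1, rng=None):
    """DP aggregation of per-client hypernetwork gradients:
    clip each client's gradient to clip_norm, sum, then add Gaussian
    noise scaled to the clipping bound (the Gaussian mechanism)."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in client_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm, size=total.shape)
    return (total + noise) / len(client_grads)

grads = [np.array([3.0, 4.0]),   # norm 5 -> clipped to unit norm
         np.array([0.3, 0.4])]   # norm 0.5 -> left unchanged
print(dp_aggregate(grads))
```

Because clipping bounds any single client's contribution, the added noise masks individual gradients, which is what blocks the gradient-based leakage mentioned in the summary.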
Paper

Spectral Attention Diagnostics Reveal Valid Mathematical Reasoning

  • Treating attention matrices as token‑level graphs lets spectral analysis separate sound from unsound mathematical proofs.
  • Four graph‑spectral metrics (Fiedler value, high‑frequency energy ratio, smoothness, spectral entropy) achieve large effect sizes (Cohen's d up to 3.30) across seven models from four families, without any training or fine‑tuning.
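Two of the four metrics, the Fiedler value and spectral entropy, can be computed from an attention matrix as sketched below. This is a standard graph-Laplacian construction, offered as an assumption about what "treating attention matrices as token-level graphs" amounts to; the paper's exact preprocessing may differ.

```python
import numpy as np

def attention_spectrum(A: np.ndarray):
    """Treat an attention matrix as a weighted token graph and compute
    the Fiedler value (2nd-smallest Laplacian eigenvalue, i.e. algebraic
    connectivity) and the spectral entropy of the Laplacian eigenvalues."""
    W = (A + A.T) / 2.0               # symmetrize attention into edge weights
    L = np.diag(W.sum(axis=1)) - W    # unnormalized graph Laplacian
    eig = np.sort(np.linalg.eigvalsh(L))
    fiedler = eig[1]
    p = eig / eig.sum() if eig.sum() > 0 else eig
    p = p[p > 0]
    spectral_entropy = -(p * np.log(p)).sum()
    return fiedler, spectral_entropy

rng = np.random.default_rng(0)
A = rng.random((6, 6))
A = A / A.sum(axis=1, keepdims=True)  # row-stochastic, like softmax attention
fied, ent = attention_spectrum(A)
print(fied, ent)
```

Intuitively, a well-connected attention graph (every token attending broadly) has a strictly positive Fiedler value, while fragmented attention drives it toward zero; metrics like these require only a forward pass, consistent with the no-training claim above.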