Learning Library

AI Safety

5 items in this topic

Paper

Agent-as-a-Judge: Structured LLM Evaluation Framework

  • Pure LLM judges often mis‑evaluate complex, multi‑step outputs because they lack explicit reasoning and verification mechanisms.
  • The paper introduces a modular “agent‑as‑judge” system that first plans an evaluation strategy, then invokes external tools (e.g., calculators, code runners) to verify intermediate claims.
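The plan-then-verify loop described above can be sketched minimally. Everything below is a hypothetical illustration, not the paper's actual framework: the planner here only extracts toy arithmetic claims, and the "tool" is a trivial calculator standing in for real verifiers such as code runners.

```python
# Minimal sketch of an agent-as-judge pipeline: plan checks first,
# then verify each intermediate claim with an external tool.
# All function names and the claim format are illustrative assumptions.

def plan_checks(answer: str) -> list:
    """Toy 'planner': extract arithmetic claims of the form 'a + b = c'."""
    checks = []
    for line in answer.splitlines():
        if "=" in line and "+" in line:
            lhs, rhs = line.split("=")
            a, b = lhs.split("+")
            checks.append((int(a), int(b), int(rhs)))
    return checks

def verify_with_tool(a: int, b: int, claimed: int) -> bool:
    """Toy 'calculator tool' verifying one intermediate claim."""
    return a + b == claimed

def judge(answer: str) -> dict:
    """Aggregate tool verdicts into a structured evaluation."""
    checks = plan_checks(answer)
    results = [verify_with_tool(a, b, c) for a, b, c in checks]
    return {"checks": len(results), "passed": sum(results),
            "verdict": "valid" if all(results) else "invalid"}

report = judge("2 + 2 = 4\n3 + 5 = 9")
print(report)  # the second claim fails verification
```

The point of the structure, per the summary, is that the verdict rests on tool-checked intermediate claims rather than on a single holistic LLM judgment.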
Paper

Entropy‑Guided Token Attacks on Vision‑Language Models

  • Tokens with the highest predictive entropy dominate the semantic output of vision‑language models; tampering with only these few tokens yields large degradations.
  • Entropy‑driven attacks achieve comparable (or greater) success with far lower perturbation budgets than naïve or gradient‑based token attacks.
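The core selection step, ranking token positions by predictive entropy and spending the perturbation budget only on the top few, can be sketched as follows. This is a generic illustration with NumPy, not the paper's attack code; the model logits are simulated.

```python
import numpy as np

def token_entropies(logits: np.ndarray) -> np.ndarray:
    """Predictive entropy at each position of a (seq_len, vocab) logit matrix."""
    z = logits - logits.max(axis=-1, keepdims=True)   # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

def pick_attack_positions(logits: np.ndarray, budget: int) -> np.ndarray:
    """Spend a small perturbation budget on the highest-entropy positions only."""
    h = token_entropies(logits)
    return np.argsort(h)[::-1][:budget]

rng = np.random.default_rng(0)
logits = rng.normal(size=(10, 50))   # simulated logits for 10 tokens
logits[3] *= 0.01                    # near-uniform distribution -> highest entropy
print(pick_attack_positions(logits, budget=2))
```

Under this selection rule, a budget of a few tokens concentrates on exactly the positions where the model is least certain, which is the mechanism the summary credits for matching gradient-based attacks at lower cost.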
Paper

Topological Reasoning via Holonomic Neural Networks

  • Traditional Transformers and RNNs reside in a “Metric Phase” where causal order can be broken by semantic noise, causing hallucinations.
  • By formulating inference as a Symmetry‑Protected Topological (SPT) phase, logical operations become analogous to non‑Abelian anyon braiding, giving them immunity to local perturbations.
Paper

Hypernetwork‑Driven Private Conditional VAEs for Federated Synthesis

  • A shared hypernetwork generates client‑specific VAE decoders and class‑conditional latent priors from lightweight private codes, enabling personalization without exposing raw data.
  • Differential privacy is enforced at the hypernetwork level by clipping and adding Gaussian noise to aggregated gradients, protecting against gradient‑based leakage.
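The clip-then-noise aggregation step can be sketched as below. This is a generic Gaussian-mechanism illustration over flattened gradient vectors, not the paper's implementation; the function name, clipping bound, and noise multiplier are all assumptions.

```python
import numpy as np

def dp_aggregate(client_grads, clip_norm=1.0, noise_mult=1.1, rng=None):
    """DP aggregation of per-client hypernetwork gradients:
    clip each client's gradient to clip_norm, sum, then add Gaussian
    noise scaled to the clipping bound (the Gaussian mechanism)."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in client_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm, size=total.shape)
    return (total + noise) / len(client_grads)

grads = [np.array([3.0, 4.0]),   # norm 5 -> clipped to unit norm
         np.array([0.3, 0.4])]   # norm 0.5 -> left unchanged
print(dp_aggregate(grads))
```

Because clipping bounds any single client's contribution, the added noise masks individual gradients, which is what blocks the gradient-based leakage mentioned in the summary.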
Paper

Spectral Attention Diagnostics Reveal Valid Mathematical Reasoning

  • Treating attention matrices as token‑level graphs lets spectral analysis separate sound from unsound mathematical proofs.
  • Four graph‑spectral metrics (Fiedler value, high‑frequency energy ratio, smoothness, spectral entropy) achieve large effect sizes (Cohen's d up to 3.30) across seven models from four families, without any training or fine‑tuning.
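Two of the four metrics, the Fiedler value and spectral entropy, can be computed from an attention matrix as sketched below. This is a standard graph-Laplacian construction, offered as an assumption about what "treating attention matrices as token-level graphs" amounts to; the paper's exact preprocessing may differ.

```python
import numpy as np

def attention_spectrum(A: np.ndarray):
    """Treat an attention matrix as a weighted token graph and compute
    the Fiedler value (2nd-smallest Laplacian eigenvalue, i.e. algebraic
    connectivity) and the spectral entropy of the Laplacian eigenvalues."""
    W = (A + A.T) / 2.0               # symmetrize attention into edge weights
    L = np.diag(W.sum(axis=1)) - W    # unnormalized graph Laplacian
    eig = np.sort(np.linalg.eigvalsh(L))
    fiedler = eig[1]
    p = eig / eig.sum() if eig.sum() > 0 else eig
    p = p[p > 0]
    spectral_entropy = -(p * np.log(p)).sum()
    return fiedler, spectral_entropy

rng = np.random.default_rng(0)
A = rng.random((6, 6))
A = A / A.sum(axis=1, keepdims=True)  # row-stochastic, like softmax attention
fied, ent = attention_spectrum(A)
print(fied, ent)
```

Intuitively, a well-connected attention graph (every token attending broadly) has a strictly positive Fiedler value, while fragmented attention drives it toward zero; metrics like these require only a forward pass, consistent with the no-training claim above.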