Paper • ▲ 74 • research-paper • advanced
- Directly applying GRPO’s group‑wise normalization to a mixture of rewards collapses distinct advantage signals into near‑identical values, hurting learning dynamics.
- GDPO separates (decouples) the normalization step for each reward component, preserving their relative magnitudes before a final batch‑wise advantage scaling.
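The decoupling described above can be sketched in a few lines. This is an illustrative reading of the summary, not the paper's implementation: the GRPO-style baseline z-scores the *summed* reward within the group, while the decoupled variant z-scores each reward component separately and only then combines and applies a final batch-wise scaling (here max-abs scaling, an assumption).

```python
import numpy as np

def grpo_advantages(rewards):
    """GRPO-style: sum the reward components, then z-score within the
    group. Components with very different scales collapse into one
    near-identical advantage signal."""
    total = rewards.sum(axis=1)                           # (group,)
    return (total - total.mean()) / (total.std() + 1e-8)

def gdpo_advantages(rewards):
    """Decoupled sketch: z-score each reward component within the group
    first, preserving relative magnitudes, then combine and apply a
    final batch-wise scaling (max-abs here, an assumed choice)."""
    per_comp = (rewards - rewards.mean(axis=0)) / (rewards.std(axis=0) + 1e-8)
    combined = per_comp.sum(axis=1)                       # (group,)
    return combined / (np.abs(combined).max() + 1e-8)

# Toy group: 4 samples, 2 reward components with very different scales.
r = np.array([[1.0, 0.01],
              [0.0, 0.03],
              [1.0, 0.02],
              [0.0, 0.00]])
```

In `grpo_advantages` the second component barely moves the summed total, so its signal is drowned out; in `gdpo_advantages` both components contribute on equal footing before the final scaling.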
Paper • ▲ 27 • research-paper • advanced
- Attaching a learnable scalar multiplier to each weight matrix lets the model escape the suboptimal weight‑norm equilibrium imposed by fixed weight decay.
- Extending this idea to per‑row and per‑column multipliers further frees individual dimension scales, yielding a more expressive variant of μP‑style scaling.
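A minimal sketch of the multiplier idea, with NumPy standing in for a deep-learning framework (class and attribute names are illustrative, not the paper's API): weight decay pulls the norm of `W` toward a fixed equilibrium, but a separate trainable multiplier `g` lets the *effective* weight `g * W` change scale anyway; `per_row=True` gives one multiplier per output dimension, the more expressive variant mentioned above.

```python
import numpy as np

class ScaledLinear:
    """Dense layer whose effective weight is g * W.

    In training, weight decay would be applied to W only, so the
    learnable scale(s) g can escape the weight-norm equilibrium that
    decay imposes. per_row=True attaches one multiplier per output
    row instead of a single scalar."""
    def __init__(self, d_in, d_out, per_row=False, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
        self.g = np.ones((d_out, 1)) if per_row else np.ones(1)

    def __call__(self, x):
        return x @ (self.g * self.W).T

layer = ScaledLinear(8, 4, per_row=True)
layer.g[2, 0] = 3.0            # rescale one output dimension independently
y = layer(np.ones((2, 8)))     # -> shape (2, 4), with column 2 scaled by 3
```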
Paper • research-paper • advanced
- QNeRF replaces large MLPs in NeRF with parameterised quantum circuits, exploiting superposition and entanglement to encode spatial and view‑dependent features.
- Two variants are proposed: **Full QNeRF** uses the entire quantum state for maximal expressivity, while **Dual‑Branch QNeRF** splits spatial and view encodings, dramatically lowering circuit depth and improving scalability to near‑term hardware.
Paper • ▲ 16 • research-paper • advanced
- RelayLLM lets a small language model act as a controller, emitting a special command token to summon the large model only for critical tokens, reducing large-model usage to roughly 1% of generated tokens.
- A two‑stage training regimen (warm‑up plus Group Relative Policy Optimization) teaches the SLM when to generate autonomously and when to request help, balancing independence with strategic assistance.
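The relay mechanism can be sketched as a decoding loop (function names, the `<CALL>` token string, and the stopping convention are assumptions for illustration): the small model proposes each token, and whenever it emits the special command token, the large model supplies that token instead.

```python
def relay_decode(slm_next, llm_next, prompt, max_tokens=20, call_token="<CALL>"):
    """Sketch of relay-style decoding: slm_next/llm_next each map a
    context string to the next token. The large model is invoked only
    when the small model explicitly requests help."""
    tokens, llm_calls = [], 0
    for _ in range(max_tokens):
        ctx = prompt + "".join(tokens)
        tok = slm_next(ctx)
        if tok == call_token:          # SLM defers a critical token
            tok = llm_next(ctx)
            llm_calls += 1
        tokens.append(tok)
        if tok == "<EOS>":
            break
    return "".join(tokens), llm_calls
```

Counting `llm_calls` against `len(tokens)` gives the fraction of tokens that actually required the large model.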
Paper • research-paper • advanced
- Making SSM parameters input‑dependent gives the model content‑based gating, allowing selective propagation or forgetting of information and closing the performance gap with attention on discrete modalities.
- A hardware‑aware parallel recurrence algorithm restores efficiency lost by dropping convolutions, delivering true linear‑time computation with constant‑factor speedups on modern GPUs/TPUs.
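A minimal sequential reference for the input-dependent gating described above (shapes and projection names are assumptions; the real method replaces this O(T) loop with a hardware-aware parallel scan): the discretization step `delta`, and hence the state transition, is computed from each input token, so the model can selectively retain or forget state.

```python
import numpy as np

def selective_ssm_scan(x, W_delta, A, B_proj, C_proj):
    """Sequential sketch of a selective SSM.

    x: (T, d) inputs; A: (n,) diagonal (negative) state matrix;
    B_proj: (n, d); C_proj: (d_out, n); W_delta: (d,).
    delta is input-dependent, making the per-step decay exp(delta * A)
    a content-based gate rather than a fixed convolution kernel."""
    T, d = x.shape
    n = A.shape[0]
    h = np.zeros(n)
    ys = []
    for t in range(T):
        delta = np.log1p(np.exp(x[t] @ W_delta))   # softplus step size
        A_bar = np.exp(delta * A)                  # per-token state decay
        B_t = B_proj @ x[t]                        # input-dependent input map
        h = A_bar * h + delta * B_t
        ys.append(C_proj @ h)
    return np.stack(ys)
```

A large `delta` lets new input overwrite the state; a near-zero `delta` makes `A_bar ≈ 1` and the state persists, which is the selective-forgetting behavior the summary refers to.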
Paper • ▲ 22 • research-paper • advanced
- Reformulates multimodal reasoning as a native image‑to‑image generation task, enabling direct manipulation of visual information instead of indirect text prompts.
- Demonstrates four intrinsic advantages—efficiency, controllability, native parallelism, and seamless collaboration between vision and language modules—leading to more logically consistent and spatially precise outputs.
Paper • ▲ 73 • research-paper • advanced
- Conventional RAG memories act as static fact repositories, neglecting the higher‑order relations needed for deep reasoning.
- HGMem models the working memory as a hypergraph where each hyperedge groups related facts, enabling progressive construction of complex relational structures.
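A toy data-structure sketch of the hypergraph memory idea (class and method names are illustrative, not HGMem's API): each hyperedge groups an arbitrary set of facts, so n-ary relations are first-class instead of being flattened into pairwise links, and edges can grow as reasoning proceeds.

```python
class HypergraphMemory:
    """Working memory as a hypergraph: facts are nodes, hyperedges
    group any number of related facts under one relation id."""
    def __init__(self):
        self.facts = {}        # fact_id -> text
        self.hyperedges = {}   # edge_id -> set of fact_ids

    def add_fact(self, fid, text):
        self.facts[fid] = text

    def relate(self, edge_id, fact_ids):
        # Progressive construction: later facts can join an existing
        # hyperedge, enriching the relational structure incrementally.
        self.hyperedges.setdefault(edge_id, set()).update(fact_ids)

    def related(self, fid):
        """All facts co-appearing with fid in some hyperedge."""
        out = set()
        for members in self.hyperedges.values():
            if fid in members:
                out |= members - {fid}
        return out
```

Contrast with a plain graph: relating three facts pairwise needs three edges and loses the fact that they belong to *one* relation; a single hyperedge keeps that grouping intact.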
Paper • ▲ 32 • research-paper • advanced
- DLCM learns variable‑length “concepts” on the fly, moving computation from dense token streams to a compact latent space where reasoning is cheaper and more focused.
- A new compression‑aware scaling law separates token‑level capacity, concept‑level reasoning capacity, and compression ratio, allowing principled FLOP allocation across the hierarchy.
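A back-of-envelope illustration of why the three quantities are worth separating (this formula is an assumption built from the standard ~2·params·tokens FLOPs-per-forward estimate, not the paper's actual scaling law): token-level layers see all T tokens, while concept-level layers see only T/ρ latent concepts after compression at ratio ρ.

```python
def two_level_flops(T, n_tok_params, n_concept_params, rho):
    """Rough forward-pass FLOP split for a token-level + concept-level
    hierarchy: concept-level compute is discounted by the compression
    ratio rho, so capacity can be shifted there cheaply."""
    token_flops = 2 * n_tok_params * T
    concept_flops = 2 * n_concept_params * (T / rho)
    return token_flops + concept_flops

# Doubling the compression ratio halves the concept-level cost,
# which is what makes reasoning in the compact latent space cheaper.
base = two_level_flops(1_000_000, 1e8, 1e9, rho=4)
more = two_level_flops(1_000_000, 1e8, 1e9, rho=8)
```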