Learning Library


Efficiency

8 items in this topic

Paper

Decoupled Reward Normalization for Stable Multi‑Reward RL

  • Directly applying GRPO’s group‑wise normalization to a mixture of rewards collapses distinct advantage signals into near‑identical values, hurting learning dynamics.
  • GDPO separates (decouples) the normalization step for each reward component, preserving their relative magnitudes before a final batch‑wise advantage scaling.
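A minimal sketch of the contrast, using toy rewards; function names are illustrative, and the final batch-wise rescaling mentioned above is omitted for brevity:

```python
import numpy as np

def grpo_advantages(rewards):
    """GRPO-style: sum the reward components first, then normalize the
    combined reward within the rollout group (shape [G, K])."""
    total = rewards.sum(axis=1)                      # [G] combined reward
    return (total - total.mean()) / (total.std() + 1e-8)

def gdpo_advantages(rewards):
    """Decoupled (GDPO-style): normalize each reward component within
    the group separately, then sum the per-component advantages."""
    mu = rewards.mean(axis=0, keepdims=True)         # per-component mean
    sd = rewards.std(axis=0, keepdims=True) + 1e-8   # per-component std
    return ((rewards - mu) / sd).sum(axis=1)         # [G]

# Toy group of 4 rollouts with 2 reward components on very different scales.
rewards = np.array([[0.9, 10.0],
                    [0.1, 12.0],
                    [0.8,  8.0],
                    [0.2, 11.0]])
print(grpo_advantages(rewards))   # ranking dominated by the large-scale reward
print(gdpo_advantages(rewards))   # both components contribute comparably
```

With coupled normalization the small-scale reward barely moves the advantage; decoupling restores its influence.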
Paper

Learnable Multipliers for Adaptive Scale in LLM Matrix Layers

  • Attaching a learnable scalar multiplier to each weight matrix lets the model escape the suboptimal weight‑norm equilibrium imposed by fixed weight decay.
  • Extending this idea to per‑row and per‑column multipliers further frees individual dimension scales, yielding a more expressive variant of μP‑style scaling.
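A forward-pass sketch of both variants (class and attribute names are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

class ScaledLinear:
    """Linear layer whose weight matrix is modulated by learnable multipliers.

    mode="scalar": one multiplier g per matrix,       W_eff = g * W
    mode="rowcol": per-row r and per-column c scales, W_eff = diag(r) @ W @ diag(c)
    The multipliers let effective weight norms drift away from the
    equilibrium that fixed weight decay imposes on W itself.
    """
    def __init__(self, d_in, d_out, mode="scalar"):
        self.W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
        self.mode = mode
        if mode == "scalar":
            self.g = 1.0
        else:
            self.r = np.ones(d_out)   # per-row (output-dim) scales
            self.c = np.ones(d_in)    # per-column (input-dim) scales

    def effective_weight(self):
        if self.mode == "scalar":
            return self.g * self.W
        return self.r[:, None] * self.W * self.c[None, :]

    def __call__(self, x):
        return x @ self.effective_weight().T

layer = ScaledLinear(4, 3, mode="rowcol")
layer.r[0] = 2.0          # as if training had grown one output row's scale
y = layer(rng.standard_normal(4))
```

In training, the multipliers would receive gradients like any other parameter, typically without weight decay applied to them.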
Paper

Quantum‑Enhanced Neural Radiance Fields for Compact 3D Synthesis

  • QNeRF replaces large MLPs in NeRF with parameterised quantum circuits, exploiting superposition and entanglement to encode spatial and view‑dependent features.
  • Two variants are proposed: **Full QNeRF** uses the entire quantum state for maximal expressivity, while **Dual‑Branch QNeRF** splits spatial and view encodings, dramatically lowering circuit depth and improving scalability to near‑term hardware.
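A toy state-vector simulation of the dual-branch idea, assuming a 2-qubit angle-encoding circuit per branch; this is not the paper's circuit, only a shape-of-the-computation sketch:

```python
import numpy as np

def ry(theta):
    """Single-qubit RY rotation matrix."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

def branch(features, params):
    """2-qubit parameterised circuit: angle-encode two features, entangle
    with a CNOT, apply trainable RY rotations, return <Z> on qubit 0."""
    state = np.zeros(4); state[0] = 1.0                        # |00>
    state = np.kron(ry(features[0]), ry(features[1])) @ state  # data encoding
    state = CNOT @ state                                       # entanglement
    state = np.kron(ry(params[0]), ry(params[1])) @ state      # trainable layer
    z0 = np.kron(np.diag([1.0, -1.0]), np.eye(2))              # Z (x) I
    return state @ (z0 @ state)

# Dual-branch: spatial and view-direction features go through separate
# shallow circuits (lower depth each), combined classically afterwards.
spatial_out = branch(np.array([0.3, 1.1]), np.array([0.2, -0.4]))
view_out    = branch(np.array([0.7, 0.1]), np.array([0.5,  0.3]))
density_logit = spatial_out + view_out
```

Splitting the encodings means each branch needs only enough qubits and depth for its own feature set, which is the scalability argument above.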
Paper

Token‑Level Collaborative Decoding for Efficient LLM Reasoning

  • RelayLLM lets a small language model act as a controller, emitting a special command token to summon the large model only for critical tokens, reducing LLM usage to ~1% of generated tokens.
  • A two‑stage training regimen (warm‑up plus Group Relative Policy Optimization) teaches the SLM when to generate autonomously and when to request help, balancing independence with strategic assistance.
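The decode loop can be sketched with stand-in models; the two "models" below are toy functions and `CALL_LLM` is an illustrative control token, not RelayLLM's actual vocabulary:

```python
CALL_LLM = "<call>"

def slm_next(prefix):
    """Small model: handles easy continuations, emits the control token
    when it judges the next token to be critical (toy heuristic here)."""
    if prefix and prefix[-1] == "=":          # "hard" step: defer the answer
        return CALL_LLM
    easy = {(): "2", ("2",): "+", ("2", "+"): "3", ("2", "+", "3"): "="}
    return easy.get(tuple(prefix), "<eos>")

def llm_next(prefix):
    """Large model: invoked only for the tokens the SLM defers."""
    return "5" if prefix == ["2", "+", "3", "="] else "<eos>"

def relay_decode(max_len=10):
    tokens, llm_calls = [], 0
    while len(tokens) < max_len:
        tok = slm_next(tokens)
        if tok == CALL_LLM:                   # summon the large model
            tok = llm_next(tokens)
            llm_calls += 1
        if tok == "<eos>":
            break
        tokens.append(tok)
    return tokens, llm_calls

out, calls = relay_decode()                   # large model used for 1 of 5 tokens
```

In the real system the "when to call" decision is learned (the GRPO stage above), not hard-coded.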
Paper

Mamba: Fast Linear‑Time Sequence Modeling with Input‑Conditioned State Spaces

  • Making SSM parameters input‑dependent gives the model content‑based gating, allowing selective propagation or forgetting of information and closing the performance gap with attention on discrete modalities.
  • A hardware‑aware parallel recurrence algorithm restores efficiency lost by dropping convolutions, delivering true linear‑time computation with constant‑factor speedups on modern GPUs/TPUs.
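A single-channel sequential sketch of the selective recurrence (the real model fuses this scan into a hardware-aware kernel and parameterises it differently; weight names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def selective_ssm(x, A, W_delta, W_B, W_C):
    """Selective SSM for one input channel. Unlike a fixed (LTI) SSM, the
    step size delta_t and the B_t/C_t projections are computed FROM x_t,
    giving content-based gating: a large delta_t writes x_t into the
    state, a tiny one lets the model ignore it."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        delta = np.log1p(np.exp(W_delta * x_t))   # softplus, input-dependent
        A_bar = np.exp(delta * A)                 # discretised decay (A < 0)
        B_t, C_t = W_B * x_t, W_C * x_t           # input-dependent projections
        h = A_bar * h + delta * B_t * x_t         # selective state update
        ys.append(C_t @ h)                        # input-dependent readout
    return np.array(ys)

A = -np.abs(rng.standard_normal(8))               # stable (negative) dynamics
y = selective_ssm(rng.standard_normal(16), A,
                  W_delta=0.5,
                  W_B=rng.standard_normal(8),
                  W_C=rng.standard_normal(8))
```

Because the recurrence is no longer time-invariant it cannot be computed as a convolution, which is exactly why the paper's parallel scan algorithm matters.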
Paper

DiffThinker: Diffusion‑Based Generative Multimodal Reasoning

  • Reformulates multimodal reasoning as a native image‑to‑image generation task, enabling direct manipulation of visual information instead of indirect text prompts.
  • Demonstrates four intrinsic advantages—efficiency, controllability, native parallelism, and seamless collaboration between vision and language modules—leading to more logically consistent and spatially precise outputs.
Paper

Hypergraph‑Based Memory for Enhanced Multi‑Step RAG

  • Conventional RAG memories act as static fact repositories, neglecting the higher‑order relations needed for deep reasoning.
  • HGMem models the working memory as a hypergraph where each hyperedge groups related facts, enabling progressive construction of complex relational structures.
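A toy version of the data structure (illustrative, not the paper's implementation): facts are nodes and each hyperedge groups an arbitrary set of related facts, so n-ary relations are first-class.

```python
from collections import defaultdict

class HypergraphMemory:
    """Working memory as a hypergraph: edges map a relation name to a
    set of facts; membership indexes which relations each fact joins."""
    def __init__(self):
        self.edges = {}                      # relation name -> set of facts
        self.membership = defaultdict(set)   # fact -> relation names

    def add_relation(self, name, facts):
        self.edges[name] = set(facts)
        for f in facts:
            self.membership[f].add(name)

    def neighbors(self, fact):
        """All facts sharing at least one hyperedge with `fact` --
        the unit of one reasoning hop."""
        out = set()
        for e in self.membership[fact]:
            out |= self.edges[e]
        return out - {fact}

mem = HypergraphMemory()
mem.add_relation("discovery", {"Curie", "radium", "1898"})
mem.add_relation("award",     {"Curie", "Nobel Prize", "1903"})
hop1 = mem.neighbors("Curie")   # one hop reaches facts from both relations
```

Each retrieval step can add new hyperedges over existing nodes, which is the "progressive construction" the summary refers to.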
Paper

Hierarchical Language Modeling with Dynamic Concept Compression

  • DLCM learns variable‑length “concepts” on the fly, moving computation from dense token streams to a compact latent space where reasoning is cheaper and more focused.
  • A new compression‑aware scaling law separates token‑level capacity, concept‑level reasoning capacity, and compression ratio, allowing principled FLOP allocation across the hierarchy.
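A back-of-envelope version of the FLOP split such a law governs (illustrative accounting, not the paper's fitted scaling law):

```python
def hierarchy_flops(n_tokens, d_token, l_token, d_concept, l_concept, ratio):
    """Two-level hierarchy: a token-level stack processes all n tokens,
    while a concept-level stack processes n / ratio compressed units, so
    concept-level compute per layer shrinks by the compression ratio.
    12*d^2 is the usual rough transformer FLOPs/token/layer estimate."""
    token_flops   = n_tokens * l_token * 12 * d_token ** 2
    concept_flops = (n_tokens / ratio) * l_concept * 12 * d_concept ** 2
    return token_flops, concept_flops

tok_f, con_f = hierarchy_flops(n_tokens=4096, d_token=1024, l_token=12,
                               d_concept=2048, l_concept=12, ratio=4)
```

The point of separating the three quantities is visible here: raising the compression ratio frees concept-level FLOPs that can be reinvested in a wider or deeper concept stack at fixed total budget.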