Nurups 2025: From Academia to Industry
Key Points
- Nurups 2025 transformed from a niche academic gathering into a massive, corporatized AI trade show split between San Diego and Mexico City, signaling that industry leaders now set the conference agenda.
- The surge to tens of thousands of attendees and 20,000 paper submissions created a severe signal‑to‑noise problem, forcing participants to rely on reputation and curation rather than conference branding to identify valuable research.
- Despite the flood of AI‑generated submissions, only a few papers truly advanced the field, highlighting the need for better distillation methods to separate genuine breakthroughs from filler content.
- The most impactful technical developments this year centered on “attention plumbing” for LLMs—gating, sparsity, sync‑free attention, and long‑context stabilization—that improve selective focus, reduce hallucinations, and enable handling of large, messy datasets.
Sections
- Nurups 2025: AI Conference Evolution - The speaker outlines how Nurups 2025 transformed from a niche academic gathering into a massive, dual‑city corporate trade show dominated by tech giants, emphasizing its scale, shifted agenda toward product and enterprise topics, and the difficulty of distilling insight from the overwhelming volume of submissions.
- Scaling Compute Drives Robot Agents - The speaker explains that expanding model size and training depth—mirroring successful LLM scaling—will enable deep reinforcement‑learning for robotics, paving the way for general‑purpose household robots, while also highlighting recent findings that diffusion models train in two distinct phases.
- Reasoning, Efficiency, and Edge Deployment - The speaker outlines how major AI developers are shifting toward measuring step‑by‑step reasoning as a performance metric, prioritizing highly efficient, quantized models that can run on edge devices, and embedding these models into tooling and workflows as the new competitive frontier.
Full Transcript
# Nurups 2025: From Academia to Industry **Source:** [https://www.youtube.com/watch?v=518QPRWlRW0](https://www.youtube.com/watch?v=518QPRWlRW0) **Duration:** 00:10:42 ## Summary - Nurups 2025 transformed from a niche academic gathering into a massive, corporatized AI trade show split between San Diego and Mexico City, signaling that industry leaders now set the conference agenda. - The surge to tens of thousands of attendees and 20,000 paper submissions created a severe signal‑to‑noise problem, forcing participants to rely on reputation and curation rather than conference branding to identify valuable research. - Despite the flood of AI‑generated submissions, only a few papers truly advanced the field, highlighting the need for better distillation methods to separate genuine breakthroughs from filler content. - The most impactful technical developments this year centered on “attention plumbing” for LLMs—gating, sparsity, sync‑free attention, and long‑context stabilization—that improve selective focus, reduce hallucinations, and enable handling of large, messy datasets. ## Sections - [00:00:00](https://www.youtube.com/watch?v=518QPRWlRW0&t=0s) **Nurups 2025: AI Conference Evolution** - The speaker outlines how Nurups 2025 transformed from a niche academic gathering into a massive, dual‑city corporate trade show dominated by tech giants, emphasizing its scale, shifted agenda toward product and enterprise topics, and the difficulty of distilling insight from the overwhelming volume of submissions. - [00:04:42](https://www.youtube.com/watch?v=518QPRWlRW0&t=282s) **Scaling Compute Drives Robot Agents** - The speaker explains that expanding model size and training depth—mirroring successful LLM scaling—will enable deep reinforcement‑learning for robotics, paving the way for general‑purpose household robots, while also highlighting recent findings that diffusion models train in two distinct phases. - [00:08:30](https://www.youtube.com/watch?v=518QPRWlRW0&t=510s) **Reasoning, Efficiency, and Edge Deployment** - The speaker outlines how major AI developers are shifting toward measuring step‑by‑step reasoning as a performance metric, prioritizing highly efficient, quantized models that can run on edge devices, and embedding these models into tooling and workflows as the new competitive frontier. ## Full Transcript
In the next 10 minutes, I'm going to
give you everything you need to know
about Nurups 2025. Nurups is the premier
AI conference in the world. And so,
understanding what happened at Nurups is
critical to understanding where AI is
going in 2026. The conference has
actually really finished its evolution
from a very niche academic conference
before LLM took off to this full-blown
industry trade show. Right? So, tens of
thousands of people attended. It split
now across two cities. It was San Diego
and Mexico City. And it really isn't for
grad students anymore. It's about, you
know, big booths, Google, Amazon,
Alibaba, and all their friends, right?
The shift matters because it tells you
who is driving the agenda. Before it was
very much the grad student academic
lifestyle. And the questions now are
much different from the academic
questions in prior years. They're about
product road maps, hardware launches,
enterprise stories. What you see is a
corporatized nurups. And so if you're
looking for the state of ML research,
you kind of have to dig because this has
become a big enough deal that the
enterprises are paying attention. On the
research side, like if you want the
academic side, the volume has gone
absurd. And that is part of the
discussion at Nurups and it's part of
what I want to communicate here. It
takes a lot of distilling to get to what
matters. When you have 20,000
submissions for a single conference,
that's too much for anyone to read,
right? Clearly a lot of this is AI
assisted writing and that was something
of a conversation at Nurops itself. The
result is a real signal to noise problem
just as you have with résumés and job
descriptions. You have it here in
academia as well. There are a handful of
papers that I'm going to talk about that
really move the frontier forward but
they got really buried in a long tale of
slot. And so if you want to understand
what changed this year, it's harder than
ever. You can't just rely on conference
brand anymore. You have to look at who
is writing and whether you trust them.
That's probably a lesson for the
internet as a whole in the next year.
Now, underneath the noise, there are a
few clear threads that you and I should
care about. The first is what I would
call new attention plumbing for LLMs. A
lot of the most impactful work this year
really isn't about completely new
architectures. It's about changes in how
LLM attention behaves. uh things like
gating, sparsity, eliminating attention
syncs, and stabilizing long context
training. That sounds like a ton of
detail, right? But it's it's critical
infrastructure level change to enable
new behaviors. Because if attention
becomes more selective and a little bit
better behaved, you get models that
better handle long documents, big code
bases, messy logs that have a lot of
confusion and dirty data in them. And
even if it's not perfect, that data can
be retrieved and accurately processed by
a transformer with better attention so
that you get fewer hallucinations and
you use less computed tokens. So you
might not see splashy headlines about
these papers, but 6 months from now,
you're going to quietly notice that
these same size models are cheaper and
more stable and smarter because their
plumbing got swapped out. The second big
theme is homogeneity. Models are
converging toward the same answers. One
of the best paper tracks this year was
basically a rigorous proof that
something we've all felt anecdotally is
true. That when you ask the top models
open-ended questions, they increasingly
sound like different skins on the same
brain. So across different vendors,
across different prompts, you will often
see similar phrasing, similar structur
structure, similar values. That's a big
deal because it means that which
frontier model do we pick matters less
than it did before. Most of these models
live in what we call the same behavioral
basin. That's a machine learning term,
but it basically means the models are
converging around a common behavioral
and response set. Now, it raises a
larger issue, right? If all the major
systems collapse into an averaged out
view of the world, then any bias or any
blind spot or any tilt in that consensus
view will get propagated everywhere at
once. This is probably a consequence of
how quickly models are spreading around
the world. They're spreading and
commoditizing, but they're using a lot
of the same underlying parameters and
weights and that's leading to common
results. A third big thread is that
scaling laws are now getting into the
reinforcement learning layer. So for
years, reinforcement learning for real
tasks like robotics, like logistics,
like complex agents, that's really
lagged behind the language and the
vision side. But at this Nurups, you saw
really serious work done on very deep
reinforcement learning policies like
hundreds to roughly a thousand layers
deep that are all trained in
self-supervised or goal conditioned way.
So the pattern is really familiar once
you stop being stingy with your compute
with your capacity with your depth that
just keep scaling and regularizing story
that worked for LLMs before starts
working for agents. So what this means
is that we suddenly have generalpurpose
household robots around the corner
because the ceiling on what agentic
systems can learn from raw interaction
is higher than people thought. And so if
you're betting on automation in ops, in
robotics, in simulationheavy workflows,
that technical stuff around
reinforcement learning for robotics, how
you create reinforcement learning that
are really, really deep in the neural
network and that allow an agent to train
in an environment over and over and over
again to really get to goal conditioning
that works. That's a frontier to keep an
eye on. getting to like compute enabled
scaled up reinforcement learning for
robotics seems like it's going to be a
trend in 2026. Now is 2026 the year of
the household robot? I think it's a
touch early but this is the kind of
foundational work that enables it. Then
there's diffusion and the growing
consensus that it's not just dumb
memorization of training images. Right?
One of the most talked about theory
papers at Nurups this year makes a
really strong case that diffusion
training actually has two different
phases. There's an early part of the
training where the model learns to
generate highquality diverse samples and
a later regime or training regime where
it starts to overfit and memorize
specifics. Crucially, as you scale the
data set, the memorization phase moves
further out in your training time, which
gives you a much wider safe window to
stop before the model overfits and
memorizes. This has important
implications for IP and for privacy
debates. It doesn't magically make
copyright concerns disappear because
sensitive content is still risky and can
still be created. But it does shift the
conversation. Instead of diffusion
models and image generators are
inherently theft, which is a claim that
I hear sometimes. The question becomes,
how much data did you use, how long did
you train it, and can you show you
stayed in a generalized space versus an
overfitted model space for your image
generator? Finally, there's a really
clear backlash brewing against the
incentive structure of AI research
itself. I mentioned the conversation
around paper submissions. People inside
the community are openly talking about
what they call the slot crisis,
hyperinflated paper counts, mentorship
businesses cranking out formulaic
publications, and reviewers being asked
to triage impossible workloads of
papers. The conference is experimenting
with AI to help review AI generated
papers, which is kind of funny and a
little bit dystopian. But the deeper
issue is around trust, right? If leading
venues cannot reliably separate real
breakthroughs from padded noise, then
companies and regulators and
practitioners are going to start to
ignore the Nurips brand and build their
own filters. That's kind of the meta
story for this Nuripss. It's not just
what got published, but there's a
growing recognition that the way we
reward and gatekeep our AI research is
breaking under its own success and
everybody downstream is going to need to
be more selective. The implication is
that you need to think about who you're
listening to and who you're trusting
around these papers and where your
signal to noise ratio is. It is no
longer viable with 20,000 paper
submissions to read every academic paper
posted on arcs about AI. It just nobody
can do it and so much of it is slop that
it's not worth reading. You have to have
a thoughtful approach to look at the
trends which is exactly what I've tried
to do here. Let me close by sharing what
the big labs, the big model makers are
quietly saying because I think if
they're at the conference, if they're
submitting papers, we should pay
attention to what they're submitting.
Number one, reasoning is a metric.
Reasoning is something you can measure
and think about. Apples and Apple and
others are pushing for an evaluation
framing around reasoning where we
instrument the process, step-by-step
reasoning, tool calls, search usage, and
not just final answer accuracy. This
could improve overall generalizable
performance for LLMs because it means
you can replicate the process more
reliably. That plays directly into tool
use into agents into MCP style
protocols. Reasoning traces themselves
become telemetry that you can use to
figure out if your model is working
right. Another big theme for major model
makers is around efficiency. There's a
heavy focus on how you can make
generative models and generative vision
models too really efficient and small
and quantized so that they can live on
phones on laptops on low power devices.
There's a narrative shift this year
among the major model makers at NERMS
from we have the biggest model to yeah
we have a great model but we can run
strong models where your users actually
are with low latency on edge devices. In
other words, the frontier race is
becoming about more than model size and
it's becoming about how we think about
reasoning. It's becoming about how we
think about putting the model where the
user is and how you can plug the model
into tooling and workflows. I think
those three themes, efficiency,
reasoning, evaluation and improvements
and plugging models into tooling and
workflows are really good proxies for
big themes we're going to see in 2026
from major model makers. I keep saying
if you're asking yourself what's the
best model at this point you're probably
asking the wrong question. You should
ask what's the most useful model on this
device. Can it do the job? Does it plug
into my workflow so it's not just
existing in isolation and can I use it
efficiently so the tokens are not
wasted? That's what the major model
makers are thinking about and I think
that's one of the takeaways from Nurups
that I am I am pulling and thinking
about as I look into 2026. I hope you
enjoyed this Nurip sum. It certainly is
faster than the conference as a whole.
So uh enjoy and let me know what you
think I missed or what you'd like to
learn more about. Cheers.