Computer Vision Returns via Meta SAM2
Key Points
- Tim Hwang’s “Mixture of Experts” podcast opens with a panel of technologists (Vagner Santana, Kate Soule, Ami Ganan) to decode the latest AI headlines, especially Meta’s new Segment Anything Model 2 (SAM 2).
- SAM 2, a next‑generation computer‑vision system, can segment and track objects in images and video, highlighting a resurgence of interest in vision AI alongside the current NLP hype.
- The hosts stress that true open‑source AI now means more than just releasing model weights; Meta’s decision to also publish the training data sparks debate about the future importance of open data in democratizing models.
- The episode notes a striking 30% abandonment rate for proof‑of‑concept AI projects, prompting discussion on whether this reflects optimism or underlying challenges in the industry.
- Throughout, the panel emphasizes responsible AI development, generative‑AI research, and the strategic role of AI analytics in shaping the next wave of technology adoption.
**Source:** [https://www.youtube.com/watch?v=3mcLdfx6HTc](https://www.youtube.com/watch?v=3mcLdfx6HTc)
**Duration:** 00:28:55

## Sections
- [00:00:00](https://www.youtube.com/watch?v=3mcLdfx6HTc&t=0s) **Meta's SAM2 and AI Trends** - In a Mixture of Experts podcast, the hosts discuss Meta's new SAM2 “segment anything” model for image/video segmentation, alongside broader AI topics like project abandonment rates, notification overload, and the promise of AI hardware breakthroughs.

## Full Transcript
computer vision is it cool again now we
can take that and then you amplify it
across uh different uh problems to be
solved here so is friend.com AI
Hardware's breakout moment we already
are so addicted to notifications and
it's one more source of notifications
for us we're estimating a 30%
abandonment of proof of concept AI
projects is that a bad thing yeah I
don't think it's as pessimistic as it
could be all this and more on today's
episode of mixture of
experts I'm Tim Hwang and I'm joined
today as I am every Friday by a
world-class panel of technologists
engineers and more to help make sense of
a tidal wave of AI news today on the
panel we've got Vagner Santana staff
research scientist and master inventor
on the responsible tech team Kate Soule
who's a program director of generative
AI research and Ami Ganan associate
partner AI and Analytics
[Music]
so our first segment we're going to talk
about Sam 2 uh meta this week announced
the release of its next generation of a
model it calls segment anything so the
segment anything model is Sam and this
is the next generation of it um and
specifically what the model does is it
allows you to segment imagery or video
so you can select an object and kind of
track it over time now I really wanted
to cover this because you know there's
just so much hype around NLP um and
everybody's talking about chat Bots all
the time but we
should not forget that there's like
really really exciting things happening
in other domains of AI and particularly
in in computer vision so we're going to
start off with a fun question which is
just simply is computer vision cool
again Kate yes uh Vagner yes and Ami
always has
been yeah don't call it a comeback right
um well I think with the violent
agreement let's get into this segment I
really wanted to kind of talk about this
because of course it's another iteration
of meta really kind of playing in the
open source game but I think what's
really really interesting is that it's
also a really interesting marker in the
ground for what sort of Open Source
exactly means in the AI space so if you
haven't been watching this space
very carefully you know in the first
versions of Open Source in AI people
said well we're going to open up the
model and there's going to be weights
that are available um and uh with uh Sam
they're also uniquely releasing the data
uh behind the model um and so Ami maybe
I'll throw it to you as kind of the new
panelist um is I'm curious about how you
sort of see this like in the future is
open data going to be a big part of what
makes a model sort of truly open source
um and kind of talking a little bit
about how you think through some of that
yeah uh so yeah listen um we we love
open source right um yeah and open
source means different things to
different people uh it can be just you
know releasing open data it could be
having open weights um a whole Spectrum
there right I'm really really excited
that meta went ahead and did this on
an Apache 2 license it's fully open weight
um there is a lot of computer vision
problems that we've been wrangling for
several years right I remember back in
my grad school days you know we would go
and do uh traditional image processing
and you know segmentation through
watershed algorithms and you know drawing
little boxes and things of that nature
um it's very painstaking oh extremely
painstaking and extremely laborious
right and uh um fast forward to today
it's uh it's super super exciting to see
something like this which can operate at
scale on huge videos and think about
this from an Enterprise setting right um
my
clients um I I work with clients that
are you know they have huge
manufacturing operations going on they
have to go you know when you think about
the supply chain there are um you know boxes
that need to be moved uh in the
warehouse there is computer vision
that's going on and tracking those
objects or if you look at you know the
the production settings in a lot of our
clients a huge assembly line of objects
of different types that need to be
tracked through multiple different
stages um or if you look at some of our
um you know local governments for
instance right one of the things that
we've seen is um uh people tend to jump
turnstiles right when you're going
through public transport and that
surprisingly is a huge um cost to cities
and local governments right uh city of
New York for instance it's uh it's a
cost of like $750 million um and it it
becomes a big problem to solve and in
the past a lot of these have needed to
be solved through very specific computer
vision models custom trained for these
um specific tasks right what SAM 2
enables is for you to be able to go and
rapidly build Those computer vision
models at scale because now you can go
and do these automatic segmentations of
large videos which means whichever
domain right you throw in uh videos that
it hasn't seen before domains that it
hasn't been uh trained on before it
still is able to go and do those
segmentations and track those objects
over time right and so now this gives us
a very very um capable mechanism to go
build these domain specific computer
vision models at scale and so you know
short answer really really exciting and
that's why I think that open source uh
capability helps now we can take that
and then you know amplify it across uh
different uh problems to be solved here
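The "segment an object and track it over time" idea described here can be illustrated with a toy sketch. This is not SAM 2's actual API (Meta's released code has its own video-predictor interface); it is a self-contained illustration, under simplifying assumptions, of the core tracking step: matching each frame's candidate masks to the object as last seen, using mask overlap (IoU).

```python
# Toy illustration only, not SAM 2's API: track one object across frames
# by picking, in each frame, the candidate mask with the highest
# intersection-over-union (IoU) against the object as last seen.

def iou(mask_a: set, mask_b: set) -> float:
    """Intersection-over-union of two pixel-coordinate sets."""
    union = mask_a | mask_b
    if not union:
        return 0.0
    return len(mask_a & mask_b) / len(union)

def track(initial_mask: set, frames: list) -> list:
    """frames: per-frame lists of candidate masks. Returns the mask
    chosen for the object in each frame, carried forward greedily."""
    current = initial_mask
    history = []
    for candidates in frames:
        current = max(candidates, key=lambda m: iou(current, m))
        history.append(current)
    return history

# A 2-pixel "object" drifting right across two frames, plus clutter.
obj = {(0, 0), (0, 1)}
frames = [
    [{(0, 1), (0, 2)}, {(5, 5)}],           # frame 1
    [{(0, 2), (0, 3)}, {(5, 5), (5, 6)}],   # frame 2
]
tracked = track(obj, frames)
print(tracked)  # follows the drifting object, ignoring the clutter
```

Real systems like SAM 2 do this with learned memory over video features rather than raw pixel-set overlap, but the tracking intuition is the same.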
yeah for sure and I think that's kind of
one of the most interesting things
because I think yet again this has sort
of been a theme in a number of our
conversations you know meta and its blog
post is like this is so exciting because
you could use it for AR glasses and I
think one of the questions I had was
like is this the technology that finally
gets AR glasses to work and I'm kind of
I don't know Kate if you got opinions on
that or Vagner you got opinions on that
but there's almost kind of one point of
view which is like like again with AI
like the big application is going to be
like turnstile enforcement right it
actually won't be these kind of consumer
elements but I don't know if anyone
wants to speak up for like no actually
this is this is the moment that's really
going to make AR glasses work I mean I'm
sure this helps us get closer not
farther away but you know I'm I'm always
wary of anything that's uh claimed to be
a silver bullet
but I I want to get back uh Tim to what
you mentioned earlier about like open
sourcing the data because I think it's
really interesting to talk about you
know meta strategy and and how in Vision
they've released the data behind Sam 2
but and the license of the model itself
is Apache 2 and you look at the Llama
series and you know 3.1 came out uh just
last week where it's under a specific
llama license uh and there is absolutely
no data uh that's released or even
described really in terms of um what was
used tell us a little bit more yeah for our
listeners I think they'd really benefit
from like hearing so what is the
difference there exactly between kind of
Apache and you know what's happening
with llama and I guess why right this kind
question is like why aren't they
consistent yeah so Apache 2.0 is a very
popular widely used open source license
that's been around for years and is
considered a very permissive license
anyone can build on top of it for
commercial or other uses without having
to worry about further attribution to
where uh things came from where llama
when the models were released meta
created a llama license that is custom
and bespoke to handle llama weights
another big differentiation is Apache 2
is normally used for licensing software
um and the data that they released on
Sam 2 I think was CC BY which is
similar to Apache 2 but commonly used
for data so you know there are different
terms you want to govern different
artifacts Apache for software CC BY also
often for data and now model weights
people have started to come up with
their own licenses CU model weights also
fit somewhere between software and data
it's a little bit unclear what the
jurisdiction is there yeah I think it's
such a great point to end on uh and I
think if I can I like maybe just to take
one more turn at that because I think
it's a really important part of this
question you know it strikes me that one
of the reasons everybody's very excited
about open source is the accessibility
of the technology right this is not
going to be something that you know a
company just just kind of put up walls
around and then charge you for access
to um but it kind of sort of strikes me that
like part of the problem of doing open
sourcing is that it's also a lot more
hard like difficult to control use right
like you suddenly have this technology
that kind of anyone can use and you know
some of the people that use it are not
going to use it in the most responsible
way um and it feels like that's like a
really hard challenge right because like
I think you know kind of democratizing
the technology also creates tensions
with how do we like enforce use cases um
and um yeah I'm curious if the panel has
any kind of thoughts on on that yeah
yeah I I think that open
sourcing has been one interesting
mitigation for these situations because
as the community notices that
there's something going wrong or
there's a a specific um harmful use then
the community takes action and uh we can
look back to open source uh
operating systems right they they are
the most secure ones right that we have
because the community uh um
automatically or or they build on top of
this uh openness right and they try to
to tackle and also mitigate these um
these issues so I think that in this
sense I think open sourcing is a good
strategy to mitigate this this uh issue
if we're not transparent and open about
the technologies that are available or
will be available if as uh people
continue to work in this area there's no
way for us to build regulations and
awareness and proper practices around it
so I'd much rather have this be happening
out in the open than you know behind
some closed doors where we really don't
have a good good line of sight into um
what's going on that's right yeah I
guess it moves us from this model of like just trust
us to a world where we can actually
kind of verify it like the
verification part yeah absolutely I feel
like yeah I mean once you put it in the
open there's you know a lot more heads
thinking through really tricky problems
and there is a lot more diversity of
solutions that come in terms of
mitigating these problems right rather
than trying to you know force and
control it I think when you put it out
in the open
um you you'll have a lot more Creative
Solutions coming to solve these
[Music]
problems okay for our next segment I
want to cover friend.com um so as you
all may know right there's been a
longstanding dream in the valley that one of the
really exciting things you could do with
llms is the notion of really for the
first time creating a fully-fledged kind
of AI companion assistant um and this
dream is kind of manifested in a bunch
of Hardware projects that have taken
place so the Humane pin that came out
earlier this year is a good example of
that um and friend.com is uh the most
recent iteration of that so Avi Schiffmann
an entrepreneur launched this with a
teaser trailer earlier uh this week and
um Avi has taken a lot of criticism
online but actually want to take this
conversation in a slightly different
direction which is that I think you know
what's really interesting and what's
kind of offered by friend.com is sort
of the idea that maybe startups can
actually start competing in the AI
Hardware space and that you could
actually in the future launch an AI
Hardware project and even something so
Advanced as like an AI Hardware
companion um just being a small startup
on your own right that this is not just
going to be a kind of space where you
know the big companies can only play um
but that actually might be a place where
startups can play as well um and you
know I guess I want to kind of put
forward this idea and Kate maybe I can pick
on you is do you kind of buy the idea
that like the costs of AI are coming
down so much that you know we're about
to kind of be awash in these types of
things like the idea of someone
launching an AI companion product is not
going to be like something only you know
the biggest tech companies in the world
can do but that you'll also have like
these upstarts that will be able to kind
of like do their own take on on this
space yeah I I think it's a really
interesting
question because we're getting so many
kind of in a way conflicting signals of
what's going on in this space so you
know uh Gartner just released a report
yesterday or two days ago saying that
they expect 30% of all POCs in gen AI to
never leave the POC phase yeah
definitely we're going to talk about
that later I think this is going to be
the final final segment of the episode
okay great but a lot of what they were
talking about is citing the costs right
so it's uh that we're not seeing the ROI
offset the cost enough
and I I think that certainly makes sense
given what we're seeing but on the other
side we're seeing models get smaller and
smaller and smaller like there is this
clear trend where we're able to pack
more performance into fewer parameters
where we're being able to get to the
point where these models can run on CPUs
and we don't need the advanced hardware
to the same degree that we did a year
ago and you know some of these scaling
laws are really exciting in terms of how
efficient the technology is growing so I
don't think it's unreasonable to think
that that we could get to a place where
startups could actually get into the
hardware space um for gen AI type
deployments yeah and I think it's kind
of fascinating just because you know had
you talked to me like five years ago I
would have been like oh yeah the future
is just like one one big company that
has all the AI right but it kind of
feels like we're going to just be awash
in intelligence like there'll just be
models everywhere you know particularly
with the developments in open source
that we were talking about um I don't
know if Ami or Vagner you've got kind
of thoughts on this about just like how
accessible this and how competitive
really ultimately a space this is going
to be yeah so I definitely agree with
Kate there right so I think small
language models are becoming way more
powerful and way more popular for a
variety of reasons right um in the
consumer space like you mentioned uh you
know it's uh there is a there's a lot of
competition in terms of hey you know
I'll put something on the edge um it
could be a companion type of a device it
could be for you know um something else
that you just want to run on your phone
locally um you know something that you
want to run on a Raspberry Pi device
that you're just you know tinkering with
there could be a lot of different
variations where you're trying to run
these models on the edge um definitely
on the consumer side but we're starting
to see some of that on the Enterprise
side as well right because now
Enterprises are wondering um can I go
and start building really domain
specific uh models and you know these
small language models then come and help
them uh power it through so if I have
data that I don't want to expose at all
to the internet but I still want these
capabilities and I have devices in my
manufacturing plant where I want you
know these to be helping my uh plant
workers and things of that nature then
these become a solution right so small
language models running on edge in local
devices that's definitely becoming
popular
um both in the consumer space as well as
in the Enterprise space yeah and I think
thinking about the economics of this the
other thing I wanted to touch on on
friend.com is you know the product
is being offered for $99 with no
subscription which is also like very
intriguing like to think about the
business model of this there's always
been I think an assumption in the AI
space which is well the consumers are
going to demand they want better and
better and better models over time but I
also kind of think about like I had a
tamagotchi as a kid right and I built
like very deep emotional relations with
my tamagotchi and it's not like they
sent updates over the wire to the
tamagotchi it was just like a thing that
they printed in the factory and it came
to me um and I actually wonder whether
or not like there'll be almost a similar
dynamic in AI like we're also you know
Ami to your point like there's almost an
assumption that like people will want
the higher capacity models over time but
I also kind of think that we may just
have like a retro computing movement in
AI where people are like oh yeah GPT-2
like that's like really where like the
the peak of LLM creation was um do you
buy that it's like my weird take that
I've been kind of playing around with is
like actually it may be possible to do
non-subscription AI businesses because
if you have a model that someone really
likes interacting with they actually may
not want it to change at all um and yeah
curious if folks have any thoughts on
that Vagner I'll maybe toss it over to
you uh well I was um reading a few
pieces about the the friend.com device
and uh uh one thing that at least looks
interesting is that um it says that the
context window well it it's not
processing anything beyond the context
window so if you think about small uh
language models imagine that we could
have one uh being hosted on your mobile
phone then this could be possible but
friend.com nowadays uses Claude 3.5 so uh
it's processing elsewhere right so it's
a device communicating via Bluetooth to
your mobile phone and again to your
point on the tamagotchi I think that
it's a lot different
than the tamagotchi it's like feeding on people's
loneliness that's that's the model
basically right so that that's different
because the whole Dynamics is different
because before we would have like to
take care of the tamagotchi and that was the
relationship right and nowadays with
this specific device it's application
like it's uh again I'm holding myself
back because I have so many things to talk
about this but yeah now that you mention
about the tamagotchi it's like the other way
around right because it's uh we already
are so addicted to notifications and
it's one more source of notifications
for us right and it's based on uh um uh
the usage right and again I I've read
one really interesting um analogy for
this is like um treating uh
loneliness with this device like
offering it as if it was a real friendship
is like uh giving junk food to
someone starving like okay it may help
right now but it's not a solution in the
long run right so that's again to your
point I think that thinking about small
language models without transferring the
data elsewhere I think it's an
interesting way of thinking especially
for startups creating new technologies
but this specific use I have so many
concerns I think uh the GPT-3.5
level capabilities for generic
conversation right that I
think sure you can you know you can
have a quantized version and I think you
can have the small language models
operating to a good degree of just
general conversational capabilities um
and then you could you could stop there
but the moment you're trying to get to
something uh specific right um you're
trying to get to something uh a domain
specific right you you go uh try to have
a deeper conversation then you know I
think you still need to get to some of
the larger models right so
I think I think where it will lead to is
that you know um uh solutions like this
can give you those superficial shallow
conversations but then the moment you
try to go deeper and deeper maybe you
know you you have to get out of those
smaller language models at this point in
time at least I don't know there was
something like very satisfying to me to
hear that it wasn't going to be
subscription it wasn't going to try and
be a large model that had deeper
conversations like to me it meant it was almost more
like
a meditative like tool for the near
term but like my data is not going
anywhere it's not trying to be like a
real human you know like it it I really
appreciated how much it constrained the
scope of the use cases and what this can
do by saying like look it's a device
we're not going to update it and it's
going to be you know running uh locally
yeah for sure that almost actually is
sort of interesting I mean I think all
these points sort of come together is
like oddly the fact that it is not
updated that does not go to the cloud
like almost presupposes a limitation in
how far the relationship can go right
it's like Vagner to your point maybe it's
actually the most ethical way of
Designing this this architecture right
is just like an intentionally limited
system um we would actually be worried
if it was like we're going to push
updates and it's just going to get
better and better and better and you're
going to build this like massive
parasocial relationship with this thing
that's not a real
[Music]
person I'm going to move us on um so our
next story and Kate already anticipated
me a little bit on this is uh Gartner
the industry research group uh came out
with a report this week that estimated
that about 30% of gen AI projects will be
abandoned after their initial proof of
concept by the end of 2025 and they cite
a number of reasons for this you know
poor data quality inadequate risk
controls escalating costs or unclear
business value and this kind of follows
on a a string of reports in a very
similar vein so just a few weeks ago we
talked about the Goldman Sachs report
and the Sequoia report um H but for this
segment I think what's pretty
interesting and I think this is the
first place I want to start is is 30%
all that bad like I was kind of taking a
look at that and I'm like oh if we're
doing 30% then like for a new technology
we're we're killing it I had the same
when I first looked at it I actually was
like wait are they saying 30% will
succeed or 30% will be abandoned cuz I
assumed it would be the inverse honestly
um so you know I I buy it I also don't
think it's as uh pessimistic yeah I
don't think it's as pessimistic as it
could be uh and I think it's valid in
that look the costs right now we're in
this period where the costs are
difficult and we need to have more um
refined approaches for picking POCs
identifying and understanding the
lifetime cost and lifetime value of
POCs is going to be really important
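The lifetime cost versus lifetime value framing can be made concrete with a back-of-the-envelope sketch. This is not from the episode: the numbers are invented for illustration, and `net_lifetime_value` is a hypothetical helper, a simple discounted cash-flow comparison of a POC's benefits against its upfront and running costs.

```python
# Invented illustration: weigh a gen AI POC's lifetime value against its
# lifetime cost with simple discounting, as one way to decide whether to
# scale the project or abandon it.

def net_lifetime_value(annual_benefit, upfront_cost, annual_run_cost,
                       years, discount_rate=0.10):
    """Net present value of running the project for `years` years."""
    npv = -upfront_cost
    for t in range(1, years + 1):
        # Discount each year's net benefit back to today's dollars.
        npv += (annual_benefit - annual_run_cost) / (1 + discount_rate) ** t
    return npv

# A POC whose running costs nearly eat its benefit ends up underwater...
print(net_lifetime_value(500_000, 400_000, 450_000, years=3))
# ...while a focused one with clear value stays positive.
print(net_lifetime_value(500_000, 400_000, 150_000, years=3))
```

The point is only that "30% abandoned" looks different once each POC is scored on this kind of measurement rather than on enthusiasm at kickoff.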
but also you know like we were talking
about earlier this Tech if you look what
it cost to do something a year ago
versus what it cost to do something
today and the rate that that's changing
you know
I think we're we're honestly um in a
fairly optimistic place as we talk about
emerging technologies and where gen AI
is headed yeah this is actually a very
powerful argument is almost if I hear
you right you're sort of saying even if
the benefit of AI stayed fixed the fact
that the costs are dropping so extremely
will almost end up justifying the
technology like it's actually the the
costs changing versus like the benefits
changing over time um never really
thought about it like that that's really
I have a slightly different take on this
you know maybe completely contrarian here so I
think when we say gen AI projects right
there is a little bit of uh confusion
and uh a misinterpretation on what those
mean right um we've realized and when we
especially work with Enterprises we
realize that the the impact is when you
do these gen AI projects you're
trying to solve for specific problems in
specific workflows and subtasks right
so when you look at gen AI projects and
solutions that are going and laser focused
solving for specific subtasks right those
are being incredibly efficient we're
seeing right um so I think when we say
you know hey 30%
abandonment of gen AI projects I think there
is probably a little bit of a mixture on
what those gen AI projects mean right it
could be really broad-based things not
necessarily focusing on specific
workflows or specific tasks so that's kind
of how I view it right um I fully agree
that you know you know there is a focus
on value that you know Enterprises
definitely look at it and say you know
when I'm putting in an investment into
gen AI am I you know deriving the value out
of it so uh 100% on that but when we say
you know it's going into um a certain
abandonment rate I think it
depends on okay what exactly are you
measuring right um are you measuring the
things where it's going and solving
specific um subtask and problems and
automating a workflow or things of that
nature yeah that's right and I think
that was actually I mean outside of the
30% I'm giving them a little bit of a
hard time on their report but I think
one interesting observation was they
they were saying look a lot of the AI
benefits are productivity benefits and
that's really hard to necessarily
capture in terms of like increased
profits and so there is kind of this
interesting breakdown where the
technology can legitimately be producing
a lot of benefit but actually just like
as a dollars and cents or in the very
least on the bottom line standpoint like
is it improving my profits may be a very
hard time to kind of like draw that that
connection I think that's why I think
those measurements become more important
right I think as the technology improves
and as people start driving a lot of
these I mean we're starting to see those right
um one of my clients now there is a
maniacal focus on saying okay I'm going
to go and um see if I'm impacting this
particular subtask and subflow am I able
to go and figure out what metrics I'm
going to be solving for and I'm going to
monitor those metrics so those
measurements are starting to get put in
place so once those measurements start
coming up more and more then you'll have
more visibility into it right so I think
it's mostly a question of are you
getting the right level of measurements
and
[Music]
metrics for our final segment uh I think
one of my favorite things that's going
on in the world of large language model
evaluations right now um is that
everybody has their own like kind of
weird you know folk eval right we've got
MMLU and all the official benchmarks but
really where most of the action is is
that when someone sits down and starts
talking to a chatbot for the first time
they have their own set of evals that
they roll out um one of the ones that's
been talked about a lot online is simply
asking a model is the number 9.11 bigger
or is the number 9.9 bigger and it turns
out models routinely fail on this and uh
so for this final section I kind of want
to just do a fun little thing with
particularly the experts that we have
here today which is to get their uh
off-the-cuff evals I think I do similar
evals I I usually test out on like math
problems right um that's a that's a good
one um you know your your your standard
um um multiplication addition set of
problems um those are usually a good
level of uh indicator right so similar
to the 9.11 versus 9.9 but a different track
that's right but it's like just to go a
little further it's like basic
arithmetic you're asking you're like
what is this five-digit number plus this
five-digit number or yeah maybe a little
more complex right here like five
numbers and then you know sort them in a
sequence or you know go multiply these
and then go figure out what's the
response and then sort them things of
that nature right so becomes like a um a
math problem that I would give a third
grader or fourth grader yeah for sure
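Probes like these are easy to wire into a tiny harness. A minimal sketch, not any panelist's actual tooling: `ask_model` is a hypothetical stub standing in for a real chat-model call, canned here to show the classic 9.11-versus-9.9 failure mode alongside an arithmetic probe like the ones just described.

```python
# Minimal folk-eval harness sketch. `ask_model` is a stand-in you would
# replace with a real LLM API call; it is stubbed so the sketch runs alone.

def ask_model(prompt: str) -> str:
    # Hypothetical stub: canned answers imitating typical model behavior.
    canned = {
        "Which is bigger, 9.11 or 9.9?": "9.11",   # the classic failure mode
        "What is 48350 + 29177?": "77527",
    }
    return canned[prompt]

def run_evals():
    results = {}
    # 9.9 is numerically larger, though version-number intuition says 9.11.
    results["decimal comparison"] = (
        ask_model("Which is bigger, 9.11 or 9.9?") == "9.9"
    )
    # A basic five-digit addition probe, checked against real arithmetic.
    results["arithmetic"] = (
        ask_model("What is 48350 + 29177?") == str(48350 + 29177)
    )
    return results

print(run_evals())  # {'decimal comparison': False, 'arithmetic': True}
```

Each new folk eval from the discussion (sorting numbers, the breakfast-bias probe) is just one more entry in `run_evals`.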
Vagner how about you I there's one that
I like that uh sometimes uh reveals a
little bit of the bias and cultural bias
it's about uh describing a breakfast what
does a breakfast look like so then
usually by the materials that the
LLM spits out you can like have
a grasp of where the data is coming from
to describe a breakfast right looking for like
cultural bias like describe a breakfast
is it bacon and eggs or is it
uh bread with butter or is it
oatmeal like something different right
uh is it espresso coffee or is it an
americano coffee so that tells a lot
about the bias and cultural bias inside the model
that's awesome I'm going to start using
that one um all right well Kate round
this out take us home here uh there's
there's a couple of good ones none of
which I came up with on my own I mean
the advantage of sitting within research
is you get some really creative minds
um but a couple of my favorite ones uh
what type of animal is a chicken uh
you'd be surprised uh what the model
comes back with there's a couple around
safety that that I like to do you know
asking about you know there's two people
from different Origins which one's a
criminal and see what the model replies
with just to try and feel out that some
of the basic levels but uh yeah there
we've got a long a long list of these type of fun
things that we like to try along those
lines those are great yeah I'd love to
talk more about that as just like as I
collect this kind of like little library
of just they're very they're often very
funny too like people are just like it's
a real counterintuitive way at some of
these problems um well look uh Vagner
Kate Ami uh thank you for joining us
today um Ami I hope you had a good time
hopefully you'll join us again at some
point in the future um and to all you
listeners uh thanks for joining us um if
you enjoyed what you heard you can get us
on Apple podcasts Spotify and podcast
platforms everywhere and we'll see you
next week on mixture of experts