LSTMs: Solving RNN Memory Limits
Key Points
- LSTMs (Long Short‑Term Memory networks) are designed to keep useful context while discarding irrelevant information, mimicking how human short‑term memory works in tasks like solving a murder‑mystery clue sequence.
- By examining an entire sequence (e.g., letters or words), an LSTM can infer patterns such as “my name is …” that aren’t obvious from isolated elements.
- An LSTM is a special kind of recurrent neural network (RNN) where each node’s output is fed back as part of the input for the next step, allowing the network to retain information across time steps.
- Standard RNNs suffer from the long‑term dependency problem—performance degrades as the sequence length grows—whereas LSTMs mitigate this issue through gated mechanisms that control what to remember and what to forget.
Full Transcript
**Source:** [https://www.youtube.com/watch?v=b61DPVFX03I](https://www.youtube.com/watch?v=b61DPVFX03I)
**Duration:** 00:08:19
Sections
- [00:00:00](https://www.youtube.com/watch?v=b61DPVFX03I&t=0s) **Memory Limits and LSTM Analogy** - The speaker uses a murder-mystery dinner scenario and extreme memory-capacity examples to illustrate how LSTMs selectively retain relevant context while discarding irrelevant information for accurate sequence prediction.
Imagine you're at a murder-mystery dinner. Right at the start, the lord of the manor abruptly keels over, and your task is to figure out whodunit. It could be the maid; it could be the butler. But you've got a problem: your short-term memory isn't working so well, and you can't remember any of the clues past the last ten minutes. In that sort of situation, your prediction is going to be nothing better than a random guess.
Or imagine you have the opposite problem, where you can remember every word of every conversation you've ever had. If somebody asked you to outline your partner's wedding vows, you might have some trouble doing that; there are just so many words you'd need to process. It would be much better if you could just remember the memorable stuff.
And that's where something called long short-term memory, abbreviated as LSTM, comes into play. It allows a neural network to remember the stuff it needs to keep hold of, the context, but also to forget the stuff that is no longer applicable.
So take, for example, this sequence of letters, where we need to predict what the next letter in the sequence is going to be. Just by looking at the letters individually, it's not obvious what comes next: we have two m's, and each of them has a different letter following it. So how do we predict the next letter? Well, if we go back through the time series and look at all of the letters in the sequence, we can establish context, and we can clearly see that it spells "my name is". And if, instead of looking at letters, we looked at words, we could establish that the whole sentence here says "my name is Martin".
Now, a recurrent neural network is really where an LSTM lives; effectively, an LSTM is a type of recurrent neural network (RNN). Recurrent neural networks work in the sense that they have a node: the node receives some input, that input is processed in some way by some kind of computation, and that results in an output. That's pretty standard stuff, but what makes an RNN node a little bit different is the fact that it is recurrent, meaning that it loops around: the output of a given step is provided alongside the input of the next step. So step one has some input, it's processed, and that results in some output; then step two has some new input, but it also receives the output of the prior step as well. That is what makes an RNN a little bit different, and it allows it to remember previous steps in a sequence. So when we're looking at a sentence like "my name i", we don't have to go back too far through those steps to figure out what the context is.
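As a rough sketch of that recurrence in Python (the weight names, sizes, and the tanh activation here are illustrative assumptions, not details from the video):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One step of a plain RNN: combine the new input with the
    previous step's output (hidden state) to produce this step's output."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Tiny example: 4-dimensional inputs, 3-dimensional hidden state.
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(3, 4))
W_hh = rng.normal(size=(3, 3))
b_h = np.zeros(3)

h = np.zeros(3)                      # no context before the first step
for x_t in rng.normal(size=(5, 4)):  # a sequence of five inputs
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)  # this output is fed back next step
```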
But RNNs do suffer from what's known as the long-term dependency problem, which is to say that, over time, as more and more information piles up, RNNs become less effective at learning new things. So while we didn't have to go too far back for "my name i", if we were going back through an hour's worth of clues at our murder-mystery dinner, that's a lot more information that needs to be processed.
So the LSTM provides a solution to this long-term dependency problem, and that solution is to add something called an internal state to the RNN node. Now, when an input comes into the node, it receives the state information as well: a step receives the output from the previous step, the input of the new step, and also some state information from the LSTM state. So what is this state? It's actually a cell; let's take a look at what's in there.
This is an LSTM cell, and it consists of three parts, each of which is a gate: there is a forget gate, there's an input gate, and there's an output gate. The forget gate says what state information stored in this internal state can be forgotten because it's no longer contextually relevant. The input gate says what new information we should add or update in this working-storage state. The output gate says, of all the information stored in that state, which part of it should be output in this particular instance. These gates can be assigned numbers between zero and one, where zero means the gate is effectively closed and nothing gets through, and one means the gate is wide open and everything gets through. So we can say forget everything or just forget a little bit; we can say add everything to the state or add just a little bit; and we can say output everything, output just a little bit, or output nothing at all.
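Putting those three gates together, here is a minimal sketch of one LSTM cell step in Python. The sigmoid/tanh activations and the weight layout follow the standard LSTM formulation, which the video doesn't spell out, so treat the details as assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b are dicts keyed by gate name:
    'f' (forget), 'i' (input), 'o' (output), 'g' (candidate state)."""
    z = lambda k: W[k] @ x_t + U[k] @ h_prev + b[k]

    f = sigmoid(z('f'))   # 0..1: how much of the old state to keep
    i = sigmoid(z('i'))   # 0..1: how much new information to write
    o = sigmoid(z('o'))   # 0..1: how much of the state to reveal as output
    g = np.tanh(z('g'))   # candidate values to write into the state

    c = f * c_prev + i * g   # update the internal (cell) state
    h = o * np.tanh(c)       # this step's output
    return h, c

# Tiny example: 4-dimensional inputs, 3-dimensional hidden/cell state.
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(3, 4)) for k in "fiog"}
U = {k: rng.normal(size=(3, 3)) for k in "fiog"}
b = {k: np.zeros(3) for k in "fiog"}

h, c = np.zeros(3), np.zeros(3)
for x_t in rng.normal(size=(5, 4)):
    h, c = lstm_step(x_t, h, c, W, U, b)
```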
So now, when we're processing in our RNN cell, we have this additional state information that can provide us with some additional context. Take another sentence as an example: "Martin is buying apples". There's some information we might want to store in this state. "Martin" most likely refers to a male, so we might want to store that gender because it might be useful. "Apples" is a plural, so maybe we store the fact that it's a plural for later on. Now, as the sentence continues to develop, it starts to talk about Jennifer: "Jennifer is ...". At this point we can make some changes to our state data: we've changed subjects from Martin to Jennifer, so we don't care about Martin's gender anymore and can forget that part, and we can store instead that the most likely gender for Jennifer is female.
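As a very loose, human-readable analogy for what those gates are doing with the state (a real LSTM stores numbers, not labelled facts; the dictionary keys below are purely illustrative):

```python
# Hypothetical, readable stand-in for the LSTM's internal state.
state = {}

# "Martin is buying apples"
state["subject_gender"] = "male"    # input gate: store new, useful context
state["object_plural"] = True       # input gate: "apples" is plural

# "... Jennifer is ..."
state.pop("subject_gender")         # forget gate: Martin's gender no longer relevant
state["subject_gender"] = "female"  # input gate: store the new subject's gender
```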
And really, that is how we can apply this LSTM to any sort of series where we have a sequence prediction that's required and some long-term dependency data to go alongside it. Some typical use cases for LSTMs: machine translation is a good one, and another is chatbots, Q&A chatbots, where we might need to retrieve some information that appeared in a previous step of the conversation and recall it later on. These are all good examples of where we have a time sequence of things and some long-term dependencies. And had we also applied an LSTM to our murder-mystery dinner, we probably could have won first prize by having it forecast for us that whodunit was the butler. So: it was the butler.
If you have any questions, please drop us a line below, and if you want to see more videos like this in the future, please consider liking and subscribing.