# Avoiding Common Forecasting Model Pitfalls

## Key Points
- The video outlines three common forecasting pitfalls, focusing first on **under‑fitting**, where an overly simple model fails to capture the true relationship between inputs and outputs, resulting in high bias and low variance.
- To remedy under‑fitting, the presenter suggests **reducing regularization**, **adding more training data**, and **enhancing feature selection** to introduce stronger, more relevant predictors.
- The second pitfall discussed is **over‑fitting**, which occurs when a model is too tightly tuned to the training data, mistaking noise for signal and leading to low error on training data but poor performance on unseen data.
- Over‑fitting often arises from **over‑correcting under‑fitting** and is characterized by very low bias but high variance, making it harder to detect than under‑fitting.
- The third pitfall is **bad data** (incorrect, irrelevant, incomplete, or outdated training data), which raises error rates and biases decision-making even when the underlying model is sound.
- The overall message emphasizes balancing model complexity and regularization, using sufficient data and appropriate features, to achieve reliable, generalizable forecasts.
**Source:** [https://www.youtube.com/watch?v=0RT2Q0qwXSA](https://www.youtube.com/watch?v=0RT2Q0qwXSA)
**Duration:** 00:06:48

## Sections

- [00:00:00](https://www.youtube.com/watch?v=0RT2Q0qwXSA&t=0s) **Avoiding Underfitting in Forecasting** - The speaker defines underfitting as a bias-heavy, low-variance issue caused by overly simple models, shows how to spot it in training results, and recommends increasing model complexity or reducing regularization to improve predictive performance.

## Full Transcript
If your forecasting model looked like this, but in reality the numbers came out looking more like this, you might be asking yourself what went wrong. We're going to look at three common data model forecasting pitfalls to understand what they are, why they happen, and how to avoid them. First up, number one is called underfitting.
Now, this is a scenario in data science where a data model is unable to capture the relationship between the input and the output variables accurately. Underfitting usually occurs when a model is too simple: it just cannot establish the dominant trend within the data. And if a model can't generalize well to new data, it's not going to do a very good job with prediction tasks; you'll get bad forecasting models. An optimal data model on this training data set might look roughly like that. An underfit model, well, that will look more like this: there's high bias here and low variance. See how straight that line is? That is a good indicator that you have underfitting. Fortunately, underfitting is usually quite easy to spot; it shows up even when modeling the training data set.
To fix it, we need to better establish the dominant relationship between the input and the output variables at the onset, to build a better fitting, and likely a little more complex, model. There are three ways you can do that. One of those is to decrease something called regularization. This really means that we're going to let the model be a little more free in how it draws its relationships between inputs and outputs. There are a number of methods, such as L1 (lasso) and L2 (ridge) regularization, which penalize overly large coefficients and help to reduce the influence of noise and outliers on a model; turning that penalty down gives the model more freedom to fit.
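As a rough sketch of what decreasing regularization can do, here is a small scikit-learn example (the library, the synthetic data, and the alpha values are illustrative assumptions, not from the video): a heavily regularized ridge model underfits a curved trend, and lowering the penalty lets it follow the data.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=200)  # curved trend plus noise

errors = {}
for alpha in (1000.0, 1.0):
    # Degree-5 polynomial features give the model room to curve;
    # alpha sets how strongly coefficients are shrunk toward zero.
    model = make_pipeline(PolynomialFeatures(degree=5), Ridge(alpha=alpha))
    model.fit(X, y)
    errors[alpha] = mean_squared_error(y, model.predict(X))
    print(f"alpha={alpha}: training MSE = {errors[alpha]:.4f}")
```

With the heavy penalty the fitted line stays nearly flat (the "straight line" symptom above); relaxing it brings the training error down.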
The second thing you could do is to increase your training data. Stopping training too soon is a common cause of underfitting, and more data can lead to a better fitting model.
Finally, the other thing you might want to consider is called feature selection. This is used in any model where we want to take specific features to determine a given outcome. If there are not enough predictive features present, then more features, or features of greater importance, should be introduced. That's what feature selection is.
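A minimal sketch of feature selection, assuming scikit-learn's SelectKBest (a tooling choice not named in the video): it scores each candidate feature against the target and keeps only the most predictive ones.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

# 20 candidate features, but only 5 actually drive the target.
X, y = make_regression(n_samples=300, n_features=20, n_informative=5,
                       noise=1.0, random_state=0)

# Keep the 5 features with the strongest univariate relationship to y.
selector = SelectKBest(score_func=f_regression, k=5)
X_selected = selector.fit_transform(X, y)
print("kept feature indices:", selector.get_support(indices=True))
```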
Okay, number two. We've had underfitting; number two is overfitting. The thing with overfitting is that it occurs when a statistical model fits exactly against its training data, and when this happens the algorithm can't perform accurately against unseen data, which rather defeats its purpose. Oh, and here's the rub with overfitting: it can be caused by addressing underfitting a little too aggressively. An overfit model might look a bit more like this: it has a low error rate and a very high variance. This is anything but straight.
Here the model is just so well tuned to the training data that it has mistaken the noise, the irrelevant information from the training data set, for the signal. Unlike underfitting, overfitting isn't always so easy to detect initially, and to find it we need to test. We can test for model fitness using techniques like k-fold cross-validation, which splits the training data into equally sized subsets called folds and results in an evaluation score for your model.
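A short sketch of k-fold cross-validation with scikit-learn (an assumed tooling choice): an unconstrained decision tree scores perfectly on its own training data, while the held-out folds reveal how much of that was memorization.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# An unconstrained tree can memorize its training data outright.
model = DecisionTreeRegressor(random_state=0)
train_r2 = model.fit(X, y).score(X, y)

# Five folds: each scores the model on data it never trained on.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(model, X, y, cv=cv)
print(f"training R^2: {train_r2:.3f}")
print(f"mean held-out R^2: {cv_scores.mean():.3f}")
```

The gap between the training score and the cross-validation score is the overfitting signal the video describes.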
Now, to prevent overfitting you might want to consider some of the following techniques. One is data augmentation. While it is generally better to inject clean, relevant information into your training data set, sometimes a little noisy data is added to make the model a little more stable.
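A minimal sketch of noise-based data augmentation in NumPy (the jitter scale and number of copies are illustrative assumptions, not from the video):

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 3))

def augment_with_noise(X, copies=2, scale=0.05):
    """Stack the original rows with jittered copies of themselves."""
    noisy = [X + rng.normal(0.0, scale, size=X.shape) for _ in range(copies)]
    return np.vstack([X] + noisy)

X_aug = augment_with_noise(X_train)
print(X_aug.shape)
```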
Another method is ensemble methods. Ensemble methods are made up of a set of classifiers whose predictions are aggregated to identify the most popular result. Bagging is one such method, where multiple models are trained in parallel on different subsets of the data.
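A small sketch of bagging with scikit-learn's BaggingRegressor (an assumed tooling choice): fifty trees, each fit on a bootstrap sample and averaged at prediction time, typically smooth out the variance of a single deep tree.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

# One deep tree versus 50 trees, each trained on a bootstrap
# sample of the data, with their predictions averaged.
single = DecisionTreeRegressor(random_state=0)
bagged = BaggingRegressor(DecisionTreeRegressor(random_state=0),
                          n_estimators=50, random_state=0)

single_cv = cross_val_score(single, X, y, cv=5).mean()
bagged_cv = cross_val_score(bagged, X, y, cv=5).mean()
print(f"single tree held-out R^2: {single_cv:.3f}")
print(f"bagged trees held-out R^2: {bagged_cv:.3f}")
```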
Then, option number three, something called early stopping, can be used. This method seeks to pause training before the model starts learning the noise within the training data. Of course, you do need to be careful not to stop too soon, or you'll be dealing with a bad case of underfitting.
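A brief sketch of early stopping, assuming scikit-learn's SGDRegressor (the estimator and the thresholds are illustrative choices, not from the video): training halts once a held-out validation slice stops improving.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)

# early_stopping=True holds out 20% of the data and stops training
# once the validation score fails to improve for 5 consecutive epochs.
model = make_pipeline(
    StandardScaler(),
    SGDRegressor(early_stopping=True, validation_fraction=0.2,
                 n_iter_no_change=5, max_iter=1000, random_state=0),
)
model.fit(X, y)
sgd = model.named_steps["sgdregressor"]
print(f"stopped after {sgd.n_iter_} of at most {sgd.max_iter} epochs")
```

Setting `n_iter_no_change` too low is exactly the "stopping too soon" risk mentioned above.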
Okay, so finally, in addition to underfitting and overfitting, another common problem is number three: bad data. This is data that is incorrect, irrelevant, or incomplete, and bad training data can lead to higher error rates and biased decision-making even when the underlying model is sound. Data forecasting models are only as effective as the data they're trained on.
Now, some tips to avoid bad data. Firstly, you want to ensure that your data is accurate and complete by performing cross-checking, and by that I mean cross-checking it against other data sources. Another thing to do: if we find outliers, get rid of them. Some outliers can really skew results; they can take an obscure event, put it into a training data set, and make that situation seem more likely to occur again than it really is. And then, finally, you need to make sure that this data is timely. Outdated data is bad data.
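A minimal sketch of one simple outlier-removal rule in NumPy (the 3-standard-deviation threshold is an illustrative assumption; real pipelines may need more robust statistics):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=100.0, scale=10.0, size=1000)
data[:5] = [500.0, -300.0, 450.0, 600.0, -250.0]  # inject obvious outliers

# Keep only points within 3 standard deviations of the mean.
z = np.abs((data - data.mean()) / data.std())
cleaned = data[z < 3]
print(f"removed {data.size - cleaned.size} points")
```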
Now, keep these three things in mind and you should be well on your way to developing better fitting models. If you have any questions, please drop us a line below, and if you want to see more videos like this in the future, please like and subscribe. Thanks for watching.