
Avoiding Common Forecasting Model Pitfalls

Key Points

  • The video outlines three common forecasting pitfalls, focusing first on **under‑fitting**, where an overly simple model fails to capture the true relationship between inputs and outputs, resulting in high bias and low variance.
  • To remedy under‑fitting, the presenter suggests **reducing regularization**, **adding more training data**, and **enhancing feature selection** to introduce stronger, more relevant predictors.
  • The second pitfall discussed is **over‑fitting**, which occurs when a model is too tightly tuned to the training data, mistaking noise for signal and leading to low error on training data but poor performance on unseen data.
  • Over‑fitting often arises from **over‑correcting under‑fitting** and is characterized by very low bias but high variance, making it harder to detect than under‑fitting.
  • The third pitfall is **bad data**: training data that is incorrect, irrelevant, incomplete, or outdated can raise error rates and bias decisions even when the underlying model is sound, so the presenter advises cross-checking against other sources, removing outliers, and keeping data current.
  • The overall message emphasizes balancing model complexity and regularization, using sufficient data and appropriate features, to achieve reliable, generalizable forecasts.

Full Transcript

**Source:** [https://www.youtube.com/watch?v=0RT2Q0qwXSA](https://www.youtube.com/watch?v=0RT2Q0qwXSA)
**Duration:** 00:06:48

Sections:

  • [00:00:00](https://www.youtube.com/watch?v=0RT2Q0qwXSA&t=0s) **Avoiding Underfitting in Forecasting** - The speaker defines underfitting as a bias-heavy, low-variance issue caused by overly simple models, shows how to spot it in training results, and recommends increasing model complexity or reducing regularization to improve predictive performance.
If your forecasting model looked like this, but in reality the numbers came out looking more like this, you might be asking yourself what went wrong. We're going to look at three common data model forecasting pitfalls to understand what they are, why they happen, and how to avoid them.

First up, number one: **underfitting**. This is a scenario in data science where a model is unable to capture the relationship between the input and output variables accurately. Underfitting usually occurs when a model is too simple; it just cannot establish the dominant trend within the data. And if a model can't generalize well to new data, it's not going to do a very good job on prediction tasks, and you'll get bad forecasting models. On a given training data set, an optimal model might look like a curve that tracks the data, while an underfit model looks more like a straight line: high bias and low variance. That straightness is a good indicator that you have underfitting.

Fortunately, underfitting is usually quite easy to spot, because it shows up even when modeling the training data set. To fix it, we need to better establish the dominant relationship between the input and output variables at the onset, building a better-fitting, and likely somewhat more complex, model. There are three ways to do that.

The first is to decrease **regularization**. This means letting the model be a little more free in how it draws relationships between inputs and outputs. Regularization techniques such as L1 (lasso) and L2 (ridge) constrain a model's coefficients to damp the influence of noise and outliers, so easing them off gives an underfit model more room to follow the real trend.
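The video stays at the whiteboard here, but the effect of the regularization penalty is easy to see in code. The following is a minimal sketch (not from the video; the data and lambda values are invented) using a closed-form ridge regression in NumPy: a heavy penalty shrinks the slope toward zero and underfits, while a light penalty recovers the trend.

```python
import numpy as np

# Toy data: a clear linear trend (slope 3) with a little noise.
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 50)
y = 3.0 * X + 1.0 + rng.normal(scale=0.5, size=X.size)

# Design matrix with a bias column.
A = np.column_stack([np.ones_like(X), X])

def ridge_fit(A, y, lam):
    """Closed-form ridge regression: w = (A^T A + lam*I)^-1 A^T y."""
    n_features = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n_features), A.T @ y)

def train_mse(w):
    return float(np.mean((A @ w - y) ** 2))

# Heavy regularization shrinks the slope toward zero -> underfitting.
w_heavy = ridge_fit(A, y, lam=10000.0)
# Light regularization lets the model follow the dominant trend.
w_light = ridge_fit(A, y, lam=0.01)

print(f"heavy lambda: slope={w_heavy[1]:.2f}, train MSE={train_mse(w_heavy):.2f}")
print(f"light lambda: slope={w_light[1]:.2f}, train MSE={train_mse(w_light):.2f}")
```

In practice the penalty strength (often called `alpha` or `lambda` in libraries) is tuned on held-out data rather than picked by hand.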
The second thing you can do is to increase your **training data**. Stopping training too soon is a common cause of underfitting, and more data can lead to a better-fitting model. Finally, the third thing you might want to consider is called **feature selection**. This is used in any model where we want specific features to determine a given outcome: if there are not enough predictive features present, then more features, or features of greater importance, should be introduced. That's what feature selection is.

Okay, we've had underfitting; number two is **overfitting**. The thing with overfitting is that it occurs when a statistical model fits exactly against its training data, and when this happens the algorithm can't perform accurately against unseen data, which rather defeats its purpose. And here's the rub with overfitting: it can be caused by addressing underfitting a little too aggressively. An overfit model might look more like a jagged curve passing through every point: it has a low error rate on the training data and a very high variance, and the fitted line is anything but straight. Here the model is so well tuned to the training data that it has mistaken the noise, the irrelevant information in the training data set, for the signal.

Unlike underfitting, overfitting isn't always so easy to detect initially, so to find it we need to test. We can test for model fitness using techniques like k-fold cross-validation, which splits the training data into equally sized subsets called folds and produces an evaluation score for your model.

Now, to prevent overfitting, you might want to consider some of the following techniques. One is **data augmentation**. While it is generally better to inject clean, relevant information into your training data set, sometimes a little noisy data is added to make the model a bit more stable.
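The k-fold cross-validation mentioned above can be sketched in a few lines. This is an illustrative example rather than code from the video; the sine-wave data and the two polynomial models are invented, but they show how an over-flexible model tends to earn a worse held-out score than a simpler one, even though it fits its own training folds tightly.

```python
import numpy as np

# One period of a sine wave with noise: a non-linear but learnable trend.
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

def kfold_mse(x, y, degree, k=5):
    """Average held-out MSE of a polynomial fit over k folds."""
    idx = rng.permutation(x.size)
    folds = np.array_split(idx, k)        # k equally sized subsets ("folds")
    scores = []
    for i in range(k):
        val = folds[i]                    # hold out fold i for evaluation
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coeffs = np.polyfit(x[train], y[train], degree)
        preds = np.polyval(coeffs, x[val])
        scores.append(np.mean((preds - y[val]) ** 2))
    return float(np.mean(scores))

cv_simple = kfold_mse(x, y, degree=3)    # flexible enough for one sine period
cv_overfit = kfold_mse(x, y, degree=15)  # chases noise in each training fold
print(f"5-fold CV MSE, degree 3:  {cv_simple:.3f}")
print(f"5-fold CV MSE, degree 15: {cv_overfit:.3f}")
```

The held-out score is what exposes the overfit: on its own training folds the degree-15 model looks excellent, which is exactly why testing on unseen folds is needed.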
Another method is **ensemble methods**. Ensemble methods are made up of a set of classifiers whose predictions are aggregated to identify the most popular result. Bagging is one such method, where multiple models are trained in parallel on different subsets of the data. And then there's option number three: something called **early stopping** can be used. This method seeks to pause training before the model starts learning the noise within the training data. Of course, you do need to be careful not to stop too soon, or you'll be dealing with a bad case of underfitting.

Okay, so finally, in addition to underfitting and overfitting, another common problem is number three: **bad data**. This is data that is incorrect, irrelevant, or incomplete, and bad training data can lead to higher error rates and biased decision-making even when the underlying model is sound. Forecasting models are only as effective as the data they're trained on. Here are some tips to avoid bad data. First, ensure that your data is accurate and complete by performing **cross-checking**, and by that I mean checking it against other data sources. Another thing to do: if there are **outliers**, get rid of them. Some outliers can really skew results; they can take an obscure event, put it into a training data set, and make that situation seem more likely to occur again than it really is. And finally, you need to make sure that the data is **timely**. Outdated data is bad data.

Keep these three things in mind and you should be well on your way to developing better-fitting models. If you have any questions, please drop us a line below, and if you want to see more videos like this in the future, please like and subscribe. Thanks for watching.
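As a supplement to the early-stopping technique described in the overfitting section, here is a rough sketch of the idea in code. The data, the `patience` parameter, and the gradient-descent setup are all invented for illustration: training tracks a validation set, remembers the weights from the best validation step, and halts once validation error has gone `patience` steps without improving.

```python
import numpy as np

rng = np.random.default_rng(2)

# Noisy quadratic trend, split into training and validation sets.
x = rng.uniform(-1, 1, 60)
y = x**2 + rng.normal(scale=0.1, size=x.size)
x_tr, y_tr = x[:40], y[:40]
x_va, y_va = x[40:], y[40:]

def features(x, degree=9):
    # Over-parameterized polynomial features, so the model *can* learn noise.
    return np.vander(x, degree + 1, increasing=True)

A_tr, A_va = features(x_tr), features(x_va)

def mse(A, w, y):
    return float(np.mean((A @ w - y) ** 2))

# Plain gradient descent with patience-based early stopping: keep the
# weights from the step with the lowest validation error, and stop once
# validation error has not improved for `patience` consecutive steps.
w = np.zeros(A_tr.shape[1])
best_w, best_val, since_best = w.copy(), np.inf, 0
patience, lr = 200, 0.05
for step in range(20000):
    grad = 2 * A_tr.T @ (A_tr @ w - y_tr) / len(y_tr)
    w -= lr * grad
    val = mse(A_va, w, y_va)
    if val < best_val:
        best_w, best_val, since_best = w.copy(), val, 0
    else:
        since_best += 1
        if since_best >= patience:
            break  # validation error stopped improving: stop early

print(f"stopped at step {step}, best validation MSE {best_val:.4f}")
```

The returned `best_w`, not the final weights, is what you would deploy; and as the video warns, too small a `patience` value stops training prematurely and pushes the model back toward underfitting.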