
Avoiding Common Forecasting Model Pitfalls

Key Points

  • The video outlines three common forecasting pitfalls, focusing first on **under‑fitting**, where an overly simple model fails to capture the true relationship between inputs and outputs, resulting in high bias and low variance.
  • To remedy under‑fitting, the presenter suggests **reducing regularization**, **adding more training data**, and **enhancing feature selection** to introduce stronger, more relevant predictors.
  • The second pitfall discussed is **over‑fitting**, which occurs when a model is too tightly tuned to the training data, mistaking noise for signal and leading to low error on training data but poor performance on unseen data.
  • Over‑fitting often arises from **over‑correcting under‑fitting** and is characterized by very low bias but high variance, making it harder to detect than under‑fitting.
  • The third pitfall is **bad data**: training data that is incorrect, irrelevant, incomplete, or outdated can raise error rates and bias decisions even when the underlying model is sound, so the presenter advises cross-checking against other sources, removing outliers, and keeping data current.
  • The overall message emphasizes balancing model complexity and regularization, using sufficient data and appropriate features, to achieve reliable, generalizable forecasts.

Full Transcript

**Source:** [https://www.youtube.com/watch?v=0RT2Q0qwXSA](https://www.youtube.com/watch?v=0RT2Q0qwXSA)
**Duration:** 00:06:48

Sections:

  • [00:00:00](https://www.youtube.com/watch?v=0RT2Q0qwXSA&t=0s) **Avoiding Underfitting in Forecasting** - The speaker defines underfitting as a bias-heavy, low-variance issue caused by overly simple models, shows how to spot it in training results, and recommends increasing model complexity or reducing regularization to improve predictive performance.
If your forecasting model looked like this, but in reality the numbers came out looking more like this, you might be asking yourself what went wrong. We're going to look at three common data model forecasting pitfalls to understand what they are, why they happen, and how to avoid them.

First up, number one: **underfitting**. This is a scenario in data science where a model is unable to capture the relationship between the input and output variables accurately. Underfitting usually occurs when a model is too simple; it just cannot establish the dominant trend within the data. And if a model can't generalize well to new data, it's not going to do a very good job on prediction tasks, and you'll get bad forecasting models. On a given training data set, an optimal model might look like a curve that tracks the data, while an underfit model looks more like a straight line: high bias and low variance. That straightness is a good indicator that you have underfitting.

Fortunately, underfitting is usually quite easy to spot, because it shows up even when modeling the training data set. To fix it, we need to better establish the dominant relationship between the input and output variables at the onset, building a better-fitting, and likely somewhat more complex, model. There are three ways to do that.

The first is to decrease **regularization**. This means letting the model be a little more free in how it draws relationships between inputs and outputs. Regularization techniques such as L1 (lasso) and L2 (ridge) constrain a model's coefficients to damp the influence of noise and outliers, so easing them off gives an underfit model more room to follow the real trend.
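The video stays at the whiteboard here, but the effect of the regularization penalty is easy to see in code. The following is a minimal sketch (not from the video; the data and lambda values are invented) using a closed-form ridge regression in NumPy: a heavy penalty shrinks the slope toward zero and underfits, while a light penalty recovers the trend.

```python
import numpy as np

# Toy data: a clear linear trend (slope 3) with a little noise.
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 50)
y = 3.0 * X + 1.0 + rng.normal(scale=0.5, size=X.size)

# Design matrix with a bias column.
A = np.column_stack([np.ones_like(X), X])

def ridge_fit(A, y, lam):
    """Closed-form ridge regression: w = (A^T A + lam*I)^-1 A^T y."""
    n_features = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n_features), A.T @ y)

def train_mse(w):
    return float(np.mean((A @ w - y) ** 2))

# Heavy regularization shrinks the slope toward zero -> underfitting.
w_heavy = ridge_fit(A, y, lam=10000.0)
# Light regularization lets the model follow the dominant trend.
w_light = ridge_fit(A, y, lam=0.01)

print(f"heavy lambda: slope={w_heavy[1]:.2f}, train MSE={train_mse(w_heavy):.2f}")
print(f"light lambda: slope={w_light[1]:.2f}, train MSE={train_mse(w_light):.2f}")
```

In practice the penalty strength (often called `alpha` or `lambda` in libraries) is tuned on held-out data rather than picked by hand.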
The second thing you can do is to increase your **training data**. Stopping training too soon is a common cause of underfitting, and more data can lead to a better-fitting model. Finally, the third thing you might want to consider is called **feature selection**. This is used in any model where we want specific features to determine a given outcome: if there are not enough predictive features present, then more features, or features of greater importance, should be introduced. That's what feature selection is.

Okay, we've had underfitting; number two is **overfitting**. The thing with overfitting is that it occurs when a statistical model fits exactly against its training data, and when this happens the algorithm can't perform accurately against unseen data, which rather defeats its purpose. And here's the rub with overfitting: it can be caused by addressing underfitting a little too aggressively. An overfit model might look more like a jagged curve passing through every point: it has a low error rate on the training data and a very high variance, and the fitted line is anything but straight. Here the model is so well tuned to the training data that it has mistaken the noise, the irrelevant information in the training data set, for the signal.

Unlike underfitting, overfitting isn't always so easy to detect initially, so to find it we need to test. We can test for model fitness using techniques like k-fold cross-validation, which splits the training data into equally sized subsets called folds and produces an evaluation score for your model.

Now, to prevent overfitting, you might want to consider some of the following techniques. One is **data augmentation**. While it is generally better to inject clean, relevant information into your training data set, sometimes a little noisy data is added to make the model a bit more stable.
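The k-fold cross-validation mentioned above can be sketched in a few lines. This is an illustrative example rather than code from the video; the sine-wave data and the two polynomial models are invented, but they show how an over-flexible model tends to earn a worse held-out score than a simpler one, even though it fits its own training folds tightly.

```python
import numpy as np

# One period of a sine wave with noise: a non-linear but learnable trend.
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

def kfold_mse(x, y, degree, k=5):
    """Average held-out MSE of a polynomial fit over k folds."""
    idx = rng.permutation(x.size)
    folds = np.array_split(idx, k)        # k equally sized subsets ("folds")
    scores = []
    for i in range(k):
        val = folds[i]                    # hold out fold i for evaluation
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coeffs = np.polyfit(x[train], y[train], degree)
        preds = np.polyval(coeffs, x[val])
        scores.append(np.mean((preds - y[val]) ** 2))
    return float(np.mean(scores))

cv_simple = kfold_mse(x, y, degree=3)    # flexible enough for one sine period
cv_overfit = kfold_mse(x, y, degree=15)  # chases noise in each training fold
print(f"5-fold CV MSE, degree 3:  {cv_simple:.3f}")
print(f"5-fold CV MSE, degree 15: {cv_overfit:.3f}")
```

The held-out score is what exposes the overfit: on its own training folds the degree-15 model looks excellent, which is exactly why testing on unseen folds is needed.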
Another method is **ensemble methods**. Ensemble methods are made up of a set of classifiers whose predictions are aggregated to identify the most popular result. Bagging is one such method, where multiple models are trained in parallel on different subsets of the data. And then there's option number three: something called **early stopping** can be used. This method seeks to pause training before the model starts learning the noise within the training data. Of course, you do need to be careful not to stop too soon, or you'll be dealing with a bad case of underfitting.

Okay, so finally, in addition to underfitting and overfitting, another common problem is number three: **bad data**. This is data that is incorrect, irrelevant, or incomplete, and bad training data can lead to higher error rates and biased decision-making even when the underlying model is sound. Forecasting models are only as effective as the data they're trained on. Here are some tips to avoid bad data. First, ensure that your data is accurate and complete by performing **cross-checking**, and by that I mean checking it against other data sources. Another thing to do: if there are **outliers**, get rid of them. Some outliers can really skew results; they can take an obscure event, put it into a training data set, and make that situation seem more likely to occur again than it really is. And finally, you need to make sure that the data is **timely**. Outdated data is bad data.

Keep these three things in mind and you should be well on your way to developing better-fitting models. If you have any questions, please drop us a line below, and if you want to see more videos like this in the future, please like and subscribe. Thanks for watching.
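As a supplement to the early-stopping technique described in the overfitting section, here is a rough sketch of the idea in code. The data, the `patience` parameter, and the gradient-descent setup are all invented for illustration: training tracks a validation set, remembers the weights from the best validation step, and halts once validation error has gone `patience` steps without improving.

```python
import numpy as np

rng = np.random.default_rng(2)

# Noisy quadratic trend, split into training and validation sets.
x = rng.uniform(-1, 1, 60)
y = x**2 + rng.normal(scale=0.1, size=x.size)
x_tr, y_tr = x[:40], y[:40]
x_va, y_va = x[40:], y[40:]

def features(x, degree=9):
    # Over-parameterized polynomial features, so the model *can* learn noise.
    return np.vander(x, degree + 1, increasing=True)

A_tr, A_va = features(x_tr), features(x_va)

def mse(A, w, y):
    return float(np.mean((A @ w - y) ** 2))

# Plain gradient descent with patience-based early stopping: keep the
# weights from the step with the lowest validation error, and stop once
# validation error has not improved for `patience` consecutive steps.
w = np.zeros(A_tr.shape[1])
best_w, best_val, since_best = w.copy(), np.inf, 0
patience, lr = 200, 0.05
for step in range(20000):
    grad = 2 * A_tr.T @ (A_tr @ w - y_tr) / len(y_tr)
    w -= lr * grad
    val = mse(A_va, w, y_va)
    if val < best_val:
        best_w, best_val, since_best = w.copy(), val, 0
    else:
        since_best += 1
        if since_best >= patience:
            break  # validation error stopped improving: stop early

print(f"stopped at step {step}, best validation MSE {best_val:.4f}")
```

The returned `best_w`, not the final weights, is what you would deploy; and as the video warns, too small a `patience` value stops training prematurely and pushes the model back toward underfitting.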