Learning Library

Decision Trees, Random Forests, Golf Choice

5m • Unknown Channel • ai-ml • tutorial • intermediate

Key Points

A simple decision‑tree example classifies “golf yes” vs. “golf no” based on time availability, weather, and having clubs, illustrating how sequential rules make predictions.
Individual decision trees can suffer from bias and over‑fitting, prompting the use of ensemble methods like Random Forests.
Random Forest builds many trees on random subsets of data and features, aggregating their votes to improve accuracy and to mitigate over‑fitting and bias.
Setting up a Random Forest involves tuning parameters such as node size, number of trees, and number of features, balancing predictive performance against training time and memory usage.
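
The golf tree from the first key point can be written as a chain of nested rules. The sketch below is only an illustration: the function name `should_play_golf` and its three boolean parameters are assumptions made for this example, not something the video defines. Each `if` corresponds to one decision point, and each `return` is a leaf carrying a class label.

```python
def should_play_golf(has_time: bool, is_sunny: bool, has_clubs: bool) -> str:
    """Walk the golf decision tree from the root to a leaf and return the class label."""
    # Root: without time, nothing else matters.
    if not has_time:
        return "golf no"
    # Second decision point: sun alone is enough to play.
    if is_sunny:
        return "golf yes"
    # Third decision point (reached only when it is not sunny): are the clubs handy?
    return "golf yes" if has_clubs else "golf no"


if __name__ == "__main__":
    # Time available, no sun, clubs handy.
    print(should_play_golf(has_time=True, is_sunny=False, has_clubs=True))
```

Because the rules are checked in a fixed order, the weather is only consulted when there is time, and the clubs are only consulted when it is not sunny.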

Sections

- [00:00:00](https://www.youtube.com/watch?v=gkXX4h3qYm4&t=0s) **Decision Tree Golf Example** - The speaker walks through a basic decision‑tree model for choosing whether to play golf, highlights its role as a binary classification task, and briefly introduces random forests as an ensemble approach to mitigate tree bias and overfitting.
- [00:03:08](https://www.youtube.com/watch?v=gkXX4h3qYm4&t=188s) **Configuring Random Forest Parameters** - The speaker explains how to set node size, number of trees, and feature selection for a random forest, balances accuracy against training time and memory usage, illustrates diverse real‑world applications, and humorously lets the model decide whether to play golf.

Full Transcript

**Source:** [https://www.youtube.com/watch?v=gkXX4h3qYm4](https://www.youtube.com/watch?v=gkXX4h3qYm4) **Duration:** 00:05:07

0:00 I just can't decide, should I play a round of golf today?
0:04 Well, let's use this decision tree to make the decision.
0:09 So first off, do I have the time?
0:13 If I don't, well, then that's an easy decision.
0:18 No golf.
0:20 But let's say I do.
0:22 Second decision point, is it sunny today?
0:27 If there's sun, then I don't care about any other factor. I'm playing golf.
0:33 If there's no sun, let's go down to the next level.
0:36 Well, do I have my clubs with me?
0:38 Do I have them handy?
0:40 If I do not, then I'm not going to bother playing if it's not sunny.
0:45 If I do, then I absolutely will.
0:51 The decision tree here is an example of a classification problem
0:55 where the class labels are "golf yes" and "golf no".
1:00 And while they're helpful, decision trees can be prone to problems,
1:05 things like bias and overfitting.
1:07 But that is where something called "random forest" comes into play.
1:18 Random forest is a type of machine learning model that uses an ensemble of decision trees to make its predictions.
1:24 And why do we call it random forest?
1:26 Well, the reason is because it's actually built by taking a random sample of my data
1:31 and then building an ongoing series of decision trees on the subsets.
1:35 So we're essentially creating a whole bunch of decision trees together.
1:45 And those give us a larger model or group.
1:49 Look, the chances are that other people have built different and maybe better decision trees to answer the same question.
1:56 Maybe those trees consider things like the time of day, which I didn't consider, or the difficulty of the course.
2:02 The more decision trees that I use with different criteria,
2:05 the better my random forest will perform because it's essentially increasing my prediction accuracy.
2:11 And if one or two of these smaller decision trees are not relevant on a certain day, well, we just ignore them.
2:21 One of the primary benefits of random forest is that it can help reduce overfitting.
2:34 And this occurs when your model starts to memorize the data
2:37 rather than trying to generalize for making predictions on future data.
2:41 Essentially, it helps me get around the limitations of my data,
2:45 which might not be fully representative of all golfers or all the best features in my model.
2:50 It can also help reduce something else, and that's bias.
2:54 Bias can occur when there is a certain degree of error introduced into the model.
2:59 Bias occurs when you're not evenly splitting your instance space during training.
3:03 So instead of seeing all of the data points, you might see only half because of how you set your model up.
3:08 Now to set up a random forest, you will set some parameters.
3:16 We have parameters for node size.
3:23 We have parameters for number of trees.
3:30 And we also have parameters for the number of features.
3:39 And it can be challenging at first because you'll want to use a lot of trees, like as many as you can,
3:45 to get the best predictive accuracy, but you don't want so many trees that it'll take you a long time to train the model
3:51 and use a lot of memory space.
3:53 But once you've set up these parameters, you'll use a random forest model to make predictions on your test data.
3:59 And you can even segment or slice your results by different criteria.
4:03 Maybe you want to know how your random forest does on certain types of golf courses
4:07 or how it performs during different times of day.
4:10 Random forest is pretty popular among data science professionals and with good reason.
4:15 It can be extremely helpful in all sorts of classification problems.
4:20 In finance, for example, it can be used to predict the likelihood of a default. In medical diagnosis,
4:34 it can be used to predict prognosis or survival rates depending on treatment options. And in economics,
4:41 it can be used to sort of help understand whether a policy is effective or ineffective.
4:47 So, what do you think?
4:48 Should I play golf today?
4:50 Well, the sum of all my random forest decision trees says yes.
4:56 I'll see you out on the course.
4:58 If you have any questions, please drop us a line below,
5:01 and if you want to see more videos like this in the future, please like and subscribe.
5:06 Thanks for watching.
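
The transcript describes a random forest as many trees built on random samples of the data, tuned by node size, number of trees, and number of features, with the trees jointly deciding the answer. The following is a minimal pure-Python sketch of that idea, not the speaker's implementation. Everything named here is an assumption made for illustration: the toy dataset (all eight combinations of time, sun, and clubs, labelled by the golf rules from the video), the function names, the parameters `n_trees`, `n_features`, and `min_node_size`, the use of Gini impurity to choose splits, and majority voting to combine the trees.

```python
import random
from collections import Counter
from itertools import product


def majority(labels):
    """Return the most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def gini(labels):
    """Gini impurity: 0.0 for a pure node, 0.5 for an even two-class mix."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def build_tree(rows, labels, n_features, min_node_size, rng):
    """Grow one tree on boolean features.

    A leaf is a label string; an internal node is (feature, false_branch, true_branch).
    """
    if len(rows) < min_node_size or len(set(labels)) == 1:
        return majority(labels)
    best = None
    # Only a random subset of the features is considered at each split.
    for f in rng.sample(range(len(rows[0])), n_features):
        off = [y for x, y in zip(rows, labels) if not x[f]]
        on = [y for x, y in zip(rows, labels) if x[f]]
        if not off or not on:
            continue  # this feature does not separate the node
        score = (len(off) * gini(off) + len(on) * gini(on)) / len(labels)
        if best is None or score < best[0]:
            best = (score, f)
    if best is None:
        return majority(labels)
    f = best[1]
    off = [(x, y) for x, y in zip(rows, labels) if not x[f]]
    on = [(x, y) for x, y in zip(rows, labels) if x[f]]
    return (
        f,
        build_tree([x for x, _ in off], [y for _, y in off], n_features, min_node_size, rng),
        build_tree([x for x, _ in on], [y for _, y in on], n_features, min_node_size, rng),
    )


def predict_tree(tree, row):
    """Follow the branches until a leaf label is reached."""
    while isinstance(tree, tuple):
        feature, if_false, if_true = tree
        tree = if_true if row[feature] else if_false
    return tree


def train_forest(rows, labels, n_trees=25, n_features=2, min_node_size=2, seed=0):
    """Train each tree on its own bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(
            build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                       n_features, min_node_size, rng)
        )
    return forest


def predict_forest(forest, row):
    """Each tree votes; the most common label wins."""
    return majority([predict_tree(tree, row) for tree in forest])


# Hypothetical toy data: (has_time, is_sunny, has_clubs) labelled by the video's rules.
rows = list(product([False, True], repeat=3))
labels = ["golf yes" if t and (s or c) else "golf no" for t, s, c in rows]

if __name__ == "__main__":
    forest = train_forest(rows, labels)
    # Time available, no sun, clubs handy.
    print(predict_forest(forest, (True, False, True)))
```

Raising `n_trees` adds more voters at the cost of training time and memory, which is the trade-off the speaker mentions. In scikit-learn, the corresponding settings on `RandomForestClassifier` are `n_estimators`, `max_features`, and `min_samples_leaf`; the sketch above does not use that library.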