Random Forest Algorithm FAQ

While the random forest algorithm is widely known in the machine learning industry, many people are still unfamiliar with how it actually works.

Let’s take a look at the most frequently asked questions (FAQ) for the random forest algorithm.

What is the Random Forest Algorithm?

The random forest algorithm is a technique used by machines to make decisions based on large datasets. These datasets are analyzed by many decision trees that, as a group, vote on the most popular answer to the question or scenario the model is working on.

How Does the Algorithm Work?

The algorithm picks a random sample of records from the dataset and builds a decision tree on each sample. For regression problems, each tree predicts an output value and those values are averaged to produce the forest's final prediction; for classification problems, each tree votes for a class and the majority class becomes the final answer.
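The averaging step can be seen directly in code. The sketch below (assuming scikit-learn and NumPy are available, with a made-up toy dataset) fits a small regression forest and confirms that the forest's prediction is just the mean of its individual trees' predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy dataset: noisy sine wave (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=200)

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Each tree predicts on its own; the forest averages those predictions.
x_new = np.array([[5.0]])
tree_preds = [tree.predict(x_new)[0] for tree in forest.estimators_]
print(forest.predict(x_new)[0], np.mean(tree_preds))
```

The two printed numbers match, which is exactly the "each tree predicts, the forest averages" behavior described above.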

Can the Algorithm be Used for Categorical and Continuous Target Variables?

The short answer is yes. The combination of decision trees in a random forest model allows the system to use classification models for categorical dependent variables and regression models for numeric dependent variables.
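In scikit-learn, for example, the two cases are handled by two separate estimators with the same interface. A minimal sketch, using synthetic datasets purely for illustration:

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Categorical target: the forest acts as a classifier (majority vote).
Xc, yc = make_classification(n_samples=200, random_state=0)
clf = RandomForestClassifier(n_estimators=25, random_state=0).fit(Xc, yc)

# Continuous target: the forest acts as a regressor (averaging).
Xr, yr = make_regression(n_samples=200, random_state=0)
reg = RandomForestRegressor(n_estimators=25, random_state=0).fit(Xr, yr)
```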

What is Bagging in the Random Forest Algorithm?

In machine learning, the term bagging refers to bootstrap aggregating. New training datasets are generated by drawing a sample of data points with replacement from the original dataset. Sampling with replacement means that some data points can be repeated within each of the new datasets used for training.
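Drawing one such bootstrap sample takes a single line of NumPy. A minimal sketch with a tiny made-up dataset:

```python
import numpy as np

rng = np.random.default_rng(42)
data = np.arange(10)  # original dataset of 10 points

# Bootstrap sample: same size as the original, drawn WITH replacement,
# so some points typically repeat and others are left out.
sample = rng.choice(data, size=data.size, replace=True)
print(sample)
```

Each tree in the forest is then trained on its own independently drawn sample like this one.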

Why Should a Random Forest be Used Instead of a Decision Tree?

A single decision tree tends to overfit its training data, so its predictions have high variance. When many decision trees are combined into a single ensemble model, the random forest, their errors average out: the variance is reduced, and the results become more stable and accurate.
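This variance reduction can be demonstrated empirically. The sketch below (a toy experiment with an assumed noisy dataset) repeatedly resamples the training data, fits both a lone tree and a forest each time, and compares how much their predictions at one point fluctuate across runs.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, size=300)
x0 = np.array([[5.0]])  # point at which we compare predictions

tree_preds, forest_preds = [], []
for seed in range(20):
    # Resample the training set to simulate drawing a new dataset.
    idx = rng.integers(0, len(X), len(X))
    Xi, yi = X[idx], y[idx]
    tree_preds.append(
        DecisionTreeRegressor(random_state=seed).fit(Xi, yi).predict(x0)[0])
    forest_preds.append(
        RandomForestRegressor(n_estimators=30, random_state=seed)
        .fit(Xi, yi).predict(x0)[0])

# The forest's predictions vary far less from run to run than a lone tree's.
print(np.var(tree_preds), np.var(forest_preds))
```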

What is an Out-of-Bag (OOB) Error?

An out-of-bag error is the prediction error measured on the data points that were left out of a given tree's bootstrap sample. Because each tree sees only part of the data, these held-out points act as a built-in validation or testing dataset, and the error is calculated internally as the random forest algorithm runs.
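scikit-learn exposes this internal estimate through the `oob_score` option. A minimal sketch on a synthetic dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, random_state=0)

# oob_score=True evaluates each sample using only the trees that did
# NOT see it in their bootstrap sample -- no separate test set needed.
clf = RandomForestClassifier(
    n_estimators=100, oob_score=True, random_state=0).fit(X, y)

print(clf.oob_score_)  # OOB accuracy; the OOB error is 1 - oob_score_
```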

What is a Bootstrap Sample?

A bootstrap sample is a sample of data points drawn with replacement, which implies that some data points will appear multiple times in the training set of a single tree while others are left out entirely.
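A well-known consequence is that each bootstrap sample covers only about 63.2% (1 − 1/e) of the distinct original data points; the rest become that tree's out-of-bag set. A quick numerical check:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Draw a bootstrap sample of size n from n points and count how many
# distinct originals it contains; the fraction approaches 1 - 1/e.
sample = rng.integers(0, n, size=n)
frac_unique = np.unique(sample).size / n
print(frac_unique)  # close to 0.632
```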

How Does the Algorithm Know Which Features are Important?

Apart from making predictions, some organizations use random forests to determine how important each variable is in a classification problem. In the classic approach, the forest counts the votes for the correct class on each tree's out-of-bag data, then randomly permutes one feature's values and counts again; the bigger the drop in correct votes, the more important that feature is ranked.
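As a sketch of how this looks in practice: scikit-learn ships a built-in impurity-based importance (`feature_importances_`), which is a related but different measure from the permutation method described above. In the toy setup below, only 2 of 5 synthetic features carry signal, and the forest's importances reflect that.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data: 5 features, only 2 of them informative (assumed setup).
X, y = make_classification(n_samples=300, n_features=5, n_informative=2,
                           n_redundant=0, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances; they sum to 1 across all features.
for i, imp in enumerate(clf.feature_importances_):
    print(f"feature {i}: importance {imp:.3f}")
```

For the permutation-based OOB measure itself, scikit-learn offers `sklearn.inspection.permutation_importance` as a separate utility.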

What are Some Advantages and Disadvantages of This Algorithm?

This algorithm has various advantages and disadvantages. On the plus side, the random forest model is relatively unbiased and stable: because each tree is trained on its own sample, introducing new data points rarely affects the entire forest. It can also handle datasets with missing values. On the other hand, the complexity of this model can be a disadvantage, since combining a large number of decision trees into one forest requires considerably more memory and computation.