Bagging Vs Boosting

Samundeeswari

 

              We all apply decision-tree-style reasoning in our daily lives to make choices. Similarly, organizations leverage decision trees, a supervised machine learning method, to enhance decision-making processes, increase efficiency, and generate higher profits and surplus.

Ensemble:

            Ensemble methods combine multiple decision trees to achieve more accurate predictive outcomes compared to using a single decision tree. The key idea behind ensemble models is that a collection of weak learners work together to create a stronger, more effective learner.

Below are two techniques commonly used to implement ensemble decision tree methods:

Bagging:  

                Bagging is applied when the goal is to reduce the variance of a decision tree. In this approach, several subsets of data are randomly created from the training sample with replacement. Each subset is then used to train its own decision tree, resulting in an ensemble of different models. The final prediction is derived by averaging the outputs from all the trees, which typically produces more robust results than relying on a single decision tree.
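
For a concrete picture of the idea, here is a minimal Python sketch that bootstraps several regression trees and averages their predictions. The synthetic dataset, number of trees, and other settings are illustrative assumptions, not part of the bagging method itself.

```python
# Minimal bagging sketch: train each tree on a bootstrap sample (drawn with
# replacement) and average the predictions across trees.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Illustrative synthetic data (assumption for the example).
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
rng = np.random.default_rng(0)
n_trees = 50

trees = []
for _ in range(n_trees):
    idx = rng.integers(0, len(X), size=len(X))          # bootstrap sample
    trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))

# The ensemble prediction is the average of the individual tree predictions.
y_pred = np.mean([tree.predict(X) for tree in trees], axis=0)
```

In practice, scikit-learn's BaggingRegressor (or BaggingClassifier) wraps these steps in a single estimator.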

Random Forest:

                 Random Forest builds upon the bagging technique by adding an extra step in the prediction process. In addition to creating random subsets of data, it randomly selects a subset of features for each tree, rather than using all features. The collection of these numerous randomly generated trees is known as a Random Forest.

The following steps are involved in implementing a Random Forest:

 →      Consider a training dataset with X observations and Y features. First, a random sample of the training data is selected with replacement to create the initial model.

 →       The tree is expanded to its fullest depth.

 →      The process is repeated multiple times, and the final prediction is made based on the aggregated predictions from the ensemble of trees, as sketched in the example below.
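
The steps above correspond closely to scikit-learn's RandomForestRegressor, which handles the bootstrap sampling and per-split feature subsetting internally. The sketch below is one possible setup; the dataset and hyper-parameter values are illustrative assumptions.

```python
# Random Forest sketch: bootstrapped trees plus a random subset of features
# considered at each split.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Illustrative synthetic data (assumption for the example).
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestRegressor(
    n_estimators=100,     # number of bootstrapped trees
    max_features="sqrt",  # random subset of features tried at each split
    random_state=0,
).fit(X_train, y_train)

print(forest.score(X_test, y_test))  # R^2 on held-out data
```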

Advantages of using Random Forest technique:

 →     It handles high-dimensional datasets very effectively.

 
 →     It handles missing values effectively and maintains accuracy even with incomplete data.

Disadvantages of using Random Forest technique:

            Since the final prediction is the average of the predictions from the individual trees, it may not give precise continuous values for regression tasks.

Boosting:

           Boosting is another ensemble technique that creates a collection of predictors. In this approach, consecutive trees are fitted, often using random samples, and at each step, the goal is to address the errors from the previous trees.

If a given input is misclassified by the model, its weight is increased, making it more likely that the next model will classify it correctly. By combining all the models at the end, this process turns weak models into stronger ones.
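
As a hedged illustration, scikit-learn's AdaBoostClassifier implements this reweighting idea: each successive weak learner (by default a depth-1 decision stump) focuses more on the samples the previous learners got wrong. The dataset and settings below are illustrative assumptions.

```python
# AdaBoost sketch: sequentially fitted weak learners, with misclassified
# samples up-weighted before each new learner is trained.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Illustrative synthetic data (assumption for the example).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

booster = AdaBoostClassifier(
    n_estimators=100,  # number of sequentially added weak learners
    random_state=0,
).fit(X, y)

print(booster.score(X, y))  # training accuracy of the combined model
```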


Gradient Boosting:

            Gradient Boosting is an improved version of the boosting process: it combines gradient descent with boosting.

Gradient Boosting uses a gradient descent algorithm to optimize any differentiable loss function. It builds an ensemble of trees one by one, with each new tree added to reduce the error (the difference between actual and predicted values) from the previous trees.
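
A minimal sketch of this loop for squared-error loss is shown below, assuming a synthetic dataset and illustrative learning rate and tree depth: each new tree is fit to the residuals (the negative gradient of the loss) left by the ensemble so far, and its scaled predictions are added to the running prediction.

```python
# Gradient boosting sketch for squared-error loss: fit each tree to the
# residuals of the current ensemble, then add its scaled predictions.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Illustrative synthetic data (assumption for the example).
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

learning_rate = 0.1                       # illustrative value
prediction = np.full_like(y, y.mean())    # start from a constant prediction
trees = []

for _ in range(100):
    residuals = y - prediction            # negative gradient of squared-error loss
    tree = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

print(np.mean((y - prediction) ** 2))     # training MSE after boosting
```

Libraries such as scikit-learn's GradientBoostingRegressor generalize this loop to other differentiable loss functions.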

Benefits of applying gradient boosting techniques:


 →     It is compatible with many loss functions.

 →     It works well in ensemble settings.


Using a gradient boosting approach has the following drawbacks:


 →     Careful tuning of the various hyper-parameters is necessary.









