Gradient Boosting

Ankit Tomar, July 1, 2025

As we continue our journey into ML algorithms, in this post we'll go deeper into gradient boosting: how it works, what is happening behind the scenes mathematically, and why it performs so well.

🌟 What is gradient boosting?

Gradient boosting is an ensemble method in which multiple weak learners (usually shallow decision trees) are combined sequentially. Each new tree corrects the errors (residuals) of the combined previous trees.

🧠 How does it actually work?

1. Initial prediction: Start with a simple model, such as predicting the mean target value.
2. Compute residuals: Find the difference between the true values and the current predictions.
3. Fit a new tree: Train a tree to predict these residuals (i.e., the model's mistakes).
4. Update: Add the new tree's output to the current prediction, scaled by a learning rate.
5. Repeat: Build many such trees iteratively. The final prediction is the sum of all the trees' contributions.

🧮 Why is it called "gradient" boosting?

At each step, instead of just predicting residuals, the algorithm fits a tree to the negative gradient of the loss function (how the error changes as the predictions change). This is a form of numerical optimization: we take steps in the direction that reduces the error most quickly.

For example, with mean squared error (MSE), the negative gradient is simply the residual (actual minus predicted). For log loss (classification), the gradient is different. This makes gradient boosting very flexible: it can optimize almost any differentiable loss function.

✏️ How does it pick the best split in each tree?

When building each tree:

- For each feature and threshold, it computes how much splitting at that point reduces the chosen loss (e.g., MSE or log loss).
- It picks the split with the highest improvement.
- Efficient calculation: libraries like XGBoost and LightGBM use clever tricks (histograms, sampling) to make this fast even on large datasets.
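The five steps above can be sketched in a few lines of Python using scikit-learn's decision trees as the weak learners. This is a toy illustration for the MSE case, not a production implementation; the helper names `gradient_boost_fit` and `gradient_boost_predict` are made up for this sketch:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_trees=50, learning_rate=0.1, max_depth=2):
    """Minimal gradient boosting for MSE: each tree fits the current residuals."""
    f0 = y.mean()                                 # step 1: initial prediction (the mean)
    pred = np.full_like(y, f0, dtype=float)
    trees = []
    for _ in range(n_trees):
        residuals = y - pred                      # step 2: residuals = negative gradient of MSE
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)                    # step 3: fit a shallow tree to the residuals
        pred += learning_rate * tree.predict(X)   # step 4: scaled update of the prediction
        trees.append(tree)                        # step 5: repeat
    return f0, trees

def gradient_boost_predict(X, f0, trees, learning_rate=0.1):
    """Final prediction = initial guess + sum of all scaled tree outputs."""
    pred = np.full(X.shape[0], f0)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred

# Toy data: y = x^2 + noise
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=200)

f0, trees = gradient_boost_fit(X, y)
pred = gradient_boost_predict(X, f0, trees)
mse = np.mean((y - pred) ** 2)
baseline = np.mean((y - y.mean()) ** 2)  # error of the constant-mean model
print(mse < baseline)
```

Note how the learning rate appears in both fitting and prediction: each tree contributes only a small, shrunken step, which is exactly why many trees are needed.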
📐 Formulas that help in interviews

Gini impurity (for class proportions p_k):

Gini = 1 - \sum_{k=1}^{K} p_k^2

Entropy:

H = -\sum_{k=1}^{K} p_k \log_2 p_k

In regression, the typical objective is to minimize mean squared error:

MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

The negative gradient of this loss with respect to each prediction is proportional to the residual y_i - \hat{y}_i, which tells us how to adjust the predictions to reduce the error.

⚙️ Why is gradient boosting powerful?

- Focuses learning on hard-to-predict data.
- Works with different loss functions.
- Builds complex nonlinear models.
- Can handle numerical and categorical data.

But it can overfit, so tuning is essential.

🛡️ How to control overfitting

- Reduce tree depth.
- Use a lower learning rate (shrinkage).
- Add subsampling (random rows or columns).
- Add explicit regularization, such as penalties on tree complexity.

We will discuss XGBoost, CatBoost and LightGBM in upcoming blogs.