2. How Do Machine Learning Models Get Trained?
Ankit Tomar, June 19, 2025 / June 10, 2025

So far, we’ve talked about what machine learning models do at a high level: they take in historical data, learn patterns, and help us make predictions. But how exactly does a machine learning model get trained, tested, and prepared for the real world? Let’s walk through that journey step by step.

The Basic Flow

The typical machine learning pipeline looks like this:

Data → Model Training → Model Validation → Deployment

Each step plays a crucial role, but in this post I’ll focus on training and testing, and share some real-world insights from my experience.

1. Training: How Does a Model Learn?

Training a machine learning model means teaching it to recognize patterns in historical data. Here’s what this involves:

Step 1: Data Preparation
Before training starts, the data needs to be cleaned, transformed, and processed. Missing values, outliers, and inconsistent formatting all need to be handled. Garbage in, garbage out, as they say.

Step 2: Feature Engineering
This is where we derive new variables (called features) from the existing data to help the model learn better: for example, converting a date of birth into an age, or creating ratios, differences, or flags from raw variables.

Step 3: Algorithm Selection
Depending on the problem you’re trying to solve (e.g., classification vs. regression), you choose an appropriate algorithm, such as Linear Regression, Decision Trees, Random Forests, or Gradient Boosting.

Step 4: Model Training
Now comes the actual training. You pass the processed data into the algorithm, which analyzes it, learns patterns, and adjusts its internal parameters to minimize the error on the training data (measured by a loss function). And just like that, you have a trained model!

But hold on. That’s only half the job.

2. Testing: Is the Model Really Working?

Once a model is trained, it needs to be validated and tested to check whether it performs well on data outside the training set.
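Before looking at those checks, here is a minimal sketch of training Steps 1, 2, and 4 in plain Python, without any ML library. It is illustrative only: the record fields (`dob`, `income`, `spend`), the median imputation rule, the toy data, and the choice of a linear model fitted by batch gradient descent on mean squared error are all assumptions made for this example, not a prescribed method. In practice you would usually reach for a library such as scikit-learn.

```python
from datetime import date
from statistics import mean, median, pstdev


def clean(rows):
    """Step 1 (data preparation): fill missing 'income' values with the median
    of the observed ones. The field name and the median rule are assumptions."""
    observed = [r["income"] for r in rows if r["income"] is not None]
    fill = median(observed)
    return [{**r, "income": fill if r["income"] is None else r["income"]} for r in rows]


def age_on(dob, today):
    """Step 2 (feature engineering): whole years of age on the date `today`."""
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)


def build_features(rows, today):
    """Turn each cleaned record into a feature vector [age, income]."""
    return [[age_on(r["dob"], today), r["income"]] for r in rows]


def fit_scaler(X):
    """Per-column mean and standard deviation, computed on training data only."""
    cols = list(zip(*X))
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]


def scale(X, mus, sds):
    """Standardize features with previously fitted statistics."""
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in X]


def predict(w, b, X):
    """Linear model: dot(w, x) + b for each row."""
    return [sum(wi * xi for wi, xi in zip(w, row)) + b for row in X]


def train_linear_regression(X, y, lr=0.1, epochs=1000):
    """Step 4 (model training): fit weights and bias by batch gradient descent
    on mean squared error, i.e. repeatedly nudge parameters to reduce error."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        errors = [p - t for p, t in zip(predict(w, b, X), y)]
        grad_w = [2 / n * sum(e * row[j] for e, row in zip(errors, X)) for j in range(d)]
        grad_b = 2 / n * sum(errors)
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    # Hypothetical toy records; "spend" is the target we want to predict.
    rows = [
        {"dob": date(1985, 4, 2), "income": 5200.0, "spend": 1900.0},
        {"dob": date(1992, 11, 20), "income": None, "spend": 1500.0},
        {"dob": date(1978, 1, 15), "income": 6100.0, "spend": 2300.0},
        {"dob": date(2000, 7, 9), "income": 3100.0, "spend": 1100.0},
    ]
    rows = clean(rows)
    X = build_features(rows, today=date(2025, 6, 1))
    y = [r["spend"] for r in rows]
    mus, sds = fit_scaler(X)
    w, b = train_linear_regression(scale(X, mus, sds), y)
    print("weights:", w, "bias:", b)
```

The scaling statistics are returned rather than hidden inside the training function so that the exact same transformation can be reused on validation, test, and production data. Computing them on the full dataset before splitting would quietly let information from unseen data into training, which is one form of the data leakage discussed in the tips.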
This involves two main steps:

A. Model Validation
Validation happens during the training phase. It helps tune the model by checking its performance on a validation set: a portion of the available training data that is held out and not used to fit the model itself. This step helps with:
Hyperparameter tuning
Avoiding overfitting
Early stopping (if needed)

B. Model Testing
Testing is the real checkpoint. It is done on a separate dataset that the model has never seen, neither during training nor during validation and tuning. Think of it as a simulation of how the model will behave in the real world: it gives a realistic estimate of how the model will perform on new data.

You feed in the test data and evaluate the predictions using suitable performance metrics, such as:
Accuracy / Precision / Recall / F1-score (for classification)
R² / MAE / MAPE (for regression)

Real-World Tips from Experience

Here are a few insights I’ve gathered over years of working on real-world ML projects:

✅ Model Training & Testing Is Not the Hardest Part
The bulk of your time will go into data collection, data understanding, and feature engineering. If everything else is done well, training the model is often the most straightforward step.

✅ Work Closely with Business Teams
The more time you spend with domain experts and stakeholders early on, the fewer iterations you’ll need later. You’ll be able to encode business logic and hypotheses into your features up front, saving time and confusion.

✅ Beware of Overperforming Models
If your model shows exceptionally high performance early on, be skeptical. This often signals data leakage: the model is learning from information it wouldn’t have at prediction time, such as future data or the target variable itself. Whenever this happens, review your feature selection process.

In Summary

Training and testing a machine learning model isn’t just about running algorithms. It’s about building the right foundation through clean data, strong features, and clear business alignment.
A well-trained model that’s tested correctly doesn’t just perform well statistically; it delivers real value when deployed.
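To make the validation and testing steps more concrete, here is a minimal plain-Python sketch (standard library only) of a train/validation/test split, a simplified early-stopping rule, and the metrics listed in the testing section. The function names, the default 70/15/15 split, the `patience` value, and the convention of reporting 0.0 when a precision or recall denominator is zero are illustrative assumptions; libraries such as scikit-learn provide well-tested equivalents.

```python
import random


def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle once, then carve out disjoint test, validation, and training sets.
    Random shuffling assumes rows are independent; for time-ordered data, split
    by time instead so the model never trains on the future."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def best_epoch_with_early_stopping(val_losses, patience=3):
    """Simplified early stopping: scan per-epoch validation losses, stop once
    `patience` epochs pass without improvement, and return the best epoch."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break
    return best_epoch


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary problem.
    Precision, recall, and F1 are reported as 0.0 when a denominator is zero."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def regression_metrics(y_true, y_pred):
    """MAE, MAPE (in percent; assumes no zero targets), and R^2."""
    n = len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mape = 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n
    mean_t = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "mape": mape, "r2": r2}


if __name__ == "__main__":
    train, val, test = train_val_test_split(range(100))
    print(len(train), len(val), len(test))  # 70 15 15
    print(classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
    print(regression_metrics([100, 200, 300], [110, 190, 300]))
```

The validation set guides choices such as hyperparameters and the stopping epoch, while the test set is scored once, at the end, so that its estimate is not influenced by those choices.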