Ankit Tomar

AI Products

2. How Do Machine Learning Models Get Trained?

Ankit Tomar, June 19, 2025 | June 10, 2025

So far, we’ve talked about what machine learning models do at a high level—they take in historical data, learn patterns, and help us make predictions. But how exactly does a machine learning model get trained, tested, and prepared for the real world?

Let’s walk through that journey step by step.


The Basic Flow

The typical machine learning pipeline looks like this:

Data → Model Training → Model Validation → Deployment

Each step plays a crucial role, but in this post, I’ll focus on the training and testing processes and share some real-world insights from my experience.


1. Training: How Does a Model Learn?

Training a machine learning model means teaching it to recognize patterns in historical data. Here’s what this involves:

Step 1: Data Preparation

Before training starts, data needs to be cleaned, transformed, and processed. Missing values, outliers, and inconsistent formatting need to be handled. Garbage in, garbage out—as they say.
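
To make this concrete, here is a minimal sketch of two common cleaning steps: filling missing values with the median and clipping implausible outliers. The function name `clean_column`, the sample ages, and the 0 to 100 bounds are assumptions made up for illustration; the right strategy always depends on your data.

```python
from statistics import median

def clean_column(values, lower, upper):
    """Fill missing entries (None) with the median, then clip to [lower, upper]."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    cleaned = []
    for v in values:
        v = fill if v is None else v
        cleaned.append(min(max(v, lower), upper))
    return cleaned

# Hypothetical data: None marks a missing value, 250 is an implausible age.
ages = [34, None, 29, 41, 250, 38]
cleaned_ages = clean_column(ages, lower=0, upper=100)
print(cleaned_ages)
```

In a real project you would typically do this with a dataframe library, but the logic is the same: decide how to treat gaps and extremes before the model ever sees the data.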

Step 2: Feature Engineering

This is where we derive new variables (called features) from the existing data to help the model learn better. For example, converting a date of birth into age, or creating ratios, differences, or flags from raw variables.
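
The examples above (age from date of birth, a ratio, a flag) can be sketched as follows. The field names `date_of_birth`, `income`, and `debt`, and the reference date, are hypothetical and chosen only to illustrate the idea.

```python
from datetime import date

def age_in_years(date_of_birth, as_of):
    """Whole years between date_of_birth and as_of."""
    years = as_of.year - date_of_birth.year
    # Subtract one if the birthday has not happened yet in the as_of year.
    if (as_of.month, as_of.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years

def build_features(row, as_of):
    """Derive an age, a ratio, and a flag from raw (hypothetical) fields."""
    return {
        "age": age_in_years(row["date_of_birth"], as_of),
        "debt_to_income": row["debt"] / row["income"] if row["income"] else None,
        "is_high_debt": row["debt"] > row["income"],
    }

customer = {"date_of_birth": date(1990, 8, 15), "income": 50000.0, "debt": 12500.0}
features = build_features(customer, as_of=date(2025, 6, 19))
print(features)
```

Note that the age is computed relative to an explicit `as_of` date rather than "today", so the feature stays reproducible when you retrain later.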

Step 3: Algorithm Selection

Depending on the problem you’re trying to solve (e.g., classification vs. regression), you choose an appropriate algorithm—like Linear Regression, Decision Trees, Random Forests, or Gradient Boosting.

Step 4: Model Training

Now comes the actual training part. You pass the processed data into the algorithm. The model analyzes the data, learns patterns, and adjusts internal parameters to minimize errors.
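
To show what "adjusting internal parameters to minimize errors" can look like, here is a minimal sketch that fits a one-variable linear model with gradient descent on mean squared error. This is just one way a model can learn; the function name, learning rate, epoch count, and toy data are assumptions for illustration, and real libraries use more sophisticated optimizers.

```python
def train_linear_model(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y ≈ w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every training example with the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Move the parameters a small step in the direction that reduces the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy data generated from y = 2x + 1, so the model should recover w ≈ 2 and b ≈ 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = train_linear_model(xs, ys)
print(round(w, 3), round(b, 3))
```

The parameters `w` and `b` are the "internal parameters" here; each pass over the data nudges them toward values that produce smaller errors.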

And just like that—you have a trained model! But hold on. That’s only half the job done.


2. Testing: Is the Model Really Working?

Once a model is trained, it needs to be validated and tested to check if it performs well outside the training environment.

There are two main steps here:

A. Model Validation

Validation happens alongside training. You hold out a validation set (a portion of the available data that is not used to fit the model) and check the model's performance on it to guide tuning. This step helps with:

  • Hyperparameter tuning
  • Avoiding overfitting
  • Early stopping (if needed)
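
Here is a minimal sketch of the idea, covering the split and hyperparameter tuning only (not early stopping). It splits synthetic data into train, validation, and test sets, then picks the number of neighbours `k` for a simple nearest-neighbour regressor based on validation error. The 60/20/20 split, the candidate values of `k`, and the synthetic data are all assumptions for illustration.

```python
import random

def split_data(rows, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into train, validation, and test sets."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    n_val = int(len(rows) * val_fraction)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test

def knn_predict(train, x, k):
    """Predict y as the mean target of the k training points closest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mean_abs_error(train, rows, k):
    """Mean absolute error of the k-NN model on the given rows."""
    return sum(abs(knn_predict(train, x, k) - y) for x, y in rows) / len(rows)

# Synthetic (x, y) pairs: a linear trend plus noise.
rng = random.Random(42)
data = [(x, 3 * x + rng.gauss(0, 2)) for x in range(50)]
train, val, test = split_data(data)

# Hyperparameter tuning: choose k using the validation set only.
candidate_ks = [1, 3, 5, 9]
val_errors = {k: mean_abs_error(train, val, k) for k in candidate_ks}
best_k = min(val_errors, key=val_errors.get)

# The test set is touched once, after the choice has been made.
test_error = mean_abs_error(train, test, best_k)
print(best_k, round(test_error, 2))
```

The key design choice is that the test set plays no role in choosing `k`. If it did, the test score would stop being an honest estimate.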

B. Model Testing

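Testing is the real checkpoint. It is done on a separate dataset that the model has never seen, neither during training nor during validation. Think of it as a simulation of how the model will behave in the real world. Because the test set played no part in fitting or tuning, it gives an unbiased estimate of model performance, provided it is representative of the data the model will see in production.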

You feed in the test data and evaluate the predictions using suitable performance metrics like:

  • Accuracy / Precision / Recall / F1-Score (for classification)
  • R² / MAE / MAPE (for regression)
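
For reference, the metrics listed above can be computed from scratch as in the sketch below. It assumes binary labels where 1 is the positive class; the function names and sample values are made up for illustration, and in practice you would normally rely on a well-tested library implementation.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def regression_metrics(y_true, y_pred):
    """MAE, MAPE (in percent), and R² for numeric targets."""
    n = len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    # MAPE is undefined when a true value is 0; this sketch assumes none are.
    mape = 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "mape": mape, "r2": r2}

clf = classification_metrics([1, 0, 1, 1, 0, 1, 0, 0], [1, 0, 0, 1, 1, 1, 0, 0])
reg = regression_metrics([100, 200, 300, 400], [110, 190, 330, 380])
print(clf)
print(reg)
```

Which metric matters most depends on the business problem: for example, recall is usually the priority when missing a positive case is costly.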

Real-World Tips from Experience

Here are a few insights I’ve gathered over years of working on real-world ML projects:

✅ Model Training & Testing Are Not the Hardest Part

The bulk of your time will go into data collection, data understanding, and feature engineering. Training a model is often the most straightforward step if everything else is done well.

✅ Work Closely with Business Teams

The more time you spend with the domain experts and stakeholders early on, the fewer iterations you’ll need later. You’ll be able to encode business logic and hypotheses into your features up front—saving time and confusion.

✅ Beware of Overperforming Models

If your model shows exceptionally high performance early on, be skeptical. It often means there’s data leakage, i.e., your model is learning from information it wouldn’t have at prediction time, such as future data or features derived from the target variable.

Always review your feature selection process when this happens.
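
One simple heuristic for that review is to flag any feature that is almost perfectly correlated with the target. The sketch below is only a first-pass check under assumptions I've made for illustration: the 0.95 threshold is arbitrary, and the feature names (including `refund_issued`, imagined as a field recorded after the outcome) are hypothetical. A high correlation is a reason to investigate, not proof of leakage.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def suspicious_features(features, target, threshold=0.95):
    """Names of features whose absolute correlation with the target is >= threshold."""
    return [
        name
        for name, values in features.items()
        if abs(pearson(values, target)) >= threshold
    ]

target = [0, 1, 0, 1, 1, 0, 1, 0]
features = {
    "account_age_months": [12, 30, 8, 15, 40, 22, 9, 35],
    # Hypothetical leaky field: it mirrors the target because it is recorded after the outcome.
    "refund_issued": [0, 1, 0, 1, 1, 0, 1, 0],
}
flagged = suspicious_features(features, target)
print(flagged)
```

For each flagged feature, ask the domain experts one question: would this value actually be known at the moment the prediction is made?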


In Summary

Training and testing a machine learning model isn’t just about running algorithms—it’s about building the right foundation through clean data, strong features, and clear business alignment.

A well-trained model that’s tested correctly doesn’t just perform well statistically—it delivers real value when deployed.
