
How to Handle Missing Data: Best Practices in Data Preparation

Published By: Apponix Academy

Published on: 14 Jul 2025


 

Table of contents

1. Why Should You Care About Missing Data?

2. Why Does Data Go Missing?

3. Different Types of Missing Data

  1. Missing Completely at Random (MCAR)

  2. Missing at Random (MAR)

  3. Missing Not at Random (MNAR)

4. How to Handle Missing Data: Easy Techniques

  1. Removing Missing Data

  2. Filling Missing Data (Imputation)

  3. Using Algorithms That Handle Missing Data

  4. Creating a New Category for Missing Data

  5. Adding a Missing Data Flag

5. Simple Example Using Python (pandas)

6. Best Tips for Handling Missing Data

7. Mistakes to Avoid

8. Tools You Can Use

9. Final Thoughts

 

Have you ever looked at a dataset and noticed some empty boxes or blank spaces? That’s called missing data. It happens often and can create problems in your analysis or machine learning models.

In this blog by Apponix Academy, let’s learn how to handle missing data easily, using simple language and practical tips. By the end, you’ll feel confident cleaning your data for any project.

Why Should You Care About Missing Data?


You might think, “Why does it matter if some data is missing?”

Here’s why:

That’s why data scientists spend a lot of time fixing missing data before any analysis.

Why Does Data Go Missing?

Data can be missing for many reasons:

Knowing why data is missing helps you decide how to handle it properly.

Different Types of Missing Data

Here are the three types of missing data:

1. Missing Completely at Random (MCAR)

The data is missing purely by chance, with no link to any value in the dataset. For example, a sensor stops working at random and skips recording the temperature for one hour.

2. Missing at Random (MAR)

Whether a value is missing is related to other data that you do have in your dataset. For example, a person's income is missing, but you know their job title, which can help you estimate it.

3. Missing Not at Random (MNAR)

The value is missing for a reason related to the value itself. For example, people with very high incomes may not want to reveal their salary.
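One rough way to look for a MAR-like pattern is to compare the share of missing values across groups of another column. The sketch below uses a made-up `survey` table with hypothetical `job_title` and `income` columns. Keep in mind that this only shows a link to data you can see; it cannot confirm MNAR, because that depends on the values you don't have.

```python
import numpy as np
import pandas as pd

# Hypothetical survey data (invented for illustration)
survey = pd.DataFrame({
    "job_title": ["Engineer", "Engineer", "Sales", "Sales", "Sales", "Manager"],
    "income": [85000, 90000, np.nan, np.nan, 40000, 120000],
})

# Share of missing income values within each job title
missing_rate = survey["income"].isna().groupby(survey["job_title"]).mean()
print(missing_rate)
```

If one group has a much higher missing rate than the others, the missingness is probably not completely random.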

How to Handle Missing Data: Easy Techniques

1. Removing Missing Data

If only a few rows or columns have missing data, you can simply remove them.

But remember, don’t remove too much data. You might lose important information.
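Here is a minimal sketch of removing missing data with pandas. The table and the 50% cut-off are assumptions chosen for illustration, not a fixed rule.

```python
import numpy as np
import pandas as pd

# Hypothetical data (invented for illustration)
df = pd.DataFrame({
    "Age": [25, np.nan, 31, 40],
    "City": ["Bangalore", "Mysore", None, "Bangalore"],
    "Notes": [np.nan, np.nan, np.nan, "VIP"],
})

# Check the share of missing values per column before removing anything
print(df.isna().mean())

# Keep only columns with at least 50% non-missing values (assumed cut-off)
min_non_missing = int(len(df) * 0.5)
df_cols = df.dropna(axis=1, thresh=min_non_missing)

# Then drop the rows that still contain a missing value
df_rows = df_cols.dropna()
print(len(df), "rows before,", len(df_rows), "rows after")
```

Printing the row count before and after is a simple way to see how much data you are giving up.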

2. Filling Missing Data (Imputation)

If you don’t want to remove data, you can fill it with other values. This is called imputation.

a. Mean or Median

For numbers, you can fill in missing values with the mean (average) or the median (middle value) of the column.

Example: If someone’s age is missing, fill it with the average or median age of the group.
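The age example can be sketched like this. The `people` table and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing age
people = pd.DataFrame({"Age": [22, 25, np.nan, 30, 63]})

# mean() and median() skip missing values by default
mean_age = people["Age"].mean()
median_age = people["Age"].median()

# Fill into new columns so the two options can be compared side by side
people["Age_mean_filled"] = people["Age"].fillna(mean_age)
people["Age_median_filled"] = people["Age"].fillna(median_age)
print(people)
```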

b. Mode

For categories like city or gender, fill in missing values with the most common value.

Example: If many people live in Bangalore and one entry is missing the city, fill it with Bangalore.
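A small sketch of the city example, using a made-up `customers` table:

```python
import pandas as pd

# Hypothetical data with one missing city
customers = pd.DataFrame(
    {"City": ["Bangalore", "Bangalore", "Mysore", None, "Bangalore"]}
)

# mode() returns a Series because there can be ties; [0] takes the first one
most_common_city = customers["City"].mode()[0]
customers["City"] = customers["City"].fillna(most_common_city)
print(customers)
```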

c. Forward Fill / Backward Fill

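For time-based data like stock prices, use forward fill (copy the last known value forward) or backward fill (copy the next known value backward).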
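Forward fill and backward fill can be sketched as follows. The dates and prices are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical daily prices with gaps
dates = pd.date_range("2025-07-01", periods=5, freq="D")
prices = pd.Series([100.0, np.nan, np.nan, 103.0, np.nan], index=dates)

# Forward fill: carry the last known value forward
forward = prices.ffill()

# Backward fill: pull the next known value backward
# (the last day stays empty because no later value exists)
backward = prices.bfill()

print(pd.DataFrame({"original": prices, "ffill": forward, "bfill": backward}))
```

Make sure the data is sorted by time before filling, otherwise values get copied from the wrong rows.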

d. Constant Value

Sometimes, you can fill missing values with a constant like:

e. Predicting Missing Values

This is an advanced method where you use other data to predict missing values. For example, using a regression model or KNN (nearest neighbors). It takes extra effort but often gives more accurate values when the columns are related to each other.
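As a rough sketch of the regression idea, the code below fits a straight line with NumPy and uses it to fill the gaps. The `staff` table, the column names, and the choice of a simple one-variable line are all assumptions for illustration; a real project would likely use more columns and a proper model.

```python
import numpy as np
import pandas as pd

# Hypothetical data: income is missing for two people
staff = pd.DataFrame({
    "experience_years": [1, 2, 3, 4, 5, 6],
    "income": [30000, 35000, np.nan, 45000, np.nan, 55000],
})

missing = staff["income"].isna()
known = staff[~missing]

# Fit income = slope * experience_years + intercept on the known rows
slope, intercept = np.polyfit(
    known["experience_years"].to_numpy(),
    known["income"].to_numpy(),
    deg=1,
)

# Predict income only for the rows where it was missing
staff.loc[missing, "income"] = (
    slope * staff.loc[missing, "experience_years"] + intercept
)
print(staff)
```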

3. Using Algorithms That Handle Missing Data

Some machine learning algorithms, like XGBoost, can handle missing data on their own. But it’s still better to clean the data yourself for better control.

4. Creating a New Category for Missing Data

For categories, you can create a new value called “Missing” or “Unknown”. This way, you keep the data and let your model know it was missing.
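In pandas this is a one-line fill. The `orders` table below is made up for illustration.

```python
import pandas as pd

# Hypothetical data with missing cities
orders = pd.DataFrame({"City": ["Bangalore", None, "Chennai", None]})

# Treat "Missing" as its own category instead of guessing a city
orders["City"] = orders["City"].fillna("Missing")
print(orders["City"].value_counts())
```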

5. Adding a Missing Data Flag

Create a new column showing if data was missing (1) or not (0). This helps your model learn patterns related to missing data.
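A minimal sketch of a missing data flag, assuming a hypothetical `Age` column. The flag has to be created before the fill, because after filling there is nothing left to detect.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing ages
df = pd.DataFrame({"Age": [25, np.nan, 31, np.nan]})

# Step 1: record which rows were missing (1) or not (0)
df["Age_missing"] = df["Age"].isna().astype(int)

# Step 2: fill the original column, here with the median
df["Age"] = df["Age"].fillna(df["Age"].median())
print(df)
```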

Simple Example Using Python (pandas)

Here’s how you can handle missing data using Python:

```python
import pandas as pd

# Load data
df = pd.read_csv('data.csv')

# Check missing values in each column
print(df.isnull().sum())

# Fill missing ages with the median
# (assign the result back; inplace=True on a single column is unreliable in recent pandas)
df['Age'] = df['Age'].fillna(df['Age'].median())

# Fill missing city with the mode (most common value)
df['City'] = df['City'].fillna(df['City'].mode()[0])

# Drop rows where purchase is missing
df = df.dropna(subset=['Purchase'])
```

In the data science course in Bangalore by Apponix Academy, you will practice such techniques with real datasets.

Best Tips for Handling Missing Data

Mistakes to Avoid

  1. Ignoring missing data completely

  2. Removing too many rows and losing useful data

  3. Using the mean for skewed data instead of the median

  4. Filling the target (output) variable with guesses – never do this
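To see why mistake 3 matters, here is a small sketch with made-up incomes where one very large value pulls the mean far away from what most people earn:

```python
import pandas as pd

# Hypothetical skewed data: four typical incomes and one very large one
incomes = pd.Series([30000, 32000, 35000, 38000, 1000000])

mean_income = incomes.mean()      # pulled up by the single large value
median_income = incomes.median()  # stays close to the typical incomes

print("Mean:", mean_income)
print("Median:", median_income)
```

Filling a missing income with the mean here would give that person a value far above four of the five people in the data, while the median stays close to the typical case.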

Tools You Can Use

Here are some tools that make handling missing data easier:

At Apponix Academy, we teach these tools step by step to build your practical confidence.

Final Thoughts

Missing data is a normal part of working with real-world data. Don’t be scared of it. Just remember:

If you want to learn data cleaning, data preparation, and data science skills from scratch, join the data science course in Bangalore offered by Apponix Academy. You will practice these techniques with expert guidance and real projects.

Key Takeaways
