How To Handle Missing Data in a Pandas DataFrame

Real-world datasets almost always contain some missing values, and those gaps can distort an analysis or break code that expects complete data. In this article, we learn how to find missing values in a pandas DataFrame and then either drop them or replace them. Keep reading 💥

Load the Dataset into a Pandas DataFrame

🔥
Download this dataset – Car Dataset 🚗

Our dataset is in CSV format, so I am using the pandas read_csv method to read it. If you want to learn more about the pandas library in depth, read our end-to-end guide.

import pandas as pd
car_india_dataset = pd.read_csv("drive/MyDrive/Dataset/Cars_India_dataset.csv")
car_india_dataset.head()

Google colab output pandas dataframe
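If you don't have the car CSV at hand, here is a minimal sketch of what read_csv does with empty fields. The CSV text, column names, and values below are made up for illustration and are not taken from the real dataset.

```python
import io

import pandas as pd

# Hypothetical stand-in for the car CSV (made-up columns and values).
# Two fields are left empty on purpose.
csv_text = """Make,Model,Displacement
Tata,Nano,624
Maruti,Alto,
Hyundai,,1197
"""

# read_csv accepts any file-like object, so StringIO works like a file path.
sample = pd.read_csv(io.StringIO(csv_text))

# Empty fields are parsed as NaN, which is what pandas treats as "missing".
print(sample)
print(sample.shape)
```

The same thing happens with the real file: any blank cell in the CSV shows up as NaN in the DataFrame.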

Check for Missing Values in a Pandas DataFrame

Checking for missing values in pandas is easy because a single method, isnull(), does the job.

car_india_dataset.isnull() # Detect missing values in the DataFrame.

You will notice 💡 that the output contains only False and True, because the isnull() method returns a Boolean value for every cell. In simple terms, True means the cell is null, and False means it holds a value.

Google colab output pandas dataframe
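A full table of True and False is hard to read on a large dataset, so it often helps to count the missing values instead. The sketch below uses a small made-up DataFrame (the column names and values are assumptions, not the real car data) and chains sum() and any() onto isnull().

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the car dataset (made-up columns and values).
cars = pd.DataFrame({
    "Make": ["Tata", "Maruti", None, "Hyundai"],
    "Displacement": [624.0, np.nan, 1497.0, np.nan],
    "Fuel_Type": ["Petrol", "Petrol", "Diesel", "CNG"],
})

# True counts as 1 when summed, so this gives the null count per column.
missing_per_column = cars.isnull().sum()

# Total number of missing cells in the whole DataFrame.
total_missing = int(missing_per_column.sum())

# Number of rows that have at least one missing cell.
rows_with_missing = int(cars.isnull().any(axis=1).sum())

print(missing_per_column)
print(total_missing, rows_with_missing)
```

On the real dataset you would call the same methods on car_india_dataset.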

Drop All Missing Values from the DataFrame

car_india_dataset = car_india_dataset.dropna()
car_india_dataset

Note 🔥: Our DataFrame originally has 156 rows, but after applying the dropna() method we see only 127 rows. That is because dropna() removes every row that contains at least one null value.

pandas data frame
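Dropping every row with any null can throw away a lot of data, so it is worth knowing the subset and how parameters of dropna(). Here is a small sketch on a made-up DataFrame (hypothetical columns and values, not the real car data) that compares the three behaviours.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the car dataset (made-up columns and values).
cars = pd.DataFrame({
    "Make": ["Tata", "Maruti", None, "Hyundai"],
    "Displacement": [624.0, np.nan, 1497.0, np.nan],
    "Fuel_Type": ["Petrol", "Petrol", "Diesel", "CNG"],
})

# Default: drop a row if ANY of its cells is null.
dropped_any = cars.dropna()

# Only look at one column when deciding which rows to drop.
dropped_subset = cars.dropna(subset=["Displacement"])

# Drop a row only if ALL of its cells are null.
dropped_all = cars.dropna(how="all")

print(len(cars), len(dropped_any), len(dropped_subset), len(dropped_all))
```

Note that dropna() returns a new DataFrame, which is why the article assigns the result back to car_india_dataset.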

Second Method

Replace Null Values in a Pandas DataFrame

# Read car dataset 👇

import pandas as pd
car_india_dataset = pd.read_csv("drive/MyDrive/Dataset/Cars_India_dataset.csv")
car_india_dataset.head()

pandas data frame

Now it’s time to fill in all the missing values in our DataFrame, and this time we are using the pandas fillna() method.

car_india_dataset = car_india_dataset.fillna("unknown")
car_india_dataset

pandas data frame

You can see that every missing value in our dataset is now replaced with the word unknown.
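One side effect is worth checking before you move on: filling a numeric column with a string changes that column's data type. The sketch below shows this on a made-up DataFrame (the columns and values are assumptions for illustration).

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the car dataset (made-up columns and values).
cars = pd.DataFrame({
    "Make": ["Tata", None, "Hyundai"],
    "Displacement": [624.0, np.nan, 1197.0],
})

# Fill every missing cell in every column with the same string.
filled = cars.fillna("unknown")

# Before filling, Displacement is a numeric (float) column.
numeric_before = pd.api.types.is_numeric_dtype(cars["Displacement"])

# After filling, the column mixes numbers and text, so it is no longer numeric.
numeric_after = pd.api.types.is_numeric_dtype(filled["Displacement"])

print(numeric_before, numeric_after)
print(filled["Displacement"].dtype)
```

Once a column is no longer numeric, calculations such as an average on that column will not work as expected.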

If you use the fillna() method, make sure the value you fill in makes sense for the data. My suggestion is to apply this method one column at a time rather than to the whole DataFrame, because different columns hold different values and different data types [int, string, float]. See the code below for how to use this method correctly.

# read the dataset file 
import pandas as pd
car_india_dataset = pd.read_csv("drive/MyDrive/Dataset/Cars_India_dataset.csv")
car_india_dataset

pandas data frame

car_india_dataset["Displacement"] = car_india_dataset["Displacement"].fillna(3000)
car_india_dataset

pandas data frame missing value

You will notice in the Displacement column that all null values are replaced with 3000.
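If several columns need different fill values, fillna() also accepts a dictionary that maps column names to values. The sketch below is only an illustration on a made-up DataFrame: the column names are assumptions, and using the column median instead of a fixed number like 3000 is my own choice here, not something the dataset requires.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the car dataset (made-up columns and values).
cars = pd.DataFrame({
    "Make": ["Tata", "Maruti", None],
    "Displacement": [624.0, np.nan, 1197.0],
})

# One fill value per column, each matching that column's data type.
# median() skips NaN by default, so it is computed from 624.0 and 1197.0.
fill_values = {
    "Make": "unknown",
    "Displacement": cars["Displacement"].median(),
}

filled = cars.fillna(value=fill_values)

print(filled)
print(filled.dtypes)
```

Because each column gets a value of its own type, Displacement stays numeric and you can keep doing calculations on it.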

🔥
Thanks for reading! I hope you found this article helpful. If you have any questions, please ask me in the comments below and I will do my best to answer them. Suggestions are welcome too. To learn more about the pandas library end-to-end, read this article.
