In the real world, datasets usually come with some missing values, and these missing values can cause problems when we work with the data. So in this article, we learn how to remove or replace missing data in a pandas DataFrame. Keep reading💥
Load the Dataset into a Pandas DataFrame
Our dataset is in CSV format, so I am using the pandas read_csv method to read it. If you want to learn more about the Pandas library in depth, read our end-to-end guide.
import pandas as pd
car_india_dataset = pd.read_csv("drive/MyDrive/Dataset/Cars_India_dataset.csv")
car_india_dataset.head()
Check for Missing Values in a Pandas DataFrame
Checking for missing values in pandas is very easy because a single method does the job.
car_india_dataset.isnull() # Detect missing values in the DataFrame.
You will notice 💡 that the output contains only True and False, because the isnull() method returns a Boolean value for every cell. In simple terms, True means the value is null and False means it is not.
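A full table of True and False values is hard to read on a larger dataset, so it often helps to count the missing values instead. The snippet below is a minimal sketch on a small made-up DataFrame (the `cars` frame and its `Make` column are assumptions for illustration; only `Displacement` comes from the dataset used in this article).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the car dataset
cars = pd.DataFrame({
    "Make": ["Tata", "Honda", None, "Maruti"],
    "Displacement": [1199.0, np.nan, 1498.0, np.nan],
})

# Number of missing values in each column
missing_per_column = cars.isnull().sum()
print(missing_per_column)

# Total number of missing cells in the whole DataFrame
total_missing = int(cars.isnull().sum().sum())
print(total_missing)
```

The same two lines should work on car_india_dataset once it is loaded.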
Drop All Missing Values from a DataFrame
car_india_dataset = car_india_dataset.dropna()
car_india_dataset
Note 🔥: Our DataFrame originally has 156 rows, but after we apply the dropna() method only 127 rows remain. That is because dropna(), by default, removes every row that contains at least one null value.
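Dropping every row with any null value can throw away a lot of data. As a rough sketch, dropna() also accepts the `subset` and `how` parameters to narrow down which rows are removed. The `cars` frame below is made up for illustration, and the `Make` and `Price` columns are assumptions, not columns confirmed to exist in the article's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the car dataset
cars = pd.DataFrame({
    "Make": ["Tata", "Honda", None, None],
    "Displacement": [1199.0, np.nan, 1498.0, np.nan],
    "Price": [600000.0, 900000.0, 750000.0, np.nan],
})

# Default behaviour: drop every row that has at least one missing value
all_complete = cars.dropna()

# Drop a row only when "Displacement" is missing
has_displacement = cars.dropna(subset=["Displacement"])

# Drop a row only when every value in it is missing
not_fully_empty = cars.dropna(how="all")

print(len(cars), len(all_complete), len(has_displacement), len(not_fully_empty))
```

Which option fits best depends on which columns you actually need for your analysis.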
Second Method
👇
Replace Null Values in a Pandas DataFrame
# Read car dataset 👇
import pandas as pd
car_india_dataset = pd.read_csv("drive/MyDrive/Dataset/Cars_India_dataset.csv")
car_india_dataset.head()
Now it's time to fill in all the missing values in our DataFrame. This time we use the pandas fillna() method.
car_india_dataset = car_india_dataset.fillna("unknown")
car_india_dataset
You can see that every missing value in our dataset now shows the word unknown.
If you use the fillna() method, make sure to fill in a value that makes sense for the data. My suggestion is to fill one column at a time rather than the whole DataFrame at once, because different columns hold different kinds of values and different data types [int, string, float]. See the code below on how you can use this method correctly.
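One side effect worth checking: filling a numeric column with a string changes what the column can be used for. The sketch below uses a small made-up `cars` frame (the `Make` column is an assumption for illustration) to compare the column types before and after the fill.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the car dataset
cars = pd.DataFrame({
    "Make": ["Tata", None, "Maruti"],
    "Displacement": [1199.0, np.nan, 998.0],
})

# Before filling, "Displacement" is a numeric column
print(cars.dtypes)
numeric_before = pd.api.types.is_numeric_dtype(cars["Displacement"])

# Fill every missing cell with the same string
filled = cars.fillna("unknown")

# After filling, "Displacement" mixes numbers and text
print(filled.dtypes)
numeric_after = pd.api.types.is_numeric_dtype(filled["Displacement"])

print(numeric_before, numeric_after)
```

Once a column mixes numbers and text, calculations such as an average on that column are likely to fail.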
👇
# read the dataset file
import pandas as pd
car_india_dataset = pd.read_csv("drive/MyDrive/Dataset/Cars_India_dataset.csv")
car_india_dataset
car_india_dataset["Displacement"] = car_india_dataset['Displacement'].fillna(3000)
car_india_dataset
You will notice that all null values in the Displacement column are replaced with 3000.
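When several columns need different fill values, fillna() also accepts a dictionary that maps column names to values. The following is a minimal sketch on a made-up `cars` frame. The `Make` column and the choice of the median as the fill value are assumptions for illustration, not the approach used above with 3000.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the car dataset
cars = pd.DataFrame({
    "Make": ["Tata", None, "Maruti", "Honda"],
    "Displacement": [1199.0, np.nan, 998.0, 1498.0],
})

# One fill value per column, each matching the column's data type
fill_values = {
    "Make": "unknown",
    "Displacement": cars["Displacement"].median(),
}

filled = cars.fillna(value=fill_values)
print(filled)
```

This keeps the numeric column numeric while still giving the text column a readable placeholder.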