Reindexing in Pandas DataFrame
Last Updated: 31 Jul, 2023
Reindexing in Pandas changes the row labels and column labels of a DataFrame so that it conforms to a new index. The same operation works on the Index objects attached to any pandas Series or DataFrame. Let's see how to reindex the rows and columns of a Pandas DataFrame.
Reindexing the Rows
One or more rows can be reindexed with the reindex() method, which returns a new DataFrame and leaves the original unchanged. Any label in the new index that is not present in the DataFrame gets a row filled with NaN by default.
Example #1:
Python3
import pandas as pd
import numpy as np

column = ['a', 'b', 'c', 'd', 'e']
index = ['A', 'B', 'C', 'D', 'E']

# Create a 5x5 DataFrame of random values
df1 = pd.DataFrame(np.random.rand(5, 5),
                   columns=column, index=index)
print(df1)

# Reorder the rows by passing the labels in the desired order
print('\n\nDataframe after reindexing rows: \n',
      df1.reindex(['B', 'D', 'A', 'C', 'E']))
Output:
          a         b         c         d         e
A  0.129087  0.445892  0.898532  0.892862  0.760018
B  0.635785  0.380769  0.757578  0.158638  0.568341
C  0.713786  0.069223  0.011263  0.166751  0.960632
D  0.913553  0.676715  0.141932  0.202201  0.346274
E  0.050204  0.132140  0.371349  0.633203  0.791738

Dataframe after reindexing rows:
          a         b         c         d         e
B  0.635785  0.380769  0.757578  0.158638  0.568341
D  0.913553  0.676715  0.141932  0.202201  0.346274
A  0.129087  0.445892  0.898532  0.892862  0.760018
C  0.713786  0.069223  0.011263  0.166751  0.960632
E  0.050204  0.132140  0.371349  0.633203  0.791738
Example #2:
Python3
import pandas as pd
import numpy as np

column = ['a', 'b', 'c', 'd', 'e']
index = ['A', 'B', 'C', 'D', 'E']

df1 = pd.DataFrame(np.random.rand(5, 5),
                   columns=column, index=index)

# 'U' and 'Z' do not exist in df1, so their rows are filled with NaN;
# 'D' and 'E' are left out of the new index, so they are dropped
new_index = ['U', 'A', 'B', 'C', 'Z']
print(df1.reindex(new_index))
Output:
          a         b         c         d         e
U       NaN       NaN       NaN       NaN       NaN
A  0.523572  0.378545  0.871649  0.980319  0.397569
B  0.796003  0.516602  0.839177  0.835811  0.831672
C  0.160613  0.833154  0.810910  0.771017  0.225579
Z       NaN       NaN       NaN       NaN       NaN
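Because the examples above use random data, the numbers change on every run. The following is a minimal sketch with fixed values (the frame and its labels are made up for illustration) that shows the same two behaviours in a reproducible way: a label missing from the original index produces a NaN row, and reindex() returns a new object instead of modifying the original.

```python
import pandas as pd

# Illustrative frame with fixed values so the result is reproducible
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]},
                  index=["A", "B", "C"])

# 'Z' is not in the original index, so its row is filled with NaN;
# 'B' is not requested, so it does not appear in the result
reindexed = df.reindex(["C", "A", "Z"])

print(reindexed)
print(df.index.tolist())  # the original DataFrame keeps its own index
```

To keep the reindexed result, assign it to a variable as done here.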
Reindexing the columns using the axis keyword
One or more columns can be reindexed with the reindex() method by specifying the axis to reindex (axis='columns'). Any label in the new column index that is not present in the DataFrame gets a column filled with NaN by default.
Example #1:
Python3
import pandas as pd
import numpy as np

column = ['a', 'b', 'c', 'd', 'e']
index = ['A', 'B', 'C', 'D', 'E']

df1 = pd.DataFrame(np.random.rand(5, 5),
                   columns=column, index=index)

# Reorder the columns by passing the labels in the desired order
column = ['e', 'a', 'b', 'c', 'd']
print(df1.reindex(column, axis='columns'))
Output:
          e         a         b         c         d
A  0.592727  0.337282  0.686650  0.916076  0.094920
B  0.235794  0.030831  0.286443  0.705674  0.701629
C  0.882894  0.299608  0.476976  0.137256  0.306690
D  0.758996  0.711712  0.961684  0.235051  0.315928
E  0.911693  0.436031  0.822632  0.477767  0.778608
Example #2:
Python3
import pandas as pd
import numpy as np

column = ['a', 'b', 'c', 'd', 'e']
index = ['A', 'B', 'C', 'D', 'E']

df1 = pd.DataFrame(np.random.rand(5, 5),
                   columns=column, index=index)

# 'g' and 'h' do not exist in df1, so those columns are filled with NaN
column = ['a', 'b', 'c', 'g', 'h']
print(df1.reindex(column, axis='columns'))
Output:
          a         b         c   g   h
A  0.390460  0.795073  0.369077 NaN NaN
B  0.855556  0.856980  0.132092 NaN NaN
C  0.662565  0.230554  0.215567 NaN NaN
D  0.712128  0.424346  0.813452 NaN NaN
E  0.543142  0.847750  0.168018 NaN NaN
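The axis keyword is one way to target the columns. As a small sketch on a made-up frame, reindex() also accepts index= and columns= keywords, so the column case can be written without axis, and rows and columns can be reindexed in a single call.

```python
import pandas as pd

# Illustrative frame; the labels and values are assumptions for this sketch
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]},
                  index=["A", "B"])

# Two spellings of the same column reindex
by_axis = df.reindex(["c", "a", "x"], axis="columns")
by_keyword = df.reindex(columns=["c", "a", "x"])
print(by_axis.equals(by_keyword))

# Reindex rows and columns together
both = df.reindex(index=["B", "A"], columns=["b", "a"])
print(both)
```

The keyword form can be easier to read when both axes change, since each list is named for the axis it applies to.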
Replacing the missing values
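Code #1: Missing values introduced by reindexing can be filled by passing a value to the fill_value keyword. The new rows or columns then contain this value instead of NaN. NaN values that were already present in the original data are not affected.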
Python3
import pandas as pd
import numpy as np

column = ['a', 'b', 'c', 'd', 'e']
index = ['A', 'B', 'C', 'D', 'E']

df1 = pd.DataFrame(np.random.rand(5, 5),
                   columns=column, index=index)

# The new columns 'g' and 'h' are filled with 1.5 instead of NaN
column = ['a', 'b', 'c', 'g', 'h']
print(df1.reindex(column, axis='columns', fill_value=1.5))
Output:
          a         b         c    g    h
A  0.945594  0.492603  0.705738  1.5  1.5
B  0.794345  0.068308  0.017898  1.5  1.5
C  0.622142  0.880565  0.035528  1.5  1.5
D  0.577288  0.934063  0.824655  1.5  1.5
E  0.636026  0.316232  0.244597  1.5  1.5
Code #2: Replacing the missing data with a string.
Python3
import pandas as pd
import numpy as np

column = ['a', 'b', 'c', 'd', 'e']
index = ['A', 'B', 'C', 'D', 'E']

df1 = pd.DataFrame(np.random.rand(5, 5),
                   columns=column, index=index)

# The new columns 'g' and 'h' are filled with a string placeholder
column = ['a', 'b', 'c', 'g', 'h']
print(df1.reindex(column, axis='columns', fill_value='data missing'))
Output:
          a         b         c             g             h
A  0.227380  0.809179  0.879175  data missing  data missing
B  0.212493  0.335610  0.306006  data missing  data missing
C  0.406346  0.852985  0.422182  data missing  data missing
D  0.145821  0.648285  0.004842  data missing  data missing
E  0.002305  0.694541  0.657602  data missing  data missing
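One detail worth checking is which cells fill_value actually touches. The sketch below uses a small made-up frame that already contains a NaN and suggests that only the cells created by the reindex receive the fill value, while the pre-existing NaN is left as it is. To replace existing NaN values as well, a separate call such as fillna() would be needed.

```python
import numpy as np
import pandas as pd

# Illustrative frame with one NaN already present at row 'B', column 'a'
df = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]},
                  index=["A", "B"])

# 'g' is new, so it is filled with 0.0; the existing NaN is not replaced
filled = df.reindex(["a", "b", "g"], axis="columns", fill_value=0.0)
print(filled)

# A string fill value works the same way for the new column
labelled = df.reindex(["a", "g"], axis="columns", fill_value="data missing")
print(labelled)
print(labelled.dtypes)  # inspect how the new column is stored
```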