How to Drop Rows that Contain a Specific String in Pandas?
Last Updated :
03 Dec, 2023
In Pandas, we can drop rows from a DataFrame when a particular column contains a specific string. In this article, we will see how to do this for a single string, for more than one string, and for a partial string.
Eliminating Rows Containing a Specific String
The str.contains() function searches for a string in the given column and returns a boolean Series that marks the matching rows. To drop those rows, we create a new DataFrame by keeping only the rows where the result is False.
Syntax:
df[df["column"].str.contains("someString") == False]
Creating a Sample Pandas DataFrame
Here, we will create a sample DataFrame that we will use in further examples.
Python3
import pandas as pd

# Sample data: six rows across three teams
df = pd.DataFrame({'team': ['Team 1', 'Team 1', 'Team 2',
                            'Team 3', 'Team 2', 'Team 3'],
                   'Subject': ['Math', 'Science', 'Science',
                               'Math', 'Science', 'Math'],
                   'points': [10, 8, 10, 6, 6, 5]})
df
Output:
     team  Subject  points
0  Team 1     Math      10
1  Team 1  Science       8
2  Team 2  Science      10
3  Team 3     Math       6
4  Team 2  Science       6
5  Team 3     Math       5
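To make the filtering step concrete, here is a minimal sketch that prints the boolean mask str.contains() produces for the sample DataFrame. It also compares the == False form with the ~ (logical NOT) operator. The variable names mask, kept_eq and kept_not are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'team': ['Team 1', 'Team 1', 'Team 2', 'Team 3', 'Team 2', 'Team 3'],
    'Subject': ['Math', 'Science', 'Science', 'Math', 'Science', 'Math'],
    'points': [10, 8, 10, 6, 6, 5],
})

# True where the 'team' value contains the substring "Team 1"
mask = df['team'].str.contains('Team 1')
print(mask.tolist())  # [True, True, False, False, False, False]

# Two ways to keep only the rows that do NOT match
kept_eq = df[mask == False]
kept_not = df[~mask]
print(kept_eq.equals(kept_not))  # True
```

When the column has no missing values, as here, both forms select the same rows.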
Drop Rows that Contain a Specific String in Pandas
Below are the ways in which we can drop rows that contain a specific string in Pandas:
- Dropping the rows that contain a specific string
- Dropping the rows with more than one string
- Drop rows with the given partial string
Dropping the Rows that Contain a Specific String
In this method, we find the matching rows with the str.contains() function, which checks each value in the Series for the given string and returns True or False. Comparing the result with False selects the rows that do not match, so the matching rows are dropped and the remaining rows are kept.
Syntax: df[df["column_name"].str.contains("string") == False]
In the following example, we are going to select all the teams except “Team 1”.
Python3
import pandas as pd

df = pd.DataFrame({'team': ['Team 1', 'Team 1', 'Team 2',
                            'Team 3', 'Team 2', 'Team 3'],
                   'Subject': ['Math', 'Science', 'Science',
                               'Math', 'Science', 'Math'],
                   'points': [10, 8, 10, 6, 6, 5]})

# Keep only the rows whose 'team' value does not contain "Team 1"
df = df[df["team"].str.contains("Team 1") == False]
df
Output:
     team  Subject  points
2  Team 2  Science      10
3  Team 3     Math       6
4  Team 2  Science       6
5  Team 3     Math       5
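The sample data has no missing values, but real columns often do. Depending on the pandas version and the column dtype, a missing value can carry through into the result of str.contains(). The row may then be dropped unintentionally, or negating the mask may fail. The sketch below passes na=False so that missing values count as "does not contain". The teams DataFrame is hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical data with a missing team name
teams = pd.DataFrame({
    'team': ['Team 1', None, 'Team 2'],
    'points': [10, 8, 6],
})

# na=False treats missing values as non-matching, so the mask stays boolean
mask = teams['team'].str.contains('Team 1', na=False)

# Drop only the rows that actually contain "Team 1"
result = teams[~mask]
print(result.index.tolist())  # [1, 2]
```

Whether a row with a missing value should be kept or dropped depends on the use case; na=True would drop it instead.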
Dropping the Rows with More Than One String
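As in the previous method, we follow the same steps here, but separate the strings to search for with the | character. Because str.contains() treats the pattern as a regular expression by default, | acts as the regex alternation ("or") operator, so a row matches if it contains any of the listed strings.
Syntax: df = df[df["column_name"].str.contains("string1|string2") == False]
In the following program, we are going to drop the rows that contain “Team 1” or “Team 2”.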
Python3
import pandas as pd

df = pd.DataFrame({'team': ['Team 1', 'Team 1', 'Team 2',
                            'Team 3', 'Team 2', 'Team 3'],
                   'Subject': ['Math', 'Science', 'Science',
                               'Math', 'Science', 'Math'],
                   'points': [10, 8, 10, 6, 6, 5]})

# Keep only the rows whose 'team' value contains neither "Team 1" nor "Team 2"
df = df[df["team"].str.contains("Team 1|Team 2") == False]
df
Output:
     team Subject  points
3  Team 3    Math       6
5  Team 3    Math       5
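Since the pattern is interpreted as a regular expression, strings that contain characters such as ".", "(" or "+" may not match literally. One possible safeguard, sketched here with hypothetical team names, is to pass each string through re.escape() before joining them with "|".

```python
import re
import pandas as pd

# Hypothetical data: some names contain regex metacharacters
df = pd.DataFrame({'team': ['Team 1', 'Team (A)', 'Team 2', 'Team 1.5']})

# Literal strings to exclude
discard = ['Team (A)', 'Team 1.5']

# re.escape makes each entry match literally before joining with "|"
pattern = '|'.join(re.escape(s) for s in discard)

result = df[~df['team'].str.contains(pattern)]
print(result['team'].tolist())  # ['Team 1', 'Team 2']
```

For a single literal string, passing regex=False to str.contains() is another option that avoids regular expressions altogether.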
Drop Rows With the Given Partial String
Here we use the same function, with the join() method building the search pattern from a list of partial strings.
Syntax: df[~df.column_name.str.contains('|'.join(["string"]))]
This case differs from the previous two: here we drop the rows whose values contain a given partial string. For example, we are going to drop the rows with “Sci” in the Subject column.
Python3
import pandas as pd

df = pd.DataFrame({'team': ['Team 1', 'Team 1', 'Team 2',
                            'Team 3', 'Team 2', 'Team 3'],
                   'Subject': ['Math', 'Science', 'Science',
                               'Math', 'Science', 'Math'],
                   'points': [10, 8, 10, 6, 6, 5]})

# Partial strings to search for
discard = ["Sci"]

# Assign the filtered result back to df; otherwise df is left unchanged
df = df[~df.Subject.str.contains('|'.join(discard))]
df
Output:
     team Subject  points
0  Team 1    Math      10
3  Team 3    Math       6
5  Team 3    Math       5
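By default, str.contains() is case-sensitive, so "Sci" would not match "science" or "SCIENCE". If the casing of the data is inconsistent, passing case=False is one way to handle it. The sketch below assumes hypothetical Subject values chosen only to show the effect.

```python
import pandas as pd

# Hypothetical data with mixed casing in the Subject column
df = pd.DataFrame({
    'Subject': ['Math', 'Science', 'science', 'SCIENCE lab', 'Math'],
    'points': [10, 8, 10, 6, 5],
})

# case=False makes the partial-string check ignore letter case
result = df[~df['Subject'].str.contains('sci', case=False)]
print(result.index.tolist())  # [0, 4]
```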