How to Make an Email Extractor in Python?

Last Updated : 21 Mar, 2024

In this article, we will see how to extract all the valid email addresses from a text using Python and regular expressions (regex).

A regular expression (shortened as regex or regexp, and sometimes called a rational expression) is a sequence of characters that defines a search pattern. Such patterns are typically used by string-searching algorithms for “find” or “find and replace” operations on strings, or for input validation.
It is a technique developed in theoretical computer science and formal language theory.
The re module provides full support for Perl-like regular expressions in Python. It offers a set of functions for searching a string for a match.
The re.findall() function defined in the re module takes two required parameters and returns a list of all the non-overlapping matches found in the string.

Syntax: re.findall( regex , string )

Parameters:

The regex is the regular expression, built from predefined symbols, that describes the pattern we are looking for.

The string is the original text in which the search is performed.
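
As a quick illustration of how re.findall() behaves, here is a minimal sketch. The patterns and input strings are made up for this example and are not part of the email regex discussed in this article. It shows that matches come back as a list of strings, that no match gives an empty list, and that a pattern containing capture groups returns the groups instead of the whole match.

```python
import re

# findall returns every non-overlapping match, scanning left to right.
words = re.findall(r"[a-z]+", "one 22 two 333 three")
print(words)  # ['one', 'two', 'three']

# When nothing matches, the result is an empty list (not None).
no_match = re.findall(r"\d+", "no digits here")
print(no_match)  # []

# With capture groups in the pattern, findall returns the groups
# (as tuples when there is more than one group), not the full match.
pairs = re.findall(r"(\w+)@(\w+)", "a@b c@d")
print(pairs)  # [('a', 'b'), ('c', 'd')]
```

The last case matters for email extraction: a pattern written without parentheses returns whole addresses, while adding groups changes what ends up in the list.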

After importing the re module, we call its findall() function to find all the substrings that match the regular expression passed as a parameter.

The regular expression can be divided into three parts:

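1. r"[A-Za-z0-9_%+-.]+"

This part matches a continuous run of one or more characters drawn from the uppercase letters A-Z, the lowercase letters a-z, the digits 0-9, and the special characters _ % + - and . (because +-. is written as a range inside the brackets, a comma is accepted as well). The trailing ‘+’ is a quantifier meaning “one or more of the preceding set”; it does not join this part to the next one. The three parts are combined simply by writing them one after another.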

2. r"@[A-Za-z0-9.-]+"

This part matches the @ sign followed by a continuous run of one or more characters drawn from the uppercase letters A-Z, the lowercase letters a-z, the digits 0-9, and the special characters . and -. The ‘+’ is a quantifier meaning “one or more of the preceding set”.

3. r"\.[A-Za-z]{2,5}"

This part matches a literal dot followed by a continuous run of letters (A-Z or a-z) whose length is between 2 and 5, both inclusive. It corresponds to the final part of the domain, such as .com or .in.
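
One detail of the first part can be checked directly: inside square brackets, a hyphen between two characters defines a range, so +-. covers every character from + to ., which includes the comma. The sketch below demonstrates this and compares the article's pattern with a variant that moves the hyphen to the end of the class. The variant and the sample sentence are assumptions made for illustration, not part of the original code.

```python
import re

# "+-." inside brackets is a range from "+" (0x2B) to "." (0x2E),
# so it accepts "," (0x2C) in addition to "+", "-" and ".".
range_hits = re.findall(r"[+-.]", "+,-./")
print(range_hits)  # ['+', ',', '-', '.']

# The three parts from the article, written as one pattern.
original = r"[A-Za-z0-9_%+-.]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,5}"

# Hypothetical variant: the hyphen is placed last, so it is a literal
# hyphen and no longer forms a range.
variant = r"[A-Za-z0-9_%+.-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,5}"

text = "Send to alice@example.com,bob@example.org"
original_hits = re.findall(original, text)
variant_hits = re.findall(variant, text)

print(original_hits)  # ['alice@example.com', ',bob@example.org']
print(variant_hits)   # ['alice@example.com', 'bob@example.org']
```

For text where addresses are separated by spaces, as in the examples in this article, both patterns give the same result. The difference only shows up when a comma sits directly next to an address.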

Example 1: Extract valid emails from a string

Python3

# Import the regex module
import re

# Raw text (adjacent string literals are joined into one string)
text = ("Duis info@geeksforgeeks.com convallis. Parturient montes nascetur ridiculus mus "
        "geeksforgeeks@rocks.xyz mauris. Odio eu feugiat pre@rsos_tium.index nibh ipsum consequat love@gfg.in "
        "pretium aenean pharetra magna ac placerat. Vitae justo eget magna fermentum iaculis eu non.")

# Find all valid emails using the three-part regex
reg = re.findall(r"[A-Za-z0-9_%+-.]+"
                 r"@[A-Za-z0-9.-]+"
                 r"\.[A-Za-z]{2,5}", text)

# Print all the valid emails found
print(reg)

Output:

['info@geeksforgeeks.com', 'geeksforgeeks@rocks.xyz', 'love@gfg.in']
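
If the same text is searched repeatedly, or if repeated addresses should be reported only once, the pattern can be compiled once and wrapped in a small helper. The following is a minimal sketch under those assumptions. The names EMAIL_PATTERN and extract_emails, and the sample sentence, are hypothetical and chosen only for this illustration.

```python
import re

# The same three-part pattern as above, compiled once for reuse.
EMAIL_PATTERN = re.compile(
    r"[A-Za-z0-9_%+-.]+"
    r"@[A-Za-z0-9.-]+"
    r"\.[A-Za-z]{2,5}"
)


def extract_emails(text):
    """Return the unique matches in order of first appearance."""
    # dict.fromkeys keeps insertion order (Python 3.7+) and drops repeats.
    return list(dict.fromkeys(EMAIL_PATTERN.findall(text)))


sample = "Write to love@gfg.in or info@geeksforgeeks.com; love@gfg.in replies fastest."
unique_emails = extract_emails(sample)
print(unique_emails)  # ['love@gfg.in', 'info@geeksforgeeks.com']
```

Using dict.fromkeys instead of set keeps the addresses in the order in which they appear in the text.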

Example 2: Extract valid emails from a text file

Using the open() function, we open the required file in “r” (read-only) mode. We then strip each line to remove the surrounding whitespace and process it in the same way as in the first example.

Python3

# Import the regex module
import re

# Collect the matches from every line of the file
emails = []

with open('sample.txt', 'r') as file:
    for line in file:
        line = line.strip()

        # Find all valid emails in the current line
        emails.extend(re.findall(r"[A-Za-z0-9_%+-.]+"
                                 r"@[A-Za-z0-9.-]+"
                                 r"\.[A-Za-z]{2,5}", line))

# Print all the valid emails found
print(emails)

Output:

['info@geeksforgeeks.com', 'geeksforgeeks@rocks.xyz', 'love@gfg.in']
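
To try the file-based approach without an existing sample.txt, the sketch below first writes a small throwaway file and then reads it back line by line. The file contents, the helper name emails_in_file, and the use of a temporary directory are all assumptions made for this illustration.

```python
import re
import tempfile
from pathlib import Path

EMAIL_PATTERN = re.compile(r"[A-Za-z0-9_%+-.]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,5}")


def emails_in_file(path):
    """Return all matches in the file at `path`, in order of appearance."""
    found = []
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            # Add the matches from this line to the running list.
            found.extend(EMAIL_PATTERN.findall(line))
    return found


# Build a throwaway file so the example runs on its own.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "sample.txt"
    sample_path.write_text(
        "Contact info@geeksforgeeks.com today.\n"
        "No address on this line.\n"
        "Or write to love@gfg.in instead.\n",
        encoding="utf-8",
    )
    file_emails = emails_in_file(sample_path)

print(file_emails)  # ['info@geeksforgeeks.com', 'love@gfg.in']
```

Reading line by line keeps memory use low for large files, at the cost of missing an address that is split across two lines.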
