Open In App

Controlling the Web Browser with Python

Improve
Improve
Like Article
Like
Save
Share
Report

In this article, we are going to see how to control the web browser with Python using selenium. Selenium is an open-source tool that automates web browsers. It provides a single interface that lets you write test scripts in programming languages like Ruby, Java, NodeJS, PHP, Perl, Python, and C#, etc.

To install this module, run these commands into your terminal:

pip install selenium

For automation please download the latest Google Chrome along with chromedriver from here.

Here we will automate the authorization at “https://auth.geeksforgeeks.org” and extract the Name, Email, Institute name from the logged-in profile.

Initialization and Authorization

First, we need to initiate the web driver using selenium and send a get request to the url and Identify the HTML document and find the input tags and button tags that accept username/email, password, and sign-in button.

To send the user given email and password to the input tags respectively:

driver.find_element_by_name('user').send_keys(email)
driver.find_element_by_name('pass').send_keys(password)

Identify the button tag and click on it using the CSS selector via selenium webdriver:

 driver.find_element_by_css_selector(‘button.btn.btn-green.signin-button’).click()

Scraping Data

Scraping Basic Information from GFG Profile

After clicking on Sign in, a new page should be loaded containing the Name, Institute Name, and Email id. Identify the tags containing the above data and select them.

container = driver.find_elements_by_css_selector(‘div.mdl-cell.mdl-cell–9-col.mdl-cell–12-col-phone.textBold’)

Get the text from each of these tags from the returned list of selected css selectors:

name = container[0].text
try:
    institution = container[1].find_element_by_css_selector('a').text
except:
    institution = container[1].text
email_id = container[2].text

Finally, print the output:

print({"Name": name, "Institution": institution, "Email ID": email})

Scraping Information from Practice tab

Click on the Practice tab and wait for few seconds to load the page.

driver.find_elements_by_css_selector('a.mdl-navigation__link')[1].click()

Find the container containing all the information and select the grids using CSS selector from the container having information.

container = driver.find_element_by_css_selector(‘div.mdl-cell.mdl-cell–7-col.mdl-cell–12-col-phone.whiteBgColor.mdl-shadow–2dp.userMainDiv’)

grids = container.find_elements_by_css_selector(‘div.mdl-grid’)

Iterate each of the selected grids and extract the text from it and add it to a set/list for output.

res = set()
for grid in grids:
    res.add(grid.text.replace('\n',':'))

Below is the full implementation:

Python3




# Import the required modules
from selenium import webdriver
import time
  
# Main Function
if __name__ == '__main__':
  
    # Provide the email and password
    email = 'example@example.com'
    password = 'password'
  
    options = webdriver.ChromeOptions()
    options.add_argument("--start-maximized")
    options.add_argument('--log-level=3')
  
    # Provide the path of chromedriver present on your system.
    driver = webdriver.Chrome(executable_path="C:/chromedriver/chromedriver.exe",
                              chrome_options=options)
    driver.set_window_size(1920,1080)
  
    # Send a get request to the url
    driver.get('https://auth.geeksforgeeks.org/')
    time.sleep(5)
  
    # Finds the input box by name in DOM tree to send both 
    # the provided email and password in it.
    driver.find_element_by_name('user').send_keys(email)
    driver.find_element_by_name('pass').send_keys(password)
      
    # Find the signin button and click on it.
    driver.find_element_by_css_selector(
        'button.btn.btn-green.signin-button').click()
    time.sleep(5)
  
    # Returns the list of elements
    # having the following css selector.
    container = driver.find_elements_by_css_selector(
        'div.mdl-cell.mdl-cell--9-col.mdl-cell--12-col-phone.textBold')
      
    # Extracts the text from name, 
    # institution, email_id css selector.
    name = container[0].text
    try:
        institution = container[1].find_element_by_css_selector('a').text
    except:
        institution = container[1].text
    email_id = container[2].text
  
    # Output Example 1
    print("Basic Info")
    print({"Name": name, 
           "Institution": institution,
           "Email ID": email})
  
    # Clicks on Practice Tab
    driver.find_elements_by_css_selector(
      'a.mdl-navigation__link')[1].click()
    time.sleep(5)
  
    # Selected the Container containing information
    container = driver.find_element_by_css_selector(
      'div.mdl-cell.mdl-cell--7-col.mdl-cell--12-col-phone.\
      whiteBgColor.mdl-shadow--2dp.userMainDiv')
      
    # Selected the tags from the container
    grids = container.find_elements_by_css_selector(
      'div.mdl-grid')
      
    # Iterate each tag and append the text extracted from it.
    res = set()
    for grid in grids:
        res.add(grid.text.replace('\n',':'))
  
    # Output Example 2
    print("Practice Info")
    print(res)
  
    # Quits the driver
    driver.close()
    driver.quit()


Output:



Last Updated : 23 Sep, 2021
Like Article
Save Article
Previous
Next
Share your thoughts in the comments
Similar Reads