Architecture of Selenium Webdriver

We have already seen a detailed explanation of Selenium. To summarize, Selenium is a suite of tools that automates browsers; in other words, it automates the actions a user performs in a browser.

For example: you launch a browser and load the Gmail URL. You provide a username and password and get signed in. All of these actions are performed within a browser. Selenium can automate the same flow and much more. We will see everything in detail.

Before that, we must understand the internal architecture of Selenium. In this post, we will discuss the architecture of Selenium WebDriver.

Selenium provides language bindings for Java, C#, Python, JavaScript, etc. We can run our automated test scripts across different platforms and browsers. So the question is: how does Selenium WebDriver do this? How does a browser understand a statement written in a programming language? How is all this communication interpreted? We can answer all of these questions once we understand the architecture of Selenium WebDriver.

Let's start with an introduction to APIs:

An application programming interface (API) is a defined set of operations that acts as a communication channel, so that one software program can communicate with other software programs. It acts as an interface between different software programs and gives them a way to interact. It is similar to the way a user interface (UI) facilitates interaction between humans and computers.

Didn’t understand? In simple words, an API is just a medium of communication among software applications/programs. Read about APIs in much more detail here.

We have several programming language bindings for Selenium, such as Java, C#, Python, JavaScript, etc., and we also have several browsers, such as Chrome, Firefox, Edge, Safari, etc. Selenium WebDriver is a set of well-designed object-oriented APIs that helps these languages and browsers communicate.

Every browser may have different internal logic for performing actions such as loading a webpage (URL), closing the browser, getting the title, clicking on an element, etc. Selenium WebDriver plays the role of mediator so that a programming language and a browser can communicate easily. Programming statements send commands to browsers through the Selenium WebDriver APIs, and browsers send their responses back the same way. This is the reason we have different language bindings for Selenium WebDriver.
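To make the mediator idea concrete, here is a minimal Python sketch. It does not use Selenium itself: `BrowserBackend`, `ChromeLikeBackend`, `FirefoxLikeBackend`, `MediatorDriver` and `run_script` are hypothetical names invented for illustration, and the "browsers" only return strings. It simply shows how one common set of commands can hide different internal logic.

```python
from abc import ABC, abstractmethod


class BrowserBackend(ABC):
    """Hypothetical stand-in for one browser's internal logic."""

    @abstractmethod
    def load(self, url: str) -> str:
        ...


class ChromeLikeBackend(BrowserBackend):
    def load(self, url: str) -> str:
        # Pretend this browser navigates in its own way.
        return f"chrome-like: navigated to {url}"


class FirefoxLikeBackend(BrowserBackend):
    def load(self, url: str) -> str:
        # A different browser, a different internal implementation.
        return f"firefox-like: opened {url}"


class MediatorDriver:
    """Hypothetical mediator exposing one command set for every backend."""

    def __init__(self, backend: BrowserBackend) -> None:
        self._backend = backend

    def get(self, url: str) -> str:
        # The script always calls get(); the backend decides how to do it.
        return self._backend.load(url)


def run_script(driver: MediatorDriver) -> str:
    # The same "test script" works unchanged for any browser.
    return driver.get("https://example.com")
```

Calling `run_script(MediatorDriver(ChromeLikeBackend()))` and `run_script(MediatorDriver(FirefoxLikeBackend()))` runs the same script against two different backends, which is the role the post describes for Selenium WebDriver.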

The Selenium WebDriver APIs cannot communicate with browsers directly either. They also need a mediator. Selenium WebDriver requires a browser-specific executable file (a browser-specific server, e.g. chromedriver.exe for Chrome on Windows). Selenium WebDriver launches the browser-specific server first, then sends the instructions provided by the programming statements, such as loading a URL, to the launched server. To be more technical, the WebDriver APIs that communicate with the browser use a common wire protocol. This wire protocol defines a RESTful web service using JSON over HTTP. The response from the browser after a command is executed is also sent back to the Selenium WebDriver API through the same server.
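To give a feel for what "JSON over HTTP" means here, the sketch below builds (but does not send) two requests. The endpoint paths are taken from the W3C WebDriver specification, which is an assumption on my part since this post does not name any endpoints; `WireRequest`, `navigate_request`, `title_request` and `parse_response` are hypothetical helper names, not part of Selenium.

```python
import json
from typing import NamedTuple, Optional


class WireRequest(NamedTuple):
    """Hypothetical container for one HTTP request to a browser driver."""
    method: str
    path: str
    body: Optional[str]


def navigate_request(session_id: str, url: str) -> WireRequest:
    # W3C WebDriver "Navigate To": POST /session/{session id}/url
    return WireRequest("POST", f"/session/{session_id}/url", json.dumps({"url": url}))


def title_request(session_id: str) -> WireRequest:
    # W3C WebDriver "Get Title": GET /session/{session id}/title
    return WireRequest("GET", f"/session/{session_id}/title", None)


def parse_response(raw: str):
    # Responses carry their result in a JSON object under the "value" key.
    return json.loads(raw)["value"]
```

So a statement that loads a URL ends up as a small JSON body posted to a path on the browser-specific server, and the answer comes back as JSON as well.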

Refer to the picture below for a pictorial representation:

Now we can say that the Selenium WebDriver architecture consists of four layers:

  1. Language binding: To support multiple languages, the Selenium developers have built language bindings. If you want to use the browser driver from Java, use the Java bindings for Selenium WebDriver. If you want to use the browser driver from C#, Ruby or Python, use the binding for that language. All language bindings can be downloaded from the official Selenium website.
  2. Selenium WebDriver: It is a set of APIs that makes communication between programming languages and browsers possible. It has specific commands to perform actions on browsers, such as loading a URL.
  3. Browser drivers: A browser driver can be thought of as a boss's personal secretary. A browser driver helps the Selenium WebDriver APIs communicate with the browser without revealing the internal logic of the browser’s functionality. The browser driver is the same regardless of the language used for automation.
  4. Browser: It is where the actions are performed.

When the automation script is executed, the following steps happen internally:

  1. An HTTP request is created and sent to the browser driver for each Selenium instruction or command.
  2. The browser driver receives the HTTP request through its HTTP server.
  3. The HTTP server decides the steps needed to carry out the instruction, and these steps are executed on the browser.
  4. The execution status is sent back to the HTTP server, which sends it back to the automation script.
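The four steps can be tried out with a toy round trip built only on Python's standard library. This is a rough sketch under stated assumptions, not how a real browser driver is implemented: `FakeBrowser`, `FakeDriverHandler`, `send_command` and `demo` are hypothetical names, the "browser" just remembers a URL, and the `/session/demo/url` path is used only for illustration.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeBrowser:
    """Hypothetical stand-in for a real browser."""

    def __init__(self) -> None:
        self.current_url = "about:blank"

    def navigate(self, url: str) -> None:
        self.current_url = url


class FakeDriverHandler(BaseHTTPRequestHandler):
    """Plays the browser driver: an HTTP server in front of the browser."""

    browser = FakeBrowser()

    def do_POST(self) -> None:
        # Step 2: the driver's HTTP server receives the request.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # Step 3: work out which browser action the command maps to.
        if self.path.endswith("/url"):
            self.browser.navigate(payload["url"])
            status, result = 200, {"value": None}
        else:
            status, result = 404, {"value": {"error": "unknown command"}}
        # Step 4: send the execution status back to the caller.
        body = json.dumps(result).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # keep the example quiet


def send_command(port: int, path: str, payload: dict) -> dict:
    # Step 1: each instruction becomes one HTTP request to the driver.
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request(
            "POST",
            path,
            body=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()


def demo():
    # Port 0 asks the operating system for any free local port.
    server = HTTPServer(("127.0.0.1", 0), FakeDriverHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        reply = send_command(server.server_port, "/session/demo/url",
                             {"url": "https://example.com"})
        return reply, FakeDriverHandler.browser.current_url
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` starts the fake driver, sends one navigation command and returns the JSON reply together with the URL the fake browser ended up on. In this sketch the HTTP server listens on 127.0.0.1, that is, on the same machine as the script.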

I have tried to explain this in simple words. If you have any doubts, suggestions or feedback, please comment.

#ThanksForReading


Author: Amod Mahajan

17 thoughts on “Architecture of Selenium Webdriver”

  1. Thanks for the explanation

    I think box 1 (Language bindings) and the second box (Webdriver) should be combined, because if I go to the Selenium download page I can see we have different jars for different languages, like for C#, Python, Java

    So can we say

    Selenium jar of Java = webdriver API --> Driver of chrome (secretary) - (Chrome Browser)

    1. You answered it in your question itself. Selenium provides a different binding jar for each programming language.

    1. I have the same question. Where does this HTTP server reside? The browser drivers reside on our local machine anyway after download, so what is the need for the HTTP request to go through the HTTP server?

      Thanks in Advance,
      Satish

      1. Calls to/from the Selenium standalone server go over HTTP. If you are familiar with Protractor, you can see how we need to start the Selenium standalone server to communicate with the browser.
