We have already seen a detailed explanation of Selenium. To summarize, Selenium is a suite of tools that automates browsers, or more precisely, automates the actions performed in a browser.
For example, you launch a browser and load the Gmail URL, then provide a username and password and sign in. All of these actions are performed within a browser. Selenium can automate the same flow and much more. We will see everything in detail.
Before that, we must understand the internal architecture of Selenium. In this post, we will discuss the architecture of Selenium WebDriver.
Let us start with an introduction to APIs:
An application programming interface (API) is a defined set of functions and rules through which one software program can communicate with another. It acts as an interface between different software programs and gives them a way to interact. This is similar to the way a user interface (UI) facilitates interaction between humans and computers.
Didn’t understand? In simple words, an API is just a medium of communication between software applications or programs. Read about APIs in more detail here.
Every browser may have its own internal logic for performing actions such as loading a webpage (URL), closing the browser, getting the title, clicking on an element, etc. Selenium WebDriver plays the role of a mediator so that a programming language and a browser can communicate easily. Programming statements send commands to the browser through the Selenium WebDriver APIs, and the browser's responses come back the same way. This is the reason we have different language bindings for Selenium WebDriver.
The Selenium WebDriver APIs cannot communicate with browsers directly either; they also need a mediator. Selenium WebDriver requires a browser-specific executable (a browser-specific server, e.g. chromedriver.exe for Chrome on Windows). Selenium WebDriver launches this browser-specific server first and then sends it the instructions given by the programming statements, such as loading a URL. To be more technical, the WebDriver APIs that communicate with the browser use a common wire protocol. This wire protocol defines a RESTful web service using JSON over HTTP. The response from the browser after a command is executed is sent back to the Selenium WebDriver API through the same server.
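To make "JSON over HTTP" a little more concrete, here is a minimal sketch in Python of how a single command could be turned into an HTTP request. The `COMMANDS` table and the `build_request` helper are hypothetical names used only for illustration and are not part of Selenium. The two endpoint paths follow the WebDriver wire protocol, and the session id `abc123` is made up.

```python
import json

# Hypothetical lookup table: command name -> (HTTP method, path template).
# The two paths follow the WebDriver wire protocol endpoints for
# "navigate to a URL" and "get the page title".
COMMANDS = {
    "get": ("POST", "/session/{session_id}/url"),
    "title": ("GET", "/session/{session_id}/title"),
}


def build_request(command, session_id, params=None):
    """Return (method, path, body) for one command; nothing is sent."""
    method, template = COMMANDS[command]
    path = template.format(session_id=session_id)
    # Only POST commands carry a JSON body in this sketch.
    body = json.dumps(params or {}) if method == "POST" else None
    return method, path, body


# "abc123" is a made-up session id; a real one is issued by the browser driver.
print(build_request("get", "abc123", {"url": "https://example.com"}))
print(build_request("title", "abc123"))
```

In a real run, the language binding builds requests like these for you and sends them to the browser-specific server, so a test script never has to construct them by hand.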
Refer to the picture below for a pictorial representation:
Now we can say that the Selenium WebDriver architecture consists of four layers:
- Language bindings: To support multiple languages, the Selenium developers have created language bindings. If you want to use the browser driver in Java, use the Java bindings for Selenium WebDriver. If you want to use the browser driver in C#, Ruby or Python, use the bindings for that language. All language bindings can be downloaded from the official Selenium website.
- Selenium WebDriver: It is a set of APIs that makes communication between programming languages and browsers possible. It has specific commands to perform actions on browsers, such as loading a URL.
- Browser drivers: A browser driver can be thought of as the personal secretary of a boss. A browser driver helps the Selenium WebDriver APIs communicate with the browser without revealing the internal logic of the browser’s functionality. The browser driver is the same regardless of the language used for automation.
- Browser: It is where the actions are performed.
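The four layers above can be sketched as a toy model in Python. This is only an illustration under simplifying assumptions: `FakeBrowser`, `FakeBrowserDriver` and `FakeWebDriver` are hypothetical classes, not Selenium classes, and the JSON messages are passed as in-process strings instead of travelling over HTTP.

```python
import json


class FakeBrowser:
    """Layer 4: where the actions are actually performed (simulated here)."""

    def __init__(self):
        self.current_url = None

    def navigate(self, url):
        self.current_url = url

    def title(self):
        return f"Page at {self.current_url}"


class FakeBrowserDriver:
    """Layer 3: turns protocol messages into browser-specific calls."""

    def __init__(self, browser):
        self._browser = browser

    def handle(self, message):
        request = json.loads(message)
        if request["command"] == "navigate":
            self._browser.navigate(request["url"])
            value = None
        elif request["command"] == "title":
            value = self._browser.title()
        else:
            return json.dumps({"error": "unknown command"})
        return json.dumps({"value": value})


class FakeWebDriver:
    """Layers 1 and 2: the API that a test script calls in its own language."""

    def __init__(self, driver):
        self._driver = driver

    def _send(self, **request):
        # Everything crosses the boundary as JSON text, never as objects.
        reply = self._driver.handle(json.dumps(request))
        return json.loads(reply)["value"]

    def get(self, url):
        self._send(command="navigate", url=url)

    def title(self):
        return self._send(command="title")


browser = FakeBrowser()
web_driver = FakeWebDriver(FakeBrowserDriver(browser))
web_driver.get("https://example.com")
print(web_driver.title())  # prints: Page at https://example.com
```

Notice that `FakeWebDriver` never touches `FakeBrowser` directly. Because it only exchanges JSON text with the driver, the same driver could serve a binding written in any language, which is the point made in the browser driver layer above.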
When the automation script is executed, the following steps happen internally:
- An HTTP request is created and sent to the browser driver for each Selenium instruction or command.
- The browser driver receives the HTTP request through its HTTP server.
- The HTTP server decides the steps needed to carry out the instruction, and these steps are executed on the browser.
- The execution status is sent back to the HTTP server, which sends it back to the automation script.
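The steps above can be tried out with a small, self-contained Python sketch that uses only the standard library. It is a simulation under stated assumptions, not how a real browser driver is implemented: `FakeDriverHandler`, `browser_state` and `send_command` are hypothetical names, the "browser" is just a dictionary, and the session id `demo` is made up. Only the `/session/{id}/url` endpoint path is taken from the WebDriver wire protocol.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Simulated browser state; a real driver would control an actual browser.
browser_state = {"url": None}


class FakeDriverHandler(BaseHTTPRequestHandler):
    """Stand-in for the HTTP server inside a browser driver."""

    def _reply(self, status, value):
        body = json.dumps({"value": value}).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        # Step 2: the driver's HTTP server receives the request.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        if self.path.endswith("/url"):
            # Step 3: the instruction is carried out on the (simulated) browser.
            browser_state["url"] = payload["url"]
            # Step 4: the execution status goes back to the script.
            self._reply(200, None)
        else:
            self._reply(404, {"error": "unknown command"})

    def do_GET(self):
        if self.path.endswith("/url"):
            self._reply(200, browser_state["url"])
        else:
            self._reply(404, {"error": "unknown command"})

    def log_message(self, *args):
        pass  # keep the example output quiet


# Ignore any system proxy settings so the request stays on this machine.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def send_command(base_url, method, path, params=None):
    """Step 1: wrap one instruction in an HTTP request and send it."""
    data = json.dumps(params).encode("utf-8") if params is not None else None
    request = urllib.request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with opener.open(request) as response:
        return json.loads(response.read())["value"]


server = HTTPServer(("127.0.0.1", 0), FakeDriverHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = f"http://127.0.0.1:{server.server_port}"

send_command(base_url, "POST", "/session/demo/url", {"url": "https://example.com"})
current_url = send_command(base_url, "GET", "/session/demo/url")
print(current_url)  # prints: https://example.com

server.shutdown()
server.server_close()
```

Each call to `send_command` is one round trip through all four steps: the request leaves the script, the fake driver receives it, updates or reads the simulated browser, and the result travels back in the HTTP response.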
I have tried to explain this in simple words. If you have any doubts, suggestions or feedback, please comment.