Make Selenium Easy

Compare Two Excel Workbooks Using Apache POI

Comparing two or multiple excel workbooks is very common requirement specially for automated test scenarios. In this post, we will learn How to compare two excel workbooks or to verify if two excel workbooks have same data set.

I hope you are aware of a normal excel workbook. An Excel Workbook is also an Excel File. An Excel Workbook consists of one or multiple sheets. Each sheet consists of rows and columns. Each intersection point of Row and Column is called a Cell. Here is the link o know basics about Excel.

Steps to compare excel workbooks

We must not start comparing excel sheets at cell level or cell data directly. We should first check for basic conditions to be true then proceed for cell comparison.

Basic ordered conditions to be checked first are as below:-

  1. Do workbooks have same number of sheets?
  2. Do workbooks have same name of sheets? You can relax order of sheets depending on requirements. In this post I am relaxing order of sheets but keeping same order to assert cell data.
  3. Do workbooks have same numbers of rows?
  4. Do workbooks have same numbers of columns/cells of corresponding rows? Corresponding rows mean each row may have different number of columns.

When above checked are passed then we should go for cell data comparison. If either check fails then we should not proceed further.

Check if both excel workbooks have same number of sheets

To get number of sheets in a workbook, we need to use getNumberOfSheets() of Workbook interface provided by Apache POI. Workbook interface is implemented by major classes like XSSFWorkbook , HSSFWorkbook.

Check if both excel workbooks have sheets of same name

Here we have two scenarios.

First – Just verify if sheet names are same irrespective of sheet orders.

Second – Verify sheet names are same and appears in same order.

We can get sheet name using getSheetName(int sheetIndex) method.

If you want to check if sheets are in order in both workbook, remove sort() lines of code from above. If you are relaxing sheet order, remember for cell comparison, get sheet by name not by index.

Check if both workbooks have same number of rows in all sheets

We need to check if number of rows in each sheet of workbooks are same. If sheet1 of workbook 1 has 10 rows and sheet2 has 15 rows then workbook 2 should also have 10 rows in sheet1 and 15 rows in sheet2.

To get number of rows in a sheet we need to use getPhysicalNumberOfRows() method of Sheet interface. This method returns the number of physically defined rows (NOT the number of rows in the sheet). If we have 10 rows in a sheet, each row will be indexed or numbered staring from zero. Same we need to do for all sheets. Below is an example code for one sheet.

Check if both workbooks have same columns for row

A row can have different number of columns. So we need to check for each row in all sheets. To iterate through rows, we can use rowIterator() method of Sheet interface which returns an iterator of the physical rows. During iteration of rows, we need to get physical number of cells. Same logic need to do for all sheets.

Now we need to check for cell data and assert.

Each cell has a type like String, Numeric, Boolean etc. To get value as per type, different method needs to be used. For example :- To get data from a Numeric cell type, we need to use getNumericCellValue() and getStringCellValue() for String cell type.

If a cell consists Date, we do not have any such type for Cell. A Cell with date falls under Numeric type. We need to check if cell is date formatted. This we can check using DateUtil.isCellDateFormatted(Cell cell).

Complete Code

Sample Excel Files:-

Unit Tests

Ouput

Note :-

  1. I have not covered for all Cell type. You can add as per your requirements.
  2. Formatting of cells like fonts, styles etc is not covered.

You can download/clone above sample project from here.

If you have any doubt, feel free to comment below.
If you like my posts, please like, comment, share and subscribe.
#ThanksForReading
#HappyLearning

Find all Selenium related post here, all API manual and automation related posts here and find frequently asked Java Programs here.

Many other topics you can navigate through menu.

Author: Amod Mahajan

My name is Amod Mahajan and I am an IT employee with 6+ years of experience in Software testing and staying in Bengaluru. My area of interest is Automation testing. I started from basics and went through so many selenium tutorials. Thanks to Mukesh Otwani as his tutorials are easy and cover basics to advance. I have habit of exploring concepts by deep diving. I used to make notes. I thought of sharing my knowledge through posts and now I am here. #KeepLearning #ShareLearning

1 thought on “Compare Two Excel Workbooks Using Apache POI

Leave a Reply

Please wait...

Subscribe to new posts to become automation expert

Want to be notified when my new post is published? Get my posts in your inbox.

Compare Two Excel Workbooks Using Apache POI

by Amod Mahajan time to read: 15 min
1
%d bloggers like this: