Implementing pagination using offset limits in a database
- July 23, 2020
What happens when you’re challenged with fetching a huge number of records from the database? If the number of records is small, a simple select query will do. But what if there are tens of thousands, or even hundreds of thousands, of records? It can still be done with a simple select query, but your Mule application might run out of memory. A design like that is also likely to hurt application performance, and it can be unreliable.
The best way to overcome this challenge is pagination. Instead of retrieving a huge number of records in one query, you use limit and offset values to retrieve records in fixed-size batches and process those batches until all records have been fetched from the database.
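The approach can be sketched in a few lines of Python. This is an illustration only, with an in-memory list standing in for the database table; `fetch_page` and `fetch_all` are hypothetical helpers, not Mule components:

```python
# Simulated table: 2,000 records standing in for the database.
RECORDS = list(range(1, 2001))

def fetch_page(offset, limit):
    """Return at most `limit` records, skipping the first `offset` records
    (the equivalent of SELECT ... LIMIT :limit OFFSET :offset)."""
    return RECORDS[offset:offset + limit]

def fetch_all(limit=500):
    """Fetch every record in chunks of `limit`, advancing the offset each pass."""
    results, offset = [], 0
    while True:
        page = fetch_page(offset, limit)
        if not page:      # no rows left: pagination is complete
            break
        results.extend(page)
        offset += limit   # move past the rows just fetched
    return results
```

No single query ever returns more than `limit` rows, which is what keeps memory usage bounded on the database round-trip.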
NOTES:
- This tutorial also explains how to give dynamic queries to databases. You need to do this when the query parameters change during runtime.
- The LIMIT keyword applies to databases such as MySQL and Snowflake. If you’re using SQL Server or Oracle, use the “OFFSET … FETCH NEXT … ROWS ONLY” syntax in place of LIMIT.
Implementation
Step 1: Call the database and retrieve the total number of records.
SELECT COUNT(*) FROM employee_db;
NOTE: You can use a Transform Message component and build the query as a string, as shown above, to query the database dynamically.
Step 2: Using Set Variable components, initialize the offset and limit values (for example, offset: 0 and limit: 500).
NOTES:
- Limit is the maximum number of records to be retrieved at one time from the database. The value depends on your requirement.
- Offset is the number of records to skip before fetching begins. That is, if the offset equals 100, the query skips the first 100 records and fetching starts at the 101st record.
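To make the limit and offset semantics concrete, here is a small Python illustration; the list slice mirrors what the database does with the two values:

```python
records = list(range(1, 2001))  # records numbered 1..2000

offset, limit = 100, 500
page = records[offset:offset + limit]  # skip 100 records, take the next 500

# The first 100 records are skipped, so the page starts at record 101.
assert page[0] == 101
assert page[-1] == 600
assert len(page) == 500
```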
Step 3: Generate an iteration array based on the total record count. The size of the array depends on the limit.
For example: If the total record count is 2,000 and the limit is 500, then 2,000 divided by 500 equals 4. So we form an array of four elements, arr = [1,2,3,4], that will drive four iterations of record retrieval from the database. This array is passed to a For Each scope, which loops four times and fetches 500 records on every iteration.
The DataWeave logic divides the total record count by the limit to produce this array.
Step 4: Retrieve records using a parameterized query: SELECT * FROM employee_db LIMIT :limit OFFSET :offset, passing the input parameters {limit: vars.limit, offset: vars.offset}.
Step 5: After calling the database inside the For Each scope, set a new offset value once the data has been retrieved. This will help retrieve the next 500 records, starting at the 501st record.
Set Variable logic: vars.offset + vars.limit
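The offset progression this produces can be sketched as follows (a Python illustration; `next_offset` is a hypothetical name for the Set Variable expression):

```python
def next_offset(offset, limit):
    # Same update as the Set Variable expression: vars.offset + vars.limit
    return offset + limit

offset, limit = 0, 500
offsets_used = []
for _ in range(4):            # four iterations, matching arr = [1, 2, 3, 4]
    offsets_used.append(offset)
    offset = next_offset(offset, limit)

# Offsets 0, 500, 1000, 1500 fetch records 1-500, 501-1000,
# 1001-1500, and 1501-2000 respectively.
```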
Conclusion
Pagination is one of the best approaches when you want to break a large data set into chunks without affecting application performance. The method described here can be implemented with any connector that processes a large set of data, unless you prefer to use batch processing instead.
— By Sidhanth Mohapatro