MuleSoft For Each, Parallel For Each and batch processing comparison
- August 31, 2020
As we know, MuleSoft provides For Each, Parallel For Each and Batch Processing to process a list of records. In this technical article, we'll compare them to see which option suits which use case.
What is the For Each scope?
- The For Each scope splits a collection into elements, processes them iteratively through the processors embedded in the scope and then returns the original message to the flow.
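Below is a minimal sketch of a For Each scope in Mule 4 XML. The flow name, the assumption that the payload is a collection (for example a JSON array), and the logger steps are illustrative, not taken from the article:

```xml
<mule xmlns="http://www.mulesoft.org/schema/mule/core"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.mulesoft.org/schema/mule/core
                          http://www.mulesoft.org/schema/mule/core/current/mule.xsd">

    <flow name="for-each-demo-flow">
        <!-- Iterates over the collection in the payload one element at a time,
             running the embedded processors sequentially for each element. -->
        <foreach collection="#[payload]">
            <logger level="INFO"
                    message="#['Processing record: ' ++ write(payload, 'application/json')]"/>
        </foreach>
        <!-- After the scope, the payload is the original collection again. -->
        <logger level="INFO" message="#['Done; payload is still the original collection']"/>
    </flow>
</mule>
```

Note that whatever the processors inside the scope produce, the payload after the scope is still the original collection; if you need the per-record results, you have to accumulate them yourself (for example in a variable).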
What is the Parallel For Each scope?
- The Parallel For Each scope enables you to process a collection of messages by splitting the collection into parts that are processed simultaneously in separate routes, within any limit configured for concurrent processing. After all messages are processed, the results are aggregated in the same order they had before the split, and then the flow continues.
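A comparable sketch using the Parallel For Each scope, assuming Mule 4.2+ and a payload that is a list of strings; the flow name and the maxConcurrency value are illustrative:

```xml
<mule xmlns="http://www.mulesoft.org/schema/mule/core"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.mulesoft.org/schema/mule/core
                          http://www.mulesoft.org/schema/mule/core/current/mule.xsd">

    <flow name="parallel-for-each-demo-flow">
        <!-- Processes up to four elements at a time, each in its own route. -->
        <parallel-foreach collection="#[payload]" maxConcurrency="4">
            <set-payload value="#[payload ++ ' - processed']"/>
        </parallel-foreach>
        <!-- After the scope, the payload is the aggregated list of route results,
             kept in the original collection order. -->
        <logger level="INFO"
                message="#['Aggregated ' ++ (sizeOf(payload) as String) ++ ' results']"/>
    </flow>
</mule>
```

Unlike For Each, the payload after this scope is the aggregated list of route results rather than the original collection.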
What is a Batch Job?
- Mule allows you to process messages in batches. You can initiate a batch job scope, which splits messages into individual records, performs actions upon each record and then reports on the results and potentially pushes the processed output to other systems or queues.
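A minimal Batch Job sketch in Mule 4 XML; the job name, block size, aggregator size and logger steps are illustrative assumptions:

```xml
<mule xmlns="http://www.mulesoft.org/schema/mule/core"
      xmlns:batch="http://www.mulesoft.org/schema/mule/batch"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.mulesoft.org/schema/mule/core
                          http://www.mulesoft.org/schema/mule/core/current/mule.xsd
                          http://www.mulesoft.org/schema/mule/batch
                          http://www.mulesoft.org/schema/mule/batch/current/mule-batch.xsd">

    <flow name="batch-demo-flow">
        <!-- Splits the incoming collection into records and processes them
             asynchronously in blocks; -1 means failed records do not stop the job. -->
        <batch:job jobName="recordProcessingBatchJob" blockSize="100" maxFailedRecords="-1">
            <batch:process-records>
                <batch:step name="processStep">
                    <logger level="INFO"
                            message="#['Processing record: ' ++ write(payload, 'application/json')]"/>
                    <!-- Groups records so they can be pushed downstream in bulk. -->
                    <batch:aggregator size="10">
                        <logger level="INFO"
                                message="#['Aggregated ' ++ (sizeOf(payload) as String) ++ ' records']"/>
                    </batch:aggregator>
                </batch:step>
            </batch:process-records>
            <batch:on-complete>
                <!-- The On Complete phase receives a BatchJobResult with the record counts. -->
                <logger level="INFO"
                        message="#['Successful: ' ++ (payload.successfulRecords as String) ++ ', failed: ' ++ (payload.failedRecords as String)]"/>
            </batch:on-complete>
        </batch:job>
    </flow>
</mule>
```

The record-processing phase runs asynchronously on Mule-managed queues, which is why Batch Processing is listed as asynchronous in the comparison below.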
Comparison between For Each, Parallel For Each and batch processing
| | For Each | Parallel For Each | Batch Processing |
| --- | --- | --- | --- |
| Execution Support | Mule 3.x onwards | Mule 4.2 onwards | Mule 3.x onwards |
| Graphical Support | Mule 3.x onwards | Mule 4.3 onwards | Mule 3.x onwards |
| Execution Pattern | Synchronous | Synchronous | Asynchronous |
| Execution Order | Sequential | Parallel | Parallel |
| Record Grouping | Possible using batch size | Not possible | Possible using Batch Aggregator |
| Error Handling | Stops processing if the error is not handled | Does not stop processing; raises a MULE:COMPOSITE_ROUTING error after all routes finish | Behaviour can be configured |
| Suitable For | Sequential processing | Synchronous parallel processing | Asynchronous parallel processing |
| Number of Records | Small data sets | Medium data sets | Large data sets |
| Output | Original payload; custom logic required to capture each record's output | Accumulated payload | Original payload |
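To illustrate the Error Handling row: if any route inside Parallel For Each fails, the remaining elements are still processed, and a single MULE:COMPOSITE_ROUTING error is raised once all routes finish, which can be handled, for example, in a Try scope. The flow name, the simulated APP:INVALID_RECORD error and the 'bad' marker value are illustrative assumptions:

```xml
<mule xmlns="http://www.mulesoft.org/schema/mule/core"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.mulesoft.org/schema/mule/core
                          http://www.mulesoft.org/schema/mule/core/current/mule.xsd">

    <flow name="parallel-for-each-error-demo-flow">
        <try>
            <parallel-foreach collection="#[payload]">
                <choice>
                    <when expression="#[payload == 'bad']">
                        <!-- Simulates a failing route for this element. -->
                        <raise-error type="APP:INVALID_RECORD" description="Simulated record failure"/>
                    </when>
                    <otherwise>
                        <set-payload value="#[payload ++ ' - ok']"/>
                    </otherwise>
                </choice>
            </parallel-foreach>
            <error-handler>
                <!-- Raised only after every route has finished, so the other
                     elements were still processed despite the failures. -->
                <on-error-continue type="MULE:COMPOSITE_ROUTING">
                    <logger level="WARN" message="#['Some routes failed: ' ++ error.description]"/>
                </on-error-continue>
            </error-handler>
        </try>
    </flow>
</mule>
```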
For Each use cases
- Sequential processing required
- Synchronous processing required
- Small data set
- Processing of records in batch required
- Process records only if previous records are processed successfully
Parallel For Each use cases
- Synchronous processing required with parallelism
- Medium data set
- Accumulated output required
- Process records irrespective of previous records status
Batch Job use cases
- Asynchronous processing required
- Ordering of processed records not needed
- Large data set
- Processing logic is complex and filtering is optional
- Process records irrespective of previous records status
Conclusion
In general, the number of records and the desired behaviour (synchronous or asynchronous) determine which option to choose. For a medium number of records, the choice between Parallel For Each and Batch Job is mostly governed by whether accumulated output is needed. If you choose Parallel For Each because your use case requires accumulated output, remember that a large accumulated output can cause Java Virtual Machine (JVM) OutOfMemory errors.
— By Mohammad Mazhar Ansari