Comparing the performance of for loops vs Java streams (sequential and parallel)

Introduction

We all know that Java streams (introduced in JDK 8) support the functional programming style, so code written with streams is usually more concise and compact than the equivalent traditional for loops. However, I also wanted to compare the performance of for loops against streams (both the sequential and parallel variants) to understand which one to use when performance is the key criterion. In this article, I walk you through the code snippets I used for my experiments.
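
As a quick illustration of the conciseness point (a small sketch of my own, separate from the experiments below), compare summing the even numbers of a list with a loop versus a stream:

import java.util.List;

public class ConcisenessDemo {
    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6);

        // Traditional for loop: a mutable accumulator and explicit control flow
        int loopSum = 0;
        for (int n : numbers) {
            if (n % 2 == 0) {
                loopSum += n;
            }
        }

        // Stream pipeline: the same logic as a single declarative expression
        int streamSum = numbers.stream()
                .filter(n -> n % 2 == 0)
                .mapToInt(Integer::intValue)
                .sum();

        System.out.println(loopSum + " " + streamSum); // both print 12
    }
}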

Summation experiment

First I wanted to run a very simple experiment: summing the first n integers (starting from 0) with a for loop, a sequential stream and a parallel stream, and comparing their performance. I then executed each of the operations 1000 times to see which option benefits most from JIT optimisations. Below is the source code:

import java.util.function.IntFunction;
import java.util.stream.IntStream;

public class SumDemo {
    public static void main(String[] args) {
        int targetNumber = 1000000;

        // Plain for loop. Note: the sum of 0..999999 overflows int, but all
        // three variants overflow identically and only timing is measured,
        // so this does not affect the comparison.
        IntFunction<Integer> forLoopFunc = n -> {
            int sum = 0;
            for (int i = 0; i < n; i++) {
                sum = sum + i;
            }
            return sum;
        };

        // Sequential stream over the same range
        IntFunction<Integer> seqStreamFunc = n -> IntStream.rangeClosed(0, n - 1).sum();

        // Parallel stream over the same range
        IntFunction<Integer> parallelStreamFunc = n -> IntStream.rangeClosed(0, n - 1).parallel().sum();

        calculateExecutionTime(forLoopFunc, targetNumber);
        calculateExecutionTime(seqStreamFunc, targetNumber);
        calculateExecutionTime(parallelStreamFunc, targetNumber);
    }

    private static void calculateExecutionTime(IntFunction<Integer> func, int n) {
        int iterationCount = 1; // raise this to observe the effect of JIT warm-up
        long startTime = System.currentTimeMillis();
        for (int i = 1; i <= iterationCount; i++) {
            func.apply(n);
        }
        System.out.println("Time taken : " + (System.currentTimeMillis() - startTime));
    }
}

Running the above code on my local machine (MacBook M1, 2020) to sum the numbers from 0 to 1 million produced the following output. This ordering was expected: the for loop should be the fastest, followed by the sequential stream (due to stream setup overhead), with the parallel stream slowest of all because of the additional overhead of distributing the work across threads:

Time taken : 4
Time taken : 9
Time taken : 11

However, when I re-executed the program with the iterationCount variable in calculateExecutionTime set to 1000, the result below demonstrated the power of JIT optimisation on the streams library. Very notably, the execution time dropped drastically for the parallel stream. (Note that I measured the total execution time across all 1000 iterations, so the numbers below cover 1000 runs of each operation. That is fine here because I am interested in relative performance, and each of the three operations was executed 1000 times.)

Time taken : 732
Time taken : 574
Time taken : 167

When I increased targetNumber or iterationCount further, the results stayed in line with the above: the for loop and the sequential stream were almost identical in performance, while the parallel stream was considerably faster.
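
One caveat worth flagging: timing a single run with System.currentTimeMillis mixes JIT compilation time into the measurement. A slightly fairer variant of calculateExecutionTime (a sketch of my own, not part of the original experiment) runs some untimed warm-up iterations first and uses System.nanoTime for the timed portion:

import java.util.function.IntFunction;

public class WarmupTimer {
    // Hypothetical helper: warm up the code path before timing it,
    // so the timed loop mostly measures JIT-compiled code.
    static void timeWithWarmup(IntFunction<Integer> func, int n) {
        for (int i = 0; i < 1000; i++) {
            func.apply(n); // untimed warm-up iterations
        }
        long start = System.nanoTime();
        for (int i = 0; i < 1000; i++) {
            func.apply(n); // timed iterations
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Time taken (after warm-up) : " + elapsedMs + " ms");
    }
}

Even this is only a rough harness; a dedicated tool such as JMH (sketched at the end of this article) handles these pitfalls properly.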

SHA generation demo

Next I wrote another program that performs a more CPU-intensive operation, calculating the SHA3-256 hash of a given string, and the results are even more striking. First, the source code:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.function.Consumer;
import java.util.stream.IntStream;

public class SHAGenerationDemo {
    public static void main(String[] args) throws Exception {
        int iterationCount = 10000;
        String baseText = "good morning";

        // A fresh MessageDigest is created on every iteration (in all three
        // variants) because MessageDigest instances are not thread-safe, so a
        // single shared instance could not be used with the parallel stream.
        Consumer<Integer> forLoopConsumer = n -> {
            for (int i = 0; i < n; i++) {
                try {
                    MessageDigest md = MessageDigest.getInstance("SHA3-256");
                    md.digest(baseText.getBytes(StandardCharsets.UTF_8));
                } catch (NoSuchAlgorithmException e) {
                    throw new RuntimeException(e);
                }
            }
        };

        Consumer<Integer> seqStreamConsumer = n -> {
            IntStream.rangeClosed(0, n - 1).forEach(i -> {
                try {
                    MessageDigest md = MessageDigest.getInstance("SHA3-256");
                    md.digest(baseText.getBytes(StandardCharsets.UTF_8));
                } catch (NoSuchAlgorithmException e) {
                    throw new RuntimeException(e);
                }
            });
        };

        Consumer<Integer> parallelStreamConsumer = n -> {
            IntStream.rangeClosed(0, n - 1).parallel().forEach(i -> {
                try {
                    MessageDigest md = MessageDigest.getInstance("SHA3-256");
                    md.digest(baseText.getBytes(StandardCharsets.UTF_8));
                } catch (NoSuchAlgorithmException e) {
                    throw new RuntimeException(e);
                }
            });
        };

        calculateExecutionTime(forLoopConsumer, iterationCount);
        calculateExecutionTime(seqStreamConsumer, iterationCount);
        calculateExecutionTime(parallelStreamConsumer, iterationCount);
    }

    private static void calculateExecutionTime(Consumer<Integer> consumer, int n) {
        long startTime = System.currentTimeMillis();
        consumer.accept(n);
        System.out.println("Time taken : " + (System.currentTimeMillis() - startTime));
    }
}

The results of running this program were striking even at 1000 iterations: the for loop was slower than the sequential stream, and the sequential stream was in turn slower than the parallel stream. I executed the program for 100, 1000, 10,000 and 1 million operations, and the ordering was the same across all runs.
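
This is consistent with how parallel streams work: by default they execute on the common ForkJoinPool, which is sized to roughly the number of available CPU cores, so CPU-bound work like hashing parallelises well. A quick way to inspect this on your own machine:

import java.util.concurrent.ForkJoinPool;

public class ParallelismCheck {
    public static void main(String[] args) {
        // Number of CPU cores visible to the JVM
        System.out.println("Available processors : " + Runtime.getRuntime().availableProcessors());
        // Parallelism of the common pool used by parallel streams
        // (by default availableProcessors() - 1, since the calling thread also participates)
        System.out.println("Common pool parallelism : " + ForkJoinPool.commonPool().getParallelism());
    }
}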

Conclusion

That was an interesting exercise, and these are my takeaways:

  • The for loop is faster than both sequential and parallel streams for small workloads, because streams carry setup overhead and parallel streams carry the additional overhead of splitting the work and merging the results.
  • For larger workloads, and when the operations are executed many times, JIT optimisation first helps sequential streams catch up with (and even beat) the for loop. As the dataset grows and the per-element operation becomes more expensive (CPU-intensive work such as hashing or encryption), parallel streams become the fastest approach by a wide margin. (For a more rigorous way of measuring this, see the JMH sketch after this list.)
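
A closing note on methodology: timing with System.currentTimeMillis is fine for the rough relative comparison above, but the standard tool for Java micro-benchmarks is JMH, which handles warm-up, dead-code elimination and statistical reporting for you. Below is a minimal sketch of the summation experiment written as JMH benchmarks (assuming the org.openjdk.jmh:jmh-core dependency and its annotation processor are set up in your build):

import java.util.stream.IntStream;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class SumBenchmark {
    int n = 1_000_000;

    @Benchmark
    public long forLoop() {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum; // returning the result prevents dead-code elimination
    }

    @Benchmark
    public long seqStream() {
        return IntStream.range(0, n).asLongStream().sum();
    }

    @Benchmark
    public long parallelStream() {
        return IntStream.range(0, n).asLongStream().parallel().sum();
    }
}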

I hope this was an interesting read. Please do share your comments and feedback!
