How and when to use memory-mapped files in Java?

Introduction

A memory-mapped file is a technique supported by modern operating systems in which a file is mapped directly (byte-for-byte) onto a portion of the process's virtual memory. Because this avoids the cost of repeated IO system calls, memory-mapped files can often be much faster than normal IO. In practice, they perform significantly faster when reading large files (say, greater than 1 GB), but they can be slower for small files because there is some overhead in creating the memory mapping. So one common use case for memory-mapped files is reading a large file from the OS. Another common use case is sharing data between two different processes running on the same machine. Memory-mapped files are now supported in many mainstream programming languages, including Java, Python, Ruby, Perl and Julia.

How to use a memory-mapped file to read data

We will look at a small Java program that reads the same data in two different ways: the first approach uses traditional BufferedReader-based IO, and the second uses a MappedByteBuffer (a memory-mapped file). The code below has two distinct methods, one reading the file normally and the other reading it using the mmap approach. Let us first feed it a file of 10 KiB:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class ReadFile {
    public static void main(String[] args) throws Exception {
        String fileName = ""; // path to the input file
        readUsingMMapFile(fileName);
        readFileNormally(fileName);
    }

    private static void readFileNormally(String fileName) throws Exception {
        long startTime = System.currentTimeMillis();
        // try-with-resources ensures the reader is closed even if an exception is thrown
        try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
            while (br.readLine() != null) {
                // Code to deal with the line content
            }
        }
        System.out.println("Total time taken to read file normally : " + (System.currentTimeMillis() - startTime));
    }

    private static void readUsingMMapFile(String fileName) throws Exception {
        long startTime = System.currentTimeMillis();
        try (RandomAccessFile file = new RandomAccessFile(new File(fileName), "r");
             FileChannel fileChannel = file.getChannel()) {
            MappedByteBuffer buffer = fileChannel.map(FileChannel.MapMode.READ_ONLY, 0, fileChannel.size());
            for (int i = 0; i < buffer.limit(); i++) {
                // Code to deal with each byte, e.g. buffer.get(i)
            }
        }
        System.out.println("Total time taken to read file using mmap : " + (System.currentTimeMillis() - startTime));
    }
}
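Note that the mmap loop only touches each byte; unlike the BufferedReader version, it does not reconstruct lines. As a rough sketch of how one might do real work against the mapped buffer, the following program counts lines by scanning the buffer for newline bytes (the class name MmapLineCount and the line-counting task are my own choices for illustration, not part of the benchmark above):

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MmapLineCount {
    // Counts lines by scanning the memory-mapped buffer for '\n' bytes.
    static long countLines(String fileName) throws Exception {
        try (RandomAccessFile file = new RandomAccessFile(fileName, "r");
             FileChannel channel = file.getChannel()) {
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            long lines = 0;
            for (int i = 0; i < buffer.limit(); i++) {
                if (buffer.get(i) == '\n') {
                    lines++;
                }
            }
            return lines;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countLines(args[0]));
    }
}
```

The same pattern (absolute `buffer.get(i)` reads, no per-call system call) is what gives the mmap version its speed on large files.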

When the above program is run against the 10 KiB file, the following output clearly indicates that normal IO is much faster for a file this small (all time measurements are in milliseconds):
Total time taken to read file using mmap : 39
Total time taken to read file normally : 6

When I run the same program against a file of approximately 148 MB, I get the following output, which indicates that mmap is almost 16x faster (time measurements are in milliseconds):
Total time taken to read file using mmap : 38
Total time taken to read file normally : 635

Again, when I run the same program against a file of approximately 1.58 GB, I get the following output, which indicates that mmap is almost 77x faster (time measurements are in milliseconds):
Total time taken to read file using mmap : 47
Total time taken to read file normally : 3644

Another very interesting finding is that when the same program was compiled to a native binary using GraalVM, the result was the following:
Total time taken to read file using mmap : 5
Total time taken to read file normally : 5194

So in the native version of the program, the time to read the file using mmap dropped drastically. This is not a surprise, given that mmap is essentially an OS feature, so the native version can read a memory-mapped file in much less time.

Sharing data between JVMs using memory-mapped files

Another interesting application of memory-mapped files is sharing data between two JVM processes without any socket programming. To be honest, I have not had a chance to experiment with this in detail, but here (https://stackoverflow.com/questions/25396664/shared-memory-between-two-jvms) is a sample program in which a client-server pair shares data using memory-mapped files. I also came to know that Chronicle Queue (https://github.com/OpenHFT/Chronicle-Queue) uses memory-mapped files as one of its building blocks, though again I have not had a chance to experiment with it. If you have any experience using memory-mapped files to share data between two different JVM processes, kindly share it.
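As a minimal sketch of the idea (not the linked sample program), the code below maps the same file into two separate MappedByteBuffers opened through independent channels; the "writer" mapping publishes an int and the "reader" mapping observes it without any explicit read call, because both mappings share the same OS page cache. In real use the two mappings would live in two different JVM processes; the class name SharedMemoryDemo and the 1 KiB region size are my own choices for illustration:

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class SharedMemoryDemo {
    // Writes a value through one mapping of the file and reads it back
    // through a second, independent mapping of the same region.
    static int roundTrip(File shared) throws Exception {
        try (RandomAccessFile writerFile = new RandomAccessFile(shared, "rw");
             RandomAccessFile readerFile = new RandomAccessFile(shared, "r")) {

            // "Writer" side: map 1 KiB read-write (this also extends the
            // file to 1 KiB) and publish a value at offset 0.
            MappedByteBuffer writer =
                    writerFile.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, 1024);
            writer.putInt(0, 42);

            // "Reader" side: an independent read-only mapping of the same
            // region sees the published value.
            MappedByteBuffer reader =
                    readerFile.getChannel().map(FileChannel.MapMode.READ_ONLY, 0, 1024);
            return reader.getInt(0);
        }
    }

    public static void main(String[] args) throws Exception {
        File shared = File.createTempFile("shared", ".dat");
        shared.deleteOnExit();
        System.out.println("Reader sees: " + roundTrip(shared));
    }
}
```

A real two-process setup would of course need some coordination (e.g. a flag byte or a lock) so the reader knows when the writer has finished publishing; libraries like Chronicle Queue build exactly that kind of protocol on top of the mapping.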

Conclusion

So that’s it – I think the power of memory-mapped files for reading large files is tremendous, and whenever we have the requirement of reading a large file, we should consider a memory map. In fact, in the One Billion Row Challenge (https://github.com/gunnarmorling/1brc), some of the best performing entries used memory-mapped files. Let me summarise my findings:

  • A memory-mapped file can drastically reduce the read time for large files, anywhere from 10x to 100x; the larger the file, the bigger the gain from the memory-mapped approach.
  • For smaller files, however, the memory-mapped approach can be slower because of the overhead involved in the setup. Even for smaller files, though, this overhead can be significantly reduced by compiling the program to a native binary using GraalVM.

Last but not least, I would request you to comment, provide your feedback, or share any experience around this topic. Thank you!
