Friday, June 20, 2008

Memory Mapped Files

I only recently learned about memory-mapped files in Linux. Until then, I performed file I/O the textbook way, with read and write calls on file descriptors or streams. It wasn't until I browsed the git source (to help with my git clone) that I came across a mysterious mmap() call.

Not only is memory mapping more efficient, for the reasons given in the Wikipedia article, but it also makes file contents easier to work with, because they can be treated as if they were already in memory. There's no need to learn the syntax and semantics of various read and write calls.

The only tough part is remembering the mmap syntax. But the call can be readily encapsulated. For example, to read a file, I define a function like the following:
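/* Maps the file at path read-only and stores its size in *len.
 * Returns NULL for an empty file, since mmap() rejects zero-length mappings.
 * open_or_die(), get_size_of() and die() are helpers not shown here. */
void *map_file_to_memory(size_t *len, const char *path) {
    int fd = open_or_die(path, O_RDONLY);
    *len = get_size_of(fd);
    if (!*len) {
        close(fd);
        return NULL;
    }
    void *map = mmap(NULL, *len, PROT_READ, MAP_PRIVATE, fd, 0);
    if (MAP_FAILED == map) die("mmap failed");
    close(fd);  /* the mapping remains valid after the descriptor is closed */
    return map;
}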
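For anyone who wants to try this without my helper functions, here is a minimal self-contained sketch. It assumes plain open() and fstat() in place of the helpers, and it returns NULL on error rather than dying. The names map_file_readonly, unmap_file and count_newlines are made up for this illustration. The last function shows the "as if already in memory" point: the mapped bytes can be handed straight to memchr().

```c
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical variant of the function above: maps a file read-only.
 * Returns NULL (with *len set to 0) on error or for an empty file. */
void *map_file_readonly(size_t *len, const char *path) {
    *len = 0;
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    void *map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (map == MAP_FAILED) return NULL;

    *len = (size_t)st.st_size;
    return map;
}

/* Releases a mapping obtained from map_file_readonly(). */
void unmap_file(void *map, size_t len) {
    if (map) munmap(map, len);
}

/* Counts '\n' bytes in a buffer. Works on a mapping like on any other memory. */
size_t count_newlines(const char *data, size_t len) {
    if (!data) return 0;
    size_t n = 0;
    const char *p = data;
    const char *end = data + len;
    while (p < end && (p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        n++;
        p++;  /* continue the search just past the newline found */
    }
    return n;
}
```

A caller would map the file, pass the pointer and length to count_newlines(), and then call unmap_file(). Returning NULL for both errors and empty files keeps the sketch short; real code would probably want to tell the two cases apart.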
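Large files are a concern, since touching every page of a big mapping can mean a lot of page faults. But madvise() lets a program tell the kernel how it plans to access the mapping, and hopefully the kernel uses that hint well enough to prevent excessive page faults.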
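As a rough sketch of how such a hint might be given, the function below maps a file, calls madvise() with MADV_SEQUENTIAL, and then reads every byte in order. The function name sum_file_bytes is invented for this example, and whether the hint actually helps depends on the kernel and the workload, so treat it as something to measure rather than a guaranteed win.

```c
#define _DEFAULT_SOURCE  /* exposes madvise() on glibc under strict -std modes */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical example: sums every byte of a file through a read-only
 * mapping, after hinting that access will be sequential.
 * Returns 0 on success (an empty file sums to 0), -1 on error. */
int sum_file_bytes(const char *path, unsigned long *sum) {
    *sum = 0;
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {  /* nothing to map: mmap() rejects length 0 */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    unsigned char *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (map == MAP_FAILED) return -1;

    /* The hint is advisory only, so a failure here is deliberately ignored. */
    (void)madvise(map, len, MADV_SEQUENTIAL);

    for (size_t i = 0; i < len; i++) *sum += map[i];

    munmap(map, len);
    return 0;
}
```

For access patterns that jump around the file, MADV_RANDOM is the corresponding hint.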