Python: Recursively List All Files in a Directory - Efficient File Traversal Guide

By Seth Black Updated September 27, 2024

When using Python for Data Science or general Systems Administration you'll find yourself needing to recursively read a directory tree, remember all (or some) of the files in the directories and then do something fun with those files. The code snippet below should get you going in the right direction.

    import os

    files = []

    # os.walk already descends into every subdirectory on its own, so no
    # manual queue of directories is needed. (Re-queuing the bare names in
    # dirnames would only walk paths relative to the current working directory.)
    for (dirpath, dirnames, filenames) in os.walk('/home/sethblack/Programming/Python/recursive-list/files/'):
        files.extend(map(lambda n: os.path.join(*n), zip([dirpath] * len(filenames), filenames)))

    print(files)

At its core the above code uses a very simple function tucked away in Python's os module: os.walk. This function is a generator that visits every directory in the tree and, for each one, yields a tuple with the current directory path, a list of the names of the directories contained within and a list of the names of the files contained within. From here we use some of Python's built-in functions and some Python wizardry to keep our code concise. Let's take this apart from the inside out.
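To see what os.walk hands back on each iteration, here is a minimal sketch that builds a small throwaway tree with tempfile and records the tuples. The directory and file names (sub, a.txt, b.txt) are made up for illustration and are not part of the original snippet.

```python
import os
import tempfile

# Build a tiny throwaway tree so the example does not depend on any real path.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(root, rel), "w") as fh:
            fh.write("demo")

    # Each iteration yields (dirpath, dirnames, filenames) for one directory.
    # Paths are made relative to root here only to keep the output readable.
    walked = [
        (os.path.relpath(dirpath, root), sorted(dirnames), sorted(filenames))
        for dirpath, dirnames, filenames in os.walk(root)
    ]

print(walked)
```

The top-level directory comes first, followed by its subdirectory, without any extra bookkeeping on our side.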

[dirpath] * len(filenames)

Multiplying a list by an integer produces a new list in which the original elements are repeated that many times. Here that gives us one copy of the current directory for every file name. We need this because we only have one instance of the current directory and a list containing the file names in the current directory - ultimately we want a list of full file paths (not just the names of the files, since that wouldn't be of much use).
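A quick sketch of list repetition on its own, using the same placeholder directory and file names as the sample output in this post:

```python
dirpath = "/current/path/"
filenames = ["filename1.txt", "filename2.txt", "filename3.txt"]

# One copy of dirpath per file name.
repeated = [dirpath] * len(filenames)
print(repeated)

# A directory with no files produces an empty list, so nothing is paired later.
empty = [dirpath] * 0
print(empty)
```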

zip([dirpath] * len(filenames), filenames)

zip returns an iterator of tuples, where the i-th tuple contains the i-th element from each of the argument iterables. Since the first list we provide is the current directory repeated once per file, consuming the iterator gives us pairs that look something like this:

[('/current/path/', 'filename1.txt'), ('/current/path/', 'filename2.txt'), ('/current/path/', 'filename3.txt')]
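The pairing step can be tried in isolation. This sketch reuses the placeholder names from the sample output and also shows that zip produces a one-shot iterator, which is why it is wrapped in list() for printing:

```python
dirpath = "/current/path/"
filenames = ["filename1.txt", "filename2.txt", "filename3.txt"]

pairs = zip([dirpath] * len(filenames), filenames)

# zip is lazy: materialize it to look at the tuples.
pair_list = list(pairs)
print(pair_list)

# The iterator is now exhausted, so a second pass yields nothing.
leftover = list(pairs)
print(leftover)
```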

We now need a way to combine these tuples into strings so that we end up with clean file paths. To do this we use the built-in function map with a small anonymous function that unpacks each tuple with the * operator, so that its two elements are passed to os.path.join as separate positional arguments. map is lazy, just like zip, but once its results are collected the tuples above become a much cleaner list:

['/current/path/filename1.txt', '/current/path/filename2.txt', '/current/path/filename3.txt']
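Here is the joining step as a small sketch, again with the placeholder names from above. For comparison it also builds the same result with a list comprehension, which is an alternative spelling and not the approach used in the snippet at the top of this post:

```python
import os

dirpath = "/current/path/"
filenames = ["filename1.txt", "filename2.txt", "filename3.txt"]

# The form used in this post: unpack each (dirpath, filename) tuple into os.path.join.
joined = list(map(lambda n: os.path.join(*n),
                  zip([dirpath] * len(filenames), filenames)))

# An equivalent comprehension that skips the repeat-and-zip step.
joined_alt = [os.path.join(dirpath, name) for name in filenames]

print(joined)
print(joined == joined_alt)
```

Both forms produce the same paths; which one reads better is largely a matter of taste.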

The list extend method allows us to take the above list and shove it onto the end of our current list of files.
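Putting the pieces together, the whole walk can be wrapped in a function. This is only a sketch: list_files is a hypothetical helper name, the temporary tree is invented for the demo, and the comprehension form is swapped in for the map and zip combination described above.

```python
import os
import tempfile


def list_files(root):
    """Return the full path of every file under root (hypothetical helper)."""
    files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # extend adds each path individually; append would add the whole
        # batch as a single nested item instead.
        files.extend(os.path.join(dirpath, name) for name in filenames)
    return files


# Demo on a throwaway tree so no real directory is assumed.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(root, rel), "w") as fh:
            fh.write("demo")

    # Relative paths are used here only to keep the printed result short.
    found = sorted(os.path.relpath(path, root) for path in list_files(root))

print(found)
```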

Good luck and happy walking!