Jeremy Satterfield
Coding, Making and Tulsa Life

Mocking Python's built-in open function

A while back I was working on my podcast client and ran into an issue downloading large files. The root problem was that all of Django's FileField backends store the file (or file-like object) in memory and then save it to disk, and the Cubieboard system I was using had limited memory resources, resulting in "out of memory" errors. After much searching and hacking, I finally settled on storing the file to disk myself using requests' streaming argument. This allowed me to download the file in chunks, save it directly to disk, and then tell the Django field where I placed it, as you can see here.
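A minimal sketch of that chunked download, assuming Python 3 and the third-party requests package. The names save_chunks and download_to_disk are hypothetical and not taken from the podcast client, and the Django field update is left out.

```python
def save_chunks(chunks, dest_path):
    """Write an iterable of byte chunks to dest_path, one chunk at a time."""
    written = 0
    with open(dest_path, 'wb') as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                written += len(chunk)
    return written


def download_to_disk(url, dest_path, chunk_size=8192):
    # requests is third-party; it is imported here so that save_chunks
    # can be used on its own without requests installed.
    import requests

    # stream=True defers downloading the body until it is iterated.
    response = requests.get(url, stream=True)
    response.raise_for_status()
    return save_chunks(response.iter_content(chunk_size=chunk_size), dest_path)
```

Only one chunk is held in memory at a time, which is the point on a board with very little RAM.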

Then I went to update tests. This is when a new problem presented itself. I wanted to test that the proper calls were being made to store the file to disk, but I didn't want to store a file to disk during tests. Mocking never worked the way you'd expect for the built-in open. I placed it on the back burner for a while until I ran into a similar issue on another project. When I finally found the solution, it was incredibly simple and incredibly nonobvious.

Python has a bunch of built-in functions that are just preloaded, such as open for opening files, print for outputting to the console, and str, bool, dict, list, etc. for casting or creating objects, among others. Most of these just do what they do, with no effect on the overall environment; therefore, when testing, I let them do what they do, and mocking them is probably a bad idea anyway. The exception is open, which has real-world effects on the environment in that it accesses, and allows writing of, files on the disk.

Finally, I stumbled across the __builtin__ module (renamed builtins in Python 3). This is the magic module where all those functions actually reside, and if you can access where a function resides, you can mock it out. The following should work with most Python mocking frameworks, but this is how to do it with PyMox.

class Client(APIClient):
...
    def open_file(self):
        file = open(self.path, 'r')
        contents = file.read()
        file.close()
        return self.prepare_file(contents)
...


# test.py
import __builtin__
from StringIO import StringIO

class ClientTest(TestCase):
...
    def test_open_file(self):
        self.mock.StubOutWithMock(__builtin__, 'open')
        mock_content = StringIO('test')
        self.mock.StubOutWithMock(mock_content, 'close')

        open('/tmp/mypath.txt', 'r').AndReturn(mock_content)
        mock_content.close()

        self.mock.ReplayAll()
        file = self.client.open_file()
        self.mock.VerifyAll()
        ...

In this case, returning an instance of StringIO results in a file-like object allowing me to set the contents of the "file" being accessed directly in the test. No fixtures or creating files on the disk. This pattern works for both reads and writes.
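The same idea should carry over to other mocking libraries. Here is a minimal sketch using the standard library's unittest.mock, assuming Python 3, where the module is named builtins. The functions read_file and run_read_example are hypothetical stand-ins for illustration, not the client code from this post.

```python
import builtins  # this module is named __builtin__ in Python 2
import io
from unittest import mock


def read_file(path):
    # Hypothetical stand-in for Client.open_file above.
    f = open(path, 'r')
    contents = f.read()
    f.close()
    return contents


def run_read_example():
    fake_file = io.StringIO('test')
    # Swap out the real open only for the duration of the with block.
    with mock.patch.object(builtins, 'open', return_value=fake_file) as fake_open:
        contents = read_file('/tmp/mypath.txt')
    # Verify that open was called with the expected path and mode.
    fake_open.assert_called_once_with('/tmp/mypath.txt', 'r')
    return contents, fake_file.closed
```

Calling run_read_example() never touches the disk: the function under test reads from the in-memory buffer and then closes it.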

But the code I'm testing here doesn't do all the error handling to make sure the file is closed even if there's an error during the read. Frankly, Python lets you use open as a context manager precisely so that you don't have to worry about it. So combining this new knowledge with my previous post about mocking context managers, we can easily write cleaner code that's still testable.

class Client(APIClient):
...
    def write_file(self):
        with open(self.path, 'wb') as f:
            f.write(self.content)
...

# test.py
class ClientTest(TestCase):
...
    def test_write_file(self):
        self.mock.StubOutWithMock(__builtin__, 'open')
        mock_file1 = self.mock.CreateMockAnything()
        open('/tmp1/mypath.txt', 'wb').AndReturn(mock_file1)
        mock_file1.__enter__().AndReturn(mock_file1)
        mock_file1.write('this is my content')
        mock_file1.__exit__(None, None, None)

        self.mock.ReplayAll()
        self.client.write_file()
        self.mock.VerifyAll()
...

Since Python 2's StringIO.StringIO can't be used as a context manager, this pattern doesn't allow for the use of StringIO. This means you have to mock out the reads, writes and any other methods you would call on a file-like object, but I think that's a small price for being able to write cleaner, testable code.
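For comparison, here is a rough sketch of the same write test using unittest.mock's mock_open helper, which returns a file handle mock that already supports the context manager protocol. This assumes Python 3; write_file and run_write_example are hypothetical stand-ins, not the client code from this post.

```python
from unittest import mock


def write_file(path, content):
    # Hypothetical stand-in for Client.write_file above.
    with open(path, 'wb') as f:
        f.write(content)


def run_write_example():
    fake_open = mock.mock_open()
    with mock.patch('builtins.open', fake_open):
        write_file('/tmp/mypath.txt', b'this is my content')

    # open was called once with the expected path and mode.
    fake_open.assert_called_once_with('/tmp/mypath.txt', 'wb')

    # The handle is what open() returned and what the with block bound to f.
    handle = fake_open.return_value
    handle.write.assert_called_once_with(b'this is my content')
    # A clean exit from the with block passes three Nones to __exit__.
    handle.__exit__.assert_called_once_with(None, None, None)
    return handle
```

Here the assertions come after the call rather than being recorded up front, but they check the same things as the PyMox expectations: the open call, the write, and the exit from the context manager.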