Unit tests are an important part of modern software development, but many developers don't know the difference between good unit tests and bad unit tests. Commonly used measurements, like code coverage, are often inadequate and can be misused. Let's look at various examples to learn how to write good, reliable and maintainable unit tests!
Let’s start with an encoder that encodes an integer into one or more words from a word list. This is similar to the procedure used by BIP39 with a few simplifications:
class WordEncoder:
def __init__(self, wordlist):
self.wordlist = wordlist
self.wordlist_size = len(wordlist)
def encode(self, value):
seed = []
while value:
index = value % self.wordlist_size
seed.append(self.wordlist[index])
value = value // self.wordlist_size
return seed if seed else [self.wordlist[0]]
When writing unit tests, it’s desirable to exercise as many code branches as possible in the code under test.
This will lead to high code coverage numbers, which are calculated by dividing the number of exercised branches by the total number of branches.
By analyzing the encode
function above, we can see three code branches:
value
is zero and the body of the while loop is not executed (this also covers the else case in the return statement)value
is less than len(wordlist)
and the body of the while loop is executed exactly oncevalue
is greater than or equal to `len(wordlist)`` and the body of the while loop is executed multiple timesAt a minimum, we want to cover all three code branches in our tests. So, we could write a test like this:
class EncodeIntAsWordsV1Test(unittest.TestCase):
def test_encode(self):
encoder = WordEncoder(['alpha', 'beta', 'gamma', 'delta', 'epsilon', 'eta', 'theta', 'iota', 'kappa', 'lambda'])
self.assertEqual(['alpha'], encoder.encode(0))
self.assertEqual(['iota'], encoder.encode(7))
self.assertEqual(['eta', 'gamma', 'theta'], encoder.encode(625))
While the previous test is complete from a branch coverage point of view, it is testing multiple units at once. Ideally, each unit test should test exactly one thing. While this is not always possible in practice, it makes it much easier to pinpoint problems when tests fail.
For example, if the previous test fails, it’s not immediately obvious if there is a bug with the edge case (0) or when multiple loops are needed (625). In contrast, if a separate 0 test failed and a separate 625 test succeeded, it would be clear the bug is related to handling of 0.
From the above, it should hopefully be clear that we need three tests. One for each of the code branches listed above.
The tests could look something like this:
class EncodeIntAsWordsV2Test(unittest.TestCase):
def test_can_encode_int_zero(self):
self.assertEqual(
['alpha'],
WordEncoder(['alpha', 'beta', 'gamma', 'delta', 'epsilon', 'eta', 'theta', 'iota', 'kappa', 'lambda']).encode(0))
def test_can_encode_int_to_single_word(self):
self.assertEqual(
['iota'],
WordEncoder(['alpha', 'beta', 'gamma', 'delta', 'epsilon', 'eta', 'theta', 'iota', 'kappa', 'lambda']).encode(7))
def test_can_decode_int_to_multiple_words(self):
self.assertEqual(
['eta', 'gamma', 'theta'],
WordEncoder(['alpha', 'beta', 'gamma', 'delta', 'epsilon', 'eta', 'theta', 'iota', 'kappa', 'lambda']).encode(625))
While the previous tests are each testing a single thing, they are hard to follow because they don’t separate the three primary test phases: Arrange, Act, Assert.
Separating these phases makes it much easier for the reader to quickly understand what is being tested (Act) as well as the success conditions (Assert).
Splitting up these phases might result in something like this:
class EncodeIntAsWordsV3Test(unittest.TestCase):
def test_can_encode_int_zero(self):
# Arrange:
wordlist = ['alpha', 'beta', 'gamma', 'delta', 'epsilon', 'eta', 'theta', 'iota', 'kappa', 'lambda']
encoder = WordEncoder(wordlist)
# Act:
words = encoder.encode(0)
# Assert:
self.assertEqual(['alpha'], words)
def test_can_encode_int_to_single_word(self):
# Arrange:
wordlist = ['alpha', 'beta', 'gamma', 'delta', 'epsilon', 'eta', 'theta', 'iota', 'kappa', 'lambda']
encoder = WordEncoder(wordlist)
# Act:
words = encoder.encode(7)
# Assert:
self.assertEqual(['iota'], words)
def test_can_decode_int_to_multiple_words(self):
# Arrange:
wordlist = ['alpha', 'beta', 'gamma', 'delta', 'epsilon', 'eta', 'theta', 'iota', 'kappa', 'lambda']
encoder = WordEncoder(wordlist)
# Act:
words = encoder.encode(625)
# Assert:
self.assertEqual(['eta', 'gamma', 'theta'], words)
ℹ️ Some tests might not have any setup. These will not need an Arrange phase.
Unit tests are the de facto contract of your production code - assuming they all pass, which they should! As such, they should be treated with as much care as your production code and written with the same standards in mind.
For example, while the previous tests are complete and easy to read they’re very duplicative. The only variances are the input into the encoder and the expected output. We can refactor these tests into something much more concise that doesn’t lose any of the coverage or readability!
class EncodeIntAsWordsV4Test(unittest.TestCase):
def _assert_can_encode_int(self, value, expected_words):
# Arrange:
wordlist = ['alpha', 'beta', 'gamma', 'delta', 'epsilon', 'eta', 'theta', 'iota', 'kappa', 'lambda']
encoder = WordEncoder(wordlist)
# Act:
words = encoder.encode(value)
# Assert:
self.assertEqual(expected_words, words)
def test_can_encode_int_zero(self):
self._assert_can_encode_int(0, ['alpha'])
def test_can_encode_int_to_single_word(self):
self._assert_can_encode_int(7, ['iota'])
def test_can_decode_int_to_multiple_words(self):
self._assert_can_encode_int(625, ['eta', 'gamma', 'theta'])
ℹ️ If a test function simply calls a helper, you can avoid the Arrange/Act/Assert comments since they should be present in the helper.