In Python, you have a lot of choices when representing user-defined data types. In this post we’ll look at three of them and their tradeoffs.

Python Tuples

First, let’s try implementing a todo item with the built-in Python tuple type. The todo item will have a task, a due date, and a boolean to mark it as complete. To make things easy, I’ll write a simple function that computes a due date in the future, three days out by default:

import datetime

def compute_due_date(days=3):
  return datetime.datetime.now().date() + datetime.timedelta(days=days)

Now we can implement the todo item tuple.

todo_as_tuple = (
  "Finish coding",
  compute_due_date(),
  False
)

Add a print call to output the todo item tuple in the console and run the code. Here is the result on my machine:

('Finish coding', datetime.date(2024, 11, 18), False)

This is not very user friendly. For example, the date is using the default Python string representation. Let’s write a function to format the output.

def display_todo(todo):
  print(f"Task: {todo[0]}")
  print(f"Due date: {todo[1].strftime('%Y/%m/%d')}")
  print(f"Completed: {todo[2]}")
  
display_todo(todo_as_tuple)

Running the code again, we can see the formatted output.

Task: Finish coding
Due date: 2024/11/18
Completed: False

Now let’s write another function to mark the todo item as complete:

def mark_todo_as_complete(todo):
  todo[2] = True
  
mark_todo_as_complete(todo_as_tuple)
display_todo(todo_as_tuple)

This code raises an error!

Traceback (most recent call last):
  File "todo.py", line 26, in <module>
    mark_todo_as_complete(todo_as_tuple)
  File "todo.py", line 15, in mark_todo_as_complete
    todo[2] = True
    ~~~~^^^
TypeError: 'tuple' object does not support item assignment

As Python tells us in the traceback, the mark_todo_as_complete function cannot modify the value at index 2 in the todo item because tuples are immutable. Referring to the values by index is also fragile: todo[2] says nothing about what it holds, and reordering the fields silently breaks every function that reads them. So a tuple is not the best choice for this use case.
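If you do have to stay with a plain tuple, the usual workaround is to build a new tuple rather than change the old one. The following is a minimal sketch of that idea, not the approach this post ends up recommending; the fixed date is an assumption made only so the example is reproducible.

```python
import datetime

# Fixed date for illustration only (an assumption, not computed from today).
todo_as_tuple = ("Finish coding", datetime.date(2024, 11, 18), False)

def mark_todo_as_complete(todo):
    """Return a new tuple with the last slot set to True; the input is untouched."""
    task, due_date, _ = todo  # unpacking avoids bare numeric indexes
    return (task, due_date, True)

completed = mark_todo_as_complete(todo_as_tuple)
print(completed[2])      # True
print(todo_as_tuple[2])  # False
```

The caller has to keep the returned tuple, because the original never changes.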

Named Tuples

The collections module in the Python Standard Library offers the namedtuple function, which generates a named tuple type. This is an improvement on the built-in Python tuple type. A named tuple lets you provide field names for the values in the tuple, so you can access them by name instead of by numeric index. You can provide default values as well. You’ll need to import the namedtuple function from the collections module:

from collections import namedtuple

Call the namedtuple function. The first parameter is a string which will be the name of the named tuple used by Python. The second parameter is a list of strings used as the field names.

Todo = namedtuple("Todo", ["task", "due_date", "complete"])

You can call the return value of namedtuple just as you would a class:

todo_as_namedtuple = Todo(
  task="Finish coding", 
  due_date=compute_due_date(),
  complete=False
)

Using the print function to output todo_as_namedtuple in the console yields a slightly more friendly and informative format.

Todo(task='Finish coding', due_date=datetime.date(2024, 11, 18), complete=False)

However, it’s not as nice as the output of the display_todo function. The date is still using the default Python string representation. So we can refactor that function and take advantage of referring to values by field name in the named tuple.

def display_todo(todo):
  print(f"Task: {todo.task}")
  print(f"Due date: {todo.due_date.strftime('%Y/%m/%d')}")
  print(f"Completed: {todo.complete}")
  
display_todo(todo_as_namedtuple)

Task: Finish coding
Due date: 2024/11/18
Completed: False
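Because a named tuple is still a tuple, everything that worked with the plain tuple keeps working. Here is a small sketch of that; the fixed date is an assumption for reproducibility, while _fields and _asdict() are part of the documented namedtuple API.

```python
import datetime
from collections import namedtuple

Todo = namedtuple("Todo", ["task", "due_date", "complete"])
todo = Todo("Finish coding", datetime.date(2024, 11, 18), False)

# Index access and unpacking still work, as with a plain tuple.
print(todo[0] == todo.task)  # True
task, due_date, complete = todo

# The generated type also exposes its field names and a dict view.
print(Todo._fields)                 # ('task', 'due_date', 'complete')
print(todo._asdict()["complete"])   # False
```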

Another advantage of the named tuple is that you can provide a list of default values in the defaults keyword argument. The values are matched to the fields from right to left, so a list with one default per field lines up with the fields in order.

Todo = namedtuple(
  "Todo", 
  ["task", "due_date", "complete"],
  defaults=["My new task", compute_due_date(), False],
)

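This allows us to create a new todo item providing only a task name. The value returned by compute_due_date will be used as the default for the due date, and False will be the default for complete. Note that compute_due_date() is called only once, when the named tuple is defined, so every todo created without an explicit due date shares that same date.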

todo_as_namedtuple = Todo("My Next Task")
display_todo(todo_as_namedtuple)

Task: My Next Task
Due date: 2024/11/18
Completed: False
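Two details of defaults are easy to miss: they attach to the rightmost fields, and they are evaluated when namedtuple is called rather than when a todo is created. The sketch below shows one possible way around the second point. The make_todo helper is hypothetical and introduced only for illustration.

```python
import datetime
from collections import namedtuple

def compute_due_date(days=3):
    return datetime.datetime.now().date() + datetime.timedelta(days=days)

# With fewer defaults than fields, the defaults attach to the rightmost fields:
# only complete gets a default here; task and due_date stay required.
Todo = namedtuple("Todo", ["task", "due_date", "complete"], defaults=[False])
print(Todo._field_defaults)  # {'complete': False}

def make_todo(task, days=3):
    """Hypothetical helper: compute the due date when the todo is created."""
    return Todo(task, compute_due_date(days))

todo = make_todo("My Next Task")
print(todo.task, todo.complete)  # My Next Task False
```

With this shape, a program that runs for several days still gets a fresh due date for each new todo.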

Now let’s refactor the mark_todo_as_complete function:

def mark_todo_as_complete(todo):
  todo.complete = True
    
mark_todo_as_complete(todo_as_namedtuple)
display_todo(todo_as_namedtuple)

Traceback (most recent call last):
  File "todo.py", line 29, in <module>
    mark_todo_as_complete(todo)
  File "todo.py", line 16, in mark_todo_as_complete
    todo.complete = True
    ^^^^^^^^^^^^^
AttributeError: can't set attribute

Again, Python raises an exception when we attempt to modify the value of the complete field, because a named tuple is still a tuple and thus immutable. So close, but no cigar with the named tuple.
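For completeness, named tuples do offer a documented _replace method that returns a modified copy. The sketch below shows it under the assumption that returning a new object is acceptable; it does not update the todo in place, which is what we are after here.

```python
import datetime
from collections import namedtuple

Todo = namedtuple("Todo", ["task", "due_date", "complete"])
todo = Todo("Finish coding", datetime.date(2024, 11, 18), False)

def mark_todo_as_complete(todo):
    """Return a copy of the named tuple with complete set to True."""
    return todo._replace(complete=True)

done = mark_todo_as_complete(todo)
print(done.complete)  # True
print(todo.complete)  # False, the original is unchanged
```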

Dataclasses

To update the complete attribute in place, we’ll have to use a class. However, writing a full-blown class involves a lot of boilerplate code, such as implementing the initializer and string representation methods. With a dataclass, we can use a simple syntax to define the type, and Python will provide some default implementations.

To create a dataclass, you’ll need the dataclass decorator from the dataclasses module:

from dataclasses import dataclass

Now define the Todo class and decorate it with the @dataclass decorator.

@dataclass
class Todo:
  pass

Instead of implementing an initializer, you can add attributes to the class using Python type hints.

@dataclass
class Todo:
  task: str
  due_date: datetime.date
  complete: bool

You can also assign default values to the attributes:

@dataclass
class Todo:
  task: str
  due_date: datetime.date = compute_due_date()
  complete: bool = False
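One caveat with the default above: compute_due_date() runs once, when the class is defined, so every Todo created later in the same process gets that same date. If that matters for your program, the dataclasses module provides field(default_factory=...), which calls the function each time an instance is created. A minimal sketch:

```python
import datetime
from dataclasses import dataclass, field

def compute_due_date(days=3):
    return datetime.datetime.now().date() + datetime.timedelta(days=days)

@dataclass
class Todo:
    task: str
    # default_factory calls compute_due_date every time a Todo is created,
    # instead of once when the class body is executed.
    due_date: datetime.date = field(default_factory=compute_due_date)
    complete: bool = False

todo = Todo("My Next Task")
explicit = Todo("Pay rent", datetime.date(2024, 12, 1))
print(explicit.due_date)  # 2024-12-01
```

An explicitly passed due date still takes precedence over the factory.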

With defaults in place, due_date and complete are optional. To create a new instance of the Todo class, call the initializer:

todo_as_dataclass = Todo("My Next Task")

print(todo_as_dataclass)

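And you’ll see that the dataclass has generated an acceptable default string representation. Strictly speaking it generates the __repr__ method, which print falls back on when a class does not define __str__. This means that we don’t need to refactor the display_todo function. We can implement the __str__ method instead: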

@dataclass
class Todo:
  # ...
  def __str__(self):
    return f"""
Task: {self.task}
Due date: {self.due_date.strftime("%Y/%m/%d")}
Completed: {self.complete}
    """

And we don’t need the mark_todo_as_complete function either. We can add a method to the dataclass instead because it’s still a class.

@dataclass
class Todo:
  # ...
  def mark_as_complete(self):
    self.complete = True

Now the API is much more user friendly:

todo = Todo("My new task")
print(todo)
todo.mark_as_complete()
print(todo)
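Since the snippets above elide parts of the class with # ..., here is one way the pieces might fit together. This is a sketch rather than the definitive version: it takes the due date as a required argument with a fixed value so the output is reproducible, and it builds the string without the leading and trailing blank lines that a triple-quoted string would add.

```python
import datetime
from dataclasses import dataclass

@dataclass
class Todo:
    task: str
    due_date: datetime.date
    complete: bool = False

    def __str__(self):
        # Used by print(); the generated __repr__ stays available for debugging.
        return (
            f"Task: {self.task}\n"
            f"Due date: {self.due_date.strftime('%Y/%m/%d')}\n"
            f"Completed: {self.complete}"
        )

    def mark_as_complete(self):
        self.complete = True

todo = Todo("My new task", datetime.date(2024, 11, 18))
print(todo)
todo.mark_as_complete()
print(todo)
print(repr(todo))
# Todo(task='My new task', due_date=datetime.date(2024, 11, 18), complete=True)
```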

Summary

In this post you saw three ways to represent data in Python: tuples, named tuples, and dataclasses. For simple needs, a tuple is enough. A named tuple lets you access data by field name instead of numeric position, but it is still immutable. The dataclass is the most flexible of the three: it is a regular class, so its attributes can be updated, and you get a lot for free, such as a generated initializer and string representation. You can still override this functionality or add to it as you wish.

By Douglas Starnes

Entrepreneur, 5x Microsoft MVP, AI/BI nerd, crypto investor, content creator, trained composer, challenging the status quo, proud American
