Mutable defaults in Python dataclass
Me learning about how NOT to use mutable defaults in dataclass
.
I was working through the book Pythonic Application Architecture Patterns and ran into an issue that took me a while to figure out. Turned out it had to do with mutable defaults in dataclass
.
PyCharm always warns me about mutable defaults when I put things like
def my_function(x=[]):
...
but I didn’t know why. It got me this time.
My issue
In the code snippet used in chapter 1 of the book, there is a Batch
object:
class Batch:
def __init__(
self, ref: str, sku: str, qty: int, eta: Optional[date]
):
self.ref = ref
self.sku = sku
self.qty = qty
self.eta = eta
self._allocations = set() # type: Set[OrderLine]
I decided to be cute and used dataclass
instead to define Batch
:
@dataclass
class Batch:
ref: str
sku: str
qty: int
eta: date = None
_allocations: set()
But when I run the tests,
def test_prefers_earlier_batches():
earliest = Batch("speedy-batch", "MINIMALIST-SPOON", 100, eta=today)
latest = Batch("slow-batch", "MINIMALIST-SPOON", 100, eta=later)
line = OrderLine("order1", "MINIMALIST-SPOON", 10)
allocate(line, [medium, latest])
assert earliest.available_quantity == 90
assert latest.available_quantity == 100
Both assertions gave me 90?! In other words, my code was modifying latest
when it was not supposed to.
When I run the PyCharm debugger, I found that the latest
instance got modified at the same time as the earliest
. I then change my code to exactly what the book says and it worked. That’s when I figure it could be a mutiple default issue.
What was the issue?
It turned out my Batch
through dataclass
was evaluated to
class Batch:
_allocations = set()
def __init__(
self, ref: str, sku: str, qty: int, eta: Optional[date]
):
self.ref = ref
self.sku = sku
self.qty = qty
self.eta = eta
according to the official doc, so all my instances are sharing the same _allocations
reference in memory!
- The doc: https://docs.python.org/3/library/dataclasses.html#mutable-default-values
How to properly declare the dataclass
Will Gaggioli from Penny University pointed out that I should use the field()
function:
- https://docs.python.org/3/library/dataclasses.html#dataclasses.field
@dataclass
class Batch:
ref: str
sku: str
qty: int
eta: date = None
_allocations: Set = field(default_factory=set)
Bottom line: Read the docs and be careful about using mutable defaults in Python!