March 8, 2022 pydantic python ☕️ buy me a coffee
I recently encountered an issue where I was relying on Pydantic to know what model I wanted it to select based on the parameters alone. This was fine until I started using models with the same parameters. Hopefully this short guide will quickly cover how to deal with this.
Let’s quickly define the problem. We have a base-class Base
which A
and B
inherits all their attributes from and does not add anything themselves. We then create another class C
which houses either A
or B
in C.baz
.
class Base(pydantic.BaseModel):
foo: float
bar: float
class A(Base):
pass
class B(Base):
pass
class C(pydantic.BaseModel):
baz: Union[A,B]
If we were then to serialize from pydantic to json, and then deserialize back from json to pydantic, an instance of C
with baz
holding a B
object, the resultant C
class would actually be holding a A
object at baz
after all is said and done. This is because json doesn’t hold type information by design which forces Pydantic to pick either A
or B
based on order listed in the type definition.
c = C(baz=B(foo=0.1, bar=0.2))
print(c.json()) # {"baz": {"foo": 0.1, "bar": 0.2}}
# The problem
# \/
print(C.parse_raw(d.json())) # baz=A(foo=0.1, bar=0.2)
We need to define custom en- and de-coders to store type data along the actual data. Pydantic sadly does not handle this very elegantly so extra care needs to be take when using these methods within your codebase.
First, let’s define the encoder that will store the class name as under _type
.
custom_encoder = lambda obj: dict(_type=type(obj).__name__, **obj.dict())
We then add the json_encoders
configuration to the model.
class C(pydantic.BaseModel):
baz: Base
class Config:
json_encoders = {
Base: custom_encoder
}
c = C(baz=B(foo=0.1, bar=0.2))
print(c.json(models_as_dict=False)) # '{"baz": {"_type": "B", "foo": 0.1, "bar": 0.2}}'
The next stage is the decoder. For this we turn to json.JSONDecoder
and create a custom hook.
class CustomDecoder(json.JSONDecoder):
def __init__(self, *args, **kwargs):
json.JSONDecoder.__init__(self, object_hook=self.object_hook, *args, **kwargs)
def object_hook(self, obj):
_type = obj.get('_type')
if _type is None:
return obj
del obj['_type'] # Delete the `_type` key as it isn't used in the models
mapping = {f.__name__:f for f in [A, B]} # Create a look-up object to avoid an if-else chain
return mapping[_type].parse_obj(obj)
Then finally add it to the C
model using functools.partial
.
class C(pydantic.BaseModel):
baz: Base
class Config:
json_encoders = {
Base: custom_encoder
}
json_loads = functools.partial(json.loads, cls=CustomDecoder)
Finally we have a suitable method for serializing and de-serializing a custom object in a way that retains type data!
Below is a quick demonstration which shows how to use this method along with the models_as_dict=False
parameter.
c = C(baz=B(foo=0.1, bar=0.2))
c_json = c.json(models_as_dict=False) # models_as_dict=False is VERY important! Excluding it will invalidate all of this.
# It works!
# \/
print(C.parse_raw(c_json)) # baz=B(foo=0.1, bar=0.2)
Union
type. I don’t know why, but it just doesn’t. That is why I have set baz: Base
rather than baz: Union[A,B]
, where the latter is more readable in my opinion.