Have you ever run into issues when using JPA (Jakarta Persistence, formerly known as the Java Persistence API) together with the equals and hashCode methods generated by Lombok? I certainly have. For example, we might observe unexpected behavior when adding an item to a collection, or when trying to retrieve the element we have just saved. Suddenly the element we just saved is gone, or we can’t find the element we’re looking for in the collection, although it clearly should be there.
These issues can arise when we use Lombok’s @Data annotation, which is often placed on entities, together with JPA and some of its features. Or rather with Hibernate, which is the JPA implementation that Spring Boot uses by default. It certainly feels like we shouldn’t have to deal with these annoying bugs when using well-crafted and widely adopted frameworks and libraries.
But the problem lies a little deeper. Let me explain.
Let’s simplify the problem
First of all, we can recreate the problem entirely without JPA. This shows that the problem does not come from JPA itself, as we will see in a moment.
So what is the problem? Let’s say we have an entity, in this case a Plant, and want to add it to a Set so we don’t save the exact same plant twice. We do that in the following code example:
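A minimal sketch of such an example, written without Lombok so that it runs on a plain JDK. The equals and hashCode methods are hand-written stand-ins for what @Data would generate for a single id field (they depend only on id), and the names PlantSetDemo, nextId and lookups() are made up for illustration:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class PlantSetDemo {

    static class Plant {
        private static long nextId = 1; // stand-in for a database sequence
        private Long id;                // null until save() is called

        Long getId() { return id; }

        // Simplified "save": assigns the id, similar to an auto-generated id on persist.
        void save() { this.id = nextId++; }

        // Stand-in for the generated equals: compares only the id.
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Plant)) return false;
            return Objects.equals(id, ((Plant) o).getId());
        }

        // Stand-in for the generated hashCode: depends only on the id.
        @Override
        public int hashCode() { return Objects.hashCode(id); }
    }

    // Returns the result of contains() before and after saving the plant.
    static boolean[] lookups() {
        Set<Plant> house = new HashSet<>();
        Plant plant = new Plant();
        house.add(plant);                       // stored under the hash of a null id
        boolean before = house.contains(plant);
        plant.save();                           // id changes, and so does the hash
        boolean after = house.contains(plant);
        return new boolean[] { before, after };
    }

    public static void main(String[] args) {
        boolean[] result = lookups();
        System.out.println("In the house before save: " + result[0]); // true
        System.out.println("In the house after save:  " + result[1]); // false
    }
}
```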
Note that we use an auto-generated id. This is a common practice found in many JPA tutorials, usually with the @GeneratedValue annotation, but here we use a simplified form. The id gets assigned when the entity is saved, just like with Hibernate and auto-generated ids. We use Lombok’s @Data annotation to generate getters as well as equals and hashCode for us.
The baffling output: before saving, the set contains the plant; after saving, it no longer does.
Why is that?
Lombok’s equals and hashCode
If we try the same without Lombok’s @Data annotation, the output shows that the plant is indeed identified as the same plant. This works because the default equals implementation in Java (Object.equals) checks whether both references point to the same object in memory, and the default hashCode does not depend on the id. But this would not really help us with JPA, because more magic is happening there. Saving elements to the database and retrieving them again makes it impossible to guarantee that we are always dealing with the same object in memory. So let’s go back to the simple example and have a look at the equals and hashCode methods generated by Lombok.
The first impulse would be to look at the equals method to find out what is wrong. But the method that is actually called first when set.contains(element) is invoked is hashCode, since it determines the hash bucket in which the element is stored. So let’s leave equals aside for a moment. If we take a closer look at the generated hashCode method, we see that a constant number is returned if id is null, and a completely different number is returned if the id is set.
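For a class whose only field is a Long id, the generated hashCode has roughly the following shape. This is a hand-written approximation modeled on delomboked output, not Lombok’s own source; the class and method names are made up for illustration, and the exact constants may differ between Lombok versions:

```java
class GeneratedHashCodeSketch {

    // Approximation of the hashCode Lombok generates for a single `Long id` field.
    static int hashCodeFor(Long id) {
        final int PRIME = 59;
        int result = 1;
        // A fixed constant is used for a null id, the id's own hash otherwise.
        result = result * PRIME + (id == null ? 43 : id.hashCode());
        return result;
    }

    public static void main(String[] args) {
        System.out.println(hashCodeFor(null)); // 102
        System.out.println(hashCodeFor(1L));   // 60
    }
}
```

The two values differ, so the same object ends up with a different hash once its id has been assigned.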
In our example, house.add(plant) calculates the hash code for the first time and stores the plant in the corresponding bucket. The first lookup uses that same hash, and the plant is found. After calling plant.save(), in which the id is assigned, the plant can no longer be found in the set. This is because the second lookup calls hashCode again, but now the plant has an id and therefore a different hash. The bucket for that hash is empty, and house.contains(plant) returns false.
The equals method generated by Lombok also uses the id to differentiate between elements, so we would have a problem there as well. But in this case we don’t even get there.
What is identity?
Now we see: the problem is the combination of ids that are auto-generated on save and Lombok’s generated hashCode method. But perhaps it is more accurate to say that we have a problem with identity here. When can we assume that a plant is identical to another plant? Is it already the same plant if it has all the same attributes?
To extend the above example, let’s say the plant also has further attributes, such as height. Can the identity of a plant be derived from all of its attributes? Does the plant itself change as it grows, or is it still the same plant? We cannot let the framework or library make this decision for us; we have to take responsibility for what identity means ourselves.
For these reasons, we usually introduce an id and keep track of everything plant-related under that same id. Indeed, we can make use of an id to solve the above problems. How that can be achieved is shown in the next part.
In addressing issues with JPA and specific Lombok annotations, introducing an id becomes a reasonable solution. A quick recap: we observed problems with the combination of hashCode, elements in sets and auto-generated ids. We concluded that we need to base the implementation of equals and hashCode on an identifier that does not change in the process.
Let’s dive right in.
In our example above we used 🪴🪴 to illustrate the problems. Let’s continue with that and make these plants unique. We’ll just introduce a unique identifier and everything should be fine, right?
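A sketch of that naive attempt, under two assumptions that are not spelled out in the text: the UUID is assigned when the plant is created, and the plant has a mutable height attribute that changes as it grows. The names NaiveUuidDemo and grow() are hypothetical, and equals and hashCode are hand-written stand-ins for what @Data would generate over all fields:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.UUID;

class NaiveUuidDemo {

    static class Plant {
        private final UUID id = UUID.randomUUID(); // unique, assigned at construction
        private int height = 10;                   // assumed mutable attribute

        UUID getId() { return id; }
        int getHeight() { return height; }
        void grow() { height++; }

        // Stand-in for @Data: every field takes part in equals ...
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Plant)) return false;
            Plant other = (Plant) o;
            return id.equals(other.getId()) && height == other.getHeight();
        }

        // ... and in hashCode, so the hash changes whenever height changes.
        @Override
        public int hashCode() { return Objects.hash(id, height); }
    }

    // Returns the result of contains() before and after the plant grows.
    static boolean[] lookups() {
        Set<Plant> house = new HashSet<>();
        Plant plant = new Plant();
        house.add(plant);
        boolean before = house.contains(plant);
        plant.grow();
        boolean after = house.contains(plant);
        return new boolean[] { before, after };
    }

    public static void main(String[] args) {
        boolean[] result = lookups();
        System.out.println("Found before growing: " + result[0]); // true
        System.out.println("Found after growing:  " + result[1]); // false
    }
}
```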
Well, no. We still have a problem identifying the same plant in the set: the output shows that the lookup fails again.
Actually, it’s logical. How should Lombok or the set know what constitutes a unique plant if we don’t tell them? To address this, we need to make uniqueness explicit in our code. This involves implementing our own equals and hashCode methods that rely on our unique identifier, the UUID. Alternatively, we could let Lombok generate the methods and tell it to use only the id and no other attributes, via the annotations @EqualsAndHashCode(onlyExplicitlyIncluded = true) on the class and @EqualsAndHashCode.Include on our id.
If you implement your own equals and hashCode, please familiarize yourself with the official Java equals and hashCode contracts, or follow the best practices laid out by Joshua Bloch in “Effective Java”. The following code should just serve as an example. We now add these methods, where hashCode is derived from the UUID and equals also relies on it:
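A minimal sketch of what such methods might look like, assuming the UUID is assigned at construction time and never changes afterwards. The mutable height attribute and the names UuidIdentityDemo and grow() are hypothetical and only serve to show that other attributes no longer affect the lookup:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

class UuidIdentityDemo {

    static class Plant {
        private final UUID id = UUID.randomUUID(); // identity: set once, never changed
        private int height = 10;                   // assumed mutable attribute

        UUID getId() { return id; }
        int getHeight() { return height; }
        void grow() { height++; }

        // Two plants are equal if and only if they share the same UUID.
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Plant)) return false;
            return id.equals(((Plant) o).getId());
        }

        // The hash is derived from the UUID only, so it stays stable.
        @Override
        public int hashCode() { return id.hashCode(); }
    }

    // Returns the result of contains() before and after the plant grows.
    static boolean[] lookups() {
        Set<Plant> house = new HashSet<>();
        Plant plant = new Plant();
        house.add(plant);
        boolean before = house.contains(plant);
        plant.grow();
        boolean after = house.contains(plant);
        return new boolean[] { before, after };
    }

    public static void main(String[] args) {
        boolean[] result = lookups();
        System.out.println("Found before growing: " + result[0]); // true
        System.out.println("Found after growing:  " + result[1]); // true
    }
}
```

The sketch uses instanceof and the getter rather than getClass() and direct field access, which tends to be the safer choice when a persistence provider hands out subclass-based proxies; whether that matters depends on the mapping.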
And now, when we run the application again, the output shows that the plant is found in the set again.
Voilà! It now works the way we want it to!
Do UUIDs work well as primary keys?
Implementing ids, equals and hashCode in this way raises the question of whether UUIDs are a good choice for primary keys. This blog post cannot cover the discussion in full detail, but a few things should be addressed briefly. While there are downsides to using UUIDs, such as increased memory usage and slower indexing for new entries on the database side, this approach also has its advantages.
One advantage of using UUIDs, aside from the easier implementation of equals and hashCode, is that distributed systems can assign ids without having to worry about which ids other systems have assigned or are about to assign. Additionally, using UUIDs instead of serial ids mitigates the risk of unintentionally leaking business knowledge to the outside world. If your ids are publicly visible, a business competitor could, for example, sign up twice on your app and find out how many other people have signed up in the meantime. Of course, this can be circumvented by not making the ids public, but it still needs to be considered. When you use UUIDs as business keys, on the other hand, this issue does not arise. Unfortunately, another one does: typing and reading UUIDs is inconvenient for humans.
When using UUIDs as primary keys, remember to always use UUID data types in the database instead of plain string types. UUID data types are optimized for this use case. Also be aware of the different UUID versions. It is crucial to know how they are generated and what kind of information they contain, or whether they are truly random, so you can pick the version that fits your needs.
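As a small illustration of the version differences, the JDK’s java.util.UUID can produce random (version 4) and name-based (version 3) UUIDs out of the box. The class and method names below are made up for this sketch, and "monstera" is just an example input:

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

class UuidVersionDemo {

    // Version 4: generated from a cryptographically strong random source.
    static UUID randomId() {
        return UUID.randomUUID();
    }

    // Version 3: derived from the given name, so the same name yields the same UUID.
    static UUID nameBasedId(String name) {
        return UUID.nameUUIDFromBytes(name.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(randomId().version());              // 4
        System.out.println(nameBasedId("monstera").version()); // 3
        // Name-based UUIDs are deterministic:
        System.out.println(nameBasedId("monstera").equals(nameBasedId("monstera"))); // true
    }
}
```

Other versions, such as time-based ones, are not generated by java.util.UUID itself and would need a separate library or database function.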
In our work, we have not encountered any performance issues related to using UUIDs as primary keys. The concise implementation of equals and hashCode and the advantages for distributed systems make UUIDs a viable option for primary keys.
Many thanks to my colleagues Michael Vitz, Michael Schürig, Lara Pourabdolrahim and Piet Schijven, who helped with an earlier draft of this post. Photo by Kazden Cattapan on Unsplash.