Have you ever run into issues when using JPA (Jakarta Persistence, formerly known as Java Persistence API) together with the equals and hashCode methods generated by Lombok? I certainly have. For example, we might observe unexpected behavior when attempting to save an item to a collection, or when trying to retrieve the element we have just saved. Suddenly the element we just saved is gone, or we can’t find the element we’re looking for in the collection, although it clearly should be there. These issues can arise when we use Lombok’s @Data annotation, which is often placed on entities, in combination with JPA and some of its features. Or rather, with Hibernate, the most widely used implementation of JPA. It certainly feels like we shouldn’t have to deal with these annoying bugs when using well-crafted and widely adopted frameworks and libraries.

But the problem lies a little deeper. Let me explain.

Let’s simplify the problem

First of all, we can recreate the problem entirely without using JPA. This shows that the problem does not come from JPA itself, as we will see in a moment.

So what is the problem? Let’s say we have an entity, in this case a Plant, and want to add it to a Set so we don’t save the exact same plant twice. We do that in the following code example:

package org.example;

import lombok.Data;
import java.util.*;

public class HousePlant {

    static int generatedIds = 1;

    public static void main(String[] args) {
        var house = new HashSet<Plant>();

        var plant = new Plant();
        house.add(plant);

        System.out.println("🪴 in set before save: " + house.contains(plant));

        save(plant);
        System.out.println("🪴 in set after save: " + house.contains(plant));
    }

    static void save(Plant e) {
        e.setId(generatedIds++);
    }

    @Data
    static final class Plant {

        private Integer id;
    }
}

Note that we use an auto-generated id. This is a common practice found in many JPA tutorials, usually with the @GeneratedValue annotation, but here we use a simplified form. The id gets assigned when the entity is saved, just like when using Hibernate with auto-generated ids. We use Lombok’s @Data annotation to generate getters and setters as well as equals and hashCode for us.

The baffling output is:

🪴 in set before save: true
🪴 in set after save: false

Why is that?

Lombok’s equals and hashCode

If we try the same without Lombok’s @Data annotation, we can observe in the output that the plant is indeed identified as the same plant. This recognition is now facilitated by the standard equals implementation in Java, which checks whether the same object in memory is being referred to. But this would not really help us in the case where we are using JPA, because there is more magic happening then. Saving to and retrieving elements from the database makes it impossible to guarantee that we are always dealing with the same object in memory. So let’s go back to the simple example, and have a look at the Lombok generated equals and hashCode methods.

public boolean equals(Object o) {
    if (o == this) {
        return true;
    } else if (!(o instanceof Plant)) {
        return false;
    } else {
        Plant other = (Plant)o;
        Object this$id = this.getId();
        Object other$id = other.getId();
        if (this$id == null) {
            if (other$id != null) {
                return false;
            }
        } else if (!this$id.equals(other$id)) {
            return false;
        }
        return true;
    }
}

public int hashCode() {
    final int PRIME = 59;
    int result = 1;
    Object $id = this.getId();
    result = result * 59 + ($id == null ? 43 : $id.hashCode());
    return result;
}

The first impulse would be to look at the equals method and find out what is wrong there. But the method that is called first by set.contains(element) is actually hashCode, since its result determines the hash bucket in which the element is stored and looked up. So let’s leave equals aside for a moment. If we take a closer look at the generated hashCode method, we see that a constant number is returned if id is null, and a different number is returned once the id is set.
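To make those two numbers concrete, here is a minimal sketch that repeats the arithmetic of the generated method by hand. The class name HashArithmetic and the helper lombokStyleHash are made up for this illustration; only the constants 59 and 43 are taken from the generated code shown above.

```java
// Hand-written copy of the arithmetic in the generated hashCode above.
// HashArithmetic and lombokStyleHash are illustrative names, not Lombok API.
class HashArithmetic {

    static int lombokStyleHash(Integer id) {
        int result = 1;
        // 43 stands in for a null field; otherwise the field's own hash is used
        result = result * 59 + (id == null ? 43 : id.hashCode());
        return result;
    }

    public static void main(String[] args) {
        System.out.println("hash without id: " + lombokStyleHash(null)); // 1 * 59 + 43 = 102
        System.out.println("hash with id 1:  " + lombokStyleHash(1));    // 1 * 59 + 1 = 60
    }
}
```

So the same plant reports 102 before it is saved and 60 after it receives the id 1.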

In our example, house.add(plant) calculates the hash code for the first time and stores the plant in the corresponding bucket. The first lookup produces the same hash, so the plant is found. After calling save(plant), which assigns the id, the plant cannot be found in the set anymore. This is because hashCode is called again for the second lookup, but now the plant has an id and therefore a different hash. The bucket for that hash is empty, and house.contains(plant) returns false.

The equals method generated by Lombok also uses the id to differentiate between elements, so we would have a problem there as well. But in this case we don’t even get there.
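That equals is never reached can be checked with a small probe. The following is only a sketch: LookupProbe and CountingPlant are hypothetical names, the counters exist purely for the demonstration, and the hashCode formula is copied by hand from the generated code rather than produced by Lombok.

```java
import java.util.HashSet;
import java.util.Objects;

// Illustrative probe: counts how often HashSet calls hashCode and equals
// on an element whose id changes after it was added to the set.
class LookupProbe {

    static int hashCodeCalls = 0;
    static int equalsCalls = 0;

    static final class CountingPlant {
        Integer id;

        @Override
        public int hashCode() {
            hashCodeCalls++;
            return 59 + (id == null ? 43 : id.hashCode()); // same formula as the generated method
        }

        @Override
        public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof CountingPlant && Objects.equals(id, ((CountingPlant) o).id);
        }
    }

    /** Adds a plant, assigns an id afterwards, and reports whether the set still finds it. */
    static boolean foundAfterIdChange() {
        hashCodeCalls = 0;
        equalsCalls = 0;
        var house = new HashSet<CountingPlant>();
        var plant = new CountingPlant();
        house.add(plant);             // hashed while id is null
        plant.id = 1;                 // simulates the id assignment on save
        return house.contains(plant); // hashed again, now with id 1
    }

    public static void main(String[] args) {
        System.out.println("found: " + foundAfterIdChange());
        System.out.println("hashCode calls: " + hashCodeCalls);
        System.out.println("equals calls: " + equalsCalls);
    }
}
```

With the HashSet implementation in OpenJDK, the lookup should fail while the equals counter stays at zero, because the mismatch is already detected at the hash level.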

What is identity?

Now we see: The problem is with auto-generated ids on save, in combination with Lombok’s generated hashCode method. But perhaps it is more accurate to say that we have a problem with identity here. When can we assume that a plant is identical to another plant? Is the plant already the same plant if it has all the same attributes?

To extend the above example, let’s say we also have the attributes plantType and height. Can the identity of a plant be derived from all of its attributes? Does the plant itself change as it grows? Or is it still the same plant? We cannot let the framework or library make this decision for us; we have to take responsibility for what identity means ourselves.

For these reasons, we usually introduce an id and keep track of everything plant-related under that same id. In fact, we can use an id to solve the above problems. We will show how in the next part.

Unique plants

In addressing issues with JPA and specific Lombok annotations, introducing an id becomes a reasonable solution. A quick recap: We observed problems with equals, hashCode, elements in sets and auto-generated ids in conjunction with each other. We concluded that we need to base the implementation of hashCode and equals on an identifier that does not change in the process.

Let’s dive right in.

In our example above we used 🪴🪴 to illustrate the problems. Let’s continue with that and make these plants unique. We’ll just introduce a unique identifier and everything should be fine, right?

import lombok.Data;
import java.util.*;

public class UniqueHousePlant {

    public static void main(String[] args) {
        var house = new HashSet<UniquePlant>();

        var plant = new UniquePlant();
        plant.setHeight(50);
        plant.setPlantType("Dracaena");
        house.add(plant);

        System.out.println("🪴 in set: " + house.contains(plant));
        System.out.println("number of 🪴🪴 in set: " + house.size());

        plant.setHeight(60);
        System.out.println("🪴 in set after growth: " + house.contains(plant));

        house.add(plant);
        System.out.println("number of 🪴🪴 in set after adding another time: " + house.size());
    }

    @Data
    static final class UniquePlant {

        private UUID id;
        private Integer height;
        private String plantType;

        public UniquePlant() {
            this(UUID.randomUUID(), null, null);
        }

        private UniquePlant(UUID id, Integer height, String plantType) {
            this.id = id;
            this.height = height;
            this.plantType = plantType;
        }

        private void setId(UUID id) {
            this.id = id;
        }

        public void setHeight(Integer height) {
            this.height = height;
        }

        public void setPlantType(String plantType) {
            this.plantType = plantType;
        }
    }
}

Well, no. We still have a problem identifying the same plant in the set. The output here is:

🪴 in set: true
number of 🪴🪴 in set: 1
🪴 in set after growth: false
number of 🪴🪴 in set after adding another time: 2

Identity revisited

Actually, it’s logical. How should Lombok or the set know what constitutes a unique plant if we don’t tell them? By default, @Data bases equals and hashCode on all non-static, non-transient fields, so changing the height changes the hash code just as assigning the id did before. To address this, we need to make uniqueness explicit in our code. This involves implementing our own equals and hashCode methods that rely only on our unique identifier, the UUID. Alternatively, we could let Lombok generate the methods and tell it to use only the id and no other attributes, via the annotations @EqualsAndHashCode(onlyExplicitlyIncluded = true) on the class and @EqualsAndHashCode.Include on our id.

Before implementing equals and hashCode, please familiarize yourself with the official Java equals and hashCode contracts or follow the best practices laid out by Joshua Bloch in “Effective Java”. The following code should just serve as an example. We now add these methods to UniquePlant, where hashCode is derived from the UUID and equals also relies on it:

@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;
    UniquePlant that = (UniquePlant) o;
    return Objects.equals(getId(), that.getId());
}

@Override
public int hashCode() {
    return Objects.hashCode(getId());
}

And now, when we run the application again, we see the following output:

🪴 in set: true
number of 🪴🪴 in set: 1
🪴 in set after growth: true
number of 🪴🪴 in set after adding another time: 1

Voilà! It now works the way we want it to!
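The same approach also covers the earlier concern that, with JPA, we cannot count on always holding the same object in memory. The following sketch assumes a simplified class, here called IdentifiedPlant, with a constructor that accepts an existing UUID. Both the class and the constructor are hypothetical, and the second instance is built by hand as a stand-in for an entity read back from the database.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.UUID;

// Sketch only: ReloadedPlantDemo and IdentifiedPlant are hypothetical names,
// and the "reloaded" copy is created manually instead of being loaded through JPA.
class ReloadedPlantDemo {

    static final class IdentifiedPlant {
        private final UUID id;
        private Integer height;

        IdentifiedPlant(UUID id, Integer height) {
            this.id = id;
            this.height = height;
        }

        UUID getId() { return id; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            return Objects.equals(id, ((IdentifiedPlant) o).id);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(id);
        }
    }

    /** Returns the set size after adding two distinct objects that share one id. */
    static int sizeAfterAddingCopy() {
        var house = new HashSet<IdentifiedPlant>();
        var original = new IdentifiedPlant(UUID.randomUUID(), 50);
        house.add(original);

        // A second object with the same id but a different height,
        // standing in for the same plant read back from the database.
        var reloaded = new IdentifiedPlant(original.getId(), 60);
        house.add(reloaded);
        return house.size();
    }

    public static void main(String[] args) {
        System.out.println("number of plants in set: " + sizeAfterAddingCopy());
    }
}
```

Because both objects carry the same UUID, the set treats them as one plant and its size stays at 1, even though they are different objects with different heights.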

Do UUIDs work well as primary keys?

Implementing ids and equals and hashCode in this way raises the question of whether UUIDs are a good choice for primary keys. This blog post cannot cover the discussion in all its detail, but a few things should be addressed briefly. While there are downsides to using UUIDs, such as increased memory usage and slower indexing for new entries on the database side, this approach also has its advantages.

One advantage of UUIDs, aside from the easier implementation of hashCode and equals, is that distributed systems can assign ids without having to coordinate with other systems about which ids have been or are about to be assigned. Additionally, using UUIDs instead of serial ids mitigates the risk of unintentionally leaking business knowledge to the outside world. If your serial ids are publicly visible, a business competitor could, for example, sign up twice on your app and find out how many other people have signed up in the meantime. Of course, this can be avoided by not making the ids public, but it still needs to be considered. With UUIDs, on the other hand, this issue does not arise. Unfortunately, another one does: typing and reading UUIDs is inconvenient for humans.

When using UUIDs as primary keys, use the database’s native UUID data type where one is available, instead of a plain string type. Such data types are optimized for this use case. Also be aware of the different UUID versions. It is important to know how they are generated, what kind of information they contain, and whether they are truly random, so you can pick the version that fits your needs.
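As a small illustration of how versions differ, the JDK itself can produce two of them. The sketch below uses only methods from java.util.UUID; the class name UuidVersions and the sample name "Dracaena" are chosen for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Sketch: UuidVersions is an illustrative class name; the UUID calls are standard JDK methods.
class UuidVersions {

    static UUID random() {
        return UUID.randomUUID(); // version 4: built from random bits
    }

    static UUID nameBased(String name) {
        // version 3: derived from a hash of the input, so the same name yields the same UUID
        return UUID.nameUUIDFromBytes(name.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println("random version: " + random().version());
        System.out.println("name-based version: " + nameBased("Dracaena").version());
        System.out.println("name-based is repeatable: "
                + nameBased("Dracaena").equals(nameBased("Dracaena")));
    }
}
```

A name-based UUID is reproducible from its input, while a random one carries no such information, which is one of the differences to weigh when picking a version.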

In our work, we have not encountered any performance issues that were related to using UUIDs as primary keys. The concise implementation of equals and hashCode and the advantages for distributed systems make using UUIDs as primary keys a viable option.

Many thanks to my colleagues Michael Vitz, Michael Schürig, Lara Pourabdolrahim and Piet Schijven, who helped with an earlier draft of this post. Photo by Kazden Cattapan on Unsplash.