Search

The Join That Quietly Breaks Your Analysis

1 hour ago
3 min read

Why Understanding Cardinality Matters Before Joining Tables

One of the most common mistakes in data analysis happens quietly.

The query runs without errors. The dashboard loads. The numbers even look reasonable at first glance. Yet somewhere in the process, the analysis has already gone wrong.

This often happens when two tables are joined simply because they share a column with the same name. The fields look compatible, the join executes successfully, and everything appears to be working as expected.

But what many analysts forget to ask is a more fundamental question: how exactly do the records in these two tables relate to each other?

When that relationship is misunderstood, the results can quietly distort the analysis. The underlying concept behind this problem is something called cardinality.

When Correct Data Produces the Wrong Result

Consider a simple example:

Imagine you have an Orders table that records revenue at the order level. Each row represents a single order. You also have a second table called OrderItems, which lists the individual products included in each order.

At this point, both tables are perfectly correct. One records orders, while the other records the items within each order.

Now suppose we join the two tables using the OrderID field.

The revenue value is now repeated once for each product in the order. If someone were to sum the revenue column after this join, the result would become 200 instead of 100.

The data itself was correct. The join was technically valid. But the analysis is now wrong.

The Idea Behind the Problem: Cardinality

To avoid issues like this, analysts need to understand the relationship between the tables they are joining. This relationship is what we call cardinality — it simply describes how many records in one table correspond to records in another.

In a one-to-one relationship, each record in one table corresponds to exactly one record in the other. An example might be an Employee table and an Employee Profile table, where each employee has exactly one profile.

In a one-to-many relationship, a single record in one table corresponds to multiple records in another. A typical example would be Customers and Orders — one customer can place many orders.

Finally, there is the many-to-many relationship, where records in both tables can relate to multiple records in the other. Students can enroll in many courses, and each course contains many students.

The more complex the relationship, the more careful analysts need to be when joining tables.

Why This Happens More Often Than We Think

Many people associate joins primarily with SQL queries, but cardinality issues appear across many analytics tools.

In Tableau, relationships between tables can lead to duplicated values in visualisations if the underlying grain of the data is misunderstood. In Power BI, incorrect model relationships can cause measures to inflate unexpectedly. Even in Excel Power Pivot, joining tables without understanding their structure can produce totals that appear inconsistent.

In these situations, the tools are behaving exactly as designed. The issue lies not in the software, but in how the data relationships are understood.

The Hidden Factor: Data Grain

Another concept closely related to cardinality is data grain — the level of detail that each row in a table represents.

In our earlier example, the Orders table has a grain of one row per order, while the OrderItems table has a grain of one row per product within each order. When the tables are joined, the resulting dataset inherits the lowest level of detail, which is the product level. That is why the revenue value repeats for every product.

Understanding the grain of each table before joining them prevents many of the most common analytical mistakes.

A Simple Habit That Prevents Big Errors

Before joining tables, it helps to pause and work through four simple questions. The answers take only a moment — but they change everything that follows.

The Bigger Lesson

Many mistakes in data analysis are not caused by incorrect queries or faulty calculations. They occur because the relationships within the data were never fully examined.

Cardinality rarely appears in dashboards or reports, yet it quietly determines whether the numbers we present are trustworthy.

Just because two fields share the same name does not mean they should be joined.

Understanding the relationship behind the data is what turns a technically correct query into a reliable analysis.

The danger with JOIN errors is not that they produce errors. It’s that they produce results that look correct. In analytics, the biggest mistakes are rarely technical failures. They are quiet misunderstandings of how the data is connected.