A SQL join is a logical combination of two SQL tables. There are a number of ways to join tables together, most of which involve finding common values between the two. SQL joins are what make normalization possible, as they allow queries to return results that stretch across multiple tables.
How does SQL Join Work?
When a SQL query contains a join, the RDBMS will search for data in all tables included in the join. It will then combine the query results into a unified format, which makes them appear as if they have come from a single multi-column table.
To understand how this works, we must consider how a relational database works to reduce redundancy as much as possible. For example, a company might hold employee records in a spreadsheet that looks like this:
This method of storage isn't very efficient. If there is a change to any of the departments, the database owner will need to update every record. On a large scale, this could slow processes down and run the risk of data loss.
In a relational database, the administrator would use normalization to break this down into smaller tables with non-repeating values, like so:
These are two separate tables that contain non-repeating values. The logical relationships between data still exist via the primary key relationship, which is the new DEPT_ID field in this example. The database owner can manage them independently, as long they preserve that relationship.
However, sometimes a user needs the full employee record, including the full details of the employee's department. We achieve this by using a SQL join.
What are the Types of SQL Join?
SQL developers can add joins to their queries to create extremely complex views. In this example, we'll focus on the very basic test data above to illustrate how the different types of join work.
Joins have a left table and a right table, which simply refers to the side of the join command on which they appear. So, if a developer writes "SELECT * FROM tableA JOIN tableB", then tableA is the left table, and tableB is the right table.
There are five main types of join defined in ANSI-standard SQL: the inner join, which is the most common; the left outside join; the right outside join; the full outside join; and the cross join, also known as the Cartesian join.
Inner Join
A JOIN query on standard SQL will produce what's known as an inner join. This offers results only where there is a match on both tables in the join query. For example, if we use this query on the example data:
SELECT * FROM employee INNER JOIN department ON employee.DeptID == department.DeptID;
The query results will look like the original spreadsheet in the example:
There are some records in the employee table that do not have a relationship with the department table and vice versa. These records don't appear in the query results.
An inner join is the most common type of join. It's the default type of join, so if you were to write the following query:
SELECT * FROM employee INNER JOIN department ON employee.DeptID == department.DeptID;
You would get the same results.
Left Outer Join
A left outer join will combine rows from tables in the same way as above. The difference here is that it will also include all data from the left table, even if there are no corresponding relationships with the right table.
The syntax for the query is very similar:
SELECT * FROM employee LEFT JOIN department ON employee.DeptID == department.DeptID;
However, there are now some additional rows returned which do not contain any departmental data:
Left joins are useful when the focus of the query is on the left table, and data on the right table may or may not exist.
Right Outer Join
Right outer joins work in the same way as left joins, except that they include all results from the right table. The convention is often to place the primary table on the left, but developers are free to use right joins where convenient.
As above, the syntax just requires a specification of the type of join:
SELECT * FROM employee RIGHT JOIN department ON employee.DeptID == department.DeptID;
The results include everything from the right table, even if there are no corresponding entries on the left.
Note that, although the focus of this right join is on the department table, it returns multiple results for each department. An outer join query returns all logical rows resulting from the query, as well as any remaining rows that don't have a relationship with the other table.
Full Outer Join
The full outer join returns all rows from both tables. If there is a relationship between rows, the join will combine them. If not, it will return the result with NULL values where required.
The query syntax looks like this:
SELECT * FROM employee FULL JOIN department ON employee.DeptID == department.DeptID;
If we execute this query on our sample data, the results look like this:
All data values from both tables appear in these results.
Cross Join
A cross join, or Cartesian join, combines every row on the left table with every row on the right table.
Consider the following sample data:
We can create a Cartesian join explicitly with this query:
SELECT * FROM table_A CROSS JOIN table_B;
Or implicitly like this:
SELECT * FROM table_A, table_B;
The results of this Cartesian join would include every possible combination of rows, and looks like this:
In practice, this is equivalent to an inner join without a WHERE condition.
SQL also allows for other types of join, such as self-joins and natural joins. You achieve these with a combination of JOIN and UNION statements.