EXCEPT and INTERSECT…

Often times when data in organizations has to be shared by multiple departments and by multiple systems, data can get out of sync for a number of reasons. I recently encountered a problem where data in two line of business applications was incorrect, one system had higher totals than another.

I was positive the ETL job hadn’t failed for months, I was sure of that, so I had to start digging to find the descrepancies. I was able to narrow down the problem by finding units that existed in both systems first, then finding the units that existed in one system but not in the other system. In order to find units that were in both systems I used the INTERSECT T_SQL set operator. And to find the ones that existed in one system and not the other, I used the EXCEPT T-SQL set operator.Both EXCEPT and INTERSECT return distinct rows by combining the results of two queries. The EXCEPT operator returns results that are in the first table specified (left input query) that are not in the second table specified (right input query). The INTERSECT operator returns resultst that exist in the first (left) and second (right) table specified. Also, for INTERSECT and EXCEPT to work the number and order of columns in each query must match, and the data types between the two queries must be compatible.

I’m going to use the AdventureWorks 2012 database, and a modified example from the one in books online. At the end of the post, I’ll provide a link to the books online section of EXCEPT and INTERSECT. Let’s have a look at INTERSECT. Basically I want to see all the ProductIDs that exist in the WorkOrder table AND ALSO the SalesOrderDetail table.

--#1
select ProductId 
  from Production.WorkOrder
INTERSECT
select ProductId
  from Sales.SalesOrderDetail

190 rows.
With the INTERSECT operator I can change the order of the tables in the above query, and get the same results.

--#2
select ProductId
  from Sales.SalesOrderDetail
INTERSECT
select ProductId 
  from Production.WorkOrder

190 rows.
INTERSECT combines the results that exist in both tables. Therefore changing the order of the first table (left input) and the second table (right input) doesn’t affect the results of the query. Now I want to see all the ProductIDs that exist in the WorkOrder table, that DO NOT exist in the SalesOrderDetailTable, using the EXCEPT statement.

--except statements
--#1 
select ProductId 
  from Production.WorkOrder
EXCEPT
select ProductId
  from Sales.SalesOrderDetail

48 rows
Watch what happens when I change the order of tables in the above query.

--#2
select ProductId
  from Sales.SalesOrderDetail
EXCEPT
select ProductId 
  from Production.WorkOrder

76 rows.
This effectively changes the query to let me see all the ProductIds that exist in the SalesOrderDetail table, that DO NOT exist in the Work Order table. Hence the different number of rows returned. Another difference between the two operators is when viewing the execution plans. The INTERSECT operation appears as a Left Semi Join, and the EXCEPT operator as an Anti-Left Semi Join.
Here is the INTERSECT execution plan:

query_in_plan_5

and the EXCEPT execution plan:

query_ex_plan_6

I can’t say as I’ve done a lot of performance tuning of queries with these operators inside the queries I was tuning. I find both of these operators extremely useful for debugging and investigation, but I thought it was kind of neat to point out that they are different when viewing their respective execution plans.

As with everything in T-SQL, there is more than one way to skin a cat. I actually wrote both of the above examples as correlated sub queries, and got the exact same record counts. Here is an example of the INTERSECT query above, using and EXISTS clause with a correlated sub query:

--alternate INTERSECT
SELECT DISTINCT(ProductID) 
FROM Production.WorkOrder
WHERE EXISTS (select ProductId
		       from Sales.SalesOrderDetail
	               Where Production.WorkOrder.ProductId = Sales.SalesOrderDetail.ProductID)
order by ProductId;

Then here is the equivalent EXCEPT query, using NOT EXISTS.

--alternate EXCEPT
SELECT DISTINCT(ProductID) 
FROM Production.WorkOrder
WHERE NOT EXISTS (select ProductId
			   from Sales.SalesOrderDetail
			   Where Production.WorkOrder.ProductId = Sales.SalesOrderDetail.ProductID)
order by ProductId;

The beauty of doing it this way is in the main query, I could bring back more columns if I wanted to, and the columns I bring back do not need to match inside of the sub query. I have to say I see more of EXISTS and NOT EXISTS vs. EXCEPT and INTERSECT in production code. But these operators do work pretty well, especially for finding differences between two results sets rather quickly. Here is the link to the EXCEPT and INTERSECT page for books online for more info. EXCEPT and INTERSECT


One Comment on “EXCEPT and INTERSECT…”

  1. […] EXCEPT and INTERSECT… | Matt Batalon […]


Leave a comment