Sql server, .net and c# video tutorial: What to choose for performance - SubQueries or Joins

Suggested Videos
Part 59 - Subqueries
Part 60 - Correlated subquery
Part 61 - Creating a large table with random data for performance testing

According to MSDN, in sql server, in most cases, there is usually no performance difference between queries that uses sub-queries and equivalent queries using joins. For example, on my machine I have
400,000 records in tblProducts table
600,000 records in tblProductSales tables

The following query, returns, the list of products that we have sold atleast once. This query is formed using sub-queries. When I execute this query I get 306,199 rows in 6 seconds
Select Id, Name, Description
from tblProducts
where ID IN
(
Select ProductId from tblProductSales
)

At this stage please clean the query and execution plan cache using the following T-SQL command.
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS; -- Clears query cache
Go
DBCC FREEPROCCACHE; -- Clears execution plan cache
GO

Now, run the query that is formed using joins. Notice that I get the exact same 306,199 rows in 6 seconds.

Select distinct tblProducts.Id, Name, Description
from tblProducts
inner join tblProductSales
on tblProducts.Id = tblProductSales.ProductId

Please Note: I have used automated sql script to insert huge amounts of this random data. Please watch Part 61 of SQL Server tutorial, in which we have discussed about this automated script.

According to MSDN, in some cases where existence must be checked, a join produces better performance. Otherwise, the nested query must be processed for each result of the outer query. In such cases, a join approach would yield better results.

The following query returns the products that we have not sold at least once. This query is formed using sub-queries. When I execute this query I get 93,801 rows in 3 seconds

Select Id, Name, [Description]
from tblProducts
where Not Exists(Select * from tblProductSales where ProductId = tblProducts.Id)

When I execute the below equivalent query, that uses joins, I get the exact same 93,801 rows in 3 seconds.

Select tblProducts.Id, Name, [Description]
from tblProducts
left join tblProductSales
on tblProducts.Id = tblProductSales.ProductId
where tblProductSales.ProductId IS NULL

In general joins work faster than sub-queries, but in reality it all depends on the execution plan that is generated by SQL Server. It does not matter how we have written the query, SQL Server will always transform it on an execution plan. If sql server generates the same plan from both queries, we will get the same result.

I would say, rather than going by theory, turn on client statistics and execution plan to see the performance of each option, and then make a decision.

In a later video session we will discuss about client statistics and execution plans in detail.

3 comments:

AnonymousJune 26, 2013 at 2:56 AM
Hi Venkat ,

Please upload a video session about reading client statistics and execution plans .
Manish Kumar BlogApril 9, 2017 at 11:18 PM
Hello Venkat Sir,

Hope you are fine!.

I am positively looking for client statistics and execution plans tutorial.

Yours
One of the fan
Manish
UnknownMay 15, 2017 at 8:40 PM
Hi
thank you for the presented material
I would like to ask why I get different results from you for instance
the query only retrieve 40492 while you get 93,801
please notice I used 400000 and 600000 records just like you
thank you in advance

It would be great if you can help share these free resources

What to choose for performance - SubQueries or Joins - Part 62

3 comments: