SELECT * Vs. SELECT COLUMNS – SQL Server Optimization Techniques

“Don’t use * in SELECT query, instead use only required column”.

This is one of the tips to optimize SELECT query. However, does this really give better performance in SQL Server query processing?

Selecting only relevant columns doesn’t improve the performance of query processing; however there is an overall performance improvement to the system. For example, when you select only relevant column(s) from a table, the amount of data transferred from database server to frontend server will be less and data get transferred quickly.

In some cases, there will be improvement in query processing too. For example, you are extracting records from a column which is the key column of non-clustered index; in such case, records will be retrieved from non-clustered index page, thus less number of physical database page reads issued. The less physical page read is always good sign for better performance.

Is there any difference in physical page reads when we use SELECT * and SELECT COLUMNS?

“SQL Server by default copy the complete table to buffer pool irrespective of columns selection” – So no matter whether you use * or few columns.

I will elaborate this with an example. Let’s create a table and insert some records.

create table tblBuffer
(
r1 int identity(1,1),
c1 varchar(900) default replicate('a',900),
c2 varchar(900) default replicate('b',900),
c3 varchar(900) default replicate('c',900),
c4 varchar(900) default replicate('d',900),
c5 varchar(900) default replicate('e',900),
c6 varchar(900) default replicate('f',900),
c7 varchar(900) default replicate('g',900),
c8 varchar(900) default replicate('h',900)
) declare @count int
set @count = 1
while @count <= 1000
begin
    insert into tblBuffer default values
    set @count = @count +1
end

Once the table is ready, verify space used by the table. Executing “sp_spaceused tblBuffer” I got below result. Data size is 8000 KB approximately 1000 pages (1 page = 8KB).

Name

Rows

Reserved

Data

Index_size

Unused

tblBuffer

1000

8008 KB

8000 KB

8 KB

0 KB

In order to capture stats of page read operation, I prefer to use the performance monitor (perfmon) with following counters.

SQL Server: Buffer Manager
Database pages - Number of pages in the buffer pool with database content.
Free Pages - Total number of pages on all free lists.
Total Pages - Number of pages in the buffer pool (includes database, free, and stolen).

By now, I have everything setup and ready to do some experiment.

To verify the page read operation, we need to run the SELECT statements, but before that it is good to clean buffer. DBCC DROPCLEANBUFFERS command can be used to clean buffers from the pool.

Let’s execute DBCC DROPCLEANBUFFERS and capture the performance counters value.

clip_image002(This screen scrap taken from perfmon after execution of DBCC DROPCLEANBUFFERS command)

Now, execute the query “select c1 from tblBuffer” and capture the performance counters values.

clip_image004(This screen scrap taken from perfmon after execution of select c1 from tblBuffer)

I have taken the performance counter values into below table for clarity.

 

After

Buffer Cleaned

After query exaction :

select c1 from tblBuffer

Difference

Database Pages

700

1719

1019

Free Pages

2866

1839

1024

After execution of query “select c1 from tblBuffer”, the free page has reduced to 1839 from 2866 similarly number of database pages got increased from 700 to 1719. And the difference shown in the above is closely matching with the number of pages of the table.

It clearly shows that, though I select single column, the complete table has been brought into buffer pool.

Next, I will show you an example for table with non-clustered index briefly.

Let’s create a non clustered index on column “c2”

create nonclustered index idx_tblBuffer_c2 on tblBuffer(c2)

The number of pages copied to buffer pool will be approximately 110 when you run this query “select c2 from tblBuffer”... It is because, this time data are copied from non-clustered index “idx_tblBuffer_c2”.

You can verify it in perfmon as we did earlier or use the “set statistics io on” settings to verify the same.

Summary

1. SQL Sever by default copies all the columns to buffer pool, irrespective of column used in the SELECT query. 2. Though you don’t find much performance impact in the SQL Server query processing, it is good to use only the columns you needed as this will increase the overall system performance.

JOIN – Inner Join and Outer Join

JOIN is one of the fundamental units of relational database system. Generally JOIN is used to combine related data from many tables to produce result.

There are different types of JOINs available,

1. Inner Join
2. Outer Join
3. Cross Join
4. Full Join

In this article, I have used few examples to demonstrate inner and outer joins. before dive into the examples, lets see the definition of inner and outer joins.

Inner Join, The default join in SQL Server is inner join. Inner join returns all matching records from the given datasets. The match is identified using predicates.

Outer join can be classified into Left outer join and Right outer join.

Left outer join returns all records from left table, at same time, it brings all matching rows from right table for the given predicate and NULL appears in right side when no matching row exists.

Left outer join can also be defined as “result set of inner join + missed records from left table with NULL in the right hand side”

Right outer join returns all records from right table, at the same time, it brings all matching rows from left table for the given predicate, and NULL appears in left side when no matching row exists.

Right outer join can also defined as “result set of inner join + missed records from right table with NULL in the left hand side”

Below example used to illustrate the “join”

Let’s assume we have two datasets, one contains the student’s details of class “A” and other contains exam results details. This has been illustrated using Venn diagram, however Venn diagram shouldn’t be used to represent table since a table can have many group of related data. Just to give a clarity and visual representation, I have used Venn diagram.

There are six students in class “A” which represented in the blue circle. But only two students from class “A” passed in exam. The green circle contains students who passed the exam, in this case, two students from Class “A” and one student from other class.

clip_image002[4]

From here, let’s go by some questions.

Q. Find out the students from class “A” who passed in exam?

In order to answer this question, the best suitable join is inner join. Because, we need to extract only the matching records from both datasets. As shown in the figure, each row in the CLASS table is matched with all rows of the EXAM table using student_id predicate. When the match is found the record will be copied down.

inner join

The corresponding transact sql code is :

“Select * from
Class a join Exam b
On a.student_id = b.student_id”

Q. Find out the students from class “A” who failed in exam?

This can be answered using the “Left Outer Join”. Though there are different ways to get the result, I will be using left outer join since this article about join.

When joining CLASS and EXAM table using left outer join, we get all the records from CLASS table and matching records from EXAM table and NULL for the unmatched rows.

In order to answer the question, We need to extract all records where student_id is null in right side table. Referring to the below picture, you will get better understating of exactly how left outer join works.

left outer join

The corresponding transact sql code is :

“Select * from Class a left outer join Exam b
On a.student_id = b.student_id
where b.student_id is null”

Q. Find out the students who passed the exam but not studied in class “A” ?

This can be answered using the “Right Outer Join”.

When joining EXAM and CLASS table using right outer join, we get all the records from EXAM table and matching records from CLASS “A” table and NULL for the unmatched rows.

In order to answer the question, We need to extract all records where student_id is null in left side table

right outer join

The corresponding transact sql code is :

“Select * from
Class a right outer join Exam b
On a.student_id = b.student_id
where a.student_id is null”

I hope, you had a good time in reading and with this I am ending this article. The next article will be about Internals of Join in SQL Server.

Table variable and Table type in transaction

The myth is to roll back the work done inside the transaction, in case of any error during the transaction. However this is not the case with the SQL Server table variable, table type and variables. That is, the changes made on these types are not rolled back even when there is an error.

Let me explore this with an example.

In order to distinguish variable from table, I have created one temporary table and local variable. Inside the transaction, value 1 is assigned to the variable then 256. Assigning of value 256 will cause error as the variable data type is tinyint. The same applied for the temporary table. Due to the error, the transaction enter into the roll back section and the entire transaction rolled back successfully. However by running the select statement “select @num as num” value 1 is return whilst “select * from #tbl” return nothing as expected. This clearly shows that there is no effect for local variable even though the transaction rolled back.

use tempdb
GO
create table #tbl (c1 tinyint)
declare @num tinyint
begin tran
set @num = 1
set @num = 256
insert into #tbl (c1) values(1)
insert into #tbl (c1) values(256)
if @@ERROR =0
commit tran
else
rollback tran
select @num as num
select * from #tbl

The same is applicable for table variable and table type. Sample using table variable:

declare @tbl as table (c1 int)
create table #tbl (c1 int)
begin tran
insert into @tbl (c1) values (1)
insert into @tbl (c1) values (1/0)
insert into #tbl (c1) values(1)
insert into #tbl (c1) values(1/0)
if @@ERROR =0
commit tran
else
rollback tran
select * from @tbl
select * from #tbl

Sample using table type:

create type mytab as table(c1 tinyint)
GO
declare @mytab as mytab
begin tran
insert into @mytab(c1) values (1)
insert into @mytab(c1) values (256)
if @@ERROR = 0
commit tran
else
rollback tran
select * from @mytab

Database Modelling – Many to Many Relationship

Let’s start with an example to understand what is many to many relationship and how this can be represented in database.

 

If you have shopping experience with any online shopping websites like amazon.com, you could have noticed that the way they organized their products to display.

The online store may sell same type of products of different manufacturer and different type of products from same manufacturer. Example Mobile can be manufactured by Nokia, Apple and Samsung. Similarly Apple manufactures Mobile and Computer. I have represented the same scenario in the below diagram.

 

 
  clip_image001[6]
 

 

 

 

 

 

 

 

 

 


I hope, by now we know what is many to many relationships. Having this knowledge let’s design database table for this use case.

 

Department

 

department_id

department_name

1

Mobile Phone

2

Computer

     

 

Manufacturer

 

manufacturer _id

manufacturer _name

1

Apple

2

Dell

3

Nokia

 

The “Department” table can be used to store all the department of the online store and “Manufacturer” table can be used to store all the manufacturer information. So far everything looks good, but the store wants to display the menu in a way where user can browser product by manufacturers. In our example, computer by manufacturers.

 

The below given menu is what the online store need.

 

Browse By Department

 

 

Browse By Department

 

Mobile Phone ›

Apple

 

Mobile Phone

 

Computer

Samsung

 

Computer ›

Apple

 

 

 

 

Dell

 

 

 

 

 

Browse By Manufacturer

 

 

Browse By Manufacturer

 

Apple

Mobile Phone

 

Apple

 

Dell

Computer

 

Dell

Computer

Samsung

 

 

Samsung

 

 

 

 

 

 

 

To achieve this, we create a new table to hold the department and manufacturer relation, this table is called junction table.   

 

Department_Manufacturer

department_id

manufacturer_id

1

1

1

3

2

1

2

2

 

Using this table, we will be able to answer the below questions

  1. Which departments belong to manufacturer “Apple”
  2. List of manufacturers who belong to department “Mobile”