Because each page stores data from a single column, and a column is more likely than a complete row to contain repeating values (more data redundancy), the page can be compressed to a greater extent.
For example, consider a column called City in a table. With a column store index, a page contains data from the City column only, so the chance that it holds many occurrences of a city such as “New York” (more data redundancy) is greater than for a row store index page, which contains data from all the columns (Name, Address, City, Country, Zip, etc.).
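To make the redundancy argument concrete, here is a minimal sketch that fills one "page" row by row and another with City values only, then measures how many values on each page repeat an earlier one. The page capacity, the sample rows and the `redundancy` metric (the share of values a dictionary-style encoder would not need to store again) are all hypothetical illustrations, not how SQL Server actually sizes or compresses pages.

```python
# Minimal sketch: value redundancy on a row-store page vs a column-store page.
# PAGE_CAPACITY, the sample data and the redundancy metric are hypothetical.
from typing import Dict, List, Sequence

PAGE_CAPACITY = 100  # hypothetical: number of values that fit on one page
CITIES = ["New York", "Chicago", "Boston", "Seattle", "Austin"]


def make_rows(n: int) -> List[Dict[str, str]]:
    """Build n hypothetical rows; Name, Address and Zip are unique, City repeats."""
    return [
        {
            "Name": f"Customer {i}",
            "Address": f"{i} Main St",
            "City": CITIES[i % len(CITIES)],
            "Country": "USA",
            "Zip": f"{10000 + i}",
        }
        for i in range(n)
    ]


def row_store_page(rows: List[Dict[str, str]], capacity: int = PAGE_CAPACITY) -> List[str]:
    """Fill one page row by row: all the columns of a row sit together."""
    page: List[str] = []
    for row in rows:
        for value in row.values():
            if len(page) == capacity:
                return page
            page.append(value)
    return page


def column_store_page(rows: List[Dict[str, str]], column: str,
                      capacity: int = PAGE_CAPACITY) -> List[str]:
    """Fill one page with values from a single column only."""
    return [row[column] for row in rows[:capacity]]


def redundancy(page: Sequence[str]) -> float:
    """Fraction of values that repeat an earlier value on the same page.
    A dictionary-style encoder stores each distinct value once, so higher
    redundancy leaves more room for compression."""
    return 1 - len(set(page)) / len(page)


rows = make_rows(1000)
row_page = row_store_page(rows)               # 20 rows x 5 columns
city_page = column_store_page(rows, "City")   # 100 City values
print(f"{redundancy(row_page):.2f}")   # 0.34
print(f"{redundancy(city_page):.2f}")  # 0.95
```

The row-store page mixes unique names, addresses and zip codes with the repeating cities, so only about a third of its values repeat; the City-only page holds just five distinct values out of a hundred.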
The SQL Server team realized this and introduced a new index type in SQL Server 2012, the column store index, which greatly reduces the disk I/O needed to serve data warehousing queries by storing data in columnar (also called column-store) fashion instead of the traditional row-store fashion used by B-Trees and heaps.
Now wait, you may be wondering how a different physical layout of the same data (storing it in columnar format instead of the traditional row-wise format) can reduce disk I/O significantly and improve the performance of data warehousing queries by up to 100 times or even more.
Well, if you have this question, I am completely with you on it, but before I answer this simple but worthwhile question, let me first talk about row-store storage vs. column-store storage. The data in a B-Tree or heap is stored row-wise (also called row-store), which means data from all the columns of a row is stored together contiguously on the same page.
For example, consider a table with ten columns (C1 to C10): as the image below shows, the data of all ten columns from each row is stored together contiguously on the same page.
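Since the image is not reproduced here, the following minimal sketch shows the same idea in code: one ten-column table split into pages row-wise and column-wise. The column names C1 to C10 come from the text; the cell labels, the four-row table and the page capacity (`VALUES_PER_PAGE`) are hypothetical, and real pages are measured in bytes rather than values.

```python
# Minimal sketch: the same ten-column table (C1..C10) laid out row-wise and
# column-wise. Cell labels, row count and page capacity are hypothetical.
from typing import Dict, List

COLUMNS = [f"C{i}" for i in range(1, 11)]  # C1 .. C10
VALUES_PER_PAGE = 20  # hypothetical page capacity, counted in values


def make_table(n_rows: int) -> List[List[str]]:
    """Label each cell 'r<row>c<col>' so the layout is easy to read."""
    return [[f"r{r}c{c}" for c in range(1, 11)] for r in range(1, n_rows + 1)]


def chunk(values: List[str]) -> List[List[str]]:
    """Split a sequence of values into pages of VALUES_PER_PAGE."""
    return [values[i:i + VALUES_PER_PAGE] for i in range(0, len(values), VALUES_PER_PAGE)]


def row_store_pages(table: List[List[str]]) -> List[List[str]]:
    """Row store: all the columns of a row are written contiguously."""
    return chunk([value for row in table for value in row])


def column_store_pages(table: List[List[str]]) -> Dict[str, List[List[str]]]:
    """Column store: each column is written to its own set of pages."""
    return {name: chunk([row[idx] for row in table]) for idx, name in enumerate(COLUMNS)}


table = make_table(4)
print(row_store_pages(table)[0][:12])      # r1c1 .. r1c10, then r2c1, r2c2
print(column_store_pages(table)["C1"][0])  # ['r1c1', 'r2c1', 'r3c1', 'r4c1']
```

In the row-wise layout a page interleaves every column of consecutive rows; in the column-wise layout a page holds values from one column only.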
Storing all the columns together on a single page reduces repetition, and ultimately the compression ratio, unlike a column store index page, which has higher redundancy and a greater compression ratio.
According to a performance study Microsoft ran on a machine with 32 logical processors and 256 GB of RAM, against a table with 1 TB of data and 1.44 billion rows, the results published in this white paper were stunning: queries gained a 16X speed-up in CPU time and a whopping 455X improvement in elapsed time.
Let's consider a query (SELECT C1, C2, C3 FROM T1) against the table in the image above. If the data is stored in a row store, all the disk pages are brought into memory, and because these pages also contain data from the other columns (C4 to C10), the number of pages that must be read is significantly higher.
Now, coming back to the question we raised in the last section: how can a different physical layout of the same data (storing it in columnar format instead of the traditional row-wise format) reduce disk I/O significantly and improve query performance by up to 100 times or even more?
Well, apart from the new batch mode processing, there are basically two main reasons for this performance improvement with respect to data storage. (The Query Optimizer and Query Execution engine have been enhanced to use batch mode processing for column store indexes; it uses a new iterator model that processes data a batch at a time instead of a row at a time, which is optimized for multicore CPUs and the increased memory throughput of modern hardware architectures.)
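As a purely conceptual illustration of the iterator difference mentioned above, this sketch computes a sum once with an iterator that hands over one value per call and once with one that hands over a whole batch per call. It is not SQL Server's execution engine: the batch size and function names are hypothetical, and Python will not show the CPU-cache benefits; it only shows how batching cuts the number of iterator calls for the same work.

```python
# Conceptual sketch only: row-at-a-time vs batch-at-a-time iteration for SUM.
# Not SQL Server's engine; BATCH_SIZE and the function names are hypothetical.
from typing import Iterator, List, Tuple

BATCH_SIZE = 1000  # hypothetical batch size


def row_mode_sum(column: List[int]) -> Tuple[int, int]:
    """Return (total, iterator calls) when pulling one value per call."""
    def rows() -> Iterator[int]:
        for value in column:
            yield value

    total, calls = 0, 0
    for value in rows():
        calls += 1
        total += value
    return total, calls


def batch_mode_sum(column: List[int]) -> Tuple[int, int]:
    """Return (total, iterator calls) when pulling a whole batch per call."""
    def batches() -> Iterator[List[int]]:
        for start in range(0, len(column), BATCH_SIZE):
            yield column[start:start + BATCH_SIZE]

    total, calls = 0, 0
    for batch in batches():
        calls += 1
        total += sum(batch)  # the whole batch is aggregated in one step
    return total, calls


data = list(range(10_000))
print(row_mode_sum(data))    # (49995000, 10000)
print(batch_mode_sum(data))  # (49995000, 10)
```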
In the row-store case described above, this means the query SELECT C1, C2, C3 FROM T1 brings into memory data from columns C4 to C10, which it doesn't actually require.
Now let's take the same query when the data is stored in a column store index. In this case, not all disk pages are brought into memory: because each page contains data for a single column only, just the pages that contain data for columns C1, C2 and C3 are read.
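A rough back-of-the-envelope sketch of the page counts for SELECT C1, C2, C3 FROM T1 follows. The 8 KB page size matches SQL Server data pages; the row count, the uniform 8-byte column width and the idea of a column store built from plain 8 KB pages are simplifying assumptions, and page headers, row overhead and compression are ignored (compression would widen the gap further).

```python
# Minimal sketch: pages read for SELECT C1, C2, C3 FROM T1 under each layout.
# N_ROWS and BYTES_PER_VALUE are hypothetical; headers, row overhead and
# compression are ignored to keep the arithmetic simple.
import math

N_ROWS = 1_000_000       # hypothetical row count
N_COLUMNS = 10           # C1 .. C10, as in the text
BYTES_PER_VALUE = 8      # hypothetical: every column is 8 bytes wide
PAGE_BYTES = 8 * 1024    # SQL Server data pages are 8 KB


def row_store_pages_read(n_rows: int) -> int:
    """Each page holds whole rows, so every page must be read."""
    rows_per_page = PAGE_BYTES // (N_COLUMNS * BYTES_PER_VALUE)  # 102
    return math.ceil(n_rows / rows_per_page)


def column_store_pages_read(n_rows: int, columns_needed: int) -> int:
    """Each page holds one column, so only the needed columns' pages are read."""
    values_per_page = PAGE_BYTES // BYTES_PER_VALUE  # 1024
    return columns_needed * math.ceil(n_rows / values_per_page)


row_pages = row_store_pages_read(N_ROWS)
col_pages = column_store_pages_read(N_ROWS, columns_needed=3)
print(row_pages, col_pages)  # 9804 2931
```

Even before compression, reading 3 of 10 equally wide columns touches roughly 3/10 of the pages; the higher compression of column pages described earlier shrinks the column-store count further.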