Brief description of the architecture
Data Layout and Access Patterns:
Column stores physically partition the database into vertical columns rather
than rows. In the above employee example, the row-oriented approach stores all
the data for a single entity together, whereas the column approach stores all
the names together, all the ids together, and so on. This way of storing data
reduces processing, because a query needs to read only the required attributes
rather than reading all the attributes and then discarding the unwanted ones.
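The difference between the two layouts can be sketched in a few lines of
Python. This is only an in-memory illustration, not how a real engine lays out
disk pages; the employee values and the variable names are made up for the
example.

```python
# Hypothetical employee records; the values are invented for illustration.
rows = [
    {"emp_id": 1, "name": "Asha", "rank": 3},
    {"emp_id": 2, "name": "Ben", "rank": 1},
    {"emp_id": 3, "name": "Chen", "rank": 2},
]

# Row-oriented layout: all attributes of one entity sit next to each other.
row_store = [(r["emp_id"], r["name"], r["rank"]) for r in rows]

# Column-oriented layout: all values of one attribute sit next to each other.
column_store = {
    "emp_id": [r["emp_id"] for r in rows],
    "name": [r["name"] for r in rows],
    "rank": [r["rank"] for r in rows],
}

# Reading one attribute from the row store walks every tuple
# and discards the fields that were not asked for.
names_from_rows = [record[1] for record in row_store]

# Reading the same attribute from the column store touches one list only.
names_from_columns = column_store["name"]
```

Both reads return the same names; the difference lies in how much unrelated
data each one has to pass over.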
Along with these
features, there are also some interesting trade-offs. For example, if a query
wants to access only a single record, a row-based system will be faster, since
it needs to go through only one record, whereas the column approach needs to
search every column to collect all the data for that record. But as soon as the
number of records involved increases, the column-oriented model becomes the
better fit. For the same reason, the column model is best for analytical
systems, where a lot of data needs to be accessed in order to analyse it.
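The trade-off can be made concrete with a rough count of how many stored values
each layout touches. The sketch below follows the simplified picture in the
text: the row store finds a record directly, and the column store scans each
column it needs in full. The table size and the function names are assumptions
chosen for illustration, not measurements.

```python
def row_store_cost(num_cols, rows_needed):
    # A row store reads whole records: every column of each needed row.
    return rows_needed * num_cols


def column_store_cost(num_rows, cols_needed):
    # Simplified model: each needed column is scanned from top to bottom.
    return num_rows * cols_needed


NUM_ROWS, NUM_COLS = 1_000_000, 10  # hypothetical table size

# Query A: fetch every attribute of one record.
point_row = row_store_cost(NUM_COLS, rows_needed=1)
point_col = column_store_cost(NUM_ROWS, cols_needed=NUM_COLS)

# Query B: aggregate a single attribute over all records.
scan_row = row_store_cost(NUM_COLS, rows_needed=NUM_ROWS)
scan_col = column_store_cost(NUM_ROWS, cols_needed=1)
```

Under these assumptions the row layout wins Query A by a wide margin, while the
column layout touches a tenth of the values for Query B.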
Beyond vertical
partitioning, column stores have many other architectural features. The
architecture is designed to maximize performance on analytical workloads.
A distinct id
is given to every record in every column to distinguish the records from each
other.
Block Oriented and Vectorized processing
It is easy to
iterate through cache-sized blocks of data.
This means that data
stores not only store data in the form of columns, they also process data
column by column.
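A minimal sketch of block-at-a-time processing is shown below. Real systems
size blocks to fit the CPU cache and run tight loops over arrays; here a small
`block_size` and plain Python lists stand in for that, and the helper names and
salary values are hypothetical.

```python
def iter_blocks(column, block_size):
    """Yield consecutive slices of `column`, each at most `block_size` long."""
    for start in range(0, len(column), block_size):
        yield column[start:start + block_size]


def blockwise_sum(column, block_size=4):
    """Sum a column by handing one whole block at a time to the operator."""
    total = 0
    for block in iter_blocks(column, block_size):
        # One call processes a whole block instead of a single value.
        total += sum(block)
    return total


salaries = [50, 60, 55, 70, 65, 80, 75, 90, 85]  # hypothetical column
total_salary = blockwise_sum(salaries)
```

The result is the same as a value-at-a-time loop; what changes is that the
per-call overhead is paid once per block rather than once per value.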
Column specific compression
By storing data from
the same attribute together, data stores can achieve higher compression ratios.
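Run-length encoding is one scheme that benefits from this: values of a single
attribute often repeat, especially when the column is sorted. The sketch below
is a minimal illustration with made-up rank values, not the encoding of any
particular product.

```python
def rle_encode(column):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    runs = []
    for value in column:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([value, 1])  # start a new run
    return [(value, length) for value, length in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    column = []
    for value, length in runs:
        column.extend([value] * length)
    return column


ranks = [1, 1, 1, 2, 2, 3, 3, 3, 3]  # hypothetical sorted column
encoded = rle_encode(ranks)
```

Nine stored values shrink to three pairs here. A row layout would interleave
ranks with names and ids, so such runs would rarely form.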
Direct operation on compressed data
Operating directly on compressed data, without decompressing it first, saves a
lot of processing time.
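As an illustration, assume a column is held as run-length pairs of
(value, run_length). An aggregate or an equality filter can then work on the
runs themselves. The function names and the sample runs are hypothetical.

```python
def sum_rle(runs):
    """Sum a run-length-encoded column without expanding it."""
    return sum(value * length for value, length in runs)


def count_equal_rle(runs, target):
    """Count how many rows equal `target`, looking only at the runs."""
    return sum(length for value, length in runs if value == target)


# Hypothetical compressed column, equivalent to [1, 1, 1, 2, 2, 3, 3, 3, 3].
runs = [(1, 3), (2, 2), (3, 4)]

total = sum_rle(runs)
rows_with_rank_3 = count_equal_rle(runs, 3)
```

Each function visits three runs instead of nine values, so the work scales
with the compressed size rather than the original size.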
Efficient join implementations
Join algorithms can be designed to take advantage of data stored in the form of
columns.
Redundant representation of individual columns in different sort orders
It would take
less effort to sort data on the basis of columns.
Database cracking and adaptive indexing
Indexing or rearranging is done incrementally whenever a query accesses the
column. The rearranged data is left in place and is usable by the next query.
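The idea can be sketched as follows, assuming a simplified form of cracking in
which each query splits one piece of the column in two around its pivot and
remembers where the split fell. The class and method names are hypothetical,
and production systems are considerably more elaborate.

```python
import bisect


class CrackedColumn:
    """A column copy that is partially reordered as a side effect of queries."""

    def __init__(self, values):
        self.values = list(values)
        self.pivots = []  # sorted pivots seen so far
        self.bounds = []  # bounds[i]: first position holding a value >= pivots[i]

    def crack(self, pivot):
        """Return p such that values[:p] < pivot <= values[p:]."""
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.bounds[i]  # an earlier query already did the work
        # Only the piece between the neighbouring boundaries is touched.
        lo = self.bounds[i - 1] if i > 0 else 0
        hi = self.bounds[i] if i < len(self.bounds) else len(self.values)
        piece = self.values[lo:hi]
        smaller = [v for v in piece if v < pivot]
        larger = [v for v in piece if v >= pivot]
        self.values[lo:hi] = smaller + larger
        position = lo + len(smaller)
        self.pivots.insert(i, pivot)
        self.bounds.insert(i, position)
        return position

    def less_than(self, pivot):
        """Answer `value < pivot`, cracking the column along the way."""
        return self.values[:self.crack(pivot)]


column = CrackedColumn([13, 4, 9, 2, 12, 7, 1, 19, 3])
first = column.less_than(8)    # partitions the whole column once
second = column.less_than(3)   # only re-partitions the piece below 8
third = column.less_than(14)   # only re-partitions the piece from 8 upwards
```

No index is built up front. Each query pays for a little reorganisation, and
later queries with nearby pivots inspect ever smaller pieces.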
Efficient loading architectures
1 Performance of C-Store
versus a commercial database system
Contrast with standard row based storage
A row-oriented database
is write-optimised, whereas a column-oriented database is read-optimised. In a
large database, writes are performed less often than reads. Figure 1
demonstrates the physical differences between the architectures of the
column-based and traditional row-based database models.
Row-oriented
databases may be faster for systems that involve a lot of transactions, i.e.
OLTP: since they store all the data for a record together, it is easier to
retrieve that record. Column databases provide better solutions for analytical
systems, i.e. OLAP.
Column databases store datasets column
by column rather than row by row, as row-based databases do. Abadi, Madden and
Hachem showed experimentally that column stores were faster than
row stores when reading large datasets optimised for analysis. This conclusion
rested on four main advantages that revealed themselves in the experiments:
late materialisation, block iteration, compression and invisible joins.
The differentiating feature of this database model is
that the data is stored in columns instead of rows. Each column is stored as a
long sequence, and a value's serial number (its position) links it to the
values of the same record in the other columns. So if a query needs data from
only one or two columns, it has to check only those columns. A row-oriented
database system, on the other hand, would have to read every complete row, and
therefore far more data, to gather the same values.
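The linking by serial number can be sketched with plain Python lists, where
the list position plays the role of the serial number. The column contents are
invented for the example, and the two-step query is one possible way to use
the positions, not a description of a specific engine.

```python
# Hypothetical columns of one table; position i in every list is record i.
emp_id = [101, 102, 103, 104]
name = ["Asha", "Ben", "Chen", "Dita"]
rank = [3, 1, 2, 1]
salary = [70, 40, 55, 42]  # never read by the query below

# Step 1: scan only the rank column and keep the matching positions.
positions = [pos for pos, value in enumerate(rank) if value == 1]

# Step 2: use those positions to fetch values from the name column.
names_with_rank_1 = [name[pos] for pos in positions]
```

Only `rank` and `name` are read; `emp_id` and `salary` are never touched, which
is the saving the paragraph above describes.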
The column-oriented approach is read-optimised. Suppose
an employee table has 10 columns: emp_id, name, rank, post,
salary, age, experience, qualification, marital_status and no_children. If we
frequently access only three of them (e.g. emp_id, name, rank), there is no
need to read the data in the irrelevant columns. Instead, we can get the
required information by querying just those 3 columns instead of all
the 10 columns. This saves a lot of processing especially when working with
Since data is stored and read in
blocks, a single block that holds data for the employee table contains the
values of a single column in a column-oriented database system and the values
of whole rows (entities) in a row-oriented database system. In the above
example, where we assumed that the three columns emp_id, name and rank are
accessed repeatedly, a column-oriented database system reads only the blocks
that hold those three columns.
In recent years there has been continuous interest
in column-oriented databases, also called column stores. The initial
efforts came from academic systems such as MonetDB and VectorWise, among
others. The commercial industry followed this trend: by the end of
2013, all major vendors (IBM, Microsoft and Oracle, to name a few) had
column-oriented offerings. Other products such as Aster Data, Greenplum,
Infobright and ParAccel have also benefited from the column approach to storing
data, and Oracle Exadata incorporates the column approach in some way. These
products have advantages over the traditional row-based approach in data
analytics and compression.
Despite the success of
column-based systems in commercial and academic settings, there is considerable
scope for future research in many interesting directions. There may be work on
hybrid systems that are partially column-oriented. Systems that choose between
row and column orientation by adapting to the requirements will become really
important. These systems will guide users in
deciding which approach is best suited to their requirements. Microsoft
announced columnar storage with advanced query processing for its SQL Server
product. It does not yet have full functionality, but it is a step in this
direction. It is expected that these ideas will be implemented on other
platforms in the near future.