The fourth category of NoSQL database that we are going to discuss is the wide column store.
What is a wide column store and how are values accessed?
A wide column store is a sparse, distributed, multi-dimensional sorted map. Each value in the map is indexed by (row key, column key, timestamp). A good example would be storing webpages and related information that could be used by other projects. The row keys would be URLs, and the column names would be various features of the webpage (e.g. storing the content of the webpage in a ‘contents’ column).
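To make the (row key, column key, timestamp) indexing concrete, here is a minimal in-memory sketch. It is only an illustration, not how any real wide column store is implemented; the names `webtable`, `put` and `get_latest` and the example URLs are assumptions made for this post.

```python
from typing import Dict, Optional, Tuple

# Hypothetical model: every value is addressed by (row key, column key, timestamp).
CellKey = Tuple[str, str, int]
webtable: Dict[CellKey, bytes] = {}


def put(row: str, column: str, timestamp: int, value: bytes) -> None:
    """Store one version of one cell."""
    webtable[(row, column, timestamp)] = value


def get_latest(row: str, column: str) -> Optional[bytes]:
    """Return the most recent version of a cell, or None if it was never written."""
    versions = [(ts, v) for (r, c, ts), v in webtable.items() if r == row and c == column]
    if not versions:
        return None
    return max(versions, key=lambda pair: pair[0])[1]


# Row keys are URLs; column keys are features of the page.
put("www.example.com", "contents", 1, b"<html>old</html>")
put("www.example.com", "contents", 2, b"<html>new</html>")
put("www.example.com", "language", 1, b"en")
put("www.example.org", "contents", 1, b"<html>hi</html>")

print(get_latest("www.example.com", "contents"))  # newest version of the page
print(get_latest("www.example.org", "language"))  # None: the map is sparse
```

Note that the second row simply has no ‘language’ entry; nothing is stored for a cell that was never written.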
Rows: In Bigtable, each row key can be up to 64KB in size. Every read or write of data under a single row key is atomic, regardless of the number of different columns being read or written in the row. The data is kept in lexicographic order by row key, and the row range for a table is dynamically partitioned. This is done so that reads of short row ranges require communication with only a small number of machines.
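The effect of keeping rows sorted can be sketched in a few lines. This is a simplified, single-process illustration: the class `SortedRows`, its methods, and the fixed-size `partitions` split are assumptions for this post, whereas a real store splits ranges dynamically across machines. The row keys use reversed hostnames so that pages from the same domain sort next to each other.

```python
import bisect


class SortedRows:
    """Hypothetical table that keeps its row keys in lexicographic order."""

    def __init__(self):
        self._keys = []   # sorted list of row keys
        self._rows = {}   # row key -> {column key: value}

    def put(self, row_key, column, value):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)  # insert while keeping the order
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def scan(self, start, end):
        """Return the row keys in the half-open range [start, end)."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return self._keys[lo:hi]

    def partitions(self, max_rows):
        """Split the sorted key space into contiguous chunks of at most max_rows keys."""
        return [self._keys[i:i + max_rows] for i in range(0, len(self._keys), max_rows)]


table = SortedRows()
for key in ["org.wikipedia.en/NoSQL", "com.example.www/index",
            "com.example.blog/post1", "com.example.www/about"]:
    table.put(key, "contents", b"...")

print(table.scan("com", "org"))   # all com.* rows, already adjacent
print(table.partitions(2))        # each chunk could live on a different machine
```

Because neighbouring keys land in the same chunk, a scan over a short row range only needs to touch one or two partitions.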
Column families: Column keys are grouped into sets called column families, which form the basic unit of access control. All data stored in a column family is usually of the same type, and a column family must be created before any data can be stored in that family.
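The rule that a family must exist before data is written to it can be sketched as follows. The `Table` class is hypothetical, and the `family:qualifier` spelling of column keys follows the naming used in the Bigtable paper; other stores spell this differently.

```python
class Table:
    """Hypothetical table whose column families are declared up front."""

    def __init__(self, families):
        self._families = set(families)
        self._rows = {}  # row key -> {column key: value}

    def put(self, row_key, column_key, value):
        # A column key is written as "family:qualifier".
        family, _, _qualifier = column_key.partition(":")
        if family not in self._families:
            raise KeyError(f"column family {family!r} has not been created")
        self._rows.setdefault(row_key, {})[column_key] = value

    def get(self, row_key, column_key):
        return self._rows.get(row_key, {}).get(column_key)


pages = Table(families=["contents", "anchor"])
pages.put("www.example.com", "contents:", b"<html>...</html>")
pages.put("www.example.com", "anchor:www.example.org", b"Example link")

try:
    pages.put("www.example.com", "metadata:language", b"en")
except KeyError as err:
    print("rejected:", err)  # the 'metadata' family was never created
```

New columns inside an existing family (such as another `anchor:` qualifier) can be added freely; only the family itself has to be declared.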
Timestamps: Each cell can contain multiple versions of the same data, and these versions are indexed by timestamp. The different versions of a cell are stored in decreasing timestamp order, so that the most recent version can be read first.
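A single versioned cell could be modelled roughly like this. `VersionedCell` and its methods are names made up for this sketch, and re-sorting on every write is chosen for brevity rather than efficiency.

```python
class VersionedCell:
    """Hypothetical cell that keeps its versions newest-first."""

    def __init__(self):
        self._versions = []  # list of (timestamp, value), largest timestamp first

    def write(self, timestamp, value):
        self._versions.append((timestamp, value))
        # Keep decreasing timestamp order so the newest version sits at index 0.
        self._versions.sort(key=lambda version: version[0], reverse=True)

    def latest(self):
        return self._versions[0] if self._versions else None

    def as_of(self, timestamp):
        """Return the newest version written at or before the given timestamp."""
        for ts, value in self._versions:
            if ts <= timestamp:
                return (ts, value)
        return None

    def timestamps(self):
        return [ts for ts, _ in self._versions]


cell = VersionedCell()
cell.write(1, "first crawl")
cell.write(3, "third crawl")
cell.write(2, "second crawl")  # arrives out of order

print(cell.timestamps())  # stored newest-first
print(cell.latest())      # the most recent version is read first
print(cell.as_of(2))      # what the cell looked like at timestamp 2
```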
How does a wide column database differ from an RDBMS?
An RDBMS stores data in a table whose rows span all of the columns. If one row needs an additional column, that column is added to the entire table, with null (or default) values for all the other rows. If we need to query a value that isn’t indexed, a full table scan is done to locate it.
In a wide column database, reading or writing a row of data consists of reading or writing individual columns. A column is written only if there’s a data element for it. Looking up a value by its row key and column is comparable to querying an index in an RDBMS, rather than scanning the entire table.
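The difference in how a new column is handled can be shown with plain Python data structures. This is a toy comparison, not a benchmark; the user data and the helper `add_column_rdbms` are invented for illustration.

```python
# Relational style: every row carries a slot for every column.
rdbms_columns = ["id", "name", "email"]
rdbms_rows = [
    (1, "Ada", "ada@example.com"),
    (2, "Lin", None),
]


def add_column_rdbms(columns, rows, name, default=None):
    """Adding a column touches every row, which gets a null (or default) value."""
    return columns + [name], [row + (default,) for row in rows]


rdbms_columns, rdbms_rows = add_column_rdbms(rdbms_columns, rdbms_rows, "twitter")

# Wide column style: each row stores only the columns it actually has.
wide_rows = {
    "user1": {"name": "Ada", "email": "ada@example.com"},
    "user2": {"name": "Lin"},
}
wide_rows["user2"]["twitter"] = "@lin"  # only this one row changes

rdbms_cells = sum(len(row) for row in rdbms_rows)
wide_cells = sum(len(columns) for columns in wide_rows.values())
print(rdbms_cells, wide_cells)

# A lookup by row key and column goes straight to the value; no scan is needed.
print(wide_rows["user2"].get("twitter"))
print(wide_rows["user1"].get("twitter"))  # None: nothing was ever stored here
```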
Advantages:
- Compression: Efficient at data compression and data partitioning
- Quick to load: They are very fast to load. A table with a billion rows can reportedly be loaded in a few seconds.
- Aggregate queries: Due to their structure, column databases perform very well with aggregate queries such as sum, count and average
- Scalability: Column databases are very scalable and are well suited to massively parallel processing, which involves spreading data across many machines
Disadvantages:
- Updates can be inefficient
- Perform poorly with joins, making them unsuitable for Online Transaction Processing (OLTP)
A wide column database works well for scenarios where the columns aren’t the same for every row.
- OLAP (Online Analytical Processing) workloads
- Attribute-based data, such as equipment features
- Internet of Things sensor data
- Data logged by applications
Examples of wide column databases: HBase, Cassandra, Bigtable
To read further about wide column databases, check out the Bigtable research paper by Google: https://www.read.seas.harvard.edu/~kohler/class/cs239-w08/chang06bigtable.pdf