Thursday, December 15, 2016

Understanding Database Technology in the IT World: RDBMS vs NoSQL

Broadly speaking, there are two types of databases:
1. RDBMS
2. NoSQL


      RDBMS vs NoSQL

NoSQL databases are not relational. They are designed as distributed data stores for very large-scale data needs (e.g. Facebook and Twitter accumulate terabytes of data every day for millions of users), and they typically have no fixed schema and no joins. Relational database management systems (RDBMSs), meanwhile, usually “scale up” by moving to faster hardware and adding memory. NoSQL systems, on the other hand, can take advantage of “scaling out”, which means spreading the load over many commodity systems.
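To make "scaling out" a little more concrete, here is a minimal sketch of one common way to spread keys over several machines: hash the key and use the result to pick a node. The node names and the functions `node_for` and `distribute` are made up for illustration and do not come from any particular NoSQL product.

```python
import hashlib

# Hypothetical node names standing in for commodity servers.
NODES = ["node-a", "node-b", "node-c"]


def node_for(key: str, nodes: list[str] = NODES) -> str:
    """Pick a node for a key by hashing it (simple modulo sharding)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


def distribute(keys: list[str]) -> dict[str, list[str]]:
    """Group keys by the node that would store them."""
    placement: dict[str, list[str]] = {node: [] for node in NODES}
    for key in keys:
        placement[node_for(key)].append(key)
    return placement


if __name__ == "__main__":
    users = [f"user:{i}" for i in range(10)]
    for node, keys in distribute(users).items():
        print(node, keys)
```

The same key always lands on the same node, so any client can work out where to read or write without a central lookup. Plain modulo sharding moves most keys when a node is added or removed, which is why real systems often use something like consistent hashing instead.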

The term NoSQL was coined in 1998. While many think it is a derogatory term created to poke fun at SQL, it is now usually read as “Not Only SQL” rather than “No SQL at all.” The idea is that both technologies (NoSQL and RDBMSs) can co-exist and each has its place. Companies like Facebook, Twitter, Digg, Amazon, LinkedIn and Google all use NoSQL in some way, so the term has appeared often in the news over the past few years.

       What’s Wrong with RDBMS?

Well, nothing, really. They just have their limitations. Consider these three problems with RDBMSs:
1. RDBMSs use a table-based, normalized approach to data, and that is a limited model. Certain data structures cannot be represented without tampering with the data, the programs, or both.
2. They allow the full set of Create, Read, Update and Delete operations. One line of argument holds that updates should never be allowed, because they overwrite (and so destroy) information. Instead, when data changes, the database should add a new record and keep the previous value.
3. Performance falls off as RDBMSs normalize data. The reason: normalization requires more tables, joins, keys and indexes, and thus more internal database operations to implement queries. As the database grows into the terabytes, things slow down.
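The argument that updates destroy information can be illustrated with a small append-only store. This is only a sketch of the idea: the class `AppendOnlyStore` and its method names are hypothetical and are not taken from any real database.

```python
from collections import defaultdict


class AppendOnlyStore:
    """Hypothetical store that never overwrites: each write adds a version."""

    def __init__(self) -> None:
        # key -> list of values, oldest first
        self._versions: dict[str, list[object]] = defaultdict(list)

    def put(self, key: str, value: object) -> int:
        """Append a new version and return its 1-based version number."""
        self._versions[key].append(value)
        return len(self._versions[key])

    def get(self, key: str) -> object:
        """Return the latest version; raise KeyError if the key is unknown."""
        if key not in self._versions:
            raise KeyError(key)
        return self._versions[key][-1]

    def history(self, key: str) -> list[object]:
        """Return a copy of all versions for a key, oldest first."""
        return list(self._versions.get(key, []))
```

Writing a new email address for a user with `put` leaves the old one available through `history`, whereas an SQL `UPDATE` on a single row would replace it.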
  
        Four Categories of NoSQL

1. Key-Value Stores
  • Examples: Tokyo Cabinet/Tyrant, Redis, Voldemort, Oracle BDB
  • Typical applications: Content caching (focus on scaling to huge amounts of data; designed to handle massive load), logging, etc.
  • Data model: Collection of key-value pairs
  • Strengths: Fast lookups
  • Weaknesses: Stored data has no schema

2. Column Family Stores
  • Examples: Cassandra, HBase
  • Typical applications: Distributed file systems
  • Data model: Columns → column families
  • Strengths: Fast lookups, good distributed storage of data
  • Weaknesses: Very low-level API

3. Document Databases
  • Examples: CouchDB, MongoDB
  • Typical applications: Web applications (similar to key-value stores, but the database knows what the value is)
  • Data model: Collections of key-value collections
  • Strengths: Tolerant of incomplete data
  • Weaknesses: Query performance, no standard query syntax

4. Graph Databases
  • Examples: Neo4j, InfoGrid, InfiniteGraph
  • Typical applications: Social networking, recommendations (focus on modeling the structure of data, i.e. its interconnectivity)
  • Data model: “Property graph” (nodes and relationships, with key-value properties on both)
  • Strengths: Graph algorithms, e.g. shortest path, connectedness, degree relationships, etc.
  • Weaknesses: Some queries have to traverse the entire graph to reach a definitive answer. Not easy to cluster.


Purpose of the Four Categories of NoSQL

1. Key-Value Stores

The main idea here is to use a hash table in which a unique key points to a particular item of data. The key-value model is the simplest and easiest to implement, but it is inefficient when you are only interested in querying or updating part of a value, among other disadvantages.

Examples: Tokyo Cabinet/Tyrant, Redis, Voldemort, Oracle BDB, Amazon SimpleDB, Riak
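A rough sketch of why partial updates are awkward in this model: the store below is just a Python dict with opaque byte values, and the helper names (`put`, `get`, `update_field`) are illustrative rather than any real product's API.

```python
import json

# A plain dict stands in for the hash table: unique key -> opaque value.
store: dict[str, bytes] = {}


def put(key: str, value: dict) -> None:
    """Serialize the whole value; the store itself never looks inside it."""
    store[key] = json.dumps(value).encode("utf-8")


def get(key: str) -> dict:
    """Fetch and decode the whole value for a key."""
    return json.loads(store[key].decode("utf-8"))


def update_field(key: str, field: str, new_value) -> None:
    """Changing one field still means reading and rewriting the entire value."""
    value = get(key)
    value[field] = new_value
    put(key, value)
```

Lookups by key are a single hash-table access, but `update_field` has to pull the entire value out and write it back, because the store has no idea what is inside it.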

2. Column Family Stores

These were created to store and process very large amounts of data distributed over many machines. There are still keys but they point to multiple columns. The columns are arranged by column family.

Examples: Cassandra, HBase
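As a simplified picture of this layout, a row key can be thought of as pointing to a set of column families, each holding its own columns. The nested dicts and the `put` / `get_family` helpers below are assumptions made for illustration, not how Cassandra or HBase actually store data on disk.

```python
# row key -> column family -> column name -> value
Row = dict[str, dict[str, str]]
table: dict[str, Row] = {}


def put(row_key: str, family: str, column: str, value: str) -> None:
    """Write one column, creating the row and family on first use."""
    table.setdefault(row_key, {}).setdefault(family, {})[column] = value


def get_family(row_key: str, family: str) -> dict[str, str]:
    """Read every column of one family for a row (empty if absent)."""
    return dict(table.get(row_key, {}).get(family, {}))
```

Rows do not need to share the same columns, and a reader that only cares about one family (say, a hypothetical `profile` family) can fetch it without touching the others.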

3. Document Databases

These were inspired by Lotus Notes and are similar to key-value stores. The model is basically versioned documents that are collections of other key-value collections. The semi-structured documents are stored in formats like JSON. Document databases are essentially the next level up from key-value stores, allowing nested values to be associated with each key. Because the database can see inside the value, they support querying more efficiently than plain key-value stores.

Examples: CouchDB, MongoDB

4. Graph Databases

Instead of tables of rows and columns and the rigid structure of SQL, these use a flexible graph model which, again, can scale across multiple machines. More generally, NoSQL databases often do not provide a high-level declarative query language like SQL, in order to avoid processing overhead. Rather, querying these databases is data-model specific. Many NoSQL platforms offer RESTful interfaces to the data, while others offer query APIs.

Examples: Neo4j, InfoGrid, InfiniteGraph
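Shortest path is one of the graph algorithms these databases are good at. As a small illustration, here is a breadth-first search over an in-memory adjacency list; the sample social graph and the function name `shortest_path` are made up for this post, and a real graph database would run this kind of traversal inside its own engine.

```python
from collections import deque

# Hypothetical "who knows whom" graph as an adjacency list.
graph = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "erin"],
    "erin": ["dave"],
}


def shortest_path(graph: dict[str, list[str]], start: str, goal: str):
    """Breadth-first search; returns a list of nodes, or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}  # node -> node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor chain back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None
```

Asking for the path from `alice` to `erin` follows relationships hop by hop, the kind of query that would need a chain of self-joins in a relational schema.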


    What Type of Storage Should You Use?

NoSQL
  • Storage should be able to deal with very high load
  • You do many write operations on the storage
  • You want storage that is horizontally scalable
  • Simplicity is good, as in a very simple query language (without joins)

RDBMS
  • Storage is expected to be high-load, too, but it mainly consists of read operations
  • You want performance over a more sophisticated data structure
  • You need a powerful SQL query language
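For comparison, here is a small example of the kind of query SQL makes easy, using Python's built-in `sqlite3` module with an in-memory database. The `authors` and `posts` tables and their contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two normalized tables: each post refers to its author by id.
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES authors(id),
    title TEXT NOT NULL
);
INSERT INTO authors (id, name) VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO posts (author_id, title) VALUES
    (1, 'Scaling out'), (1, 'Sharding'), (2, 'Joins');
""")

# One declarative query joins the tables and aggregates per author.
rows = conn.execute("""
SELECT a.name, COUNT(p.id) AS post_count
FROM authors AS a
JOIN posts AS p ON p.author_id = a.id
GROUP BY a.id, a.name
ORDER BY a.name
""").fetchall()

print(rows)
```

The join and the aggregation are expressed in a single statement and the database decides how to execute it. In most of the NoSQL models above, the application would have to do this combining itself.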







