Coding Fungus: Understanding Database Technology in IT world: RDBMS Vs NoSQL

Broadly speaking, databases fall into two families:
1. RDBMS
2. NoSQL

RDBMS vs NoSQL

NoSQL databases are not relational. They are designed as distributed data stores for very large-scale data needs (for example, Facebook or Twitter accumulate terabytes of data every day for millions of users), typically with no fixed schema and no joins. Relational database management systems (RDBMS), by contrast, usually “scale up” by moving to faster hardware and adding memory. NoSQL systems can instead take advantage of “scaling out”, which means spreading the load over many commodity systems.
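To make “scaling out” concrete, here is a minimal sketch of hash-based sharding, assuming four in-memory dictionaries stand in for four commodity servers. The names `shard_for`, `shard_put` and `shard_get` are made up for illustration and do not come from any particular product. Real systems usually prefer consistent hashing, so that adding a server does not move most of the keys.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash (md5, not Python's salted hash())."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Each dict stands in for one commodity server (an assumption for this sketch).
shards = [dict() for _ in range(4)]

def shard_put(key: str, value: str) -> None:
    # Only the shard that owns the key is touched.
    shards[shard_for(key, len(shards))][key] = value

def shard_get(key: str):
    return shards[shard_for(key, len(shards))].get(key)

for user_id in range(1000):
    shard_put(f"user:{user_id}", f"profile-{user_id}")

print([len(s) for s in shards])  # the 1000 keys are spread across the shards
print(shard_get("user:42"))      # profile-42
```

Each read or write touches only one shard, which is why load can be spread simply by adding machines.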

The term NoSQL was coined in 1998. While many think it is a derogatory term created to poke fun at SQL, it is generally read as “Not Only SQL” rather than “No SQL at all.” The idea is that both technologies (NoSQL and RDBMSs) can co-exist and each has its place. Companies like Facebook, Twitter, Digg, Amazon, LinkedIn and Google all use NoSQL in some way, so the term has often been in the news over the past few years.

What’s Wrong with RDBMS?

Well, nothing, really. They just have their limitations. Consider these three problems with RDBMSs:
1. RDBMSs use a table-based, normalized approach to data, and that’s a limited model. Certain data structures cannot be represented without tampering with the data, programs, or both.
2. They allow in-place changes through operations such as Create, Read, Update and Delete. On this view, updates should never be allowed, because they destroy information. Rather, when data changes, the database should just add another record and duly note the previous value for that record.
3. Performance falls off as RDBMSs normalize data. The reason: normalization requires more tables, table joins, keys and indexes, and thus more internal database operations to implement queries. Once the database grows into the terabytes, things slow down.
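The “add another record instead of updating” idea can be sketched as follows. This is only an illustration using Python's built-in `sqlite3` module; the `price_history` table and the helper functions are hypothetical names, not part of any product discussed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE price_history ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " product TEXT NOT NULL,"
    " price REAL NOT NULL)"
)

def set_price(product: str, price: float) -> None:
    # Never UPDATE: every change is a new row, so earlier values survive.
    conn.execute(
        "INSERT INTO price_history (product, price) VALUES (?, ?)",
        (product, price),
    )

def current_price(product: str):
    # The newest row (highest id) is the current value.
    row = conn.execute(
        "SELECT price FROM price_history WHERE product = ? ORDER BY id DESC LIMIT 1",
        (product,),
    ).fetchone()
    return row[0] if row else None

def history(product: str):
    rows = conn.execute(
        "SELECT price FROM price_history WHERE product = ? ORDER BY id",
        (product,),
    ).fetchall()
    return [r[0] for r in rows]

set_price("widget", 9.99)
set_price("widget", 12.50)
print(current_price("widget"))  # 12.5
print(history("widget"))        # [9.99, 12.5]
```

The trade-off is that the table only ever grows, and reading the current value needs an ordering step.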

Four Categories of NoSQL

1. Key-Value Stores

Examples: Tokyo Cabinet/Tyrant, Redis, Voldemort, Oracle BDB
Typical applications: Content caching (focus on scaling to huge amounts of data, designed to handle massive load), logging, etc.
Data model: Collection of key-value pairs
Strengths: Fast lookups
Weaknesses: Stored data has no schema

2. Column Family Stores

Examples: Cassandra, HBase
Typical applications: Distributed file systems
Data model: Columns → column families
Strengths: Fast lookups, good distributed storage of data
Weaknesses: Very low-level API

3. Document Databases

Examples: CouchDB, MongoDB
Typical applications: Web applications (similar to key-value stores, but the DB knows what the value is)
Data model: Collections of key-value collections
Strengths: Tolerant of incomplete data
Weaknesses: Query performance, no standard query syntax

4. Graph Databases

Examples: Neo4j, InfoGrid, InfiniteGraph
Typical applications: Social networking, recommendations (focus on modeling the structure of data and its interconnectivity)
Data model: “Property graph”: nodes and the relationships between them, each carrying properties
Strengths: Graph algorithms, e.g. shortest path, connectedness, degree relationships, etc.
Weaknesses: Has to traverse the entire graph to achieve a definitive answer. Not easy to cluster.

Purpose of the Four Categories of NoSQL

1. Key-Value Stores

The main idea here is to use a hash table in which a unique key points to a particular item of data. The key-value model is the simplest and easiest to implement. However, it is inefficient when you are only interested in querying or updating part of a value, among other disadvantages.

Examples: Tokyo Cabinet/Tyrant, Redis, Voldemort, Oracle BDB, Amazon SimpleDB, Riak
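A rough sketch of the key-value model, assuming a plain Python dictionary plays the role of the hash table. The helper names `kv_put` and `kv_get` are invented for this example. It also shows why partial updates are awkward: the store treats the value as an opaque blob.

```python
import json

store = {}  # hash table: unique key -> opaque value (bytes)

def kv_put(key: str, value: bytes) -> None:
    store[key] = value

def kv_get(key: str) -> bytes:
    return store[key]

kv_put("user:1", json.dumps({"name": "Ada", "city": "London", "visits": 3}).encode())

# Partial update: the store cannot look inside the value, so the client has to
# fetch the whole blob, decode it, change one field, re-encode and write it back.
record = json.loads(kv_get("user:1"))
record["visits"] += 1
kv_put("user:1", json.dumps(record).encode())

print(json.loads(kv_get("user:1"))["visits"])  # 4
```

Lookups by key stay fast, but any question about the contents of a value has to be answered by the application.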

2. Column Family Stores

These were created to store and process very large amounts of data distributed over many machines. There are still keys but they point to multiple columns. The columns are arranged by column family.

Examples: Cassandra, HBase
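As a simplified picture of “keys that point to multiple columns, arranged by column family”, the sketch below uses nested dictionaries. It is an illustration of the data model only, not of how Cassandra or HBase store data on disk, and `cf_put` and `cf_get_family` are hypothetical helpers.

```python
from collections import defaultdict

# row key -> column family -> column name -> value
table = defaultdict(lambda: defaultdict(dict))

def cf_put(row_key, family, column, value):
    table[row_key][family][column] = value

def cf_get_family(row_key, family):
    # Read only the columns of one family for one row.
    return dict(table[row_key][family])

cf_put("user:1", "profile", "name", "Ada")
cf_put("user:1", "profile", "city", "London")
cf_put("user:1", "activity", "last_login", "2016-12-15")
cf_put("user:2", "profile", "name", "Linus")  # rows need not share the same columns

print(cf_get_family("user:1", "profile"))  # {'name': 'Ada', 'city': 'London'}
```

Grouping columns by family lets a reader fetch one family without touching the others in the same row.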

3. Document Databases

These were inspired by Lotus Notes and are similar to key-value stores. The model is basically versioned documents that are collections of other key-value collections. The semi-structured documents are stored in formats like JSON. Document databases are essentially the next level of key-value, allowing nested values associated with each key. Because the database understands the structure of each document, it can query on fields inside a value more efficiently than a plain key-value store.

Examples: CouchDB, MongoDB
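The difference from a plain key-value store can be sketched with nested Python dictionaries standing in for JSON documents. The `find` and `get_path` helpers are made-up names, loosely in the spirit of a document query, and are not the API of CouchDB or MongoDB.

```python
documents = [
    {"_id": 1, "name": "Ada", "address": {"city": "London"}, "tags": ["admin"]},
    {"_id": 2, "name": "Linus", "address": {"city": "Helsinki"}},
    {"_id": 3, "name": "Grace"},  # an incomplete document is still accepted
]

def get_path(doc, path):
    """Follow a dotted path such as 'address.city'; return None if any part is missing."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current

def find(collection, path, expected):
    # The store can filter on a nested field because it understands the value.
    return [doc for doc in collection if get_path(doc, path) == expected]

print([d["name"] for d in find(documents, "address.city", "London")])  # ['Ada']
```

Note that the third document has no address at all, which is the “tolerant of incomplete data” property in practice.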

4. Graph Databases

Instead of tables of rows and columns and the rigid structure of SQL, a flexible graph model is used which, again, can scale across multiple machines. Many NoSQL databases do not provide a high-level declarative query language like SQL, in order to avoid processing overhead. Rather, querying these databases is data-model specific. Many of the NoSQL platforms allow for RESTful interfaces to the data, while others offer query APIs.

Examples: Neo4j, InfoGrid, InfiniteGraph
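To illustrate the kind of query a graph model is built for, here is a small sketch of a shortest-path search using breadth-first search over an adjacency list. The people and their “knows” relationships are invented sample data, and real graph databases expose this through their own query interfaces rather than code like this.

```python
from collections import deque

# Nodes are people; each edge is a "knows" relationship (hypothetical data).
graph = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "erin"],
    "erin": ["dave"],
    "frank": [],
}

def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of nodes, or None if goal is unreachable."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

print(shortest_path(graph, "alice", "erin"))   # ['alice', 'bob', 'dave', 'erin']
print(shortest_path(graph, "alice", "frank"))  # None
```

The unreachable case shows the weakness noted earlier: to be sure there is no path, the search has to visit everything reachable from the start node.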

What Type of Storage Should You Use?

NoSQL

Storage should be able to deal with very high load
You do many write operations on the storage
You want storage that is horizontally scalable
Simplicity is good, as in a very simple query language (without joins)

RDBMS

Storage is expected to be high-load, too, but it mainly consists of read operations
You want performance over a more sophisticated data structure
You need a powerful SQL query language

Thursday, December 15, 2016
