There are basically two types of databases.
1. RDBMS
2. NoSQL
RDBMS vs NoSQL
NoSQL databases are not relational. They are designed as distributed data stores for very large-scale data needs (e.g. Facebook or Twitter accumulate terabytes of data every day for millions of users), and they typically have no fixed schema and no joins. Meanwhile, relational database management systems (RDBMS) “scale up” by moving to faster and faster hardware and adding memory. NoSQL, on the other hand, can take advantage of “scaling out”, which means spreading the load over many commodity systems.
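As a rough illustration of “scaling out”, the sketch below spreads keys over several machines by hashing each key. The node names and the `shard_for` helper are hypothetical, and real systems usually rely on more elaborate schemes (such as consistent hashing) so that adding a machine moves less data.

```python
import hashlib

# Hypothetical commodity machines that share the load.
NODES = ["node-a", "node-b", "node-c"]

def shard_for(key, nodes=NODES):
    """Pick the node responsible for a key by hashing the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

# Each user lands on exactly one node, and the same key always maps to the same node.
placement = {user: shard_for(user) for user in ["alice", "bob", "carol", "dave"]}
print(placement)
```

Because the node is derived from the key alone, any client can work out where a record lives without asking a central server.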
The term NoSQL was coined in 1998, and while many think NoSQL is a derogatory term created to poke fun at SQL, in reality it means “Not Only SQL” rather than “No SQL at all.” The idea is that both technologies (NoSQL and RDBMSs) can co-exist and each has its place. Companies like Facebook, Twitter, Digg, Amazon, LinkedIn and Google all use NoSQL in some way, so the term has appeared often in the news over the past few years.
What’s Wrong with RDBMS?
Well, nothing, really. They just have their limitations. Consider these three problems with RDBMSs:
1. RDBMSs use a table-based normalization approach to data, and that’s a limited model. Certain data structures cannot be represented without tampering with the data, programs, or both.
2. They allow in-place changes as one of the standard Create, Read, Update and Delete activities. On this view, updates should never be allowed in a database, because they destroy information. Rather, when data changes, the database should just add another record and duly note the previous value for that record.
3. Performance falls off as RDBMSs normalize data. The reason: normalization requires more tables, table joins, keys and indexes, and thus more internal database operations to implement queries. Once the database grows into the terabytes, things start to slow down.
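The “add another record instead of updating” idea can be sketched in a few lines. This is only an in-memory illustration, and the `AppendOnlyStore` class and the key names are hypothetical, not the API of any particular database.

```python
class AppendOnlyStore:
    """Keeps every version of a value instead of overwriting it."""

    def __init__(self):
        self._versions = {}  # key -> list of values, oldest first

    def put(self, key, value):
        # A "change" is just a new record appended to the history.
        self._versions.setdefault(key, []).append(value)

    def get(self, key):
        # The current value is the most recent record.
        return self._versions[key][-1]

    def history(self, key):
        # Earlier values remain available because nothing was overwritten.
        return list(self._versions[key])

store = AppendOnlyStore()
store.put("customer:1:city", "Boston")
store.put("customer:1:city", "Denver")
print(store.get("customer:1:city"))      # Denver
print(store.history("customer:1:city"))  # ['Boston', 'Denver']
```

The trade-off is storage: every change grows the history, so a real system would need some policy for compacting or archiving old versions.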
Four Categories of NoSQL
1. Key-Value Stores
- Examples: Tokyo Cabinet/Tyrant, Redis, Voldemort, Oracle BDB
- Typical applications: Content caching (focus on scaling to huge amounts of data, designed to handle massive load), logging, etc.
- Data model: Collection of key-value pairs
- Strengths: Fast lookups
- Weaknesses: Stored data has no schema
2. Column Family Stores
- Examples: Cassandra, HBase
- Typical applications: Distributed file systems
- Data model: Columns → column families
- Strengths: Fast lookups, good distributed storage of data
- Weaknesses: Very low-level API
3. Document Databases
- Examples: CouchDB, MongoDB
- Typical applications: Web applications (similar to key-value stores, but the DB knows what the value is)
- Data model: Collections of key-value collections
- Strengths: Tolerant of incomplete data
- Weaknesses: Query performance, no standard query syntax
4. Graph Databases
- Examples: Neo4j, InfoGrid, Infinite Graph
- Typical applications: Social networking, recommendations (focus on modeling the structure of data, i.e. its interconnectivity)
- Data model: “Property Graph” (nodes and the relationships between them)
- Strengths: Graph algorithms, e.g. shortest path, connectedness, degree relationships, etc.
- Weaknesses: Has to traverse the entire graph to achieve a definitive answer. Not easy to cluster.
Purpose of the Four Categories of NoSQL
1. Key-Value Stores
The main idea here is to use a hash table in which each unique key points to a particular item of data. The key-value model is the simplest and easiest to implement, but it is inefficient when you are only interested in querying or updating part of a value, among other disadvantages.
Examples: Tokyo Cabinet/Tyrant, Redis, Voldemort, Oracle BDB, Amazon SimpleDB, Riak
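A minimal sketch of the key-value model, using a Python dictionary as the hash table. The `put` and `get` helpers and the sample key are hypothetical. The value is stored as an opaque string, which is why changing one field means reading and rewriting the whole value.

```python
import json

store = {}  # the hash table: unique key -> opaque serialized value

def put(key, value):
    # The store does not look inside the value; it only keeps the bytes.
    store[key] = json.dumps(value)

def get(key):
    return json.loads(store[key])

put("user:42", {"name": "Ada", "email": "ada@example.com"})

# Updating a single field still requires a full read-modify-write cycle.
profile = get("user:42")
profile["email"] = "ada@new.example.com"
put("user:42", profile)

print(get("user:42")["email"])  # ada@new.example.com
```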
2. Column Family Stores
These were created to store and process very large amounts of data distributed over many machines. There are still keys but they point to multiple columns. The columns are arranged by column family.
Examples: Cassandra, HBase
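One way to picture the column family model is as nested maps: a row key points to column families, and each family holds its own columns. The `ColumnFamilyStore` class below is a hypothetical in-memory sketch, not the storage layout or API of Cassandra or HBase.

```python
class ColumnFamilyStore:
    """Row key -> column family -> column -> value."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, family, column, value):
        # Rows and families are created on first write; rows need not share columns.
        families = self._rows.setdefault(row_key, {})
        families.setdefault(family, {})[column] = value

    def get_family(self, row_key, family):
        # Read all columns of one family without touching the other families.
        return dict(self._rows.get(row_key, {}).get(family, {}))

users = ColumnFamilyStore()
users.put("user:42", "profile", "name", "Ada")
users.put("user:42", "profile", "city", "London")
users.put("user:42", "activity", "last_login", "2012-05-01")

print(users.get_family("user:42", "profile"))  # {'name': 'Ada', 'city': 'London'}
```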
3. Document Databases
These were inspired by Lotus Notes and are similar to key-value stores. The model is basically versioned documents that are collections of other key-value collections. The semi-structured documents are stored in formats like JSON. Document databases are essentially the next level of key-value stores, allowing nested values associated with each key. Because the database knows what the value is, it can query inside documents more efficiently than a plain key-value store.
Examples: CouchDB, MongoDB
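To illustrate nested values and tolerance of incomplete data, the sketch below treats a list of Python dictionaries as a collection and filters it by a dotted field path. The `get_path` and `find` helpers and the sample documents are hypothetical and do not reflect the query syntax of CouchDB or MongoDB.

```python
def get_path(doc, path):
    """Follow a dotted path such as 'author.country'; return None if any part is missing."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current

def find(collection, path, expected):
    """Return the documents whose value at the given path equals the expected value."""
    return [doc for doc in collection if get_path(doc, path) == expected]

# Documents in one collection do not have to share the same fields.
posts = [
    {"_id": 1, "title": "Hello", "author": {"name": "Ada", "country": "UK"}},
    {"_id": 2, "title": "Graphs", "author": {"name": "Linus"}},
    {"_id": 3, "title": "Untitled"},
]

matches = find(posts, "author.country", "UK")
print([doc["_id"] for doc in matches])  # [1]
```

Documents 2 and 3 lack the queried field and are simply skipped rather than causing an error.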
4. Graph Databases
Instead of tables of rows and columns and the rigid structure of SQL, a flexible graph model is used which, again, can scale across multiple machines. More generally, many NoSQL databases do not provide a high-level declarative query language like SQL, in order to avoid the processing overhead. Rather, querying these databases is data-model specific. Many of the NoSQL platforms allow for RESTful interfaces to the data, while others offer query APIs.
Examples: Neo4j, InfoGrid, Infinite Graph
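A shortest-path query is the kind of graph algorithm these databases are built for. The sketch below runs a breadth-first search over a small adjacency list. The `shortest_path` function and the sample social graph are hypothetical, and a real graph database would offer this through its own API or query language.

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of nodes on a shortest path, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

# Nodes are people; each edge means "follows".
follows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": ["erin"],
}

print(shortest_path(follows, "alice", "erin"))  # ['alice', 'bob', 'dave', 'erin']
```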
What Type of Storage Should You Use?
NoSQL
- Storage should be able to deal with very high load
- You do many write operations on the storage
- You want storage that is horizontally scalable
- Simplicity is good, as in a very simple query language (without joins)
RDBMS
- Storage is expected to handle high load too, but the load consists mainly of read operations
- You want performance over a more sophisticated data structure
- You need a powerful SQL query language