Relational database management systems (RDBMS) date back to the early 1970s and are characterized by a fixed schema, SQL (Structured Query Language), and ACID compliance. (Atomicity: a transaction either completes fully or not at all; Consistency: a transaction leaves the system in a known, valid state; Isolation: a transaction in progress does not affect other transactions; Durability: once committed, a transaction's changes are permanent.) Data is organized in rows and columns, like a spreadsheet, and relationships can be built between different tables. In general, relational databases work incredibly well until you hit “big data.” Big data is often defined using the three V’s: volume, variety, and velocity.
- Volume – multiple terabytes or more
- Variety – numbers, audio, video, text, streams
- Velocity – how fast the data is collected and requested
Source: Montis.com
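To make the "A" in ACID from the opening paragraph a little more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the names, and the transfer function are all made up for illustration; the point is only that a failed transaction leaves no half-finished update behind.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts inside a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            row = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if row[0] < 0:
                # Abort: the debit above must not survive on its own.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False

def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))

# Hypothetical sample data, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()
```

A transfer that would overdraw the source account returns False and both balances stay exactly as they were, which is the "complete or not complete" behavior described above.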
With the volume of data they need to process every second, it is easy to see why Google, Twitter, Facebook, and many other sites like them had to look for options beyond RDBMS. Over the past 10 years, a technology known as NoSQL has emerged, largely in response to the challenge of solving big data problems. Generically speaking, NoSQL solutions have the following characteristics:
- No schema required upfront
- Can store massive amounts of data
- Auto-sharding / elasticity (spreading data over multiple machines)
- Distributed query support (ability to execute a query on sharded data)
- BASE (basically available, soft state, eventually consistent) not ACID
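As a rough illustration of the first few characteristics, the sketch below spreads schemaless records over several in-memory "machines" by hashing the key, and answers a query by asking every shard and merging the results. The ShardedStore class and its methods are hypothetical names invented for this example; it is not how any particular NoSQL product implements sharding.

```python
import hashlib

class ShardedStore:
    """Toy key-value store that spreads records over several in-memory shards."""

    def __init__(self, shard_count):
        # Each dict stands in for one machine.
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash (unlike Python's per-process randomized hash())
        # so the same key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, key, record):
        # Records are plain dicts; no schema is declared anywhere.
        self.shards[self._shard_for(key)][key] = record

    def get(self, key):
        return self.shards[self._shard_for(key)].get(key)

    def query(self, predicate):
        # Scatter the predicate to every shard, then gather the partial results.
        results = []
        for shard in self.shards:
            results.extend(r for r in shard.values() if predicate(r))
        return results
```

Note that simple modulo placement like this is not very elastic: changing the number of shards remaps most keys. Production systems generally use more careful placement schemes, such as consistent hashing, to add machines without reshuffling everything.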
There is a raft of NoSQL solutions available for a variety of different business problems. The following figure from the 451 Group is an interesting illustration of the current marketplace. The acronym SPRAIN refers to:
- Scalability – hardware economics
- Performance – MySQL limitations
- Relaxed consistency – CAP theorem (*)
- Agility – polyglot persistence (use the right database for given problem)
- Intricacy – big data, total data
- Necessity – open source
* CAP (or Brewer’s) theorem says that a distributed computer system can only simultaneously satisfy two of three guarantees: consistency, availability, and partition tolerance.
Source: 451 Group
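The "relaxed consistency" trade-off can be sketched with two toy replicas: each one stays available for reads and writes on its own, a read may return stale data, and the copies converge once they synchronize. Replica and sync are hypothetical names, and the per-key version counter is a deliberate simplification; real systems resolve conflicting writes much more carefully.

```python
class Replica:
    """One copy of the data; always answers, even if it is out of date."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def write(self, key, value):
        # Bump the version known to this replica and store the new value.
        version = self.data.get(key, (0, None))[0] + 1
        self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def sync(a, b):
    """Anti-entropy pass: for every key, both replicas keep the higher version."""
    for key in set(a.data) | set(b.data):
        newest = max(
            a.data.get(key, (0, None)),
            b.data.get(key, (0, None)),
            key=lambda entry: entry[0],  # compare by version only
        )
        a.data[key] = b.data[key] = newest
```

Between a write to one replica and the next sync, the other replica keeps serving the old value. That window is the "soft state, eventually consistent" part of BASE.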
Having spent time putting a NoSQL (Mongo) database into place, I can attest to the fact that it’s harder than it looks (hard to query directly, limited support, limited reporting, and a steep learning curve). That said, the products and documentation are getting better, developers are becoming more comfortable with the technology, and companies like Couchbase are sprouting up to provide support. Today NoSQL is a great thing if you have the opportunity to start from a clean sheet of paper. In particular, not needing an upfront schema works really well with the MVC programming model (e.g., Rails, ASP.NET MVC 3).
Many organizations don’t have the luxury of starting over. They have RDBMS-based applications that face performance or scalability challenges but that, for business reasons (time, cost, skills, etc.) rather than technical ones, cannot be moved to a new database. As Couchbase points out, RDBMS developers have resorted to “sharding” (spreading data across multiple servers), denormalizing data (adding redundant columns to the schema to optimize performance), and adding a memory cache to speed up queries. None of these solutions is really the answer for an organization facing a big data problem.
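For readers who have not seen the memory cache workaround, here is a minimal cache-aside sketch using sqlite3 and a plain dictionary. The users table and the CachedUserLookup class are invented for illustration; a real deployment would more likely put a separate caching tier in front of the database, but the read and invalidate logic follows the same shape.

```python
import sqlite3

class CachedUserLookup:
    """Cache-aside: check memory first, fall back to the database on a miss."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}    # user_id -> name
        self.db_reads = 0  # counts how often we actually hit the database

    def get_user(self, user_id):
        if user_id in self.cache:
            return self.cache[user_id]
        self.db_reads += 1
        row = self.conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        if row is None:
            return None
        self.cache[user_id] = row[0]
        return row[0]

    def rename_user(self, user_id, name):
        with self.conn:  # commit the update as one transaction
            self.conn.execute(
                "UPDATE users SET name = ? WHERE id = ?", (name, user_id)
            )
        # Invalidate so the next read does not serve the old name.
        self.cache.pop(user_id, None)

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ada')")
conn.commit()

lookup = CachedUserLookup(conn)
lookup.get_user(1)  # miss: goes to the database
lookup.get_user(1)  # hit: served from memory
```

The extra bookkeeping in rename_user hints at why this is a workaround and not a cure: every write path has to remember to invalidate the cache, and the database underneath is no faster than before.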
Another way to optimize the performance of a database-bound application is to minimize query time by moving the database itself onto an SSD or flash memory. The “catch,” so to speak, is that flash memory is not cheap; on the other hand, it is much less expensive than rewriting software. FusionIO makes a flash memory product called ioDrive. The ioDrive is a PCI card that effectively behaves like an SSD. Because FusionIO has designed the card to integrate at the memory tier, they have managed to minimize I/O bottlenecks. “Spinning disks” have a read rate of approximately 200-300 IOPS (I/O operations per second); the ioDrive can achieve a rate of almost 100K IOPS.
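A quick back-of-the-envelope calculation shows what those IOPS figures mean in practice. The workload of one million random reads is a made-up number, and the sketch assumes the reads are served one after another with I/O as the only cost, so treat the results as an order-of-magnitude guide and not a benchmark.

```python
def seconds_for_reads(read_count, iops):
    """Time to serve `read_count` reads at a sustained rate of `iops`."""
    return read_count / iops

reads = 1_000_000  # hypothetical workload

disk_slow = seconds_for_reads(reads, 200)   # spinning disk, low end of the range
disk_fast = seconds_for_reads(reads, 300)   # spinning disk, high end of the range
flash = seconds_for_reads(reads, 100_000)   # flash card at roughly 100K IOPS

speedup_low = disk_fast / flash   # improvement over the faster disk figure
speedup_high = disk_slow / flash  # improvement over the slower disk figure
```

Under these assumptions, the same workload drops from roughly an hour or more on spinning disks (about 3,300 to 5,000 seconds) to about 10 seconds on flash, a difference of somewhere between 330x and 500x.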
References:
- NoSQL Databases: Why, what and when – February 2011 (Slideshare.net)
- NoSQL, NewSQL and MDM – July 2011 (Blog post)
- Survey Distributed Databases
- Picking the Right NoSQL Database Tool – May 2011 (Blog post)
- NoSQL, NewSQL and Beyond: The answer to SPRAINed relational databases – April 2011 (Blog post)
- http://www.techvalidate.com/product-research/fusion-io-ioDrive


Posted by scooter998