What do you do when demand for your application outgrows the capabilities of an RDBMS?

February 11, 2012

Relational database management systems (RDBMS) date back to the early 1970s and are characterized by a fixed schema, SQL (Structured Query Language), and ACID compliance. (Atomicity: a transaction either completes in full or not at all. Consistency: a transaction leaves the system in a known, valid state. Isolation: concurrent transactions do not interfere with one another. Durability: once committed, changes are permanent.) Data is organized in rows and columns, like a spreadsheet, and relationships can be built between different tables. In general, relational databases work incredibly well until you hit “big data.” Big data is often defined using the three V’s: volume, variety, and velocity.
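To make atomicity concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the names in it, and the transfer helper are all made up for illustration; the point is only that a transaction that fails halfway leaves no partial changes behind.

```python
import sqlite3

# In-memory database with a hypothetical accounts table.
# The CHECK constraint stands in for a business rule: no negative balances.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; return True on success."""
    try:
        # "with conn" commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit was undone too.
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


transfer(conn, "alice", "bob", 30)   # succeeds
transfer(conn, "alice", "bob", 500)  # fails; neither update is kept
print(balances(conn))
```

The second transfer credits bob before the debit to alice fails, yet bob's balance is unchanged afterwards. That all-or-nothing behavior is what many NoSQL systems of this era relax in exchange for scale.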

  • Volume – multiple terabytes or more
  • Variety – numbers, audio, video, text, streams
  • Velocity – how fast the data is collected and requested

[Figure omitted. Source: Montis.com]

With the volume of data they need to process every second, it is easy to see why Google, Twitter, Facebook, and many other sites like them had to look for options beyond the RDBMS. Over the past 10 years, a technology known as NoSQL has emerged, largely in response to the challenge of solving big data problems. Generally speaking, NoSQL solutions have the following characteristics:

  • No schema required upfront
  • Can store massive amounts of data
  • Auto-sharding / elasticity (spreading data over multiple machines)
  • Distributed query support (ability to execute a query on sharded data)
  • BASE (basically available, soft state, eventually consistent) not ACID
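Several of these characteristics can be sketched in a few lines of Python. This is a toy, not how any particular NoSQL product works: the ShardedStore class and shard_for function are hypothetical, each "machine" is just an in-process dict, and real systems typically use consistent hashing or range partitioning so that shards can be added without reshuffling everything.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy document store that spreads schema-less documents over N shards."""

    def __init__(self, num_shards: int):
        # Each dict stands in for a separate machine.
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key: str, document: dict) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = document

    def get(self, key: str):
        # A key lookup touches exactly one shard.
        return self.shards[shard_for(key, len(self.shards))].get(key)

    def find(self, predicate):
        # A query without a key must ask every shard and merge the results.
        return [
            doc
            for shard in self.shards
            for doc in shard.values()
            if predicate(doc)
        ]


store = ShardedStore(num_shards=4)
# No schema: the two documents have different fields.
store.put("user:1", {"name": "Ann", "followers": 120})
store.put("user:2", {"name": "Raj", "location": "Boston", "followers": 15})
print(store.get("user:2"))
print(store.find(lambda doc: doc.get("followers", 0) > 100))
```

Note the asymmetry: get goes to a single shard, while find fans out to all of them. That is one reason ad hoc queries tend to be harder on a sharded store than on a single relational database.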

There is a raft of NoSQL solutions available for a variety of different business problems. The following figure from the 451 Group is an interesting illustration of the current marketplace. The acronym SPRAIN refers to:

  • Scalability – hardware economics
  • Performance – MySQL limitations
  • Relaxed consistency – CAP theorem (*)
  • Agility – polyglot persistence (use the right database for given problem)
  • Intricacy – big data, total data
  • Necessity – open source

* CAP (or Brewer’s) theorem says that a distributed computer system can only simultaneously satisfy two of three guarantees: consistency, availability, and partition tolerance.

[Figure omitted: the 451 Group’s illustration of the NoSQL marketplace. Source: 451 Group]

Having spent time putting a NoSQL (Mongo) database into place, I can attest that it’s harder than it looks: it is hard to query directly, support and reporting are limited, and the learning curve is steep. That said, the products and documentation are getting better, developers are becoming more comfortable with the technology, and companies like Couchbase are sprouting up to provide support. Today NoSQL is a great thing if you have the opportunity to start from a clean sheet of paper. In particular, not needing an upfront schema works really well with the MVC programming model (e.g., Rails, ASP.NET MVC 3).

Many organizations don’t have the luxury of starting over. They have RDBMS-based applications that face performance or scalability challenges and that, for business rather than technical reasons (time, cost, skills, etc.), cannot be moved to a new database. As Couchbase points out, RDBMS developers have resorted to “sharding” (spreading data across multiple servers), denormalizing data (adding redundant columns to the schema to optimize performance), and adding a memory cache to speed up queries. None of these is really the answer for an organization facing a big data problem.
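Of the three workarounds, the memory cache is the easiest to picture in code. What follows is a rough sketch of the common "cache-aside" pattern, assuming a hypothetical load_from_db function that runs the expensive query; in practice the cache would usually live in a separate service such as memcached rather than in a Python dict.

```python
import time


class CacheAside:
    """Check memory first; fall back to the database only on a miss."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db   # hypothetical function: key -> value
        self._ttl = ttl_seconds     # how long a cached value is trusted
        self._clock = clock         # injectable so expiry can be tested
        self._entries = {}          # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired: run the real query and remember the answer.
        self.misses += 1
        value = self._load(key)
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Call after writing to the database so readers do not see stale data."""
        self._entries.pop(key, None)


def slow_query(user_id):
    # Stand-in for an expensive SQL query.
    return {"id": user_id, "name": f"user-{user_id}"}


cache = CacheAside(slow_query, ttl_seconds=30)
cache.get(42)  # miss: goes to the "database"
cache.get(42)  # hit: served from memory
print(cache.hits, cache.misses)
```

The sketch also hints at why caching is a patch rather than a cure: every write path has to remember to invalidate, and anything read within the TTL may be stale.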

Another way to optimize the performance of a database-bound application is to minimize query time by moving the database itself onto an SSD or flash memory. The catch is that flash memory is not cheap; on the other hand, it is much less expensive than rewriting software. FusionIO makes a flash memory product called ioDrive. The ioDrive is a PCI card that effectively behaves like an SSD. Because FusionIO has designed the card to integrate at the memory tier, they have managed to minimize I/O bottlenecks. “Spinning disks” have a read rate of approximately 200-300 IOPS (I/O operations per second); ioDrive can achieve a rate of almost 100K IOPS.
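A quick back-of-the-envelope calculation shows what those IOPS figures mean in practice. The workload here is an assumption for illustration only: one million random reads served one after another by a single device, ignoring caching, queue depth, and everything else a real system does.

```python
def seconds_to_serve(random_reads: int, iops: float) -> float:
    """Time to complete the reads if the device sustains the given IOPS."""
    return random_reads / iops


READS = 1_000_000  # assumed workload size, not a figure from any benchmark

devices = [
    ("spinning disk, low end", 200),
    ("spinning disk, high end", 300),
    ("flash card", 100_000),
]

for label, iops in devices:
    print(f"{label}: {seconds_to_serve(READS, iops):,.0f} seconds")

# Ratio between the flash figure and the two disk figures.
print(f"speedup: {100_000 / 300:.0f}x to {100_000 / 200:.0f}x")
```

Under those assumptions the disk needs roughly an hour or more for work the flash card finishes in about ten seconds, which is the gap that makes the hardware cost easy to justify against a rewrite.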


Software Due Diligence – Part III

September 29, 2011

Software due diligence is a bit like having a home inspection done when purchasing a house. Some problems are more serious than others. For example, finding mold or asbestos in the basement might be a reason to walk away. As with a home inspection, in most cases the diligence does not reveal problems so serious that you will want to back out of the deal entirely; more typically, you may want to reconsider your valuation or take steps to manage the transition.

Red Flags

  • Architecture that will not scale (possible walk away)
  • Application cannot be re-built / run from source control checkout
  • Inability to engender a sense of confidence that the solution really works
  • Lack of forethought (no clear sense of “this is where I’d like to go”)
  • Architecture that cannot be cleanly expressed
  • Prima donna developers
  • Absence of technical leadership
  • Reliance on obsolete technology (e.g., Delphi)
  • Business logic consistently found in the presentation layer
  • Absence of any documentation whatsoever
  • Critical code that no one owns (e.g., code developed by someone who isn’t here anymore)
  • Serious ethical breakdowns

Reasonable expectations

  • Absence of perfect documentation (even the best organizations are challenged to have up to date documentation)
  • At least one thing that impresses you as “world-class” (the more the better)
  • Good code
  • Finding that there are one or two go-to people
  • People wear many hats (product manager, QA, developer, etc.)
  • Insufficient infrastructure
  • Reliance on free services
  • Lack of published standards, metrics, or formal process
  • Informal bug tracking
  • Out of date off-the-shelf software
  • Limited requirements documents

Things to give you pause

  • Non-homogeneous technology configuration
  • Bleeding edge / specialized technology (e.g., Cassandra, assembly)
  • Dependence on a service provided for free (Google Translate API)
  • Sloppy code
  • Lack of appreciation of the competition
  • Insufficient knowledge of best practices (e.g., jQuery vs. plain JavaScript)
  • How well will the system under examination complement, or conflict with, existing systems?
  • Are some areas more complete than others? (some code is more battle tested)
  • Is there something that should be patented?
  • Poor user interface
  • General sense of mediocrity

Software Due Diligence – Part II

September 29, 2011

Operations

  • Review the current architecture via documentation or whiteboard.
  • Watch the application run from an OS console. (e.g., top, perfmon)
  • Watch the application run from purpose-built administrator tools.
  • Describe the hosting architecture.
  • Where is the system hosted?
  • How redundant is the system?
  • How is the system monitored?
  • What are the biggest bottlenecks in the system?
  • Has your system ever been compromised?
  • Characterize the reliability of your system.
  • Have you done any vulnerability or penetration testing?
  • How would you handle 10X volume, 100X volume, 1000X volume? (This is a big one.)
  • Inventory of hardware and software (technology) assets.
  • Where is your source code stored?

Software

  • Review the source code (looking for good coding practices, clean architecture, exception handling, etc.).
  • Review database schema and query the live database (or copy of live).
  • Inventory custom components and software license agreements.
  • Are there any public or private APIs?
  • Review (developer) documentation associated with the code.
  • Review user-facing documentation and/or training materials.
  • Build all applications from source code and deploy to hosting environment.
  • Is there a debug / development interface?
  • Is there a database of customer feature requests or open issues?
  • Are any obsolete technologies (e.g., Delphi) in use?

People

  • Meet key employees and get to know their backgrounds.
  • Who is the go-to person?
  • How do people collaborate?
  • Describe your SDLC.
  • Where do requirements come from?
  • How is the software tested?

Product

  • See a demo of all products, utilities, and supporting software.
  • Product roadmap: recent past, present, future.
  • Review current business model and sales process.
  • Are there any prototypes or product concepts that we should see or discuss? (These can be hidden gems)

Software Due Diligence – Part I

September 29, 2011

I was recently asked what goes into software due diligence. This is the first of three posts on the topic. In this post I outline my thoughts on the process itself. Part II is my working list of questions. Finally, Part III offers some thoughts about what to expect and when to walk away.

There are a bunch of good checklists out there for buying an entire company. See references below. Most of these checklists talk about software diligence relatively generically. After looking at a number of different organizations for a variety of reasons, I’ve built myself a checklist that may be a useful starting point for others. I think my checklist is most applicable to medium-sized applications (~1MM SLOC) built by teams of 3-10 people. Larger applications probably warrant a more sophisticated approach.

This post is not about valuing the business, assessing the product, or anything to do with market position. It is all about looking at the code, how the code is hosted, and assessing the technical assets of a business. If you are doing a project for a VC, they typically want to know if there are any “red flags.” There are two types of red flags: those that are correctable and those that are not. A correctable red flag is something like a lack of off-site backups or a non-redundant server. An uncorrectable red flag (which typically means walk away) is something like prima donna developers, limited or no documentation, or an architecture that cannot scale. If you are doing a project for a business that is trying to integrate a property with its own systems, they want to know about the red flags, but they also want an understanding of what they are going to be inheriting and what it’s going to take to make it useful as fast as possible.

Invariably the technical examination will overlap with looking at the business itself. For example, when buying a product that claims to have a million users, it’s prudent to query the database and see that there are at least a million email addresses in it. Similarly, the business development folks are often after any information that they can use to value the business or close the deal.
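A check like the million-user example can be as simple as one query. Here is a rough sketch using Python's built-in sqlite3 module; the users table and email column are hypothetical names, and against a real system you would point the same SQL at a copy of the production database using whatever driver it requires.

```python
import sqlite3


def count_distinct_emails(conn, table="users", column="email"):
    """Count unique, non-blank email addresses, ignoring case and padding."""
    # Table and column names cannot be bound as parameters, so they are
    # interpolated here. They come from the reviewer, never from user input.
    query = (
        f"SELECT COUNT(DISTINCT LOWER(TRIM({column}))) FROM {table} "
        f"WHERE {column} IS NOT NULL AND TRIM({column}) <> ''"
    )
    return conn.execute(query).fetchone()[0]


# Tiny stand-in for the database under review.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [
        ("a@example.com",),
        ("A@Example.com ",),  # same address with different case and a space
        ("b@example.com",),
        ("",),                # blank
        (None,),              # missing
    ],
)

total_rows = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(total_rows, count_distinct_emails(conn))
```

Comparing the raw row count with the distinct count is the useful part: a large gap between the two suggests duplicates, test accounts, or placeholder rows inflating the headline number.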

I think that much of technology due diligence is common sense. The good news is that, if it’s done right, you quickly get a feel for the quality of the product. A process that has served me well is to sit in on the preliminary conversations (which are often over the phone) with the management team. I may or may not ask some general questions during that meeting. I will then follow up with a call to the CTO (or technology lead) to get into more detail. The goal of the technology call is to confirm my understanding of the technology stack and to set expectations for an on-site visit.

I am not a big fan of questionnaires. I believe an on-site visit is critical to getting a good understanding of the technology. More recently I’ve been challenged to find a place to visit, as many of the principals work remotely from one another. I think it is important to meet the people face to face. My primary objective is to learn as much as I can about the technology. My secondary objective is to meet the development team and form an opinion about their respective competencies. By its nature, diligence is the process of looking for problems. Rarely do you come back from looking at a business thinking that you underestimated how good it is. On the other hand, I can think of many occasions where I’ve come away blown away by the people: their technical acumen, tenacity, and single-minded determination to make something work.

Picking who goes on-site is a particularly important consideration. I think the minimum number is two: one subject matter expert on the business and one person who is proficient in the programming language used by the business being acquired. (You do not want a .NET person doing a Ruby on Rails evaluation.) If budget permits, an IT person is a very nice resource to have. Their perspective often complements that of the business expert and the developers.

References:

 

