Thursday, December 13, 2012

Rabbit Turds: Why I dislike NoSQL and Big-Data

Don't get me wrong on this one. I love and use (and even teach others how to use) graph databases like Neo4J and others that label themselves "NoSQL", such as MongoDB. I also don't have anything against someone who owns an ultra large piece of data. It is the terms themselves that are the subject of today's rant.

What I don't like is the same kind of nonsensical, illogical naming convention we have seen in the past. It reminds me of another name that lingers to this day like the stench of a dead woodchuck (Web 2.0). I wrote in the book "Web 2.0 Architectures" (co-authored with Dion Hinchcliffe and James Governor) that Web 2.0 is perhaps inappropriately named, as the convention of [name]-[version_Major].[version_minor] implies an observable state of a stateful object, more often than not a piece of software. Since the web is a dynamic beast that is constantly in a state of flux with multiple technologies, this type of versioning cannot be applied. I have written in the past that "there is no Web 2.0 architecture" and have even heard Tim O'Reilly himself get a bit (pardon the pun) riled up about those running around talking about Web 3.0, Web 4.0 etc. Ooh, I cringe just thinking of it. Like people building penthouses on a pile of rubbish. Try a Google search for the term "Web 5.0" and just see the lunacy that exists.

When I first heard the term NoSQL used, it seemed to imply that there was not going to be any "SQL" in this movement, but it turns out that the acronym stands for "Not Only SQL". Hmm. Let's think this one through a bit. Not only SQL means that you are not excluding SQL. It also means you are not explicitly including or even implicitly excluding everything else. So here is the question. What is defined by a term that is not exclusive of anything? It is simply a mathematical set that includes "all". Can a rabbit turd be part of NoSQL? Sure. It certainly doesn't stand for "Not Only SQL But Definitely Not Rabbit Turds" or NOSQLBDNRT for short. The most ironic thing about the term is that while many think it means "no relational" models or tables, Carlo Strozzi used the term NoSQL in 1998 to name his lightweight, open-source relational database that did not expose the standard SQL interface. Maybe NoSQL should mean "I was too busy to add an SQL interface and used a different way to access database operations". It certainly seems more fitting given the cool technological advances. I was so inspired by the idea of NoSQL I decided to make the movement its own graphic! May I present to you... NOSQL!





Ok, so for the record, I love the idea of building on graph databases. My company is building a very innovative platform called Formstr built on top of Neo4J. Neo4J is one of the best technologies I have ever come across. It is time to maybe shift our thinking a bit and start classifying the technologies accordingly. Graph databases use nodes and relationships to store data. They use ASCII art languages like Cypher to query them, so there is no real SQL involved. Other databases use native storage formats like JSON that translate into fast returns. And hello Spring Data! Why would you use SQL in Java when you can use @annotations? There is a good article written by Dan Sullivan here summing up the differences. This now gets me into the second part of my rant.
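For anyone who has not met the node-and-relationship model mentioned above, here is a minimal sketch in plain Python. It is not how Neo4J stores or queries anything internally; the Node, Relationship and Graph classes and the match() helper are hypothetical names made up for illustration, as are the sample people. The comment shows the kind of ASCII art pattern a Cypher query would use for the same question.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """A labelled node with a single 'name' property (illustrative only)."""
    node_id: int
    label: str
    name: str


@dataclass(frozen=True)
class Relationship:
    """A directed, typed edge from one node to another."""
    start: Node
    rel_type: str
    end: Node


@dataclass
class Graph:
    """A toy in-memory graph; a real graph database does far more."""
    nodes: list = field(default_factory=list)
    relationships: list = field(default_factory=list)

    def add_node(self, label, name):
        node = Node(len(self.nodes), label, name)
        self.nodes.append(node)
        return node

    def relate(self, start, rel_type, end):
        rel = Relationship(start, rel_type, end)
        self.relationships.append(rel)
        return rel

    def match(self, start_label, rel_type, end_label):
        """Return (start name, end name) pairs for one relationship pattern.

        Roughly the question a Cypher pattern such as
            MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a.name, b.name
        asks, minus everything that makes a real query engine fast.
        """
        return [
            (rel.start.name, rel.end.name)
            for rel in self.relationships
            if rel.start.label == start_label
            and rel.rel_type == rel_type
            and rel.end.label == end_label
        ]


graph = Graph()
alice = graph.add_node("Person", "Alice")
bob = graph.add_node("Person", "Bob")
neo = graph.add_node("Database", "Neo4J")
graph.relate(alice, "KNOWS", bob)
graph.relate(alice, "USES", neo)

pairs = graph.match("Person", "KNOWS", "Person")
print(pairs)
```

The point of the sketch is only that the data is the nodes and the relationships themselves, and the query is a shape you draw rather than a table join you write.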

Big Data.

What is Big Data? Last time I looked, data is stored electronically as streams of ones and zeros. Since these are binary concepts, size is really not a consideration. My ones and zeros are not bigger or smaller than anyone else's. Big Data is another term that is vague, ill-conceived and has no real quantification or definition. Loosely interpreted, I believe most people really mean to say "A Very Large Amount of Data" or perhaps, if it is bigger, we could make the acronym and say we have a VFLAD. I'll let you guess what the "F" stands for.

Does Big Data mean a single piece of data or a lot of small pieces of data? I find that due to the interconnected nature of the internet and the accessibility of open data (defined as data that is accessible by anyone at any time without any costs or significant barriers), we are really living in a world of "Abundant Data". Some say "Interconnected Data", which is also sort of misleading. Like the lesson learned from the old saying "You can't get there from here", I am tempted to say that all data I can connect to is potentially "connectable" data since I can make the connection. And yes, that is a VFLAD!

So ends my rant, as mild as they come. This blogger has better things to put energy into than trying to change people's behavior into using meaningful and well thought out terms.

