The other day I was cleaning out a pile of old magazines and came across the October 2004 issue of Wired, with the article by Chris Anderson called The Long Tail. Chris published a book on the subject in 2006, followed by a TED talk in 2007, in which he took a concept from statistics to describe the long tail in business. He proposed that in some cases the bulk of a company’s business comes from a long tail approach, that is, from low-turnover items. Apple iTunes is often given as an example of the long tail at its best. iTunes has thousands of songs and generates the bulk of its revenue from one-off sales. Amazon is another example.
In statistics, a long tail distribution is one in which a high amplitude population is followed by a low amplitude population that gradually tails off. The long tail population is the most difficult to see in standard data because its occurrences are one-offs. Taken together, though, those one-offs are very powerful from a business perspective, as Chris postulates. The long tail is best shown on a graph.
Power law graph showing popularity ranking. To the right is the long tail; to the left are the few that dominate. Notice that the areas of both regions match. Source: Wikipedia.
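To make the "areas match" point concrete, here is a minimal sketch in Python. It assumes a simple power law in which the popularity of an item is 1 / rank; the function names, the exponent, and the catalogue size of 10,000 items are all illustrative assumptions, not figures from Chris's article or the Wikipedia graph. It counts how many top-ranked items are needed to equal the combined popularity of everything else.

```python
def zipf_weights(n_items, exponent=1.0):
    """Popularity of each rank under an assumed power law: 1 / rank**exponent."""
    return [1.0 / (rank ** exponent) for rank in range(1, n_items + 1)]


def head_size_for_half(weights):
    """Smallest number of top-ranked items whose combined weight reaches half the total."""
    total = sum(weights)
    running = 0.0
    for count, weight in enumerate(weights, start=1):
        running += weight
        if running >= total / 2:
            return count
    return len(weights)


# Hypothetical catalogue of 10,000 items ranked by popularity.
weights = zipf_weights(10_000)
head = head_size_for_half(weights)
tail = len(weights) - head
print(f"{head} head items carry about as much weight as the {tail} items in the tail")
```

Under these assumptions a small handful of items forms the head, while the many one-offs in the tail add up to roughly the same total. Real sales or integration data will follow a different curve, so treat the numbers as a shape to look for rather than a prediction.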
So this got me thinking. Could the long tail concept be applied to data integration to generate valuable insight and perhaps new revenue sources? The answer is yes.
I have always espoused a holistic approach to data integration, where one looks at all the data flows in and out of an organization and applies a global integration strategy to achieve a better overall ROI. The long tail approach is one that I believe adds enormous value on top of that. The reality is that small, sometimes one-off integration projects often do not fit into the global approach. They are, to use the jargon of the internet, mashups. Mashups can be created from both structured and unstructured data sources and often provide valuable insights into the patterns within the data, insights that the holistic approach does not pick up.
The problem, though, is not in the data or the definition of the sources, but rather in the tool sets that people have at their disposal to integrate that data. In the data silo approach, a single solution is used to solve a particular data problem. Those single-silo solutions do not lend themselves to a long tail approach: they are not flexible enough, and they are not designed to handle many different kinds of data sources, whether structured or not.
So if you are considering a long tail approach to a data integration problem, make sure that the tool set you are using has the flexibility. Make sure it is a middleware solution, one that handles disparate data sources. ECS and Delta fit that bill. They are powerful and flexible, and they have a low cost of ownership, which makes both the holistic and long tail approaches to data integration very effective.