I’m currently reading “The Long Tail,” and it describes an interesting statistical distribution that was left out of my Intro to Stats class: power laws (or, for the mathematically inclined, Zipf distributions). Most of my academic background centers on Gaussian distributions (the bell curve), but power laws describe much of the economics we see in the online universe. But just what are these “laws”?
History: Pareto and Zipf
Power laws trace back to the Italian economist Vilfredo Pareto, who studied the distribution of income and wealth in the late 1800s. After some number crunching, he found a surprising imbalance: roughly 20% of the population owned 80% of the wealth. In other words, there are a few very wealthy individuals and many middle-class or poor individuals.
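Pareto’s 80/20 observation can be reproduced with the Pareto distribution that now bears his name. The sketch below is a minimal illustration that assumes wealth follows an idealized Pareto distribution with shape parameter alpha; the function name top_share is hypothetical. For such a distribution, the richest fraction p of people holds p^(1 - 1/alpha) of the total, and alpha = log_4(5), about 1.16, produces exactly the 80/20 split.

```python
import math

def top_share(p: float, alpha: float) -> float:
    """Fraction of total wealth held by the richest fraction p of the population,
    assuming wealth follows a Pareto distribution with shape alpha (alpha > 1)."""
    # From the Pareto Lorenz curve, the top-p share works out to p ** (1 - 1/alpha).
    return p ** (1 - 1 / alpha)

# Shape parameter that yields the classic 80/20 split: alpha = log_4(5), about 1.16.
ALPHA_80_20 = math.log(5) / math.log(4)

print(round(top_share(0.20, ALPHA_80_20), 3))  # 0.8
print(round(top_share(0.01, ALPHA_80_20), 3))  # the top 1% alone holds about 0.53
```

A larger alpha means a less extreme concentration; with alpha = 3, for instance, the top 20% would hold only about a third of the wealth.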
In the mid-1900s, the linguist George Zipf found that word frequencies in English follow a similar pattern: a word’s frequency is roughly inversely proportional to its rank. The word “the” occurs most often, at 7.5% of all words, followed by “of” at 3.5%. The top 130 or so words already account for about 50% of all word occurrences in English. If you graph it, you get something that looks like this:
The gray section contains the most frequent words (e.g., the, of, is). These are the “hits” of the English language: the words people use most. The yellow section contains the thousands upon thousands of other words in the dictionary. This distribution mirrors Pareto’s income findings: a few very popular words and many unpopular ones.
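The rank-frequency pattern Zipf described can be sketched in a few lines. This is a minimal illustration that assumes an idealized Zipf law in which a word’s frequency is exactly proportional to 1/rank; the 30,000-word vocabulary and the helper names are hypothetical, not taken from Zipf’s data. It shows how a handful of top-ranked words (the head) cover a large share of all occurrences while the long tail of rare words splits the rest.

```python
def zipf_shares(n_words: int) -> list[float]:
    """Share of all word occurrences taken by each rank 1..n_words under an
    idealized Zipf law, where frequency is proportional to 1/rank."""
    weights = [1 / rank for rank in range(1, n_words + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def words_to_reach(shares: list[float], target: float) -> int:
    """Smallest number of top-ranked words whose combined share reaches target."""
    cumulative = 0.0
    for count, share in enumerate(shares, start=1):
        cumulative += share
        if cumulative >= target:
            return count
    return len(shares)

shares = zipf_shares(30_000)        # hypothetical vocabulary size
print(shares[0] / shares[1])        # ratio of rank-1 to rank-2 frequency
print(words_to_reach(shares, 0.5))  # how many top words cover half of all occurrences
```

Under this idealized law the most common word is exactly twice as frequent as the second, and for this hypothetical vocabulary roughly 130 words reach the halfway mark. Real corpora deviate from a pure 1/rank curve, so treat the numbers as illustrative.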
The graph you see is called a power law or Zipfian distribution (for math geeks, it follows the shape of 1/x). This distribution describes much of the online world. Examples include iTunes music sales, Facebook apps, Netflix DVD rentals, Amazon book sales, Alexa traffic rankings, and blog rankings. In each case, there are a few very popular items and many unpopular or niche items.
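One quick way to spot the 1/x shape is to plot rank against count on logarithmic axes: a power law count = C * rank^(-a) turns into a straight line with slope -a, so a 1/x curve has slope -1. The sketch below fits that slope with ordinary least squares; the sales figures are hypothetical and chosen to follow 1/x exactly, and the function name loglog_slope is an assumption for illustration.

```python
import math

def loglog_slope(ranks: list[int], counts: list[float]) -> float:
    """Least-squares slope of log(count) versus log(rank).
    A power law count = C * rank**(-a) shows up as a straight line with slope -a."""
    xs = [math.log(r) for r in ranks]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

ranks = list(range(1, 1001))
counts = [50_000 / r for r in ranks]  # hypothetical sales that follow 1/x exactly
print(round(loglog_slope(ranks, counts), 6))  # -1.0
```

A straight-line fit on a log-log plot is a handy eyeball check rather than a rigorous test; careful estimates of a power-law exponent on real, noisy data usually rely on maximum-likelihood methods instead.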
Why the sudden buzz around this seemingly old concept? At traditional brick-and-mortar merchants, the full power-law curve never shows up. Barnes & Noble, for example, cannot stock thousands of unpopular books because shelf space is limited, so it carries mostly the popular titles at the top of the curve. An online retailer such as Amazon can carry as many titles as it wants. This unlimited inventory, or more importantly, unlimited information, marks a significant development in the online universe: because information is abundant, more and more online activity is beginning to exhibit power laws.