Tuesday, March 13, 2007

Alexa Toolbar and the Problem of Experiment Design

"... next time you see a statistic on web usage (or any statistic) that the results are only as good as the selection process that brings in the data. "

All of us are interested in tools that provide insight into user activity on the Internet, and we each use different tools to gather and evaluate the information we are seeking. It is important to remember that each of these tools provides a particular, and biased, view of the data. When using the information we collect from these tools, we need to keep in mind the bias of the particular tool we are using.

This article provides some insight into the bias inherent in the data offered by Alexa, one of the best sources of information about the importance of different web sites and the activity they attract.

The author points out that data collected by Alexa tends to reflect the activity of a rather select group of web users, one that is not completely representative of the Internet user community as a whole. This shows up as distinctly biased reporting of activity on the few sites he examined in his admittedly unscientific analysis.

One question that his report raises in my mind is, "How would this bias affect the differences between sites that I might be attempting to understand?" His analysis focused on sites that seem particularly vulnerable to the specific bias built into Alexa data. Would that bias be as significant for the sites that I might want to compare? I don't have an answer, but the point is to raise the question and think about it as I consider my findings.
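To make the question above concrete, here is a minimal sketch in Python. Every number in it is invented for illustration and none of it is Alexa data; the group names, site names, and the functions `true_reach` and `measured_reach` are hypothetical. It assumes a small group of toolbar users whose browsing habits differ from everyone else's, and compares what a toolbar-only panel would report with what the whole population actually does.

```python
# Hypothetical numbers, invented purely for illustration; they are not Alexa data.
# Two groups of web users: those who install a toolbar and those who do not.
# "share" is the group's fraction of all users; "visit_rate" is the fraction
# of that group visiting each site.
population = {
    "toolbar_users": {"share": 0.10, "visit_rate": {"site_a": 0.60, "site_b": 0.20}},
    "other_users": {"share": 0.90, "visit_rate": {"site_a": 0.10, "site_b": 0.30}},
}


def true_reach(site):
    """Fraction of the whole population that visits the site."""
    return sum(g["share"] * g["visit_rate"][site] for g in population.values())


def measured_reach(site):
    """Fraction reported by a panel made up only of toolbar users."""
    return population["toolbar_users"]["visit_rate"][site]


for site in ("site_a", "site_b"):
    print(f"{site}: true {true_reach(site):.2f}, measured {measured_reach(site):.2f}")
```

With these made-up figures the toolbar panel ranks site_a well above site_b, while the population as a whole visits site_b more often. If the two sites appealed to the toolbar group at similar rates, the distortion would largely cancel out in a comparison, which is exactly why the question is worth asking for each pair of sites.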

As part of his analysis, he illustrates Alexa's capacity to compare sites, which I had not explored to this degree and was happy to learn about. He graphed the activity on the sites he was comparing, making it easy to see how their activity varied over time.

The caution in this article reiterates something we have emphasized in our classes: use the data available, but be sure you understand what it means and what it does not mean when applying its lessons to your site management.
