Widespread e-commerce activity on the Internet has led to new opportunities
to collect vast amounts of micro-level market and nonmarket data. In this paper
we share our experiences in collecting, validating, storing and analyzing large
Internet-based data sets in the areas of online auctions, music file sharing and
online retailer pricing. We demonstrate how such data can advance knowledge by
facilitating sharper and more extensive tests of existing theories and by
offering observational underpinnings for the development of new theories. Just
as experimental economics pushed the frontiers of economic thought by enabling
the testing of numerous theories of economic behavior in the environment of a
controlled laboratory, we believe that observing, often over extended periods
of time, real-world agents participating in market and nonmarket activity on
the Internet can lead us to develop and test a variety of new theories.
Internet data gathering is not controlled experimentation. We cannot randomly
assign participants to treatments or determine event orderings. Internet data
gathering does, however, offer potentially large data sets with repeated
observation of individual choices and actions. In addition, automated data
collection holds
promise for greatly reduced cost per observation. Our methods rely on
technological advances in automated data collection agents. Significant
challenges remain in developing appropriate sampling techniques, integrating
data from heterogeneous sources in a variety of formats, constructing
generalizable processes and understanding legal constraints. Despite these
challenges, the early evidence from those who have harvested and analyzed large
amounts of e-commerce data points toward a significant leap in our ability to
understand the functioning of electronic commerce.

Comment: Published at http://dx.doi.org/10.1214/088342306000000231 in the
Statistical Science (http://www.imstat.org/sts/) by the Institute of
Mathematical Statistics (http://www.imstat.org)