Over the past few years, we have built a system that has exposed large
volumes of Deep-Web content to Google.com users. The content that our system
exposes contributes to more than 1000 search queries per-second and spans over
50 languages and hundreds of domains. The Deep Web has long been acknowledged
to be a major source of structured data on the web, and hence accessing
Deep-Web content has long been a problem of interest in the data management
community. In this paper, we report on where we believe the Deep Web provides
value and where it does not. We contrast two very different approaches to
exposing Deep-Web content -- the surfacing approach that we used, and the
virtual integration approach that has often been pursued in the data management
literature. We emphasize where the values of each of the two approaches lie and
caution against potential pitfalls. We outline important areas of future
research and, in particular, emphasize the value that can be derived from
analyzing large collections of potentially disparate structured data on the
web.Comment: CIDR 200