TR-CIS-2005-03 (02/26/2005)
Alexander Markowetz, Yen-Yu Chen, Torsten Suel, Xiaohui Long, Bernhard Seeger
Abstract
In this paper, we describe the design and initial implementation of a geographic
search engine prototype for Germany, based on a large crawl of the .de domain.
Geographic search engines provide a flexible interface to the Web
that allows users to focus a query on a particular geographic region,
and thus to constrain and order search results in an intuitive manner. Geographic search
technology has recently received significant commercial interest, but
there has been only a limited amount of academic work in this direction
so far. Our prototype performs large-scale extraction of geographic features
from the crawled data; these features are then mapped to coordinates and aggregated
across the link and site structure. This allows us to assign to each web page a set
of relevant locations, called the geographic footprint of the page. The
resulting footprint data is then integrated into a high-performance query processor
on a cluster-based architecture. We discuss the various techniques, both new
and existing, that are used for recognizing, matching, mapping, and aggregating
geographic features, and describe how to integrate geographic
query processing into a standard search engine architecture and search interface.
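The pipeline summarized above turns extracted locations into a per-page geographic footprint, spreads footprints along links, and then uses them to constrain and order query results. What follows is a minimal Python sketch of that idea. It rests on assumptions that are not taken from the prototype. Footprints are modeled as weights on a hypothetical grid of 0.1-degree cells. Aggregation is a single round of link propagation with an illustrative damping factor, and site-level aggregation is left out. The final score simply adds a weighted geographic score to a text score. All names (`to_cell`, `page_footprint`, `propagate`, `geo_score`, `rank`) and parameters (`CELL_DEG`, `damping`, `alpha`) are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Tuple

Cell = Tuple[int, int]                    # (row, col) index of a grid cell
Footprint = Dict[Cell, float]             # grid cell -> relevance weight
Box = Tuple[float, float, float, float]   # (lat_min, lat_max, lon_min, lon_max)

CELL_DEG = 0.1  # hypothetical grid resolution in degrees (assumption)


def to_cell(lat: float, lon: float) -> Cell:
    """Map a (lat, lon) coordinate to the grid cell that contains it."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def page_footprint(locations: Iterable[Tuple[float, float, float]]) -> Footprint:
    """Build a page footprint from (lat, lon, confidence) triples, i.e.
    geographic features already recognized and mapped to coordinates."""
    fp: Footprint = {}
    for lat, lon, conf in locations:
        cell = to_cell(lat, lon)
        fp[cell] = fp.get(cell, 0.0) + conf
    return fp


def propagate(footprints: Dict[str, Footprint],
              links: List[Tuple[str, str]],
              damping: float = 0.5) -> Dict[str, Footprint]:
    """One round of link-based aggregation (hypothetical scheme): for each
    link src -> dst, add damping * footprint(src) to dst's footprint.
    Reads only the input footprints, so the result is order-independent."""
    result = {page: dict(fp) for page, fp in footprints.items()}
    for src, dst in links:
        target = result.setdefault(dst, {})
        for cell, w in footprints.get(src, {}).items():
            target[cell] = target.get(cell, 0.0) + damping * w
    return result


def geo_score(fp: Footprint, box: Box) -> float:
    """Total footprint weight whose cells fall inside the query bounding box."""
    lat_min, lat_max, lon_min, lon_max = box
    r0, c0 = to_cell(lat_min, lon_min)
    r1, c1 = to_cell(lat_max, lon_max)
    return sum(w for (r, c), w in fp.items() if r0 <= r <= r1 and c0 <= c <= c1)


def rank(text_scores: Dict[str, float],
         footprints: Dict[str, Footprint],
         box: Box,
         alpha: float = 1.0) -> List[Tuple[str, float]]:
    """Constrain results to pages with footprint weight in the region, then
    order them by text score plus alpha times the geographic score."""
    scored = []
    for page, ts in text_scores.items():
        g = geo_score(footprints.get(page, {}), box)
        if g > 0:  # drop pages with no footprint inside the query region
            scored.append((page, ts + alpha * g))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    pages = {
        "a": page_footprint([(52.52, 13.405, 1.0)]),   # mentions Berlin
        "b": page_footprint([(48.137, 11.575, 1.0)]),  # mentions Munich
        "c": page_footprint([]),                       # no place names found
    }
    footprints = propagate(pages, [("a", "c")])
    berlin_box = (52.3, 52.7, 13.0, 13.8)
    print(rank({"a": 0.2, "b": 0.9, "c": 0.4}, footprints, berlin_box))
```

In the example run, page c mentions no place itself but inherits half of page a's Berlin weight through the link a -> c. It therefore passes the Berlin constraint and is ranked after a. Page b, whose footprint lies in Munich, is filtered out despite its higher text score. Adding the two scores is only one possible combination; a multiplicative or threshold-based scheme would fit the same structure.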