    Computing Geographical Scopes of Web Resources

    Many information resources on the web are relevant primarily to limited geographical communities. For instance, web sites containing information on restaurants, theaters, and apartment rentals are relevant primarily to web users in geographical proximity to these locations. In contrast, other information resources are relevant to a broader geographical community. For instance, an on-line newspaper may be relevant to users across the United States. Unfortunately, most current web search engines largely ignore the geographical scope of web resources. In this paper, we introduce techniques for automatically computing the geographical scope of web resources, based on the textual content of the resources, as well as on the geographical distribution of hyperlinks to them. We report an extensive experimental evaluation of our strategies using real web data. Finally, we describe a geographically aware search engine that we have built using our techniques for determining the geographical scope of web resources.
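
    The hyperlink-based idea can be made concrete by counting a resource's incoming links by the region (e.g. US state) of each linking page, then checking how evenly those links are spread. The sketch below is a minimal illustration built on assumptions of our own. The per-region link counts are taken as given. The function names are hypothetical. "Spread" is measured as normalized Shannon entropy with an arbitrary 0.8 threshold. None of this is the paper's exact metric.

```python
import math
from typing import Dict, List


def link_power(link_counts: Dict[str, int], region: str) -> float:
    """Fraction of all incoming links that come from pages located in `region`."""
    total = sum(link_counts.values())
    return link_counts.get(region, 0) / total if total else 0.0


def link_spread(link_counts: Dict[str, int], regions: List[str]) -> float:
    """Normalized Shannon entropy of link sources over `regions`.

    1.0 means links come evenly from every region (broad scope);
    values near 0.0 mean links are concentrated in few regions (local scope).
    """
    total = sum(link_counts.get(r, 0) for r in regions)
    if total == 0 or len(regions) < 2:
        return 0.0
    entropy = 0.0
    for r in regions:
        p = link_counts.get(r, 0) / total
        if p > 0:
            entropy -= p * math.log(p)
    return entropy / math.log(len(regions))


def has_broad_scope(link_counts: Dict[str, int], regions: List[str],
                    threshold: float = 0.8) -> bool:
    # Hypothetical threshold: broad scope if links are spread evenly enough.
    return link_spread(link_counts, regions) >= threshold


regions = ["CA", "NY", "TX", "WA"]
local_site = {"CA": 95, "NY": 2, "TX": 2, "WA": 1}   # made-up counts
newspaper = {"CA": 26, "NY": 24, "TX": 25, "WA": 25}  # made-up counts
print(link_power(local_site, "CA"))           # 0.95
print(has_broad_scope(local_site, regions))   # False
print(has_broad_scope(newspaper, regions))    # True
```

    A concentration measure (power) and an evenness measure (spread) are kept separate. A resource can then be assigned a local scope where its links are concentrated, or a national scope when they are spread evenly.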

    Spatio-textual indexing for geographical search on the web

    Many web documents refer to specific geographic localities, and many people include geographic context in queries to web search engines. Standard web search engines treat geographical terms in the same way as other terms. This can result in failure to find relevant documents that refer to the place of interest using alternative related names, such as those of included or nearby places. This can be overcome by associating text indexing with spatial indexing methods that exploit geo-tagging procedures to categorise documents with respect to geographic space. We describe three methods for spatio-textual indexing: using multiple spatially indexed text indexes; attaching spatial indexes to the document occurrences of a text index; and merging the results of text-index access with the results of accessing a spatial index of documents. These schemes are compared experimentally with a conventional text-index search engine, using a collection of geo-tagged web documents, and are shown to be able to compete in speed and storage performance with pure text indexing.
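
    The third scheme keeps the text index and a spatial index of documents separate and merges their results at query time. It can be sketched with an inverted index plus a uniform grid over each document's geo-tagged points. The grid (cell size in degrees), the point-footprint model, the class name, and the toy documents are all illustrative assumptions, not the paper's data structures.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


class SpatioTextualIndex:
    """Hypothetical index: text and spatial indexes kept separately, results merged."""

    def __init__(self, cell_size: float = 1.0):
        self.cell_size = cell_size
        self.inverted: Dict[str, Set[int]] = defaultdict(set)
        self.grid: Dict[Tuple[int, int], Set[int]] = defaultdict(set)
        self.locations: Dict[int, List[Point]] = {}

    def _cell(self, p: Point) -> Tuple[int, int]:
        return (int(p[0] // self.cell_size), int(p[1] // self.cell_size))

    def add(self, doc_id: int, text: str, points: List[Point]) -> None:
        for term in text.lower().split():
            self.inverted[term].add(doc_id)
        # Geo-tagged footprint: one point per place the document refers to.
        self.locations[doc_id] = points
        for p in points:
            self.grid[self._cell(p)].add(doc_id)

    def query(self, terms: List[str], box: Tuple[float, float, float, float]) -> Set[int]:
        """Documents containing all terms and with a footprint point inside box."""
        min_x, min_y, max_x, max_y = box
        # 1. Text index access: intersect the posting lists.
        postings = [self.inverted.get(t.lower(), set()) for t in terms]
        text_hits = set.intersection(*postings) if postings else set()
        # 2. Spatial index access: candidates from every grid cell the box overlaps.
        cx0, cy0 = self._cell((min_x, min_y))
        cx1, cy1 = self._cell((max_x, max_y))
        spatial_hits: Set[int] = set()
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                spatial_hits |= self.grid.get((cx, cy), set())
        # 3. Merge both result sets, then filter exactly against the box.
        return {
            d for d in text_hits & spatial_hits
            if any(min_x <= x <= max_x and min_y <= y <= max_y
                   for x, y in self.locations[d])
        }


idx = SpatioTextualIndex(cell_size=1.0)
idx.add(1, "cheap hotel near harbour", [(-3.19, 55.95)])
idx.add(2, "cheap hotel in the city centre", [(-0.12, 51.50)])
idx.add(3, "harbour history museum", [(-3.20, 55.96)])
print(idx.query(["cheap", "hotel"], (-4.0, 55.0, -2.0, 57.0)))  # {1}
```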

    The use of interactive graphical maps for browsing medical/health Internet information resources

    As online information portals accumulate metadata descriptions of Web resources, it becomes necessary to develop effective ways of visualising and navigating the resulting huge metadata repositories, as well as the different semantic relationships and attributes of the described Web resources. Graphical maps provide a good method to visualise, understand and navigate a world, like the Web, that is too large and complex to be seen directly. Several examples of maps designed as navigational aids for Web resources are presented in this review, with an emphasis on maps of medical and health-related resources. The latter include HealthCyberMap maps, which can be classified as conceptual information space maps, and the very abstract and geometric Visual Net maps of PubMed. Information resources can also be organised and navigated based on their geographic attributes. Some of the maps presented in this review use a Kohonen Self-Organising Map algorithm, and only HealthCyberMap uses a Geographic Information System to classify Web resource data and render the maps. Maps based on familiar metaphors taken from users' everyday lives are much easier to understand. Associative and pictorial map icons that enable instant recognition and comprehension are preferred to geometric ones and are key to successful maps for browsing medical/health Internet information resources.
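
    Several of the reviewed maps rely on a Kohonen Self-Organising Map. The minimal 1-D SOM below, in plain Python, shows the core update rule: find the best-matching unit for an input, then pull that unit and its grid neighbours toward the input. Several choices here are illustrative assumptions, since the review does not specify the configurations behind those maps. These include the 1-D grid, the linearly shrinking learning rate and Gaussian neighbourhood, and the toy document vectors.

```python
import math
import random
from typing import List

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights: List[Vector], x: Vector) -> int:
    """Index of the map unit whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i], x))


def train_som(data: List[Vector], n_units: int, epochs: int = 100,
              lr0: float = 0.5, radius0: float = 2.0, seed: int = 0) -> List[Vector]:
    """Train a 1-D Kohonen map: units lie on a line; neighbours have close indices."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        # Learning rate and neighbourhood radius shrink linearly over training.
        frac = epoch / epochs
        lr = lr0 * (1 - frac)
        radius = max(radius0 * (1 - frac), 0.5)
        for x in rng.sample(data, len(data)):  # shuffled copy of the data
            bmu = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                # Gaussian neighbourhood on the 1-D grid, centred on the BMU.
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return weights


# Toy 2-term document vectors (hypothetical); each is placed on its BMU's cell.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
w = train_som(docs, n_units=4)
print([best_matching_unit(w, d) for d in docs])
```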

    A Neural Model for User Geolocation and Lexical Dialectology

    We propose a simple yet effective text-based user geolocation model based on a neural network with one hidden layer, which achieves state-of-the-art performance over three Twitter benchmark geolocation datasets, in addition to producing word and phrase embeddings in the hidden layer that we show to be useful for detecting dialectal terms. As part of our analysis of dialectal terms, we release DAREDS, a dataset for evaluating dialect term detection methods.
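
    A network of this shape maps a bag-of-words input through one hidden layer to a softmax over candidate regions. Its forward pass can be sketched in plain Python. Many details are hypothetical placeholders, and training and the paper's actual features and hyperparameters are not shown. The placeholders include the vocabulary, the region labels, the layer sizes, the tanh activation, and the random (untrained) weights. Under this setup, each word's column of input-to-hidden weights can be read as that word's embedding.

```python
import math
import random
from typing import List

VOCAB = ["y'all", "howdy", "wicked", "cuppa", "mate"]  # hypothetical vocabulary
REGIONS = ["US-South", "US-Northeast", "UK"]           # hypothetical region labels


def init_weights(n_in: int, n_out: int, rng: random.Random) -> List[List[float]]:
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def bag_of_words(text: str) -> List[float]:
    tokens = text.lower().split()
    return [float(tokens.count(w)) for w in VOCAB]


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def forward(x, W1, b1, W2, b2) -> List[float]:
    # Hidden layer: h = tanh(W1 x + b1)
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    # Output layer: p(region | text) = softmax(W2 h + b2)
    logits = [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(W2, b2)]
    return softmax(logits)


def word_embedding(word: str, W1) -> List[float]:
    # Column of W1 for this word: its contribution to each hidden unit.
    j = VOCAB.index(word)
    return [row[j] for row in W1]


rng = random.Random(42)
HIDDEN = 4
W1, b1 = init_weights(len(VOCAB), HIDDEN, rng), [0.0] * HIDDEN
W2, b2 = init_weights(HIDDEN, len(REGIONS), rng), [0.0] * len(REGIONS)

probs = forward(bag_of_words("howdy y'all"), W1, b1, W2, b2)
print(dict(zip(REGIONS, probs)))
```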

    Location-based health information services: a new paradigm in personalised information delivery

    Brute health information delivery to various devices can easily be achieved today, making health information instantly available whenever it is needed and nearly anywhere. However, brute health information delivery risks overloading users with unnecessary information that does not answer their actual needs, and might even act as noise, masking other useful and relevant information delivered with it. Users' profiles and needs are affected by where they are, and this should be taken into consideration when personalising and delivering information to users in different locations. The main goal of location-based health information services is to better present the distribution, across a geographical area, of health and healthcare needs and of the Internet resources that answer them, with the aim of providing users with better support for informed decision-making. Personalised information delivery requires the acquisition of high-quality metadata not only about information resources, but also about information service users, their geographical location and their devices. Throughout this review, experience from a related online health information service, HealthCyberMap, is referred to as a model that can be easily adapted to other similar services. HealthCyberMap is a Web-based directory service of medical/health Internet resources that explores new means of organising and presenting these resources based on consumer and provider locations, as well as on the geographical coverage or scope of indexed resources. The paper also provides a concise review of location-based services, technologies for detecting user location (including IP geolocation), and their potential applications in health and healthcare.
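
    One way to deliver resources according to both the user's location and each resource's geographical coverage is sketched below. It keeps only the resources whose coverage area contains the user's position, and orders them by great-circle distance. Coverage is modelled, as an assumption, as a centre point plus a radius. The user position is assumed to be already known, e.g. from GPS or IP geolocation. The resource records are invented placeholders, and this is not HealthCyberMap's actual implementation.

```python
import math
from dataclasses import dataclass
from typing import List

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


@dataclass
class Resource:
    title: str
    lat: float          # centre of the resource's geographical coverage
    lon: float
    coverage_km: float  # hypothetical radius within which the resource is relevant


def resources_for_user(resources: List[Resource], user_lat: float,
                       user_lon: float) -> List[Resource]:
    """Resources whose coverage contains the user, nearest coverage centre first."""
    hits = [(haversine_km(r.lat, r.lon, user_lat, user_lon), r) for r in resources]
    return [r for d, r in sorted(hits, key=lambda t: t[0]) if d <= r.coverage_km]


catalogue = [
    Resource("National health portal", 54.0, -2.0, 1000.0),
    Resource("Local clinic directory", 51.5, -0.12, 30.0),
    Resource("Regional service elsewhere", 55.95, -3.19, 50.0),
]
titles = [r.title for r in resources_for_user(catalogue, 51.51, -0.13)]
print(titles)  # ['Local clinic directory', 'National health portal']
```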

    Web Clustering for the Extraction of Homogeneous Corpora (Clusterisation du Web en vue d'extraction de corpus homogènes)

    ISBN 2-906855-18-9
    Web resources are increasingly diverse, not only in thematic content but also in document type, geographic origin, level, language, etc. However, web search engines do not take this heterogeneity into account and offer only thematic, keyword-based access to documents. This paper presents a hyperlink-based method for extracting sub-corpora of homogeneous documents from the web graph. The method relies on co-citation and focuses especially on the notion of web document genre (type).
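
    Co-citation, the measure this method builds on, treats two pages as related when other pages link to both of them; the more common citing pages, the stronger the relation. The minimal sketch below computes co-citation counts from an out-link adjacency list. The toy link graph and function names are made up. The paper's further steps, such as grouping pages into corpora and its focus on document genre, are not reproduced.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def cocitation_counts(out_links: Dict[str, List[str]]) -> Counter:
    """Count, for each unordered pair of pages, how many pages link to both."""
    counts: Counter = Counter()
    for citing_page, targets in out_links.items():
        # Each citing page co-cites every pair of distinct pages it links to.
        for a, b in combinations(sorted(set(targets)), 2):
            counts[(a, b)] += 1
    return counts


def most_cocited_with(page: str, counts: Counter, k: int = 3) -> List[Tuple[str, int]]:
    """Pages most often co-cited with `page`, strongest first."""
    related: Counter = Counter()
    for (a, b), n in counts.items():
        if a == page:
            related[b] += n
        elif b == page:
            related[a] += n
    return related.most_common(k)


links = {  # hypothetical link graph: citing page -> pages it links to
    "p1": ["news1", "news2", "blog1"],
    "p2": ["news1", "news2"],
    "p3": ["blog1", "blog2"],
    "p4": ["blog1", "blog2", "news1"],
}
c = cocitation_counts(links)
print(most_cocited_with("news2", c))  # [('news1', 2), ('blog1', 1)]
```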

    One model to rule them all: unified classification model for geotagging websites

    The paper presents a novel approach to finding the regional scopes (geotagging) of websites. It relies on a single binary classification model per region type to perform multi-label classification, and uses a variety of features that have not yet been used together for machine-learning-based regional classification of websites. The evaluation demonstrates the advantage of our one-model-per-region-type method over the traditional one-model-per-region approach.
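
    The contrast between one model per region and one model per region type can be illustrated as follows. Instead of a separate classifier for every city or country, a single binary classifier per region type scores (website, candidate region) pairs. It does this from features describing how the site relates to that region. The multi-label output is every region whose pair is scored positive. The feature names, weights, logistic form and threshold below are hypothetical; the paper's actual features and learner are not reproduced.

```python
import math
from typing import Dict, List

# Hypothetical pair features: how strongly a website relates to one candidate region.
FEATURES = ["name_mentions", "in_postal_addresses", "phone_area_code_match", "inlink_share"]

# One weight vector per region TYPE, shared by every region of that type,
# instead of one model per individual region. Values are made up.
MODELS: Dict[str, Dict[str, float]] = {
    "city": {"bias": -3.0, "name_mentions": 0.4, "in_postal_addresses": 2.0,
             "phone_area_code_match": 1.5, "inlink_share": 3.0},
    "country": {"bias": -2.0, "name_mentions": 0.2, "in_postal_addresses": 1.0,
                "phone_area_code_match": 0.5, "inlink_share": 4.0},
}


def score(region_type: str, pair_features: Dict[str, float]) -> float:
    """Probability (logistic model) that the region lies in the website's scope."""
    w = MODELS[region_type]
    z = w["bias"] + sum(w[f] * pair_features.get(f, 0.0) for f in FEATURES)
    return 1.0 / (1.0 + math.exp(-z))


def geotag(candidates: List[dict], threshold: float = 0.5) -> List[str]:
    """Multi-label output: every candidate region with a positive binary decision."""
    return [c["region"] for c in candidates
            if score(c["type"], c["features"]) >= threshold]


candidates = [
    {"region": "Prague", "type": "city",
     "features": {"name_mentions": 5, "in_postal_addresses": 1, "inlink_share": 0.6}},
    {"region": "Brno", "type": "city", "features": {"name_mentions": 1}},
    {"region": "Czechia", "type": "country",
     "features": {"name_mentions": 3, "inlink_share": 0.9}},
]
print(geotag(candidates))  # ['Prague', 'Czechia']
```

    Because the model is shared across all regions of a type, a region with few training examples still benefits from what the classifier learned on other regions of the same type.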

    Geospatial route extraction from texts

    Location-based search engines tasks and capabilities: A comparative study

    Location-based web search is one of the popular tasks expected of search engines. A location-based query consists of a topic and a reference location. Unlike general web search, location-based search is expected to find and rank documents that are related not only to the query topic but also geographically to the location associated with the query. There are several issues in developing effective geographic search engines, and so far no global location-based search engine has been reported. Location ambiguity, lack of geographic information on web pages, language-based and country-dependent addressing styles, and multiple locations related to a single web resource are notable difficulties. Search engine companies have started to develop and offer location-based services. However, these are still geographically limited and have not become as successful and popular as general search engines. This paper reviews the architecture and tasks of location-based search engines and compares the capabilities, functionalities and coverage of current geographic search engines from a user-oriented perspective.
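
    A location-based query pairs a topic with a reference location, so ranking has to combine textual relevance with geographic relevance. One simple way to do this is a linear blend of a term-overlap score and a distance-decay score. That choice is assumed here, not taken from the engines compared in the paper. The weight `alpha`, the 5 km decay scale, the planar kilometre coordinates and the toy documents are hypothetical.

```python
import math
from typing import List, NamedTuple, Tuple


class Doc(NamedTuple):
    doc_id: str
    text: str
    location: Tuple[float, float]  # (x, y) in km on a local planar grid


def text_score(topic: str, text: str) -> float:
    """Fraction of topic terms that occur in the document."""
    terms = set(topic.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def geo_score(ref: Tuple[float, float], loc: Tuple[float, float],
              scale_km: float = 5.0) -> float:
    """Exponential distance decay: 1.0 at the reference location, toward 0 far away."""
    return math.exp(-math.dist(ref, loc) / scale_km)


def rank(docs: List[Doc], topic: str, ref: Tuple[float, float],
         alpha: float = 0.5) -> List[str]:
    """Order documents by alpha * textual + (1 - alpha) * geographic relevance."""
    scored = [(alpha * text_score(topic, d.text)
               + (1 - alpha) * geo_score(ref, d.location), d.doc_id)
              for d in docs]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


docs = [
    Doc("far_pizza", "best pizza restaurant", (50.0, 0.0)),
    Doc("near_pizza", "pizza restaurant downtown", (1.0, 0.0)),
    Doc("near_bank", "bank branch downtown", (0.5, 0.0)),
]
print(rank(docs, "pizza restaurant", (0.0, 0.0)))
# ['near_pizza', 'far_pizza', 'near_bank']
```

    Raising `alpha` favours topical matches anywhere; lowering it favours anything nearby. The example shows both failure modes that a blend is meant to avoid: a perfect topical match 50 km away, and a nearby but off-topic document.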