13 research outputs found

    GeoCLEF 2006: the CLEF 2006 cross-language geographic information retrieval track overview

    After being a pilot track in 2005, GeoCLEF advanced to a regular track within CLEF 2006. The purpose of GeoCLEF is to test and evaluate cross-language geographic information retrieval (GIR): retrieval for topics with a geographic specification. For GeoCLEF 2006, twenty-five search topics were defined by the organizing groups for searching English, German, Portuguese and Spanish document collections. Topics were translated into English, German, Portuguese, Spanish and Japanese. Several topics in 2006 were significantly more geographically challenging than in 2005. Seventeen groups submitted 149 runs (up from eleven groups and 117 runs in GeoCLEF 2005). The groups used a variety of approaches, including geographic bounding boxes, named entity extraction and external knowledge bases (geographic thesauri, ontologies and gazetteers).
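    Several of the approaches named above, such as geographic bounding boxes combined with named entity extraction, reduce at their simplest to a filter-and-rerank step over geotagged documents. The following minimal Python sketch illustrates that idea only; the document layout, weights and coordinates are invented and not taken from any GeoCLEF participant system.

        # Minimal GIR rerank sketch: blend a text-retrieval score with a
        # bounding-box containment bonus. All names and weights are illustrative.

        def in_bbox(lat, lon, bbox):
            """True if (lat, lon) lies inside bbox = (south, west, north, east)."""
            south, west, north, east = bbox
            return south <= lat <= north and west <= lon <= east

        def rerank(docs, query_bbox, alpha=0.8):
            """Blend text score with a geographic bonus for in-scope documents."""
            scored = []
            for doc in docs:  # each doc: {"id", "text_score", "lat", "lon"}
                geo = 1.0 if in_bbox(doc["lat"], doc["lon"], query_bbox) else 0.0
                scored.append((alpha * doc["text_score"] + (1 - alpha) * geo, doc["id"]))
            return sorted(scored, reverse=True)

        # Query scoped to a rough bounding box around Germany.
        docs = [
            {"id": "d1", "text_score": 0.90, "lat": 48.1, "lon": 11.6},  # Munich
            {"id": "d2", "text_score": 0.95, "lat": 40.4, "lon": -3.7},  # Madrid
        ]
        print(rerank(docs, query_bbox=(47.3, 5.9, 55.1, 15.0)))  # d1 outranks d2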

    GEIR: a Full-Fledged Geographically Enhanced Information Retrieval Solution

    With the development of search engines (e.g. Google, Bing, Yahoo), people ambitiously expect higher quality and continued improvement of these technologies. Bringing human intelligence features to these tools, like the ability to find implicit information through semantics, is one of the most prominent research lines in Computer Science. Information semantics is a very wide concept, as wide as the human capability to interpret; in particular, the analysis of geographic semantics makes it possible to associate information with a place. It is estimated that more than 70% of all information in the world has some kind of geographic feature [Jones04]. In 2012, Ed Parsons, a Geospatial Technologist at Google, reported that between 30% and 40% of the queries issued to the Google search engine contain geographic references [Parsons12]. This thesis addresses the field of geographic information extraction and retrieval in unstructured texts. This process includes the identification of spatial features in textual documents, their indexing, the weighting of the relevance of the identified geographic entities, and multi-criteria retrieval that combines thematic and geographic information. The main contributions of this work include a custom geographic knowledge base, built from the combination of GeoNames and WordNet; Natural Language Processing and knowledge-based heuristics for Toponym Recognition and Toponym Disambiguation; and a geographic relevance weighting model that supports non-spatial indexing and simple ranking combination approaches. The validity of each of these components is supported by practical experiments that show their effectiveness in different scenarios and their alignment with state-of-the-art solutions. A further main contribution of this work is GEIR, a general-purpose GIR framework that includes implementations of the components described above and makes it possible to implement new ones and test their performance within an end-to-end GIR system.
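    As a concrete illustration of the kind of knowledge-based toponym disambiguation heuristics described above, here is a minimal Python sketch. The inline gazetteer and the context-then-population rule are invented stand-ins; GEIR itself queries a knowledge base built from GeoNames and WordNet.

        # Gazetteer-based toponym disambiguation sketch. The tiny inline
        # gazetteer is hypothetical; a real system would query GeoNames.
        GAZETTEER = {
            "Paris": [
                {"country": "FR", "population": 2_100_000, "lat": 48.86, "lon": 2.35},
                {"country": "US", "population": 25_000, "lat": 33.66, "lon": -95.56},
            ],
        }

        def disambiguate(toponym, context_countries):
            """Prefer the sense matching the document's context countries,
            falling back to the most populous candidate (a common default)."""
            candidates = GAZETTEER.get(toponym, [])
            for cand in candidates:
                if cand["country"] in context_countries:
                    return cand
            return max(candidates, key=lambda c: c["population"], default=None)

        # A document that also mentions US places pulls "Paris" towards Texas.
        print(disambiguate("Paris", context_countries={"US"}))
        print(disambiguate("Paris", context_countries=set()))  # population fallback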

    Lokale Grammatiken zur Beschreibung von lokativen Sätzen und ihre Anwendung im Information Retrieval [Local grammars for describing locative sentences and their application in information retrieval]


    Web-based discovery and dissemination of multidimensional geographic information

    A spatial data clearinghouse is an electronic facility for searching, viewing, transferring, ordering, advertising, and disseminating spatial data from numerous sources via the Internet. Governments and other institutions have been implementing spatial data clearinghouses to minimise data duplication and thus reduce the cost of spatial data acquisition. Underlying these clearinghouses are geoportals and databases of geospatial metadata. A geoportal is an access point of a spatial data clearinghouse, and metadata is data that describes data. The success of a clearinghouse's spatial data discovery system is dependent on its ability to communicate the contents of geospatial metadata by providing both visual and analytical assistance to a user. The model currently adopted by the geographic information community was inherited from generic information systems and thus to an extent ignores the spatial characteristics of geographic data. Consequently, research in Geographic Information Retrieval (GIR) has focussed on spatial aspects of web-based data discovery and acquisition. This thesis considers how the process of GIR from geoportals can be enhanced through multidimensional visualisation served by web-based geographic data sources. An approach is proposed for the presentation of search results in ontology-assisted GIR. Also proposed is an approach for the visualisation of multidimensional geographic data from web-based data sources. These approaches are implemented in two prototypes, the Geospatial Database Online Visualisation Environment (GeoDOVE) and the Spatio-Temporal Ontological Relevance Model (STORM). A discussion of their design, implementation and evaluation is presented. The results suggest that ontology-assisted visualisation can improve a user's ability to identify the most relevant multidimensional geographic datasets from a set of search results. Additional results suggest that it is possible to offer the proposed visualisation approaches on existing geoportal frameworks. The implication of the results is that multidimensional visualisation should be considered by the wider geographic information community as an alternative to historic approaches for presenting search results on geoportals, such as the textual ranked list and two-dimensional maps.
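    One simple analytical signal a geoportal could add to such a discovery system is the overlap between a dataset's spatial extent and the user's area of interest. The Python sketch below is a hypothetical illustration of that ranking signal, not code from GeoDOVE or STORM; the record layout is invented.

        # Rank metadata records by how much of the query area their extent covers.
        def overlap_ratio(extent, query):
            """Intersection area of two (south, west, north, east) boxes,
            as a fraction of the query box area (0.0 when disjoint)."""
            south = max(extent[0], query[0]); west = max(extent[1], query[1])
            north = min(extent[2], query[2]); east = min(extent[3], query[3])
            if south >= north or west >= east:
                return 0.0
            inter = (north - south) * (east - west)
            return inter / ((query[2] - query[0]) * (query[3] - query[1]))

        records = [
            {"title": "UK river gauges", "extent": (49.9, -8.2, 60.9, 1.8)},
            {"title": "Alps snow cover", "extent": (45.8, 5.9, 47.8, 13.0)},
        ]
        aoi = (50.0, -5.0, 55.0, 0.0)  # user's area of interest
        for r in sorted(records, key=lambda r: overlap_ratio(r["extent"], aoi), reverse=True):
            print(r["title"], round(overlap_ratio(r["extent"], aoi), 2))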

    Leveraging Semantic Annotations for Event-focused Search & Summarization

    Today, in this Big Data era, overwhelming amounts of textual information across different sources, with a high degree of redundancy, have made it hard for a consumer to retrospect on past events. A plausible solution is to link semantically similar information contained across the different sources to enforce a structure, thereby providing multiple access paths to relevant information. Keeping this larger goal in view, this work uses Wikipedia and online news articles as two prominent yet disparate information sources to address the following three problems:
    • We address a linking problem to connect Wikipedia excerpts to news articles by casting it into an IR task. Our novel approach integrates time, geolocations, and entities with text to identify relevant documents that can be linked to a given excerpt.
    • We address an unsupervised extractive multi-document summarization task to generate a fixed-length event digest that facilitates efficient consumption of the information contained within a large set of documents. Our novel approach proposes an ILP for global inference across text, time, geolocations, and entities associated with the event.
    • To estimate the temporal focus of short event descriptions, we present a semi-supervised approach that leverages redundancy within a longitudinal news collection to estimate accurate probabilistic time models.
    Extensive experimental evaluations demonstrate the effectiveness and viability of our proposed approaches towards achieving the larger goal.
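    The core of the ILP idea in the second problem can be made concrete with a deliberately simplified sketch: select sentences to maximize total relevance under a fixed length budget. This is only a stand-in for the thesis's global-inference ILP, which additionally couples text, time, geolocations, and entities; it assumes the PuLP Python package is installed, and the sentences and scores are invented.

        # Budgeted extractive summarization as an ILP (simplified stand-in).
        from pulp import LpProblem, LpMaximize, LpVariable, lpSum, LpBinary, value

        sentences = [
            {"text": "Quake strikes city centre.", "relevance": 0.9, "length": 5},
            {"text": "Rescue teams arrive overnight.", "relevance": 0.7, "length": 4},
            {"text": "Officials comment on weather.", "relevance": 0.2, "length": 4},
        ]
        BUDGET = 9  # digest length budget, in tokens

        prob = LpProblem("event_digest", LpMaximize)
        x = [LpVariable(f"s{i}", cat=LpBinary) for i in range(len(sentences))]
        prob += lpSum(s["relevance"] * x[i] for i, s in enumerate(sentences))  # objective
        prob += lpSum(s["length"] * x[i] for i, s in enumerate(sentences)) <= BUDGET

        prob.solve()
        digest = [s["text"] for i, s in enumerate(sentences) if value(x[i]) == 1]
        print(digest)  # the two high-relevance sentences fit the budget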

    Hybrid geo-spatial query processing on the semantic web

    Semantic Web data sources such as DBpedia are a rich resource of structured representations of knowledge about geographical features and provide potential data for computing the results of Question Answering System queries that require geo-spatial computations. Retrieval from these resources of all content that is relevant to a particular spatial query of, for example, containment, proximity or crossing is not always straightforward, as the geometry is usually confined to point representations and there is considerable inconsistency in the way in which geographical features are referenced to locations. In DBpedia, some geographical feature instances have point coordinates, others have qualitative properties that provide explicit or implicit spatial relationships between named places, and some have neither of these. This thesis demonstrates that structured geo-spatial query, a form of question answering, on DBpedia can be performed with a hybrid query method that exploits quantitative and qualitative spatial properties in combination with a high-quality reference geo-dataset. This combination can support a full range of geo-spatial query operators such as proximity, containment and crossing, as well as vague directional queries such as "Find airports north of London?". A quantitative model based on the spatial directional relations in DBpedia has been used to assist in query processing. Evaluation experiments confirm the benefits of combining qualitative and quantitative methods for containment queries and of employing high-quality spatial data, as opposed to DBpedia points, as reference objects for proximity queries, particularly for linear features. The high-quality geo-data also enabled answering questions impossible to answer with Semantic Web resources alone, such as finding geographic features within some distance from a region boundary. The contributions were validated by a prototype geo-spatial query system that combined qualitative and quantitative processing and included ranking answers for directional queries based on models derived from DBpedia contributed data.
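    The quantitative half of the hybrid idea can be sketched concretely: retrieve point coordinates from DBpedia via SPARQL, then apply a local distance computation. The query, the 100 km threshold, and the crisp "north of" test below are illustrative simplifications (the thesis uses graded directional models and a high-quality reference geo-dataset); the sketch assumes the SPARQLWrapper Python package and network access.

        # Pull airport coordinates from DBpedia, then filter by distance and
        # a crisp "north of London" test (a stand-in for a graded model).
        from math import radians, sin, cos, asin, sqrt
        from SPARQLWrapper import SPARQLWrapper, JSON

        def haversine_km(lat1, lon1, lat2, lon2):
            """Great-circle distance between two WGS84 points, in kilometres."""
            lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
            a = sin((lat2 - lat1) / 2) ** 2 \
                + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
            return 2 * 6371 * asin(sqrt(a))

        sparql = SPARQLWrapper("https://dbpedia.org/sparql")
        sparql.setQuery("""
            PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
            SELECT ?airport ?lat ?long WHERE {
                ?airport a dbo:Airport ; geo:lat ?lat ; geo:long ?long .
            } LIMIT 500
        """)
        sparql.setReturnFormat(JSON)
        results = sparql.query().convert()

        LONDON = (51.5074, -0.1278)
        answers = [
            row["airport"]["value"]
            for row in results["results"]["bindings"]
            if float(row["lat"]["value"]) > LONDON[0]                    # north of
            and haversine_km(float(row["lat"]["value"]),
                             float(row["long"]["value"]), *LONDON) < 100  # proximity
        ]
        print(answers)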

    Unlocking Environmental Narratives

    Understanding the role of humans in environmental change is one of the most pressing challenges of the 21st century. Environmental narratives – written texts with a focus on the environment – offer rich material capturing relationships between people and surroundings. We take advantage of two key opportunities for their computational analysis: massive growth in the availability of digitised contemporary and historical sources, and parallel advances in the computational analysis of natural language. We open by introducing interdisciplinary research questions related to the environment and amenable to analysis through written sources. The reader is then introduced to potential collections of narratives including newspapers, travel diaries, policy documents, scientific proposals and even fiction. We demonstrate the application of a range of approaches to analysing natural language computationally, introducing key ideas through worked examples, and providing access to the sources analysed and accompanying code. The second part of the book is centred around case studies, each applying computational analysis to some aspect of environmental narrative. Themes include the use of language to describe narratives about glaciers, urban gentrification, diversity and writing about nature, and ways in which locations are conceptualised and described in nature writing. We close by reviewing the approaches taken, and presenting an interdisciplinary research agenda for future work. The book is designed to be of interest to newcomers to the field and experienced researchers alike, and is set out so that it can be used as an accompanying text for graduate-level courses in, for example, geography, environmental history or the digital humanities.
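    As a taste of the kind of computational analysis the case studies build on, the following minimal Python sketch extracts place references from a short environmental narrative with an off-the-shelf NER model. It assumes spaCy and its small English model are installed (python -m spacy download en_core_web_sm); the example sentence is invented and not from the book.

        # Extract place mentions from a narrative with spaCy's pretrained NER.
        import spacy

        nlp = spacy.load("en_core_web_sm")
        narrative = ("The glacier above Grindelwald has retreated visibly since "
                     "the 1850s, and walkers from Interlaken now find bare rock.")

        doc = nlp(narrative)
        places = [(ent.text, ent.label_) for ent in doc.ents
                  if ent.label_ in ("GPE", "LOC", "FAC")]
        print(places)  # e.g. [('Grindelwald', 'GPE'), ('Interlaken', 'GPE')]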

    Unlocking environmental narratives: towards understanding human environment interactions through computational text analysis


    Reasoning about fuzzy temporal and spatial information from the Web
