22 research outputs found

    An approach towards a harmonized framework for hydrographic features domain

    Get PDF
    We describe an ontology in the hydrographic domain, hydrOntology, which can be used to overcome the semantic heterogeneity of different databases and other information sources that deal with geographic information. We show that ontologies are richer than other means commonly used to represent geographic information, such as feature catalogues and thesauri, and how ontologies could be used to overcome some of the problems associated with them

    Global Keyword Tracking in Archaeology

    Get PDF
    With the digitization of information, discoveries of events that previously took much human effort can now be found automatically. As example, we investigate several scandals in the art and antiques area that occurred between 1985 and 2005. In these events, the auction house Sotheby's was suspected to accept or even help the trading of smuggled paintings or antiques and the famous Getty Museum was exposed as purchasing antiques linked to treasure hunters. Discovering these secrets required the hard work of journalists, detectives, TV producers, and so on. The investigators were involved in illegal trades and various dangerous situations during their process of investigation. In comparison, today, with the access to digital version of large datasets, we are able to discover similar events using computationally-based techniques without the high risk and the cost of human labour needed before. This thesis introduces our tool for extracting keywords, terms and peoples' names from news articles, books, and marking them on an interactive map. We use the New York Times as the main resource, extract location terms in each news articles using Gazetteer, extract keywords and people's names in each articles and reduce ambiguity using WordNet. Combining them, we are able to form location-keyword-time pairs for each articles, and together they form a database. Then we build an interactive map based on the database. The map is able to show the relationships between location and keywords. The linkages between two or more people or locations is able to show on the map. The demonstration was able to perform similar detection process as those journalists did in the late 90s. The paper also introduces additional findings during the examination of the original datasets. As a news media outlet based in New York, we see evidence that the New York Times turns out to focus much more on New York City and the United States compared with other countries. With the extraction of locations inside the articles, we were able to see the distribution of articles mentioning different countries differs a lot when comparing the different continents. Our visualization also shows how locations names were changed throughout time, and how the terms people use describing a certain object changes

    Toponym Disambiguation in Information Retrieval

    Full text link
    In recent years, geography has acquired a great importance in the context of Information Retrieval (IR) and, in general, of the automated processing of information in text. Mobile devices that are able to surf the web and at the same time inform about their position are now a common reality, together with applications that can exploit this data to provide users with locally customised information, such as directions or advertisements. Therefore, it is important to deal properly with the geographic information that is included in electronic texts. The majority of such kind of information is contained as place names, or toponyms. Toponym ambiguity represents an important issue in Geographical Information Retrieval (GIR), due to the fact that queries are geographically constrained. There has been a struggle to nd speci c geographical IR methods that actually outperform traditional IR techniques. Toponym ambiguity may constitute a relevant factor in the inability of current GIR systems to take advantage from geographical knowledge. Recently, some Ph.D. theses have dealt with Toponym Disambiguation (TD) from di erent perspectives, from the development of resources for the evaluation of Toponym Disambiguation (Leidner (2007)) to the use of TD to improve geographical scope resolution (Andogah (2010)). The Ph.D. thesis presented here introduces a TD method based on WordNet and carries out a detailed study of the relationship of Toponym Disambiguation to some IR applications, such as GIR, Question Answering (QA) and Web retrieval. The work presented in this thesis starts with an introduction to the applications in which TD may result useful, together with an analysis of the ambiguity of toponyms in news collections. It could not be possible to study the ambiguity of toponyms without studying the resources that are used as placename repositories; these resources are the equivalent to language dictionaries, which provide the di erent meanings of a given word.Buscaldi, D. (2010). Toponym Disambiguation in Information Retrieval [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8912Palanci

    Knowledge-based and data-driven approaches for geographical information access

    Get PDF
    Geographical Information Access (GeoIA) can be defined as a way of retrieving information from textual collections that includes the automatic analysis and interpretation of the geographical constraints and terms present in queries and documents. This PhD thesis presents, describes and evaluates several heterogeneous approaches for the following three GeoIA tasks: Geographical Information Retrieval (GIR), Geographical Question Answering (GeoQA), and Textual Georeferencing (TG). The GIR task deals with user queries that search over documents (e.g. ¿vineyards in California?) and the GeoQA task treats questions that retrieve answers (e.g. ¿What is the capital of France?). On the other hand, TG is the task of associate one or more georeferences (such as polygons or coordinates in a geodetic reference system) to electronic documents. Current state-of-the-art AI algorithms are not yet fully understanding the semantic meaning and the geographical constraints and terms present in queries and document collections. This thesis attempts to improve the effectiveness results of GeoIA tasks by: 1) improving the detection, understanding, and use of a part of the geographical and the thematic content of queries and documents with Toponym Recognition, Toponym Disambiguation and Natural Language Processing (NLP) techniques, and 2) combining Geographical Knowledge-Based Heuristics based on common sense with Data-Driven IR algorithms. The main contributions of this thesis to the state-of-the-art of GeoIA tasks are: 1) The presentation of 10 novel approaches for GeoIA tasks: 3 approaches for GIR, 3 for GeoQA, and 4 for Textual Georeferencing (TG). 2) The evaluation of these novel approaches in these contexts: within official evaluation benchmarks, after evaluation benchmarks with the test collections, and with other specific datasets. Most of these algorithms have been evaluated in international evaluations and some of them achieved top-ranked state-of-the-art results, including top-performing results in GIR (GeoCLEF 2007) and TG (MediaEval 2014) benchmarks. 3) The experiments reported in this PhD thesis show that the approaches can combine effectively Geographical Knowledge and NLP with Data-Driven techniques to improve the efectiveness measures of the three Geographical Information Access tasks investigated. 4) TALPGeoIR: a novel GIR approach that combines Geographical Knowledge ReRanking (GeoKR), NLP and Relevance Feedback (RF) that achieved state-of-the-art results in official GeoCLEF benchmarks (Ferrés and Rodríguez, 2008; Mandl et al., 2008) and posterior experiments (Ferrés and Rodríguez, 2015a). This approach has been evaluated with the full GeoCLEF corpus (100 topics) and showed that GeoKR, NLP, and RF techniques evaluated separately or in combination improve the results in MAP and R-Precision effectiveness measures of the state-of-the-art IR algorithms TF-IDF, BM25 and InL2 and show statistical significance in most of the experiments. 5) GeoTALP-QA: a scope-based GeoQA approach for Spanish and English and its evaluation with a set of questions of the Spanish geography (Ferrés and Rodríguez, 2006). 6) Four state-of-the-art Textual Georeferencing approaches for informal and formal documents that achieved state-of-the-art results in evaluation benchmarks (Ferrés and Rodríguez, 2014) and posterior experiments (Ferrés and Rodríguez, 2011; Ferrés and Rodríguez, 2015b).L'Accés a la Informació Geogràfica (GeoAI) pot ser definit com una forma de recuperar informació de col·lecions textuals que inclou l'anàlisi automàtic i la interpretació dels termes i restriccions geogràfiques que apareixen en consultes i documents. Aquesta tesi doctoral presenta, descriu i avalua varies aproximacions heterogènies a les seguents tasques de GeoAI: Recuperació de la Informació Geogràfica (RIG), Cerca de la Resposta Geogràfica (GeoCR), i Georeferenciament Textual (GT). La tasca de RIG tracta amb consultes d'usuari que cerquen documents (e.g. ¿vinyes a California?) i la tasca GeoCR tracta de recuperar respostes concretes a preguntes (e.g. ¿Quina és la capital de França?). D'altra banda, GT es la tasca de relacionar una o més referències geogràfiques (com polígons o coordenades en un sistema de referència geodètic) a documents electrònics. Els algoritmes de l'estat de l'art actual en Intel·ligència Artificial encara no comprenen completament el significat semàntic i els termes i les restriccions geogràfiques presents en consultes i col·leccions de documents. Aquesta tesi intenta millorar els resultats en efectivitat de les tasques de GeoAI de la seguent manera: 1) millorant la detecció, comprensió, i la utilització d'una part del contingut geogràfic i temàtic de les consultes i documents amb tècniques de reconeixement de topònims, desambiguació de topònims, i Processament del Llenguatge Natural (PLN), i 2) combinant heurístics basats en Coneixement Geogràfic i en el sentit comú humà amb algoritmes de Recuperació de la Informació basats en dades. Les principals contribucions d'aquesta tesi a l'estat de l'art de les tasques de GeoAI són: 1) La presentació de 10 noves aproximacions a les tasques de GeoAI: 3 aproximacions per RIG, 3 per GeoCR, i 4 per Georeferenciament Textual (GT). 2) L'avaluació d'aquestes noves aproximacions en aquests contexts: en el marc d'avaluacions comparatives internacionals, posteriorment a avaluacions comparatives internacionals amb les col·lections de test, i amb altres conjunts de dades específics. La majoria d'aquests algoritmes han estat avaluats en avaluacions comparatives internacionals i alguns d'ells aconseguiren alguns dels millors resultats en l'estat de l'art, com per exemple els resultats en comparatives de RIG (GeoCLEF 2007) i GT (MediaEval 2014). 3) Els experiments descrits en aquesta tesi mostren que les aproximacions poden combinar coneixement geogràfic i PLN amb tècniques basades en dades per millorar les mesures d'efectivitat en les tres tasques de l'Accés a la Informació Geogràfica investigades. 4) TALPGeoIR: una nova aproximació a la RIG que combina Re-Ranking amb Coneixement Geogràfic (GeoKR), PLN i Retroalimentació de Rellevancia (RR) que aconseguí resultats en l'estat de l'art en comparatives oficials GeoCLEF (Ferrés and Rodríguez, 2008; Mandl et al., 2008) i en experiments posteriors (Ferrés and Rodríguez, 2015a). Aquesta aproximació ha estat avaluada amb el conjunt complert del corpus GeoCLEF (100 topics) i ha mostrat que les tècniques GeoKR, PLN i RR avaluades separadament o en combinació milloren els resultats en les mesures efectivitat MAP i R-Precision dels algoritmes de l'estat de l'art en Recuperació de la Infomació TF-IDF, BM25 i InL2 i a més mostren significació estadística en la majoria dels experiments. 5) GeoTALP-QA: una aproximació basada en l'àmbit geogràfic per espanyol i anglès i la seva avaluació amb un conjunt de preguntes de la geografía espanyola (Ferrés and Rodríguez, 2006). 6) Quatre aproximacions per al georeferenciament de documents formals i informals que obtingueren resultats en l'estat de l'art en avaluacions comparatives (Ferrés and Rodríguez, 2014) i en experiments posteriors (Ferrés and Rodríguez, 2011; Ferrés and Rodríguez, 2015b)

    Knowledge-based and data-driven approaches for geographical information access

    Get PDF
    Geographical Information Access (GeoIA) can be defined as a way of retrieving information from textual collections that includes the automatic analysis and interpretation of the geographical constraints and terms present in queries and documents. This PhD thesis presents, describes and evaluates several heterogeneous approaches for the following three GeoIA tasks: Geographical Information Retrieval (GIR), Geographical Question Answering (GeoQA), and Textual Georeferencing (TG). The GIR task deals with user queries that search over documents (e.g. ¿vineyards in California?) and the GeoQA task treats questions that retrieve answers (e.g. ¿What is the capital of France?). On the other hand, TG is the task of associate one or more georeferences (such as polygons or coordinates in a geodetic reference system) to electronic documents. Current state-of-the-art AI algorithms are not yet fully understanding the semantic meaning and the geographical constraints and terms present in queries and document collections. This thesis attempts to improve the effectiveness results of GeoIA tasks by: 1) improving the detection, understanding, and use of a part of the geographical and the thematic content of queries and documents with Toponym Recognition, Toponym Disambiguation and Natural Language Processing (NLP) techniques, and 2) combining Geographical Knowledge-Based Heuristics based on common sense with Data-Driven IR algorithms. The main contributions of this thesis to the state-of-the-art of GeoIA tasks are: 1) The presentation of 10 novel approaches for GeoIA tasks: 3 approaches for GIR, 3 for GeoQA, and 4 for Textual Georeferencing (TG). 2) The evaluation of these novel approaches in these contexts: within official evaluation benchmarks, after evaluation benchmarks with the test collections, and with other specific datasets. Most of these algorithms have been evaluated in international evaluations and some of them achieved top-ranked state-of-the-art results, including top-performing results in GIR (GeoCLEF 2007) and TG (MediaEval 2014) benchmarks. 3) The experiments reported in this PhD thesis show that the approaches can combine effectively Geographical Knowledge and NLP with Data-Driven techniques to improve the efectiveness measures of the three Geographical Information Access tasks investigated. 4) TALPGeoIR: a novel GIR approach that combines Geographical Knowledge ReRanking (GeoKR), NLP and Relevance Feedback (RF) that achieved state-of-the-art results in official GeoCLEF benchmarks (Ferrés and Rodríguez, 2008; Mandl et al., 2008) and posterior experiments (Ferrés and Rodríguez, 2015a). This approach has been evaluated with the full GeoCLEF corpus (100 topics) and showed that GeoKR, NLP, and RF techniques evaluated separately or in combination improve the results in MAP and R-Precision effectiveness measures of the state-of-the-art IR algorithms TF-IDF, BM25 and InL2 and show statistical significance in most of the experiments. 5) GeoTALP-QA: a scope-based GeoQA approach for Spanish and English and its evaluation with a set of questions of the Spanish geography (Ferrés and Rodríguez, 2006). 6) Four state-of-the-art Textual Georeferencing approaches for informal and formal documents that achieved state-of-the-art results in evaluation benchmarks (Ferrés and Rodríguez, 2014) and posterior experiments (Ferrés and Rodríguez, 2011; Ferrés and Rodríguez, 2015b).L'Accés a la Informació Geogràfica (GeoAI) pot ser definit com una forma de recuperar informació de col·lecions textuals que inclou l'anàlisi automàtic i la interpretació dels termes i restriccions geogràfiques que apareixen en consultes i documents. Aquesta tesi doctoral presenta, descriu i avalua varies aproximacions heterogènies a les seguents tasques de GeoAI: Recuperació de la Informació Geogràfica (RIG), Cerca de la Resposta Geogràfica (GeoCR), i Georeferenciament Textual (GT). La tasca de RIG tracta amb consultes d'usuari que cerquen documents (e.g. ¿vinyes a California?) i la tasca GeoCR tracta de recuperar respostes concretes a preguntes (e.g. ¿Quina és la capital de França?). D'altra banda, GT es la tasca de relacionar una o més referències geogràfiques (com polígons o coordenades en un sistema de referència geodètic) a documents electrònics. Els algoritmes de l'estat de l'art actual en Intel·ligència Artificial encara no comprenen completament el significat semàntic i els termes i les restriccions geogràfiques presents en consultes i col·leccions de documents. Aquesta tesi intenta millorar els resultats en efectivitat de les tasques de GeoAI de la seguent manera: 1) millorant la detecció, comprensió, i la utilització d'una part del contingut geogràfic i temàtic de les consultes i documents amb tècniques de reconeixement de topònims, desambiguació de topònims, i Processament del Llenguatge Natural (PLN), i 2) combinant heurístics basats en Coneixement Geogràfic i en el sentit comú humà amb algoritmes de Recuperació de la Informació basats en dades. Les principals contribucions d'aquesta tesi a l'estat de l'art de les tasques de GeoAI són: 1) La presentació de 10 noves aproximacions a les tasques de GeoAI: 3 aproximacions per RIG, 3 per GeoCR, i 4 per Georeferenciament Textual (GT). 2) L'avaluació d'aquestes noves aproximacions en aquests contexts: en el marc d'avaluacions comparatives internacionals, posteriorment a avaluacions comparatives internacionals amb les col·lections de test, i amb altres conjunts de dades específics. La majoria d'aquests algoritmes han estat avaluats en avaluacions comparatives internacionals i alguns d'ells aconseguiren alguns dels millors resultats en l'estat de l'art, com per exemple els resultats en comparatives de RIG (GeoCLEF 2007) i GT (MediaEval 2014). 3) Els experiments descrits en aquesta tesi mostren que les aproximacions poden combinar coneixement geogràfic i PLN amb tècniques basades en dades per millorar les mesures d'efectivitat en les tres tasques de l'Accés a la Informació Geogràfica investigades. 4) TALPGeoIR: una nova aproximació a la RIG que combina Re-Ranking amb Coneixement Geogràfic (GeoKR), PLN i Retroalimentació de Rellevancia (RR) que aconseguí resultats en l'estat de l'art en comparatives oficials GeoCLEF (Ferrés and Rodríguez, 2008; Mandl et al., 2008) i en experiments posteriors (Ferrés and Rodríguez, 2015a). Aquesta aproximació ha estat avaluada amb el conjunt complert del corpus GeoCLEF (100 topics) i ha mostrat que les tècniques GeoKR, PLN i RR avaluades separadament o en combinació milloren els resultats en les mesures efectivitat MAP i R-Precision dels algoritmes de l'estat de l'art en Recuperació de la Infomació TF-IDF, BM25 i InL2 i a més mostren significació estadística en la majoria dels experiments. 5) GeoTALP-QA: una aproximació basada en l'àmbit geogràfic per espanyol i anglès i la seva avaluació amb un conjunt de preguntes de la geografía espanyola (Ferrés and Rodríguez, 2006). 6) Quatre aproximacions per al georeferenciament de documents formals i informals que obtingueren resultats en l'estat de l'art en avaluacions comparatives (Ferrés and Rodríguez, 2014) i en experiments posteriors (Ferrés and Rodríguez, 2011; Ferrés and Rodríguez, 2015b).Postprint (published version

    Automatically Detecting the Resonance of Terrorist Movement Frames on the Web

    Get PDF
    The ever-increasing use of the internet by terrorist groups as a platform for the dissemination of radical, violent ideologies is well documented. The internet has, in this way, become a breeding ground for potential lone-wolf terrorists; that is, individuals who commit acts of terror inspired by the ideological rhetoric emitted by terrorist organizations. These individuals are characterized by their lack of formal affiliation with terror organizations, making them difficult to intercept with traditional intelligence techniques. The radicalization of individuals on the internet poses a considerable threat to law enforcement and national security officials. This new medium of radicalization, however, also presents new opportunities for the interdiction of lone wolf terrorism. This dissertation is an account of the development and evaluation of an information technology (IT) framework for detecting potentially radicalized individuals on social media sites and Web fora. Unifying Collective Action Framing Theory (CAFT) and a radicalization model of lone wolf terrorism, this dissertation analyzes a corpus of propaganda documents produced by several, radically different, terror organizations. This analysis provides the building blocks to define a knowledge model of terrorist ideological framing that is implemented as a Semantic Web Ontology. Using several techniques for ontology guided information extraction, the resultant ontology can be accurately processed from textual data sources. This dissertation subsequently defines several techniques that leverage the populated ontological representation for automatically identifying individuals who are potentially radicalized to one or more terrorist ideologies based on their postings on social media and other Web fora. The dissertation also discusses how the ontology can be queried using intuitive structured query languages to infer triggering events in the news. The prototype system is evaluated in the context of classification and is shown to provide state of the art results. The main outputs of this research are (1) an ontological model of terrorist ideologies (2) an information extraction framework capable of identifying and extracting terrorist ideologies from text, (3) a classification methodology for classifying Web content as resonating the ideology of one or more terrorist groups and (4) a methodology for rapidly identifying news content of relevance to one or more terrorist groups

    An employer demand intelligence framework

    Get PDF
    Employer demand intelligence is crucial to ensure accurate and reliable education, workforce and immigration related decisions are made. To date, current methods have been manually intensive and expensive — providing insufficient scope of information required to address such important economic implications. This research developed an Employer Demand Intelligence Framework (EDIF) to address detailed employer demand intelligence requirements. To further the EDIF’s functionality, a semi-automated Employer Demand Identification Tool (EDIT) was developed that continuously provide such intelligence

    The Digital Classicist 2013

    Get PDF
    This edited volume collects together peer-reviewed papers that initially emanated from presentations at Digital Classicist seminars and conference panels. This wide-ranging volume showcases exemplary applications of digital scholarship to the ancient world and critically examines the many challenges and opportunities afforded by such research. The chapters included here demonstrate innovative approaches that drive forward the research interests of both humanists and technologists while showing that rigorous scholarship is as central to digital research as it is to mainstream classical studies. As with the earlier Digital Classicist publications, our aim is not to give a broad overview of the field of digital classics; rather, we present here a snapshot of some of the varied research of our members in order to engage with and contribute to the development of scholarship both in the fields of classical antiquity and Digital Humanities more broadly

    The Digital Classicist 2013

    Get PDF
    This edited volume collects together peer-reviewed papers that initially emanated from presentations at Digital Classicist seminars and conference panels. This wide-ranging volume showcases exemplary applications of digital scholarship to the ancient world and critically examines the many challenges and opportunities afforded by such research. The chapters included here demonstrate innovative approaches that drive forward the research interests of both humanists and technologists while showing that rigorous scholarship is as central to digital research as it is to mainstream classical studies. As with the earlier Digital Classicist publications, our aim is not to give a broad overview of the field of digital classics; rather, we present here a snapshot of some of the varied research of our members in order to engage with and contribute to the development of scholarship both in the fields of classical antiquity and Digital Humanities more broadly

    Contextual Social Networking

    Get PDF
    The thesis centers around the multi-faceted research question of how contexts may be detected and derived that can be used for new context aware Social Networking services and for improving the usefulness of existing Social Networking services, giving rise to the notion of Contextual Social Networking. In a first foundational part, we characterize the closely related fields of Contextual-, Mobile-, and Decentralized Social Networking using different methods and focusing on different detailed aspects. A second part focuses on the question of how short-term and long-term social contexts as especially interesting forms of context for Social Networking may be derived. We focus on NLP based methods for the characterization of social relations as a typical form of long-term social contexts and on Mobile Social Signal Processing methods for deriving short-term social contexts on the basis of geometry of interaction and audio. We furthermore investigate, how personal social agents may combine such social context elements on various levels of abstraction. The third part discusses new and improved context aware Social Networking service concepts. We investigate special forms of awareness services, new forms of social information retrieval, social recommender systems, context aware privacy concepts and services and platforms supporting Open Innovation and creative processes. This version of the thesis does not contain the included publications because of copyrights of the journals etc. Contact in terms of the version with all included publications: Georg Groh, [email protected] zentrale Gegenstand der vorliegenden Arbeit ist die vielschichtige Frage, wie Kontexte detektiert und abgeleitet werden können, die dazu dienen können, neuartige kontextbewusste Social Networking Dienste zu schaffen und bestehende Dienste in ihrem Nutzwert zu verbessern. Die (noch nicht abgeschlossene) erfolgreiche Umsetzung dieses Programmes führt auf ein Konzept, das man als Contextual Social Networking bezeichnen kann. In einem grundlegenden ersten Teil werden die eng zusammenhängenden Gebiete Contextual Social Networking, Mobile Social Networking und Decentralized Social Networking mit verschiedenen Methoden und unter Fokussierung auf verschiedene Detail-Aspekte näher beleuchtet und in Zusammenhang gesetzt. Ein zweiter Teil behandelt die Frage, wie soziale Kurzzeit- und Langzeit-Kontexte als für das Social Networking besonders interessante Formen von Kontext gemessen und abgeleitet werden können. Ein Fokus liegt hierbei auf NLP Methoden zur Charakterisierung sozialer Beziehungen als einer typischen Form von sozialem Langzeit-Kontext. Ein weiterer Schwerpunkt liegt auf Methoden aus dem Mobile Social Signal Processing zur Ableitung sinnvoller sozialer Kurzzeit-Kontexte auf der Basis von Interaktionsgeometrien und Audio-Daten. Es wird ferner untersucht, wie persönliche soziale Agenten Kontext-Elemente verschiedener Abstraktionsgrade miteinander kombinieren können. Der dritte Teil behandelt neuartige und verbesserte Konzepte für kontextbewusste Social Networking Dienste. Es werden spezielle Formen von Awareness Diensten, neue Formen von sozialem Information Retrieval, Konzepte für kontextbewusstes Privacy Management und Dienste und Plattformen zur Unterstützung von Open Innovation und Kreativität untersucht und vorgestellt. Diese Version der Habilitationsschrift enthält die inkludierten Publikationen zurVermeidung von Copyright-Verletzungen auf Seiten der Journals u.a. nicht. Kontakt in Bezug auf die Version mit allen inkludierten Publikationen: Georg Groh, [email protected]
    corecore