
    An Integrated Smart City Platform

    Smart Cities aim to create a higher quality of life for their citizens, improve business services and promote the tourism experience. Fostering smart city innovation at the local and regional level requires a set of mature technologies to discover, integrate and harmonize multiple data sources, and the exposure of effective applications for end users (citizens, administrators, tourists...). In this context, Semantic Web technologies and Linked Open Data principles provide a means for sharing knowledge about cities as physical, economic, social, and technical systems, enabling the development of smart city services. Despite the tremendous effort these communities have made so far, there is a lack of comprehensive and effective platforms that handle the entire process of identification, ingestion, consumption and publication of data for Smart Cities. In this paper, a complete open-source platform is proposed to boost the integration, semantic enrichment, publication and exploitation of public data to foster smart cities in local and national administrations. Starting from mature software solutions, we propose a platform that facilitates the harmonization of datasets (open and private, static and dynamic in real time) of the same domain generated by different authorities. The platform provides a unified, smart-city-oriented dataset that can be exploited to offer services to citizens in a uniform way, to easily release open data, and to monitor the status of city services in real time by means of a suite of web applications.
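
    As a purely illustrative complement to the abstract above, the sketch below shows how a harmonized city dataset exposed through a SPARQL endpoint could be queried from Python. The endpoint URL, the ex: vocabulary and the parking-area properties are invented assumptions, not part of the platform described in the paper.

```python
# Hypothetical example: querying a harmonized smart-city SPARQL endpoint.
# The endpoint URL and vocabulary (ex:ParkingArea, ex:availableSpots) are
# illustrative assumptions, not part of the platform described above.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://city.example.org/sparql"  # assumed endpoint

QUERY = """
PREFIX ex: <http://city.example.org/ontology#>
SELECT ?area ?spots WHERE {
    ?area a ex:ParkingArea ;
          ex:availableSpots ?spots .
} ORDER BY DESC(?spots) LIMIT 10
"""

def top_parking_areas():
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setQuery(QUERY)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    # Each binding carries the URI of the area and the current spot count.
    return [(b["area"]["value"], int(b["spots"]["value"]))
            for b in results["results"]["bindings"]]

if __name__ == "__main__":
    for uri, spots in top_parking_areas():
        print(f"{spots:4d}  {uri}")
```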

    Statistical Extraction of Multilingual Natural Language Patterns for RDF Predicates: Algorithms and Applications

    The Data Web has undergone a tremendous growth period. It currently consists of more than 3,300 publicly available knowledge bases describing millions of resources from various domains, such as life sciences, government or geography, with over 89 billion facts. In the same way, the Document Web has grown to a state where approximately 4.55 billion websites exist, 300 million photos are uploaded to Facebook and 3.5 billion Google searches are performed on average every day. However, there is a gap between the Document Web and the Data Web, since, for example, knowledge bases available on the Data Web are most commonly extracted from structured or semi-structured sources, while the majority of information available on the Web is contained in unstructured sources such as news articles, blog posts, photos, forum discussions, etc. As a result, data on the Data Web not only misses a significant fragment of information but also suffers from a lack of actuality, since typical extraction methods are time-consuming and can only be carried out periodically. Furthermore, provenance information is rarely taken into consideration and therefore gets lost in the transformation process. In addition, users are accustomed to entering keyword queries to satisfy their information needs. With the availability of machine-readable knowledge bases, lay users could be empowered to issue more specific questions and get more precise answers. In this thesis, we address the problem of Relation Extraction, one of the key challenges in closing the gap between the Document Web and the Data Web, by four means. First, we present a distant supervision approach that allows finding multilingual natural language representations of formal relations already contained in the Data Web. We use these natural language representations to find sentences on the Document Web that contain unseen instances of this relation between two entities. Second, we address the problem of data actuality by presenting a real-time data stream RDF extraction framework and utilize this framework to extract RDF from RSS news feeds. Third, we present a novel fact validation algorithm, based on natural language representations, that is able not only to verify or falsify a given triple, but also to find trustworthy sources for it on the Web and to estimate a time scope in which the triple holds true. The features used by this algorithm to determine whether a website is indeed trustworthy serve as provenance information and thereby help to create metadata for facts in the Data Web. Finally, we present a question answering system that uses the natural language representations to map natural language questions to formal SPARQL queries, allowing lay users to make use of the large amounts of data available on the Data Web to satisfy their information needs.
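
    The following is a minimal, self-contained sketch of the distant-supervision idea summarised above: known subject-object pairs of an RDF predicate are used to harvest surface patterns, which are then applied to spot unseen instances. The seed facts, example sentences and regular-expression matching are toy assumptions and do not reflect the thesis' actual multilingual pipeline.

```python
# A minimal distant-supervision sketch, not the thesis' actual pipeline:
# known (subject, object) pairs for an RDF predicate are used to harvest
# surface patterns from sentences, which can then flag unseen entity pairs.
import re
from collections import Counter

# Assumed seed facts for a birth-place predicate (illustrative only).
seed_pairs = [("Marie Curie", "Warsaw"), ("Albert Einstein", "Ulm")]

sentences = [
    "Marie Curie was born in Warsaw in 1867.",
    "Albert Einstein was born in Ulm, a city on the Danube.",
    "Ada Lovelace was born in London.",
]

def harvest_patterns(pairs, sents):
    """Collect the text between subject and object mentions as patterns."""
    patterns = Counter()
    for subj, obj in pairs:
        for s in sents:
            m = re.search(re.escape(subj) + r"\s+(.*?)\s+" + re.escape(obj), s)
            if m:
                patterns[m.group(1)] += 1
    return patterns

def find_new_instances(patterns, sents):
    """Apply harvested patterns to spot unseen entity pairs."""
    hits = []
    for pattern, _ in patterns.most_common():
        regex = re.compile(r"([A-Z][\w ]+?)\s+" + re.escape(pattern) + r"\s+([A-Z]\w+)")
        for s in sents:
            for subj, obj in regex.findall(s):
                if (subj.strip(), obj) not in seed_pairs:
                    hits.append((subj.strip(), pattern, obj))
    return hits

patterns = harvest_patterns(seed_pairs, sentences)
print(patterns.most_common())                   # e.g. [('was born in', 2)]
print(find_new_instances(patterns, sentences))  # Ada Lovelace / London
```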

    Using semantic technologies to resolve heterogeneity issues in sustainability and disaster management knowledge bases

    This thesis examines issues of semantic heterogeneity in the domains of sustainability indicators and disaster management. We propose a model that links the two domains with the following logic: while disaster management implies a proper and efficient response to a risk that has materialised as a disaster, sustainability can be defined as preparedness for unexpected situations achieved by applying measurements such as sustainability indicators. As a step in this direction, we investigate how semantic technologies can tackle the issues of heterogeneity in the aforementioned domains. First, we consider approaches to resolve the heterogeneity issues of representing the key concepts of sustainability indicator sets. To develop a knowledge base, we apply the METHONTOLOGY approach to guide the construction of two ontology design candidates: generic and specific. Of the two, the generic design is more abstract, with fewer classes and properties. Documents describing two indicator systems - the Global Reporting Initiative and the Organisation for Economic Co-operation and Development - are used in the design of both candidate ontologies. We then evaluate both ontology designs using the ROMEO approach to calculate their level of coverage against the seen indicators, as well as against an unseen third indicator set (the United Nations Statistics Division). We also show that the use of existing structured approaches like METHONTOLOGY and ROMEO can reduce ambiguity in ontology design and evaluation for domain-level ontologies. It is concluded that where an ontology needs to be designed for both seen and unseen indicator systems, a generic and reusable design is preferable. Second, having addressed the heterogeneity issues at the data level of sustainability indicators in the first phase of the research, we then develop software for a sustainability reporting framework - Circles of Sustainability - which provides two mechanisms for browsing heterogeneous sustainability indicator sets: a Tabular view and a Circular view. In particular, the generic ontology design developed during the first phase of the research is applied in this software. Next, we evaluate the overall usefulness and ease of use of the presented software and the associated user interfaces by conducting a user study. The analysis of the quantitative and qualitative results of the user study concludes that the Circular view is the interface preferred by most participants for browsing semantically heterogeneous indicators. Third, in the context of disaster management, we present a geotagger method for the OzCrisisTracker application that automatically detects and disambiguates the heterogeneity of georeferences mentioned in the tweets' content, with three possible outcomes: definite, ambiguous and no-location. Our method semantically annotates the tweet components utilising existing and new ontologies. We also conclude that the accuracy of the geographic focus of our geotagger is considerably higher than that of other systems. From a more general perspective, the research contributions can be articulated as follows: the knowledge bases developed in this research have been applied to the two domain applications. The thesis therefore demonstrates how semantic technologies, such as ontology design patterns, browsing tools and geocoding, can untangle data representation and navigation issues of semantic heterogeneity in the sustainability and disaster management domains.
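
    To illustrate the definite/ambiguous/no-location distinction mentioned above, here is a hedged toy sketch of a gazetteer-based geotagging step. The place names, coordinates and tweet texts are invented, and the real OzCrisisTracker geotagger relies on semantic annotation with ontologies rather than this simple lookup.

```python
# Illustrative sketch (not the OzCrisisTracker implementation): classify the
# geographic focus of a tweet as "definite", "ambiguous" or "no-location"
# by matching tokens against a toy gazetteer. Place names and coordinates
# below are assumptions for demonstration only.
TOY_GAZETTEER = {
    "Perth": [("Perth, Western Australia", -31.95, 115.86),
              ("Perth, Scotland", 56.40, -3.43)],
    "Canberra": [("Canberra, ACT", -35.28, 149.13)],
}

def geotag(tweet_text):
    """Return (label, candidate referents) for the first place name found."""
    for token in tweet_text.split():
        word = token.strip(".,!?#@")
        candidates = TOY_GAZETTEER.get(word)
        if candidates:
            label = "definite" if len(candidates) == 1 else "ambiguous"
            return label, candidates
    return "no-location", []

print(geotag("Bushfire warning issued near Canberra today"))
print(geotag("Smoke visible across Perth right now"))
print(geotag("Stay safe everyone, more updates soon"))
```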

    Information Acquisition from Digital Text Resources - Domain-Adaptive Methods for Structuring Heterogeneous Text Documents

    In today's information society, people are frequently confronted with so-called information overload: the sheer volume of textual resources, especially those available digitally, can overwhelm them when they try to identify relevant information. So far, support for this task is mostly limited to full-text search over text collections, which does not allow complex queries describing different aspects of the information sought. Tools for elaborate search, which allow individual aspects of the sought information to be specified, exist only in specific domains. A major reason for this is that the digital text resources to be searched are usually available only in unstructured form, so there is no uniform, targeted access to specific information within the documents that would simplify the development of such tools. Structured representations of the documents, in which the meaning of individual text fragments for the entities described in the documents is discernible, would enable this access. This dissertation investigates which methods can be used to automatically transform textual documents into a structured representation. Existing approaches with the same or a similar goal are mostly developed for specific application domains and are difficult to transfer to other domains. When moving to a new domain, entirely new structuring approaches therefore have to be designed, or a large manual effort has to be spent on adapting existing ones. This gives rise to the need for domain-adaptive methods for structuring text resources. The main challenge is the heterogeneity of application domains with respect to criteria such as the document formats used, the prevailing text length and domain-specific terminology. An analysis of five selected heterogeneous application domains showed that certain types of information are relevant across domains. For three of these types, methods were therefore designed that can identify information of these types in heterogeneous documents. Care was taken to keep the manual effort required for the first application of the methods in a specific domain as low as possible, in order to satisfy the requirement of domain adaptivity. To reduce the manual effort, machine learning techniques such as active learning, as well as existing, freely available knowledge bases, were used. The designed methods were implemented and evaluated on text corpora from the previously analysed domains. The evaluation showed that information of these three types can be identified with high quality while achieving good domain adaptivity. Furthermore, the independent methods for identifying information of the individual types were combined in order to structure complete documents. This concept was implemented in a case study for one of the application domains and evaluated on a text corpus from that domain.
    The results confirm that structuring can be achieved by combining the methods for identifying information of the individual types. Using the domain-adaptive methods presented in this dissertation, structured representations can be created from unstructured digital text resources, which simplifies the development of tools for information acquisition. The resulting possibilities for elaborate information acquisition tools reduce the users' overload when identifying relevant information.
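
    As a small illustration of the active-learning idea mentioned above (reducing manual labelling effort by querying only the most uncertain examples), the following sketch implements pool-based uncertainty sampling on synthetic data with scikit-learn. The classifier, the synthetic documents and the simulated oracle are assumptions for demonstration, not the dissertation's actual setup.

```python
# A minimal pool-based active-learning sketch (uncertainty sampling) of the
# kind used to reduce manual labelling effort; classifier, data and the
# oracle below are placeholders, not the dissertation's actual setup.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic "documents": 2-D feature vectors with a hidden binary label.
X_pool = rng.normal(size=(200, 2))
y_true = (X_pool[:, 0] + X_pool[:, 1] > 0).astype(int)  # simulated oracle

# Start with two labelled examples per class to seed the classifier.
labeled = list(np.where(y_true == 1)[0][:2]) + list(np.where(y_true == 0)[0][:2])
unlabeled = [i for i in range(len(X_pool)) if i not in labeled]

clf = LogisticRegression()
for _ in range(10):                      # 10 annotation rounds
    clf.fit(X_pool[labeled], y_true[labeled])
    proba = clf.predict_proba(X_pool[unlabeled])
    # Query the instance the model is least certain about (closest to 0.5).
    uncertainty = 1.0 - np.max(proba, axis=1)
    query = unlabeled[int(np.argmax(uncertainty))]
    labeled.append(query)                # "ask the annotator" for its label
    unlabeled.remove(query)

print("labelled examples used:", len(labeled))
print("pool accuracy:", clf.score(X_pool, y_true))
```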

    Human Resource Management in Emergency Situations

    The dissertation examines the issues related to human resource management in emergency situations and introduces measures that help to solve these issues. The prime aim is to comprehensively analyse human resource management and the built environment resilience management life cycle and its stages, for the purpose of creating an effective Human Resource Management in Emergency Situations Model and Intelligent System. This would help to accelerate resilience at every stage, manage personal stress and reduce disaster-related losses. The dissertation consists of an Introduction, three Chapters, the Conclusions, References, a List of the Author's Publications and nine Appendices. The Introduction discusses the research problem and the research relevance, outlines the research object, states the research aim and objectives, overviews the research methodology and the original contribution of the research, presents the practical value of the research results, and lists the defended propositions. The Introduction concludes with an overview of the author's publications and conference presentations on the topic of this dissertation. Chapter 1 introduces best practice in the field of disaster and resilience management in the built environment. It also analyses the disaster and resilience management life cycle and its stages, reviews different intelligent decision support systems, and investigates research on the application of physiological parameters and their dependence on stress. The chapter ends with conclusions and the explicit objectives of the dissertation. Chapter 2 introduces the conceptual model of human resource management in emergency situations. To implement the multiple criteria analysis of the research object, methods of multiple criteria analysis and mathematics are proposed; they should be integrated with intelligent technologies. In Chapter 3, the model developed by the author and the methods of multiple criteria analysis are applied in developing the Intelligent Decision Support System for Human Resource Management in Emergency Situations, consisting of four subsystems: a Physiological Advisory Subsystem to Analyse a User's Post-Disaster Stress Management; a Text Analytics Subsystem; a Recommender Thermometer for Measuring the Preparedness for Resilience; and a Subsystem of Integrated Virtual and Intelligent Technologies. The main statements of the thesis were published in eleven scientific articles: two in journals listed in the Thomson Reuters ISI Web of Science, one in a peer-reviewed scientific journal, four in peer-reviewed conference proceedings referenced in the Thomson Reuters ISI database, and three in peer-reviewed conference proceedings in Lithuania. Five presentations were given on the topic of the dissertation at conferences in Lithuania and other countries.
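
    As a hedged illustration of one common multiple-criteria analysis method that could underpin such a decision support system, the sketch below applies simple additive weighting to invented alternatives. The criteria, weights and values are assumptions for demonstration and are not taken from the dissertation.

```python
# A hedged sketch of simple additive weighting (one common multiple-criteria
# method); the criteria, weights and alternatives are invented for
# illustration and are not taken from the dissertation.
criteria_weights = {"response_time": 0.4, "stress_level": 0.3, "experience": 0.3}
# Whether a higher raw value is better (True) or worse (False) per criterion.
benefit = {"response_time": False, "stress_level": False, "experience": True}

candidates = {
    "Team A": {"response_time": 12, "stress_level": 0.6, "experience": 8},
    "Team B": {"response_time": 18, "stress_level": 0.3, "experience": 5},
}

def normalise(values, is_benefit):
    """Min-max normalise, inverting cost criteria so higher is always better."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [(v - lo) / span if is_benefit else (hi - v) / span for v in values]

def rank(candidates):
    names = list(candidates)
    scores = {n: 0.0 for n in names}
    for crit, weight in criteria_weights.items():
        raw = [candidates[n][crit] for n in names]
        for name, norm in zip(names, normalise(raw, benefit[crit])):
            scores[name] += weight * norm
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank(candidates))
```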

    Toponym Resolution in Text

    Institute for Communicating and Collaborative Systems. Background. In the area of Geographic Information Systems (GIS), a shared discipline between informatics and geography, the term geo-parsing is used to describe the process of identifying names in text, which in computational linguistics is known as named entity recognition and classification (NERC). The term geo-coding is used for the task of mapping from implicitly geo-referenced datasets (such as structured address records) to explicitly geo-referenced representations (e.g., using latitude and longitude). However, present-day GIS systems provide no automatic geo-coding functionality for unstructured text. In Information Extraction (IE), processing of named entities in text has traditionally been seen as a two-step process comprising a flat text span recognition sub-task and an atomic classification sub-task; relating the text span to a model of the world has been ignored by evaluations such as MUC or ACE (Chinchor (1998); U.S. NIST (2003)). However, spatial and temporal expressions refer to events in space-time, and the grounding of events is a precondition for accurate reasoning. Thus, automatic grounding can improve many applications such as automatic map drawing (e.g., for choosing a focus) and question answering (e.g., for questions like How far is London from Edinburgh?, given a story in which both occur and can be resolved). Whereas temporal grounding has received considerable attention in the recent past (Mani and Wilson (2000); Setzer (2001)), robust spatial grounding has long been neglected. Concentrating on geographic names for populated places, I define the task of automatic Toponym Resolution (TR) as computing the mapping from occurrences of names for places as found in a text to a representation of the extensional semantics of the location referred to (its referent), such as a geographic latitude/longitude footprint. The task of mapping from names to locations is hard due to insufficient and noisy databases and a large degree of ambiguity: common words need to be distinguished from proper names (geo/non-geo ambiguity), and the mapping between names and locations is ambiguous (London can refer to the capital of the UK, or to London, Ontario, Canada, or to about forty other Londons on earth). In addition, names of places and the boundaries referred to change over time, and databases are incomplete. Objective. I investigate how referentially ambiguous spatial named entities can be grounded, or resolved, with respect to an extensional coordinate model robustly on open-domain news text. I begin by comparing the few algorithms proposed in the literature and, comparing semi-formal, reconstructed descriptions of them, I factor out a shared repertoire of linguistic heuristics (e.g., rules, patterns) and extra-linguistic knowledge sources (e.g., population sizes). I then investigate how to combine these sources of evidence to obtain a superior method. I also investigate the noise effect introduced by the named entity tagging step that toponym resolution relies on in a sequential system pipeline architecture. Scope. In this thesis, I investigate a present-day snapshot of terrestrial geography as represented in the gazetteer defined and, accordingly, a collection of present-day news text. I limit the investigation to populated places; geo-coding of artifact names (e.g., airports or bridges) and compositional geographic descriptions (e.g., 40 miles SW of London, near Berlin), for instance, is not attempted.
    Historic change is a major factor affecting gazetteer construction and ultimately toponym resolution. However, this is beyond the scope of this thesis. Method. While a small number of previous attempts have been made to solve the toponym resolution problem, these were either not evaluated, or evaluation was done by manual inspection of system output instead of curating a reusable reference corpus. Since the relevant literature is scattered across several disciplines (GIS, digital libraries, information retrieval, natural language processing) and descriptions of algorithms are mostly given in informal prose, I attempt to systematically describe them and aim at a reconstruction in a uniform, semi-formal pseudo-code notation for easier re-implementation. A systematic comparison leads to an inventory of heuristics and other sources of evidence. In order to carry out a comparative evaluation procedure, an evaluation resource is required. Unfortunately, to date no gold standard has been curated in the research community. To this end, a reference gazetteer and an associated novel reference corpus with human-labeled referent annotation are created. These are subsequently used to benchmark a selection of the reconstructed algorithms and a novel re-combination of the heuristics catalogued in the inventory. I then compare the performance of the same TR algorithms under three different conditions, namely applying them to (i) the output of human named entity annotation, (ii) automatic annotation using an existing Maximum Entropy sequence tagging model, and (iii) a naïve toponym lookup procedure in a gazetteer. Evaluation. The algorithms implemented in this thesis are evaluated in an intrinsic or component evaluation. To this end, we define a task-specific matching criterion to be used with traditional Precision (P) and Recall (R) evaluation metrics. This matching criterion is lenient with respect to numerical gazetteer imprecision in situations where one toponym instance is marked up with different gazetteer entries in the gold standard and the test set, respectively, but where these refer to the same candidate referent, caused by multiple near-duplicate entries in the reference gazetteer. Main Contributions. The major contributions of this thesis are as follows:
    • A new reference corpus in which instances of location named entities have been manually annotated with spatial grounding information for populated places, and an associated reference gazetteer from which the assigned candidate referents are chosen. This reference gazetteer provides numerical latitude/longitude coordinates (such as 51° 32′ North, 0° 5′ West) as well as hierarchical path descriptions (such as London > UK) with respect to a wide-coverage, world-wide geographic taxonomy constructed by combining several large but noisy gazetteers. The corpus contains news stories and comprises two sub-corpora, a subset of the REUTERS RCV1 news corpus used for the CoNLL shared task (Tjong Kim Sang and De Meulder (2003)) and a subset of the Fourth Message Understanding Contest (MUC-4; Chinchor (1995)), both available pre-annotated with a gold standard. This corpus will be made available as a reference evaluation resource;
    • a new method and implemented system to resolve toponyms that is capable of robustly processing unseen text (open-domain online newswire text) and grounding toponym instances in an extensional model using longitude and latitude coordinates and hierarchical path descriptions, using internal (textual) and external (gazetteer) evidence;
    • an empirical analysis of the relative utility of various heuristic biases and other sources of evidence with respect to the toponym resolution task when analysing free news genre text;
    • a comparison between a replicated method as described in the literature, which functions as a baseline, and a novel algorithm based on minimality heuristics; and
    • several exemplary prototypical applications showing how the resulting toponym resolution methods can be used to create visual surrogates for news stories, a geographic exploration tool for news browsing, geographically-aware document retrieval, and to answer spatial questions (How far...?) in an open-domain question answering system. These applications have only demonstrative character, as a thorough quantitative, task-based (extrinsic) evaluation of the utility of automatic toponym resolution is beyond the scope of this thesis and is left for future work.
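
    To make the notion of heuristic evidence concrete, the toy sketch below resolves an ambiguous toponym by one of the simplest heuristics catalogued above: preferring the candidate referent with the largest population. The mini-gazetteer entries, coordinates and population figures are rough illustrative values, not entries from the thesis' reference gazetteer.

```python
# A toy illustration of one heuristic discussed in the thesis (prefer the
# candidate referent with the largest population); the mini-gazetteer
# entries and populations below are rough, illustrative values.
MINI_GAZETTEER = {
    "London": [
        {"path": "London > UK", "lat": 51.51, "lon": -0.13, "population": 8_800_000},
        {"path": "London > Ontario > Canada", "lat": 42.98, "lon": -81.25,
         "population": 400_000},
    ],
    "Paris": [
        {"path": "Paris > France", "lat": 48.86, "lon": 2.35, "population": 2_100_000},
        {"path": "Paris > Texas > USA", "lat": 33.66, "lon": -95.56,
         "population": 25_000},
    ],
}

def resolve_by_population(toponym):
    """Return the candidate referent with the largest population, if any."""
    candidates = MINI_GAZETTEER.get(toponym, [])
    return max(candidates, key=lambda c: c["population"], default=None)

for name in ("London", "Paris", "Springfield"):
    print(name, "->", resolve_by_population(name))
```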

    Recommendation in Enterprise 2.0 Social Media Streams

    A social media stream allows users to share user-generated content as well as to aggregate different external sources into one single stream. In Enterprise 2.0, such social media streams empower co-workers to share their information and to work efficiently and effectively together while replacing email communication. As more users share information, it becomes impossible to read the complete stream, leading to an information overload. Therefore, it is crucial to provide users with a personalized stream that suggests important and unread messages. The main characteristic of an Enterprise 2.0 social media stream is that co-workers work together on projects represented by topics: the stream is topic-centered and not user-centered as in public streams such as Facebook or Twitter. A lot of work has been done on recommendation in streams and on news recommendation. However, none of the current research approaches deals with the characteristics of an Enterprise 2.0 social media stream when recommending messages. The existing systems described in the research mainly deal with news recommendation for public streams and lack applicability to Enterprise 2.0 social media streams. In this thesis, a recommender concept is developed that allows the recommendation of messages in an Enterprise 2.0 social media stream. The basic idea is to extract features from a new message and use those features to compute a relevance score for a user. Additionally, those features are used to learn a user model, which is then used for scoring new messages. This idea works without explicit user feedback and assures high user acceptance because no intensive rating of messages is necessary. With this idea, a content-based and a collaborative-based approach are developed. To reflect the topic-centered streams, a topic-specific user model is introduced which learns a user model independently for each topic. New terms constantly occur in the stream of messages. To improve the quality of the recommendation (by finding more relevant messages), the recommender should be able to handle these new terms. Therefore, an approach is developed which adapts a user model when unknown terms occur by using terms of similar users or topics. A short- and long-term approach is also developed which tries to detect short-term interests of users: only if the interest of a user occurs repeatedly over a certain time span are terms transferred to the long-term user model. The approaches are evaluated against a dataset obtained from an Enterprise 2.0 social media stream application. The evaluation shows the overall applicability of the concept. Specifically, it shows that a topic-specific user model outperforms a global user model, and that adapting the user model according to similar users leads to an increase in the quality of the recommendation. Interestingly, the collaborative-based approach cannot reach the quality of the content-based approach.
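
    The sketch below condenses the core idea described above into a few lines: a per-topic term-weight user model is learned from messages the user has engaged with and is then used to score new messages in that topic, without explicit ratings. The tokenisation and the simple additive weighting are stand-in assumptions, not the thesis' actual feature set or learning scheme.

```python
# A condensed sketch of a topic-specific user model: learn per-topic term
# weights from messages a user has engaged with, then score new messages in
# that topic. Tokenisation and weighting are simplified stand-ins.
from collections import defaultdict

def tokens(text):
    return [t.lower().strip(".,!?") for t in text.split() if len(t) > 2]

class TopicUserModel:
    def __init__(self):
        # topic -> term -> weight
        self.weights = defaultdict(lambda: defaultdict(float))

    def learn(self, topic, message):
        """Reinforce terms of a message the user read in this topic."""
        for term in tokens(message):
            self.weights[topic][term] += 1.0

    def score(self, topic, message):
        """Relevance of a new message for the user within one topic."""
        terms = tokens(message)
        if not terms:
            return 0.0
        model = self.weights[topic]
        return sum(model[t] for t in terms) / len(terms)

model = TopicUserModel()
model.learn("project-x", "Deployment of the staging server is scheduled Friday")
model.learn("project-x", "Staging server credentials rotated")

print(model.score("project-x", "Staging deployment moved to Thursday"))
print(model.score("project-x", "Lunch menu for the cafeteria"))
```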

    Analyzing Granger causality in climate data with time series classification methods

    Attribution studies in climate science aim to scientifically ascertain the influence of climatic variations on natural or anthropogenic factors. Many of those studies adopt the concept of Granger causality to infer statistical cause-effect relationships, while utilizing traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite that comprises a unique collection of datasets from the area of climate-vegetation dynamics. The results indicate that specialized time series classification methods are able to improve existing inference procedures. Substantial differences are observed among the methods that were tested.
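
    The sketch below illustrates the predictive framing of Granger causality that such studies build on: a candidate driver is said to Granger-cause a target if adding its lagged values reduces out-of-sample forecasting error. The synthetic climate/vegetation series and the random-forest stand-in are illustrative assumptions, not the models or datasets used in the article.

```python
# Hedged sketch of the predictive framing of Granger causality: compare
# out-of-sample error of a restricted model (target lags only) against a
# full model that also sees the candidate driver's lags. Synthetic data and
# the random-forest stand-in are illustrative assumptions only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
n, n_lags = 600, 3

# Synthetic "climate" driver and a "vegetation" target that depends on it.
driver = rng.normal(size=n)
target = np.zeros(n)
for t in range(1, n):
    target[t] = 0.6 * target[t - 1] + 0.8 * driver[t - 1] + 0.1 * rng.normal()

def lagged_matrix(*series, lags):
    """Stack lag-1..lag-k columns of each series into one design matrix."""
    cols = [s[lags - k - 1:len(s) - k - 1] for s in series for k in range(lags)]
    return np.column_stack(cols)

y = target[n_lags:]
X_restricted = lagged_matrix(target, lags=n_lags)          # target lags only
X_full = lagged_matrix(target, driver, lags=n_lags)        # plus driver lags

split = int(0.7 * len(y))

def oos_error(X):
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X[:split], y[:split])
    return mean_squared_error(y[split:], model.predict(X[split:]))

err_restricted, err_full = oos_error(X_restricted), oos_error(X_full)
print(f"restricted MSE={err_restricted:.4f}  full MSE={err_full:.4f}")
print("evidence of Granger causality:", err_full < err_restricted)
```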