43,859 research outputs found

    A Query Integrator and Manager for the Query Web

    Get PDF
    We introduce two concepts: the Query Web as a layer of interconnected queries over the document web and the semantic web, and a Query Web Integrator and Manager (QI) that enables the Query Web to evolve. QI permits users to write, save and reuse queries over any web accessible source, including other queries saved in other installations of QI. The saved queries may be in any language (e.g. SPARQL, XQuery); the only condition for interconnection is that the queries return their results in some form of XML. This condition allows queries to chain off each other, and to be written in whatever language is appropriate for the task. We illustrate the potential use of QI for several biomedical use cases, including ontology view generation using a combination of graph-based and logical approaches, value set generation for clinical data management, image annotation using terminology obtained from an ontology web service, ontology-driven brain imaging data integration, small-scale clinical data integration, and wider-scale clinical data integration. Such use cases illustrate the current range of applications of QI and lead us to speculate about the potential evolution from smaller groups of interconnected queries into a larger query network that layers over the document and semantic web. The resulting Query Web could greatly aid researchers and others who now have to manually navigate through multiple information sources in order to answer specific questions

    Information Extraction in Illicit Domains

    Full text link
    Extracting useful entities and attribute values from illicit domains such as human trafficking is a challenging problem with the potential for widespread social impact. Such domains employ atypical language models, have `long tails' and suffer from the problem of concept drift. In this paper, we propose a lightweight, feature-agnostic Information Extraction (IE) paradigm specifically designed for such domains. Our approach uses raw, unlabeled text from an initial corpus, and a few (12-120) seed annotations per domain-specific attribute, to learn robust IE models for unobserved pages and websites. Empirically, we demonstrate that our approach can outperform feature-centric Conditional Random Field baselines by over 18\% F-Measure on five annotated sets of real-world human trafficking datasets in both low-supervision and high-supervision settings. We also show that our approach is demonstrably robust to concept drift, and can be efficiently bootstrapped even in a serial computing environment.Comment: 10 pages, ACM WWW 201

    Extracting information from short messages

    Get PDF
    Much currently transmitted information takes the form of e-mails or SMS text messages and so extracting information from such short messages is increasingly important. The words in a message can be partitioned into the syntactic structure, terms from the domain of discourse and the data being transmitted. This paper describes a light-weight Information Extraction component which uses pattern matching to separate the three aspects: the structure is supplied as a template; domain terms are the metadata of a data source (or their synonyms), and data is extracted as those words matching placeholders in the templates

    EC-CENTRIC: An Energy- and Context-Centric Perspective on IoT Systems and Protocol Design

    Get PDF
    The radio transceiver of an IoT device is often where most of the energy is consumed. For this reason, most research so far has focused on low power circuit and energy efficient physical layer designs, with the goal of reducing the average energy per information bit required for communication. While these efforts are valuable per se, their actual effectiveness can be partially neutralized by ill-designed network, processing and resource management solutions, which can become a primary factor of performance degradation, in terms of throughput, responsiveness and energy efficiency. The objective of this paper is to describe an energy-centric and context-aware optimization framework that accounts for the energy impact of the fundamental functionalities of an IoT system and that proceeds along three main technical thrusts: 1) balancing signal-dependent processing techniques (compression and feature extraction) and communication tasks; 2) jointly designing channel access and routing protocols to maximize the network lifetime; 3) providing self-adaptability to different operating conditions through the adoption of suitable learning architectures and of flexible/reconfigurable algorithms and protocols. After discussing this framework, we present some preliminary results that validate the effectiveness of our proposed line of action, and show how the use of adaptive signal processing and channel access techniques allows an IoT network to dynamically tune lifetime for signal distortion, according to the requirements dictated by the application
    corecore