13,462 research outputs found

    Web Service Retrieval by Structured Models

    Get PDF
    Much of the information available on theWorldWideWeb cannot effectively be found by the help of search engines because the information is dynamically generated on a user’s request.This applies to online decision support services as well as Deep Web information. We present in this paper a retrieval system that uses a variant of structured modeling to describe such information services, and similarity of models for retrieval. The computational complexity of the similarity problem is discussed, and graph algorithms for retrieval on repositories of service descriptions are introduced. We show how bounds for combinatorial optimization problems can provide filter algorithms in a retrieval context. We report about an evaluation of the retrieval system in a classroom experiment and give computational results on a benchmark library.Economics ;

    Software Tools and Approaches for Compound Identification of LC-MS/MS Data in Metabolomics.

    Get PDF
    The annotation of small molecules remains a major challenge in untargeted mass spectrometry-based metabolomics. We here critically discuss structured elucidation approaches and software that are designed to help during the annotation of unknown compounds. Only by elucidating unknown metabolites first is it possible to biologically interpret complex systems, to map compounds to pathways and to create reliable predictive metabolic models for translational and clinical research. These strategies include the construction and quality of tandem mass spectral databases such as the coalition of MassBank repositories and investigations of MS/MS matching confidence. We present in silico fragmentation tools such as MS-FINDER, CFM-ID, MetFrag, ChemDistiller and CSI:FingerID that can annotate compounds from existing structure databases and that have been used in the CASMI (critical assessment of small molecule identification) contests. Furthermore, the use of retention time models from liquid chromatography and the utility of collision cross-section modelling from ion mobility experiments are covered. Workflows and published examples of successfully annotated unknown compounds are included

    A Logic-Independent IDE

    Full text link
    The author's MMT system provides a framework for defining and implementing logical systems. By combining MMT with the jEdit text editor, we obtain a logic-independent IDE. The IDE functionality includes advanced features such as context-sensitive auto-completion, search, and change management.Comment: In Proceedings UITP 2014, arXiv:1410.785

    Data analytics 2016: proceedings of the fifth international conference on data analytics

    Get PDF

    Knowledge Base Population using Semantic Label Propagation

    Get PDF
    A crucial aspect of a knowledge base population system that extracts new facts from text corpora, is the generation of training data for its relation extractors. In this paper, we present a method that maximizes the effectiveness of newly trained relation extractors at a minimal annotation cost. Manual labeling can be significantly reduced by Distant Supervision, which is a method to construct training data automatically by aligning a large text corpus with an existing knowledge base of known facts. For example, all sentences mentioning both 'Barack Obama' and 'US' may serve as positive training instances for the relation born_in(subject,object). However, distant supervision typically results in a highly noisy training set: many training sentences do not really express the intended relation. We propose to combine distant supervision with minimal manual supervision in a technique called feature labeling, to eliminate noise from the large and noisy initial training set, resulting in a significant increase of precision. We further improve on this approach by introducing the Semantic Label Propagation method, which uses the similarity between low-dimensional representations of candidate training instances, to extend the training set in order to increase recall while maintaining high precision. Our proposed strategy for generating training data is studied and evaluated on an established test collection designed for knowledge base population tasks. The experimental results show that the Semantic Label Propagation strategy leads to substantial performance gains when compared to existing approaches, while requiring an almost negligible manual annotation effort.Comment: Submitted to Knowledge Based Systems, special issue on Knowledge Bases for Natural Language Processin

    A Survey of Location Prediction on Twitter

    Full text link
    Locations, e.g., countries, states, cities, and point-of-interests, are central to news, emergency events, and people's daily lives. Automatic identification of locations associated with or mentioned in documents has been explored for decades. As one of the most popular online social network platforms, Twitter has attracted a large number of users who send millions of tweets on daily basis. Due to the world-wide coverage of its users and real-time freshness of tweets, location prediction on Twitter has gained significant attention in recent years. Research efforts are spent on dealing with new challenges and opportunities brought by the noisy, short, and context-rich nature of tweets. In this survey, we aim at offering an overall picture of location prediction on Twitter. Specifically, we concentrate on the prediction of user home locations, tweet locations, and mentioned locations. We first define the three tasks and review the evaluation metrics. By summarizing Twitter network, tweet content, and tweet context as potential inputs, we then structurally highlight how the problems depend on these inputs. Each dependency is illustrated by a comprehensive review of the corresponding strategies adopted in state-of-the-art approaches. In addition, we also briefly review two related problems, i.e., semantic location prediction and point-of-interest recommendation. Finally, we list future research directions.Comment: Accepted to TKDE. 30 pages, 1 figur
    • …
    corecore