826 research outputs found

    Semantic and Syntactic Matching of Heterogeneous e-Catalogues

    In e-procurement, companies use e-catalogues to exchange product information with business partners. Matching e-catalogues with product requests helps suppliers identify the best business opportunities in B2B e-Marketplaces. However, the various ways of specifying products and the large variety of e-catalogue formats used by different business actors make this matching difficult. This Ph.D. thesis aims to discover potential syntactic and semantic relationships among product data in procurement documents and to exploit them to find similar e-catalogues. Using a Concept-based Vector Space Model, product data and their semantic interpretation are used to find correlations among product data. To identify important terms in procurement documents, standard e-catalogues and e-tenders are used as a resource to train a Product Named Entity Recognizer that finds B2B product mentions in e-catalogues. The proposed approach makes it possible to use the benefits of all available semantic resources and schemas without depending on any specific assumption. The solution can serve as a B2B product search system in e-Procurement platforms and e-Marketplaces.
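    As a rough illustration of the concept-based vector space idea mentioned in this abstract, the sketch below maps terms to shared concepts before comparing two product descriptions by cosine similarity. It is not the thesis's model: the `TERM_TO_CONCEPT` table, the concept names, and both function names are hypothetical stand-ins for whatever semantic resources the actual system uses.

```python
import math
from collections import Counter

# Hypothetical term-to-concept table; a real system would derive this
# from product classification schemas or other semantic resources.
TERM_TO_CONCEPT = {
    "laptop": "computer",
    "notebook": "computer",
    "pc": "computer",
    "printer": "printing_device",
    "toner": "printing_supply",
    "cartridge": "printing_supply",
}


def concept_vector(text):
    """Count concepts instead of raw terms; unknown terms stand for themselves."""
    tokens = text.lower().split()
    return Counter(TERM_TO_CONCEPT.get(token, token) for token in tokens)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[c] * v[c] for c in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    request = concept_vector("laptop")
    offer = concept_vector("notebook")
    print(cosine(request, offer))
```

    Under this toy mapping, a request for "laptop" and a catalogue entry for "notebook" share no surface term but still land on the same concept, which is the effect a purely syntactic match would miss.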

    Improving average ranking precision in user searches for biomedical research datasets

    Availability of research datasets is a keystone for health and life science study reproducibility and scientific progress. Due to the heterogeneity and complexity of these data, a main challenge to be overcome by research data management systems is to provide users with the best answers for their search queries. In the context of the 2016 bioCADDIE Dataset Retrieval Challenge, we investigate a novel ranking pipeline to improve the search of datasets used in biomedical experiments. Our system comprises a query expansion model based on word embeddings, a similarity measure algorithm that takes into consideration the relevance of the query terms, and a dataset categorisation method that boosts the rank of datasets matching query constraints. The system was evaluated using a corpus with 800k datasets and 21 annotated user queries. Our system provides competitive results when compared to the other challenge participants. In the official run, it achieved the highest infAP among the participants, being +22.3% higher than the median infAP of the participants' best submissions. Overall, it is ranked in the top 2 if an aggregated metric using the best official measures per participant is considered. The query expansion method showed a positive impact on the system's performance, increasing our baseline by up to +5.0% and +3.4% for the infAP and infNDCG metrics, respectively. Our similarity measure algorithm seems to be robust, in particular compared to the Divergence From Randomness framework, having smaller performance variations under different training conditions. Finally, the result categorisation did not have a significant impact on the system's performance. We believe that our solution could be used to enhance biomedical dataset management systems. In particular, the use of data-driven query expansion methods could be an alternative to the complexity of biomedical terminologies.
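    The embedding-based query expansion step described in this abstract can be pictured with a small sketch. This is not the authors' pipeline: the three-dimensional vectors in `EMBEDDINGS` are invented toy values, and `expand_query`, `k`, and `min_sim` are hypothetical names and parameters chosen only for illustration.

```python
import math

# Toy embeddings with invented values; a real system would load vectors
# trained on a large biomedical corpus.
EMBEDDINGS = {
    "cancer": [0.9, 0.1, 0.0],
    "tumor": [0.85, 0.15, 0.05],
    "carcinoma": [0.8, 0.2, 0.1],
    "gene": [0.1, 0.9, 0.0],
    "protein": [0.2, 0.8, 0.1],
    "mouse": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(terms, k=2, min_sim=0.95):
    """Append up to k nearest neighbours of each known term, if similar enough."""
    expanded = list(terms)
    for term in terms:
        if term not in EMBEDDINGS:
            continue  # out-of-vocabulary terms are kept but not expanded
        neighbours = sorted(
            (
                (cosine(EMBEDDINGS[term], vector), word)
                for word, vector in EMBEDDINGS.items()
                if word != term
            ),
            reverse=True,
        )
        for similarity, word in neighbours[:k]:
            if similarity >= min_sim and word not in expanded:
                expanded.append(word)
    return expanded


if __name__ == "__main__":
    print(expand_query(["cancer"]))
```

    The similarity threshold keeps unrelated neighbours out of the expanded query, which matters because every added term widens the set of datasets the ranker has to score.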

    Enriching ontological user profiles with tagging history for multi-domain recommendations

    Many advanced recommendation frameworks employ ontologies of various complexities to model individuals and items, providing a mechanism for the expression of user interests and the representation of item attributes. As a result, complex matching techniques can be applied to support individuals in the discovery of items according to explicit and implicit user preferences. Recently, the rapid adoption of Web 2.0 and the proliferation of social networking sites have resulted in more and more users providing an increasing amount of information about themselves that could be exploited for recommendation purposes. However, the unification of personal information with ontologies using the contemporary knowledge representation methods often associated with Web 2.0 applications, such as community tagging, is a non-trivial task. In this paper, we propose a method for the unification of tags with ontologies by grounding tags to a shared representation in the form of WordNet and Wikipedia. We incorporate individuals' tagging history into their ontological profiles by matching tags with ontology concepts. This approach is preliminarily evaluated by extending an existing news recommendation system with user tagging histories harvested from popular social networking sites.

    Machine Learning in Automated Text Categorization

    The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting of the manual definition of a classifier by domain experts) are very good effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation. Comment: Accepted for publication in ACM Computing Surveys.
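    The inductive process this abstract describes, building a classifier from preclassified documents, can be sketched with one simple learner. The choice of multinomial naive Bayes with add-one smoothing over a bag-of-words representation is an assumption made here for illustration; the survey covers many learners, and the function names and toy documents below are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(labelled_docs):
    """Collect per-category document and word counts from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labelled_docs:
        tokens = text.lower().split()  # bag-of-words document representation
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def classify(text, model):
    """Return the category with the highest smoothed log-probability."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy training set, for illustration only.
TRAINING_DOCS = [
    ("stocks fall as markets slide", "finance"),
    ("bank raises interest rates", "finance"),
    ("team wins the final match", "sport"),
    ("striker scores in cup match", "sport"),
]

if __name__ == "__main__":
    model = train(TRAINING_DOCS)
    print(classify("markets and interest rates", model))
    print(classify("cup final match", model))
```

    No category rules are written by hand here: moving the classifier to another domain only requires a different set of labelled documents, which is the portability advantage the abstract attributes to the machine learning approach.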

    A history and theory of textual event detection and recognition


    Information retrieval (Part 2): Document representations


    Natural Language Processing for Teaching Ancient Languages

    Konstantin Schulz shows various applications of natural language processing (NLP) to the field of Classics, especially to Latin texts. He addresses different levels of linguistic analysis while also highlighting educational benefits and important theoretical pitfalls, especially in vocabulary learning. NLP can solve some problems reasonably well, such as tailoring exercises to the learners' current state of knowledge. However, some tasks still prove too difficult for machines at the moment, e.g. reliable and highly accurate parsing of syntax for historical languages.