
    Theme-weighted Ranking of Keywords from Text Documents using Phrase Embeddings

    Keyword extraction is a fundamental task in natural language processing that facilitates mapping documents to a concise set of representative single- and multi-word phrases. Keywords are primarily extracted from text documents using supervised or unsupervised approaches. In this paper, we present an unsupervised technique that combines a theme-weighted personalized PageRank algorithm with neural phrase embeddings to extract and rank keywords. We also introduce an efficient way of processing text documents and training phrase embeddings using existing techniques. We share an evaluation dataset, derived from an existing dataset, that is used to choose the underlying embedding model. Ranked keyword extraction is evaluated on two benchmark datasets, comprising short abstracts (Inspec) and long scientific papers (SemEval 2010), and the technique is shown to produce better results than state-of-the-art systems. Comment: preprint for paper accepted in Proceedings of 1st IEEE International Conference on Multimedia Information Processing and Retrieval
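The abstract does not spell out how "theme weighting" enters PageRank. The sketch below is one plausible reading, offered as an assumption rather than the authors' method: candidate phrases form a weighted co-occurrence graph, and each phrase's personalization (teleport) weight is its non-negative cosine similarity to a "theme vector" (for example, an average of title-phrase embeddings). The function names, the toy 2-D embeddings, and the choice of theme vector are all hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def theme_weighted_pagerank(
    edges: List[Tuple[str, str, float]],
    embeddings: Dict[str, Vector],
    theme_vector: Vector,
    damping: float = 0.85,
    iters: int = 200,
    tol: float = 1e-12,
) -> List[Tuple[str, float]]:
    """Hypothetical personalized PageRank whose teleport weights come from
    phrase-embedding similarity to a theme vector."""
    nodes = sorted(embeddings)
    # Undirected, weighted co-occurrence graph over phrases that have embeddings.
    graph: Dict[str, Dict[str, float]] = defaultdict(dict)
    for a, b, w in edges:
        if a in embeddings and b in embeddings and a != b:
            graph[a][b] = graph[a].get(b, 0.0) + w
            graph[b][a] = graph[b].get(a, 0.0) + w
    # Theme weights: non-negative cosine similarity, normalized to sum to 1.
    raw = {n: max(cosine(embeddings[n], theme_vector), 0.0) for n in nodes}
    total = sum(raw.values())
    pers = {n: (raw[n] / total if total else 1.0 / len(nodes)) for n in nodes}
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    scores = dict(pers)
    for _ in range(iters):
        # Mass on isolated phrases is redistributed according to theme weight.
        dangling = sum(scores[n] for n in nodes if out_weight[n] == 0)
        new = {
            n: (1 - damping) * pers[n]
            + damping * (
                sum(scores[m] * w / out_weight[m] for m, w in graph[n].items())
                + dangling * pers[n]
            )
            for n in nodes
        }
        delta = sum(abs(new[n] - scores[n]) for n in nodes)
        scores = new
        if delta < tol:
            break
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy usage with made-up 2-D "embeddings".
embeddings = {
    "keyword extraction": [1.0, 0.0],
    "phrase embeddings": [0.9, 0.1],
    "jet engines": [0.0, 1.0],
}
edges = [
    ("keyword extraction", "phrase embeddings", 2.0),
    ("phrase embeddings", "jet engines", 1.0),
]
ranking = theme_weighted_pagerank(edges, embeddings, theme_vector=[1.0, 0.0])
```

In this toy run the off-theme phrase "jet engines" receives no teleport mass and ranks last, while the well-connected on-theme phrase ranks first; a production system would more likely use an existing PageRank implementation that accepts a personalization vector.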

    A text-mining system for extracting metabolic reactions from full-text articles

    Background: Increasingly, biological text-mining research is focusing on the extraction of complex relationships relevant to the construction and curation of biological networks and pathways. However, one important category of pathway—metabolic pathways—has been largely neglected. Here we present a relatively simple method for extracting metabolic reaction information from free text that scores different permutations of assigned entities (enzymes and metabolites) within a given sentence based on the presence and location of stemmed keywords. This method extends an approach that has proved effective in the context of the extraction of protein–protein interactions. Results: When evaluated on a set of manually curated metabolic pathways using standard performance criteria, our method performs surprisingly well. Precision and recall rates are comparable to those previously achieved for the well-known protein–protein interaction extraction task. Conclusions: We conclude that automated metabolic pathway construction is more tractable than has often been assumed, and that (as in the case of protein–protein interaction extraction) relatively simple text-mining approaches can prove surprisingly effective. It is hoped that these results will provide an impetus to further research and act as a useful benchmark for judging the performance of more sophisticated methods that are yet to be developed.
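To make "scoring permutations of assigned entities by the presence and location of stemmed keywords" concrete, here is a deliberately crude, hypothetical heuristic. It is not the authors' scoring scheme: the stem list, the weights, the prefix matching used in place of a real stemmer, and the function names are all assumptions. Each (enzyme, substrate, product) permutation earns points if the substrate precedes the product, if a reaction keyword precedes the product, and for enzyme proximity to a keyword.

```python
from itertools import permutations
from typing import List, Optional, Tuple

# Hypothetical reaction-keyword stems; matched by prefix as a stand-in for stemming.
REACTION_STEMS = ("convert", "catalyz", "catalys", "transform", "oxidiz", "hydrolyz")


def keyword_positions(tokens: List[str]) -> List[int]:
    """Indices of tokens that start with one of the reaction stems."""
    return [i for i, t in enumerate(tokens) if t.lower().startswith(REACTION_STEMS)]


def score(tokens: List[str], enzyme: int, substrate: int, product: int) -> float:
    """Score one (enzyme, substrate, product) assignment of entity positions."""
    kws = keyword_positions(tokens)
    if not kws:
        return 0.0
    s = 0.0
    if substrate < product:  # substrate mentioned before product
        s += 1.0
    if any(k < product for k in kws):  # a reaction keyword precedes the product
        s += 1.0
    # Inverse-distance bonus: the closer the enzyme is to a keyword, the larger it is.
    s += 1.0 / (1 + min(abs(enzyme - k) for k in kws))
    return s


def best_reaction(
    tokens: List[str], enzymes: List[int], metabolites: List[int]
) -> Tuple[Optional[Tuple[str, str, str]], float]:
    """Try every enzyme with every ordered pair of metabolites; keep the best."""
    best: Optional[Tuple[str, str, str]] = None
    best_score = 0.0
    for e in enzymes:
        for sub, prod in permutations(metabolites, 2):
            sc = score(tokens, e, sub, prod)
            if sc > best_score:
                best, best_score = (tokens[e], tokens[sub], tokens[prod]), sc
    return best, best_score


tokens = "Hexokinase converts glucose to glucose-6-phosphate".split()
reaction, reaction_score = best_reaction(tokens, enzymes=[0], metabolites=[2, 4])
```

Here the ordering "glucose → glucose-6-phosphate" scores 2.5 and the reversed permutation scores 1.5, so the heuristic picks the intended reaction. Entity recognition (which tokens are enzymes or metabolites) is assumed to have been done beforehand.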

    Hybrid Search: Effectively Combining Keywords and Semantic Searches

    This paper describes hybrid search, a search method supporting both document and knowledge retrieval via the flexible combination of ontology-based search and keyword-based matching. Hybrid search smoothly copes with the lack of semantic coverage of document content, which is one of the main limitations of current semantic search methods. In this paper, we define hybrid search formally, discuss its compatibility with current semantic trends, and present a reference implementation: K-Search. We then show how the method outperforms both keyword-based search and pure semantic search in terms of precision and recall in a set of experiments performed on a collection of about 18,000 technical documents. Experiments carried out with professional users show that users understand the paradigm and consider it very powerful and reliable. K-Search has been ported to two applications released at Rolls-Royce plc for searching technical documentation about jet engines.
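A small sketch may help show why combining the two matching modes copes with missing semantic coverage. This is an illustrative model, not K-Search's formal definition: documents carry optional (concept, value) annotations, a semantic condition is satisfied by a matching annotation or, when that is absent and fallback is enabled, by the value appearing as a keyword in the text. The `Document` class, `hybrid_search` function, and sample reports are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Document:
    doc_id: str
    text: str
    # Hypothetical semantic annotations as (concept, value) pairs.
    annotations: Set[Tuple[str, str]] = field(default_factory=set)


def satisfies(doc: Document, concept: str, value: str, fallback: bool) -> bool:
    """A semantic condition holds if the document is annotated with it, or,
    when fallback is enabled, if the value appears as a keyword in the text."""
    if (concept, value) in doc.annotations:
        return True
    return fallback and value.lower() in doc.text.lower()


def hybrid_search(
    docs: List[Document],
    semantic: List[Tuple[str, str]],
    keywords: List[str],
    fallback: bool = True,
) -> List[str]:
    """Return ids of documents meeting every semantic condition and every keyword."""
    hits = []
    for doc in docs:
        text = doc.text.lower()
        if all(satisfies(doc, c, v, fallback) for c, v in semantic) and all(
            k.lower() in text for k in keywords
        ):
            hits.append(doc.doc_id)
    return hits


docs = [
    Document("r1", "Blade erosion found on the fan.", {("Component", "fan blade")}),
    Document("r2", "Erosion observed on fan blade tips."),  # not annotated
    Document("r3", "Combustor liner cracked.", {("Component", "combustor")}),
]
hybrid = hybrid_search(docs, [("Component", "fan blade")], ["erosion"])
semantic_only = hybrid_search(docs, [("Component", "fan blade")], [], fallback=False)
keyword_only = hybrid_search(docs, [], ["fan blade", "erosion"])
```

In this toy collection, pure semantic matching misses the unannotated report `r2`, pure keyword matching misses `r1` (whose wording does not contain the phrase "fan blade"), and the hybrid query returns both, which mirrors the coverage argument in the abstract.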

    Highly focused document retrieval in aerospace engineering: user interaction design and evaluation

    Purpose – This paper seeks to describe the preliminary studies (on both users and data), the design and evaluation of the K-Search system for searching legacy documents in aerospace engineering. Real-world reports of jet engine maintenance challenge current indexing practice, while real users’ tasks require retrieving the information in the proper context. K-Search is currently in use at Rolls-Royce plc and has evolved to include other tools for knowledge capture and management. Design/methodology/approach – Semantic Web techniques have been used to automatically extract information from the reports while maintaining the original context, allowing a more focused retrieval than with more traditional techniques. The paper combines semantic search with classical information retrieval to increase search effectiveness. An innovative user interface has been designed to take advantage of this hybrid search technique. The interface is designed to allow a flexible and personal approach to searching legacy data. Findings – The user evaluation showed that the system is effective and well received by users. It also showed that different people look at the same data in different ways and make different use of the same system depending on their individual needs, influenced by their job profile and personal attitude. Research limitations/implications – This study focuses on a specific case of an enterprise working in aerospace engineering. Although the findings are likely to be shared with other engineering domains (e.g. mechanical, electronic), the study does not expand the evaluation to different settings. Originality/value – The study shows how real context of use can provide new and unexpected challenges to researchers and how effective solutions can then be adopted and used in organizations.