3,135 research outputs found

    MAG: A Multilingual, Knowledge-base Agnostic and Deterministic Entity Linking Approach

    Full text link
    Entity linking has recently been the subject of a significant body of research. Currently, the best performing approaches rely on trained mono-lingual models. Porting these approaches to other languages is consequently a difficult endeavor as it requires corresponding training data and retraining of the models. We address this drawback by presenting a novel multilingual, knowledge-based agnostic and deterministic approach to entity linking, dubbed MAG. MAG is based on a combination of context-based retrieval on structured knowledge bases and graph algorithms. We evaluate MAG on 23 data sets and in 7 languages. Our results show that the best approach trained on English datasets (PBOH) achieves a micro F-measure that is up to 4 times worse on datasets in other languages. MAG, on the other hand, achieves state-of-the-art performance on English datasets and reaches a micro F-measure that is up to 0.6 higher than that of PBOH on non-English languages.Comment: Accepted in K-CAP 2017: Knowledge Capture Conferenc

    Multiple Retrieval Models and Regression Models for Prior Art Search

    Get PDF
    This paper presents the system called PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS) realized for the IP track of CLEF 2009. Our approach presents three main characteristics: 1. The usage of multiple retrieval models (KL, Okapi) and term index definitions (lemma, phrase, concept) for the three languages considered in the present track (English, French, German) producing ten different sets of ranked results. 2. The merging of the different results based on multiple regression models using an additional validation set created from the patent collection. 3. The exploitation of patent metadata and of the citation structures for creating restricted initial working sets of patents and for producing a final re-ranking regression model. As we exploit specific metadata of the patent documents and the citation relations only at the creation of initial working sets and during the final post ranking step, our architecture remains generic and easy to extend

    Achieving Secure and Efficient Cloud Search Services: Cross-Lingual Multi-Keyword Rank Search over Encrypted Cloud Data

    Full text link
    Multi-user multi-keyword ranked search scheme in arbitrary language is a novel multi-keyword rank searchable encryption (MRSE) framework based on Paillier Cryptosystem with Threshold Decryption (PCTD). Compared to previous MRSE schemes constructed based on the k-nearest neighbor searcha-ble encryption (KNN-SE) algorithm, it can mitigate some draw-backs and achieve better performance in terms of functionality and efficiency. Additionally, it does not require a predefined keyword set and support keywords in arbitrary languages. However, due to the pattern of exact matching of keywords in the new MRSE scheme, multilingual search is limited to each language and cannot be searched across languages. In this pa-per, we propose a cross-lingual multi-keyword rank search (CLRSE) scheme which eliminates the barrier of languages and achieves semantic extension with using the Open Multilingual Wordnet. Our CLRSE scheme also realizes intelligent and per-sonalized search through flexible keyword and language prefer-ence settings. We evaluate the performance of our scheme in terms of security, functionality, precision and efficiency, via extensive experiments

    Extending, trimming and fusing WordNet for technical documents

    Get PDF
    This paper describes a tool for the automatic extension and trimming of a multilingual WordNet database for cross-lingual retrieval and multilingual ontology building in intranets and domain-specific document collections. Hierarchies, built from automatically extracted terms and combined with the WordNet relations, are trimmed with a disambiguation method based on the document salience of the words in the glosses. The disambiguation is tested in a cross-lingual retrieval task, showing considerable improvement (7%-11%). The condensed hierarchies can be used as browse-interfaces to the documents complementary to retrieval

    Archaeology in the Digital Age: From Paper to Databases

    Full text link
    Research units in archaeology often manage large and precious archives containing various documents, including reports on fieldwork, scholarly studies and reference books. These archives are of course invaluable, recording decades of work, but are generally hard to consult and access. In this context, digitizing full text documents is not enough: information must be formalized, structured and easy to access thanks to friendly user interfaces.Comment: Digital Humanities 2015, Jun 2015, Sydney, Australia. 2015, Proceedings of the conference "Digital Humanities 2015
    • …
    corecore