31,676 research outputs found

    Algorithms for cross-lingual data interlinking

    Get PDF
    lesnikova2015aInternational audienceLinked data technologies enable to publish and link structured data on the Web. Although RDF is not about text, many RDF data providers publish their data in their own language. Cross-lingual interlinking consists of discov- ering links between identical resources across data sets in different languages. In this report, we present a general framework for interlinking resources in different languages based on associating a specific representation to each re- source and computing a similarity between these representations. We describe and evaluate three methods using this approach: the two first methods are based on gathering virtual documents and translating them and the latter one represent them as bags of identifiers from a multilingual resource (BabelNet)

    Knowledge-based Biomedical Data Science 2019

    Full text link
    Knowledge-based biomedical data science (KBDS) involves the design and implementation of computer systems that act as if they knew about biomedicine. Such systems depend on formally represented knowledge in computer systems, often in the form of knowledge graphs. Here we survey the progress in the last year in systems that use formally represented knowledge to address data science problems in both clinical and biological domains, as well as on approaches for creating knowledge graphs. Major themes include the relationships between knowledge graphs and machine learning, the use of natural language processing, and the expansion of knowledge-based approaches to novel domains, such as Chinese Traditional Medicine and biodiversity.Comment: Manuscript 43 pages with 3 tables; Supplemental material 43 pages with 3 table

    Pathways: Augmenting interoperability across scholarly repositories

    Full text link
    In the emerging eScience environment, repositories of papers, datasets, software, etc., should be the foundation of a global and natively-digital scholarly communications system. The current infrastructure falls far short of this goal. Cross-repository interoperability must be augmented to support the many workflows and value-chains involved in scholarly communication. This will not be achieved through the promotion of single repository architecture or content representation, but instead requires an interoperability framework to connect the many heterogeneous systems that will exist. We present a simple data model and service architecture that augments repository interoperability to enable scholarly value-chains to be implemented. We describe an experiment that demonstrates how the proposed infrastructure can be deployed to implement the workflow involved in the creation of an overlay journal over several different repository systems (Fedora, aDORe, DSpace and arXiv).Comment: 18 pages. Accepted for International Journal on Digital Libraries special issue on Digital Libraries and eScienc

    Exploiting Cross-Lingual Representations For Natural Language Processing

    Get PDF
    Traditional approaches to supervised learning require a generous amount of labeled data for good generalization. While such annotation-heavy approaches have proven useful for some Natural Language Processing (NLP) tasks in high-resource languages (like English), they are unlikely to scale to languages where collecting labeled data is di cult and time-consuming. Translating supervision available in English is also not a viable solution, because developing a good machine translation system requires expensive to annotate resources which are not available for most languages. In this thesis, I argue that cross-lingual representations are an effective means of extending NLP tools to languages beyond English without resorting to generous amounts of annotated data or expensive machine translation. These representations can be learned in an inexpensive manner, often from signals completely unrelated to the task of interest. I begin with a review of different ways of inducing such representations using a variety of cross-lingual signals and study algorithmic approaches of using them in a diverse set of downstream tasks. Examples of such tasks covered in this thesis include learning representations to transfer a trained model across languages for document classification, assist in monolingual lexical semantics like word sense induction, identify asymmetric lexical relationships like hypernymy between words in different languages, or combining supervision across languages through a shared feature space for cross-lingual entity linking. In all these applications, the representations make information expressed in other languages available in English, while requiring minimal additional supervision in the language of interest
    corecore