    Is it possible to enrich ontologies with a specialized domain linguistic resource?

    Enriching ontologies with linguistic resources is considered an important goal in natural language applications. These linguistic resources should contain not only linguistic information but also knowledge information. However, the available linguistic resources, such as WordNet, are built around lexical relations such as synonymy, antonymy and hyponymy, and they do not provide enough information for ontology building. On the other hand, ontology building requires deeper and more accurate knowledge than general vocabulary contains and, consequently, demands specialized domain resources. This paper presents a linguistic resource developed for Spanish that has been built following some Meaning-Text Theory principles, in order to contain as much knowledge as possible related to several specialized domains.
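
    The abstract contrasts plain lexical relations with the richer knowledge a Meaning-Text Theory (MTT) resource can hold. One core MTT device is the lexical function, which maps a word to a related word (e.g. Magn gives an intensifier, Oper1 gives a support verb). The abstract does not say which MTT principles the resource adopts, so the sketch below assumes lexical functions are among them. The class, helper names and Spanish entries are hypothetical illustrations, not data from the resource.

```python
from collections import defaultdict


class LexicalEntry:
    """Hypothetical entry storing MTT-style lexical functions (LF name -> values)."""

    def __init__(self, lemma: str):
        self.lemma = lemma
        self.functions: dict[str, list[str]] = defaultdict(list)

    def add(self, lf: str, value: str) -> None:
        self.functions[lf].append(value)

    def apply(self, lf: str) -> list[str]:
        # Values of lexical function `lf` for this lemma; empty if none recorded.
        return list(self.functions.get(lf, []))


lexicon: dict[str, LexicalEntry] = {}


def entry(lemma: str) -> LexicalEntry:
    # Create the entry on first use so callers can chain .add() calls.
    if lemma not in lexicon:
        lexicon[lemma] = LexicalEntry(lemma)
    return lexicon[lemma]


# Gener roughly mirrors hypernymy, which WordNet-style resources also encode.
entry("lluvia").add("Gener", "precipitación")
# Magn (intensifier) and Oper1 (support verb) capture collocational knowledge
# that synonymy, antonymy and hyponymy alone do not express.
entry("lluvia").add("Magn", "torrencial")
entry("atención").add("Oper1", "prestar")

print(entry("lluvia").apply("Magn"))      # ['torrencial']
print(entry("atención").apply("Oper1"))   # ['prestar']
```

    Keeping each lexical function as a named, multi-valued slot lets one entry hold several kinds of knowledge without fixing the relation inventory in advance.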

    Multilingual Knowledge Base Completion by Cross-lingual Semantic Relation Inference

    In the present paper, we propose a simple endogenous method for enhancing a multilingual knowledge base through cross-lingual semantic relation inference. It can be run on multilingual resources prior to semantic representation learning. Multilingual knowledge bases may integrate preexisting structured resources available for resource-rich languages. We aim to perform cross-lingual inference on them to improve the low-resource language by creating semantic relationships.
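
    The simplest form of cross-lingual relation inference projects a relation from a resource-rich language onto another language through translation links. The abstract does not give the paper's actual inference rules. The sketch below is therefore only an assumed baseline, and the function name and toy data are hypothetical. Note that ambiguous translations can yield incorrect triples, so a real system would need filtering.

```python
from typing import Optional

# A (head, relation, tail) triple in a knowledge base.
Triple = tuple[str, str, str]


def infer_relations(
    source_triples: list[Triple],
    translations: dict[str, set[str]],
    existing: Optional[set[Triple]] = None,
) -> set[Triple]:
    """Hypothetical sketch: project source-language triples into the target language."""
    existing = existing or set()
    inferred: set[Triple] = set()
    for head, rel, tail in source_triples:
        # Every pairing of a head translation with a tail translation is a candidate.
        for h in translations.get(head, ()):
            for t in translations.get(tail, ()):
                candidate = (h, rel, t)
                # Only report relations the target knowledge base lacks.
                if candidate not in existing:
                    inferred.add(candidate)
    return inferred


english = [("dog", "is_a", "animal"), ("dog", "synonym", "canine")]
en_to_fr = {"dog": {"chien"}, "animal": {"animal"}}
print(infer_relations(english, en_to_fr))  # {('chien', 'is_a', 'animal')}
```

    The second triple produces nothing because "canine" has no translation link. This also shows that the method is endogenous: it creates relations only from links already in the base.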

    Judging Ordinary Meaning

    Judges generally begin their interpretive task by looking for the ordinary meaning of the language of the law. And they often end there, out of respect for the notice function of the law or deference to the presumed intent of the lawmaker. Most everyone agrees on the primacy of the ordinary meaning rule. Yet scholars roundly bemoan the indeterminacy of the communicative content of the language of the law. And they pivot quickly to other grounds for interpretation.

    Bibliographic Control in the Digital Ecosystem

    With the contributions of international experts, the book aims to explore the new boundaries of universal bibliographic control. Bibliographic control is changing radically because the bibliographic universe is changing radically: resources, agents, technologies, standards and practices. Among the main topics addressed: library cooperation networks; legal deposit; national bibliographies; new tools and standards (IFLA LRM, RDA, BIBFRAME); authority control and new alliances (Wikidata, Wikibase, Identifiers); new ways of indexing resources (artificial intelligence); institutional repositories; the new book supply chain; “discoverability” in the IIIF digital ecosystem; the role of thesauri and ontologies in the digital ecosystem; bibliographic control and search engines.

    Methods for reusing public gene expression data

    The electronic version of the dissertation does not include the publications. Public gene expression databases contain data about more than a million biological samples from hundreds of tissues and diseases. In principle, we know the expression pattern of all genes in these samples. Thus it is possible to carry out biological studies without performing new experiments. The size of the datasets, however, poses several challenges: appropriate analysis requires specific statistical skills, useful information is well hidden in the datasets, and the analysis itself is time-consuming. All these reasons prevent wider use of public gene expression data. The goal of this thesis is to facilitate the re-use of expression data by developing analysis methods and tools. One of the biggest obstacles to re-using expression data is its accessibility. For that reason, we have created two web environments that allow users to run complex analysis pipelines on public gene expression data. The first visualises embryonic stem cell data from the FunGenES consortium. The second allows users to search for genes with similar behaviour across hundreds of public datasets. When analyses are performed over multiple datasets, the results eventually need to be integrated. For this task we created a rank aggregation algorithm that is specifically designed for lists of genes. When studying multiple datasets, it is important to have a good overview of their contents. To allow rapid functional characterization of datasets, we have created a visualisation method that can create compact but informative visual summaries of the data. The methods and tools described here have been created with practical considerations in mind and have already been used in various studies.
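
    The abstract does not describe how the thesis's rank aggregation algorithm works. The sketch below shows one order-statistics approach to aggregating ranked gene lists, offered only as an illustrative assumption. Each gene's ranks are normalised to (0, 1] and sorted. Each sorted rank is then compared with the distribution of the k-th smallest of n uniform random values, and the smallest of these probabilities becomes the score. Genes missing from a list are given the worst normalised rank, 1.0, which is also an assumption.

```python
from math import comb


def order_stat_cdf(k: int, n: int, x: float) -> float:
    # P(U_(k) <= x): probability that the k-th smallest of n i.i.d.
    # Uniform(0, 1) values is at most x.
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(k, n + 1))


def aggregate(ranked_lists: list[list[str]]) -> tuple[list[str], dict[str, float]]:
    """Score genes by how consistently they rank near the top (lower = better)."""
    genes = set().union(*ranked_lists)
    n = len(ranked_lists)
    scores: dict[str, float] = {}
    for gene in genes:
        # Normalised rank in each list; a gene absent from a list gets 1.0.
        ranks = sorted(
            (lst.index(gene) + 1) / len(lst) if gene in lst else 1.0
            for lst in ranked_lists
        )
        # Minimum over k of the order-statistic probabilities.
        scores[gene] = min(
            order_stat_cdf(k, n, r) for k, r in enumerate(ranks, start=1)
        )
    return sorted(genes, key=scores.get), scores


order, scores = aggregate([["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"]])
print(order)  # ['A', 'B', 'C']
```

    Because the score takes a minimum over k, it is not a p-value as it stands. A practical tool would need to correct for that before it could report significance.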

    Knowledge Expansion of a Statistical Machine Translation System using Morphological Resources

    The translation capability of a Phrase-Based Statistical Machine Translation (PBSMT) system depends mostly on its parallel training data: phrases that are not present in the training data are not translated correctly. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system without adding more parallel data, using external morphological resources instead. A set of new phrase associations is added to the translation and reordering models; each of them corresponds to a morphological variation of the source phrase, the target phrase, or both phrases of an existing association. New associations are generated using a string similarity score based on morphosyntactic information. We tested our approach on En-Fr and Fr-En translation, and the results showed improved performance in terms of automatic scores (BLEU and Meteor) and a reduction in out-of-vocabulary (OOV) words. We believe that our knowledge expansion framework is generic and could be used to add different types of information to the model. JRC.G.2-Global security and crisis management
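
    The abstract says new associations come from "a string similarity score based on morphosyntactic information" but gives no formula. The sketch below assumes a token-level score in which an inflectional variant (same lemma and part of speech) counts as a near match. It uses that score to find the existing phrase association closest to an unseen source phrase. The weights, threshold and names are hypothetical. In a full system, the target side (here "black cat") would also need the matching inflection.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    form: str
    lemma: str
    pos: str


def token_sim(a: Token, b: Token) -> float:
    # Hypothetical weights: identical form > inflectional variant > same lemma.
    if a.form == b.form:
        return 1.0
    if a.lemma == b.lemma and a.pos == b.pos:
        return 0.75
    if a.lemma == b.lemma:
        return 0.5
    return 0.0


def phrase_sim(p: list[Token], q: list[Token]) -> float:
    # Token-by-token comparison; phrases of different length are not paired.
    if len(p) != len(q) or not p:
        return 0.0
    return sum(token_sim(a, b) for a, b in zip(p, q)) / len(p)


def closest_association(
    new_src: list[Token],
    phrase_table: list[tuple[list[Token], str]],
    threshold: float = 0.7,
) -> Optional[tuple[list[Token], str]]:
    """Return the (known source, target) pair most similar to new_src, or None."""
    best, best_score = None, -1.0
    for known_src, target in phrase_table:
        score = phrase_sim(new_src, known_src)
        if score >= threshold and score > best_score:
            best, best_score = (known_src, target), score
    return best


known = [Token("chat", "chat", "NOUN"), Token("noir", "noir", "ADJ")]
table = [(known, "black cat")]
unseen = [Token("chats", "chat", "NOUN"), Token("noirs", "noir", "ADJ")]
print(phrase_sim(unseen, known))  # 0.75
match = closest_association(unseen, table)
print(match[1] if match else None)  # black cat
```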

    Representation and parsing of multiword expressions

    This book consists of contributions related to the definition, representation and parsing of multiword expressions (MWEs). These reflect current trends in the representation and processing of MWEs. They cover various categories of MWEs, such as verbal, adverbial and nominal MWEs; various linguistic frameworks (e.g. tree-based and unification-based grammars); various languages (including English, French, Modern Greek, Hebrew and Norwegian); and various applications (namely MWE detection, parsing and automatic translation), using both symbolic and statistical approaches.
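
    As a point of reference for the MWE detection application, the sketch below implements a common lexicon-based baseline: greedy longest-match lookup of contiguous word sequences. It is not the method of any contribution in the book. The lexicon and sentence are hypothetical. The baseline matches surface forms only and misses discontinuous verbal MWEs (for example "take ... into account"), which is part of why grammar-based representations are needed.

```python
def detect_mwes(tokens: list[str], lexicon: set[tuple[str, ...]]) -> list[tuple[int, int]]:
    """Greedy longest-match detection of contiguous MWEs.

    Returns (start, end) token spans with `end` exclusive.
    """
    max_len = max((len(m) for m in lexicon), default=0)
    spans: list[tuple[int, int]] = []
    i = 0
    while i < len(tokens):
        match = 0
        # Try the longest candidate first; MWEs have at least two words.
        for length in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + length]) in lexicon:
                match = length
                break
        if match:
            spans.append((i, i + match))
            i += match  # do not let matches overlap
        else:
            i += 1
    return spans


lexicon = {("kicked", "the", "bucket"), ("by", "and", "large"), ("by", "and")}
sentence = "he kicked the bucket by and large".split()
print(detect_mwes(sentence, lexicon))  # [(1, 4), (4, 7)]
```

    The longest-match rule prefers "by and large" over the shorter entry "by and", which is the usual way such baselines resolve nested lexicon entries.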

    Current trends

    Deep parsing is the fundamental process aimed at representing the syntactic structure of phrases and sentences. In the traditional methodology, this process is based on lexicons and grammars that roughly represent the properties of words and the interactions of words and structures in sentences. Several linguistic frameworks, such as Head-driven Phrase Structure Grammar (HPSG), Lexical Functional Grammar (LFG), Tree Adjoining Grammar (TAG) and Combinatory Categorial Grammar (CCG), offer different structures and combining operations for building grammar rules. These frameworks already contain mechanisms for expressing properties of multiword expressions (MWEs), which, however, need improvement in how they account for the idiosyncrasies of MWEs on the one hand and their similarities to regular structures on the other. This collaborative book constitutes a survey of various attempts at representing and parsing MWEs in the context of linguistic theories and applications.

    Semi-automated Ontology Generation for Biocuration and Semantic Search

    Background: In the life sciences, the amount of literature and experimental data grows at a tremendous rate. In order to effectively access and integrate these data, biomedical ontologies – controlled, hierarchical vocabularies – are being developed. Creating and maintaining such ontologies is a difficult, labour-intensive, manual process. Many computational methods which can support ontology construction have been proposed in the past. However, good, validated systems are largely missing. Motivation: The biocuration community plays a central role in the development of ontologies. Any method that can support their efforts has the potential to have a huge impact in the life sciences. Recently, a number of semantic search engines were created that make use of biomedical ontologies for document retrieval. To transfer the technology to other knowledge domains, suitable ontologies need to be created. One area where ontologies may prove particularly useful is the search for alternative methods to animal testing, an area where comprehensive search is of special interest to determine the availability or unavailability of alternative methods. Results: The Dresden Ontology Generator for Directed Acyclic Graphs (DOG4DAG) developed in this thesis is a system which supports the creation and extension of ontologies by semi-automatically generating terms, definitions, and parent-child relations from text in PubMed, the web, and PDF repositories. The system is seamlessly integrated into OBO-Edit and Protégé, two widely used ontology editors in the life sciences. DOG4DAG generates terms by identifying statistically significant noun-phrases in text. For definitions and parent-child relations it employs pattern-based web searches. Each generation step has been systematically evaluated using manually validated benchmarks. The term generation leads to high quality terms also found in manually created ontologies. 
Definitions can be retrieved for up to 78% of terms, and child-ancestor relations for up to 54%. No other validated system exists that achieves comparable results. To improve the search for information on alternative methods to animal testing, an ontology has been developed that contains 17,151 terms, of which 10% were newly created and 90% were re-used from existing resources. This ontology is the core of Go3R, the first semantic search engine in this field. When a user performs a search query with Go3R, the search engine expands the request using the structure and terminology of the ontology. The machine classification employed in Go3R can distinguish documents related to alternative methods from those that are not, with an F-measure of 90% on a manual benchmark. Approximately 200,000 of the 19 million documents listed in PubMed were identified as relevant, either because they contained a specific term or because of the automatic classification. The Go3R search engine is available online at www.Go3R.org.
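
    DOG4DAG's term generation step identifies "statistically significant noun-phrases", but the abstract does not name the statistic. The sketch below assumes a log-likelihood ratio (G²) test that compares a candidate phrase's frequency in a domain corpus with its frequency in a background corpus. It keeps phrases that are over-represented in the domain and exceed 3.84, the chi-square critical value for p < 0.05 with one degree of freedom. Noun-phrase chunking is out of scope, and the counts are hypothetical.

```python
from math import log


def g2(a: int, b: int, n1: int, n2: int) -> float:
    """Log-likelihood ratio for a phrase seen `a` times in a domain corpus of
    `n1` tokens and `b` times in a background corpus of `n2` tokens."""
    total = a + b
    e1 = n1 * total / (n1 + n2)  # expected domain count under "no difference"
    e2 = n2 * total / (n1 + n2)  # expected background count
    s = 0.0
    # Terms with a zero observed count contribute 0 (0 * log 0 := 0).
    if a:
        s += a * log(a / e1)
    if b:
        s += b * log(b / e2)
    return 2 * s


def significant_terms(domain_counts, background_counts, n1, n2, critical=3.84):
    """Candidate phrases over-represented in the domain corpus, strongest first."""
    scored = []
    for phrase, a in domain_counts.items():
        b = background_counts.get(phrase, 0)
        score = g2(a, b, n1, n2)
        if a / n1 > b / n2 and score > critical:
            scored.append((score, phrase))
    return [phrase for _, phrase in sorted(scored, reverse=True)]


domain = {"cell line": 50, "the": 10}
background = {"cell line": 5, "the": 100}
print(significant_terms(domain, background, 1000, 10000))  # ['cell line']
```

    "the" is excluded because its relative frequency is identical in both corpora, so its G² is 0.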