677 research outputs found

    Lexical Adaptation of Link Grammar to the Biomedical Sublanguage: a Comparative Evaluation of Three Approaches

    We study the adaptation of the Link Grammar Parser to the biomedical sublanguage with a focus on domain terms not found in a general parser lexicon. Using two biomedical corpora, we implement and evaluate three approaches to addressing unknown words: automatic lexicon expansion, the use of morphological clues, and disambiguation using a part-of-speech tagger. We evaluate each approach separately for its effect on parsing performance and consider combinations of these approaches. In addition to a 45% increase in parsing efficiency, we find that the best approach, incorporating information from a domain part-of-speech tagger, offers a statistically significant 10% relative decrease in error. The adapted parser is available under an open-source license at http://www.it.utu.fi/biolg

    TEN NEW ETYMOLOGIES BETWEEN OLD GAULISH AND THE INDO-EUROPEAN LANGUAGES

    This paper belongs to a series of articles designed to contribute to the solution of one of the central problems of Indo-European linguistics today, the comparative etymology of the Indo-European languages. The ten new Indo-European etymologies for Old Gaulish presented are: 1. OGaul. asia- ‘secale’ : Lith. asỹ- ‘Schachtel-, Schafthalm’; 2. OGaul. nemnali- ‘célébrer’ : RV. námna- ‘sich beugen/neigen’; 3. OGaul. mapalia- ‘kindlich’ : TochA. mkälto- ‘jung, klein’; 4. OGaul. mas ‘gl. metallum’ : TochA. msāṣ ‘imo : from beneath’; 5. OGaul. cunobarro- ‘Tête-de-Chien’ : CLu. paraia- ‘hoch’; 6. OGaul. marco- ‘horse’ : TochA. markä- ‘move’; 7. OGaul. slēbino- ‘montanus’ : TochB. ṣale ‘mountain, hill’; 8. OGaul. cobro- ‘love, desire, greed’ : TochB. kakāpo- ‘desire, crave, want’; 9. OGaul. mallo- ‘langsam, träge’ : TochB. mālle ‘dull’; 10. OGaul. bilio- ‘Baum’ : TochB. pilta- ‘leaf, petal’.

    A Dependency Parsing Approach to Biomedical Text Mining

    Biomedical research is currently facing a new type of challenge: an excess of information, both in terms of raw data from experiments and in the number of scientific publications describing their results. Mirroring the focus on data mining techniques to address the issues of structured data, there has recently been great interest in the development and application of text mining techniques to make more effective use of the knowledge contained in biomedical scientific publications, accessible only in the form of natural human language. This thesis describes research done in the broader scope of projects aiming to develop methods, tools and techniques for text mining tasks in general and for the biomedical domain in particular. The work described here involves more specifically the goal of extracting information from statements concerning relations of biomedical entities, such as protein-protein interactions. The approach taken is one using full parsing (syntactic analysis of the entire structure of sentences) and machine learning, aiming to develop reliable methods that can further be generalized to apply also to other domains. The five papers at the core of this thesis describe research on a number of distinct but related topics in text mining. In the first of these studies, we assessed the applicability of two popular general English parsers to biomedical text mining and, finding their performance limited, identified several specific challenges to accurate parsing of domain text. In a follow-up study focusing on parsing issues related to specialized domain terminology, we evaluated three lexical adaptation methods. We found that the accurate resolution of unknown words can considerably improve parsing performance and introduced a domain-adapted parser that reduced the error rate of the original by 10% while also roughly halving parsing time.
To establish the relative merits of parsers that differ in the applied formalisms and the representation given to their syntactic analyses, we have also developed evaluation methodology, considering different approaches to establishing comparable dependency-based evaluation results. We introduced a methodology for creating highly accurate conversions between different parse representations, demonstrating the feasibility of unification of diverse syntactic schemes under a shared, application-oriented representation. In addition to allowing formalism-neutral evaluation, we argue that such unification can also increase the value of parsers for domain text mining. As a further step in this direction, we analysed the characteristics of publicly available biomedical corpora annotated for protein-protein interactions and created tools for converting them into a shared form, thus contributing also to the unification of text mining resources. The introduced unified corpora allowed us to perform a task-oriented comparative evaluation of biomedical text mining corpora. This evaluation established clear limits on the comparability of results for text mining methods evaluated on different resources, prompting further efforts toward standardization. To support this and other research, we have also designed and annotated BioInfer, the first domain corpus of its size combining annotation of syntax and biomedical entities with a detailed annotation of their relationships. The corpus represents a major design and development effort of the research group, with manual annotation that identifies over 6000 entities, 2500 relationships and 28,000 syntactic dependencies in 1100 sentences. In addition to combining these key annotations for a single set of sentences, BioInfer was also the first domain resource to introduce a representation of entity relations that is supported by ontologies and able to capture complex, structured relationships.
Part I of this thesis presents a summary of this research in the broader context of a text mining system, and Part II contains reprints of the five included publications.

    Ten new Indo-European etymologies for the Celtic languages

    This paper presents ten new etymologies between the Celtic and the Indo-European languages in a contribution to the reconstruction of the Proto-Indo-European parent language. The items compared are: 1. OIr. oenach- ‘an injury/wound’ : OSax. ēndago- ‘day of death’, Hitt. ḫingan- ‘Seuche, Pest, Todesfall’; 2. OIr. airecht- ‘assembly, meeting, conversation’ : LAv. vyāxa- ‘Versammlung’; 3. OIr. cumachtae- ‘pouvoir, puissance’ : TochB. ekaññe- ‘possession, equipment’, AV. aṣṭi- ‘Erreichung’; 4. OIr. ás- ‘croissance, fait de grandir/grossir’ : Maced. ἄξο- ‘ὑλή’; 5. OBret. iolent ‘precentur’ : Lat. hariolā- ‘wahrsagen’; 6. MidIr. cīch- (f.) ‘weibliche Brust’ : RV. kkasā- ‘Brust·bein’; 7. OIr. nái- ‘human being, person’ : TochA. napen- ‘Mensch’; 8. OIr. tol- ‘Wille’ : RV. turá- ‘Willfährig’; 9. OIr. nūadat- ‘hand, wrist or arm’ : RV. nodh- ‘Elefant’; 10. OIr. aiged ‘visage’ : OHG. agsiunî- ‘species : Aussehen, Angesicht’.

    A Minor Sound Law for Celtic: PIE *VNHK → OIr. Vcc : OCymr. Vnc


    New Resources and Perspectives for Biomedical Event Extraction

    Event extraction is a major focus of recent work in biomedical information extraction. Despite substantial advances, many challenges still remain for reliable automatic extraction of events from text. We introduce a new biomedical event extraction resource consisting of analyses automatically created by systems participating in the recent BioNLP Shared Task (ST) 2011. In providing for the first time the outputs of a broad set of state-of-the-art event extraction systems, this resource opens many new opportunities for studying aspects of event extraction, from the identification of common errors to the study of effective approaches to combining the strengths of systems. We demonstrate these opportunities through a multi-system analysis on three BioNLP ST 2011 main tasks, focusing on events that none of the systems can successfully extract. We further argue for new perspectives to the performance evaluation of domain event extraction systems, considering a document-level, “off-the-page” representation and evaluation to complement the mention-level evaluations pursued in most recent work.

    Visualization of uncertain catchment boundaries and its influence on decision making

    Papers, communications and posters presented at the 17th AGILE Conference on Geographic Information Science "Connecting a Digital Europe through Location and Place", held at the Universitat Jaume I from 3 to 6 June 2014. In this poster, we introduce an ongoing project in which uncertainty-aware drainage divides were calculated, visualized, and tested as background data for the decision-making process.