Lexical Adaptation of Link Grammar to the Biomedical Sublanguage: a Comparative Evaluation of Three Approaches
We study the adaptation of Link Grammar Parser to the biomedical sublanguage
with a focus on domain terms not found in a general parser lexicon. Using two
biomedical corpora, we implement and evaluate three approaches to addressing
unknown words: automatic lexicon expansion, the use of morphological clues, and
disambiguation using a part-of-speech tagger. We evaluate each approach
separately for its effect on parsing performance and consider combinations of
these approaches. In addition to a 45% increase in parsing efficiency, we find
that the best approach, incorporating information from a domain part-of-speech
tagger, offers a statistically significant 10% relative decrease in error. The
adapted parser is available under an open-source license at
http://www.it.utu.fi/biolg
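The morphological-clue approach to unknown words can be illustrated with a small sketch. It assumes a hypothetical table mapping word suffixes to coarse word classes. The suffixes, the class names and the function `guess_category` are illustrative only, not the adapted parser's actual rules or lexicon. A known word takes its lexicon entry. An unknown word takes the class of its longest matching suffix, or a default class when nothing matches:

```python
# Hypothetical suffix -> coarse word-class table; illustrative only,
# not the rules actually used by the adapted Link Grammar parser.
SUFFIX_CLASSES = {
    "ation": "noun",      # e.g. "phosphorylation"
    "ates": "verb",       # e.g. "phosphorylates"
    "ated": "verb",       # e.g. "phosphorylated"
    "ase": "noun",        # enzyme names, e.g. "kinase"
    "in": "noun",         # many protein names, e.g. "actin"
    "ic": "adjective",    # e.g. "cytoplasmic"
    "al": "adjective",    # e.g. "transcriptional"
}


def guess_category(word: str, lexicon: dict[str, str], default: str = "noun") -> str:
    """Return the lexicon class of a known word; otherwise guess it from the
    longest matching suffix, falling back to `default`."""
    lowered = word.lower()
    if lowered in lexicon:
        return lexicon[lowered]
    # Check longer suffixes first so that, e.g., "ation" is preferred over shorter ones.
    for suffix in sorted(SUFFIX_CLASSES, key=len, reverse=True):
        if lowered.endswith(suffix) and len(lowered) > len(suffix):
            return SUFFIX_CLASSES[suffix]
    return default


if __name__ == "__main__":
    lexicon = {"binds": "verb", "the": "determiner"}
    for w in ["binds", "kinase", "phosphorylates", "cytoplasmic", "IL-2"]:
        print(w, guess_category(w, lexicon))
```

In the tagger-based approach, the suffix guess would instead be replaced or constrained by the tag that a domain part-of-speech tagger assigns to the word in context. The fallback default is a design assumption here.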
Ten New Etymologies between Old Gaulish and the Indo-European Languages
This paper belongs to a series of articles designed to contribute to the solution of one of the central problems of Indo-European linguistics today, the comparative etymology of the Indo-European languages. The ten new Indo-European etymologies for Old Gaulish presented are: 1. OGaul. asia- ‘secale’ : Lith. asỹ- ‘Schachtel-, Schafthalm’; 2. OGaul. nemnali- ‘célébrer’ : RV. námna- ‘sich beugen/neigen’; 3. OGaul. mapalia- ‘kindlich’ : TochA. mkälto- ‘jung, klein’; 4. OGaul. mas ‘gl. metallum’ : TochA. msāṣ ‘imo : from beneath’; 5. OGaul. cunobarro- ‘Tête-de-Chien’ : CLu. paraia- ‘hoch’; 6. OGaul. marco- ‘horse’ : TochA. markä- ‘move’; 7. OGaul. slēbino- ‘montanus’ : TochB. ṣale ‘mountain, hill’; 8. OGaul. cobro- ‘love, desire, greed’ : TochB. kakāpo- ‘desire, crave, want’; 9. OGaul. mallo- ‘langsam, träge’ : TochB. mālle ‘dull’; 10. OGaul. bilio- ‘Baum’ : TochB. pilta- ‘leaf, petal’
A Dependency Parsing Approach to Biomedical Text Mining
Biomedical research is currently facing a new type of challenge: an excess of information, both in terms of raw data from experiments and in the number of scientific publications describing their results. Mirroring the focus on data mining techniques to address the issues of structured data, there has recently been great interest in the development and application of text mining techniques to make more effective use of the knowledge contained in biomedical scientific publications, accessible only in the form of natural human language.
This thesis describes research done in the broader scope of projects aiming to develop methods, tools and techniques for text mining tasks in general and for the biomedical domain in particular. More specifically, the work described here addresses the goal of extracting information from statements concerning relations of biomedical entities, such as protein-protein interactions. The approach taken combines full parsing (syntactic analysis of the entire structure of sentences) with machine learning, aiming to develop reliable methods that can also be generalized to other domains.
The five papers at the core of this thesis describe research on a number of distinct but related topics in text mining. In the first of these studies, we assessed the applicability of two popular general English parsers to biomedical text mining and, finding their performance limited, identified several specific challenges to accurate parsing of domain text. In a follow-up study focusing on parsing issues related to specialized domain terminology, we evaluated three lexical adaptation methods. We found that the accurate resolution of unknown words can considerably improve parsing performance and introduced a domain-adapted parser that reduced the error rate of the original by 10% while also roughly halving parsing time.
To establish the relative merits of parsers that differ in the applied formalisms and the representation given to their syntactic analyses, we have also developed an evaluation methodology, considering different approaches to establishing comparable dependency-based evaluation results. We introduced a methodology for creating highly accurate conversions between different parse representations, demonstrating the feasibility of unifying diverse syntactic schemes under a shared, application-oriented representation. In addition to allowing formalism-neutral evaluation, we argue that such unification can also increase the value of parsers for domain text mining. As a further step in this direction, we analysed the characteristics of publicly available biomedical corpora annotated for protein-protein interactions and created tools for converting them into a shared form, thus also contributing to the unification of text mining resources. The introduced unified corpora allowed us to perform a task-oriented comparative evaluation of biomedical text mining corpora. This evaluation established clear limits on the comparability of results for text mining methods evaluated on different resources, prompting further efforts toward standardization.
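Once parser output and gold annotation have been converted into a shared dependency representation, a formalism-neutral comparison reduces to measuring set overlap. The following is a minimal sketch of such scoring. It assumes a hypothetical `Dep` record of head index, dependent index and label, scored with precision, recall and F1, either labeled or unlabeled. The conversion step itself is not shown, and `Dep`, `prf` and the example labels are illustrative names, not the thesis's tooling:

```python
from typing import Iterable, NamedTuple


class Dep(NamedTuple):
    """One dependency in a shared (hypothetical) representation."""
    head: int        # token index of the governor
    dependent: int   # token index of the dependent
    label: str       # dependency type in the shared scheme


def _key(dep: Dep, labeled: bool) -> tuple:
    # Unlabeled scoring ignores the dependency type and compares attachments only.
    return (dep.head, dep.dependent, dep.label) if labeled else (dep.head, dep.dependent)


def prf(gold: Iterable[Dep], predicted: Iterable[Dep],
        labeled: bool = True) -> tuple[float, float, float]:
    """Precision, recall and F1 of predicted dependencies against gold ones."""
    gold_keys = {_key(d, labeled) for d in gold}
    pred_keys = {_key(d, labeled) for d in predicted}
    correct = len(gold_keys & pred_keys)
    precision = correct / len(pred_keys) if pred_keys else 0.0
    recall = correct / len(gold_keys) if gold_keys else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = {Dep(2, 1, "nsubj"), Dep(2, 4, "dobj"), Dep(4, 3, "amod")}
    pred = {Dep(2, 1, "nsubj"), Dep(2, 4, "prep"), Dep(2, 3, "amod")}
    print(prf(gold, pred))                 # labeled scores
    print(prf(gold, pred, labeled=False))  # attachment-only scores
```

In this toy example, one of the three labeled dependencies matches (F1 = 1/3), while two of the three attachments match when labels are ignored (F1 = 2/3). This shows how the choice of representation and labeling affects comparability.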
To support this and other research, we have also designed and annotated BioInfer, the first domain corpus of its size combining annotation of syntax and biomedical entities with a detailed annotation of their relationships. The corpus represents a major design and development effort of the research group, with manual annotation that identifies over 6000 entities, 2500 relationships and 28,000 syntactic dependencies in 1100 sentences. In addition to combining these key annotations for a single set of sentences, BioInfer was also the first domain resource to introduce a representation of entity relations that is supported by ontologies and able to capture complex, structured relationships.
Part I of this thesis presents a summary of this research in the broader context of a text mining system, and Part II contains reprints of the five included publications.
Ten new Indo-European etymologies for the Celtic languages
This paper presents ten new etymologies between the Celtic and the Indo-European languages as a contribution to the reconstruction of the Proto-Indo-European parent language. The items compared are: 1. OIr. oenach- ‘an injury/wound’ : OSax. ēndago- ‘day of death’, Hitt. ḍingan- ‘Seuche, Pest, Todesfall’; 2. OIr. airecht- ‘assembly, meeting, conversation’ : LAv. vyāxa- ‘Versammlung’; 3. OIr. cumachtae- ‘pouvoir, puissance’ : TochB. ekaññe- ‘possession, equipment’, AV. aṣṭi- ‘Erreichung’; 4. OIr. ás- ‘croissance, fait de grandir/grossir’ : Maced. ἄξο- ‘ὕλη’; 5. OBret. iolent ‘precentur’ : Lat. hariolā- ‘wahrsagen’; 6. MidIr. cīch- (f.) ‘weibliche Brust’ : RV. kkasā- ‘Brustbein’; 7. OIr. nái- ‘human being, person’ : TochA. napen- ‘Mensch’; 8. OIr. tol- ‘Wille’ : RV. turá- ‘Willfährig’; 9. OIr. nūadat- ‘hand, wrist or arm’ : RV. nodh- ‘Elefant’; 10. OIr. aiged ‘visage’ : OHG. agsiunî- ‘species : Aussehen, Angesicht’
New Resources and Perspectives for Biomedical Event Extraction
Event extraction is a major focus of recent work in biomedical information extraction. Despite substantial advances, many challenges still remain for reliable automatic extraction of events from text. We introduce a new biomedical event extraction resource consisting of analyses automatically created by systems participating in the recent BioNLP Shared Task (ST) 2011. By providing for the first time the outputs of a broad set of state-of-the-art event extraction systems, this resource opens many new opportunities for studying aspects of event extraction, from the identification of common errors to the study of effective approaches to combining the strengths of systems. We demonstrate these opportunities through a multi-system analysis of three BioNLP ST 2011 main tasks, focusing on events that none of the systems can successfully extract. We further argue for new perspectives on the performance evaluation of domain event extraction systems, considering a document-level, “off-the-page” representation and evaluation to complement the mention-level evaluations pursued in most recent work.
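The difference between mention-level and document-level ("off-the-page") evaluation can be made concrete with a small sketch. The `EventMention` record below is hypothetical, not the BioNLP ST file format. It assumes arguments are given as sorted (role, normalized entity id) pairs, where an id is shared by all mentions of the same entity. Mention-level matching on exact trigger offsets is a simplification. At the mention level, every occurrence is scored separately. At the document level, each distinct (type, arguments) fact counts once per document:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class EventMention:
    """A hypothetical event record; not the BioNLP ST annotation format."""
    event_type: str                         # e.g. "Phosphorylation"
    trigger_span: tuple[int, int]           # character offsets of the trigger word
    arguments: tuple[tuple[str, str], ...]  # sorted (role, normalized entity id) pairs


def mention_level(events: Iterable[EventMention]) -> set:
    """Every occurrence counts separately: trigger offsets are part of the key."""
    return {(e.event_type, e.trigger_span, e.arguments) for e in events}


def document_level(events: Iterable[EventMention]) -> set:
    """Off-the-page view: each distinct (type, arguments) fact counts once."""
    return {(e.event_type, e.arguments) for e in events}


def f1(gold: set, predicted: set) -> float:
    """F1 score of predicted items against gold items (exact set matching)."""
    correct = len(gold & predicted)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    theme = (("Theme", "P1"),)
    gold = [EventMention("Phosphorylation", (10, 25), theme),
            EventMention("Phosphorylation", (80, 95), theme)]
    pred = [EventMention("Phosphorylation", (80, 95), theme)]
    print("mention-level F1:", f1(mention_level(gold), mention_level(pred)))
    print("document-level F1:", f1(document_level(gold), document_level(pred)))
```

Here the system finds only one of the two gold mentions of the same fact. The mention-level F1 is therefore 2/3, while the document-level F1 is 1.0 because the single distinct fact is recovered.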
Visualization of uncertain catchment boundaries and its influence on decision making
Papers, communications and posters presented at the 17th AGILE Conference on Geographic Information Science
"Connecting a Digital Europe through Location and Place", held at Universitat Jaume I from 3 to 6 June 2014. In this poster, we introduce an ongoing project in which uncertainty-aware drainage divides were calculated, visualized, and tested as background data for the decision-making process.
- …