1,063 research outputs found

    Automatic extraction of candidate nomenclature terms using the doublet method

    BACKGROUND: New terminology continuously enters the biomedical literature. How can curators identify new terms that can be added to existing nomenclatures? The most direct method, and one that has served well, involves reading the current literature. The scholarly curator adds new terms as they are encountered. Present-day scholars are severely challenged by the enormous volume of biomedical literature. Curators of medical nomenclatures need computational assistance if they hope to keep their terminologies current. The purpose of this paper is to describe a method of rapidly extracting new, candidate terms from huge volumes of biomedical text. The resulting lists of terms can be quickly reviewed by curators and added to nomenclatures, if appropriate. The candidate term extractor uses a variation of the previously described doublet coding method. The algorithm, which operates on virtually any nomenclature, derives from the observation that most terms within a knowledge domain are composed entirely of word combinations found in other terms from the same knowledge domain. Terms can be expressed as sequences of overlapping word doublets that have more specific meaning than the individual words that compose the term. The algorithm parses through text, finding contiguous sequences of word doublets that are known to occur somewhere in the reference nomenclature. When a sequence of matching word doublets is encountered, it is compared with whole terms already included in the nomenclature. If the doublet sequence is not already in the nomenclature, it is extracted as a candidate new term. Candidate new terms can be reviewed by a curator to determine if they should be added to the nomenclature. An implementation of the algorithm is demonstrated, using a corpus of published abstracts obtained through the National Library of Medicine's PubMed query service and using "The developmental lineage classification and taxonomy of neoplasms" as a reference nomenclature. 
RESULTS: A 31+ megabyte corpus of pathology journal abstracts was parsed using the doublet extraction method. This corpus consisted of 4,289 records, each containing an abstract title. The total number of words included in the abstract titles was 50,547. New candidate terms for the nomenclature were automatically extracted from the titles of abstracts in the corpus. Total execution time on a desktop computer with CPU speed of 2.79 GHz was 2 seconds. The resulting output consisted of 313 new candidate terms, each consisting of concatenated doublets found in the reference nomenclature. Human review of the 313 candidate terms yielded a list of 285 terms approved by a curator. Automatic removal of duplicate terms then yielded a final list of 222 new terms (71% of the original 313 extracted candidate terms) that could be added to the reference nomenclature. CONCLUSION: The doublet method for automatically extracting candidate nomenclature terms can be used to quickly find new terms from vast amounts of text. The method can be immediately adapted for virtually any text and any nomenclature. An implementation of the algorithm, in the Perl programming language, is provided with this article.
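
The doublet procedure summarized in this abstract can be sketched in a few lines. The block below is a minimal illustration in Python, not the Perl implementation that accompanies the article. The function names (`words`, `build_doublets`, `candidate_terms`), the tokenization rule, the sentence-splitting rule, and the three-term toy nomenclature are all assumptions made for this example.

```python
import re


def words(text):
    """Lowercase a string and split it into alphanumeric word tokens (assumed tokenization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_doublets(terms):
    """Collect every adjacent word pair (doublet) that occurs in any nomenclature term."""
    doublets = set()
    for term in terms:
        w = words(term)
        doublets.update(zip(w, w[1:]))
    return doublets


def candidate_terms(text, terms):
    """Return runs of overlapping known doublets that are not already whole terms."""
    known = {" ".join(words(t)) for t in terms}
    doublets = build_doublets(terms)
    candidates = []

    def flush(run):
        # Keep a finished run only if it is new to the nomenclature and not a duplicate.
        phrase = " ".join(run)
        if run and phrase not in known and phrase not in candidates:
            candidates.append(phrase)

    # Assumption: runs are not allowed to cross sentence-like boundaries.
    for sentence in re.split(r"[.;:!?\n]", text):
        w = words(sentence)
        run = []
        for a, b in zip(w, w[1:]):
            if (a, b) in doublets:
                if not run:
                    run = [a]      # start a new run at the first word of the doublet
                run.append(b)      # extend the run by the second word
            else:
                flush(run)         # the run is broken by an unknown doublet
                run = []
        flush(run)                 # flush a run that reaches the end of the sentence
    return candidates


# Toy nomenclature and text, invented for illustration only.
nomenclature = [
    "squamous cell carcinoma",
    "basal cell carcinoma of skin",
    "carcinoma of lung",
]
text = ("A case of squamous cell carcinoma of lung. "
        "Basal cell carcinoma of skin was also seen.")
result = candidate_terms(text, nomenclature)
print(result)  # ['squamous cell carcinoma of lung']
```

In this toy run, "squamous cell carcinoma of lung" is reported because every adjacent word pair in it appears somewhere in the nomenclature while the whole phrase does not. "basal cell carcinoma of skin" is skipped because it is already a listed term. As in the abstract, the output is only a list of candidates for a curator to review.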

    NOBLE - Flexible concept recognition for large-scale biomedical natural language processing

    Background: Natural language processing (NLP) applications are increasingly important in biomedical data analysis, knowledge engineering, and decision support. Concept recognition is an important component task for NLP pipelines, and can be either general-purpose or domain-specific. We describe a novel, flexible, and general-purpose concept recognition component for NLP pipelines, and compare its speed and accuracy against five commonly used alternatives on both a biological and clinical corpus. NOBLE Coder implements a general algorithm for matching terms to concepts from an arbitrary vocabulary set. The system's matching options can be configured individually or in combination to yield specific system behavior for a variety of NLP tasks. The software is open source, freely available, and easily integrated into UIMA or GATE. We benchmarked speed and accuracy of the system against the CRAFT and ShARe corpora as reference standards and compared it to MMTx, MGrep, Concept Mapper, cTAKES Dictionary Lookup Annotator, and cTAKES Fast Dictionary Lookup Annotator. Results: We describe key advantages of the NOBLE Coder system and associated tools, including its greedy algorithm, configurable matching strategies, and multiple terminology input formats. These features provide unique functionality when compared with existing alternatives, including state-of-the-art systems. On two benchmarking tasks, NOBLE's performance exceeded commonly used alternatives, performing almost as well as the most advanced systems. Error analysis revealed differences in error profiles among systems. Conclusion: NOBLE Coder is comparable to other widely used concept recognition systems in terms of accuracy and speed. Advantages of NOBLE Coder include its interactive terminology builder tool, ease of configuration, and adaptability to various domains and tasks. NOBLE provides a term-to-concept matching system suitable for general concept recognition in biomedical NLP pipelines

    Development of advanced techniques for rotorcraft state estimation and parameter identification

    An integrated methodology for rotorcraft system identification consists of rotorcraft mathematical modeling, three distinct data processing steps, and a technique for designing inputs to improve the identifiability of the data. These elements are as follows: (1) a Kalman filter smoother algorithm which estimates states and sensor errors from error-corrupted data (gust time histories and statistics may also be estimated); (2) a model structure estimation algorithm for isolating a model which adequately explains the data; (3) a maximum likelihood algorithm for estimating the parameters and the variances of those estimates; and (4) an input design algorithm, based on a maximum likelihood approach, which provides inputs to improve the accuracy of parameter estimates. Each step is discussed with examples from both flight and simulated data cases.

    Evidence for an FU Orionis-like Outburst from a Classical T Tauri Star

    We present pre- and post-outburst observations of the new FU Orionis-like young stellar object PTF 10qpf (also known as LkHα 188-G4 and HBC 722). Prior to this outburst, LkHα 188-G4 was classified as a classical T Tauri star (CTTS) on the basis of its optical emission-line spectrum superposed on a K8-type photosphere and its photometric variability. The mid-infrared spectral index of LkHα 188-G4 indicates a Class II-type object. LkHα 188-G4 exhibited a steady rise by ~1 mag over ~11 months starting in August 2009, before a subsequent more abrupt rise of >3 mag on a timescale of ~2 months. Observations taken during the eruption exhibit the defining characteristics of FU Orionis variables: (1) an increase in brightness by ≳ 4 mag, (2) a bright optical/near-infrared reflection nebula appeared, (3) optical spectra are consistent with a G supergiant and dominated by absorption lines, the only exception being Hα which is characterized by a P Cygni profile, (4) near-infrared spectra resemble those of late K-M giants/supergiants with enhanced absorption seen in the molecular bands of CO and H₂O, and (5) outflow signatures in H and He are seen in the form of blueshifted absorption profiles. LkHα 188-G4 is the first member of the FU Orionis-like class with a well-sampled optical to mid-infrared spectral energy distribution in the pre-outburst phase. The association of the PTF 10qpf outburst with the previously identified CTTS LkHα 188-G4 (HBC 722) provides strong evidence that FU Orionis-like eruptions represent periods of enhanced disk accretion and outflow, likely triggered by instabilities in the disk. The early identification of PTF 10qpf as an FU Orionis-like variable will enable detailed photometric and spectroscopic observations during its post-outburst evolution for comparison with other known outbursting objects.

    Flux analysis in central carbon metabolism in plants: 13C NMR experiments and analysis

    Metabolic flux analysis is crucial in metabolic engineering. This research concentrated on improvements in 13C labeling-based flux analysis, a powerful flux quantification method, particularly oriented toward application to plants. Furthermore, systemic 13C flux analyses were performed on two model plant systems: Glycine max (soybean) embryos, and Catharanthus roseus hairy roots. The concepts 'bond integrity', 'bondomer' and the algorithm 'Boolean function mapping' were introduced, to facilitate efficient flux evaluation from carbon bond labeling experiments, and easier flux identifiability analysis. 13C labeling experiments were performed on developing soybean (Glycine max) embryos and C. roseus hairy roots. A computer program, NMR2Flux, was developed to automatically calculate fluxes from the labeling data. This program accepts a user-defined metabolic network model, and incorporates recent mathematical advances toward accurate and efficient evaluation of fluxes and their standard deviations. Several physiological insights were obtained from the flux results. For instance, in soybean embryos, the reductive pentose phosphate pathway was active in the plastid and negligible in the cytosol. Also, unknown fluxes (such as plastidic fructose-1,6-bisphosphatase) could be identified and quantified. To the best of the author's knowledge, this is the most comprehensive flux analysis of a plant system to date. Investigations on flux identifiability were carried out for the soybean embryo system. Using these, optimal labeling experiments were designed, that utilize judicious combinations of labeled varieties of two substrates (sucrose and glutamine), to maximize the statistical quality of the evaluated fluxes. The identity of four intense peaks observed in the 2-D [13C, 1H] spectra of protein isolated from soybean embryos, was investigated. These peaks were identified as levulinic acid and 5-hydroxymethyl furfural, and were degradation products of glycosylating sugars associated with soybean embryo protein. A 2-D NMR study was conducted on them, and it was shown that the metabolic information in the degradation products can be used toward metabolic flux or pathway analysis. In addition, the elemental make-up and composition of the biomass of C. roseus hairy roots (crucial toward flux analysis) is reported. 89.2% (+/-9.7%) of the biomass was accounted for. This dissertation is a compound document (contains both a paper copy and a CD as part of the dissertation). The CD requires the following system requirements: Adobe Acrobat.

    A feasibility study of orbiter flight control experiments

    The results of a feasibility study of orbiter flight control experiments are summarized. Feasibility studies were performed on a group of 14 experiments selected from a candidate list of 35 submitted to the study contractor by the flight control community. Concepts and requirements were developed for the 14 selected experiments, and they were ranked on the basis of technical value, feasibility, and cost. It was concluded that all the selected experiments can be considered as potential candidates for the Orbiter Experiment program, which is being formulated for the Orbiter Flight Tests and subsequent operational flights, regardless of the relative ranking established during the study. None of the selected experiments has significant safety implications, and the cost of most was estimated to be less than $200K.

    The host-galaxy response to the afterglow of GRB 100901A

    For Gamma-Ray Burst 100901A, we have obtained Gemini-North and Very Large Telescope optical afterglow spectra at four epochs: one hour, one day, three days and one week after the burst, thanks to the afterglow remaining unusually bright at late times. Apart from a wealth of metal resonance lines, we also detect lines arising from fine-structure levels of the ground state of Fe II, and from metastable levels of Fe II and Ni II at the host redshift (z = 1.4084). These lines are found to vary significantly in time. The combination of the data and modelling results shows that we detect the fall of the Ni II ⁴F₉/₂ metastable level population, which to date has not been observed. Assuming that the population of the excited states is due to the UV-radiation of the afterglow, we estimate an absorber distance of a few hundred pc. This appears to be a typical value when compared to similar studies. We detect two intervening absorbers (z = 1.3147, 1.3179). Despite the wide temporal range of the data, we do not see significant variation in the absorption lines of these two intervening systems. Comment: 17 pages, 9 figures. Accepted by Monthly Notices of the Royal Astronomical Society on Jan 11th 201