50,390 research outputs found

    Cell line name recognition in support of the identification of synthetic lethality in cancer from text

    Get PDF
    Motivation: The recognition and normalization of cell line names in text is an important task in biomedical text mining research, facilitating for instance the identification of synthetically lethal genes from the literature. While several tools have previously been developed to address cell line recognition, it is unclear whether available systems can perform sufficiently well in realistic and broad-coverage applications such as extracting synthetically lethal genes from the cancer literature. In this study, we revisit the cell line name recognition task, evaluating both available systems and newly introduced methods on various resources to obtain a reliable tagger not tied to any specific subdomain. In support of this task, we introduce two text collections manually annotated for cell line names: the broad-coverage corpus Gellus and CLL, a focused target domain corpus. Results: We find that the best performance is achieved using NERsuite, a machine learning system based on Conditional Random Fields, trained on the Gellus corpus and supported with a dictionary of cell line names. The system achieves an F-score of 88.46% on the test set of Gellus and 85.98% on the independently annotated CLL corpus. It was further applied at large scale to 24 302 102 unannotated articles, resulting in the identification of 5 181 342 cell line mentions, normalized to 11 755 unique cell line database identifiers

    TLAD 2010 Proceedings:8th international workshop on teaching, learning and assesment of databases (TLAD)

    Get PDF
    This is the eighth in the series of highly successful international workshops on the Teaching, Learning and Assessment of Databases (TLAD 2010), which once again is held as a workshop of BNCOD 2010 - the 27th International Information Systems Conference. TLAD 2010 is held on the 28th June at the beautiful Dudhope Castle at the Abertay University, just before BNCOD, and hopes to be just as successful as its predecessors.The teaching of databases is central to all Computing Science, Software Engineering, Information Systems and Information Technology courses, and this year, the workshop aims to continue the tradition of bringing together both database teachers and researchers, in order to share good learning, teaching and assessment practice and experience, and further the growing community amongst database academics. As well as attracting academics from the UK community, the workshop has also been successful in attracting academics from the wider international community, through serving on the programme committee, and attending and presenting papers.This year, the workshop includes an invited talk given by Richard Cooper (of the University of Glasgow) who will present a discussion and some results from the Database Disciplinary Commons which was held in the UK over the academic year. Due to the healthy number of high quality submissions this year, the workshop will also present seven peer reviewed papers, and six refereed poster papers. Of the seven presented papers, three will be presented as full papers and four as short papers. These papers and posters cover a number of themes, including: approaches to teaching databases, e.g. group centered and problem based learning; use of novel case studies, e.g. forensics and XML data; techniques and approaches for improving teaching and student learning processes; assessment techniques, e.g. peer review; methods for improving students abilities to develop database queries and develop E-R diagrams; and e-learning platforms for supporting teaching and learning

    TLAD 2010 Proceedings:8th international workshop on teaching, learning and assesment of databases (TLAD)

    Get PDF
    This is the eighth in the series of highly successful international workshops on the Teaching, Learning and Assessment of Databases (TLAD 2010), which once again is held as a workshop of BNCOD 2010 - the 27th International Information Systems Conference. TLAD 2010 is held on the 28th June at the beautiful Dudhope Castle at the Abertay University, just before BNCOD, and hopes to be just as successful as its predecessors.The teaching of databases is central to all Computing Science, Software Engineering, Information Systems and Information Technology courses, and this year, the workshop aims to continue the tradition of bringing together both database teachers and researchers, in order to share good learning, teaching and assessment practice and experience, and further the growing community amongst database academics. As well as attracting academics from the UK community, the workshop has also been successful in attracting academics from the wider international community, through serving on the programme committee, and attending and presenting papers.This year, the workshop includes an invited talk given by Richard Cooper (of the University of Glasgow) who will present a discussion and some results from the Database Disciplinary Commons which was held in the UK over the academic year. Due to the healthy number of high quality submissions this year, the workshop will also present seven peer reviewed papers, and six refereed poster papers. Of the seven presented papers, three will be presented as full papers and four as short papers. These papers and posters cover a number of themes, including: approaches to teaching databases, e.g. group centered and problem based learning; use of novel case studies, e.g. forensics and XML data; techniques and approaches for improving teaching and student learning processes; assessment techniques, e.g. peer review; methods for improving students abilities to develop database queries and develop E-R diagrams; and e-learning platforms for supporting teaching and learning

    Англійська мова для навчання і работи. Навчальний посібник з англійської мови за професійним спрямуванням для студентів і фахівців галузі знань 0503 Розробка корисних копалин Т 1

    Get PDF
    A coursebook includes all the activities of students’ work at ESP course aimed at development of language behaviour necessary for effective communication of students in their study and specialism areas. The tasks and activities given in the coursebook are typicalfor students’ academic and professional domains and situations. The content is organized in modules that covers generic job-related language skills of engineers. The authentic texts taken from real life contain interesting up-to-date information about mining, peculiarities of study abroad, customs and traditions of English-speaking countries. Pack of self-study resources given in Part II contains Glossary of mining terms, tasks and activities aimed at developing a range of vocabulary necessary for mining, different functions and functional exponents to be used in academic and professional environment as well as tasks developing self-awareness, self-assessment and self-organisation skills. Testing points for different grammar structuresare given in Part III. Indices at the end of each part easify the use of the coursebook. The coursebook contains illustrations, various samples of visualizing technical information. The coursebook is designed for ESP students of non-linguistic universities. It can be used as teaching/learning materials for ESP Courses for Mining Engineers as well as for self-study of subject and specialist teachers, practicing mining engineers and researchers in Engineering.У посібнику представлені всі види діяльності студентів з вивчення англійської мови, спрямовані на розвиток мовної поведінки, необхідної для ефективного спілкування в академічному та професійному середовищах. Навчальний посібник містить завдання і вправи, типові для різноманітних академічних та професійних сфер і ситуацій. Структура організації змісту– модульна і охоплює загальні мовленнєві вміння інженерів. Зразки текстів– автентичні, взяті з реального життя, містять цікаву та актуальну інформацію про видобувничу промисловість, особливості навчання за кордоном, традиції та звичаї країн, мова яких вивчається. Ресурси для самостійної роботи(Том ІІ) містять глосарій термінів, завдання та вправи для розвитку словарного запасу та розширення діапазону функціональних зразків, необхідних для виконання певних функцій, та завдання, які спрямовані на розвиток навичок самооцінювання і організації свого навчання. Граматичні явища і вправи для їх засвоєння наводяться в томі ІІІ. Наприкінці кожної частини наведено алфавітно-предметні покажчики. Багато ілюстрацій та різних візуальних засобів подання інформації. Навчальний посібник призначений для студентів технічних університетів гірничого профілю. Може використовуватися для самостійного вивчення англійської мови викладачами, фахівцями і науковцями різних інженерних галузей

    Semi-automated curation of protein subcellular localization: a text mining-based approach to Gene Ontology (GO) Cellular Component curation

    Get PDF
    Background: Manual curation of experimental data from the biomedical literature is an expensive and time-consuming endeavor. Nevertheless, most biological knowledge bases still rely heavily on manual curation for data extraction and entry. Text mining software that can semi- or fully automate information retrieval from the literature would thus provide a significant boost to manual curation efforts. Results: We employ the Textpresso category-based information retrieval and extraction system http://www.textpresso.org webcite, developed by WormBase to explore how Textpresso might improve the efficiency with which we manually curate C. elegans proteins to the Gene Ontology's Cellular Component Ontology. Using a training set of sentences that describe results of localization experiments in the published literature, we generated three new curation task-specific categories (Cellular Components, Assay Terms, and Verbs) containing words and phrases associated with reports of experimentally determined subcellular localization. We compared the results of manual curation to that of Textpresso queries that searched the full text of articles for sentences containing terms from each of the three new categories plus the name of a previously uncurated C. elegans protein, and found that Textpresso searches identified curatable papers with recall and precision rates of 79.1% and 61.8%, respectively (F-score of 69.5%), when compared to manual curation. Within those documents, Textpresso identified relevant sentences with recall and precision rates of 30.3% and 80.1% (F-score of 44.0%). From returned sentences, curators were able to make 66.2% of all possible experimentally supported GO Cellular Component annotations with 97.3% precision (F-score of 78.8%). Measuring the relative efficiencies of Textpresso-based versus manual curation we find that Textpresso has the potential to increase curation efficiency by at least 8-fold, and perhaps as much as 15-fold, given differences in individual curatorial speed. Conclusion: Textpresso is an effective tool for improving the efficiency of manual, experimentally based curation. Incorporating a Textpresso-based Cellular Component curation pipeline at WormBase has allowed us to transition from strictly manual curation of this data type to a more efficient pipeline of computer-assisted validation. Continued development of curation task-specific Textpresso categories will provide an invaluable resource for genomics databases that rely heavily on manual curation

    Buzz monitoring in word space

    Get PDF
    This paper discusses the task of tracking mentions of some topically interesting textual entity from a continuously and dynamically changing flow of text, such as a news feed, the output from an Internet crawler or a similar text source - a task sometimes referred to as buzz monitoring. Standard approaches from the field of information access for identifying salient textual entities are reviewed, and it is argued that the dynamics of buzz monitoring calls for more accomplished analysis mechanisms than the typical text analysis tools provide today. The notion of word space is introduced, and it is argued that word spaces can be used to select the most salient markers for topicality, find associations those observations engender, and that they constitute an attractive foundation for building a representation well suited for the tracking and monitoring of mentions of the entity under consideration
    corecore