    Using Text Analysis Tools to Improve Reference FAQs

    A team of WSU reference librarians regularly reviews email reference questions and creates online FAQs as an aid to patrons and librarians. In the current project, text analysis tools were used to supplement the traditional process in an attempt to better understand the frequency and context of email reference queries. The presentation provides information on the text analysis tools used in the project, and presents several Q&A pairs developed using this process.

    Annotating patient clinical records with syntactic chunks and named entities: the Harvey corpus

    The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult for existing natural language analysis tools to process because they are highly telegraphic (omitting many words) and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.

    Buzz monitoring in word space

    This paper discusses the task of tracking mentions of some topically interesting textual entity in a continuously and dynamically changing flow of text, such as a news feed, the output from an Internet crawler or a similar text source - a task sometimes referred to as buzz monitoring. Standard approaches from the field of information access for identifying salient textual entities are reviewed, and it is argued that the dynamics of buzz monitoring call for more accomplished analysis mechanisms than the typical text analysis tools provide today. The notion of word space is introduced, and it is argued that word spaces can be used to select the most salient markers for topicality and to find the associations those observations engender, and that they constitute an attractive foundation for building a representation well suited for the tracking and monitoring of mentions of the entity under consideration.

    Chunking clinical text containing non-canonical language

    Free text notes typed by primary care physicians during patient consultations typically contain highly non-canonical language. Shallow syntactic analysis of free text notes can help to reveal valuable information for the study of disease and treatment. We present an exploratory study into chunking such text using off-the-shelf language processing tools and pre-trained statistical models. We evaluate chunking accuracy with respect to part-of-speech tagging quality, choice of chunk representation, and breadth of context features. Our results indicate that narrow context feature windows give the best results, but that chunk representation and minor differences in tagging quality do not have a significant impact on chunking accuracy.

    Visualisation of semantic enrichment

    Automatically creating semantic enrichments for text may lead to annotations that allow for excellent recall but poor precision. Manual enrichment is potentially more targeted, leading to greater precision. We aim to support non-experts in manually enriching texts with semantic annotations. Neither the visualisation of semantic enrichment nor the process of manually enriching texts has been evaluated before. This paper presents the results of our user study on visualisation of text enrichment during the annotation process. We performed extensive analysis of work related to the visualisation of semantic annotations. In a prototype implementation, we then explored two layout alternatives for visualising semantic annotations and their linkage to the text atoms. Here we summarise and discuss our results and their design implications for tools creating semantic annotations.

    Pathway Tools version 23.0: Integrated Software for Pathway/Genome Informatics and Systems Biology

    Pathway Tools is a bioinformatics software environment with a broad set of capabilities. The software provides genome-informatics tools such as a genome browser, sequence alignments, a genome-variant analyzer, and comparative-genomics operations. It offers metabolic-informatics tools, such as metabolic reconstruction, quantitative metabolic modeling, prediction of reaction atom mappings, and metabolic route search. Pathway Tools also provides regulatory-informatics tools, such as the ability to represent and visualize a wide range of regulatory interactions. The software creates and manages a type of organism-specific database called a Pathway/Genome Database (PGDB), which database curators can edit interactively. It supports web publishing of PGDBs and provides a large number of query, visualization, and omics-data analysis tools. Scientists around the world have created more than 9,800 PGDBs by using Pathway Tools, many of which are curated databases for important model organisms. Those PGDBs can be exchanged using a peer-to-peer database-sharing system called the PGDB Registry. Comment: Reflects Pathway Tools version 23.0 in 2019; new information since the previous version is in blue text. 111 pages, 40 figures.