Using Text Analysis Tools to Improve Reference FAQs
A team of WSU reference librarians regularly reviews email reference questions and creates online FAQs as an aid to patrons and librarians. In the current project, text analysis tools were used to supplement the traditional process in an attempt to better understand the frequency and context of email reference queries. The presentation provides information on the text analysis tools used in the project and presents several Q&A pairs developed using this process.
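The kind of frequency analysis described above can be sketched with standard-library Python. The sample queries and stopword list below are invented for illustration, not drawn from the WSU corpus:

```python
from collections import Counter
import re

# Hypothetical email reference queries; the actual WSU corpus is not public.
queries = [
    "How do I renew a book online?",
    "Can I renew my books from home?",
    "Where do I find course reserves?",
    "How do I access course reserves online?",
]

# A minimal stopword list, just enough for this toy example.
STOPWORDS = {"how", "do", "i", "a", "my", "from", "can", "where", "the"}

def frequent_terms(texts, top_n=5):
    """Count non-stopword terms across queries to surface FAQ candidates."""
    tokens = []
    for text in texts:
        tokens += [t for t in re.findall(r"[a-z]+", text.lower())
                   if t not in STOPWORDS]
    return Counter(tokens).most_common(top_n)

print(frequent_terms(queries))
```

Terms that recur across many queries (here "renew" and "reserves") point to candidate FAQ topics; a real workflow would then review the matching queries in context before drafting a Q&A pair.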
Annotating patient clinical records with syntactic chunks and named entities: the Harvey corpus
The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult for existing natural language analysis tools to process, since they are highly telegraphic (omitting many words) and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.
Buzz monitoring in word space
This paper discusses the task of tracking mentions of some topically interesting textual entity in a continuously and dynamically changing flow of text, such as a news feed, the output of an Internet crawler, or a similar text source - a task sometimes referred to as buzz monitoring. Standard approaches from the field of information access for identifying salient textual entities are reviewed, and it is argued that the dynamics of buzz monitoring call for more sophisticated analysis mechanisms than typical text analysis tools provide today. The notion of word space is introduced, and it is argued that word spaces can be used to select the most salient markers of topicality, to find the associations those observations engender, and to build a representation well suited to tracking and monitoring mentions of the entity under consideration.
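In its simplest form, a word space is a co-occurrence matrix in which each word is represented by a vector of its neighbours, and association is measured by vector similarity. The toy feed and entity name below are hypothetical, and a production buzz monitor would use far richer models, but the sketch shows the basic mechanism:

```python
from collections import defaultdict
import math

# Toy corpus standing in for a news feed; "acme" is an invented entity.
sentences = [
    "acme launches new phone",
    "acme phone sales surge",
    "rival launches budget phone",
    "acme stock rises on phone sales",
]

def word_space(sentences, window=2):
    """Build co-occurrence vectors: each word maps to counts of its neighbours."""
    space = defaultdict(lambda: defaultdict(int))
    for s in sentences:
        toks = s.split()
        for i, w in enumerate(toks):
            for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                if i != j:
                    space[w][toks[j]] += 1
    return space

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(u[w] * v[w] for w in set(u) & set(v))
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

space = word_space(sentences)
# Rank other words by similarity to the tracked entity "acme".
assocs = sorted(((cosine(space["acme"], space[w]), w)
                 for w in space if w != "acme"), reverse=True)
print(assocs[:3])
```

Words whose co-occurrence profiles resemble the entity's rank highest; in a streaming setting the counts would be updated continuously, which is what makes word spaces attractive for monitoring a changing flow of text.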
Chunking clinical text containing non-canonical language
Free text notes typed by primary care physicians during patient consultations typically contain highly non-canonical language. Shallow syntactic analysis of free text notes can help to reveal valuable information for the study of disease and treatment. We present an exploratory study into chunking such text using off-the-shelf language processing tools and pre-trained statistical models. We evaluate chunking accuracy with respect to part-of-speech tagging quality, choice of chunk representation, and breadth of context features. Our results indicate that narrow context feature windows give the best results, but that chunk representation and minor differences in tagging quality do not have a significant impact on chunking accuracy.
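One of the chunk representations such studies compare is BIO tagging, where each token is labelled as beginning a chunk, inside one, or outside all chunks. The toy rule-based noun-phrase chunker below illustrates the representation only; the POS tags are hand-assigned, whereas the study used statistical taggers and pre-trained models:

```python
# Hand-assigned POS tags for a telegraphic clinical-style note fragment.
tagged = [("pt", "NN"), ("complains", "VBZ"), ("of", "IN"),
          ("severe", "JJ"), ("chest", "NN"), ("pain", "NN")]

def bio_np_chunks(tagged):
    """Emit B-NP/I-NP/O tags: runs of adjectives and nouns form noun phrases."""
    out, in_np = [], False
    for word, pos in tagged:
        if pos.startswith(("JJ", "NN")):
            out.append((word, "I-NP" if in_np else "B-NP"))
            in_np = True
        else:
            out.append((word, "O"))
            in_np = False
    return out

print(bio_np_chunks(tagged))
# "severe chest pain" becomes B-NP I-NP I-NP: one three-token noun phrase.
```

Alternative representations (e.g. IO, which drops the B/I distinction) encode the same spans differently, which is why chunk representation is treated as an experimental variable.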
Visualisation of semantic enrichment
Automatically creating semantic enrichments for text may lead to annotations that allow for excellent recall but poor precision. Manual enrichment is potentially more targeted, leading to greater precision. We aim to support non-experts in manually enriching texts with semantic annotations. Neither the visualisation of semantic enrichment nor the process of manually enriching texts has been evaluated before. This paper presents the results of our user study on the visualisation of text enrichment during the annotation process. We performed an extensive analysis of work related to the visualisation of semantic annotations. In a prototype implementation, we then explored two layout alternatives for visualising semantic annotations and their linkage to the text atoms. Here we summarise and discuss our results and their design implications for tools creating semantic annotations.
Pathway Tools version 23.0: Integrated Software for Pathway/Genome Informatics and Systems Biology
Pathway Tools is a bioinformatics software environment with a broad set of capabilities. The software provides genome-informatics tools such as a genome browser, sequence alignments, a genome-variant analyzer, and comparative-genomics operations. It offers metabolic-informatics tools, such as metabolic reconstruction, quantitative metabolic modeling, prediction of reaction atom mappings, and metabolic route search. Pathway Tools also provides regulatory-informatics tools, such as the ability to represent and visualize a wide range of regulatory interactions. The software creates and manages a type of organism-specific database called a Pathway/Genome Database (PGDB), which database curators can interactively edit. It supports web publishing of PGDBs and provides a large number of query, visualization, and omics-data analysis tools. Scientists around the world have created more than 9,800 PGDBs by using Pathway Tools, many of which are curated databases for important model organisms. Those PGDBs can be exchanged using a peer-to-peer database-sharing system called the PGDB Registry.
Comment: Reflects Pathway Tools version 23.0 in 2019; new information since the previous version is in blue text. 111 pages, 40 figures.
