
    Summarisation and visualisation of e-Health data repositories

    At the centre of the Clinical e-Science Framework (CLEF) project is a repository of well-organised, detailed clinical histories, encoded as data that will be available for use in clinical care and in-silico medical experiments. We describe a system, developed as part of the CLEF project, that generates a diverse range of textual and graphical summaries of a patient's clinical history from a data-encoded model, a chronicle, that represents the record of the patient's medical history. Although our current work focuses on cancer patients, the approach we describe is generalisable to a wide range of medical areas.
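    As a rough illustration of turning a data-encoded history into a textual summary, the sketch below flattens a chronicle into a list of dated events, then produces a chronological text view filtered by event kind and a per-kind tally that could feed a simple chart. The `Event` class, the event kinds, and the example history are hypothetical. The real CLEF chronicle is a richer, linked representation, and this is not the CLEF system's method.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """One entry in a hypothetical, flattened chronicle."""
    when: date
    kind: str          # hypothetical category, e.g. "diagnosis", "investigation", "drug"
    description: str


def summarise(events: list[Event], kinds: set[str] | None = None) -> str:
    """Return one line per event, in date order, keeping only the requested kinds."""
    selected = [e for e in events if kinds is None or e.kind in kinds]
    selected.sort(key=lambda e: e.when)
    return "\n".join(f"{e.when.isoformat()}: {e.kind} - {e.description}" for e in selected)


def counts_by_kind(events: list[Event]) -> dict[str, int]:
    """Tally events per kind, e.g. as input to a simple bar chart."""
    counts: dict[str, int] = {}
    for e in events:
        counts[e.kind] = counts.get(e.kind, 0) + 1
    return counts


# Hypothetical example history (not real patient data).
history = [
    Event(date(2004, 3, 1), "diagnosis", "tumour diagnosed"),
    Event(date(2004, 2, 10), "investigation", "biopsy"),
    Event(date(2004, 4, 5), "drug", "chemotherapy started"),
]
print(summarise(history, {"diagnosis", "drug"}))
```

    Filtering by kind before sorting lets the same event store produce different summaries for different readers, such as a treatment-only view or a full timeline.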

    Ontology-based Brucella vaccine literature indexing and systematic analysis of gene-vaccine association network

    BACKGROUND: Vaccine literature indexing is poorly performed in PubMed due to the limited hierarchy of Medical Subject Headings (MeSH) annotation in the vaccine field. The Vaccine Ontology (VO) is a community-based biomedical ontology that represents various vaccines and their relations. SciMiner is an in-house literature mining system that supports literature indexing and gene name tagging. We hypothesize that applying VO in SciMiner will aid vaccine literature indexing and the mining of vaccine-gene interaction networks. As a test case, we examined vaccines for Brucella, the causative agent of brucellosis in humans and animals. RESULTS: The VO-based SciMiner (VO-SciMiner) was developed to incorporate a total of 67 Brucella vaccine terms. A set of rules for expanding VO terms was learned from training data consisting of 90 biomedical articles related to Brucella vaccine terms. VO-SciMiner demonstrated high recall (91%) and precision (99%) on a separate test set of 100 manually selected biomedical articles. VO-SciMiner indexing outperformed MeSH-based PubMed literature search in retrieving Brucella vaccine-related papers. For example, a VO-SciMiner search for "live attenuated Brucella vaccine" returned 922 hits as of April 20, 2011, while a PubMed search for the same query returned only 74 hits. Using the abstracts of 14,947 Brucella-related papers, VO-SciMiner identified 140 Brucella genes associated with Brucella vaccines. These genes included known protective antigens, virulence factors, and genes closely related to Brucella vaccines. These VO-interacting Brucella genes were significantly over-represented in biological functional categories including metabolite transport and metabolism, replication and repair, cell wall biogenesis, intracellular trafficking and secretion, posttranslational modification, and chaperones. Furthermore, a comprehensive interaction network of Brucella vaccines and genes was identified. The asserted and inferred VO hierarchies provide semantic support for inferring novel vaccine-gene associations from the retrieved data, and new hypotheses were generated using this approach. CONCLUSION: VO-SciMiner can improve the efficiency of PubMed searching in the vaccine domain.
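    One way an ontology hierarchy can help indexing is query expansion: a search for a general vaccine term also matches articles tagged only with more specific terms beneath it. The minimal sketch below assumes articles have already been tagged with ontology terms. The hierarchy, term names, and article IDs are hypothetical and are not taken from VO or from VO-SciMiner's implementation.

```python
from collections import deque

# Hypothetical mini-hierarchy (parent -> children); NOT the actual VO structure.
HIERARCHY = {
    "Brucella vaccine": ["live attenuated Brucella vaccine", "Brucella subunit vaccine"],
    "live attenuated Brucella vaccine": ["Brucella abortus RB51 vaccine", "Brucella abortus S19 vaccine"],
}


def expand(term, hierarchy):
    """Return the term together with every term below it, via breadth-first search."""
    found = {term}
    queue = deque([term])
    while queue:
        for child in hierarchy.get(queue.popleft(), []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


def search(query, index, hierarchy):
    """Return sorted IDs of articles tagged with the query term or any descendant."""
    terms = expand(query, hierarchy)
    return sorted(doc_id for doc_id, tags in index.items() if terms & tags)


# Hypothetical article index: article ID -> set of ontology terms tagged in it.
index = {
    "A1": {"Brucella abortus RB51 vaccine"},
    "A2": {"Brucella subunit vaccine"},
    "A3": {"live attenuated Brucella vaccine"},
    "A4": {"Brucella abortus S19 vaccine"},
}
print(search("live attenuated Brucella vaccine", index, HIERARCHY))  # ['A1', 'A3', 'A4']
```

    An exact-term match would return only A3 here. The expanded search also picks up A1 and A4, which mention only specific strains.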

    The gene normalization task in BioCreative III

    BACKGROUND: We report the Gene Normalization (GN) challenge in BioCreative III, in which participating teams were asked to return a ranked list of identifiers of the genes detected in full-text articles. For training, 32 fully annotated and 500 partially annotated articles were prepared. A total of 507 articles were selected as the test set. Because of the high annotation cost, it was not feasible to obtain gold-standard human annotations for all test articles. Instead, we developed an Expectation Maximization (EM) approach for choosing a small number of test articles for manual annotation that were most capable of differentiating team performance. The same algorithm was subsequently used to infer ground truth based solely on team submissions. We report team performance on both the gold standard and the inferred ground truth using a newly proposed metric called Threshold Average Precision (TAP-k). RESULTS: We received a total of 37 runs from 14 different teams for the task. When evaluated using the gold-standard annotations of the 50 manually annotated test articles, the highest TAP-k scores were 0.3297 (k=5), 0.3538 (k=10), and 0.3535 (k=20). Higher TAP-k scores of 0.4916 (k=5, 10, 20) were observed when evaluated using the inferred ground truth over the full test set. When team results were combined using machine learning, the best composite system achieved TAP-k scores of 0.3707 (k=5), 0.4311 (k=10), and 0.4477 (k=20) on the gold standard, representing improvements of 12.4%, 21.8%, and 26.6% over the best team results, respectively. CONCLUSIONS: By using full text and being species non-specific, the GN task in BioCreative III has moved closer to a real literature curation task than similar tasks in the past, and it presents additional challenges for the text mining community, as revealed in the overall team results. By evaluating teams using the gold standard, we show that the EM algorithm allows team submissions to be differentiated while keeping the manual annotation effort feasible. Using the inferred ground truth, we show measures of comparative performance between teams. Finally, by comparing team rankings on the gold standard vs. the inferred ground truth, we further demonstrate that the inferred ground truth is as effective as the gold standard for detecting good team performance.
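    The improvement percentages quoted for the composite system are relative gains over the best single-team TAP-k score at each k. The short check below reproduces them from the scores given in the abstract. Only the arithmetic is shown; it does not compute TAP-k itself.

```python
# TAP-k scores quoted in the abstract (gold-standard evaluation).
best_team = {5: 0.3297, 10: 0.3538, 20: 0.3535}
composite = {5: 0.3707, 10: 0.4311, 20: 0.4477}


def relative_improvement(new, old):
    """Percentage by which `new` exceeds `old`, relative to `old`."""
    return (new - old) / old * 100


for k in (5, 10, 20):
    print(f"k={k}: {relative_improvement(composite[k], best_team[k]):.1f}%")
# k=5: 12.4%
# k=10: 21.8%
# k=20: 26.6%
```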

    Pharmspresso: a text mining tool for extraction of pharmacogenomic concepts and relationships from full text

    BACKGROUND: Pharmacogenomics studies the relationship between genetic variation and variation in drug response phenotypes. The field is rapidly gaining importance: it promises drugs targeted to particular subpopulations based on genetic background. The pharmacogenomics literature has expanded rapidly but is dispersed across many journals. It is therefore challenging to identify important associations between drugs and molecular entities, particularly genes and gene variants, and these critical connections are often lost. Text mining techniques can convert free-form text into a computable, searchable format in which pharmacogenomic concepts (such as genes, drugs, polymorphisms, and diseases) are identified and important links between these concepts are recorded. The availability of full-text articles as input to text mining engines is key, as abstracts often do not contain enough information to identify these pharmacogenomic associations. RESULTS: Building on a tool called Textpresso, we have created the Pharmspresso tool to assist in identifying important pharmacogenomic facts in full-text articles. Pharmspresso parses text to find references to human genes, polymorphisms, drugs, and diseases and their relationships. It presents these as a series of marked-up text fragments in which key concepts are visually highlighted. To evaluate Pharmspresso, we used a gold standard of 45 human-curated articles. Pharmspresso identified 78%, 61%, and 74% of target gene, polymorphism, and drug concepts, respectively. CONCLUSION: Pharmspresso is a text analysis tool that automatically extracts pharmacogenomic concepts from the literature and thus captures our current understanding of gene-drug interactions in a computable form. We have made Pharmspresso available at http://pharmspresso.stanford.edu.
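    To illustrate what "marked-up text fragments" can look like, the sketch below tags concepts in a sentence using small category lexicons plus a pattern for dbSNP-style rs identifiers. The lexicons, the tag format, and the rs-ID rule are assumptions made for illustration. Pharmspresso's actual lexicons and matching rules are not described in the abstract, and relationship extraction is not covered here.

```python
import re

# Hypothetical toy lexicons; a real system would use far larger curated lists.
LEXICONS = {
    "gene": ["CYP2D6", "CYP2C19"],
    "drug": ["codeine", "clopidogrel"],
    "disease": ["breast cancer"],
}
# Assumption: polymorphisms are recognised here only by rs identifiers such as rs1065852.
RS_ID = re.compile(r"\brs\d+\b")


def mark_up(sentence):
    """Wrap recognised concepts in [category: ...] tags (non-overlapping matches assumed)."""
    spans = []
    for category, terms in LEXICONS.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, sentence, flags=re.IGNORECASE):
                spans.append((m.start(), m.end(), category))
    for m in RS_ID.finditer(sentence):
        spans.append((m.start(), m.end(), "polymorphism"))
    # Insert tags from right to left so earlier character offsets stay valid.
    for start, end, category in sorted(spans, reverse=True):
        sentence = sentence[:start] + f"[{category}: {sentence[start:end]}]" + sentence[end:]
    return sentence


print(mark_up("We genotyped rs1065852 in CYP2D6 in patients taking codeine."))
# We genotyped [polymorphism: rs1065852] in [gene: CYP2D6] in patients taking [drug: codeine].
```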

    On Inferring Browsing Activity on Smartphones via USB Power Analysis Side-Channel

    In this paper, we show that public USB charging stations pose a significant privacy risk to smartphone users even when no data communication is possible between the station and the user's mobile device. We present a side-channel attack that allows a charging station to identify which Webpages are loaded while the smartphone is charging. To evaluate this side-channel, we collected power traces of the Alexa top 50 Websites on multiple smartphones under several conditions, including battery charging level, browser cache enabled/disabled, taps on the screen, Wi-Fi/LTE, TLS encryption enabled/disabled, time elapsed between collection of training and testing data, and location of the Website. The results of our evaluation show that the attack is highly successful: in many settings, we achieved over 90% Webpage identification accuracy. On the other hand, our experiments also show that this side-channel is sensitive to some of the aforementioned conditions. For instance, when training and testing traces were collected 70 days apart, accuracy was as low as 2.2%. Although prior studies show that power-based side-channels can predict browsing activity on laptops, this paper is the first to study this side-channel on smartphones, under smartphone-specific constraints. Further, we demonstrate that Websites can be correctly identified within a short time span of 2 to 6 seconds, in contrast with prior work, which uses 15-second traces. This matters because users typically spend less than 15 seconds on a Webpage.
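    At its core, the attack is a classification problem: match a freshly captured power trace against labelled training traces. The abstract does not state which classifier was used, so the sketch below substitutes a simple 1-nearest-neighbour classifier with Euclidean distance. An `n_samples` cut-off compares only the start of each trace, mirroring the idea of identifying a page from a short window. The traces and site labels are hypothetical normalised values, not measured data.

```python
import math


def distance(a, b, n_samples):
    """Euclidean distance between the first n_samples points of two power traces."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a[:n_samples], b[:n_samples])))


def identify(trace, training, n_samples):
    """Return the label of the closest training trace (1-nearest-neighbour)."""
    label, _ = min(training, key=lambda item: distance(trace, item[1], n_samples))
    return label


# Hypothetical labelled traces: (site label, normalised current samples).
training = [
    ("site-a", [0.50, 0.90, 0.70, 0.40]),
    ("site-b", [0.30, 0.30, 0.80, 0.90]),
]
print(identify([0.52, 0.88, 0.66, 0.41], training, n_samples=4))  # site-a
```

    Because the classifier compares raw traces against stored references, any drift between training and test conditions, such as the 70-day gap reported above, directly degrades the match.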