315 research outputs found

    A framework for discovering meaningful associations in the annotated life sciences Web

    During the last decade, life sciences researchers have gained access to the entire human genome, reliable high-throughput biotechnologies, affordable computational resources, and public network access. This has produced vast amounts of data and knowledge captured in the life sciences Web, and has created the need for new tools to analyze this knowledge and make discoveries. Consider a simplified Web of three publicly accessible data resources: Entrez Gene, PubMed and OMIM. Data records in each resource are annotated with terms from multiple controlled vocabularies (CVs). The links between data records in two resources form a relationship between the two resources. Thus, a record in Entrez Gene, annotated with GO terms, can have links to multiple records in PubMed that are annotated with MeSH terms. Similarly, OMIM records annotated with terms from SNOMED CT may have links to records in Entrez Gene and PubMed. This forms a rich web of annotated data records. The objective of this research is to develop the Life Science Link (LSLink) methodology and tools to discover meaningful patterns across resources and CVs. In a first step, we execute a protocol to follow links, extract annotations, and generate datasets of termlinks, which consist of data records and CV terms. We then mine the termlinks of the datasets to find potentially meaningful associations between pairs of terms from two CVs. Biologically meaningful associations of pairs of CV terms may yield innovative nuggets of previously unknown knowledge. Moreover, the bridge of associations across CV terms will reflect how scientists annotate data across linked data repositories in practice. Contributions include a methodology to create background datasets, metrics for mining patterns, the application of semantic knowledge for generalization, tools for discovery, and validation with biological use cases. 
Inspired by research in association rule mining and linkage analysis, we develop two metrics to determine support and confidence scores for the associations of pairs of CV terms. Associations that have a statistically significant high score and are biologically meaningful may lead to new knowledge. To further validate the support and confidence metrics, we develop a secondary test for significance based on the hypergeometric distribution. We also exploit the semantics of the CVs. We aggregate termlinks over siblings of a common parent CV term and use them as additional evidence to boost the support and confidence scores for the associations of the parent CV term. We provide a simple discovery interface where biologists can review associations and their scores. Finally, a cancer informatics use case validates the discovery of associations between human genes and diseases.
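
The abstract above names support and confidence scores plus a hypergeometric significance test but does not give their formulas. The following is a minimal sketch using the textbook association-rule definitions, not the LSLink metrics themselves. It assumes each termlink can be reduced to a pair of one term from each CV; the function names and the example term labels are hypothetical.

```python
from collections import Counter
from math import comb


def hypergeom_tail(k, n_total, n_a, n_b):
    """P(X >= k) when n_b draws are taken from n_total items, n_a of which are 'successes'."""
    upper = min(n_a, n_b)
    hits = sum(comb(n_a, i) * comb(n_total - n_a, n_b - i) for i in range(k, upper + 1))
    return hits / comb(n_total, n_b)


def score_pairs(termlinks):
    """Score each (term_a, term_b) pair with textbook support, confidence and a p-value."""
    n_total = len(termlinks)
    pair_counts = Counter(termlinks)
    a_counts = Counter(a for a, _ in termlinks)
    b_counts = Counter(b for _, b in termlinks)
    scores = {}
    for (a, b), k in pair_counts.items():
        support = k / n_total        # share of all termlinks that carry this pair
        confidence = k / a_counts[a]  # share of term_a's termlinks that also carry term_b
        p_value = hypergeom_tail(k, n_total, a_counts[a], b_counts[b])
        scores[(a, b)] = (support, confidence, p_value)
    return scores


# Made-up termlinks for illustration only; the labels are not real annotations.
links = (
    [("GO:term_x", "MeSH:term_p")] * 3
    + [("GO:term_x", "MeSH:term_q")]
    + [("GO:term_y", "MeSH:term_q")] * 2
)
scores = score_pairs(links)
support, confidence, p_value = scores[("GO:term_x", "MeSH:term_p")]
```

With these six made-up termlinks, the pair occurs three times, so the sketch yields a support of 0.5, a confidence of 0.75 and a one-sided p-value of 0.2. A real analysis would need far larger datasets and some correction for multiple testing.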

    Developmental progress and current status of the Animal QTLdb

    The Animal QTL Database (QTLdb; http://www.animalgenome.org/QTLdb) has undergone dramatic growth in recent years in terms of new data curated, data downloads and new functions and tools. We have focused our development efforts on coping with challenges arising from the rapid growth of newly published data and end users' data demands, and on optimizing data retrieval and analysis to facilitate users' research. As evidenced by the 27 releases in the past 11 years, the growth of the QTLdb has been phenomenal. Here we report our recent progress, which is highlighted by the addition of one new species, four new data types, four new user tools, a new API tool set, numerous new functions and capabilities added to the curator tool set, expansion of our data alliance partners and more than 20 other improvements. In this paper we present a summary of our progress to date and an outlook regarding future directions.

    Knowledge-based Biomedical Data Science 2019

    Knowledge-based biomedical data science (KBDS) involves the design and implementation of computer systems that act as if they knew about biomedicine. Such systems depend on formally represented knowledge in computer systems, often in the form of knowledge graphs. Here we survey the progress in the last year in systems that use formally represented knowledge to address data science problems in both clinical and biological domains, as well as on approaches for creating knowledge graphs. Major themes include the relationships between knowledge graphs and machine learning, the use of natural language processing, and the expansion of knowledge-based approaches to novel domains, such as Chinese Traditional Medicine and biodiversity. Comment: Manuscript 43 pages with 3 tables; Supplemental material 43 pages with 3 tables.

    Discovering lesser known molecular players and mechanistic patterns in Alzheimer's disease using an integrative disease modelling approach

    Convergence of exponentially advancing technologies is driving medical research with life-changing discoveries. In contrast, repeated failures of high-profile drugs to combat Alzheimer's disease (AD) have made it one of the least successful therapeutic areas. This failure pattern has provoked researchers to grapple with their beliefs about Alzheimer's aetiology. Thus, the growing realisation that Amyloid-β and tau are not 'the' but rather 'one of the' factors necessitates the reassessment of pre-existing data to add new perspectives. To enable a holistic view of the disease, integrative modelling approaches are emerging as a powerful technique. Combining data at different scales and modes could considerably increase the predictive power of the integrative model by filling biological knowledge gaps. However, the reliability of the derived hypotheses largely depends on the completeness, quality, consistency, and context-specificity of the data. Thus, there is a need for agile methods and approaches that efficiently interrogate and utilise existing public data. This thesis presents the development of novel approaches and methods that address intrinsic issues of data integration and analysis in AD research. It aims to prioritise lesser-known AD candidates using highly curated and precise knowledge derived from integrated data. Here much of the emphasis is put on quality, reliability, and context-specificity. This thesis work showcases the benefit of integrating well-curated and disease-specific heterogeneous data in a semantic web-based framework for mining actionable knowledge. Furthermore, it introduces the challenges encountered while harvesting information from literature and transcriptomic resources. A state-of-the-art text-mining methodology is developed to extract miRNAs and their regulatory roles in diseases and genes from the biomedical literature. 
To enable meta-analysis of biologically related transcriptomic data, a highly curated metadata database has been developed, which explicates annotations specific to human and animal models. Finally, to corroborate common mechanistic patterns, embedded with novel candidates, across large-scale AD transcriptomic data, a new approach to generate gene regulatory networks has been developed. The work presented here has demonstrated its capability in identifying testable mechanistic hypotheses containing previously unknown or emerging knowledge from public data in two major publicly funded projects for Alzheimer's disease, Parkinson's disease and epilepsy.

    Conceptualization of Computational Modeling Approaches and Interpretation of the Role of Neuroimaging Indices in Pathomechanisms for Pre-Clinical Detection of Alzheimer Disease

    With swift advancements in next-generation sequencing technologies alongside the voluminous growth of biological data, a variety of data resources such as databases and web services have been created to facilitate data management, accessibility, and analysis. However, the burden of interoperability between dynamically growing data resources is an increasingly rate-limiting step in biomedicine, specifically concerning neurodegeneration. Over the years, massive investments and technological advancements for dementia research have resulted in large proportions of unmined data. Accordingly, there is an essential need for intelligent as well as integrative approaches to mine available data and substantiate novel research outcomes. Semantic frameworks provide a unique possibility to integrate multiple heterogeneous, high-resolution data resources with semantic integrity using standardized ontologies and vocabularies for context-specific domains. In this current work, (i) the functionality of a semantically structured terminology for mining pathway-relevant knowledge from the literature, called Pathway Terminology System, is demonstrated and (ii) a context-specific, high-granularity semantic framework for neurodegenerative diseases, known as NeuroRDF, is presented. Neurodegenerative disorders are especially complex as they are characterized by widespread manifestations and the potential for dramatic alterations in disease progression over time. Early detection and prediction strategies through clinical pointers can provide promising solutions for effective treatment of AD. In the current work, we have presented the importance of bridging the gap between clinical and molecular biomarkers to effectively contribute to dementia research. Moreover, we address the need for a formalized framework called NIFT to automatically mine relevant clinical knowledge from the literature for substantiating high-resolution cause-and-effect models.

    Design of Visual Analytics Systems for Utilizing Scientific Evidence in Medical Research

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, 2022. 8. Jinwook Seo. Evidence-based medicine (EBM), "the conscientious, explicit, and judicious use of current best evidence in healthcare and medical research" [98], is one of the most widely accepted medical paradigms of modern times. Searching, reviewing, and synthesizing reliable and high-quality scientific evidence is the key step for the paradigm. However, despite the widespread use of the EBM paradigm, challenges remain in applying evidence-based medicine protocols to medical research. One of the barriers to applying the best scientific evidence to medical research is the severe overload of literature and clinical data, which makes evidence-based tasks tremendously time-consuming and demanding of human effort. In this dissertation, we aim to employ visual analytics approaches to address the challenges of searching and reviewing massive scientific evidence in medical research. To overcome the burden and facilitate handling scientific evidence in medical research, we conducted three design studies and implemented novel visual analytics systems for laborious evidence-based tasks. First, we designed PLOEM, a novel visual analytics system to aid evidence synthesis, an essential step in evidence-based medicine, and to generate an Evidence Map in a standardized manner. We conducted a case study with an oncologist with years of evidence-based medicine experience. In the second study, we conducted a preliminary survey with 76 medical doctors to derive the design requirements for a biomedical literature search. Based on the results, we designed EEEVis, an interactive visual analytics system for biomedical literature search tasks. The system enhances the PubMed search result with several bibliographic visualizations and PubTator annotations. 
We performed a user study with 24 medical doctors to evaluate the designs and presented design guidelines and challenges for biomedical literature search systems. The third study presents GeneVis, a visual analytics system to identify and analyze gene expression signatures across major cancer types, a task that cancer researchers perform to discover biomarkers in precision medicine. We conducted four case studies with domain experts in oncology and genomics. The study results show that the system can facilitate the task and provide new insights from the data. Based on the three studies of this dissertation, we conclude that carefully designed visual analytics approaches can provide an enhanced understanding and support medical researchers in laborious evidence-based tasks in medical research.
Contents:
Chapter 1. Introduction: 1.1 Background and Motivation; 1.2 Dissertation Outline.
Chapter 2. Related Work: 2.1 Evidence Mapping: Graphical Representation for a Scientific Evidence Landscape; 2.2 Scientific Literature Visualizations and Bibliography Visualizations; 2.3 Visual Analytics Systems for Genomics Data Sets and Research Tasks.
Chapter 3. PLOEM: An Interactive Visualization Tool for Effective Evidence Mapping with Biomedical Literature: 3.1 Introduction; 3.2 Visual Representations and Interactions of PLOEM; 3.2.1 Overview of the PICO Criteria; 3.2.2 Trend Visualization with the Timeline View; 3.2.3 Representing the PICO Co-occurrence with the Relation View; 3.2.4 Study Detail View; 3.3 Usage Scenarios: Visualizing Various Study Sizes with PLOEM; 3.4 Conclusion.
Chapter 4. EEEVis: Efficacy Improvement in Searching MEDLINE Database Using a Novel PubMed Visual Analytic System: 4.1 Introduction; 4.1.1 Motivation; 4.1.2 Preliminary Survey: A Questionnaire on Conventional Literature Search Methods; 4.1.3 Design Requirements for Biomedical Literature Search Systems; 4.2 System and Interface Implementation of EEEVis; 4.2.1 System Overview; 4.2.2 Bibliography Filters; 4.2.3 Timeline View; 4.2.4 Co-authorship Network View; 4.2.5 Article List and Detail View; 4.3 User Study; 4.3.1 Participants; 4.3.2 Procedures; 4.3.3 Results and Observations; 4.4 Discussion; 4.4.1 Design Implications; 4.4.2 Limitations and Future Work; 4.5 Conclusions.
Chapter 5. GeneVis: A Visual Analytics System for Gene Signature Analysis in Cancers: 5.1 Motivation; 5.2 System and Interface Implementation; 5.2.1 System Overview; 5.2.2 Gene Expression Detail View; 5.2.3 Gene Vector Projection View; 5.2.4 Gene x Cancer Type Heatmap View; 5.2.5 User Interaction in Multiple Coordinated Views; 5.3 Case Studies; 5.3.1 Participants; 5.3.2 Task and Procedures; 5.3.3 Case 1: Identifying Similar Gene Signatures with TGFB1 in Hallmark Gene Sets; 5.3.4 Case 2: Identifying Cluster Patterns in the HRD Data Set; 5.3.5 Results; 5.4 Summary.
Chapter 6. Conclusion and Future Work: 6.1 Conclusion; 6.2 Future Work.
Abstract (Korean).

    KnetMiner - An integrated data platform for gene mining and biological knowledge discovery

    Hassani-Pak K. KnetMiner - An integrated data platform for gene mining and biological knowledge discovery. Bielefeld: Universität Bielefeld; 2017. Discovery of novel genes that control important phenotypes and diseases is one of the key challenges in biological sciences. Now, in the post-genomics era, scientists have access to a vast range of genomes, genotypes, phenotypes and 'omics data which, when used systematically, can help to gain new insights and make faster discoveries. However, the volume and diversity of such un-integrated data is often seen as a burden that only those with specialist bioinformatics skills, but often only minimal specialist biological knowledge, can penetrate. Therefore, new tools are required to allow researchers to connect, explore and compare large-scale datasets to identify the genes and pathways that control important phenotypes and diseases in plants, animals and humans. KnetMiner, with a silent "K" and standing for Knowledge Network Miner, is a suite of open-source software tools for integrating and visualising large biological datasets. The software mines the myriad databases that describe an organism's biology to present links between relevant pieces of information, such as genes, biological pathways, phenotypes and publications, with the aim of providing leads for scientists who are investigating the molecular basis for a particular trait. The KnetMiner approach is based on 1) integration of heterogeneous, complex and interconnected biological information into a knowledge graph; 2) text-mining to enrich the knowledge graph with novel relations extracted from literature; 3) graph queries of varying depths to find paths between genes and evidence nodes; 4) an evidence-based gene rank algorithm that combines graph and information theory; 5) fast search and interactive knowledge visualisation techniques. 
Overall, [KnetMiner](http://knetminer.rothamsted.ac.uk) is a publicly available resource that helps scientists trawl diverse biological databases for clues to design better crop varieties and understand diseases. The key strength of KnetMiner is that it includes the end user in the "interactive" knowledge discovery process, with the goal of supporting human intelligence with machine intelligence.
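
Step 3 of the approach above, graph queries of varying depths between genes and evidence nodes, can be illustrated with a depth-limited breadth-first search. This is only a sketch of the general idea and not KnetMiner's implementation. The adjacency-dictionary graph, the node labels, the `find_paths` function and the rule for what counts as an evidence node are all assumptions made for the example.

```python
from collections import deque


def find_paths(graph, start, is_evidence, max_depth):
    """Return simple paths from start to evidence nodes that use at most max_depth edges."""
    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node != start and is_evidence(node):
            paths.append(path)
        if len(path) - 1 < max_depth:  # only extend while under the depth limit
            for neighbour in graph.get(node, ()):
                if neighbour not in path:  # keep paths simple (no repeated nodes)
                    queue.append(path + [neighbour])
    return paths


# Toy knowledge graph with hypothetical node identifiers.
graph = {
    "gene:G1": ["protein:P1", "pub:PMID1"],
    "protein:P1": ["pathway:W1"],
    "pathway:W1": ["trait:T1"],
}


def is_evidence(node):
    """Assumption for this sketch: publications and traits count as evidence."""
    return node.startswith(("pub:", "trait:"))


shallow = find_paths(graph, "gene:G1", is_evidence, max_depth=2)
deep = find_paths(graph, "gene:G1", is_evidence, max_depth=3)
```

In this toy graph, a depth limit of 2 reaches only the directly linked publication, while a limit of 3 also reaches the trait through the protein and pathway nodes. This shows how varying the query depth changes which evidence a gene is connected to.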

    The Musical Abilities, Pleiotropy, Language, and Environment (MAPLE) framework for understanding musicality-language links across the lifespan

    Using individual differences approaches, a growing body of literature finds positive associations between musicality and language-related abilities, complementing prior findings of links between musical training and language skills. Despite these associations, musicality has often been overlooked in mainstream models of individual differences in language acquisition and development. To better understand the biological basis of these individual differences, we propose the Musical Abilities, Pleiotropy, Language, and Environment (MAPLE) framework. This novel integrative framework posits that musical and language-related abilities likely share some common genetic architecture (i.e., genetic pleiotropy) in addition to some degree of overlapping neural endophenotypes, and genetic influences on musically and linguistically enriched environments. Drawing upon recent advances in genomic methodologies for unraveling pleiotropy, we outline testable predictions for future research on language development and how its underlying neurobiological substrates may be supported by genetic pleiotropy with musicality. In support of the MAPLE framework, we review and discuss findings from over seventy behavioral and neural studies, highlighting that musicality is robustly associated with individual differences in a range of speech-language skills required for communication and development. These include speech perception-in-noise, prosodic perception, morphosyntactic skills, phonological skills, reading skills, and aspects of second/foreign language learning. Overall, the current work provides a clear agenda and framework for studying musicality-language links using individual differences approaches, with an emphasis on leveraging advances in the genomics of complex musicality and language traits.

    Ranking target objects of navigational queries


    Doctor of Philosophy

    Dissertation. The objective of this work is to examine the efficacy of natural language processing (NLP) in summarizing bibliographic text for multiple purposes. Researchers have noted the accelerating growth of bibliographic databases. Information seekers using traditional information retrieval techniques when searching large bibliographic databases are often overwhelmed by excessive, irrelevant data. Scientists have applied natural language processing technologies to improve retrieval. Text summarization, a natural language processing approach, simplifies bibliographic data while filtering it to address a user's need. Traditional text summarization can necessitate the use of multiple software applications to accommodate diverse processing refinements known as "points-of-view." A new, statistical approach to text summarization can transform this process. Combo, a statistical algorithm composed of three individual metrics, determines which elements within input data are relevant to a user's specified information need, thus enabling a single software application to summarize text for many points-of-view. In this dissertation, I describe this algorithm and the research process used in developing and testing it. Four studies comprised the research process. The goal of the first study was to create a conventional schema accommodating a genetic disease etiology point-of-view, and an evaluative reference standard. This was accomplished through simulating the task of secondary genetic database curation. The second study addressed the development and initial evaluation of the algorithm, comparing its performance to the conventional schema using the previously established reference standard, again within the task of secondary genetic database curation. The third and fourth studies evaluated the algorithm's performance in accommodating additional points-of-view in a simulated clinical decision support task. 
The third study explored prevention, while the fourth evaluated performance for prevention and drug treatment, comparing results to a conventional treatment schema's output. Both summarization methods identified data that were salient to their tasks. The conventional genetic disease etiology and treatment schemas located salient information for database curation and decision support, respectively. The Combo algorithm located salient genetic disease etiology, treatment, and prevention data for the associated tasks. Dynamic text summarization could potentially serve additional purposes, such as consumer health information delivery, systematic review creation, and primary research. This technology may benefit many user groups.