94 research outputs found

    Big tranSMART for clinical decision making

    Molecular-profiling-based patient stratification plays a key role in clinical decision making, such as the identification of disease subgroups and the prediction of individual treatment responses. Many existing knowledge management systems, such as tranSMART, enable scientists to perform such analyses. In the big data era, however, the size of molecular profiling data is increasing sharply due to new biological techniques such as next-generation sequencing, and none of the existing storage systems performs well against the three "V" features of big data (Volume, Variety, and Velocity). New key-value data stores such as Apache HBase and Google Bigtable provide high-speed queries by key. These databases can be modeled as a Distributed Ordered Table (DOT), which horizontally partitions a table into regions and distributes the regions to region servers by key. However, none of the existing data models works well on a DOT. A Collaborative Genomic Data Model (CGDM) has been designed to solve these issues. CGDM creates three collaborative global clustering index tables to improve data query velocity. The microarray implementation of CGDM on HBase performed up to 246, 7, and 20 times faster than the relational data model on HBase, MySQL Cluster, and MongoDB, respectively. The single-nucleotide-polymorphism implementation of CGDM on HBase outperformed the relational model on HBase and MySQL Cluster by up to 351 and 9 times. The raw-sequence implementation of CGDM on HBase gains up to 440-fold and 22-fold speedups compared to the sequence alignment map format implemented in HBase and a binary alignment map server. The integration into tranSMART shows up to a 7-fold speedup in the data export function. In addition, a popular hierarchical clustering algorithm in tranSMART has been used as an application to show how CGDM can influence the velocity of the algorithm. The optimized method using CGDM performs more than 7 times faster than the same method using the relational model implemented in MySQL Cluster.
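    The abstract's point about Distributed Ordered Tables is that rows sort lexicographically by key, so a well-composed row key turns common queries into contiguous range scans served by a single region. The sketch below illustrates that general idea only; the field names and key layout are hypothetical, not the actual CGDM schema.

```python
# Hypothetical sketch of a composite row key for a Distributed Ordered
# Table (DOT) such as HBase. Rows sort lexicographically by key, so
# putting the most-queried dimension first keeps related records in one
# region. Field names and widths are illustrative, not CGDM's layout.

def make_row_key(gene_id: str, sample_id: str, position: int) -> bytes:
    # Zero-pad the numeric position so lexicographic order matches
    # numeric order within a gene/sample prefix.
    return f"{gene_id}|{sample_id}|{position:010d}".encode()

def scan_range(keys, prefix: bytes):
    # A DOT serves this as one contiguous range scan over sorted keys.
    return [k for k in sorted(keys) if k.startswith(prefix)]

keys = {make_row_key("BRCA1", "S01", p) for p in (5, 500, 42)}
keys.add(make_row_key("TP53", "S01", 7))
hits = scan_range(keys, b"BRCA1|")
```

    All three BRCA1 rows come back in numeric position order from a single prefix scan, while the TP53 row is skipped; this locality is what key-by-gene designs buy on a DOT.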

    Cancer Epidemiol Biomarkers Prev

    Background: Large-scale cancer epidemiology cohorts (CECs) have successfully collected, analyzed, and shared patient-reported data for years. CECs increasingly need to make their data more Findable, Accessible, Interoperable, and Reusable, or FAIR. How CECs should approach this transformation is unclear. Methods: The California Teachers Study (CTS) is an observational CEC of 133,477 participants followed since 1995–1996. In 2014, we began updating our data storage, management, analysis, and sharing strategy. With the San Diego Supercomputer Center, we deployed a new infrastructure based on a data warehouse to integrate and manage data, and a secure, shared workspace with documentation, software, and analytic tools that facilitate collaboration and accelerate analyses. Results: Our new CTS infrastructure includes a data warehouse and data marts, which are focused subsets of the data warehouse designed for efficiency. The secure CTS workspace utilizes a Remote Desktop service that operates within a HIPAA- and FISMA-compliant platform. Our infrastructure offers broad access to CTS data; includes statistical analysis and data visualization software and tools; flexibly manages other key data activities (e.g., cleaning, updates, and data sharing); and will continue to evolve to advance FAIR principles. Conclusion: Our scalable infrastructure provides the security, authorization, data model, metadata, and analytic tools needed to manage, share, and analyze CTS data in ways that are consistent with the NCI's Cancer Research Data Commons Framework. Impact: The CTS's implementation of new infrastructure in an ongoing CEC demonstrates how the population sciences can explore and embrace new cloud-based and analytics infrastructure to accelerate cancer research and translation. Funding: HHSN261201800032C, HHSN261201800032I, HHSN261201800009C, HHSN261201800009I, HHSN261201800015C, HHSN261201800015I, P30 CA033572, P30 CA023100, U01 CA199277, UM1 CA164917, and R01 CA077398 (NCI NIH HHS, United States); NU58DP006344 (NCCDPHP CDC HHS, United States).
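    The abstract describes data marts as "focused subsets of the data warehouse designed for efficiency." The idea can be sketched with an in-memory SQLite database; the table and column names here are hypothetical, not the actual CTS schema.

```python
import sqlite3

# Illustrative sketch of a "data mart" as a focused subset of a
# warehouse table. Schema and values are hypothetical, not CTS data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse "
    "(participant_id, cohort_year, diagnosis, followup_status)"
)
conn.executemany("INSERT INTO warehouse VALUES (?, ?, ?, ?)", [
    (1, 1995, "breast", "active"),
    (2, 1996, None, "active"),      # no diagnosis on record
    (3, 1995, "ovarian", "lost"),
])

# A mart keeps only the rows and columns one analysis team needs,
# so routine queries scan far less data than the full warehouse.
conn.execute("""CREATE TABLE mart_cancer AS
                SELECT participant_id, diagnosis
                FROM warehouse
                WHERE diagnosis IS NOT NULL""")
rows = conn.execute(
    "SELECT * FROM mart_cancer ORDER BY participant_id"
).fetchall()
```

    The mart contains only the two diagnosed participants and two columns, which is the efficiency trade the abstract refers to: narrower, analysis-specific tables derived from one governed warehouse.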

    Discovering Biomarkers of Alzheimer's Disease by Statistical Learning Approaches

    In this work, statistical learning approaches are exploited to discover biomarkers for Alzheimer's disease (AD). Contributions have been made in both biomarker-driven and software-driven studies. Surprising discoveries were made in the search for blood-based biomarkers. With the inclusion of existing biological knowledge and a proposed novel feature selection method, several blood-based protein models were discovered to have promising ability to separate AD patients from healthy individuals. A new statistical pattern was discovered that could serve as a new guideline for diagnosis methodology. In the field of brain-based biomarkers, the positive contribution of covariates such as age, gender, and APOE genotype to an AD classifier was verified, along with the discovery of a panel of highly informative biomarkers comprising 26 RNA transcripts. The classifier trained on this panel of genes shows excellent capacity in discriminating patients from controls. Apart from the biomarker-driven studies, statistical packages and applications were also developed. The R package metaUnion was designed and developed to provide an advanced meta-analytic approach applicable to microarray data. This package overcomes defects of previous meta-analytic packages: 1) the neglect of missing data, 2) the inflexibility of feature dimensions, and 3) the lack of functions to support post-analysis summaries. metaUnion has been applied in a published study as part of an integrated genomic approach and resulted in significant findings. To provide benchmark references on the significance of features for dementia researchers, a web-based platform, AlzExpress, was built to give researchers granular-level differential expression test and meta-analysis results. A combination of modern big data technologies and robust data mining algorithms makes AlzExpress a flexible, scalable, and comprehensive platform of valuable bioinformatics in dementia research.
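    One way to picture the missing-data problem the abstract raises for microarray meta-analysis: a gene is often measured in some studies but absent from others, and a combiner must cope rather than drop the gene. The sketch below is a generic fixed-effect inverse-variance combination that skips missing studies; it is not metaUnion's actual algorithm, only the standard technique it builds on.

```python
import math

# Hedged sketch of fixed-effect inverse-variance meta-analysis that
# tolerates missing data: studies where a gene was not measured are
# passed as None and simply excluded. Not metaUnion's actual method.

def combine(effects):
    """effects: per-study (effect_size, variance) pairs, or None
    when the gene is missing from that study."""
    observed = [ev for ev in effects if ev is not None]
    weights = [1.0 / var for _, var in observed]
    pooled = sum(w * e for w, (e, _) in zip(weights, observed)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Gene measured in studies 1 and 3; absent from study 2.
est, se = combine([(0.8, 0.04), None, (0.5, 0.01)])
```

    Here the precise study (variance 0.01) dominates, pulling the pooled estimate to 0.56, and the pooled standard error shrinks below either study's own; excluding the missing study changes the weights but never crashes the analysis.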

    Integrative analysis and visualization of multi-omics data of mitochondria-associated diseases


    ICSEA 2021: the sixteenth international conference on software engineering advances

    The Sixteenth International Conference on Software Engineering Advances (ICSEA 2021), held on October 3-7, 2021 in Barcelona, Spain, continued a series of events covering a broad spectrum of software-related topics. The conference covered fundamentals of designing, implementing, testing, validating, and maintaining various kinds of software. The tracks treated the topics from theory to practice, in terms of methodologies, design, implementation, testing, use cases, tools, and lessons learnt. The conference topics covered classical and advanced methodologies, open source, agile software, as well as software deployment, software economics, and education. The conference had the following tracks: advances in fundamentals for software development; advanced mechanisms for software development; advanced design tools for developing software; software engineering for service computing (SOA and cloud); advanced facilities for accessing software; software performance; software security, privacy, and safety; advances in software testing; specialized software advanced applications; web accessibility; open source software; agile and lean approaches in software engineering; software deployment and maintenance; software engineering techniques, metrics, and formalisms; software economics, adoption, and education; business technology; improving productivity in research on software engineering; and trends and achievements. Similar to the previous edition, this event continued to be very competitive in its selection process and very well perceived by the international software engineering community, attracting excellent contributions and active participation from all over the world. We were very pleased to receive a large number of top-quality contributions. We take here the opportunity to warmly thank all the members of the ICSEA 2021 technical program committee as well as the numerous reviewers. The creation of such a broad and high-quality conference program would not have been possible without their involvement. We also kindly thank all the authors who dedicated much of their time and effort to contribute to ICSEA 2021. We truly believe that, thanks to all these efforts, the final conference program consists of top-quality contributions. This event could not have been a reality without the support of many individuals, organizations, and sponsors. We also gratefully thank the members of the ICSEA 2021 organizing committee for their help in handling the logistics and for their work in making this professional meeting a success. We hope ICSEA 2021 was a successful international forum for the exchange of ideas and results between academia and industry and promoted further progress in software engineering research.

    Automated Injection of Curated Knowledge Into Real-Time Clinical Systems: CDS Architecture for the 21st Century

    abstract: Clinical Decision Support (CDS) is primarily associated with alerts, reminders, order entry, rule-based invocation, diagnostic aids, and on-demand information retrieval. While valuable, these foci have been in production use for decades and do not provide a broader, interoperable means of plugging structured clinical knowledge into live electronic health record (EHR) ecosystems to orchestrate the user experiences of patients and clinicians. To date, the gap between knowledge representation and user-facing EHR integration has been considered an "implementation concern" requiring unscalable manual human effort and governance coordination. Drafting a questionnaire engineered to meet the specifications of the HL7 CDS Knowledge Artifact specification, for example, carries no reasonable expectation that it may be imported and deployed into a live system without significant burdens. A dramatic reduction of the time and effort gap in the research and application cycle could be revolutionary. Doing so, however, requires both a floor-to-ceiling precoordination of functional boundaries in the knowledge management lifecycle and a formalization of the human processes by which this occurs. This research introduces ARTAKA: Architecture for Real-Time Application of Knowledge Artifacts, a concrete floor-to-ceiling technological blueprint for both provider health IT (HIT) and vendor organizations to incrementally introduce value into existing systems dynamically. This is made possible by the service-ization of curated knowledge artifacts, which are then injected into a highly scalable backend infrastructure by automated orchestration through public marketplaces. Supplementary examples of client app integration are also provided. Compilation of knowledge into platform-specific form has been left flexible, insofar as implementations comply with ARTAKA's Context Event Service (CES) communication and Health Services Platform (HSP) Marketplace service packaging standards. Towards the goal of interoperable human processes, ARTAKA's treatment of knowledge artifacts as a specialized form of software allows knowledge engineering to operate as a type of software engineering practice. Thus, decades of software development processes, tools, policies, and lessons offer immediate benefit, in some cases with remarkable parity. Analyses of experimentation are provided, with guidelines on how choice aspects of software development life cycles (SDLCs) apply to knowledge artifact development in an ARTAKA environment. Portions of this culminating document have been further initiated with Standards Developing Organizations (SDOs) intended to ultimately produce normative standards, as have active relationships with other bodies.
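    The abstract's core move is treating a knowledge artifact as a versioned, deployable software package published through a marketplace. A minimal sketch of that idea, assuming hypothetical field names and a naive "newest version wins" policy (the real CES and HSP Marketplace standards define their own packaging metadata and orchestration rules):

```python
from dataclasses import dataclass

# Illustrative only: treating a knowledge artifact like a versioned
# software package in a marketplace registry. Field names are
# hypothetical, not ARTAKA's actual CES/Marketplace schema.

@dataclass(frozen=True)
class KnowledgeArtifact:
    artifact_id: str
    version: tuple      # e.g. (1, 1) for v1.1
    payload: dict       # e.g. a questionnaire definition

class Marketplace:
    def __init__(self):
        self._store = {}

    def publish(self, artifact: KnowledgeArtifact):
        # Automated orchestration sketch: a newer version of the same
        # artifact supersedes the one currently deployed.
        current = self._store.get(artifact.artifact_id)
        if current is None or artifact.version > current.version:
            self._store[artifact.artifact_id] = artifact

    def deploy(self, artifact_id: str) -> KnowledgeArtifact:
        return self._store[artifact_id]

mp = Marketplace()
mp.publish(KnowledgeArtifact("depression-screen", (1, 0), {"items": 9}))
mp.publish(KnowledgeArtifact("depression-screen", (1, 1), {"items": 9}))
deployed = mp.deploy("depression-screen")  # v1.1 supersedes v1.0
```

    The point of the sketch is the parity the dissertation claims: once artifacts carry package-style identity and version metadata, ordinary software-distribution machinery (registries, upgrades, rollbacks) applies to clinical knowledge as well.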

    Discovering lesser known molecular players and mechanistic patterns in Alzheimer's disease using an integrative disease modelling approach

    The convergence of exponentially advancing technologies is driving medical research toward life-changing discoveries. In contrast, the repeated failure of high-profile drugs to battle Alzheimer's disease (AD) has made it one of the least successful therapeutic areas. This failure pattern has provoked researchers to grapple with their beliefs about Alzheimer's aetiology. The growing realisation that amyloid-β and tau are not 'the' but rather 'one of the' factors necessitates the reassessment of pre-existing data to add new perspectives. To enable a holistic view of the disease, integrative modelling approaches are emerging as a powerful technique. Combining data at different scales and modes can considerably increase the predictive power of an integrative model by filling biological knowledge gaps. However, the reliability of the derived hypotheses largely depends on the completeness, quality, consistency, and context-specificity of the data. Thus, there is a need for agile methods and approaches that efficiently interrogate and utilise existing public data. This thesis presents the development of novel approaches and methods that address intrinsic issues of data integration and analysis in AD research. It aims to prioritise lesser-known AD candidates using highly curated and precise knowledge derived from integrated data, with much of the emphasis placed on quality, reliability, and context-specificity. This work showcases the benefit of integrating well-curated, disease-specific heterogeneous data in a semantic-web-based framework for mining actionable knowledge, and introduces the challenges encountered while harvesting information from literature and transcriptomic resources. A state-of-the-art text-mining methodology is developed to extract miRNAs and their regulatory roles in diseases and genes from the biomedical literature. To enable meta-analysis of biologically related transcriptomic data, a highly curated metadata database has been developed, which explicates annotations specific to human and animal models. Finally, to corroborate common mechanistic patterns, embedded with novel candidates, across large-scale AD transcriptomic data, a new approach to generating gene regulatory networks has been developed. The work presented here has demonstrated its capability in identifying testable mechanistic hypotheses containing previously unknown or emerging knowledge from public data in two major publicly funded projects for Alzheimer's disease, Parkinson's disease, and epilepsy.
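    The thesis describes a new approach to generating gene regulatory networks from large-scale transcriptomic data. As a point of reference only, the most basic form of that task can be sketched as a co-expression network: connect gene pairs whose expression profiles are strongly correlated. The gene names and toy expression values below are illustrative, and this is not the thesis's actual method.

```python
import math

# Minimal co-expression-network sketch: edges between gene pairs whose
# expression profiles have |Pearson r| above a threshold. Toy data;
# not the thesis's actual network-generation approach.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpression_edges(expr, threshold=0.9):
    genes = sorted(expr)
    return [(g1, g2)
            for i, g1 in enumerate(genes)
            for g2 in genes[i + 1:]
            if abs(pearson(expr[g1], expr[g2])) >= threshold]

expr = {
    "APP":  [1.0, 2.0, 3.0, 4.0],
    "MAPT": [2.1, 4.2, 5.9, 8.1],   # tracks APP closely
    "GFAP": [5.0, 1.0, 4.0, 2.0],   # unrelated pattern
}
edges = coexpression_edges(expr)    # only the APP-MAPT pair survives
```

    Correlation alone cannot give edge direction or mechanism, which is precisely why the thesis emphasises curated, context-specific knowledge on top of expression data when building regulatory (rather than merely co-expression) networks.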

    Preface
