
    Machine Learning and Integrative Analysis of Biomedical Big Data

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to perform integrative analysis of biomedical data acquired from diverse modalities effectively and efficiently. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
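As a toy illustration of the integration problem (not a method from the review): "early" integration simply concatenates each sample's features from every omics layer, which immediately worsens the curse of dimensionality the review discusses — the feature count grows with each added layer while the sample count stays fixed. All sample names and values below are hypothetical.

```python
# Hypothetical per-sample feature vectors from two omics layers (toy sizes).
samples = ["s1", "s2", "s3"]
transcriptome = {"s1": [0.2, 1.5, -0.7], "s2": [0.0, 0.9, 0.3], "s3": [1.1, -0.4, 0.8]}
methylome     = {"s1": [0.61, 0.02],     "s2": [0.55, 0.10],    "s3": [0.48, 0.07]}

# Early integration: concatenate each sample's features across layers.
integrated = {s: transcriptome[s] + methylome[s] for s in samples}

n_features = len(next(iter(integrated.values())))
print(n_features, "features per sample for", len(samples), "samples")
# Features per sample grow additively with each layer, while the number of
# samples does not — the p >> n regime that drives the dimensionality challenge.
```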

    Computational and chemical approaches to drug repurposing

    Drug repurposing, which entails discovering novel therapeutic applications for already existing drugs, provides numerous benefits compared to conventional drug discovery methods. This strategy can be pursued through two primary approaches: computational and chemical. Computational methods use data mining and bioinformatics techniques to identify potential drug candidates, while chemical approaches involve experimental screens aimed at finding new potential treatments based on existing drugs. Both computational and chemical methods have proven successful in uncovering novel therapeutic uses for established drugs. During my PhD, I participated in several experimental drug-repurposing screens based on high-throughput phenotypic approaches. Finally, attracted by the potential of computational drug-repurposing pipelines, I decided to contribute by building a web platform focused on the use of transcriptional signatures to identify potential new treatments for human disease. A summary of these studies follows: In Study I, we utilized the tetracycline repressor (tetR)-regulated mechanism to create a human osteosarcoma cell line (U2OS) able to express TAR DNA-binding protein 43 (TDP-43) upon induction. TDP-43 is a protein known for its association with several neurodegenerative diseases. We implemented a chemical screening with this system as part of our efforts to repurpose approved drugs. While the screening failed to identify modulators of TDP-43 toxicity, it revealed compounds capable of inhibiting doxycycline-dependent TDP-43 expression. Furthermore, a complementary CRISPR/Cas9 screening using the same cell system identified additional regulators of doxycycline-dependent TDP-43 expression. This investigation identifies new chemical and genetic modulators of the tetR system and highlights potential limitations of using this system for chemical or genetic screenings in mammalian cells.
In Study II, our objective was to reposition compounds that could potentially reduce the toxic effects of a fragment of the Huntingtin (HTT) protein containing a 94 amino acid long glutamine stretch (Htt-Q94), a feature of Huntington's disease (HD). To achieve this, we carried out a high-throughput chemical screening using a varied collection of 1,214 drugs, largely sourced from a drug repurposing library. Through our screening process, we singled out clofazimine, an FDA-approved anti-leprosy drug, as a potential therapeutic candidate. Its effectiveness was validated across several in vitro models as well as a zebrafish model of polyglutamine (polyQ) toxicity. Employing a combination of computational analysis of transcriptional signatures, molecular modeling, and biochemical assays, we deduced that clofazimine is an agonist of the peroxisome proliferator-activated receptor gamma (PPARγ), a receptor previously suggested to be a viable therapeutic target for HD due to its role in promoting mitochondrial biogenesis. Notably, clofazimine was successful in alleviating the mitochondrial dysfunction triggered by the expression of Htt-Q94. These findings lend substantial support to the potential of clofazimine as a viable candidate for drug repurposing in the treatment of polyQ diseases. In Study III, we explored the molecular mechanism of a previously identified repurposing example, the use of the diethyldithiocarbamate-copper complex (CuET), a disulfiram metabolite, for cancer treatment. We found that CuET effectively inhibits cancer cell growth by targeting the NPL4 adapter of the p97/VCP segregase, leading to translational arrest and stress in tumor cells. CuET also activates ribosomal biogenesis and autophagy in cancer cells, and its cytotoxicity can be enhanced by inhibiting these pathways. Thus, CuET shows promise as a cancer treatment, especially in combination therapies.
In Study IV, we capitalized on the Molecular Signatures Database (MSigDB), one of the largest signature repositories, and drug transcriptomic profiles from the Connectivity Map (CMap) to construct a comprehensive and interactive drug-repurposing database called the Drug Repurposing Encyclopedia (DRE). Housing over 39.7 million pre-computed drug-signature associations across 20 species, the DRE allows users to conduct real-time drug-repurposing analysis. This can involve comparing user-supplied gene signatures with existing ones in the DRE, carrying out drug-gene set enrichment analyses (drug-GSEA) using submitted drug transcriptomic profiles, or conducting similarity analyses across all database signatures using user-provided gene sets. Overall, the DRE is an exhaustive database aimed at promoting drug repurposing based on transcriptional signatures, offering deep-dive comparisons across molecular signatures and species. Drug repurposing presents a valuable strategy for discovering fresh therapeutic applications for existing drugs, offering numerous benefits compared to conventional drug discovery methods. The studies conducted in this thesis underscore the potential of drug repurposing and highlight the complementary roles of computational and chemical approaches. These studies enhance our understanding of the mechanistic properties of repurposed drugs, such as clofazimine and disulfiram, and reveal novel mechanisms for targeting specific disease pathways. Additionally, the development of the DRE platform provides a comprehensive tool to support researchers in conducting drug-repositioning analyses, further facilitating the advancement of drug repurposing studies.
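The DRE's core operation — comparing a user-supplied gene signature against stored drug signatures — can be sketched with a toy rank-correlation example. The gene names and differential-expression scores below are hypothetical, and Spearman correlation stands in for whatever scoring method the DRE actually uses (the abstract does not specify it):

```python
def rank(values):
    """Assign ranks (1 = smallest); ties broken by input order for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation (simplified tie-free formula)."""
    n = len(xs)
    rx, ry = rank(xs), rank(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical differential-expression scores over a shared gene universe.
query_signature = {"TP53": 2.1, "MYC": -1.4, "EGFR": 0.8, "BRCA1": -0.3, "PPARG": 1.7}
drug_signature  = {"TP53": 1.8, "MYC": -0.9, "EGFR": 0.5, "BRCA1": -0.6, "PPARG": 2.0}

genes = sorted(query_signature)  # align both signatures on the same gene order
q = [query_signature[g] for g in genes]
d = [drug_signature[g] for g in genes]
print(round(spearman(q, d), 3))  # high positive value = drug mimics the query
```

A real system would pre-compute such scores for every signature pair (the "39.7 million pre-computed drug-signature associations" above) so that user queries only rank against cached results.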

    Involvement of genes and non-coding RNAs in cancer: profiling using microarrays

    MicroRNAs (miRNAs) are small noncoding RNAs (ncRNAs, RNAs that do not code for proteins) that regulate the expression of target genes. MiRNAs can act as tumor suppressor genes or oncogenes in human cancers. Moreover, a large fraction of genomic ultraconserved regions (UCRs) encode a particular set of ncRNAs whose expression is altered in human cancers. Bioinformatics studies are emerging as important tools to identify associations between miRNAs/ncRNAs and CAGRs (Cancer Associated Genomic Regions). ncRNA profiling using highly parallel devices such as expression microarrays, together with public mapping, expression, and functional databases and prediction algorithms, has allowed the identification of specific signatures associated with the diagnosis, prognosis, and response to treatment of human tumors.

    Systems Analytics and Integration of Big Omics Data

    A “genotype"" is essentially an organism's full hereditary information which is obtained from its parents. A ""phenotype"" is an organism's actual observed physical and behavioral properties. These may include traits such as morphology, size, height, eye color, metabolism, etc. One of the pressing challenges in computational and systems biology is genotype-to-phenotype prediction. This is challenging given the amount of data generated by modern Omics technologies. This “Big Data” is so large and complex that traditional data processing applications are not up to the task. Challenges arise in collection, analysis, mining, sharing, transfer, visualization, archiving, and integration of these data. In this Special Issue, there is a focus on the systems-level analysis of Omics data, recent developments in gene ontology annotation, and advances in biological pathways and network biology. The integration of Omics data with clinical and biomedical data using machine learning is explored. This Special Issue covers new methodologies in the context of gene–environment interactions, tissue-specific gene expression, and how external factors or host genetics impact the microbiome

    Economic Evaluation of Potential Applications of Gene Expression Profiling in Clinical Oncology

    Histopathological analysis of tumors is currently the main tool used to guide cancer management. Gene expression profiling may provide additional valuable information for both classification and prognostication of individual tumors. A number of gene expression profiling assays have been developed recently to inform therapy decisions in women with early-stage breast cancer and to help identify the primary tumor site in patients with metastatic cancer of unknown primary. The impact of these assays on health and economic outcomes, if introduced into general practice, has not been determined. I aimed to conduct an economic evaluation of regulatory-approved gene expression profiling assays for breast cancer and cancer of unknown primary to determine whether these technologies represent value for money from the perspective of the Canadian health care system. I developed decision-analytic models to project the lifetime clinical and economic consequences of early-stage breast cancer and metastatic cancer of unknown primary. I used the Manitoba Cancer Registry and Manitoba administrative health databases to model current “real-world” Canadian clinical practices. I applied available data about gene expression profiling assays from secondary sources to these models to predict the impact of these assays on current clinical and economic outcomes. In the base case, gene expression profiling assays in early-stage breast cancer and cancer of unknown primary resulted in incremental cost-effectiveness ratios of less than $100,000 per quality-adjusted life-year gained. These results were most sensitive to the uncertainty associated with the accuracy of the assay, patient-physician response to gene expression profiling information, and patient survival. The potential application of these gene expression profiling assays in clinical oncology appears to be cost-effective in the Canadian healthcare system.
Field evaluation of these assays to establish their impact on cancer management and patient survival may have a large societal impact and should be initiated in Canada to ensure their clinical utility and cost-effectiveness. The use of Canadian provincial administrative population data in decision modeling is useful to quantify uncertainty about gene expression profiling assays and to guide the use of novel funding models such as conditional funding alongside a field evaluation.
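The decision rule behind the "$100,000 per quality-adjusted life-year" figure can be sketched numerically. The incremental cost-effectiveness ratio (ICER) is the extra cost of the new strategy divided by the extra QALYs it yields; all costs and QALYs below are purely illustrative, not values from the thesis:

```python
def icer(cost_new, cost_old, qaly_new, qaly_old):
    """Incremental cost-effectiveness ratio: extra dollars per extra QALY gained."""
    return (cost_new - cost_old) / (qaly_new - qaly_old)

# Hypothetical lifetime costs and QALYs per patient (illustrative only).
usual_care = {"cost": 38_000.0, "qaly": 9.10}
gep_guided = {"cost": 45_500.0, "qaly": 9.25}  # care guided by a gene expression profiling assay

ratio = icer(gep_guided["cost"], usual_care["cost"],
             gep_guided["qaly"], usual_care["qaly"])
threshold = 100_000  # willingness-to-pay per QALY gained
print(f"ICER = ${ratio:,.0f}/QALY; cost-effective at threshold: {ratio < threshold}")
```

With these toy numbers the assay costs an extra $7,500 and gains 0.15 QALYs per patient, for an ICER of $50,000/QALY, which falls below the $100,000 threshold and would therefore be deemed cost-effective.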

    Efficient, Dependable Storage of Human Genome Sequencing Data

    The understanding of the human genome impacts several areas of human life. Data from human genomes are massive because there are millions of samples awaiting sequencing, and each sequenced human genome may occupy hundreds of gigabytes of storage. Human genomes are critical because they are extremely valuable to research and may provide hints about individuals' health status, identify their donors, or reveal information about donors' relatives. Their size and criticality, plus the amount of data being produced by medical and life-sciences institutions, require systems that scale while being secure, dependable, auditable, and affordable. Current storage infrastructures are too expensive for cost efficiency in storing human genomes to be ignored, and they lack the proper knowledge and mechanisms to protect the privacy of sample donors. This thesis proposes an efficient storage system for human genomes that medical and life-sciences institutions can trust and afford. It enhances traditional storage ecosystems with privacy-aware, data-reduction, and auditability techniques to enable the efficient, dependable use of multi-tenant infrastructures to store human genomes. Contributions from this thesis include (1) a study on the privacy sensitivity of human genomes; (2) a method to systematically detect the privacy-sensitive portions of genomes; (3) specialised data-reduction algorithms for sequencing data; (4) an independent auditability scheme for secure dispersed storage; and (5) a complete storage pipeline that obtains reasonable privacy-protection, security, and dependability guarantees at modest cost (e.g., less than 1/Genome/Year) by integrating the proposed mechanisms with appropriate storage configurations.
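One well-known family of data-reduction techniques for sequencing data is lossy quality-score binning: collapsing the roughly 40 distinct Phred quality values in a FASTQ file into a few representative bins so the quality stream compresses far better. This is a common approach in the field, not necessarily the one this thesis uses, and the bin boundaries below are illustrative:

```python
# Toy quality-score binning; bin boundaries are illustrative, not a standard scheme.
BINS = [(0, 9, 6), (10, 19, 15), (20, 29, 25), (30, 41, 37)]  # (lo, hi, representative)

def bin_quality(phred):
    """Map a Phred quality score to its bin's representative value."""
    for lo, hi, rep in BINS:
        if lo <= phred <= hi:
            return rep
    raise ValueError(f"Phred score out of range: {phred}")

def bin_quality_string(qual, offset=33):
    """Re-encode a FASTQ quality string (Phred+33 ASCII) with binned scores."""
    return "".join(chr(bin_quality(ord(c) - offset) + offset) for c in qual)

original = "IIIIHGF@;5"  # Phred+33 scores: 40,40,40,40,39,38,37,31,26,20
binned = bin_quality_string(original)
print(binned, "| distinct symbols:", len(set(original)), "->", len(set(binned)))
```

Fewer distinct symbols means lower entropy in the quality stream, so a downstream general-purpose compressor achieves much smaller output at the cost of some precision in the quality values.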