460 research outputs found

    Towards Data-Driven Large Scale Scientific Visualization and Exploration

    Technological advances have enabled us to acquire extremely large datasets, but it remains a challenge to store, process, and extract information from them. This dissertation builds upon recent advances in machine learning, visualization, and user interactions to facilitate exploration of large-scale scientific datasets. First, we use data-driven approaches to computationally identify regions of interest in the datasets. Second, we use visual presentation for effective user comprehension. Third, we provide interactions for human users to integrate domain knowledge and semantic information into this exploration process. Our research shows how to extract, visualize, and explore informative regions on very large 2D landscape images, 3D volumetric datasets, high-dimensional volumetric mouse brain datasets with thousands of spatially-mapped gene expression profiles, and geospatial trajectories that evolve over time. The contributions of this dissertation include: (1) we introduce a sliding-window saliency model that discovers regions of user interest in very large images; (2) we develop visual segmentation of intensity-gradient histograms to identify meaningful components from volumetric datasets; (3) we extract boundary surfaces from a wealth of volumetric gene expression mouse brain profiles to personalize the reference brain atlas; (4) we show how to efficiently cluster geospatial trajectories by mapping each sequence of locations to a high-dimensional point with the kernel distance framework. We aim to discover patterns, relationships, and anomalies that would lead to new scientific, engineering, and medical advances. This work represents one of the first steps toward better visual understanding of large-scale scientific data by combining machine learning and human intelligence.

    Single-Cell Transcriptional and Epigenetic Profiles of Male Breast Cancer Nominate Salient Cancer-Specific Enhancers

    Male breast cancer represents about 1% of all breast cancer diagnoses and, although there are some similarities between male and female breast cancer, the paucity of data available on male breast cancer makes it difficult to establish targeted therapies. To date, most male breast cancers (MBCs) are treated according to protocols established for female breast cancer (FBC). Thus, defining the transcriptional and epigenetic landscape of MBC with improved resolution is critical for developing better avenues for therapeutic intervention. In this study, we present matched transcriptional (scRNA-seq) and epigenetic (scATAC-seq) profiles at single-cell resolution of two treatment-naïve MBC tumors processed immediately after surgical resection. These data enable the detection of differentially expressed genes between male and female breast tumors across immune, stromal, and malignant cell types, highlighting several genes that may have therapeutic implications. Notably, MYC target genes and mTORC1 signaling genes were significantly upregulated in the malignant cells of MBC compared to their female counterparts. To understand how the regulatory landscape of MBC gives rise to these male-specific gene expression patterns, we leveraged the scATAC-seq data to systematically link changes in chromatin accessibility to changes in gene expression within each cell type. We observed cancer-specific rewiring of several salient enhancers and posit that these enhancers have a higher regulatory load than lineage-specific enhancers. We highlight two examples of previously unannotated cancer-cell-specific enhancers of ANXA2 and PRDX4 gene expression and show evidence for super-enhancer regulation of LAMB3 and CD47 in male breast cancer cells. Overall, this dataset annotates clinically relevant regulatory networks in male breast tumors, providing a useful resource that expands our current understanding of the gene expression programs that underlie the biology of MBC.

    Computational Methods for the Analysis of Genomic Data and Biological Processes

    In recent decades, new technologies have made remarkable progress in helping to understand biological systems. Rapid advances in genomic profiling techniques such as microarrays or high-performance sequencing have brought new opportunities and challenges in the fields of computational biology and bioinformatics. Such genetic sequencing techniques allow large amounts of data to be produced, whose analysis and cross-integration could provide a complete view of organisms. As a result, it is necessary to develop new techniques and algorithms that analyze these data reliably and efficiently. This Special Issue collected the latest advances in the field of computational methods for the analysis of gene expression data, and, in particular, the modeling of biological processes. Here we present eleven works selected to be published in this Special Issue due to their interest, quality, and originality.

    Network Approaches to the Study of Genomic Variation in Cancer

    Advances in genomic sequencing technologies opened the door for a wider study of cancer etiology. By analyzing datasets with thousands of exomes (or genomes), researchers gained a better understanding of the genomic alterations that confer a selective advantage towards cancerous growth. A predominant narrative in the field has been based on a dichotomy of alterations that confer a strong selective advantage, called cancer drivers, and the bulk of other alterations assumed to have a neutral effect, called passengers. Yet, a series of studies questioned this narrative and assigned potential roles to passengers, be it in terms of facilitating tumorigenesis or countering the effect of drivers. Consequently, the passenger mutational landscape received a higher level of attention in an attempt to prioritize the possible effects of its alterations and to identify new therapeutic targets. In this dissertation, we introduce interpretable network approaches to the study of genomic variation in cancer. We rely on two types of networks, namely functional biological networks and artificial neural nets. In the first chapter, we describe a propagation method that prioritizes 230 infrequently mutated genes with respect to their potential contribution to cancer development. In the second chapter, we further transcend the driver-passenger dichotomy and demonstrate a gradient of cancer relevance across human genes. In the last two chapters, we present methods that simplify neural network models to render them more interpretable with a focus on functional genomic applications in cancer and beyond.

    Evaluation

    Evaluating the utility of gene expression data from patient-matched samples for studying breast cancer

    Breast cancer is a heterogeneous disease with distinct subtypes and many different clinical presentations. Neoadjuvant therapy of breast cancer offers a window of opportunity to study translational changes in tumours as a result of treatment alone and may help to identify tumour response status. Pairs of samples collected from different sites or sequentially from the same individual can potentially provide additional prognostic information for the risk stratification of breast cancer. Here, we seek to aggregate multiple studies of valuable, multi-sampled, patient-matched cohorts for meta-analysis to check for an enhanced ability to make new and significant findings about the underlying mechanisms of tumour treatment response. Multiple sequentially-matched datasets of pre- and on-treatment matched primary tumour and lymph node samples were collected and examined for differentially expressed genes and pathways indicative of pathological response. Machine learning methods were applied to identify biomarkers of response from the on-treatment samples, and profiling comparisons were made to assess the additional value of matched patient samples to accurately predict risk. Lastly, five sequentially sampled datasets were aggregated for meta-analysis by combining the normalised pre- to on-treatment expression level differences to identify commonalities in the response to therapy across both endocrine and chemotherapy treatment strategies. The gene AAGAB was identified through iterative differential analysis and was found to be 78% accurate in validation for the prediction of pathological complete response in neoadjuvant chemotherapy-treated breast cancer. AAGAB demonstrated significant separation of patient survival curves (log rank p = 0.0036), and the on-treatment samples more accurately reflected the patient risk than the pretreatment samples.
    Matched lymph node tissue of primary breast cancer was more successful at capturing the patient’s risk of recurrence than the primary biopsy, correctly identifying 83% (10/12) of the recurring patients compared to 25% (3/12) in the primary. Underlying differential expression analysis also showed a considerable number of high-profile breast cancer genes over-represented in the lymph node. Aggregation of multiple sequential studies resulted in low post-integration concordance values with the reference patient data (<30% profiling agreement), and is not recommended for this type of analysis. However, combining the pairwise change values for gene expression level data was successful, and resulted in the creation of highly accurate models for predicting patient response (F1 accuracy score, 0.92) as well as the identification of potential common escape pathways to breast cancer therapies. Analysis of the matched pre- and on-treatment samples revealed the intrinsic value of multiple on-treatment biopsies. These samples offer valuable new targets for biomarker identification that show significant increases in accuracy for the prediction of response and long-term outcome in neoadjuvant chemotherapy. Additional sampling of involved metastatic lymph nodes also improves the prognostic capabilities for clinicians by providing a potentially more accurate view of the per-patient risk profile. Lastly, the pairwise expression change values show the direction of tumour change, which can be used to create new models for the prediction and classification of patient risk and for furthering our understanding of the mechanisms behind patient non-response.

    Characterizing the Huntington's disease, Parkinson's disease, and pan-neurodegenerative gene expression signature with RNA sequencing

    Huntington's disease (HD) and Parkinson's disease (PD) are devastating neurodegenerative disorders that are characterized pathologically by degeneration of neurons in the brain and clinically by loss of motor function and cognitive decline in mid to late life. The cause of neuronal degeneration in these diseases is unclear, but both are histologically marked by aggregation of specific proteins in specific brain regions. In HD, fragments of a mutant Huntingtin protein aggregate and cause medium spiny interneurons of the striatum to degenerate. In contrast, PD brains exhibit aggregation of toxic fragments of the alpha synuclein protein throughout the central nervous system and trigger degeneration of dopaminergic neurons in the substantia nigra. Considering the commonalities and differences between these diseases, identifying common biological patterns across HD and PD as well as signatures unique to each may provide significant insight into the molecular mechanisms underlying neurodegeneration as a general process. State-of-the-art high-throughput sequencing technology allows for unbiased, whole genome quantification of RNA molecules within a biological sample that can be used to assess the level of activity, or expression, of thousands of genes simultaneously. In this thesis, I present three studies characterizing the RNA expression profiles of post-mortem HD and PD subjects using high-throughput mRNA sequencing data sets. The first study describes an analysis of differential expression between HD individuals and neurologically normal controls that indicates a widespread increase in immune, neuroinflammatory, and developmental gene expression. The second study expands upon the first study by making methodological improvements and extends the differential expression analysis to include PD subjects, with the goal of comparing and contrasting HD and PD gene expression profiles. 
    This study was designed to identify common mechanisms underlying the neurodegenerative phenotype, transcending those of each unique disease, and has revealed specific biological processes, in particular those related to NFkB inflammation, common to HD and PD. The last study describes a novel methodology for combining mRNA and miRNA expression that seeks to identify associations between mRNA-miRNA modules and continuous clinical variables of interest, including CAG repeat length and clinical age of onset in HD.