1,422 research outputs found

    Computational Methods for the Analysis of Genomic Data and Biological Processes

    In recent decades, new technologies have made remarkable progress in helping to understand biological systems. Rapid advances in genomic profiling techniques such as microarrays or high-performance sequencing have brought new opportunities and challenges in the fields of computational biology and bioinformatics. Such genetic sequencing techniques allow large amounts of data to be produced, whose analysis and cross-integration could provide a complete view of organisms. As a result, it is necessary to develop new techniques and algorithms that analyze these data reliably and efficiently. This Special Issue collected the latest advances in the field of computational methods for the analysis of gene expression data, and, in particular, the modeling of biological processes. Here we present eleven works selected to be published in this Special Issue due to their interest, quality, and originality.

    Computational analysis of enhancer deregulation in Multiple Myeloma

    Gene regulation is a complex process, which dictates how the body reacts to different situations through gene expression. Enhancers are sequences of a few hundred base pairs, involved in the regulation of transcription. This work focuses on enhancer activity changes in multiple myeloma, an incurable malignancy of plasma cells, the long-lived B-cells that produce immunoglobulin and provide protection against the antigens that activated them. In this thesis, data from different assays are combined for multiple myeloma and plasma cell samples to determine cancer-specific and subgroup-specific enhancers, and these are correlated with target genes based on the activity of both actors. I find hundreds of enhancers linked to expression of nearby genes, with a large fraction of these being specific to MAF-translocated tumors. Changes in de novo open chromatin distant to the promoter of a gene are more predictive of gene expression than opening of the promoter. Also, the combination of chromatin accessibility data and gene expression data is better at distinguishing cancer subtypes than either alone. Many of the regulated genes are known to be important in multiple myeloma, and this study provides a potential mechanism for their deregulation. In addition, I identify novel genes of interest. These enhancers show motif enrichment for transcription factors expressed in plasma cells as opposed to cancer-specific factors. In particular, a large, MAF-binding, open chromatin region is identified that correlates with the expression of the oncogene CCND2, and distinguishes mutually exclusive sets of samples expressing CCND2 or CCND1, going some way to explaining the known CCND dichotomy. This work lays the foundations of in vivo and de novo Myeloma vs. PC and MM subgroup-specific enhancer-promoter interactions essential for the oncogenic state.
Given that myeloma is currently an incurable cancer, this should be of significant relevance for diagnosis, prognosis and treatment.

    The role of SOXC transcription factors in B-cell development and lymphoid malignancies

    Mantle cell lymphoma (MCL) accounts for 5-10% of all Non-Hodgkin lymphomas (NHLs) and is one of the most aggressive forms of lymphoma, with a median survival of less than 5 years. Currently, MCL is considered to be an incurable disease. MCL is characterized by the t(11;14)(q13;q32) CCND1/IGH translocation that results in high expression of cyclin D1. This translocation takes place at the pre-B cell stage and is generally recognized as the hallmark and primary oncogenic event in the evolution of MCL. Recently, the neural transcription factor SRY (sex-determining region Y) box 11 (SOX11) gene was found to be expressed in over 90% of all MCLs. The SOX11 protein is not detected in the vast majority of other lymphomas or mature B-cells and its expression is independent of cyclin D1 status. Moreover, SOX11 has been proposed to have a functional role in the pathogenesis of MCL and may not only serve as a diagnostic biomarker. In this thesis, the functional role of the SOXC genes (SOX4, SOX11 and SOX12) has been studied in several different ways, both in MCL primary samples/cell lines and in non-MCL related cells, with a focus on the SOX11 gene. The SOXC transcription factors are known to compete for the same target genes. For the first time in MCL, the SOXC genes were quantified by qPCR in a set of MCL patients and MCL cell lines. As previously reported, SOX11 expression was high in MCL, but SOX12 mRNA levels were also found to be higher compared to non-malignant B-cells, whereas the expression levels of SOX4 varied. Further, expression of the SOXC genes correlated in SOX11-positive MCL (determined by immunohistochemistry). How SOX11 gene expression in MCL is regulated was also addressed by studying its promoter region. The promoter region of SOX11 was found to be hypomethylated in MCL patients and cell lines, but also in non-malignant B-cells, indicating regulation by epigenetic mechanisms other than promoter methylation.
Fast and accurate differentiation between similar entities of lymphoma is important since MCL has a more aggressive clinical course. Although having certain distinctive phenotypical markers, MCL and B-cell chronic lymphocytic leukemia/small lymphocytic lymphoma (B-CLL/SLL) are both CD19+, CD20+ and usually CD5+, which could complicate diagnosis by flow cytometry. We developed a method to accurately implement SOX11 in the diagnostic flow panel that consistently detected SOX11 protein in ex vivo isolated MCL cells, but not in CLL/SLL. When conjugated SOX11 antibodies are available, this method could be implemented in the clinic for CLL/SLL with aberrant immune phenotypes or rare cyclin D1- MCLs. The expression levels of SOX11 were further studied in a relatively large group of MCL patients (n=102) by qPCR to determine a cut-off for SOX11-negative MCL and to investigate how quantitative expression related to positivity/negativity by IHC. A cut-off was defined, which resulted in misclassification of only 2/102 by qPCR and IHC. However, for the IHC SOX11+ cases, the qPCR analysis was not able to find a natural cut-off that would identify cases with low expression. When grouping the samples based on expression (10% lowest expression versus the remaining cases), nodal disease was less frequent (p=0.01) and lymphocytosis more frequent (p=0.005) in the qPCR SOX11low cases. Leukemic non-nodal MCL often expresses low levels of SOX11. The quartile of patients with the lowest SOX11 expression had significantly shorter overall survival in the group of patients who did not receive autologous stem cell transplantation. Studies were conducted in primary murine B-cells and a murine pro-B cell line to study the oncogenic potential of Sox11 and its role in differentiation in early B-cells. In the studied cell types, Sox11 did not per se act as an oncogene. Instead, the rate of proliferation was reduced in the pro-B cell line and these cells changed morphology upon expressing the Sox11 gene.
Gene expression analysis revealed upregulation of early cell cycle and cellular adhesion genes upon introduction of the Sox11 gene in the pro-B cells. Despite high similarity to Sox4 (important for B-cell survival and development), no obvious effects on selected genes associated with B-cell differentiation stage were detected, which suggests that the effects of Sox11 are context dependent and might differ in murine pro-B cells compared to MCL and during embryogenesis.

    Acute Myeloid Leukemia

    Acute myeloid leukemia (AML) is the most common type of leukemia. The Cancer Genome Atlas Research Network has demonstrated the increasing genomic complexity of AML. In addition, the network has facilitated our understanding of the molecular events leading to this deadly form of malignancy, for which the prognosis has not improved over past decades. AML is a highly heterogeneous disease, and cytogenetics and molecular analysis of the various chromosome aberrations including deletions, duplications, aneuploidy, balanced reciprocal translocations and fusion of transcription factor genes and tyrosine kinases have led to better understanding and identification of subgroups of AML with different prognoses. Furthermore, molecular classification based on mRNA expression profiling has facilitated identification of novel subclasses and defined high-, poor-risk AML based on specific molecular signatures. However, despite increased understanding of AML genetics, the outcome for AML patients, whose number is likely to rise as the population ages, has not changed significantly. Until it does, further investigation of the genomic complexity of the disease and advances in drug development are needed. In this review, leading AML clinicians and research investigators provide an up-to-date understanding of the molecular biology of the disease, addressing advances in diagnosis, classification, prognostication and therapeutic strategies that may have significant promise and impact on overall patient survival.

    Analysis of large-scale molecular biological data using self-organizing maps

    Modern high-throughput technologies such as microarrays, next generation sequencing and mass spectrometry provide huge amounts of data per measurement and challenge traditional analyses. New strategies of data processing, visualization and functional analysis are inevitable. This thesis presents an approach which applies a machine learning technique known as self-organizing maps (SOMs). SOMs enable the parallel sample- and feature-centered view of molecular phenotypes combined with strong visualization and second-level analysis capabilities. We developed a comprehensive analysis and visualization pipeline based on SOMs. The unsupervised SOM mapping projects the initially high number of features, such as gene expression profiles, to meta-feature clusters of similar and hence potentially co-regulated single features. This reduction of dimension is attained by the re-weighting of primary information and, in contrast to simple filtering approaches, does not entail a loss of primary information. The meta-data provided by the SOM algorithm are visualized in terms of intuitive mosaic portraits. Sample-specific and common properties shared between samples emerge as a handful of localized spots in the portraits, collecting groups of co-regulated and co-expressed meta-features. These characteristic color patterns reflect the data landscape of each sample and promote immediate identification of (meta-)features of interest. It will be demonstrated that SOM portraits transform large and heterogeneous sets of molecular biological data into an atlas of sample-specific texture maps which can be directly compared in terms of similarities and dissimilarities. Spot-clusters of correlated meta-features can be extracted from the SOM portraits in a subsequent step of aggregation. This spot-clustering effectively enables reduction of the dimensionality of the data in two subsequent steps towards a handful of signature modules in an unsupervised fashion.
Furthermore, we demonstrate that analysis techniques provide enhanced resolution if applied to the meta-features. The improved discrimination power of meta-features in downstream analyses such as hierarchical clustering, independent component analysis or pairwise correlation analysis is ascribed to essentially two facts: first, the set of meta-features better represents the diversity of patterns and modes inherent in the data, and second, it has better signal-to-noise characteristics than a comparable collection of single features. In addition to the pattern-driven feature selection in the SOM portraits, we apply statistical measures to detect significantly differential features between sample classes. Implementation of scoring measurements supplements the basal SOM algorithm. Further, two variants of functional enrichment analyses are introduced which link sample-specific patterns of the meta-feature landscape with biological knowledge and support functional interpretation of the data based on the ‘guilt by association’ principle. Finally, case studies selected from different ‘OMIC’ realms are presented in this thesis. In particular, molecular phenotype data derived from expression microarrays (mRNA, miRNA), sequencing (DNA methylation, histone modification patterns) or mass spectrometry (proteome), and also genotype data (SNP-microarrays), are analyzed. It is shown that the SOM analysis pipeline has strong application capabilities and covers a broad range of potential purposes, ranging from time series and treatment-vs.-control experiments to discrimination of samples according to genotypic, phenotypic or taxonomic classifications.
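
The mapping of single features to meta-features described in this abstract can be illustrated with a minimal self-organizing map. The sketch below is not the thesis pipeline: the function names (`train_som`, `best_unit`, `sample_portrait`), the linear decay of the learning rate, the Gaussian neighbourhood, and the toy profiles are all assumptions chosen for illustration. Each feature (for example a gene) is a vector with one value per sample, each map unit holds a codebook vector that plays the role of a meta-feature, and the values of all units for one sample form that sample's portrait.

```python
import math
import random


def best_unit(units, x):
    """Index of the unit whose codebook vector is closest to profile x."""
    return min(
        range(len(units)),
        key=lambda u: sum((w - xi) ** 2 for w, xi in zip(units[u], x)),
    )


def train_som(profiles, rows=3, cols=3, epochs=50, seed=0):
    """Train a small rows x cols SOM on feature profiles.

    profiles: list of vectors, one per feature, one entry per sample.
    Returns the codebook vectors (the meta-features), one per map unit.
    """
    rng = random.Random(seed)
    # Initialise every unit from a randomly chosen input profile.
    units = [list(rng.choice(profiles)) for _ in range(rows * cols)]
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1.0 - frac)                  # assumed linear decay
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood
        order = list(range(len(profiles)))
        rng.shuffle(order)
        for i in order:
            x = profiles[i]
            br, bc = coords[best_unit(units, x)]
            for u, (r, c) in enumerate(coords):
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))
                # Move the unit towards x, weighted by grid distance to the winner.
                units[u] = [w + lr * h * (xi - w) for w, xi in zip(units[u], x)]
    return units


def sample_portrait(units, sample_index, rows, cols):
    """Grid of meta-feature values for one sample (its 'portrait')."""
    return [[units[r * cols + c][sample_index] for c in range(cols)]
            for r in range(rows)]


# Toy data (hypothetical): six features high in samples 0-1, six high in samples 2-3.
toy_profiles = [[1.0, 1.0, 0.0, 0.0]] * 6 + [[0.0, 0.0, 1.0, 1.0]] * 6
toy_units = train_som(toy_profiles, rows=3, cols=3, epochs=50, seed=0)
toy_portrait = sample_portrait(toy_units, 0, 3, 3)
```

In this toy setting, features with identical profiles land on the same unit, and the portrait of sample 0 shows high values in the region of the map that collects the features active in that sample. Real expression data would need normalisation and a much larger map.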

    In praise of arrays

    Microarray technologies have both fascinated and frustrated the transplant community since their introduction roughly a decade ago. Fascination arose from the possibility offered by the technology to gain a profound insight into the cellular response to immunogenic injury and the potential that this genomic signature would be indicative of the biological mechanism by which that stress was induced. Frustrations have arisen primarily from technical factors such as data variance, the requirement for the application of advanced statistical and mathematical analyses, and difficulties associated with actually recognizing signature gene-expression patterns and discerning mechanisms. To aid the understanding of this powerful tool, its versatility, and how it is dramatically changing the molecular approach to biomedical and clinical research, this teaching review describes the technology and its applications, as well as the limitations and evolution of microarrays, in the field of organ transplantation. Finally, it calls upon the transplant community to integrate into multidisciplinary teams and to take advantage of this technology and its expanding applications in unraveling the complex injury circuits that currently limit transplant survival.

    Defining The Effect Of Environmental Perturbation On The Male Germline

    Periconceptional environment, according to the Developmental Origins of Health and Disease (DOHaD) theory, influences offspring phenotype, primarily via epigenetic mechanisms. Although the paternal component in humans is poorly understood, both maternal and paternal periconceptional environment are now believed to contribute to this phenomenon. Manipulation of the early embryo for treating human infertility is suspected of contributing to offspring abnormalities through epigenetic mechanisms. To directly address the effects of common assisted reproductive technology procedures on the offspring epigenome, the DNA methylation profiles of newborns conceived naturally, or through the use of intrauterine insemination (IUI), or in vitro fertilization (IVF) using Fresh or Cryopreserved (Frozen) embryo transfer, were compared. In addition to a reduction of epigenetic aberrations in the IVF conceptions using cryopreservation, metastable epialleles also exhibited altered methylation with fertility status. ART, embryo nutrition, and fertility status are thus suggested to have a lasting epigenetic effect on the developing embryo. While the paternal contribution to the human embryo is uncertain, sperm deliver a collection of proteins and RNA to the zygote. To identify the entire cadre of intergenic spermatozoal RNAs, the RNA Element (RE) discovery algorithm (REDa) was developed and applied to a spectrum of germline, embryonic, and somatic tissues. This highlighted extensive transcription throughout the human genome and yielded previously unidentified human RNAs. Human spermatogenesis was found to exhibit extensive intergenic transcription and pervasive repetitive sequence expression. By analyzing the collection of novel and annotated spermatozoal RNAs in sperm samples from the Mesalamine and Reproductive Health Study (MARS), the effect of endocrine disruptor exposure on human sperm RNA profiles was determined.
Sperm RNA profiles among men and their relationship to di-butyl phthalate (DBP) were longitudinally assessed across binary (high or background) DBP crossover exposures. Numerous changes in the composition of sperm RNA elements were detected during the acute and recovery phases, which suggests that exposure to, or removal from, high DBP produces effects that require longer than one spermatogenic cycle to resolve, if they resolve at all. Overall, chronic phthalate exposure influences the male germline, and acts on the dynamic RNA expression during human spermiogenesis.

    Biomarkers of mismatch repair deficiency in colorectal cancer and cancer predisposition syndromes

    PhD Thesis. Colorectal cancer (CRC) is the third most common cancer in Western societies and approximately 15% are mismatch repair deficient (MMRd). MMRd CRCs have a distinct prognosis, respond to immunotherapy, and occur at a high rate in patients with Lynch syndrome or constitutional mismatch repair deficiency (CMMRD). Detection of MMR deficiency, therefore, guides treatment and identification of associated cancer-predisposition syndromes. However, there is a need for novel biomarkers to detect MMRd CRC, and innovative assays to improve Lynch syndrome and CMMRD diagnosis. I assessed autoantibodies generated against MMRd CRCs as a liquid-biopsy biomarker for cancer detection, by analysing the sera of 464 Lynch syndrome gene carriers using a recently published, multiplex method. Although autoantibodies correlated with a history of CRC, a lack of signal from patients who developed CRC shortly after sampling suggests the method has poor sensitivity. Microsatellite instability (MSI) is an established biomarker of MMR deficiency. I used single molecule molecular inversion probes to develop a sequencing-based MSI assay with an automated results analysis, suitable as a companion diagnostic for immunotherapy, and for streamlined Lynch syndrome screening. The assay achieved 100% accuracy in 197 CRCs, and was robust to sample variables, including quantity, quality, and tumour cell content. Subsequently, I adapted the MSI assay to detect low-level MSI in non-neoplastic tissues of CMMRD patients. The assay separated all 32 CMMRD patients from 94 controls. For both CRC and CMMRD diagnostics, the MSI assay is cheaper and faster than current methods, and is scalable to large cohorts. These results suggest that the humoral immune response to MMRd CRCs cannot readily be used as a biomarker to detect disease, and that alternatives should be sought.
However, the MSI assay could be deployed into clinical practice to meet the high demand for MMR deficiency testing of CRCs and to improve CMMRD diagnostics. the Barbour Foundatio

    Statistical analysis of high-dimensional biomedical data: a gentle introduction to analytical goals, common approaches and challenges

    International audience. Background: In high-dimensional data (HDD) settings, the number of variables associated with each observation is very large. Prominent examples of HDD in biomedical research include omics data with a large number of variables such as many measurements across the genome, proteome, or metabolome, as well as electronic health records data that have large numbers of variables recorded for each patient. The statistical analysis of such data requires knowledge and experience, sometimes of complex methods adapted to the respective research questions. Methods: Advances in statistical methodology and machine learning methods offer new opportunities for innovative analyses of HDD, but at the same time require a deeper understanding of some fundamental statistical concepts. Topic group TG9 “High-dimensional data” of the STRATOS (STRengthening Analytical Thinking for Observational Studies) initiative provides guidance for the analysis of observational studies, addressing particular statistical challenges and opportunities for the analysis of studies involving HDD. In this overview, we discuss key aspects of HDD analysis to provide a gentle introduction for non-statisticians and for classically trained statisticians with little experience specific to HDD. Results: The paper is organized with respect to subtopics that are most relevant for the analysis of HDD, in particular initial data analysis, exploratory data analysis, multiple testing, and prediction. For each subtopic, main analytical goals in HDD settings are outlined. For each of these goals, basic explanations for some commonly used analysis methods are provided. Situations are identified where traditional statistical methods cannot, or should not, be used in the HDD setting, or where adequate analytic tools are still lacking. Many key references are provided.
Conclusions: This review aims to provide a solid statistical foundation for researchers, including statisticians and non-statisticians, who are new to research with HDD or simply want to better evaluate and understand the results of HDD analyses.