1,753 research outputs found

    A method to identify differential expression profiles of time-course gene data with Fourier transformation

    BACKGROUND: Time course gene expression experiments are an increasingly popular method for exploring biological processes. Temporal gene expression profiles provide an important characterization of gene function, as biological systems are both developmental and dynamic. With such data it is possible to study gene expression changes over time and thereby to detect differentially expressed genes. Much of the early work on analyzing time series expression data relied on methods developed originally for static data, and thus there is a need for improved methodology. Since time series expression is a temporal process, its unique features, such as autocorrelation between successive points, should be incorporated into the analysis. RESULTS: This work aims to identify genes that show different gene expression profiles across time. We propose a statistical procedure to discover gene groups with similar profiles using a nonparametric representation that accounts for the autocorrelation in the data. In particular, we first represent each profile in terms of a Fourier basis, and then we screen out genes that are not differentially expressed based on the Fourier coefficients. Finally, we cluster the remaining gene profiles using a model-based approach in the Fourier domain. We evaluate the screening results in terms of sensitivity, specificity, FDR and FNR, compare them with Gaussian process regression screening in a simulation study, and illustrate the results by application to yeast cell-cycle microarray expression data with alpha-factor synchronization. The key elements of the proposed methodology are: (i) representation of gene profiles in the Fourier domain; (ii) automatic screening of genes based on the Fourier coefficients, taking into account autocorrelation in the data while controlling the false discovery rate (FDR); (iii) model-based clustering of the remaining gene profiles. CONCLUSIONS: Using this method, we identified a set of cell-cycle-regulated yeast genes from time-course data. The proposed method is general and can potentially be used to identify genes that share the same patterns or biological processes, and to help meet the present and forthcoming challenges of data analysis in functional genomics.
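
The first two steps of this abstract (Fourier representation, then screening on the coefficients) can be illustrated with a minimal sketch. It assumes equally spaced time points, and the function names, the number of harmonics and the fixed energy threshold are all hypothetical placeholders; the paper's own screening rule controls the FDR and accounts for autocorrelation, which this sketch does not attempt.

```python
import cmath
import math

def fourier_coefficients(profile, n_harmonics):
    """DFT coefficients c_1..c_K of a mean-centred, equally spaced profile."""
    n = len(profile)
    mean = sum(profile) / n
    centred = [x - mean for x in profile]
    coeffs = []
    for k in range(1, n_harmonics + 1):
        c = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(centred)) / n
        coeffs.append(c)
    return coeffs

def spectral_energy(profile, n_harmonics=3):
    """Sum of squared magnitudes of the first few Fourier coefficients."""
    return sum(abs(c) ** 2 for c in fourier_coefficients(profile, n_harmonics))

def screen(profiles, threshold, n_harmonics=3):
    """Keep genes whose low-frequency energy exceeds a fixed threshold.

    The threshold is an illustrative assumption, not an FDR-controlled cutoff.
    """
    return [name for name, p in profiles.items()
            if spectral_energy(p, n_harmonics) > threshold]

# Toy data: one flat profile and one profile with two full cycles.
profiles = {
    "flat_gene": [1.0] * 16,
    "cycling_gene": [math.sin(2 * math.pi * t / 8) for t in range(16)],
}
kept = screen(profiles, threshold=0.05)
```

Here the flat profile has zero energy after centring and is screened out, while the cycling profile concentrates its energy in the second harmonic and is kept.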

    Classification of microarray data using gene networks

    BACKGROUND: Microarrays have become extremely useful for analysing genetic phenomena, but establishing a relation between microarray analysis results (typically a list of genes) and their biological significance is often difficult. Currently, the standard approach is to map a posteriori the results onto gene networks in order to elucidate the functions perturbed at the level of pathways. However, integrating a priori knowledge of the gene networks could help in the statistical analysis of gene expression data and in their biological interpretation. RESULTS: We propose a method to integrate a priori the knowledge of a gene network in the analysis of gene expression data. The approach is based on the spectral decomposition of gene expression profiles with respect to the eigenfunctions of the graph, resulting in an attenuation of the high-frequency components of the expression profiles with respect to the topology of the graph. We show how to derive unsupervised and supervised classification algorithms of expression profiles, resulting in classifiers with biological relevance. We illustrate the method with the analysis of a set of expression profiles from irradiated and non-irradiated yeast strains. CONCLUSION: Including a priori knowledge of a gene network for the analysis of gene expression data leads to good classification performance and improved interpretability of the results
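
The idea of attenuating high-frequency components of an expression profile with respect to a gene network can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: instead of an explicit eigendecomposition it applies a few diffusion steps, which scale each eigencomponent of the graph Laplacian with eigenvalue λ by (1 − αλ) per step, so components that vary strongly across edges are damped most. The toy graph, `alpha` and `steps` are hypothetical, and `alpha` must stay below 2/λ_max for the filter to be stable.

```python
def laplacian_apply(edges, x):
    """Return L x for the combinatorial Laplacian L = D - A of an undirected graph."""
    out = [0.0] * len(x)
    for i, j in edges:
        diff = x[i] - x[j]
        out[i] += diff
        out[j] -= diff
    return out

def smooth_on_graph(edges, x, alpha=0.2, steps=3):
    """Apply (I - alpha * L)^steps to x, damping variation across edges."""
    x = list(x)
    for _ in range(steps):
        lx = laplacian_apply(edges, x)
        x = [xi - alpha * li for xi, li in zip(x, lx)]
    return x

def roughness(edges, x):
    """x^T L x, the sum of squared differences across edges."""
    return sum((x[i] - x[j]) ** 2 for i, j in edges)

# Hypothetical 4-gene pathway arranged as a chain.
edges = [(0, 1), (1, 2), (2, 3)]
constant_profile = [1.0, 1.0, 1.0, 1.0]      # lowest graph frequency
alternating_profile = [1.0, -1.0, 1.0, -1.0]  # high graph frequency

smoothed_constant = smooth_on_graph(edges, constant_profile)
smoothed_alternating = smooth_on_graph(edges, alternating_profile)
```

The constant profile passes through unchanged, while the alternating profile, which disagrees across every edge, is strongly attenuated.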

    Periodicity, Change Detection and Prediction in Microarrays

    Three topics in the analysis of microarray genomic data are discussed, and improved statistical methods are developed in each case. First, a statistical test with higher power is developed for detecting periodicity in microarray time series data. Periodicity in short series, with non-Fourier frequencies, is detected through a Pearson curve calibrated to the null distribution obtained by computer simulation. Unlike traditional methods, this approach is applicable even in the presence of missing values or unequal time intervals. The usefulness of the new method is demonstrated on simulated series as well as actual microarray time series. The second topic develops a new method for detecting changes in DNA or gene copy number. Regions of DNA copy number aberration in chromosomal material are detected using the maximal overlap discrete wavelet transform (MODWT). It is shown how repeated application of the MODWT to a series can be used to confirm the presence of change points. Application to simulated as well as array CGH (Comparative Genomic Hybridization) data confirms the excellent performance of this method. In the third topic, an improved class predictor for tissue samples in microarray experiments is developed by incorporating nearest neighbour covariates (NNC). It is demonstrated that this method reduces misclassification errors in both simulated and actual microarray data.
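
A periodicity test of the kind described in the first topic, one that tolerates unequal time intervals and non-Fourier frequencies, might be sketched as below. The statistic (the best R² of a least-squares sinusoid fit over a grid of candidate periods) and the permutation null are assumptions chosen for illustration; the thesis calibrates a Pearson curve to a simulated null distribution, which is not reproduced here. All names and the toy series are hypothetical.

```python
import math
import random

def sinusoid_r2(times, values, period):
    """R^2 of a least-squares fit of a + b*cos(wt) + c*sin(wt) to the series."""
    n = len(values)
    w = 2 * math.pi / period

    def centre(xs):
        m = sum(xs) / n
        return [x - m for x in xs]

    # Centring every column is equivalent to fitting an intercept.
    y = centre(values)
    c = centre([math.cos(w * t) for t in times])
    s = centre([math.sin(w * t) for t in times])
    cc = sum(a * a for a in c)
    ss = sum(a * a for a in s)
    cs = sum(a * b for a, b in zip(c, s))
    cy = sum(a * b for a, b in zip(c, y))
    sy = sum(a * b for a, b in zip(s, y))
    yy = sum(a * a for a in y)
    det = cc * ss - cs * cs
    if det < 1e-12 or yy == 0.0:
        return 0.0  # degenerate design or constant series
    explained = (ss * cy * cy - 2 * cs * cy * sy + cc * sy * sy) / det
    return explained / yy

def periodicity_test(times, values, periods, n_perm=199, seed=0):
    """Return (max R^2 over candidate periods, permutation p-value)."""
    rng = random.Random(seed)

    def stat(vals):
        return max(sinusoid_r2(times, vals, p) for p in periods)

    observed = stat(values)
    shuffled = list(values)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any time ordering under the null
        if stat(shuffled) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

# Unequally spaced sampling times (some time points missing).
times = [0, 1, 2, 4, 5, 7, 8, 9, 11, 12, 14, 15]
values = [math.sin(2 * math.pi * t / 6) for t in times]
observed, p_value = periodicity_test(times, values, periods=[3, 4, 5, 6, 7, 8])
```

Because the regression uses the actual sampling times, missing or irregular time points need no special handling in this sketch.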

    Metabolomic effects of single gene deletions in Saccharomyces cerevisiae

    Master's thesis in Biochemistry, Universidade de Lisboa, Faculdade de Ciências, 2021. Saccharomyces cerevisiae is a model organism with about 6000 genes. Most of these genes can be deleted without compromising yeast viability, and a large fraction of these mutations are silent, producing no apparent observable phenotype. Phenotypic changes associated with many mutations can only be observed during growth in certain culture media or under particular stress conditions. Nevertheless, significant variations occur in the intracellular metabolism of the mutated cells, particularly if the mutations are associated with key metabolic pathways. Studies of these variations usually involve characterizing enzyme activities or quantifying a small number of metabolites, capturing only a small fraction of the metabolism specific to a mutant strain. The development of analytical platforms that allow large-scale analysis of metabolite profiles, particularly those based on mass spectrometry, has contributed to a more complete characterization of an organism's metabolome. The use of extreme-resolution instruments, such as the Fourier-transform ion cyclotron resonance (FT-ICR) mass spectrometer, allows the detection of thousands to tens of thousands of compounds, and the most recent instruments can even resolve the molecular isotopic fine structure, which makes them particularly relevant for the discriminant analysis of compounds based on complex chemical profiles. In this work, a global (also called untargeted) metabolomics approach based on FT-ICR-MS was followed to study the impact of single-gene deletions in yeast of the species Saccharomyces cerevisiae. For this purpose, five isogenic strains of this species were analysed.
Besides the reference strain, three single-gene deletion mutant strains related to the catabolism of methylglyoxal were analysed; methylglyoxal is a highly reactive and cytotoxic dicarbonyl compound implicated in several pathological conditions. Two of these mutant strains carried deletions in the genes GLO1 and GLO2, which encode the two enzymes of the glyoxalase system, glyoxalase I and glyoxalase II respectively. This system catalyses the degradation of methylglyoxal in a glutathione-dependent manner. The third strain related to methylglyoxal catabolism carried a deletion in the gene GRE3, which encodes the yeast aldose reductase, the main enzyme in an alternative, glutathione-independent route of methylglyoxal elimination. Finally, another strain, deficient in the gene ENO1, which encodes enolase 1, an enzyme of glycolysis, was also analysed as a control. The strains were grown under identical conditions; their growth was monitored at 600 nm and their metabolomes were subsequently analysed by FT-ICR-MS. No change in growth phenotype was observed: the five strains showed extremely similar growth curves, all reaching stationary phase after 10 to 12 hours. Metabolites were extracted from all strains in stationary phase using a methanol/water mixture (1:1), and the extracts were then analysed by FT-ICR-MS in positive electrospray ionization mode. The peak lists of the resulting spectra were then aligned and used to identify metabolites, using human and yeast metabolomic databases, and to obtain elemental composition formulas, predicted from a set of heuristic rules.
The number of metabolites in each sample and each strain was then counted, and a Venn diagram was built showing the numbers of metabolites shared among and exclusive to each strain. The chemical nature of the molecules in each strain for which a chemical formula could be predicted was also analysed, by building Van Krevelen diagrams and a chemical composition series plot. Three multivariate statistical analysis methods were applied to the metabolomics data: principal component analysis (PCA), agglomerative hierarchical clustering analysis (HCA) and partial least squares discriminant analysis (PLS-DA). The first two are unsupervised methods, meaning that they do not take into account any previously defined groups into which the samples fall (here, the strains). This allows them to separate the samples based on a global measure of similarity between them, ensuring that the results reflect the chemical profile of the samples. The third, PLS-DA, is a supervised method that seeks to maximize the covariance between previously defined groups. This leads to a separation that may not necessarily reflect the largest differences between samples, since greater weight is given to some variables (metabolites) in order to better separate the predefined groups (strains), regardless of whether those groups correspond to the best way of separating the samples. However, PLS-DA is useful because it identifies the variables that contribute most to the separation.
In this work, the two unsupervised methods (PCA and HCA) showed that the strains could be distinguished on the basis of their metabolic profiles, since samples from the same strain consistently showed greater metabolic similarity to one another than to samples from different strains. They also revealed similarities between the two mutant strains related to the glyoxalase system (GLO1 and GLO2). Applying the supervised PLS-DA method maximized the separation between strains, and the result proved extremely similar to the separations produced by the two unsupervised methods. This agreement indicates that the main cause of the metabolic differences between samples is the single-gene difference between strains, since maximizing the variance across all samples gives results similar to maximizing the covariance between strains. By analysing the variable importance in projection (VIP) scores calculated for the PLS-DA separation, the metabolites that contributed most to the separation were identified; glutathione (GSH) emerged as the most important compound, followed by several others that, in general, show a similar distribution of relative abundances. The importance assigned to glutathione agrees with the levels of metabolic similarity found by the unsupervised statistical methods. Glutathione is less abundant, in relative terms, in the strains with deletions in genes encoding the enzymes of the glyoxalase system, since these enzymes are essential for regenerating the levels of this molecule.
Therefore, given that glutathione was identified as the most important compound for the PLS-DA separation, which is extremely similar to the separations obtained by the unsupervised methods, it can be hypothesized that the similarities observed between the strains related to the glyoxalase system are largely due to the impact of reduced glutathione levels in the cells. With this untargeted metabolomics approach based on FT-ICR-MS, it was possible to distinguish between five yeast strains that differed from one another in only one gene and that showed no observable phenotypic differences when grown under normal conditions.
Saccharomyces cerevisiae is a model eukaryote with around 6000 genes. Most of these genes can be deleted without compromising yeast viability, with a vast fraction of these mutations being silent and not producing an apparent observable phenotype. Phenotypic changes associated with many mutations may only be observed in specific growth media or under certain stress conditions. Nevertheless, significant variations occur in the intracellular metabolism of mutated cells, particularly if these mutations are associated with key metabolic pathways. The studies that reveal these variations usually involve the characterization of an enzyme activity or the quantification of a small number of metabolites to obtain “metabolic snapshots” for a specific mutant yeast strain. The development of analytical platforms allowing for the analysis of large metabolite profiles, particularly based on mass spectrometry, has contributed to a more thorough characterization of the organism’s metabolome.
The use of extreme resolution instruments, like the Fourier-transform ion cyclotron resonance (FT-ICR) mass spectrometer, allows the detection of thousands to tens of thousands of compounds, and the most recent ones can even resolve the isotopic fine structure of molecules, which is particularly relevant in discriminant analysis of samples based on complex chemical profiles. An untargeted metabolomics approach based on FT-ICR-MS was applied to study the impact of single-gene deletions in the yeast Saccharomyces cerevisiae. For this purpose, five isogenic strains belonging to this species were analysed. Besides the wild-type strain, we chose three null mutants involved in methylglyoxal catabolism, a well characterized biochemical system in yeast. These mutants lack the genes coding for the main enzymes related to methylglyoxal catabolism: glyoxalase I, glyoxalase II and aldose reductase. Another strain lacking the enolase 1 gene, related to glycolysis, was also analysed as a control. All strains were grown under the same conditions, and no alteration in growth phenotype was observed. Afterwards, metabolite extraction was performed and the extracts were analysed through FT-ICR-MS. The identified metabolites were putatively annotated with names (using human and yeast metabolomic databases as reference) and with chemical formulas (predicted based on a set of heuristic rules). Three multivariate statistical analysis methods were applied to the MS results: principal component analysis (PCA), agglomerative hierarchical clustering analysis (HCA) and partial least squares discriminant analysis (PLS-DA). The two unsupervised methods (PCA and HCA) showed that it was possible to distinguish between the strains based on their metabolic profiles, despite the common genetic background. A higher degree of similarity between samples of the same strain was observed, as expected.
Similarities between mutant strains related to the glutathione-dependent pathway of methylglyoxal catabolism were also observed. The supervised PLS-DA method produced a separation between the samples that proved very similar to those produced by the two unsupervised methods. Through this method, the metabolites that contributed the most to the separation were identified, with glutathione (GSH) emerging as the compound with the greatest importance. Through this approach, it was possible to accurately distinguish between five yeast strains which differed from each other in only one gene and which did not present any observable phenotypic differences when grown under normal conditions.

    Applications of advanced spectroscopic imaging to biological tissues

    The objectives of this research were to develop experimental approaches that can be applied to classify different stages of malignancy in routine formalin-fixed and paraffin-embedded tissues and to optimise the imaging approaches using novel implementations. It is hoped that the approach developed in this research may be applied for early cancer diagnostics in clinical settings in the future in order to increase cancer survival rates. Infrared spectroscopic imaging has recently been shown to have great potential as a powerful method for the spatial visualization of biological tissues. This spectroscopic technique does not require sample labelling because its chemical specificity allows biocomponents to be differentiated based on their chemical structures. Experiments were performed on 3-µm thick prostate and colon tissues that were deposited on 2 mm calcium fluoride (CaF2) and subsequently deparaffinised. The samples were measured under IR microscopes, in both transmission and attenuated total reflection (ATR) mode. In transmission, thermo-spectroscopic imaging of the prostate samples was first carried out to investigate the potential of thermography to complement the information obtained from IR spectra. Spectroscopic imaging makes it possible to acquire a chemical map of a sample within a short time span, since it acquires thousands of spatially resolved infrared spectra simultaneously. Spectral differences in the lipid region (3000-2800 cm-1) were identified between cancer and benign regions within prostate tissues. PCA identified the anti-symmetric stretching of CH2 (2921 cm-1) as the governing spectral band for classification. Nonetheless, the difference in tissue emissivity at room temperature was minimal, so the contrast in the thermal image was low for intra-tissue classification. In addition, the thermal camera could only capture IR light between 3333 and 2000 cm-1.
To record spectral data between 3900 and 900 cm-1 (mid-IR), Fourier transform infrared (FTIR) spectroscopic imaging was used to classify the different stages of colon disease. An automated processing framework was developed that achieved an overall classification accuracy of 92.7%. The processing steps included unsupervised k-means clustering of lipid bands, followed by Random Forest (RF) classification using the ‘fingerprint’ region of the data. The implementation of a correcting lens and the effect of the RMieS-EMSC correction on the tissue spectra were also investigated; computational RMieS-EMSC correction proved more effective at removing spectral artefacts than the correcting lens. Furthermore, the effect of fluctuations in the ambient humidity during the experiments was studied using various supersaturated salt solutions. Significant changes in the phosphate band were observed, most notably a shift of the anti-symmetric phosphate stretching peak from 1230 cm-1 to 1238 cm-1. By keeping the humidity at its lowest level, the classification accuracy of the colon specimens was improved without altering the RF machine learning algorithm. In ATR mode, additional apertures were introduced to the FTIR microscope as a novel means of depth profiling the prostate tissue samples by changing the angle of incidence of the IR beam. Although qualitative information on the change of tissue morphology with depth of penetration (dp) was captured successfully, the spectral data were not suitable for further processing with machine learning because dp changes with wavelength. Apart from the apertures, a ‘large-area’ germanium (Ge) crystal was introduced to enable simultaneous mapping and imaging of the colon tissue samples.
Many advantages of this new implementation were observed, including an improved signal-to-noise ratio, uniform distribution, and no impression left on the sample. The research in this thesis lays the groundwork for clinical diagnosis, and the novel implementations are transferable to studies of other samples.

    A metabolomic investigation of key cellular processes relating to cancer development and progression.

    Recent advancements in mass spectrometry have facilitated new analytical approaches capable of comprehensively characterizing metabolites in biological samples. Fourier transform ion cyclotron resonance mass spectrometry (FTICR-MS) combines excellent mass accuracy (pp

    Development of Protocols for Metabolomics in Biomedical Research using Chemometrics

    Metabolomics is a rapidly growing research field. It aims for quantification of all the metabolites in a biological sample such as plasma, saliva, cerebrospinal fluid or cells. Because the metabolite levels in a biological sample are the end result of the regulatory processes in cells, metabolomics is a very powerful approach for characterisation of phenotypes. Metabolomics has been used to find disease biomarkers, investigate influences of heavy metals on the metabolism and to elucidate gene function. However, analysis of the complete metabolome puts high demands on the methods used. For instance, the methods should be unbiased to accurately depict the in vivo status in the cell. Furthermore, the methods must have very high resolution and sensitivity to allow detection of all metabolites. To approach these high goals, the protocols used in metabolomics need to be thoroughly optimised. The amount of information contained in the metabolome is immense. Consequently, the data set collected from a metabolomics study is very large. To extract the relevant information from such large sets of data, efficient methods are needed both to plan experiments and to convert the data to useful information. For this task, chemometrics is an ideal approach as it allows efficient experimental planning and multivariate data analysis. The experimental planning is sometimes referred to as statistical experimental design or design of experiments. It aims to systematically and simultaneously vary experimental factors in a structured manner. Hence, fewer experiments are generally needed to efficiently map how the system is affected by prevailing factors. The multivariate data analysis employs powerful projection and regression methods to find patterns in data, create system models and classify data. Hence, chemometrics provides a framework for efficient experimental design and an efficient approach for information retrieval. 
In this thesis two thorough developments of metabolomics protocols and three metabolomics investigations, relevant to metabolic regulation in diabetes patients and insulin-producing cells, are presented. The design of experiments approach and multivariate data analysis were applied. The developed protocols were optimised and validated for the analysis of human blood plasma and adherent cell cultures, respectively, and included optimisation from the sample preparation to the analysis with gas chromatography/mass spectrometry. The first of the metabolomics studies aimed to find biomarkers reflecting metabolic regulation during an oral glucose tolerance test in humans to aid in the diagnosis of diabetes. The second study was performed on clonal β-cells and aimed to find metabolic regulation coupled to the amplifying pathway of insulin secretion. The last study aimed to identify metabolic dysregulation in clonal β-cells growing under lipotoxic and glucotoxic conditions, respectively. In all studies, metabolomics extended and deepened the understanding of metabolic regulation in cells and patients. As such, metabolomics will help to find explanations for metabolic diseases such as diabetes.