8 research outputs found

    Dimension reduction methods with applications to high dimensional data with a censored response

    Dimension reduction methods have come to the forefront of many applications where the number of covariates, p, far exceeds the sample size, N. For example, in survival analysis studies using microarray gene expression data, 10,000 to 30,000 expression measurements per patient are collected, but only a few hundred patients are available for the study. The focus of this work is on linear dimension reduction methods. Attention is given to the dimension reduction method of Random Projection (RP), in which the original p-dimensional data matrix X is projected onto a k-dimensional subspace using a random matrix Gamma. The motivation of RP is the Johnson-Lindenstrauss (JL) Lemma, which states that a set of N points in p-dimensional Euclidean space can be projected onto a $k \ge 24 \ln N / (3\epsilon^2 - 2\epsilon^3)$ dimensional Euclidean space such that the pairwise distances between the points are preserved within a factor $1 \pm \epsilon$. In this work, the JL Lemma is revisited when the random matrix Gamma is defined as standard Gaussian and as Achlioptas-type. An improvement on the lower bound for k is provided by working directly with the distributions of the random distances rather than resorting to the moment generating function technique used in the literature. An improvement on the lower bound for k is also provided when using pairwise L2 distances in the space of the original points and pairwise L1 distances in the space of the projected points. Another popular dimension reduction method is Partial Least Squares. In this work, a variant of Partial Least Squares is proposed, denoted Rank-based Modified Partial Least Squares (RMPLS). The weight vectors of RMPLS can be seen to be the solution to an optimization problem. The method is insensitive to outlying values of both the response and the covariates, and takes into account the censoring information in the construction of its weight vectors. 
Results from simulations and real datasets under the Cox and Accelerated Failure Time (AFT) models indicate that RMPLS outperforms other leading methods on various measures when outliers are present in the response, and is comparable to other methods in the absence of outliers in the response.
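
    The random projection step described in this abstract can be illustrated with a minimal sketch. It is not the thesis code: the function names, the synthetic data, and the values N = 50, p = 2000 and epsilon = 0.5 are assumptions chosen for illustration. The sketch uses the classical lower bound quoted above (not the improved bounds derived in the work), a standard Gaussian Gamma scaled by 1/sqrt(k), and measures distortion on squared pairwise distances, which is how the JL Lemma is usually stated.

```python
import math

import numpy as np


def jl_min_dim(n_points: int, eps: float) -> int:
    """Smallest integer k satisfying k >= 24 ln N / (3 eps^2 - 2 eps^3)."""
    if n_points < 2 or not 0.0 < eps < 1.0:
        raise ValueError("need n_points >= 2 and 0 < eps < 1")
    return math.ceil(24.0 * math.log(n_points) / (3.0 * eps**2 - 2.0 * eps**3))


def gaussian_random_projection(X: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Project the rows of X (N x p) to k dimensions with a standard Gaussian Gamma."""
    p = X.shape[1]
    gamma = rng.standard_normal((p, k))
    # The 1/sqrt(k) scaling keeps squared distances unbiased after projection.
    return (X @ gamma) / math.sqrt(k)


def squared_distance_ratios(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Projected / original squared distance for every pair of rows i < j."""
    i, j = np.triu_indices(X.shape[0], k=1)
    original = np.sum((X[i] - X[j]) ** 2, axis=1)
    projected = np.sum((Y[i] - Y[j]) ** 2, axis=1)
    return projected / original


# Synthetic stand-in for an N x p data matrix with p much larger than N.
rng = np.random.default_rng(0)
N, p, eps = 50, 2000, 0.5
X = rng.standard_normal((N, p))

k = jl_min_dim(N, eps)
Y = gaussian_random_projection(X, k, rng)
ratios = squared_distance_ratios(X, Y)
print(f"k = {k}, projected shape = {Y.shape}")
print(f"squared-distance ratios range from {ratios.min():.3f} to {ratios.max():.3f}")
```

    Ratios close to 1 mean the pairwise geometry survived the projection. Because the guarantee is probabilistic, a single draw of Gamma is not certain to keep every ratio inside 1 ± epsilon, so the sketch reports the observed range instead of asserting the bound.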

    Integrative and Comparative Analysis of Retinoblastoma and Osteosarcoma

    Over the last decade and a half, the widespread adoption of high-throughput methods in molecular biology has generated vast numbers of datasets that have revealed the previously unfathomed complexity of cell regulatory mechanisms. The recently published results of the ENCODE project (ENCODE Project Consortium et al., 2012) demonstrated the extent of these mechanisms in the human genome, and more regulatory mechanisms will certainly be discovered in the future. Already, this complexity within a single cell, without taking into account cell-cell interactions or micro-environment influences, cannot be grasped by the human mind. However, understanding it is the key to devising treatments adapted to genetic diseases or disorders, among them cancer. In mathematics, such complex problems are addressed using methods that reduce their complexity so that they can be modeled in a solvable manner. In biology, this led researchers to develop the concept of systems biology as a means to abstract the complexity of the cell regulatory network. To date, most published studies using high-throughput technologies focus on only one kind of regulatory mechanism and hence cannot be used as such to investigate the interactions between mechanisms. Moreover, distinguishing causative from confounding factors within such studies is difficult. These were my original motivations for developing analytical and statistical methods that control for the effects of confounding factors and allow the integrative and comparative analysis of different kinds of datasets. Ultimately, three different tools were developed to achieve this goal. First, "customCDF": a tool to redefine the Custom Definition File (CDF) of Affymetrix GeneChips. It increases the sensitivity of downstream analyses, as these benefit from the constantly evolving human genome reference and annotations. Second, "aSim": a tool to simulate microarray data, which was required to benchmark the developed algorithms. 
Third, a set of combined statistical methods was developed for the integrative analysis and, finally, a modification of the integrative analysis approach for the comparative analysis. These were bundled in the "crossChip" R package. The "customCDF" and "aSim" tools were first validated on independent datasets. The developed analytical methods ("crossChip") were first validated on "aSim" simulated data and publicly available datasets and then used to answer two biological questions. First, using two retinoblastoma datasets, the effect of genomic copy number variations on gene expression was investigated. Then, motivated by the fact that retinoblastoma patients have a higher chance of developing osteosarcoma later in life than the average population, datasets of both tumors were comparatively analyzed to assess their similarities and differences. Despite a rather limited number of samples within the selected datasets, the developed approaches, with their higher sensitivity and sensibility, were successful and set the ground for larger-scale analyses. Indeed, the integrative analysis applied to retinoblastoma revealed the high importance of the chromosome 6 gain at a later stage of the disease, indicating that many genes on that chromosome are beneficial to cancerogenesis. Moreover, in comparison to standard microarray analyses, it proved effective at detecting the interplay of regulatory mechanisms: examples of positive and negative compensation of gene expression in lost and gained regions, respectively, as well as examples of antisense transcription and of pseudogene and snRNA regulation, were identified in this dataset. The comparative analysis, on the other hand, revealed the high similarity of the retinoblastoma and osteosarcoma tumors, while at the same time showing that each of them takes advantage of its distinct micro-environment and consequently appears to make use of different signaling pathways: PKC/calmodulin in retinoblastoma and GPCR/RAS in osteosarcoma. 
The developed tools and statistical methods have demonstrated their validity and utility by giving sensible answers to the two biological questions addressed. Moreover, they generated a large number of interesting hypotheses that warrant further investigation. As they are not limited to microarray analysis but can be applied to any high-throughput data, they also demonstrate the usefulness of "systems biology" approaches for studying cancerogenesis.

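    Biochemical fractionation approaches coupled with transcriptomics for the systematic study of the subcellular and extracellular localization of RNAs (Approches de fractionnement biochimique couplé à la transcriptomique dans l’étude systématique de la localisation subcellulaire et extracellulaire des ARNs)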

    Several RNA transcripts acquire spatially resolved patterns in diverse prokaryotic and eukaryotic cells, a phenomenon termed “RNA localization”. In highly polarized cells, such as neurons or certain embryos, mRNA localization provides an elegant mechanism to restrict protein expression and activity to a narrow spatiotemporal context. Diverse imaging approaches have been developed to study RNA localization, including RNA in situ hybridization and the MS2 system. While these techniques enable the visualization of RNA spatial distributions at high resolution, they are poorly suited to systematic, transcriptome-wide analyses of spatial asymmetry. 
The first part of this thesis covers the development of biochemical cell fractionation and extracellular milieu processing methods coupled to deep sequencing as a novel approach to study transcriptome-wide RNA localization. These methods, collectively termed CeFra-seq (“Cell Fractionation – RNA-seq”), are proposed as a complement to imaging approaches. Chapter 5 consists of a detailed technical description of the CeFra-seq methodology in human K562 leukemia cells, along with a validation workflow and a relevant data transformation toolkit. The subsequent chapters discuss applications of these methods to investigate four outstanding questions in different biological systems. Chapter 6 relies on CeFra-seq to explore the subcellular properties of RNAs targeted to extracellular vesicles (EVs). EVs form a heterogeneous group of nanoscopic structures delimited by a phospholipid bilayer that contain specific repertoires of proteins and nucleic acids. Ubiquitous in biological fluids, EVs have been associated with intercellular communication in diverse biological contexts, notably antigen presentation and tumor progression. Yet the mechanisms that account for the enrichment of specific RNAs in EVs remain unclear. By systematically contrasting RNA populations found in EVs with subcellular distributions, my work has revealed diverse properties linked to EV targeting, including cytosolic accessibility, RNA length and cis-acting elements. Chapter 7 consists of an extensive comparison of the morphological and transcriptomic properties of EVs derived from several human and Drosophila cell lines. This work reveals that Drosophila EVs are smaller than their human counterparts and that both are enriched in short Polymerase III transcripts. Together, these results emphasize the high conservation of RNA export processes. Chapter 8 extends subcellular fractionation approaches coupled to deep sequencing to an additional biological system: Drosophila embryogenesis. 
Here, the method yields repertoires of transcripts displaying high spatial and temporal asymmetry during development. The analysis of mutants of SLBP, a factor involved in histone mRNA processing, and of Chk1, a regulator of the DNA damage response, shows that the depletion of these proteins selectively hampers the expression of zygotic transcripts identified through CeFra-seq. Chapter 9 recounts a study of antisense transcripts derived from the histone gene locus during Drosophila embryogenesis. The expression of these non-polyadenylated RNAs fluctuates during development and depends on the protein SLBP. Here, CeFra-seq reveals that these antisense RNAs, which are strictly zygotic, co-segregate with their complementary mRNAs, hinting at the formation of double-stranded RNAs, the precursors of small interfering RNAs. To follow up on this hypothesis, I show that small RNAs derived from the histone gene locus bind to the catalytic factor of the RNA-induced silencing complex, Argonaute-2. In addition, depleting Argonaute-2 leads to a derepression of histone mRNAs. Together, these results suggest a model wherein early zygotic antisense transcription leads to the formation of small interfering RNAs, which contribute to the clearance of maternally deposited histone mRNAs. Hence, this thesis reflects the development and application of a versatile approach to study RNA localization, termed CeFra-seq and described in Chapter 5. The use of this method leads to functional studies that investigate the properties of EV-targeted RNAs (Chapter 6) and the extent of evolutionary conservation of this targeting process (Chapter 7). The rest of the thesis exploits the CeFra-seq approach to explore the transcriptomic phenotype of SLBP depletion in Drosophila embryos (Chapter 8), as well as the properties and functions of antisense RNAs produced by the histone gene locus in early embryos (Chapter 9).

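    Occurrence and distribution of emerging contaminants in soils, surface water and groundwater in the municipality of Porto Alegre, RS (Ocorrência e distribuição de contaminantes emergentes em solos, águas superficiais e subterrâneas no município de Porto Alegre, RS)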

    Contaminants of emerging concern (CECs) can be defined as new compounds or molecules that were not previously known or that have only recently appeared in the scientific literature. CECs can also be defined as compounds whose environmental contamination issues were not fully recognized or understood, and as contaminants for which new information is changing the understanding of environmental and human health risks. In some urban areas in Brazil, groundwater is normally consumed without prior treatment, so monitoring CECs is necessary. Thus, the main objective of this work is to analyze the occurrence and behavior of emerging contaminants in Porto Alegre, based on chemical analyses carried out by liquid chromatography-tandem mass spectrometry (LC-MS/MS), laboratory column experiments, and geostatistical maps. First, the fate of seven compounds (atrazine, simazine, ametryn, tebuthiuron, 2,4-D, fipronil and diclofenac) was investigated using leaching column experiments to evaluate accumulation and transfer in five different soil types from Porto Alegre. 
The results showed that the soil derived from Quaternary sediments, with well-sorted sandy sediments, was the one in which the contaminants had the highest mobility. This soil also has a pH above the average of the other soils in the city, a factor that may also be responsible for lower retention of substances. Tebuthiuron is the substance with the greatest leaching potential overall. In addition, the concentration and distribution of 23 per- and polyfluoroalkyl substances (PFAS) were analyzed in groundwater samples collected from water wells in urbanized areas and in surface water samples. The total concentrations of PFAS (ΣPFAS) in groundwater samples varied between 22 and 718 ng L⁻¹. Eleven PFAS species were detected in groundwater, including PFOA and PFOS. Most of the sampling locations with quantified PFAS are in the porous aquifer, which has higher hydraulic conductivity than the fractured aquifer, a fact that may contribute to groundwater contamination. Ten species were found in surface water, and the dominant ones were PFOA, PFOS and PFHpA, with detection frequencies of 50%, 62% and 87%, respectively. PFOA was the most dominant species in the study. In addition, tributary water bodies had higher concentrations of PFAS than the main water body (Lake Guaíba), probably due to dilution processes. Finally, a total of 23 CEC compounds, including pesticides, pharmaceuticals, and hormones, were determined in groundwater samples. The CECs most frequently detected were atrazine and its degradation products, fipronil, simazine, tebuconazole, hexazinone, and caffeine, at concentrations up to 300 ng L⁻¹. All compounds studied were detected in at least one groundwater sample. Patterns in the data identified through self-organizing maps (SOM) showed a strong positive correlation between atrazine, hexazinone, simazine, tebuthiuron, 2-hydroxyatrazine, and 17β-estradiol. The hormones estrone and testosterone also show a positive correlation due to their similar chemical properties. 
On the other hand, caffeine was detected in 90% of the samples, likely due to the local habit of drinking chimarrão, a hot yerba mate infusion, every day, combined with the low rate of domestic sewage treatment in the study area.

    6th International Probabilistic Workshop - 32. Darmstädter Massivbauseminar: 26-27 November 2008 ; Darmstadt, Germany 2008 ; Technische Universität Darmstadt

    These are the proceedings of the 6th International Probabilistic Workshop, formerly known as the Dresden Probabilistic Symposium or International Probabilistic Symposium. The workshop was held twice in Dresden, then moved to Vienna, Berlin, Ghent and finally to Darmstadt in 2008. All of the conference cities have their own specialities. Darmstadt, however, has a very special distinction: element number 110 was named Darmstadtium after the city, and there are only very few cities worldwide after which a chemical element is named. The high atomic number 110 of Darmstadtium indicates that much research is still required and being carried out. This is also true for the issue of probabilistic safety concepts in engineering. Although the history of probabilistic safety concepts can be traced back nearly 90 years, there is still a long way to go for practical applications. This is not a disadvantage. Just as research chemists strive to discover new element properties, with the application of new probabilistic techniques we may advance the properties of structures substantially. (Excerpt from the preface)