
    Application of Esscher Transformed Laplace Distribution in Microarray Gene Expression Data

    Microarrays allow the study of the expression profiles of hundreds to thousands of genes simultaneously. These expression measurements may come from treated samples and healthy controls. The Esscher transformed Laplace distribution is used to fit microarray expression data, and its fit is compared with the Normal and Laplace distributions. Maximum likelihood estimation is used to estimate the parameters of the distribution, and R code is developed to implement the estimation procedure. A simulation study tests the performance of the algorithm. The AIC and BIC criteria are used to compare the distributions. It is shown that the Esscher transformed Laplace distribution fits better than the Normal and standard Laplace distributions.
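The paper's R implementation is not reproduced in the abstract, but the fit-and-compare workflow it describes can be sketched in Python. The ETL density used below, f(z; θ) = ((1 − θ²)/2)·exp(θz − |z|) for |θ| < 1, extended to a location-scale family, is an assumed parameterization, and the data are synthetic:

```python
# Hedged sketch: fit an Esscher transformed Laplace (ETL) location-scale
# family by maximum likelihood and compare it with Normal and Laplace fits
# via AIC (lower is better). The ETL parameterization is an assumption.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, laplace

def etl_negloglik(params, x):
    mu, log_sigma, theta = params
    sigma = np.exp(log_sigma)              # keep the scale positive
    if abs(theta) >= 1.0:
        return np.inf                      # theta must lie in (-1, 1)
    z = (x - mu) / sigma
    logpdf = np.log((1 - theta**2) / 2) + theta * z - np.abs(z) - np.log(sigma)
    return -np.sum(logpdf)

def aic(negloglik, n_params):
    return 2 * n_params + 2 * negloglik

rng = np.random.default_rng(0)
x = laplace.rvs(size=500, random_state=rng)    # synthetic "expression" data

res = minimize(etl_negloglik, x0=[np.median(x), np.log(x.std()), 0.0],
               args=(x,), method="Nelder-Mead")
aic_etl = aic(res.fun, 3)
aic_norm = aic(-norm.logpdf(x, *norm.fit(x)).sum(), 2)
aic_lap = aic(-laplace.logpdf(x, *laplace.fit(x)).sum(), 2)
print(aic_etl, aic_norm, aic_lap)
```

Since the standard Laplace is the ETL at θ = 0, the ETL log-likelihood can never be worse than the Laplace one; AIC then adjudicates whether the extra parameter is worth it.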

    Vol. 15, No. 1 (Full Issue)


    Modulating the Functional Contributions of c-Myc to the Human Endothelial Cell Cyclic Strain Response

    With each heartbeat, major arteries experience circumferential expansion due to internal pressure changes. This pulsatile force is called cyclic strain and has been implicated as playing a pivotal role in the genetic regulation of vascular physiology and pathology. This dissertation investigates the hypothesis that in human umbilical vein endothelial cells (HUVEC), pathological levels of cyclic strain activate the c-Myc promoter, leading to c-Myc transcription and downstream gene induction. To determine the expression and time-dependency of c-Myc in HUVEC, mRNA and protein expression of c-Myc were studied under physiological (6-10% cyclic strain) and pathological (20% cyclic strain) conditions. Both c-Myc mRNA and protein expression increased more than three-fold in HUVEC (P4-P5) cyclically strained at 20%. This expression occurred in a time-dependent manner, peaking in the 1.5-2 hour range and falling to basal levels by 3 hours. Subsequently, the mechanism of c-Myc transcription was investigated by using specific inhibitors to modulate c-Myc transcriptional activation. These compounds, obtained from the University of Arizona Cancer Center, attenuated cyclic-strain-induced c-Myc transcription by about 50%. Having established this reduction in expression, we investigated how these effects modulate downstream genes that are regulated by c-Myc. The results indicate that direct targeting of the c-Myc promoter may decrease stretch-induced gene expression of vascular endothelial growth factor (VEGF), proliferating cell nuclear antigen (PCNA) and heat shock protein 60 (HSP60). These findings may aid the development of novel therapeutic opportunities in vascular diseases.

    Ph.D. Committee Chair: Larry McIntire; Committee Member: Marion Sewer; Committee Member: Ray Vito; Committee Member: Suzanne Eskin; Committee Member: Todd McDevit

    Unconventional Regression for High-Dimensional Data Analysis

    University of Minnesota Ph.D. dissertation. June 2017. Major: Statistics. Advisor: Hui Zou. 1 computer file (PDF); xiv, 161 pages.

    Massive and complex data present new challenges that conventional sparse penalized mean regressions, such as penalized least squares, cannot fully solve. For example, in high-dimensional data, non-constant variance, or heteroscedasticity, is commonly present but often receives little attention in penalized mean regressions. Heavy-tailedness is also frequently encountered in many high-dimensional scientific data. To resolve these issues, unconventional sparse regressions such as penalized quantile regression and penalized asymmetric least squares are the appropriate tools, because they can infer the complete picture of the entire probability distribution. Asymmetric least squares regression has wide applications in statistics, econometrics and finance. It is also an important tool for analyzing heteroscedasticity and is computationally friendlier than quantile regression. The existing work on asymmetric least squares considers only the traditional low-dimension, large-sample setting. We systematically study Sparse Asymmetric LEast Squares (SALES) under high dimensionality and fully explore its theoretical and numerical properties. SALES may fail to tell which variables are important for the mean function and which are important for the scale/variance function, especially when some variables are important for both. To that end, we further propose a COupled Sparse Asymmetric LEast Squares (COSALES) regression for calibrated heteroscedasticity analysis. Penalized quantile regression has been shown to enjoy very good theoretical properties in the literature; its computational issues, however, have not yet been fully resolved.
    We introduce fast alternating direction method of multipliers (ADMM) algorithms for computing penalized quantile regression with the lasso, adaptive lasso, and folded concave penalties. The convergence properties of the proposed algorithms are established, and numerical experiments demonstrate their computational efficiency and accuracy. To efficiently estimate coefficients in high-dimensional linear models without prior knowledge of the error distribution, sparse penalized composite quantile regression (CQR) provides protection against significant efficiency decay regardless of the error distribution. We consider both lasso and folded concave penalized CQR and establish their theoretical properties under ultrahigh dimensionality. A unified efficient numerical algorithm based on ADMM is also proposed to solve the penalized CQR. Numerical studies demonstrate the superior performance of penalized CQR over penalized least squares under many error distributions.
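The dissertation's ADMM algorithms are not spelled out in the abstract. As a minimal sketch of the general idea, the following implements a linearized ADMM for lasso-penalized quantile regression using the standard split z = y − Xβ; function names, tuning constants, and the simulated data are all illustrative:

```python
# Hedged sketch: linearized ADMM for min_beta sum_i rho_tau(y_i - x_i'beta)
# + lam * ||beta||_1, where rho_tau is the quantile check loss. This is a
# generic textbook-style variant, not the dissertation's exact algorithm.
import numpy as np

def soft(a, k):
    # elementwise soft-thresholding (prox of the l1 norm)
    return np.sign(a) * np.maximum(np.abs(a) - k, 0.0)

def quantile_lasso_admm(X, y, tau=0.5, lam=1.0, eta=1.0, iters=3000):
    n, p = X.shape
    mu = 1.01 * eta * np.linalg.norm(X, 2) ** 2   # linearization constant
    beta, z, u = np.zeros(p), y.copy(), np.zeros(n)
    c_hi, c_lo = tau / eta, (tau - 1.0) / eta
    for _ in range(iters):
        # beta-update: one proximal-gradient step on the augmented Lagrangian
        g = X.T @ (X @ beta + z - y + u)
        beta = soft(beta - (eta / mu) * g, lam / mu)
        # z-update: exact prox of the check loss (shifted soft-threshold)
        v = y - X @ beta - u
        z = np.where(v > c_hi, v - c_hi, np.where(v < c_lo, v - c_lo, 0.0))
        # dual update for the constraint X beta + z = y
        u = u + X @ beta + z - y
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
beta_true = np.array([2.0, 0.0, -1.5, 0.0, 0.0])
y = X @ beta_true + 0.5 * rng.normal(size=300)
beta = quantile_lasso_admm(X, y, tau=0.5, lam=1.0)
```

The z-update is the attraction of the splitting: the check loss, which is non-smooth and non-separable from β in the original problem, has a cheap closed-form proximal operator once the residual is its own variable.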

    High Dimensional Statistical Testing With Applications to Gene Significance Detection

    High-throughput screening has become an important mainstay of contemporary biomedical research. A standard approach is to run a large number of t-tests simultaneously and then select p-values in a manner that controls the false discovery rate (FDR). Existing methods require very strong assumptions on the distribution of the data and the distribution of the p-values. We propose an asymptotically valid, data-driven procedure to find critical values for the t-statistics that requires minimal assumptions. A new asymptotically consistent estimate of the proportion of alternatives is developed along the way. We demonstrate that our approach has improved computational efficiency and power over existing approaches while requiring fewer assumptions. The method controls the k-family-wise error rate (k-FWER), the tail probability of the false discovery proportion (FDTP) and the false discovery rate (FDR). Simulation studies support our theoretical results and demonstrate the favorable performance of our new multiple testing procedure. We also apply our method to analyze cancer microarray studies. One feature of our approach is that it takes the alternative into account, as existing approaches do. However, we found that a standard concavity assumption on the p-value distribution under the alternative is violated in certain circumstances. A more general concept is the monotone likelihood ratio condition (MLRC) introduced in Sun and Cai (2007). We show that the concavity assumption can be violated for (i) a simple heteroscedastic normal mixture model and (ii) dependent tests. Some interesting implications, including the choice of test statistics, existing FDR control procedures (step-up and step-down) and the definition of power, are discussed.

    Doctor of Philosophy
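The dissertation's own procedure is not specified in the abstract. As a point of reference for the baseline it improves on, the standard adaptive Benjamini-Hochberg step-up with a Storey-type plug-in estimate of the null proportion can be sketched as follows (all names and constants illustrative):

```python
# Hedged sketch: adaptive BH step-up, i.e. the classic FDR-controlling
# baseline the abstract refers to, not the dissertation's new method.
import numpy as np

def storey_pi0(p, lam=0.5):
    # Plug-in estimate of the proportion of true nulls: p-values above lam
    # are assumed to come (mostly) from the uniform null component.
    return min(1.0, np.mean(p > lam) / (1.0 - lam))

def adaptive_bh(p, q=0.05):
    # Step-up BH at level q, sharpened by the estimated null proportion
    p = np.asarray(p)
    m = len(p)
    order = np.argsort(p)
    thresh = q * np.arange(1, m + 1) / (m * storey_pi0(p))
    below = np.nonzero(p[order] <= thresh)[0]
    reject = np.zeros(m, dtype=bool)
    if below.size:                      # reject all p-values up to the
        reject[order[:below[-1] + 1]] = True   # largest passing index
    return reject

calls = adaptive_bh(np.array([0.001, 0.002, 0.8, 0.9]), q=0.05)
```

Estimating the null proportion is what lets an adaptive procedure gain power over plain BH; the abstract's estimate of the proportion of alternatives plays the analogous role.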

    From phenomenological modelling of anomalous diffusion through continuous-time random walks and fractional calculus to correlation analysis of complex systems

    This document covers more than one topic, but all are connected by physical analogy, analytic/numerical resemblance, or because one is a building block of another. The topics are anomalous diffusion, modelling of stylised facts based on an empirical random-walker diffusion model, and null-hypothesis tests in time-series data analysis reusing the same diffusion model. These topics are interrupted by an introduction of new methods for the fast production of random numbers and matrices of certain types. This interruption constitutes the entire chapter on random numbers, which is purely algorithmic and was inspired by the need for fast random numbers of special types. The sequence of chapters is chronologically meaningful in the sense that fast random numbers are needed in the first topic, dealing with continuous-time random walks (CTRWs) and their connection to fractional diffusion. The contents of the last four chapters were indeed produced in this sequence, but with some temporal overlap. While the fast Monte Carlo solution of the time- and space-fractional diffusion equation is a nice application that sped up hugely with our new method, we were also interested in CTRWs as a model for certain stylised facts. Without knowing it, economists [80] reinvented what physicists had subconsciously used for decades already: the so-called stylised fact, for which another word can be empirical truth. A simple example: the diffusion equation gives the probability of finding a certain diffusive particle in some position at a certain time, or indicates the concentration of a dye. It is debatable whether probability is physical reality. Most importantly, it does not describe the physical system completely. Instead, the equation describes only a certain expectation value of interest, where it does not matter whether it is grains, prices or people that diffuse away. Reality is coded and "averaged" in the diffusion constant.
    Interpreting a CTRW as an abstract microscopic particle-motion model, it can solve the time- and space-fractional diffusion equation. This type of diffusion equation mimics some types of anomalous diffusion, a name usually given to effects that cannot be explained by classic stochastic models, in particular not by the classic diffusion equation. It was recognised only recently, around the mid-1990s, that the random walk model used here is the abstract particle-based counterpart of the macroscopic time- and space-fractional diffusion equation, just as the "classic" random walk with regular jumps ±∆x solves the classic diffusion equation. Both equations can be solved in a Monte Carlo fashion with many realisations of walks. Interpreting the CTRW as a time-series model, it can serve as a possible null-hypothesis scenario in applications with measurements that behave similarly. It may be necessary to simulate many null-hypothesis realisations of the system to give a (probabilistic) answer to what the "outcome" is under the assumption that the particles, stocks, etc. are not correlated. Another topic is (random) correlation matrices. These are partly built on the previously introduced continuous-time random walks and are important in null-hypothesis testing, data analysis and filtering. The main objects encountered in dealing with these matrices are eigenvalues and eigenvectors. The latter are carried over to the following topic of mode analysis and application in clustering. The presented properties of correlation matrices of correlated measurements seem to be wasted in contemporary methods of clustering with (dis-)similarity measures from time series. Most applications of spectral clustering ignore information and are not able to distinguish between certain cases. The suggested procedure is supposed to identify and separate out clusters by using additional information coded in the eigenvectors.
    In addition, random matrix theory can also serve to analyse microarray data for the extraction of functional genetic groups, and it also suggests an error model. Finally, the last topic, on synchronisation analysis of electroencephalogram (EEG) data, resurrects the eigenvalues and eigenvectors as well as the mode analysis, but this time of matrices made of synchronisation coefficients of neurological activity.
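As a rough illustration of the Monte Carlo idea above, a subdiffusive CTRW with power-law waiting times and Gaussian jumps can be simulated in a few lines. This is a generic sketch, not the thesis's accelerated method; the space-fractional case would additionally require heavy-tailed Lévy jumps:

```python
# Hedged sketch: ensemble of continuous-time random walks whose density,
# in the scaling limit, solves a time-fractional diffusion equation.
import numpy as np

def ctrw_positions(n_walkers, t_max, alpha=0.7, seed=0):
    # Pareto(alpha) waiting times with 0 < alpha < 1 (infinite mean) give
    # subdiffusion; jumps are unit-variance Gaussian.
    rng = np.random.default_rng(seed)
    pos = np.zeros(n_walkers)
    t = np.zeros(n_walkers)
    active = np.arange(n_walkers)
    while active.size:
        # inverse-transform sampling: U**(-1/alpha) is Pareto-tailed
        t[active] += rng.uniform(size=active.size) ** (-1.0 / alpha)
        done = t[active] > t_max          # next renewal beyond the horizon:
        jumping = active[~done]           # walker is frozen at its position
        pos[jumping] += rng.normal(size=jumping.size)
        active = jumping
    return pos

pos = ctrw_positions(2000, 50.0, alpha=0.7)   # snapshot of the ensemble
```

Histogramming `pos` over many realisations approximates the solution of the fractional diffusion equation at time `t_max`, which is exactly the Monte Carlo usage the text describes.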

    Applications of MATLAB in Science and Engineering

    The book consists of 24 chapters illustrating a wide range of areas where MATLAB tools are applied. These areas include mathematics, physics, chemistry and chemical engineering, mechanical engineering, biological (molecular biology) and medical sciences, communication and control systems, digital signal, image and video processing, and system modeling and simulation. Many interesting problems are included throughout the book, and its contents will be beneficial for students and professionals in a wide range of fields.

    Analysing and quantitatively modelling nucleosome binding preferences

    The main emphasis of my work as a PhD student was the analysis and prediction of nucleosome positioning, focusing on the role sequence features play. Part I gives a broad overview of nucleosomes before defining important technical terms. It continues by describing and reviewing experiments that measure nucleosome positioning and bioinformatic methods that learn the sequence preferences of nucleosomes to predict their positioning. Part II describes a collaboration project with the Gaul lab, in which I analyzed MNase-Seq measurements of nucleosomes in Drosophila. The original intention was to investigate the extent to which experimental biases influence the measurements. We extended the analysis to categorize and explore fragile, average and resistant nucleosome populations. I focused on the relation between nucleosome fragility and the sequence landscape, especially at promoters and enhancers. Analyzing the partial unwrapping of nucleosomes genome-wide, I found that the G+C ratio is a determinant of asymmetric unwrapping. I excluded from this work an analysis of histone modifications, also part of this collaboration, due to its low relevance to the rest of the presented work. Part III describes my main project of developing a probabilistic nucleosome-position prediction method. I developed a maximum likelihood approach to learn a biophysical model of nucleosome binding. By including the low positional resolution of MNase-Seq and the sequence bias of CC-Seq in the likelihood, I could separate them from the nucleosome binding preferences and learn highly correlated nucleosome binding energy models. My analysis shows that nucleosomes have a position-specific binding preference and might be uninfluenced by G+C content or even disfavor it, contrary to the consensus in the literature. Part IV describes further analyses from my time as a PhD student that are not part of any planned publications.
    The main topics are: ancillary elements of my main project, unsuccessful attempts to correct experimental biases, analysis of the quality of experimental measurements, and adapting my probabilistic nucleosome-position prediction method to work with occupancy measurements. Lastly, I give a general outlook that reflects on my results and discusses next steps, such as ways to improve my method further. I excluded two collaboration projects I participated in from this thesis because they are still ongoing: a systematic analysis of how the core promoter sequence influences gene expression in Drosophila and the development of an experiment to measure nucleosome occupancy more precisely.
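The thesis's biophysical model is not given in the abstract. A toy position-specific binding-energy model with Boltzmann placement weights, the kind of ingredient such likelihoods are typically built on, might look like this; the footprint length, energy matrix and sequence are made up, and steric exclusion between overlapping nucleosomes is deliberately omitted:

```python
# Hedged sketch: PWM-style position-specific binding energies and Boltzmann
# placement probabilities for a nucleosome-like footprint. Illustrative only.
import numpy as np

BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}

def footprint_energies(seq, energy):
    # Energy of a footprint starting at each position: the sum of
    # position-specific base energies over the footprint (rows = offsets).
    L = energy.shape[0]
    idx = np.array([BASE_INDEX[b] for b in seq])
    return np.array([energy[np.arange(L), idx[i:i + L]].sum()
                     for i in range(len(seq) - L + 1)])

def placement_probabilities(E, beta=1.0):
    # Boltzmann weights over start positions; a full model must also handle
    # steric exclusion between overlapping footprints, omitted here.
    w = np.exp(-beta * (E - E.min()))
    return w / w.sum()

# toy 3-bp footprint that favours G (real nucleosomes span ~147 bp)
energy = np.ones((3, 4))
energy[:, BASE_INDEX["G"]] = 0.0
E = footprint_energies("AGGGA", energy)
probs = placement_probabilities(E)
```

A maximum likelihood fit of the kind the abstract describes would learn the `energy` matrix from read counts, with the measurement resolution and sequence bias folded into the likelihood rather than into the energies.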

    Computational design and designability of gene regulatory networks

    Our knowledge of molecular interactions has led us today towards an engineering perspective, where designs and implementations of artificial regulatory systems attempt to provide fundamental instructions for cellular reprogramming. Here we address the design of gene networks as a way to deepen our understanding of natural regulation. We also address the problem of designability given a library of compatible elements. To this end, we apply heuristic optimization methods that implement routines for solving inverse problems, as well as mathematical analysis tools to study the dynamics of gene expression. Because the engineering of transcription networks has mainly relied on assembling a few regulatory elements using rational design principles, we developed a computational design framework to exploit this approach. Models associated with libraries were examined to uncover the genotypic space associated with a given phenotype. In addition, we developed a fully automated procedure to design non-coding RNA molecules with regulatory capability, based on a physicochemical model and taking advantage of allosteric regulation. The resulting RNA circuits implemented a mechanism of post-transcriptional control of protein expression that could be combined with transcriptional elements. We also applied the heuristic methods to analyze the designability of metabolic pathways. Certainly, computational design methods can at the same time learn from natural mechanisms in order to exploit their fundamental principles. Thus, the study of these systems allows us to go deeper into genetic engineering. Of note, integral control and incoherent regulation are general strategies that organisms employ and that we analyze here.

    Rodrigo Tarrega, G. (2011). Computational design and designability of gene regulatory networks [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/1417