16 research outputs found
Searching for the scale of homogeneity
We introduce a statistical quantity, known as the $K$ function, related to
the integral of the two-point correlation function. It gives us
straightforward information about the scale where clustering dominates and the
scale at which homogeneity is reached. We evaluate the correlation dimension,
$D_2$, as the local slope of the log-log plot of the $K$ function. We apply
this statistic to several stochastic point fields, to three numerical
simulations describing the distribution of clusters and finally to real galaxy
redshift surveys. Four different galaxy catalogues have been analysed using
this technique: the Center for Astrophysics I, the Perseus–Pisces redshift
surveys (these two lying in our local neighbourhood), the Stromlo–APM and the
1.2 Jy IRAS redshift surveys (these two encompassing a larger volume). In
all cases, this cumulant quantity shows the fingerprint of the transition to
homogeneity. The reliability of the estimates is clearly demonstrated by the
results from controllable point sets, such as the segment Cox processes. In the
cluster distribution models, as well as in the real galaxy catalogues, we never
see long plateaus when plotting $D_2$ as a function of the scale, leaving no
hope for unbounded fractal distributions.
Comment: 9 pages, 11 figures, MNRAS, in press; minor revision and added reference
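The statistic described in this abstract can be illustrated with a minimal sketch: count the pairs of points closer than r to get a K-type function, then take the finite-difference slope of log K against log r as an estimate of the correlation dimension. This is an assumption-laden toy, not the paper's estimator: it applies no edge correction and no survey selection function, and the names `k_function` and `local_slope` are hypothetical.

```python
import bisect
import math
import random


def k_function(points, radii, volume):
    """Naive K(r) for a 3D point set: volume times the fraction of
    ordered pairs separated by at most r (no edge correction)."""
    n = len(points)
    dists = sorted(
        math.dist(points[i], points[j])
        for i in range(n)
        for j in range(i + 1, n)
    )
    values = []
    for r in radii:
        unordered_pairs = bisect.bisect_right(dists, r)
        values.append(volume * 2 * unordered_pairs / (n * (n - 1)))
    return values


def local_slope(radii, k_values):
    """Finite-difference slope of log K versus log r between
    consecutive radii; plays the role of the correlation dimension."""
    return [
        (math.log(k_values[i + 1]) - math.log(k_values[i]))
        / (math.log(radii[i + 1]) - math.log(radii[i]))
        for i in range(len(radii) - 1)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    # Homogeneous random points in the unit cube (illustrative data only).
    pts = [(rng.random(), rng.random(), rng.random()) for _ in range(400)]
    radii = [0.05, 0.1, 0.2]
    print(local_slope(radii, k_function(pts, radii, volume=1.0)))
```

For a homogeneous point set in three dimensions the slope should sit near 3, slightly lower here because pairs near the cube boundary are undercounted; a clustered set would give a smaller slope at small scales.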
Spatial variation of Anopheles-transmitted Wuchereria bancrofti and Plasmodium falciparum infection densities in Papua New Guinea.
Rights: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief, you may copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.
The spatial variation of Wuchereria bancrofti and Plasmodium falciparum infection densities was measured in a rural area of Papua New Guinea where they share anopheline vectors. The spatial correlation of W. bancrofti was found to reduce by half over an estimated distance of 1.7 km, much smaller than the 50 km grid used by the World Health Organization rapid mapping method. For P. falciparum, negligible spatial correlation was found. After mass treatment with anti-filarial drugs, there was negligible correlation between the changes in the densities of the two parasites.
Gene expression meta-analysis of Parkinson’s disease and its relationship with Alzheimer’s disease
Parkinson’s disease (PD) and Alzheimer’s disease (AD) are the most common neurodegenerative diseases and have been suggested to share common pathological and physiological links. Understanding the cross-talk between them could reveal potential for the development of new strategies for early diagnosis and therapeutic intervention, thus improving the quality of life of those affected. Here we have conducted a novel meta-analysis to identify differentially expressed genes (DEGs) in PD microarray datasets comprising 69 PD and 57 control brain samples, which is the biggest cohort for such studies to date. Using the identified DEGs, we performed pathway, upstream and protein-protein interaction analysis. We identified 1046 DEGs, of which a majority (739/1046) were downregulated in PD. YWHAZ and other genes coding 14-3-3 proteins are identified as important DEGs in signaling pathways and in protein-protein interaction networks (PPIN). Perturbed pathways also include mitochondrial dysfunction and oxidative stress. There was a significant overlap in DEGs between PD and AD, and over 99% of these were differentially expressed in the same up or down direction across the diseases. REST was identified as an upstream regulator in both diseases. Our study demonstrates that PD and AD share significant common DEGs and pathways, and identifies novel genes, pathways and upstream regulators which may be important targets for therapy in both diseases.
Bayesian modelling of ultra high-frequency financial data
The availability of ultra high-frequency (UHF) data on transactions has revolutionised data processing and statistical modelling techniques in finance. The unique characteristics of such data, e.g. the discrete structure of price change, unequally spaced time intervals and multiple transactions, have introduced new theoretical and computational challenges. In this study, we develop a Bayesian framework for modelling integer-valued variables to capture the fundamental properties of price change. We propose the application of the zero inflated Poisson difference (ZPD) distribution for modelling UHF data and assess the effect of covariates on the behaviour of price change. For this purpose, we present two modelling schemes. The first is based on the analysis of the data after the market closes for the day and is referred to as off-line data processing. In this case, the Bayesian interpretation and analysis are undertaken using Markov chain Monte Carlo methods. The second modelling scheme introduces the dynamic ZPD model, which is implemented through Sequential Monte Carlo methods (also known as particle filters). This procedure enables us to update our inference from data as new transactions take place and is known as online data processing. We apply our models to a set of FTSE100 index changes. Based on the probability integral transform, modified for the case of integer-valued random variables, we show that our models explain the observed distribution of price change well. We then apply the deviance information criterion and introduce its sequential version for the purpose of model comparison for off-line and online modelling, respectively. Moreover, in order to add more flexibility to the tails of the ZPD distribution, we introduce the zero inflated generalised Poisson difference distribution and outline its possible application for modelling UHF data.
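To make the zero inflated Poisson difference idea concrete, here is a minimal sketch of one plausible form of its probability mass function. The parametrisation is an assumption, since the abstract does not state it: the price change is taken to be the difference of two independent Poisson counts with rates `lam1` and `lam2`, mixed with an extra point mass `pi0` at zero. All function names are hypothetical, and covariates and the Bayesian fitting steps are left out.

```python
import math


def poisson_pmf(n, lam):
    """Poisson probability of count n for rate lam > 0, via logs for stability."""
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))


def poisson_difference_pmf(k, lam1, lam2, terms=200):
    """P(Y1 - Y2 = k) for independent Poisson Y1, Y2, using a truncated
    convolution sum (adequate for small to moderate rates)."""
    start = max(0, -k)
    return sum(
        poisson_pmf(j + k, lam1) * poisson_pmf(j, lam2)
        for j in range(start, start + terms)
    )


def zpd_pmf(k, pi0, lam1, lam2):
    """Assumed ZPD form: extra mass pi0 at zero, otherwise a Poisson difference."""
    base = (1.0 - pi0) * poisson_difference_pmf(k, lam1, lam2)
    return base + (pi0 if k == 0 else 0.0)


if __name__ == "__main__":
    # Probabilities of price changes from -3 to +3 ticks (illustrative values).
    for k in range(-3, 4):
        print(k, round(zpd_pmf(k, pi0=0.3, lam1=1.5, lam2=1.0), 4))
```

Under this assumed form the zero inflation only raises the probability of no price change, and the mean is (1 - pi0)(lam1 - lam2).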
Bayesian quantile regression
The paper introduces the idea of Bayesian quantile regression employing a likelihood function that is based on the asymmetric Laplace distribution. It is shown that irrespective of the original distribution of the data, the use of the asymmetric Laplace distribution is a very natural and effective way for modelling Bayesian quantile regression. The paper also demonstrates that improper uniform priors for the unknown model parameters yield a proper joint posterior. The approach is illustrated via a simulated and two real data sets.
Keywords: Asymmetric Laplace distribution; Bayesian inference; Markov chain Monte Carlo methods; Quantile regression
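The approach in this abstract can be sketched for a single covariate as follows. With the scale fixed at 1, the asymmetric Laplace log-likelihood is, up to a constant, minus the sum of check losses u(p - 1[u < 0]) over the residuals, so under a flat prior the log posterior has the same form. The sampler below is a plain random-walk Metropolis, chosen here for brevity and not necessarily the scheme used in the paper; the function names, step size and iteration count are illustrative assumptions.

```python
import math
import random


def check_loss(u, p):
    """Quantile check function: u * (p - 1[u < 0])."""
    return u * (p - (1.0 if u < 0 else 0.0))


def log_posterior(beta, xs, ys, p):
    """Log posterior up to a constant: asymmetric Laplace likelihood
    with unit scale and an improper uniform prior on beta."""
    return -sum(
        check_loss(y - (beta[0] + beta[1] * x), p) for x, y in zip(xs, ys)
    )


def metropolis(xs, ys, p, n_iter=5000, step=0.1, seed=0):
    """Random-walk Metropolis draws of (intercept, slope) for quantile p."""
    rng = random.Random(seed)
    beta = [0.0, 0.0]
    current = log_posterior(beta, xs, ys, p)
    draws = []
    for _ in range(n_iter):
        proposal = [b + rng.gauss(0.0, step) for b in beta]
        candidate = log_posterior(proposal, xs, ys, p)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
        draws.append(list(beta))
    return draws


if __name__ == "__main__":
    noise = random.Random(1)
    xs = [i / 99 for i in range(100)]
    ys = [1.0 + 2.0 * x + noise.gauss(0.0, 0.3) for x in xs]
    kept = metropolis(xs, ys, p=0.5)[2500:]  # discard the first half as burn-in
    print(sum(d[1] for d in kept) / len(kept))  # posterior mean of the slope
```

Changing `p` to, say, 0.9 targets the upper conditional quantile without changing anything else in the sampler.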
Blood biomarker-based classification study for neurodegenerative diseases
As the population ages, neurodegenerative diseases are becoming more prevalent, making it crucial to comprehend the underlying disease mechanisms and identify biomarkers to allow for early diagnosis and effective screening for clinical trials. Thanks to advancements in gene expression profiling, it is now possible to search for disease biomarkers on an unprecedented scale. Here we applied a selection of five machine learning (ML) approaches to identify blood-based biomarkers for Alzheimer's (AD) and Parkinson's disease (PD) with the application of multiple feature selection methods. Based on ROC AUC performance, one optimal random forest (RF) model was discovered for AD with 159 gene markers (ROC AUC = 0.886), while one optimal RF model was discovered for PD (ROC AUC = 0.743). Additionally, in comparison to traditional ML approaches, deep learning approaches were applied to evaluate their potential applications in future works. We demonstrated that convolutional neural networks perform consistently well across both the Alzheimer's (ROC AUC = 0.810) and Parkinson's (ROC AUC = 0.715) datasets, suggesting their potential in gene expression biomarker detection with further tuning of their architecture.
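The models in this abstract are compared by ROC AUC. As a reminder of what that number measures, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The function name and the toy scores are illustrative assumptions, not data from the study.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (case, control) pairs in which the
    case scores higher, counting ties as half a win."""
    cases = [s for label, s in zip(labels, scores) if label == 1]
    controls = [s for label, s in zip(labels, scores) if label == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # Four toy samples: two controls (0) and two cases (1).
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

A value of 0.5 corresponds to a classifier no better than chance, and 1.0 to perfect separation of cases from controls.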
Bayesian nonparametric quantile regression using splines
A new technique based on Bayesian quantile regression that models the dependence of a quantile of one variable on the values of another using a natural cubic spline is presented. Inference is based on the posterior density of the spline and an associated smoothing parameter and is performed by means of a Markov chain Monte Carlo algorithm. Examples of the application of the new technique to two real environmental data sets and to simulated data for which polynomial modelling is inappropriate are given. An aid for making a good choice of proposal density in the Metropolis-Hastings algorithm is discussed. The new nonparametric methodology provides more flexible modelling than the currently used Bayesian parametric quantile regression approach.
Additional file 1: of Gene expression meta-analysis of Parkinson’s disease and its relationship with Alzheimer’s disease
Table S1. Information about each study used in our meta-analysis after removal of outlier samples. Table S2. Differentially expressed genes identified in our meta-analysis that have been identified as PD risk genes in a recent GWAS meta-analysis [33]. Table S3. IPA canonical pathway analysis for significant pathways identified using all PD DEGs, included with the information for pathways shared with those identified as significant using all AD DEGs. Table S4. IPA canonical pathway analysis for significant pathways identified using down-regulated PD DEGs. Table S5. IPA upstream regulator analysis for up and down regulated PD DEGs analysed separately. Table S6. Top 10 hubs found in the protein-protein interaction network (PPIN) analysis subnetwork created using the top 30 PD DEGs. Table S7. The direction of differential expression between the common DEGs found between AD and PD. Figure S1. Selecting filtering threshold for microarray data. The percentage of studies called absent in a mas5 present absent call for each probe was calculated, and threshold determined by minimizing Anderson-Darling normality tests and giving optimal Q-Q plot of the Z-scores after meta-analysis. The Q-Q plot for (A) 5%, (B) 10%, (C) 15%, (D) 20% and (E) 30% filtering. After 15% filtering A-D p-values were minimized (F) and the 15% Q-Q plot gave closest values to normality. A-D is Anderson-Darling normality test. Figure S2. RNAseq data vs. microarray gene expression data. Average absolute expression level of RNA-seq log2(TPM) of SN tissue from GTEx database plotted against RMA normalised and filtered intensity of microarray control and PD data used in this meta-analysis. The Pearson correlation coefficient between the control microarray data and healthy RNA-seq data (A) is 0.70 (pvalu