Efficient Bayesian hierarchical functional data analysis with basis function approximations using Gaussian-Wishart processes
Functional data are defined as realizations of random functions (mostly
smooth functions) varying over a continuum, which are usually collected with
measurement errors on discretized grids. In order to accurately smooth noisy
functional observations and deal with the issue of high-dimensional observation
grids, we propose a novel Bayesian method based on the Bayesian hierarchical
model with a Gaussian-Wishart process prior and basis function representations.
We first derive an induced model for the basis-function coefficients of the
functional data, and then use this model to conduct posterior inference through
Markov chain Monte Carlo. Compared with standard Bayesian inference, which
suffers from a serious computational burden and instability when analyzing
high-dimensional functional data, our method greatly improves computational
scalability and stability, while inheriting the advantage of simultaneously
smoothing raw observations and estimating the mean-covariance functions in a
nonparametric way. In addition, our method can naturally handle functional data
observed on random or uncommon grids. Simulation and real-data studies demonstrate
that our method produces results similar to those of standard Bayesian inference
with low-dimensional common grids, while efficiently smoothing and estimating
functional data with random and high-dimensional observation grids where the
standard Bayesian inference fails. In conclusion, our method can efficiently
smooth and estimate high-dimensional functional data, providing one way to
resolve the curse of dimensionality for Bayesian functional data analysis with
Gaussian-Wishart processes. Comment: Under review
IPAD: Stable Interpretable Forecasting with Knockoffs Inference
Interpretability and stability are two important features that are desired in
many contemporary big data applications arising in economics and finance. While
the former is enjoyed to some extent by many existing forecasting approaches,
the latter, in the sense of controlling the fraction of wrongly discovered
features, which can greatly enhance interpretability, is still largely
underdeveloped in econometric settings. To this end, in this paper we
exploit the general framework of model-X knockoffs introduced recently in
Candès, Fan, Janson and Lv (2018), which is nonconventional for
reproducible large-scale inference in that the framework is completely free of
the use of p-values for significance testing, and suggest a new method of
intertwined probabilistic factors decoupling (IPAD) for stable interpretable
forecasting with knockoffs inference in high-dimensional models. The recipe of
the method is to construct the knockoff variables by assuming a latent factor
model, widely used in economics and finance, for the association
structure of covariates. Our method and work are distinct from the existing
literature in that we estimate the covariate distribution from data instead of
assuming that it is known when constructing the knockoff variables, our
procedure does not require any sample splitting, we provide theoretical
justifications on the asymptotic false discovery rate control, and the theory
for the power analysis is also established. Several simulation examples and the
real data analysis further demonstrate that the newly suggested method has
appealing finite-sample performance with desired interpretability and stability
compared with some widely used forecasting methods.
Investigating the Correlation between Performance Scores and Energy Consumption of Mobile Web Apps
Context. Developers have access to tools like Google Lighthouse to assess the performance of web apps and to guide the adoption of development best practices. However, when it comes to the energy consumption of mobile web apps, these tools seem to be lacking. Goal. This study investigates the correlation between the performance scores produced by Lighthouse and the energy consumption of mobile web apps. Method. We design and conduct an empirical experiment where 21 real mobile web apps are (i) analyzed via the Lighthouse performance analysis tool and (ii) measured on an Android device running a software-based energy profiler. Then, we statistically assess how energy consumption correlates with the obtained performance scores and carry out an effect size estimation. Results. We discover a statistically significant negative correlation between performance scores and the energy consumption of mobile web apps (with medium to large effect sizes), implying that an increase in the performance score tends to lead to a decrease in energy consumption. Conclusions. We recommend that developers strive to improve the performance level of their mobile web apps, as this can also have a positive impact on their energy consumption on Android devices.
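As an illustration of the kind of analysis described above, the following is a minimal sketch of a rank correlation between performance scores and energy readings. It assumes Spearman's rank correlation, which the abstract does not name, and the data values, variable names, and helper functions are all hypothetical.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: one performance score and one energy reading per app.
scores = [35, 48, 52, 60, 71, 83, 90]
energy_joules = [310.0, 295.0, 300.0, 250.0, 240.0, 205.0, 190.0]

rho = spearman(scores, energy_joules)
print(f"Spearman rho = {rho:.3f}")
```

A negative rho on data like this would match the direction of the reported result: higher performance scores going with lower energy consumption. A significance test and an effect size estimate, both mentioned in the abstract, are not included in this sketch.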
Semi-supervised discovery of differential genes
BACKGROUND: Various statistical scores have been proposed for evaluating the significance of genes that may exhibit differential expression between two or more controlled conditions. However, in many clinical studies, for example those seeking clinical marker genes, the conditions have not necessarily been controlled well, and condition labels are sometimes hard to obtain due to physical, financial, and time costs. In such a situation, we can consider an unsupervised case where labels are not available or a semi-supervised case where labels are available for a part of the whole sample set, rather than a well-studied supervised case where all samples have their labels. RESULTS: We assume a latent variable model for the expression of active genes and apply the optimal discovery procedure (ODP) proposed by Storey (2005) to the model. Our latent variable model allows gene significance scores to be applied to unsupervised and semi-supervised cases. The ODP framework improves detectability by sharing the estimated parameters of null and alternative models of multiple tests over multiple genes. A theoretical consideration leads to two different interpretations of the latent variable: either it only implicitly affects the alternative model through the model parameters, or it is explicitly included in the alternative model. These interpretations correspond to two different implementations of ODP. By comparing the two implementations through experiments with simulation data, we have found that sharing the latent variable estimation is effective for increasing the detectability of truly active genes. We also show that the unsupervised and semi-supervised rating of genes, which takes into account the samples without condition labels, can improve detection of active genes in real gene discovery problems.
CONCLUSION: The experimental results indicate that the ODP framework is effective for hypotheses including latent variables and is further improved by sharing the estimations of hidden variables over multiple tests.
The Impacts of Reduced Access to Abortion and Family Planning Services: Evidence from Texas
Between 2011 and 2014, Texas enacted three pieces of legislation that significantly reduced funding for family planning services and increased restrictions on abortion clinic operations. Together this legislation creates cross-county variation in access to abortion and family planning services, which we leverage to understand the impact of family planning and abortion clinic access on abortions, births, and contraceptive purchases. In-state abortions fell 20% and births rose 3% in counties that no longer had an abortion provider within 50 miles. Births increased 1% and contraceptive purchases rose 8% in counties without a publicly-funded family planning clinic within 25 miles
Phenotype Sequencing: Identifying the Genes That Cause a Phenotype Directly from Pooled Sequencing of Independent Mutants
Random mutagenesis and phenotype screening provide a powerful method for dissecting microbial functions, but their results can be laborious to analyze experimentally. Each mutant strain may contain 50–100 random mutations, necessitating extensive functional experiments to determine which one causes the selected phenotype. To solve this problem, we propose a “Phenotype Sequencing” approach in which genes causing the phenotype can be identified directly from sequencing of multiple independent mutants. We developed a new computational analysis method showing that (1) causal genes can be identified with high probability from even a modest number of mutant genomes; (2) costs can be cut many-fold compared with a conventional genome sequencing approach via an optimized strategy of library-pooling (multiple strains per library) and tag-pooling (multiple tagged libraries per sequencing lane). We have performed extensive validation experiments on a set of E. coli mutants with increased isobutanol biofuel tolerance. We generated a range of sequencing experiments varying from 3 to 32 mutant strains, with pooling on 1 to 3 sequencing lanes. Our statistical analysis of these data (4099 mutations from 32 mutant genomes) successfully identified 3 genes (acrB, marC, acrA) that have been independently validated as causing this experimental phenotype. It must be emphasized that our approach reduces mutant sequencing costs enormously. Whereas a conventional genome sequencing experiment would have cost […] 1200. In fact, our smallest experiments reliably identified acrB and marC at a cost of only […] 340.
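The idea that causal genes stand out because they are hit in many independent mutants can be sketched as follows. This is not the authors' analysis method. It assumes a simple null model in which mutations fall uniformly over the genome, so the count in a gene is roughly Poisson with a mean proportional to gene length. The gene names, lengths, and counts are hypothetical.

```python
import math

def poisson_sf(k, lam):
    """Upper tail P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)

def rank_genes(hits, lengths, total_mutations):
    """Sort genes by how surprising their mutation count is under the null.

    hits: gene -> mutations observed across all sequenced mutants
    lengths: gene -> gene length in base pairs
    total_mutations: total mutations observed across all listed genes
    """
    genome_length = sum(lengths.values())
    scored = []
    for gene, length in lengths.items():
        # Expected count if mutations land uniformly along the genome.
        expected = total_mutations * length / genome_length
        p_value = poisson_sf(hits.get(gene, 0), expected)
        scored.append((p_value, gene))
    return sorted(scored)

# Hypothetical toy genome of four genes and twelve observed mutations.
lengths = {"geneA": 3000, "geneB": 1000, "geneC": 1500, "geneD": 2500}
hits = {"geneA": 9, "geneB": 1, "geneC": 1, "geneD": 1}

ranked = rank_genes(hits, lengths, total_mutations=sum(hits.values()))
for p_value, gene in ranked:
    print(f"{gene}: p = {p_value:.3f}")
```

In this toy input, geneA collects far more mutations than its length predicts, so it ranks first. A real analysis would also need a multiple-testing correction and a treatment of pooled libraries, neither of which is modeled here.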
Structural Relationships between Highly Conserved Elements and Genes in Vertebrate Genomes
Large numbers of sequence elements have been identified as highly conserved among vertebrate genomes. These highly conserved elements (HCEs) are often located in or around genes that are involved in transcription regulation and early development. They have been shown to be involved in cis-regulatory activities through both in vivo and additional computational studies. We have investigated the structural relationships between such elements and genes in six vertebrate genomes (human, mouse, rat, chicken, zebrafish and tetraodon) and detected several thousand cases of conserved HCE-gene associations, and also cases of HCEs with no common target genes. A few examples underscore the potential significance of our findings about several individual genes. We found that the conserved associations between HCEs and genes are not restricted by the absolute distance between them on the genome. Notably, long-range associations were identified, and the molecular functions of the associated genes do not show any particular overrepresentation of the functional categories previously reported. HCEs in close proximity are found to be linked with different sets of genes. The results reflect the highly complex correlation between HCEs and their putative target genes.
Transient exposure to low levels of insecticide affects metabolic networks of honeybee larvae
The survival of a species depends on its capacity to adjust to changing environmental conditions and new stressors. Such new, anthropogenic stressors include the neonicotinoid class of crop-protecting agents, which have been implicated in the population declines of pollinating insects, including honeybees (Apis mellifera). The low-dose effects of these compounds on larval development and physiological responses have remained largely unknown. Over a period of 15 days, we provided syrup tainted with low levels (2 µg/L) of the neonicotinoid insecticide imidacloprid to beehives located in the field. We measured transcript levels by RNA sequencing and established lipid profiles using liquid chromatography coupled with mass spectrometry from worker-bee larvae of imidacloprid-exposed (IE) and unexposed, control (C) hives. Within a catalogue of 300 differentially expressed transcripts in larvae from IE hives, we detect significant enrichment of genes functioning in lipid-carbohydrate-mitochondrial metabolic networks. Myc-involved transcriptional response to this neonicotinoid is indicated by overrepresentation of E-box elements in the promoter regions of genes with altered expression. RNA levels for a cluster of genes encoding detoxifying P450 enzymes are elevated, with coordinated downregulation of genes in glycolytic and sugar-metabolising pathways. Expression of the environmentally responsive Hsp90 gene is also reduced, suggesting diminished buffering and stability of the developmental program. The multifaceted, physiological response described here may be of importance to our general understanding of pollinator health. Muscles, for instance, work at high glycolytic rates, and flight performance could be impacted should low levels of this evolutionarily novel stressor likewise induce downregulation of energy-metabolising genes in adult pollinators.