Empowering statistical methods for cellular and molecular biologists.
We provide guidelines for using statistical methods to analyze the types of experiments reported in cellular and molecular biology journals such as Molecular Biology of the Cell. Our aim is to help experimentalists use these methods skillfully, avoid mistakes, and extract the maximum amount of information from their laboratory work. We focus on comparing the average values of control and experimental samples. A Supplemental Tutorial provides examples of how to analyze experimental data using R software
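As a rough illustration of comparing the average values of control and experimental samples, the sketch below runs an exact two-sided permutation test on the difference in means. It is an assumption-laden stand-in written in Python with only the standard library: it is not the Supplemental Tutorial's R code, and the function name `permutation_test` and the sample values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_test(control, treated):
    """Exact two-sided permutation p-value for a difference in group means.

    Enumerates every way of splitting the pooled values into groups of the
    original sizes, so it is only practical for small samples.
    """
    pooled = list(control) + list(treated)
    n_control = len(control)
    n_treated = len(pooled) - n_control
    observed = abs(mean(treated) - mean(control))
    total = sum(pooled)

    extreme = 0
    n_splits = 0
    for idx in combinations(range(len(pooled)), n_control):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs((total - sum_a) / n_treated - sum_a / n_control)
        # Small tolerance so ties with the observed difference are counted.
        if diff >= observed - 1e-12:
            extreme += 1
        n_splits += 1
    return extreme / n_splits


# Hypothetical measurements, for illustration only.
p_value = permutation_test([1, 2, 3], [10, 11, 12])
print(p_value)
```

A permutation test is used here only because it needs no distributional tables; a t-test, where its assumptions hold, addresses the same question.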
motifDiverge: a model for assessing the statistical significance of gene regulatory motif divergence between two DNA sequences
Next-generation sequencing technology enables the identification of thousands
of gene regulatory sequences in many cell types and organisms. We consider the
problem of testing if two such sequences differ in their number of binding site
motifs for a given transcription factor (TF) protein. Binding site motifs
impart regulatory function by providing TFs the opportunity to bind to genomic
elements and thereby affect the expression of nearby genes. Evolutionary
changes to such functional DNA are hypothesized to be major contributors to
phenotypic diversity within and between species; but despite the importance of
TF motifs for gene expression, no method exists to test for motif loss or gain.
Assuming that motif counts are Binomially distributed, and allowing for
dependencies between motif instances in evolutionarily related sequences, we
derive the probability mass function of the difference in motif counts between
two nucleotide sequences. We provide a method to numerically estimate this
distribution from genomic data and show through simulations that our estimator
is accurate. Finally, we introduce the R package motifDiverge that
implements our methodology and illustrate its application to gene regulatory
enhancers identified by a mouse developmental time course experiment. While
this study was motivated by analysis of regulatory motifs, our results can be
applied to any problem involving two correlated Bernoulli trials.
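To make the idea of a probability mass function for a difference in counts concrete, here is a minimal sketch under a deliberately simplified assumption: n independent pairs of Bernoulli trials (X_i, Y_i), where the two trials within a pair may be correlated. It is written in Python and is not the motifDiverge implementation or its estimator; the function name `difference_pmf` and the parameters `p10` and `p01` are hypothetical.

```python
def difference_pmf(n, p10, p01):
    """PMF of D = sum_i (X_i - Y_i) over n independent Bernoulli pairs.

    p10 = P(X_i = 1, Y_i = 0) and p01 = P(X_i = 0, Y_i = 1). Within-pair
    correlation enters only through these two joint probabilities, because
    concordant pairs (1,1) and (0,0) contribute 0 to the difference.
    """
    p_zero = 1.0 - p10 - p01
    if min(p10, p01) < 0 or p_zero < -1e-12:
        raise ValueError("p10 and p01 must be non-negative and sum to at most 1")
    p_zero = max(p_zero, 0.0)

    pmf = {0: 1.0}
    for _ in range(n):
        # Convolve the running distribution with one pair's increment.
        updated = {}
        for d, prob in pmf.items():
            for step, q in ((1, p10), (-1, p01), (0, p_zero)):
                updated[d + step] = updated.get(d + step, 0.0) + prob * q
        pmf = updated
    return pmf


pmf = difference_pmf(2, 0.2, 0.1)
print(sorted(pmf.items()))
```

The convolution loop keeps the sketch exact for small n; the dependence structure between evolutionarily related sequences described in the abstract is richer than this pairwise model.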
Failure of an Educational Intervention to Improve Consultation and Implications for Healthcare Consultation.
INTRODUCTION:
Consultation of another physician for his or her specialized expertise regarding a patient's care is a common occurrence in most physicians' daily practice, especially in the emergency department (ED). Therefore, the ability to communicate effectively with another physician during a patient consultation is an essential skill. However, there has been limited research on a standardized method for physician-to-physician consultation, and little guidance exists on teaching consultation to physicians in training. The objective of our study was to measure the effect of a structured consultation intervention on both content standardization and quality of medical student consultations.
METHODS:
Senior medical students were assessed on a required emergency medicine rotation with a physician phone consultation during a standardized, simulated chest pain case. The intervention groups received a standard consult checklist as part of their orientation to the rotation, followed by a video recording of a good consult call and a bad consult call with commentary from an emergency physician. The intervention was given to students every other month, alternating with a control group who received no additional education. Recordings were reviewed by three second-year internal medicine residents pursuing a fellowship in cardiology. Each recording was evaluated by two of the three reviewers and scored using a standardized checklist.
RESULTS:
Providing a standardized consultation intervention did not improve students' ability to communicate with consultants. In addition, there was variability between evaluators with regard to how they received the same information and how they perceived the quality of the same recorded consultation calls. Evaluator inter-rater reliability (IRR) was poor on two questions: 1) whether the evaluator would have any other questions for the student calling the consult, and 2) whether the student calling the consult provided an accurate account of information and case detail. The IRR was also poor on objective data such as whether the student stated their name.
CONCLUSIONS:
A brief intervention may not be enough to change complex behavior such as physician-to-physician consultation communication. Importantly, although the consultants listened to the same audio recordings, they processed the information differently. Future investigations should focus on both those delivering and those receiving a consultation.
SIRT1 and SIRT3 deacetylate homologous substrates: AceCS1,2 and HMGCS1,2.
SIRT1 and SIRT3 are NAD+-dependent protein deacetylases that are evolutionarily conserved across mammals. These proteins are located in the cytoplasm/nucleus and mitochondria, respectively. Previous reports demonstrated that human SIRT1 deacetylates Acetyl-CoA Synthase 1 (AceCS1) in the cytoplasm, whereas SIRT3 deacetylates the homologous Acetyl-CoA Synthase 2 (AceCS2) in the mitochondria. We recently showed that 3-hydroxy-3-methylglutaryl CoA synthase 2 (HMGCS2) is deacetylated by SIRT3 in mitochondria, and we demonstrate here that SIRT1 deacetylates the homologous 3-hydroxy-3-methylglutaryl CoA synthase 1 (HMGCS1) in the cytoplasm. This novel pattern of substrate homology between cytoplasmic SIRT1 and mitochondrial SIRT3 suggests that considering evolutionary relationships between the sirtuins and their substrates may help to identify and understand the functions and interactions of this gene family. In this perspective, we take a first step by characterizing the evolutionary history of the sirtuins and these substrate families
Supervised Distance Matrices: Theory and Applications to Genomics
We propose a new approach to studying the relationship between a very high dimensional random variable and an outcome. Our method is based on a novel concept, the supervised distance matrix, which quantifies pairwise similarity between variables based on their association with the outcome. A supervised distance matrix is derived in two stages. The first stage involves a transformation based on a particular model for association. In particular, one might regress the outcome on each variable and then use the residuals or the influence curve from each regression as a data transformation. In the second stage, a choice of distance measure is used to compute all pairwise distances between variables in this transformed data. When the outcome is right-censored, we show that the supervised distance matrix can be consistently estimated using inverse probability of censoring weighted (IPCW) estimators based on the mean and covariance of the transformed data. The proposed methodology is illustrated with examples of gene expression data analysis with a survival outcome. This approach is widely applicable in genomics and other fields where high-dimensional data is collected on each subject
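The two-stage construction can be sketched as follows, assuming an uncensored outcome, simple linear regression as the association model, residuals as the transformation, and Euclidean distance in the second stage. These are illustrative choices in Python, not the authors' implementation, and the IPCW estimators for right-censored outcomes are not covered. The names `residuals` and `supervised_distance_matrix` are hypothetical.

```python
from math import sqrt
from statistics import mean


def residuals(y, x):
    """Stage 1: residuals from a least-squares regression of y on one variable x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0  # constant x: fall back to the mean model
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def supervised_distance_matrix(y, variables):
    """Stage 2: Euclidean distances between the variables' residual vectors."""
    transformed = [residuals(y, x) for x in variables]
    p = len(transformed)
    return [
        [
            sqrt(sum((a - b) ** 2 for a, b in zip(transformed[i], transformed[j])))
            for j in range(p)
        ]
        for i in range(p)
    ]


# Hypothetical toy data: one outcome and three variables measured on four subjects.
outcome = [1, 2, 3, 4]
variables = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 0, 1, 0]]
for row in supervised_distance_matrix(outcome, variables):
    print(row)
```

Variables whose regressions leave similar residuals end up close together, which is the sense in which the distance is supervised by the outcome.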
Resampling-based Multiple Testing: Asymptotic Control of Type I Error and Applications to Gene Expression Data
We define a general statistical framework for multiple hypothesis testing and show that the correct null distribution for the test statistics is obtained by projecting the true distribution of the test statistics onto the space of mean zero distributions. For common choices of test statistics (based on an asymptotically linear parameter estimator), this distribution is asymptotically multivariate normal with mean zero and the covariance of the vector influence curve for the parameter estimator. This test statistic null distribution can be estimated by applying the non-parametric or parametric bootstrap to correctly centered test statistics. We prove that this bootstrap estimated null distribution provides asymptotic control of most type I error rates. We show that obtaining a test statistic null distribution from a data null distribution (e.g., projecting the data generating distribution onto the space of all distributions satisfying the complete null) only provides the correct test statistic null distribution if the covariance of the vector influence curve is the same under the data null distribution as under the true data distribution. This condition is a weak version of the subset pivotality condition. We show that our multiple testing methodology controlling type I error is equivalent to constructing an error-specific confidence region for the true parameter and checking if it contains the hypothesized value. We also study the two sample problem and show that the permutation distribution produces an asymptotically correct null distribution if (i) the sample sizes are equal or (ii) the populations have the same covariance structure. We include a discussion of the application of multiple testing to gene expression data, where the dimension typically far exceeds the sample size. An analysis of a cancer gene expression data set illustrates the methodology.
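A minimal sketch of the bootstrap idea, assuming the simplest setting: each hypothesis states that one column mean is zero, the test statistic is the scaled sample mean, and the family-wise error rate is controlled with a single-step maximum-statistic cutoff. The centering step subtracts the observed estimate from each bootstrap estimate, so the resampled statistics have mean zero. This is an illustration in Python, not the authors' software; the function names, defaults, and toy data are hypothetical.

```python
import random
from math import ceil, sqrt


def column_means(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def bootstrap_max_stat(rows, alpha=0.05, n_boot=2000, seed=0):
    """Single-step max-statistic test of H0_j: mean of column j is 0.

    Returns the observed statistics, the common cutoff, and rejection flags.
    """
    rng = random.Random(seed)
    n = len(rows)
    estimate = column_means(rows)
    observed = [sqrt(n) * m for m in estimate]

    max_null = []
    for _ in range(n_boot):
        # Non-parametric bootstrap: resample whole rows with replacement.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        boot = column_means(sample)
        # Center at the observed estimate to obtain mean-zero null statistics.
        max_null.append(max(abs(sqrt(n) * (b - e)) for b, e in zip(boot, estimate)))

    max_null.sort()
    index = min(n_boot - 1, max(0, ceil((1 - alpha) * n_boot) - 1))
    cutoff = max_null[index]
    rejected = [abs(t) > cutoff for t in observed]
    return observed, cutoff, rejected


# Hypothetical data: column 0 has mean 5, column 1 has mean 0.
rows = [[5 + 0.1 * ((i % 5) - 2), float((i % 5) - 2)] for i in range(40)]
observed, cutoff, rejected = bootstrap_max_stat(rows)
print(rejected)
```

Resampling whole rows preserves the dependence between the columns, which is why a joint cutoff can be read off the bootstrap maxima.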
Statistical Inference for Simultaneous Clustering of Gene Expression Data
Current methods for analysis of gene expression data are mostly based on clustering and classification of either genes or samples. We offer support for the idea that more complex patterns can be identified in the data if genes and samples are considered simultaneously. We formalize the approach and propose a statistical framework for two-way clustering. A simultaneous clustering parameter is defined as a function of the true data generating distribution, and an estimate is obtained by applying this function to the empirical distribution. We illustrate that a wide range of clustering procedures, including generalized hierarchical methods, can be defined as parameters which are compositions of individual mappings for clustering patients and genes. This framework allows one to assess classical properties of clustering methods, such as consistency, and to formally study statistical inference regarding the clustering parameter. We present results of simulations designed to assess the asymptotic validity of different bootstrap methods for estimating the distributions of estimated simultaneous clustering parameters. The method is illustrated on a publicly available data set
Novel genes exhibit distinct patterns of function acquisition and network integration
BACKGROUND:
Genes are created by a variety of evolutionary processes, some of which generate duplicate copies of an entire gene, while others rearrange pre-existing genetic elements or co-opt previously non-coding sequence to create genes with 'novel' sequences. These novel genes are thought to contribute to distinct phenotypes that distinguish organisms. The creation, evolution, and function of duplicated genes are well-studied; however, the genesis and early evolution of novel genes are not well-characterized. We developed a computational approach to investigate these issues by integrating genome-wide comparative phylogenetic analysis with functional and interaction data derived from small-scale and high-throughput experiments.
RESULTS:
We examine the function and evolution of new genes in the yeast Saccharomyces cerevisiae. We observed significant differences in the functional attributes and interactions of genes created at different times and by different mechanisms. Novel genes are initially less integrated into cellular networks than duplicate genes, but they appear to gain functions and interactions more quickly than duplicates. Recently created duplicated genes show evidence of adapting existing functions to environmental changes, while young novel genes do not exhibit enrichment for any particular functions. Finally, we found a significant preference for genes to interact with other genes of similar age and origin.
CONCLUSIONS:
Our results suggest a strong relationship between how and when genes are created and the roles they play in the cell. Overall, genes tend to become more integrated into the functional networks of the cell with time, but the dynamics of this process differ significantly between duplicate and novel genes.