
    Microarray-based gene set analysis: a comparison of current methods

    BACKGROUND: The analysis of gene sets has become a popular topic in recent times, with researchers attempting to improve the interpretability and reproducibility of their microarray analyses through the inclusion of supplementary biological information. While a number of options for gene set analysis exist, no consensus has yet been reached regarding which methodology performs best, and under what conditions. The goal of this work was to examine the performance characteristics of a collection of existing gene set analysis methods, on both simulated and real microarray data sets. Of particular interest was the potential utility gained through the incorporation of inter-gene correlation into the analysis process. RESULTS: Each of six gene set analysis methods was applied to both simulated and publicly available microarray data sets. Overall, the various methodologies were all found to be better at detecting gene sets that moved from non-active (i.e., genes not expressed) to active states (or vice versa), rather than those that simply changed their level of activity. Methods which incorporate correlation structures were found to provide increased ability to detect altered gene sets in some settings. CONCLUSION: Based on the results obtained through the analysis of simulated data, it is clear that the performance of gene set analysis methods is strongly influenced by the features of the data set in question, and that methods which incorporate correlation structures into the analysis process tend to achieve better performance, relative to methods which rely on univariate test statistics

    Testing the additional predictive value of high-dimensional molecular data

    While high-dimensional molecular data such as microarray gene expression data have been used for disease outcome prediction or diagnosis purposes for about ten years in biomedical research, the question of the additional predictive value of such data given that classical predictors are already available has long been under-considered in the bioinformatics literature. We suggest an intuitive permutation-based testing procedure for assessing the additional predictive value of high-dimensional molecular data. Our method combines two well-known statistical tools: logistic regression and boosting regression. We give clear advice for the choice of the only method parameter (the number of boosting iterations). In simulations, our novel approach is found to have very good power in different settings, e.g. few strong predictors or many weak predictors. For illustrative purpose, it is applied to two publicly available cancer data sets. Our simple and computationally efficient approach can be used to globally assess the additional predictive power of a large number of candidate predictors given that a few clinical covariates or a known prognostic index are already available
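
One plausible reading of the permutation procedure described above can be sketched in plain Python. Everything here is an illustrative assumption rather than the authors' implementation: the function names (`boosted_nll`, `added_value_pvalue`), the defaults `mstop=50` and `nu=0.1`, the choice of negative log-likelihood as the test statistic, and the simplification that the clinical logistic-regression fit is passed in as a precomputed linear predictor (`offset`). Molecular columns are assumed to be centered.

```python
import math
import random


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def boosted_nll(X, y, offset, mstop=50, nu=0.1):
    """Binomial negative log-likelihood after componentwise boosting on X,
    starting from a fixed clinical linear predictor (the offset)."""
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    sumsq = [sum(v * v for v in col) for col in cols]
    f = [0.0] * n  # contribution of the molecular predictors
    for _ in range(mstop):
        # Negative gradient of the logistic loss at the current fit.
        resid = [y[i] - _sigmoid(offset[i] + f[i]) for i in range(n)]
        best_j, best_gain, best_b = None, 0.0, 0.0
        for j in range(p):
            if sumsq[j] == 0.0:
                continue
            xr = sum(c * r for c, r in zip(cols[j], resid))
            gain = xr * xr / sumsq[j]  # reduction in residual sum of squares
            if gain > best_gain:
                best_j, best_gain, best_b = j, gain, xr / sumsq[j]
        if best_j is None:
            break
        # Update only the single best-fitting component, shrunk by nu.
        for i in range(n):
            f[i] += nu * best_b * cols[best_j][i]
    return sum(
        math.log1p(math.exp(offset[i] + f[i])) - y[i] * (offset[i] + f[i])
        for i in range(n)
    )


def added_value_pvalue(X, y, offset, n_perm=99, seed=0, **kwargs):
    """Permute the rows of X (outcome and clinical offset stay paired) and
    count how often the permuted fit is at least as good as the observed one."""
    rng = random.Random(seed)
    observed = boosted_nll(X, y, offset, **kwargs)
    rows = list(X)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(rows)
        if boosted_nll(rows, y, offset, **kwargs) <= observed:
            hits += 1
    return (1 + hits) / (1 + n_perm)
```

Shuffling whole rows of the molecular matrix breaks any link between the molecular data and the pair (outcome, clinical predictor) while leaving the clinical fit untouched, which is what makes the comparison a test of additional value.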

    The AccelerAge framework: a new statistical approach to predict biological age based on time-to-event data

    Aging is a multifaceted and intricate physiological process characterized by a gradual decline in functional capacity, leading to increased susceptibility to diseases and mortality. While chronological age serves as a strong risk factor for age-related health conditions, considerable heterogeneity exists in the aging trajectories of individuals, suggesting that biological age may provide a more nuanced understanding of the aging process. However, the concept of biological age lacks a clear operationalization, leading to the development of various biological age predictors without a solid statistical foundation. This paper addresses these limitations by proposing a comprehensive operationalization of biological age, introducing the “AccelerAge” framework for predicting biological age, and introducing previously underutilized evaluation measures for assessing the performance of biological age predictors. The AccelerAge framework, based on Accelerated Failure Time (AFT) models, directly models the effect of candidate predictors of aging on an individual’s survival time, aligning with the prevalent metaphor of aging as a clock. We compare predictors based on the AccelerAge framework to a predictor based on the GrimAge predictor, which is considered one of the best-performing biological age predictors, using simulated data as well as data from the UK Biobank and the Leiden Longevity Study. Our approach seeks to establish a robust statistical foundation for biological age clocks, enabling a more accurate and interpretable assessment of an individual’s aging status.
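
For readers unfamiliar with AFT models, the generic textbook form is given here. The symbols ($T_i$, $x_i$, $\beta$, $\mu$, $\sigma$, $\varepsilon_i$, $S_0$) are notation chosen for this note and need not match the paper's own parameterization.

```latex
\log T_i = \mu + x_i^{\top}\beta + \sigma\,\varepsilon_i ,
\qquad
S(t \mid x) = S_0\!\left(t\, e^{-x^{\top}\beta}\right)
```

Here $S_0$ is the survival function of the baseline time $T_0 = e^{\mu + \sigma\varepsilon}$. Covariates rescale the time axis: a positive $x^{\top}\beta$ stretches survival time and a negative one compresses it, which is the sense in which an AFT model lets a person's clock run slower or faster.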

    Classes of Multiple Decision Functions Strongly Controlling FWER and FDR

    This paper provides two general classes of multiple decision functions where each member of the first class strongly controls the family-wise error rate (FWER), while each member of the second class strongly controls the false discovery rate (FDR). These classes offer the possibility that an optimal multiple decision function with respect to a pre-specified criterion, such as the missed discovery rate (MDR), could be found within these classes. Such multiple decision functions can be utilized in multiple testing, specifically, but not limited to, the analysis of high-dimensional microarray data sets. Comment: 19 pages
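
To make the FWER/FDR distinction concrete, here is a minimal sketch of two familiar procedures: Holm's step-down method, which controls the FWER under arbitrary dependence, and the Benjamini-Hochberg step-up method, which controls the FDR under independence or positive regression dependence. These are standard illustrations only, not the classes of decision functions proposed in the paper; the function names are hypothetical.

```python
def holm_reject(pvalues, alpha=0.05):
    """Holm step-down: compare the k-th smallest p-value (k = 0, 1, ...)
    with alpha / (m - k) and stop at the first failure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for k, i in enumerate(order):
        if pvalues[i] <= alpha / (m - k):
            reject[i] = True
        else:
            break
    return reject


def bh_reject(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up: find the largest rank k with
    p_(k) <= alpha * k / m and reject the k smallest p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject
```

On the same p-values the FDR procedure rejects at least as many hypotheses as the FWER procedure, reflecting the weaker error criterion it controls.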

    Transcriptional Profiling of Human Familial Longevity Indicates a Role for ASF1A and IL7R

    The Leiden Longevity Study consists of families that express extended survival across generations, decreased morbidity in middle-age, and beneficial metabolic profiles. To identify which pathways drive this complex phenotype of familial longevity and healthy aging, we performed a genome-wide gene expression study within this cohort to screen for mRNAs whose expression changes with age and associates with longevity. We first compared gene expression profiles from whole blood samples between 50 nonagenarians and 50 middle-aged controls, resulting in identification of 2,953 probes that associated with age. Next, we determined which of these probes associated with longevity by comparing the offspring of the nonagenarians (50 subjects) and the middle-aged controls. The expression of 360 probes was found to change differentially with age in members of the long-lived families. In a RT-qPCR replication experiment utilizing 312 controls, 332 offspring and 79 nonagenarians, we confirmed a nonagenarian specific expression profile for 21 genes out of 25 tested. Since only some of the offspring will have inherited the beneficial longevity profile from their long-lived parents, the contrast between offspring and controls is expected to be weak. Despite this dilution of the longevity effects, reduced expression levels of two genes, ASF1A and IL7R, involved in maintenance of chromatin structure and the immune system, associated with familial longevity already in middle-age. The size of this association increased when controls were compared to a subfraction of the offspring that had the highest probability to age healthily and become long-lived according to beneficial metabolic parameters. In conclusion, an “aging-signature” formed of 21 genes was identified, of which reduced expression of ASF1A and IL7R marked familial longevity already in middle-age. 
    This indicates that expression changes of genes involved in metabolism, epigenetic control and immune function occur as a function of age, and some of these, like ASF1A and IL7R, represent early features of familial longevity and healthy ageing.

    Simulated Effects of Recruitment Variability, Exploitation, and Reduced Habitat Area on the Muskellunge Population in Shoepack Lake, Voyageurs National Park, Minnesota

    The genetically unique population of muskellunge Esox masquinongy inhabiting Shoepack Lake in Voyageurs National Park, Minnesota, is potentially at risk for loss of genetic variability and long-term viability. Shoepack Lake has been subject to dramatic surface area changes from the construction of an outlet dam by beavers Castor canadensis and its subsequent failure. We simulated the long-term dynamics of this population in response to recruitment variation, increased exploitation, and reduced habitat area. We then estimated the effective population size of the simulated population and evaluated potential threats to long-term viability, based on which we recommend management actions to help preserve the long-term viability of the population. Simulations based on the population size and habitat area at the beginning of a companion study resulted in an effective population size that was generally above the threshold level for risk of loss of genetic variability, except when fishing mortality was increased. Simulations based on the reduced habitat area after the beaver dam failure and our assumption of a proportional reduction in population size resulted in an effective population size that was generally below the threshold level for risk of loss of genetic variability. Our results identified two potential threats to the long-term viability of the Shoepack Lake muskellunge population, reduction in habitat area and exploitation. Increased exploitation can be prevented through traditional fishery management approaches such as the adoption of no-kill, barbless hook, and limited entry regulations. Maintenance of the greatest possible habitat area and prevention of future habitat area reductions will require maintenance of the outlet dam built by beavers. Our study should enhance the long-term viability of the Shoepack Lake muskellunge population and illustrates a useful approach for other unique populations

    Investigating the effect of paralogs on microarray gene-set analysis

    Background: In order to interpret the results obtained from a microarray experiment, researchers often shift focus from analysis of individual differentially expressed genes to analyses of sets of genes. These gene-set analysis (GSA) methods use previously accumulated biological knowledge to group genes into sets and then aim to rank these gene sets in a way that reflects their relative importance in the experimental situation in question. We suspect that the presence of paralogs affects the ability of GSA methods to accurately identify the most important sets of genes for subsequent research. Results: We show that paralogs, which typically have high sequence identity and similar molecular functions, also exhibit high correlation in their expression patterns. We investigate this correlation as a potential confounding factor common to current GSA methods using Indygene (http://www.cbio.uct.ac.za/indygene), a web tool that reduces a supplied list of genes so that it includes no pairwise paralogy relationships above a specified sequence similarity threshold. We use the tool to reanalyse previously published microarray datasets and determine the potential utility of accounting for the presence of paralogs. Conclusions: The Indygene tool efficiently removes paralogy relationships from a given dataset and we found that such a reduction, performed prior to GSA, has the ability to generate significantly different results that often represent novel and plausible biological hypotheses. This was demonstrated for three different GSA approaches when applied to the reanalysis of previously published microarray datasets and suggests that the redundancy and non-independence of paralogs is an important consideration when dealing with GSA methodologies.
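
The reduction step described above (a gene list with no pairwise paralogy relationship above a similarity threshold) can be illustrated with a simple greedy filter. This is a hypothetical sketch: the abstract does not say how Indygene selects which paralog to keep, and the function name, the `similarity` lookup keyed by unordered gene pairs, and the keep-first rule are all assumptions made for the example.

```python
def reduce_paralogs(genes, similarity, threshold):
    """Greedily keep genes in input order, skipping any gene whose
    similarity to an already kept gene exceeds the threshold.

    similarity maps frozenset({gene_a, gene_b}) -> score; pairs that are
    missing from the mapping are treated as unrelated (score 0.0).
    """
    kept = []
    for gene in genes:
        if all(
            similarity.get(frozenset((gene, other)), 0.0) <= threshold
            for other in kept
        ):
            kept.append(gene)
    return kept


# Toy example with made-up gene labels and scores.
toy_similarity = {
    frozenset(("A", "B")): 0.9,
    frozenset(("B", "C")): 0.8,
    frozenset(("C", "D")): 0.3,
}
reduced = reduce_paralogs(["A", "B", "C", "D"], toy_similarity, threshold=0.5)
```

Because the filter is greedy, the result depends on input order; ordering genes by a relevance score before filtering is one way to decide which member of a paralog group survives.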

    Echo planar imaging–induced errors in intracardiac 4D flow MRI quantification

    Purpose: To assess errors associated with EPI-accelerated intracardiac 4D flow MRI (4DEPI) with EPI factor 5, compared with non-EPI gradient echo (4DGRE). Methods: Three 3T MRI experiments were performed comparing 4DEPI to 4DGRE: steady flow through straight tubes, pulsatile flow in a left-ventricle phantom, and intracardiac flow in 10 healthy volunteers. For each experiment, 4DEPI was repeated with readout and blip phase-encoding gradient in different orientations, parallel or perpendicular to the flow direction. In vitro flow rates were compared with timed volumetric collection. In the left-ventricle phantom and in vivo, voxel-based speed and spatio-temporal median speed were compared between sequences, as well as mitral and aortic transvalvular net forward volume. Results: In steady-flow phantoms, the flow rate error was largest (12%) for high velocity (>2 m/s) with 4DEPI readout gradient parallel to the flow. Voxel-based speed and median speed in the left-ventricle phantom were ≤5.5% different between sequences. In vivo, mean net forward volume inconsistency was largest (6.4 ± 8.5%) for 4DEPI with nonblip phase-encoding gradient parallel to the main flow. The difference in median speed for 4DEPI versus 4DGRE was largest (9%) when the 4DEPI readout gradient was parallel to the flow. Conclusions: Velocity and flow rate are inaccurate for 4DEPI with EPI factor 5 when flow is parallel to the readout or blip phase-encoding gradient. However, mean differences in flow rate, voxel-based speed, and spatio-temporal median speed were acceptable (≤10%) when comparing 4DEPI to 4DGRE for intracardiac flow in healthy volunteers.

    Integrated analysis of DNA copy number and gene expression microarray data using gene sets

    Background: Genes that play an important role in tumorigenesis are expected to show association between DNA copy number and RNA expression. Optimal power to find such associations can only be achieved if analysing copy number and gene expression jointly. Furthermore, some copy number changes extend over larger chromosomal regions affecting the expression levels of multiple resident genes.

    Gene set analysis exploiting the topology of a pathway

    Background: Recently, a great effort in microarray data analysis has been directed towards the study of so-called gene sets. A gene set is defined by genes that are, in some way, functionally related. For example, genes appearing in a known biological pathway naturally define a gene set. Gene sets are usually identified from a priori biological knowledge. Nowadays, many bioinformatics resources store this kind of knowledge (see, for example, the Kyoto Encyclopedia of Genes and Genomes, among others). Although pathway maps carry important information about the structure of correlation among genes that should not be neglected, the currently available multivariate methods for gene set analysis do not fully exploit it. Results: We propose a novel gene set analysis specifically designed for gene sets defined by pathways. Such analysis, based on graphical models, explicitly incorporates the dependence structure among genes highlighted by the topology of pathways. The analysis is designed to be used for overall surveillance of changes in a pathway in different experimental conditions. In fact, under different circumstances, not only the expression of the genes in a pathway, but also the strength of their relations may change. The methods resulting from the proposal make it possible both to test for variations in the strength of the links and to properly account for heteroscedasticity in the usual tests for differential expression. Conclusions: The use of graphical models allows a deeper look at the components of the pathway that can be tested separately and compared marginally. In this way it is possible to test single components of the pathway and highlight only those involved in its deregulation.