69 research outputs found

    Refining gene signatures: a Bayesian approach

    Background: In high density arrays, the identification of relevant genes for disease classification is complicated by not only the curse of dimensionality but also the highly correlated nature of the array data. In this paper, we are interested in the question of how many and which genes should be selected for a disease class prediction. Our work consists of a Bayesian supervised statistical learning approach to refine gene signatures with a regularization which penalizes for the correlation between the variables selected. Results: Our simulation results show that we can most often recover the correct subset of genes that predict the class as compared to other methods, even when accuracy and subset size remain the same. On real microarray datasets, we show that our approach can refine gene signatures to obtain either the same or better predictive performance than other existing methods with a smaller number of genes. Conclusions: Our novel Bayesian approach includes a prior which penalizes highly correlated features in model selection and is able to extract key genes in the highly correlated context of microarray data. The methodology in the paper is described in the context of microarray data, but can be applied to any array data (such as micro RNA, for example) as a first step towards predictive modeling of cancer pathways. A user-friendly software implementation of the method is available.

    On normality, ethnicity, and missing values in quantitative trait locus mapping

    BACKGROUND: This paper deals with the detection of significant linkage for quantitative traits using a variance components approach. Microsatellite markers were obtained for the Genetic Analysis Workshop 14 Collaborative Study on the Genetics of Alcoholism data. Ethnic heterogeneity, highly skewed quantitative measures, and a high rate of missing values are all present in this dataset and are well known to affect linkage analysis. This makes it a good candidate for investigation. RESULTS: As expected, we observed a number of changes in LOD scores, especially for chromosomes 1, 7, and 18, along with the three factors studied. A dramatic example of such changes can be found in chromosome 7. Highly significant linkage to one of the quantitative traits became insignificant when a proper normalizing transformation of the trait was used and when analysis was carried out on an ethnically homogeneous subset of the original pedigrees. CONCLUSION: In agreement with existing literature, transforming a trait to ensure normality using a Box-Cox transformation is highly recommended in order to avoid false-positive linkages. Furthermore, pedigrees should be sorted by ethnic groups and analyses should be carried out separately. Finally, one should be aware that the inclusion of covariates with a high rate of missing values considerably reduces the number of subjects included in the model. In such a case, the loss in power may be large. Imputation methods are then recommended.
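
The Box-Cox transformation recommended in the abstract above can be illustrated with a minimal sketch. The trait values, the choice of lambda = 0, and the helper names `box_cox` and `skewness` are assumptions made for illustration; none of them come from the paper, which does not state how lambda was chosen.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a fixed lambda.

    lam == 0 gives the natural log; otherwise (v**lam - 1) / lam.
    """
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def skewness(values):
    """Moment-based sample skewness (third central moment / variance**1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed quantitative trait (not data from the study).
trait = [1.0, 1.2, 1.5, 2.0, 3.0, 5.0, 9.0, 20.0]
log_trait = box_cox(trait, 0.0)

print("skewness before:", round(skewness(trait), 2))
print("skewness after: ", round(skewness(log_trait), 2))
```

In practice lambda is usually estimated from the data, for example by maximum likelihood, rather than fixed in advance as it is here.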

    DM-PhyClus: A Bayesian phylogenetic algorithm for infectious disease transmission cluster inference

    Background. Conventional phylogenetic clustering approaches rely on arbitrary cutpoints applied a posteriori to phylogenetic estimates. Although in practice, Bayesian and bootstrap-based clustering tend to lead to similar estimates, they often produce conflicting measures of confidence in clusters. The current study proposes a new Bayesian phylogenetic clustering algorithm, which we refer to as DM-PhyClus, that identifies sets of sequences resulting from quick transmission chains, thus yielding easily-interpretable clusters, without using any ad hoc distance or confidence requirement. Results. Simulations reveal that DM-PhyClus can outperform conventional clustering methods, as well as the Gap procedure, a pure distance-based algorithm, in terms of mean cluster recovery. We apply DM-PhyClus to a sample of real HIV-1 sequences, producing a set of clusters whose inference is in line with the conclusions of a previous thorough analysis. Conclusions. DM-PhyClus, by eliminating the need for cutpoints and producing sensible inference for cluster configurations, can facilitate transmission cluster detection. Future efforts to reduce incidence of infectious diseases, like HIV-1, will need reliable estimates of transmission clusters. It follows that algorithms like DM-PhyClus could serve to better inform public health strategies

    Analysis of Case-Parent Trios Using a Loglinear Model with Adjustment for Transmission Ratio Distortion

    Transmission of the two parental alleles to offspring in proportions that deviate from the Mendelian ratio is termed Transmission Ratio Distortion (TRD); it occurs throughout gametic and embryonic development. TRD has been well-studied in animals, but remains largely unknown in humans. The Transmission Disequilibrium Test (TDT) was first proposed to test for association and linkage in case-trios (affected offspring and parents); adjusting for TRD using control-trios was recommended. However, the TDT does not provide risk parameter estimates for different genetic models. A loglinear model was later proposed to provide child and maternal relative risk (RR) estimates of disease, assuming Mendelian transmission. Results from our simulation study showed that case-trios RR estimates using this model are biased in the presence of TRD; power and Type 1 error are compromised. We propose an extended loglinear model adjusting for TRD. Under this extended model, RR estimates, power and Type 1 error are correctly restored. We applied this model to an intrauterine growth restriction dataset, and showed consistent results with a previous approach that adjusted for TRD using control-trios. Our findings suggested the need to adjust for TRD in avoiding spurious results. Documenting TRD in the population is therefore essential for the correct interpretation of genetic association studies.
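
To make the role of the Mendelian assumption concrete, the following minimal sketch computes the classical TDT statistic mentioned in the abstract above. It is not the loglinear model the authors propose. The transmission counts and the function names `tdt_statistic` and `chi2_1df_pvalue` are hypothetical and chosen only for illustration.

```python
import math


def tdt_statistic(transmitted, untransmitted):
    """Classical TDT statistic from heterozygous parents of affected children.

    transmitted / untransmitted: number of times the allele of interest was
    or was not passed on. Under Mendelian transmission and no linkage or
    association, both counts are expected to be equal.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    return (transmitted - untransmitted) ** 2 / total


def chi2_1df_pvalue(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical counts (not from the study): 60 transmissions, 40 non-transmissions.
stat = tdt_statistic(60, 40)
print("TDT statistic:", stat)
print("p-value:", round(chi2_1df_pvalue(stat), 4))
```

Because the null hypothesis here is an even 50:50 split, any TRD at the locus shifts the counts whether or not the allele is related to disease, which is the source of the spurious results the abstract warns about.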

    Decreased expression of nociceptin/orphanin FQ in the dorsal anterior cingulate cortex of suicides

    The nociceptin/orphanin FQ (N/OFQ)-Nociceptin Opioid-like Peptide (NOP) receptor system is a critical mediator of physiological and pathological processes involved in emotional regulation and drug addiction. As such, this system may be an important biological substrate underlying psychiatric conditions that contribute to the risk of suicide. Thus, the goal of the present study was to characterize changes in human N/OFQ and NOP signaling as a function of depression, addiction and suicide. We quantified the expression of N/OFQ and NOP by RT-PCR in the anterior insula, the mediodorsal thalamus, and the dorsal anterior cingulate cortex (dACC) from a large sample of individuals who died by suicide and matched psychiatrically-healthy controls. Suicides displayed an 18% decrease in the expression of N/OFQ in the dACC that was not accounted for by current depressive or substance use disorders at the time of death. Therefore, our results suggest that dysregulation of the N/OFQ-NOP system may contribute to the neurobiology of suicide, a hypothesis that warrants further exploration.

    Principal Components of Heritability for High Dimension Quantitative Traits and General Pedigrees

    For many complex disorders, genetically relevant disease definition is still unclear. For this reason, researchers tend to collect large numbers of items related directly or indirectly to the disease diagnosis. Since the measured traits may not all be influenced by genetic factors, researchers are faced with the problem of choosing which traits or combinations of traits to consider in linkage analysis. To combine items, one can subject the data to a principal component analysis. However, when family data are collected, principal component analysis does not take family structure into account. In order to deal with these issues, Ott & Rabinowitz (1999) introduced the principal components of heritability (PCH), which capture the familial information across traits by calculating linear combinations of traits that maximize heritability. The calculation of the PCHs is based on the estimation of the genetic and the environmental components of variance. In the genetic context, the standard estimators of the variance components are Lange's maximum likelihood estimators, which require complex numerical calculations. The objectives of this paper are the following: i) to review some standard strategies available in the literature to estimate variance components for unbalanced data in mixed models; ii) to propose an ANOVA method for a genetic random effect model to estimate the variance components, which can be applied to general pedigrees and high dimensional family data within the PCH framework; iii) to elucidate the connection between PCH analysis and Linear Discriminant Analysis. We use computer simulations to show that the proposed method has asymptotic properties similar to those of Lange's method when the number of traits is small, and we study the efficiency of our method when the number of traits is large. A data analysis involving schizophrenia and bipolar quantitative traits is finally presented to illustrate the PCH methodology.

    Measuring and visualizing space–time congestion patterns in an urban road network using large-scale smartphone-collected GPS data

    Congestion is a dynamic phenomenon with elements of space and time, making it a promising application of probe vehicles. The purpose of this paper is to measure and visualize the magnitude and variability of congestion on the network scale using smartphone GPS travel data. The sample of data collected in Quebec City contained over 4000 drivers and 21,000 trips. The congestion index (CI) was calculated at the link level for each hour of the peak period and congestion was visualized at aggregate and disaggregate levels. Results showed that each peak period can be viewed as having an onset period and dissipation period lasting one hour. Congestion in the evening is greater and more dispersed than in the morning. Motorways, arterials, and collectors contribute most to peak period congestion, while residential links contribute little. Further analysis of the CI data is required for practical implementation in network planning or congestion remediation