392 research outputs found

    A general approach to simultaneous model fitting and variable elimination in response models for biological data with many more variables than observations

    Background: With the advent of high-throughput biotechnology data acquisition platforms such as microarrays, SNP chips and mass spectrometers, data sets with many more variables than observations are now routinely being collected. Finding relationships between response variables of interest and variables in such data sets is an important problem akin to finding needles in a haystack. Whilst methods for a number of response types have been developed, a general approach has been lacking. Results: The major contribution of this paper is to present a unified methodology which allows many common (statistical) response models to be fitted to such data sets. The class of models includes virtually any model with a linear predictor in it, for example (but not limited to) multiclass logistic regression (classification), generalised linear models (regression) and survival models. A fast algorithm for finding sparse, well-fitting models is presented. The ideas are illustrated on real data sets with numbers of variables ranging from thousands to millions. R code implementing the ideas is available for download. Conclusion: The method described in this paper enables existing work on response models for settings with fewer variables than observations to be extended to settings with many more variables than observations. It is a powerful approach to finding parsimonious models for such data sets. The method is capable of handling problems with millions of variables and a large variety of response types within one framework. The method compares favourably to existing methods such as support vector machines and random forests, but has the advantage of not requiring separate variable selection steps. It also works for data types which these methods were not designed to handle. The method usually produces very sparse models, which makes biological interpretation simpler and more focused.
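To make the idea of fitting a model and eliminating variables in one step concrete, here is a minimal sketch. It is not the paper's algorithm (the abstract does not describe it); it swaps in a plain L1-penalised least-squares fit (the lasso) solved by coordinate descent. The function names, the toy data and the penalty value `lam` are all assumptions made for illustration.

```python
# Illustrative stand-in only: lasso via coordinate descent, NOT the method
# from the paper above. Minimises (1/2n)*||y - X*beta||^2 + lam*||beta||_1.

def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; values inside the band become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=50):
    """Return the coefficient list after n_sweeps passes over all variables."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            # Residual of the fit that uses every variable except j.
            partial = [
                y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                for i in range(n)
            ]
            rho = sum(X[i][j] * partial[i] for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / z
    return beta


# Hypothetical toy data: 4 observations, 6 variables (more variables than
# observations). The response depends only on variable 2: y = 2 * X[:, 2].
X = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, -1, -1, 1, -1],
    [1, -1, 1, -1, 1, 1],
    [1, -1, -1, 1, -1, 1],
]
y = [2, -2, 2, -2]

beta = lasso_coordinate_descent(X, y, lam=0.1)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(selected)  # indices of the variables kept in the model
```

On this toy problem the penalty sets every coefficient except the one for variable 2 to exactly zero, so the variable selection happens inside the fit rather than in a separate step.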

    Methicillin-Resistant Staphylococcus aureus Infection and Hospitalization in High-Risk Patients in the Year following Detection

    Many studies have evaluated methicillin-resistant Staphylococcus aureus (MRSA) infections during single hospitalizations and subsequent readmissions to the same institution. None have assessed the comprehensive burden of MRSA infection in the period after hospital discharge while accounting for healthcare utilization across institutions. We conducted a retrospective cohort study of adult patients insured by Harvard Pilgrim Health Care who were newly-detected to harbor MRSA between January 1991 and December 2003 at a tertiary care medical center. We evaluated all MRSA-attributable infections associated with hospitalization in the year following new detection, regardless of hospital location. Data were collected on comorbidities, healthcare utilization, mortality and MRSA outcomes. Of 591 newly-detected MRSA carriers, 23% were colonized and 77% were infected upon detection. In the year following detection, 196 (33%) patients developed 317 discrete and unrelated MRSA infections. The most common infections were pneumonia (34%), soft tissue (27%), and primary bloodstream (18%) infections. Infections occurred a median of 56 days post-detection. Of all infections, 26% involved bacteremia, and 17% caused MRSA-attributable death. During the admission where MRSA was newly-detected, 14% (82/576) developed subsequent infection. Of those surviving to discharge, 24% (114/482) developed post-discharge infections in the year following detection. Half (99/185, 54%) of post-discharge infections caused readmission, and most (104/185, 55%) occurred over 90 days post-discharge. In high-risk tertiary care patients, newly-detected MRSA carriage confers large risks of infection and substantial attributable mortality in the year following acquisition. Most infections occur post-discharge, and 18% of infections associated with readmission occurred in hospitals other than the one where MRSA was newly-detected. Despite gains in reducing MRSA infections during hospitalization, the risk of MRSA infection among critically and chronically ill carriers persists after discharge and warrants targeted prevention strategies.

    Genotype and functional correlates of disease phenotype in deficiency of adenosine deaminase 2 (DADA2)

    BACKGROUND: Deficiency of adenosine deaminase 2 (DADA2) is a syndrome with pleiotropic manifestations including vasculitis and hematologic compromise. A systematic definition of the relationship between ADA2 mutations and clinical phenotype remains unavailable. OBJECTIVE: We tested whether the impact of ADA2 mutations on enzyme function correlates with clinical presentation. METHODS: DADA2 patients with severe hematologic manifestations were compared with vasculitis-predominant patients. Enzymatic activity was assessed using expression constructs reflecting all 53 missense, nonsense, insertion and deletion genotypes from 152 patients across the DADA2 spectrum. RESULTS: We identified DADA2 patients presenting with pure red cell aplasia (PRCA, n = 5) or bone marrow failure syndrome (BMF, n = 10). Most patients did not exhibit features of vasculitis. Recurrent infection, hepatosplenomegaly and gingivitis were common in patients with BMF, of whom half died from infection. Unlike DADA2 patients with vasculitis, patients with PRCA and BMF proved largely refractory to tumor necrosis factor inhibitors. ADA2 variants associated with vasculitis predominantly reflected missense mutations with at least 3% residual enzymatic activity. By contrast, PRCA and BMF were associated with missense mutations with minimal residual enzyme activity, nonsense variants, and insertions/deletions resulting in complete loss of function. CONCLUSION: Functional interrogation of ADA2 mutations reveals an association of subtotal function loss with vasculitis, typically responsive to TNF blockade, whereas more extensive loss is observed in hematologic disease which may be refractory to treatment. These findings establish a genotype-phenotype spectrum in DADA2.

    Expanded syringe exchange programs and reduced HIV infection among new injection drug users in Tallinn, Estonia

    Background: Estonia has experienced an HIV epidemic among intravenous drug users (IDUs) with the highest per capita HIV prevalence in Eastern Europe. We assessed the effects of expanded syringe exchange programs (SEP) in the capital city, Tallinn, which has an estimated 10,000 IDUs. Methods: SEP implementation was monitored with data from the Estonian National Institute for Health Development. Respondent-driven sampling (RDS) interview surveys with HIV testing were conducted in Tallinn in 2005, 2007 and 2009 (involving 350, 350 and 327 IDUs respectively). HIV incidence among new injectors (those injecting for ≤ 3 years) was estimated by assuming (1) new injectors were HIV seronegative when they began injecting, and (2) HIV infection occurred at the midpoint between first injection and time of interview. Results: SEP increased from 230,000 syringes exchanged in 2005 to 440,000 in 2007 and 770,000 in 2009. In all three surveys, IDUs were predominantly male (80%), ethnic Russians (>80%), and young adults (mean ages 24 to 27 years). The proportion of new injectors decreased significantly over the years (from 21% in 2005 to 12% in 2009, p = 0.005). HIV prevalence among all respondents stabilized at slightly over 50% (54% in 2005, 55% in 2007, 51% in 2009), and decreased among new injectors (34% in 2005, 16% in 2009, p = 0.046). Estimated HIV incidence among new injectors decreased significantly from 18/100 person-years in 2005 and 21/100 person-years in 2007 to 9/100 person-years in 2009 (p = 0.026). Conclusions: In Estonia, a transitional country, a decrease in the HIV prevalence among new injectors and in the numbers of people initiating injection drug use coincided with implementation of large-scale SEPs. Further reductions in HIV transmission among IDUs are still required. Provision of 70 or more syringes per IDU per year may be needed before significant reductions in HIV incidence occur.
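The incidence estimate in the abstract above rests on two stated assumptions, and the arithmetic can be sketched as follows. This is one reading of those assumptions, not the authors' code: seronegative respondents contribute their full injecting time as person-years at risk, and seropositive respondents contribute half of it (infection at the midpoint). The function names and the four sample records are hypothetical. The syringe counts and the estimate of 10,000 IDUs are taken from the abstract.

```python
# Sketch under stated assumptions; the sample records are invented.

def incidence_per_100_person_years(new_injectors):
    """new_injectors: iterable of (years_since_first_injection, hiv_positive)."""
    infections = 0
    person_years = 0.0
    for years_injecting, hiv_positive in new_injectors:
        if hiv_positive:
            # Assumption (2): infection at the midpoint, so only half of the
            # injecting time counts as time at risk.
            infections += 1
            person_years += years_injecting / 2.0
        else:
            # Assumption (1): seronegative at first injection and still
            # negative at interview, so all injecting time is at risk.
            person_years += years_injecting
    if person_years == 0:
        raise ValueError("no person-years at risk")
    return 100.0 * infections / person_years


def syringes_per_idu(syringes_exchanged, estimated_idus=10_000):
    """Syringes exchanged per IDU per year, using the abstract's IDU estimate."""
    return syringes_exchanged / estimated_idus


# Hypothetical respondents: 2 infections over 2 + 0.5 + 3 + 1 = 6.5 person-years.
sample = [(2.0, False), (1.0, True), (3.0, False), (2.0, True)]
print(round(incidence_per_100_person_years(sample), 1))  # 30.8

# Figures from the abstract: 23, 44 and 77 syringes per IDU per year.
for year, syringes in [(2005, 230_000), (2007, 440_000), (2009, 770_000)]:
    print(year, syringes_per_idu(syringes))
```

Under the abstract's own figures, only the 2009 level (77 syringes per IDU) exceeds the threshold of 70 mentioned in the conclusions, which is also the year with the lowest estimated incidence.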

    Support Vector Machine Implementations for Classification & Clustering

    BACKGROUND: We describe Support Vector Machine (SVM) applications to classification and clustering of channel current data. SVMs are variational-calculus based methods that are constrained to have structural risk minimization (SRM), i.e., they provide noise tolerant solutions for pattern recognition. The SVM approach encapsulates a significant amount of model-fitting information in the choice of its kernel. In work thus far, novel, information-theoretic, kernels have been successfully employed for notably better performance over standard kernels. Currently there are two approaches for implementing multiclass SVMs. One is called external multi-class that arranges several binary classifiers as a decision tree such that they perform a single-class decision making function, with each leaf corresponding to a unique class. The second approach, namely internal-multiclass, involves solving a single optimization problem corresponding to the entire data set (with multiple hyperplanes). RESULTS: Each SVM approach encapsulates a significant amount of model-fitting information in its choice of kernel. In work thus far, novel, information-theoretic, kernels were successfully employed for notably better performance over standard kernels. Two SVM approaches to multiclass discrimination are described: (1) internal multiclass (with a single optimization), and (2) external multiclass (using an optimized decision tree). We describe benefits of the internal-SVM approach, along with further refinements to the internal-multiclass SVM algorithms that offer significant improvement in training time without sacrificing accuracy. In situations where the data isn't clearly separable, making for poor discrimination, signal clustering is used to provide robust and useful information – to this end, novel, SVM-based clustering methods are also described. As with the classification, there are Internal and External SVM Clustering algorithms, both of which are briefly described

    Adoption of an “Open” Envelope Conformation Facilitating CD4 Binding and Structural Remodeling Precedes Coreceptor Switch in R5 SHIV-Infected Macaques

    A change in coreceptor preference from CCR5 to CXCR4 towards end-stage disease in some HIV-1 infected individuals has been well documented, but the reasons and mechanisms for this tropism switch remain elusive. It has been suggested that envelope structural constraints in accommodating amino acid changes required for CXCR4 usage are an obstacle to tropism switch, limiting the rate and pathways available for HIV-1 coreceptor switching. The present study was initiated in two R5 SHIVSF162P3N-infected rapid progressor macaques with coreceptor switch to test the hypothesis that an early step in the evolution of tropism switch is the adoption of a less constrained and more “open” envelope conformation for better CD4 usage, allowing greater structural flexibility to accommodate further mutational changes that confer CXCR4 utilization. We show that, prior to the time of coreceptor switch, R5 viruses in both macaques evolved to become increasingly sCD4-sensitive, suggestive of enhanced exposure of the CD4 binding site and an “open” envelope conformation, and this correlated with better gp120 binding to CD4 and with more efficient infection of CD4low cells such as primary macrophages. Moreover, significant changes in neutralization sensitivity to agents and antibodies directed against functional domains of gp120 and gp41 were seen for R5 viruses close to the time of X4 emergence, consistent with global changes in envelope configuration and structural plasticity. These observations in a simian model of R5-to-X4 evolution provide a mechanistic basis for the HIV-1 coreceptor switch.

    Using Expression and Genotype to Predict Drug Response in Yeast

    Personalized, or genomic, medicine entails tailoring pharmacological therapies according to individual genetic variation at genomic loci encoding proteins in drug-response pathways. It has been previously shown that steady-state mRNA expression can be used to predict the drug response (i.e., sensitivity or resistance) of non-genotyped mammalian cancer cell lines to chemotherapeutic agents. In a real-world setting, clinicians would have access to both steady-state expression levels of patient tissue(s) and a patient's genotypic profile, and yet the predictive power of transcripts versus markers is not well understood. We have previously shown that a collection of genotyped and expression-profiled yeast strains can provide a model for personalized medicine. Here we compare the predictive power of 6,229 steady-state mRNA transcript levels and 2,894 genotyped markers using a pattern recognition algorithm. We were able to predict with over 70% accuracy the drug sensitivity of 104 individual genotyped yeast strains derived from a cross between a laboratory strain and a wild isolate. We observe that, independently of drug mechanism of action, both transcripts and markers can accurately predict drug response. Marker-based prediction is usually more accurate than transcript-based prediction, likely reflecting the genetic determination of gene expression in this cross

    Sensing coral reef connectivity pathways from space

    Coral reefs rely on inter-habitat connectivity to maintain gene flow, biodiversity and ecosystem resilience. Coral reef communities of the Red Sea exhibit remarkable genetic homogeneity across most of the Arabian Peninsula coastline, with a genetic break towards the southern part of the basin. While previous studies have attributed these patterns to environmental heterogeneity, we hypothesize that they may also emerge as a result of dynamic circulation flow; yet, such linkages remain undemonstrated. Here, we integrate satellite-derived biophysical observations, particle dispersion model simulations, genetic population data and ship-borne in situ profiles to assess reef connectivity in the Red Sea. We simulated long-term (>20 yrs.) connectivity patterns driven by remotely-sensed sea surface height and evaluated results against estimates of genetic distance among populations of anemonefish, Amphiprion bicinctus, along the eastern Red Sea coastline. Predicted connectivity was remarkably consistent with genetic population data, demonstrating that circulation features (eddies, surface currents) formulate physical pathways for gene flow. The southern basin has lower physical connectivity than elsewhere, agreeing with known genetic structure of coral reef organisms. The central Red Sea provides key source regions, meriting conservation priority. Our analysis demonstrates a cost-effective tool to estimate biophysical connectivity remotely, supporting coastal management in data-limited regions