
    Robust variable screening for regression using factor profiling

    Sure Independence Screening is a fast procedure for variable selection in ultra-high dimensional regression analysis. Unfortunately, its performance greatly deteriorates with increasing dependence among the predictors. To solve this issue, Factor Profiled Sure Independence Screening (FPSIS) models the correlation structure of the predictor variables, assuming that it can be represented by a few latent factors. The correlations can then be profiled out by projecting the data onto the orthogonal complement of the subspace spanned by these factors. However, neither of these methods can handle the presence of outliers in the data. Therefore, we propose a robust screening method which uses a least trimmed squares method to estimate the latent factors and the factor profiled variables. Variable screening is then performed on factor profiled variables by using regression MM-estimators. Different types of outliers in this model and their roles in variable screening are studied. Both simulation studies and a real data analysis show that the proposed robust procedure has good performance on clean data and outperforms the two nonrobust methods on contaminated data.
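The classical, non-robust pipeline summarized in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the latent factors are estimated by the leading left singular vectors of the centered predictors, the number of factors is taken as known, and the function names (`sis_scores`, `factor_profile`, `top_k`) are hypothetical. The robust method proposed in the abstract replaces these least-squares steps with least trimmed squares and MM-estimators, which are not shown here.

```python
import numpy as np

def sis_scores(X, y):
    """Absolute marginal correlation of each column of X with y (plain SIS)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    return np.abs(Xc.T @ yc) / denom

def factor_profile(X, y, n_factors):
    """Project X and y onto the orthogonal complement of the estimated factors.

    Assumption: factors are the leading left singular vectors of centered X.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    F = U[:, :n_factors]                # estimated factors, orthonormal columns
    P = np.eye(X.shape[0]) - F @ F.T    # projector onto the complement of span(F)
    return P @ Xc, P @ yc, F

def top_k(scores, k):
    """Indices of the k largest screening scores, best first."""
    return np.argsort(scores)[::-1][:k]

# Toy data: 2 latent factors induce strong correlation among 30 predictors.
rng = np.random.default_rng(0)
n, p, d = 80, 30, 2
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, p)) + 0.5 * rng.normal(size=(n, p))
y = X[:, 0] + rng.normal(size=n)

X_prof, y_prof, F_hat = factor_profile(X, y, n_factors=d)
print("SIS   top 5:", top_k(sis_scores(X, y), 5))
print("FPSIS top 5:", top_k(sis_scores(X_prof, y_prof), 5))
```

Screening on the profiled variables uses the same marginal-correlation ranking as plain SIS; only the inputs change.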

    Sparse Proteomics Analysis - A compressed sensing-based approach for feature selection and classification of high-dimensional proteomics mass spectrometry data

    Background: High-throughput proteomics techniques, such as mass spectrometry (MS)-based approaches, produce very high-dimensional data-sets. In a clinical setting one is often interested in how mass spectra differ between patients of different classes, for example spectra from healthy patients vs. spectra from patients having a particular disease. Machine learning algorithms are needed to (a) identify these discriminating features and (b) classify unknown spectra based on this feature set. Since the acquired data is usually noisy, the algorithms should be robust against noise and outliers, while the identified feature set should be as small as possible. Results: We present a new algorithm, Sparse Proteomics Analysis (SPA), based on the theory of compressed sensing that allows us to identify a minimal discriminating set of features from mass spectrometry data-sets. We show (1) how our method performs on artificial and real-world data-sets, (2) that its performance is competitive with standard (and widely used) algorithms for analyzing proteomics data, and (3) that it is robust against random and systematic noise. We further demonstrate the applicability of our algorithm to two previously published clinical data-sets.

    The genotype-phenotype relationship in multicellular pattern-generating models - the neglected role of pattern descriptors

    Background: A deep understanding of what causes the phenotypic variation arising from biological patterning processes cannot be claimed before we are able to recreate this variation by mathematical models capable of generating genotype-phenotype maps in a causally cohesive way. However, the concept of pattern in a multicellular context implies that what matters is not the state of every single cell, but certain emergent qualities of the total cell aggregate. Thus, in order to set up a genotype-phenotype map in such a spatiotemporal pattern setting, one is actually forced to establish new pattern descriptors and derive their relations to parameters of the original model. A pattern descriptor is a variable that describes and quantifies a certain qualitative feature of the pattern, for example the degree to which certain macroscopic structures are present. There is today no general procedure for how to relate a set of patterns and their characteristic features to the functional relationships, parameter values and initial values of an original pattern-generating model. Here we present a new, generic approach for explorative analysis of complex patterning models which focuses on the essential pattern features and their relations to the model parameters. The approach is illustrated on an existing model for Delta-Notch lateral inhibition over a two-dimensional lattice. Results: By combining computer simulations according to a succession of statistical experimental designs, computer graphics, automatic image analysis, human sensory descriptive analysis and multivariate data modelling, we derive a pattern descriptor model of those macroscopic, emergent aspects of the patterns that we consider of interest. The pattern descriptor model relates the values of the new, dedicated pattern descriptors to the parameter values of the original model, for example by predicting the parameter values leading to particular patterns, and provides insights that would have been hard to obtain by traditional methods. Conclusion: The results suggest that our approach may qualify as a general procedure for how to discover and relate relevant features and characteristics of emergent patterns to the functional relationships, parameter values and initial values of an underlying pattern-generating mathematical model.

    Marginal empirical likelihood and sure independence feature screening

    We study a marginal empirical likelihood approach in scenarios where the number of variables grows exponentially with the sample size. The marginal empirical likelihood ratios as functions of the parameters of interest are systematically examined, and we find that the marginal empirical likelihood ratio evaluated at zero can be used to differentiate whether an explanatory variable is contributing to a response variable or not. Based on this finding, we propose a unified feature screening procedure for linear models and generalized linear models. Unlike most existing feature screening approaches, which rely on the magnitudes of some marginal estimators to identify true signals, the proposed screening approach is capable of further incorporating the level of uncertainty of such estimators. This merit is inherited from the self-studentization property of the empirical likelihood approach, and extends the insights of existing feature screening methods. Moreover, we show that our screening approach is less restrictive in its distributional assumptions, and can be conveniently adapted to a broad range of scenarios such as models specified using general moment conditions. Our theoretical results and extensive numerical examples by simulations and data analysis demonstrate the merits of the marginal empirical likelihood approach. Comment: Published at http://dx.doi.org/10.1214/13-AOS1139 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
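The idea of ranking variables by a marginal empirical likelihood ratio evaluated at zero can be sketched as below. This is a hedged illustration, not the authors' implementation: it assumes a linear model with standardized data, takes the marginal estimating function at zero to be g_i = x_ij * y_i, and solves for the Lagrange multiplier by bisection. The function names `el_ratio_at_zero` and `mel_screen` are hypothetical.

```python
import numpy as np

def el_ratio_at_zero(g, n_iter=100):
    """-2 log empirical likelihood ratio for H0: E[g] = 0.

    Returns inf when zero lies outside the convex hull of g.
    """
    g = np.asarray(g, dtype=float)
    if g.min() >= 0 or g.max() <= 0:
        return np.inf
    # The multiplier lam must keep every weight positive: 1 + lam * g_i > 0.
    lo, hi = -1.0 / g.max(), -1.0 / g.min()
    eps = 1e-10 * (hi - lo)
    lo, hi = lo + eps, hi - eps
    # sum(g / (1 + lam * g)) is strictly decreasing in lam, so bisection works.
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if np.sum(g / (1.0 + mid * g)) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * np.sum(np.log1p(lam * g))

def mel_screen(X, y, k):
    """Rank predictors by the marginal EL ratio at zero; return top-k and all stats."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    stats = np.array([el_ratio_at_zero(Xs[:, j] * ys) for j in range(X.shape[1])])
    return np.argsort(stats)[::-1][:k], stats

# Toy example: only the first predictor drives the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = 2.0 * X[:, 0] + rng.normal(size=200)
top, stats = mel_screen(X, y, k=3)
print("selected:", top)
```

Because the statistic is self-normalized, a variable with a large but noisy marginal effect is not automatically ranked above one with a smaller, more stable effect, which is the point the abstract makes about incorporating uncertainty.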

    Protein profiling in hepatocellular carcinoma by label-free quantitative proteomics in two west african populations.

    Background: Hepatocellular carcinoma (HCC) is the third most common cause of cancer-related death worldwide, and is often diagnosed by measuring serum AFP, a stand-alone biomarker with poor performance. With the aim of improving on this, our study focuses on plasma proteins identified by mass spectrometry in order to investigate and validate differences seen in the respective proteomes of controls and subjects with liver cirrhosis (LC) and HCC. Methods: Mass spectrometry analysis using liquid chromatography electrospray ionization quadrupole time-of-flight was conducted on 339 subjects using a pooled expression profiling approach. ELISA assays were performed on four significantly differentially expressed proteins to validate their expression profiles in subjects from the Gambia and a pilot group from Nigeria. Results from this were collated for statistical multiplexing using logistic regression analysis. Results: Twenty-six proteins were identified as differentially expressed between the three subject groups. Direct measurements of four (hemopexin, alpha-1-antitrypsin, apolipoprotein A1 and complement component 3) confirmed their change in abundance in LC and HCC versus control patients. These trends were independently replicated in the pilot validation subjects from Nigeria. The statistical multiplexing of these proteins demonstrated performance comparable to or greater than ALT in identifying liver cirrhosis or carcinogenesis. This exercise also proposed preliminary cut-offs with achievable sensitivity, specificity and AUC statistics greater than reported AFP averages. Conclusions: The validated changes of expression in these proteins have the potential for development into high-performance tests usable in the diagnosis and/or monitoring of HCC and LC patients. The identification of sustained expression trends strengthens the suggestion of these four proteins as worthy candidates for further investigation in the context of liver disease. The statistical combinations also provide a novel inroad of analyses able to propose definitive cut-offs and combinations for evaluation of performance.

    Second trimester inflammatory and metabolic markers in women delivering preterm with and without preeclampsia.

    Objective: Inflammatory and metabolic pathways are implicated in preterm birth and preeclampsia. However, studies rarely compare second trimester inflammatory and metabolic markers between women who deliver preterm with and without preeclampsia. Study design: A sample of 129 women (43 with preeclampsia) with preterm delivery was obtained from an existing population-based birth cohort. Banked second trimester serum samples were assayed for 267 inflammatory and metabolic markers. Backwards-stepwise logistic regression models were used to calculate odds ratios. Results: Higher 5-α-pregnan-3β,20α-diol disulfate, and lower 1-linoleoylglycerophosphoethanolamine and octadecanedioate, predicted increased odds of preeclampsia. Conclusions: Among women with preterm births, those who developed preeclampsia differed with respect to metabolic markers. These findings point to potential etiologic underpinnings for preeclampsia as a precursor to preterm birth.

    Stops and Stares: Street Stops, Surveillance, and Race in the New Policing

    The use of proactive tactics to disrupt criminal activities, such as Terry street stops and concentrated misdemeanor arrests, is essential to the “new policing.” This model applies complex metrics, strong management, and aggressive enforcement and surveillance to focus policing on high crime risk persons and places. The tactics endemic to the “new policing” gave rise in the 1990s to popular, legal, political and social science concerns about disparate treatment of minority groups in their everyday encounters with law enforcement. Empirical evidence showed that minorities were indeed stopped and arrested more frequently than similarly situated whites, even when controlling for local social and crime conditions. In this article, we examine racial disparities under a unique configuration of the street stop prong of the “new policing” – the inclusion of non-contact observations (or surveillances) in the field interrogation (or investigative stop) activity of Boston Police Department officers. We show that Boston Police officers focus significant portions of their field investigation activity in two areas: suspected and actual gang members, and the city’s high crime areas. Minority neighborhoods experience higher levels of field interrogation and surveillance activity net of crime and other social factors. Relative to white suspects, Black suspects are more likely to be observed, interrogated, and frisked or searched, controlling for gang membership and prior arrest history. Moreover, relative to their black counterparts, white police officers conduct high numbers of field investigations and are more likely to frisk/search subjects of all races. We distinguish between preference-based and statistical discrimination by comparing stops by officer-suspect racial pairs. If officer activity is independent of officer race, we would infer that disproportionate stops of minorities reflect statistical discrimination. We show instead that officers seem more likely to investigate and frisk or search a minority suspect if officer and suspect race differ. We locate these results in the broader tensions of racial profiling that pose recurring social and constitutional concerns in the “new policing.”

    Circulating micrornas associated with glycemic impairment and progression in Asian Indians.

    Aims/hypothesis: Asian Indians have a high incidence of type 2 diabetes, but factors associated with glycemic progression in this population are not understood. MicroRNAs are emerging as important mediators of glucose homeostasis and have not been previously studied in Asian Indians. We examined microRNA (miR) expression associated with glycemic impairment and progression in Asian Indians from the San Francisco Bay Area. We studied 128 Asian Indians aged 45-84 years without known cardiovascular disease and not taking diabetes medications. Oral glucose tolerance tests were performed at baseline and after 2.5 years. We quantified circulating miRs from plasma collected during the enrollment visit using a flow cytometry-based assay. Results: Glycemic impairment was present in 57% (n = 73) at baseline. MiR-191 was positively associated with glycemic impairment (odds ratio (OR) 1.7 (95% CI 1.2, 2.4), p < 0.01). The prevalence of glycemic progression after 2.5 years was 24% (n = 23). Six miRs were negatively associated with glycemic progression: miR-122 (OR 0.5 (0.2, 0.8), p < 0.01), miR-15a (OR 0.6 (0.4, 0.9), p < 0.01), miR-197 (OR 0.6 (0.4, 0.9), p < 0.01), miR-320a (OR 0.6 (0.4, 0.9), p < 0.01), miR-423 (OR 0.6 (0.4, 0.9), p < 0.01), and miR-486 (OR 0.5 (0.3, 0.8), p < 0.01). Further multivariate adjustment did not attenuate these results. Conclusions/interpretation: This is the first study to investigate circulating miRs associated with glycemic status among this high-risk ethnic group. Individual miRs were significantly associated with both glycemic impairment and glycemic progression. Further studies are needed to determine whether miR(s) might be useful clinical biomarkers for incident T2D in the Asian Indian population.