Robust variable screening for regression using factor profiling
Sure Independence Screening is a fast procedure for variable selection in
ultra-high dimensional regression analysis. Unfortunately, its performance
greatly deteriorates with increasing dependence among the predictors. To solve
this issue, Factor Profiled Sure Independence Screening (FPSIS) models the
correlation structure of the predictor variables, assuming that it can be
represented by a few latent factors. The correlations can then be profiled out
by projecting the data onto the orthogonal complement of the subspace spanned
by these factors. However, neither of these methods can handle the presence of
outliers in the data. Therefore, we propose a robust screening method which
uses a least trimmed squares method to estimate the latent factors and the
factor profiled variables. Variable screening is then performed on factor
profiled variables by using regression MM-estimators. Different types of
outliers in this model and their roles in variable screening are studied. Both
simulation studies and a real data analysis show that the proposed robust
procedure performs well on clean data and outperforms the two nonrobust
methods on contaminated data.
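The contrast between plain SIS and factor profiling can be illustrated with a minimal pure-Python sketch. It assumes a single latent factor whose score vector `factor` is already known. The abstract does not describe how the factors are estimated, and the robust variant's least trimmed squares and MM-estimation steps are not shown. The helper names `sis`, `profile_out` and `fpsis` are hypothetical, and predictors are passed as a list of columns.

```python
import math


def _center(v):
    """Subtract the mean so correlations are computed on centered data."""
    m = sum(v) / len(v)
    return [x - m for x in v]


def corr(a, b):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    a, b = _center(a), _center(b)
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return 0.0 if den == 0 else num / den


def sis(X_cols, y, d):
    """Sure Independence Screening: keep the d predictors (columns) with the
    largest absolute marginal correlation with y."""
    scores = [abs(corr(col, y)) for col in X_cols]
    return sorted(range(len(X_cols)), key=lambda j: scores[j], reverse=True)[:d]


def profile_out(v, f):
    """Project v onto the orthogonal complement of span{f}.
    Assumption: one latent factor with known score vector f."""
    c = sum(a * b for a, b in zip(v, f)) / sum(b * b for b in f)
    return [a - c * b for a, b in zip(v, f)]


def fpsis(X_cols, y, factor, d):
    """Non-robust factor-profiled screening sketch: remove the factor from
    every predictor and from the response, then apply SIS."""
    Xp = [profile_out(col, factor) for col in X_cols]
    yp = profile_out(y, factor)
    return sis(Xp, yp, d)
```

For example, with `X = [[1, 1, -1, -1], [1, -1, 1, -1]]`, `y = [4, 2, -2, -4]` and `factor = [1, 1, -1, -1]`, `sis(X, y, 1)` keeps column 0, whose correlation with `y` comes entirely from the shared factor. By contrast, `fpsis(X, y, factor, 1)` keeps column 1, the only predictor still correlated with `y` once the factor is profiled out.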
Sparse Proteomics Analysis - A compressed sensing-based approach for feature selection and classification of high-dimensional proteomics mass spectrometry data
Background: High-throughput proteomics techniques, such as mass spectrometry
(MS)-based approaches, produce very high-dimensional data-sets. In a clinical
setting one is often interested in how mass spectra differ between patients of
different classes, for example spectra from healthy patients vs. spectra from
patients having a particular disease. Machine learning algorithms are needed to
(a) identify these discriminating features and (b) classify unknown spectra
based on this feature set. Since the acquired data is usually noisy, the
algorithms should be robust against noise and outliers, while the identified
feature set should be as small as possible.
Results: We present a new algorithm, Sparse Proteomics Analysis (SPA), based
on the theory of compressed sensing that allows us to identify a minimal
discriminating set of features from mass spectrometry data-sets. We show (1)
how our method performs on artificial and real-world data-sets, (2) that its
performance is competitive with standard (and widely used) algorithms for
analyzing proteomics data, and (3) that it is robust against random and
systematic noise. We further demonstrate the applicability of our algorithm to
two previously published clinical data-sets.
The genotype-phenotype relationship in multicellular pattern-generating models - the neglected role of pattern descriptors
Background: A deep understanding of what causes the phenotypic variation arising from biological patterning
processes cannot be claimed before we are able to recreate this variation by mathematical models capable of
generating genotype-phenotype maps in a causally cohesive way. However, the concept of pattern in a
multicellular context implies that what matters is not the state of every single cell, but certain emergent qualities
of the total cell aggregate. Thus, in order to set up a genotype-phenotype map in such a spatiotemporal pattern
setting, one is forced to establish new pattern descriptors and derive their relations to parameters of the
original model. A pattern descriptor is a variable that describes and quantifies a certain qualitative feature of the
pattern, for example the degree to which certain macroscopic structures are present. There is today no general
procedure for how to relate a set of patterns and their characteristic features to the functional relationships,
parameter values and initial values of an original pattern-generating model. Here we present a new, generic
approach for explorative analysis of complex patterning models which focuses on the essential pattern features
and their relations to the model parameters. The approach is illustrated on an existing model for Delta-Notch
lateral inhibition over a two-dimensional lattice.
Results: By combining computer simulations according to a succession of statistical experimental designs,
computer graphics, automatic image analysis, human sensory descriptive analysis and multivariate data modelling,
we derive a pattern descriptor model of those macroscopic, emergent aspects of the patterns that we consider
of interest. The pattern descriptor model relates the values of the new, dedicated pattern descriptors to the
parameter values of the original model, for example by predicting the parameter values leading to particular
patterns, and provides insights that would have been hard to obtain by traditional methods.
Conclusion: The results suggest that our approach may qualify as a general procedure for how to discover and
relate relevant features and characteristics of emergent patterns to the functional relationships, parameter values
and initial values of an underlying pattern-generating mathematical model.
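As a toy illustration of what a pattern descriptor can be, the sketch below scores a binary cell lattice (for instance, cells classified as high- or low-Delta) by the fraction of horizontally or vertically adjacent pairs whose states differ. A perfect checkerboard scores 1 and a uniform lattice scores 0. This `alternation_index` is a hypothetical descriptor chosen for illustration only. It is not one of the descriptors derived in the study, which were obtained through image analysis, sensory descriptive analysis and multivariate modelling.

```python
def alternation_index(grid):
    """Hypothetical pattern descriptor for a binary lattice: the fraction of
    horizontally or vertically adjacent cell pairs whose states differ.
    A checkerboard scores 1.0 and a uniform lattice scores 0.0."""
    rows, cols = len(grid), len(grid[0])
    differ = total = 0
    for r in range(rows):
        for c in range(cols):
            # Look only right and down so each neighbouring pair is counted once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    total += 1
                    differ += grid[r][c] != grid[rr][cc]
    return differ / total if total else 0.0
```

A descriptor like this turns each simulated pattern into a single number that can then be related to the model parameters, which is the role pattern descriptors play in the approach described above.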
Marginal empirical likelihood and sure independence feature screening
We study a marginal empirical likelihood approach in scenarios where the
number of variables grows exponentially with the sample size. The marginal
empirical likelihood ratios as functions of the parameters of interest are
systematically examined, and we find that the marginal empirical likelihood
ratio evaluated at zero can be used to differentiate whether an explanatory
variable is contributing to a response variable or not. Based on this finding,
we propose a unified feature screening procedure for linear models and
generalized linear models. Unlike most existing feature screening
approaches that rely on the magnitudes of some marginal estimators to identify
true signals, the proposed screening approach is capable of further
incorporating the level of uncertainty of such estimators. This merit stems
from the self-studentization property of the empirical likelihood approach
and extends the insights of existing feature screening methods. Moreover, we
show that our screening approach relies less on distributional
assumptions and can be conveniently adapted to a broad range of
scenarios, such as models specified using general moment conditions. Our
theoretical results and extensive numerical examples by simulations and data
analysis demonstrate the merits of the marginal empirical likelihood approach.
Comment: Published at http://dx.doi.org/10.1214/13-AOS1139 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
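The idea of ranking predictors by a marginal empirical likelihood ratio evaluated at zero can be sketched in pure Python using Owen's empirical likelihood for a scalar mean. This is a minimal sketch under stated assumptions: for a linear model it takes the marginal estimating function to be g_i = x_ij * y_i on data that are already centered and scaled, so that testing E[g] = 0 tests zero marginal covariance. The paper's exact estimating equations, its GLM version and its threshold rule are not reproduced. The names `el_stat` and `el_screen` are hypothetical.

```python
import math


def el_stat(g, iters=200):
    """-2 log empirical likelihood ratio for H0: E[g] = 0 with scalar g."""
    gmax, gmin = max(g), min(g)
    if gmax == 0 and gmin == 0:
        return 0.0
    if gmax <= 0 or gmin >= 0:
        # Zero is not in the interior of the convex hull of the g_i,
        # so the constrained likelihood is zero and the statistic is infinite.
        return math.inf
    # The Lagrange multiplier lam solves sum(g_i / (1 + lam * g_i)) = 0. That
    # sum is strictly decreasing on (-1/gmax, -1/gmin), where every
    # 1 + lam * g_i > 0, so bisection finds the root.
    lo, hi = -1.0 / gmax, -1.0 / gmin
    for _ in range(iters):
        lam = (lo + hi) / 2
        if sum(gi / (1 + lam * gi) for gi in g) > 0:
            lo = lam
        else:
            hi = lam
    lam = (lo + hi) / 2
    return 2.0 * sum(math.log1p(lam * gi) for gi in g)


def el_screen(X_cols, y, d):
    """Marginal EL screening sketch for a linear model. Assumes each column
    and y are already centered (and scaled), so g_i = x_ij * y_i has mean
    zero exactly when predictor j is marginally uncorrelated with y.
    Returns the indices of the d largest statistics and all statistics."""
    stats = [el_stat([x * yi for x, yi in zip(col, y)]) for col in X_cols]
    keep = sorted(range(len(X_cols)), key=lambda j: stats[j], reverse=True)[:d]
    return keep, stats
```

Because the statistic is self-studentized, a predictor with a large but highly variable marginal product x_ij * y_i receives a smaller statistic than one with a consistent product of the same average size. A purely magnitude-based screen would not draw this distinction.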
Protein profiling in hepatocellular carcinoma by label-free quantitative proteomics in two West African populations.
Background: Hepatocellular carcinoma (HCC) is the third most common cause of cancer-related death worldwide and is often diagnosed by measuring serum AFP, a poorly performing stand-alone biomarker. With the aim of improving on this, our study focuses on plasma proteins identified by mass spectrometry in order to investigate and validate differences between the proteomes of controls and subjects with liver cirrhosis (LC) and HCC.
Methods: Mass spectrometry analysis using liquid chromatography electrospray ionization quadrupole time-of-flight was conducted on 339 subjects using a pooled expression profiling approach. ELISA assays were performed on four significantly differentially expressed proteins to validate their expression profiles in subjects from the Gambia and a pilot group from Nigeria. The results were combined for statistical multiplexing using logistic regression analysis.
Results: Twenty-six proteins were identified as differentially expressed between the three subject groups. Direct measurements of four of them (hemopexin, alpha-1-antitrypsin, apolipoprotein A1 and complement component 3) confirmed their change in abundance in LC and HCC versus control patients. These trends were independently replicated in the pilot validation subjects from Nigeria. The statistical multiplexing of these proteins demonstrated performance comparable to or greater than ALT in identifying liver cirrhosis or carcinogenesis. This exercise also proposed preliminary cut-offs with achievable sensitivity, specificity and AUC statistics greater than reported AFP averages.
Conclusions: The validated changes in expression of these proteins have the potential for development into high-performance tests usable in the diagnosis and/or monitoring of HCC and LC patients. The identification of sustained expression trends strengthens the case for these four proteins as worthy candidates for further investigation in the context of liver disease. The statistical combinations also provide a novel analytical route for proposing definitive cut-offs and protein combinations for performance evaluation.
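The evaluation described above (combining protein measurements by logistic regression, then reading off sensitivity, specificity and AUC at candidate cut-offs) can be sketched in pure Python. The coefficients, intercept and cut-off are placeholders for values from a fitted model, since the study's fitted values are not given in the abstract. The function names are hypothetical.

```python
import math


def logistic_score(features, coefs, intercept):
    """Combined ("multiplexed") score: the probability from a logistic model.
    coefs and intercept are placeholders for values fitted elsewhere."""
    z = intercept + sum(b * x for b, x in zip(coefs, features))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for large negative z
    return e / (1.0 + e)


def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity when score >= cutoff is called positive.
    labels: 1 = case (e.g. HCC), 0 = control."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """AUC as the probability that a random case scores higher than a random
    control (Mann-Whitney form); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In a setting like the one described, each subject's four ELISA measurements would be scored with `logistic_score`. Candidate cut-offs would then be scanned with `sens_spec`, and `auc` would summarize discrimination independently of any single cut-off.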
Second trimester inflammatory and metabolic markers in women delivering preterm with and without preeclampsia.
Objective: Inflammatory and metabolic pathways are implicated in preterm birth and preeclampsia. However, studies rarely compare second trimester inflammatory and metabolic markers between women who deliver preterm with and without preeclampsia.
Study design: A sample of 129 women (43 with preeclampsia) with preterm delivery was obtained from an existing population-based birth cohort. Banked second trimester serum samples were assayed for 267 inflammatory and metabolic markers. Backwards-stepwise logistic regression models were used to calculate odds ratios.
Results: Higher 5-α-pregnan-3β,20α-diol disulfate and lower 1-linoleoylglycerophosphoethanolamine and octadecanedioate predicted increased odds of preeclampsia.
Conclusions: Among women with preterm births, those who developed preeclampsia differed with respect to metabolic markers. These findings point to potential etiologic underpinnings for preeclampsia as a precursor to preterm birth.
Stops and Stares: Street Stops, Surveillance, and Race in the New Policing
The use of proactive tactics to disrupt criminal activities, such as Terry street stops and concentrated misdemeanor arrests, is essential to the “new policing.” This model applies complex metrics, strong management, and aggressive enforcement and surveillance to focus policing on high crime risk persons and places. The tactics endemic to the “new policing” gave rise in the 1990s to popular, legal, political and social science concerns about disparate treatment of minority groups in their everyday encounters with law enforcement. Empirical evidence showed that minorities were indeed stopped and arrested more frequently than similarly situated whites, even when controlling for local social and crime conditions. In this article, we examine racial disparities under a unique configuration of the street stop prong of the “new policing”: the inclusion of non-contact observations (or surveillances) in the field interrogation (or investigative stop) activity of Boston Police Department officers. We show that Boston Police officers focus significant portions of their field investigation activity in two areas: suspected and actual gang members, and the city’s high crime areas. Minority neighborhoods experience higher levels of field interrogation and surveillance activity net of crime and other social factors. Relative to white suspects, Black suspects are more likely to be observed, interrogated, and frisked or searched, controlling for gang membership and prior arrest history. Moreover, relative to their Black counterparts, white police officers conduct higher numbers of field investigations and are more likely to frisk/search subjects of all races. We distinguish between preference-based and statistical discrimination by comparing stops by officer-suspect racial pairs. If officer activity is independent of officer race, we would infer that disproportionate stops of minorities reflect statistical discrimination.
We show instead that officers seem more likely to investigate and frisk or search a minority suspect if officer and suspect race differ. We locate these results in the broader tensions of racial profiling that pose recurring social and constitutional concerns in the “new policing.”
Circulating microRNAs associated with glycemic impairment and progression in Asian Indians.
Aims/hypothesis: Asian Indians have a high incidence of type 2 diabetes (T2D), but factors associated with glycemic progression in this population are not understood. MicroRNAs are emerging as important mediators of glucose homeostasis and have not been previously studied in Asian Indians. We examined microRNA (miR) expression associated with glycemic impairment and progression in Asian Indians from the San Francisco Bay Area. We studied 128 Asian Indians aged 45-84 years without known cardiovascular disease and not taking diabetes medications. Oral glucose tolerance tests were performed at baseline and after 2.5 years. We quantified circulating miRs from plasma collected during the enrollment visit using a flow cytometry-based assay.
Results: Glycemic impairment was present in 57 % (n = 73) at baseline. MiR-191 was positively associated with glycemic impairment (odds ratio (OR) 1.7 (95 % CI 1.2, 2.4), p < 0.01). The prevalence of glycemic progression after 2.5 years was 24 % (n = 23). Six miRs were negatively associated with glycemic progression: miR-122 (OR 0.5 (0.2, 0.8), p < 0.01), miR-15a (OR 0.6 (0.4, 0.9), p < 0.01), miR-197 (OR 0.6 (0.4, 0.9), p < 0.01), miR-320a (OR 0.6 (0.4, 0.9), p < 0.01), miR-423 (OR 0.6 (0.4, 0.9), p < 0.01), and miR-486 (OR 0.5 (0.3, 0.8), p < 0.01). Further multivariate adjustment did not attenuate these results.
Conclusions/interpretation: This is the first study to investigate circulating miRs associated with glycemic status among this high-risk ethnic group. Individual miRs were significantly associated with both glycemic impairment and glycemic progression. Further studies are needed to determine whether miRs might be useful clinical biomarkers for incident T2D in the Asian Indian population.
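Reported odds ratios with 95% confidence intervals can be checked for internal consistency by recovering the log-scale standard error and the implied Wald test. The sketch below assumes Wald intervals that are symmetric on the log-odds scale. Because published values are rounded, the recovered numbers are only approximate. The function name `or_ci_to_z` is hypothetical.

```python
import math


def or_ci_to_z(or_lo, or_hi, z_crit=1.959964):
    """Back out the odds ratio, log-scale standard error, Wald z and two-sided
    p-value from a 95% confidence interval for an odds ratio.
    Assumes the interval is a Wald interval, symmetric on the log-odds scale."""
    log_lo, log_hi = math.log(or_lo), math.log(or_hi)
    beta = (log_lo + log_hi) / 2           # log-OR at the interval's midpoint
    se = (log_hi - log_lo) / (2 * z_crit)  # half-width divided by 1.96
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail probability
    return math.exp(beta), se, z, p
```

For miR-191, `or_ci_to_z(1.2, 2.4)` recovers an odds ratio of about 1.70, a log-scale standard error of about 0.18 and a Wald z of about 3.0 (two-sided p of roughly 0.003). These values are consistent with the reported OR of 1.7 and p < 0.01.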