Robust variable screening for regression using factor profiling
Sure Independence Screening is a fast procedure for variable selection in
ultra-high dimensional regression analysis. Unfortunately, its performance
greatly deteriorates with increasing dependence among the predictors. To solve
this issue, Factor Profiled Sure Independence Screening (FPSIS) models the
correlation structure of the predictor variables, assuming that it can be
represented by a few latent factors. The correlations can then be profiled out
by projecting the data onto the orthogonal complement of the subspace spanned
by these factors. However, neither of these methods can handle the presence of
outliers in the data. Therefore, we propose a robust screening method which
uses a least trimmed squares method to estimate the latent factors and the
factor profiled variables. Variable screening is then performed on factor
profiled variables by using regression MM-estimators. Different types of
outliers in this model and their roles in variable screening are studied. Both
simulation studies and a real data analysis show that the proposed robust
procedure has good performance on clean data and outperforms the two nonrobust
methods on contaminated data.
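The screening step underlying all of the methods above can be reduced to a simple idea: rank predictors by their absolute marginal correlation with the response and keep the top few. The sketch below shows only this core of Sure Independence Screening; the factor profiling of FPSIS and the robust LTS/MM estimation proposed in the paper are not shown, and all data and parameter values are illustrative.

```python
import numpy as np

def sis_screen(X, y, d):
    """Sure Independence Screening (core step only): rank predictors by
    absolute marginal correlation with y and keep the top d."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum())
    )
    return np.argsort(-np.abs(corr))[:d]

# illustrative ultra-high-dimensional setup: p >> n, two active predictors
rng = np.random.default_rng(0)
n, p = 100, 500
X = rng.standard_normal((n, p))
y = 3 * X[:, 0] - 2 * X[:, 5] + rng.standard_normal(n)
selected = sis_screen(X, y, 2)  # with independent predictors and strong
                                # signals, columns 0 and 5 are typically kept
```

Note that this sketch assumes uncorrelated predictors; it is exactly the setting where dependence among predictors, or outliers, would break the ranking and motivate factor profiling and robust estimation.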
Empirical comparison of the performance of location estimates of fuzzy number-valued data
© Springer Nature Switzerland AG 2019. Several location measures have already been proposed in the literature in order to summarize the central tendency of a random fuzzy number in a robust way. Among them, fuzzy trimmed means and fuzzy M-estimators of location extend two successful approaches from the real-valued setting. The aim of this work is to present an empirical comparison of different location estimators, including both fuzzy trimmed means and fuzzy M-estimators, to study their differences in finite-sample behaviour.
Who is leading the campaign charts? Comparing individual popularity on old and new media
Traditionally, election campaigns are covered in the mass media with a strong focus on a limited number of top candidates. The question of this paper is whether this knowledge still holds today, when social media outlets are becoming more popular. Do candidates who dominate the traditional media also dominate the social media? Or can candidates make up for a lack of mass media coverage
by attracting attention on Twitter? This study addresses these questions by pairing Twitter data with traditional media data for the 2014 Belgian elections. Our findings show that the two platforms are indeed strongly related and that candidates with a prominent position in the traditional media are generally also the most successful on Twitter. This is not because greater popularity on Twitter translates directly into more traditional media coverage, but mainly because largely the same political elite dominates both platforms.
The multivariate least trimmed squares estimator
In this paper we introduce the least trimmed squares estimator for multivariate regression. We give three equivalent formulations of the estimator and obtain its breakdown point. A fast algorithm for its computation is proposed. We prove Fisher-consistency at the multivariate regression model with elliptically symmetric error distribution and derive the influence function. Simulations investigate the finite-sample efficiency and robustness of the estimator. To increase the efficiency of the estimator, we also consider a one-step reweighted version, as well as multivariate generalizations of one-step GM-estimators.
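The least trimmed squares principle, which this paper extends to multivariate regression, minimizes the sum of the h smallest squared residuals rather than all of them. A minimal sketch for the univariate-response special case, using the standard concentration-step ("C-step") heuristic with random elemental starts; the start sizes, iteration counts, and restriction to one response are illustrative simplifications, not the paper's algorithm.

```python
import numpy as np

def lts_regression(X, y, h, n_starts=50, seed=0):
    """Least trimmed squares (univariate response) via C-steps: from a
    random elemental start, repeatedly fit OLS on the h observations
    with the smallest squared residuals, and keep the best fit found."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Xi = np.column_stack([np.ones(n), X])  # add an intercept column
    best_beta, best_obj = None, np.inf
    for _ in range(n_starts):
        subset = rng.choice(n, size=p + 2, replace=False)  # elemental start
        for _ in range(20):  # concentration steps
            beta, *_ = np.linalg.lstsq(Xi[subset], y[subset], rcond=None)
            r2 = (y - Xi @ beta) ** 2
            subset = np.argsort(r2)[:h]
        obj = np.sort(r2)[:h].sum()  # trimmed objective for this fit
        if obj < best_obj:
            best_obj, best_beta = obj, beta
    return best_beta

# illustrative demo: 20% of the responses are shifted vertical outliers
rng = np.random.default_rng(7)
n = 100
x = rng.standard_normal(n)
y = 1.0 + 2.0 * x + 0.5 * rng.standard_normal(n)
y[:20] += 30.0  # contamination that would ruin ordinary least squares
beta = lts_regression(x[:, None], y, h=70)  # slope recovered near 2
```

Because the fit is determined by the h best-fitting points, up to n - h arbitrarily bad observations cannot drag the estimate away, which is the source of the estimator's high breakdown point.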
The early signs are that Belgium is heading for more political deadlock over who should form the next government
Belgium held federal elections in May, with negotiations currently ongoing over the makeup of the next government. As Peter Van Aelst writes, a key concern is that the country could experience political deadlock of the kind which occurred after the 2010 elections, when it took 541 days of negotiations before a government could be formed. He notes that while there appears to be more urgency than there was in 2010, the linguistic cleavage between French- and Dutch-speaking parties will still be exceptionally difficult to overcome.
Enhanced analysis of real-time PCR data by using a variable efficiency model : FPK-PCR
Current methodology in real-time polymerase chain reaction (PCR) analysis performs well provided PCR efficiency remains constant over reactions. Yet small changes in efficiency can lead to large quantification errors. Particularly in biological samples, the possible presence of inhibitors poses a challenge. We present a new approach to single-reaction efficiency calculation, called Full Process Kinetics-PCR (FPK-PCR). It combines a kinetically more realistic model with flexible adaptation to the full range of data. By reconstructing the entire chain of cycle efficiencies, rather than restricting the focus to a 'window of application', the method extracts additional information and removes a level of arbitrariness. The maximal efficiency estimates returned by the model are comparable in accuracy and precision to both the gold standard of serial dilution and other single-reaction efficiency methods. The cycle-to-cycle changes in efficiency, as described by the FPK-PCR procedure, stay considerably closer to the data than those from other S-shaped models. The assessment of individual cycle efficiencies returns more information than other single-efficiency methods. It allows in-depth interpretation of real-time PCR data and reconstruction of the fluorescence data, providing quality control. Finally, by implementing a global efficiency model, reproducibility is improved as the selection of a window of application is avoided.
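A minimal way to see what a "chain of cycle efficiencies" means: with background-corrected fluorescence readings, the efficiency at cycle c is the ratio of successive readings, F_c / F_{c-1}. The sketch below is purely illustrative — FPK-PCR fits a kinetic model rather than taking raw ratios, which are sensitive to noise — and the synthetic curve's parameters are invented for the example.

```python
import numpy as np

def cycle_efficiencies(fluorescence, background=0.0):
    """Per-cycle amplification efficiency as the ratio of successive
    background-corrected fluorescence readings (illustrative only)."""
    f = np.asarray(fluorescence, dtype=float) - background
    return f[1:] / f[:-1]

# synthetic noise-free curve: efficiency decays logistically from ~2.0
# (perfect doubling) toward 1.0 (plateau), an assumed S-shaped profile
cycles = np.arange(40)
eff = 1.0 + 1.0 / (1.0 + np.exp(0.4 * (cycles - 25)))
f = 1e-6 * np.cumprod(eff)      # fluorescence proportional to product amount
e_hat = cycle_efficiencies(f)   # recovers eff[1:] exactly on clean data
```

On real data the early-cycle ratios are dominated by camera noise, which is precisely why a fitted full-process model, rather than raw ratios, is needed to reconstruct the efficiency chain reliably.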
Simulation of between repeat variability in real time PCR reactions
While many decisions rely on real-time quantitative PCR (qPCR) analysis, few attempts have hitherto been made to quantify bounds of precision accounting for the various sources of variation involved in the measurement process. Besides the influence of more obvious factors such as camera noise and pipetting variation, changing efficiencies within and between reactions affect PCR results to a degree which is not fully recognized. Here, we develop a statistical framework that models measurement error and other sources of variation as they contribute to fluorescence observations during the amplification process and to derived parameter estimates. Evaluation of reproducibility is then based on simulations capable of generating realistic variation patterns. To this end, we start from a relatively simple statistical model for the evolution of efficiency in a single PCR reaction and introduce additional error components, one at a time, to arrive at stochastic data generation capable of simulating the variation patterns witnessed in repeated reactions (technical repeats). Most of the variation in Cq values was adequately captured by the statistical model in terms of foreseen components. To recreate the dispersion of the repeats' plateau levels while keeping the other aspects of the PCR curves within realistic bounds, additional sources of reagent consumption (side reactions) enter into the model. Once an adequate data-generating model is available, simulations can serve to evaluate various aspects of PCR under the assumptions of the model and beyond.
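A toy version of such a data-generating model can make the layering of error components concrete: each technical repeat draws its own initial efficiency, efficiency decays toward 1 as reagents are consumed, and camera noise is added to every reading. All parameter values and the simple reagent-consumption term below are illustrative assumptions, not the paper's fitted model.

```python
import numpy as np

def simulate_qpcr(n_repeats=8, n_cycles=40, x0=1e-6,
                  eff_sd=0.02, noise_sd=5e-4, seed=1):
    """Simulate technical repeats of a qPCR reaction with (a) between-
    repeat variation in starting efficiency, (b) efficiency decaying
    toward 1 as product approaches the reagent-limited plateau, and
    (c) additive camera noise on the fluorescence readings."""
    rng = np.random.default_rng(seed)
    curves = np.empty((n_repeats, n_cycles))
    for r in range(n_repeats):
        e0 = 2.0 + rng.normal(0, eff_sd)  # repeat-specific efficiency
        x, cap = x0, 1.0                  # cap models reagent exhaustion
        for c in range(n_cycles):
            eff = 1.0 + (e0 - 1.0) * max(cap - x, 0.0) / cap
            x *= eff                       # amplification step
            curves[r, c] = x + rng.normal(0, noise_sd)  # camera noise
    return curves

curves = simulate_qpcr()  # repeats share the S-shape but differ slightly
```

Additional components — pipetting variation in x0, or side reactions consuming reagents — would be introduced one at a time in the same way, which is exactly the incremental model-building strategy the abstract describes.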