155 research outputs found

    Robust Estimators are Hard to Compute

    In modern statistics, the robust estimation of the parameters of a regression hyperplane is a central problem. Robustness means that the estimate is unaffected, or only slightly affected, by outliers in the data. In this paper, it is shown that the following robust estimators are hard to compute: LMS, LQS, LTS, LTA, MCD, MVE, the constrained M-estimator, projection depth (PD), and Stahel-Donoho. In addition, a data set is presented for which the ltsReg procedure of R has a probability of less than 0.0001 of finding a correct answer. Furthermore, it is described how to design new robust estimators. --Computational statistics, complexity theory, robust statistics, algorithms, search heuristics
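
    The hardness results concern exact computation. To illustrate why exact algorithms become combinatorial, the following Python sketch computes the least trimmed squares (LTS) fit for simple linear regression (one regressor plus intercept) by brute force. It relies on the fact that, for any line, the sum of its h smallest squared residuals is at least the ordinary least-squares residual sum of those h points, so the LTS optimum equals the OLS fit of the best h-subset; the sketch therefore enumerates all C(n, h) subsets. The function names and the choice of h are illustrative assumptions. This is not the paper's construction or the ltsReg algorithm, and it is feasible only for very small n.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def ols_line(points: Sequence[Point]) -> Optional[Tuple[float, float]]:
    """Ordinary least-squares fit y = a + b*x; None if all x values are equal."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx
    return my - b * mx, b


def lts_line_bruteforce(points: List[Point], h: int) -> Tuple[float, float, float]:
    """Exact LTS fit by enumerating every h-subset (exponential in general).

    Returns (trimmed sum of squared residuals, intercept a, slope b).
    """
    best = None
    for subset in combinations(points, h):
        fit = ols_line(subset)
        if fit is None:
            continue  # degenerate subset: all x equal
        a, b = fit
        rss = sum((y - a - b * x) ** 2 for x, y in subset)
        if best is None or rss < best[0]:
            best = (rss, a, b)
    if best is None:
        raise ValueError("no h-subset with distinct x values")
    return best
```

    With six points on y = 2x + 1 and two gross outliers, n = 8 and h = 6 require only 28 subsets and recover the clean line; for realistic n the number of subsets explodes, which is why practical procedures rely on heuristics.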

    Modified repeated median filters

    We discuss moving window techniques for fast extraction of a signal comprising monotonic trends and abrupt shifts from a noisy time series with irrelevant spikes. Running medians remove spikes and preserve shifts, but they deteriorate in trend periods. Modified trimmed mean filters use a robust scale estimate such as the median absolute deviation about the median (MAD) to select an adaptive amount of trimming. Application of robust regression, particularly of the repeated median, has been suggested for improving upon the median in trend periods. We combine these ideas and construct modified filters based on the repeated median offering better shift preservation. All these filters are compared w.r.t. fundamental analytical properties and in basic data situations. An algorithm for the update of the MAD running in time O(log n) for window width n is presented as well. --signal extraction,robust filtering,drifts,jumps,outliers,computational geometry,update algorithm
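
    For orientation, a naive modified trimmed mean filter can be sketched as follows. It recomputes the window median and the MAD from scratch at every step, which costs O(n log n) per window rather than the O(log n) MAD update presented in the paper. The trimming constant q = 2 and the function names are assumptions for illustration only.

```python
from statistics import median
from typing import List, Sequence


def mad(values: Sequence[float]) -> float:
    """Median absolute deviation about the median (unscaled)."""
    m = median(values)
    return median([abs(v - m) for v in values])


def modified_trimmed_mean_filter(series: Sequence[float], k: int, q: float = 2.0) -> List[float]:
    """Output for each centre t that has a full window of width n = 2k + 1.

    Values farther than q * MAD from the window median are trimmed and the
    remaining values are averaged. Everything is recomputed per window.
    """
    out = []
    for t in range(k, len(series) - k):
        window = series[t - k : t + k + 1]
        m = median(window)
        cutoff = q * mad(window)
        # Odd window width: the median is a data value, so `kept` is never empty.
        kept = [v for v in window if abs(v - m) <= cutoff]
        out.append(sum(kept) / len(kept))
    return out
```

    On a constant signal with a single spike, the MAD of each window is 0, so the spike is trimmed and the level is preserved; in trend periods, the median-based centre lags behind, which is the weakness that motivates repeated median based filters.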

    Repeated median and hybrid filters

    Standard median filters preserve abrupt shifts (edges) and remove impulsive noise (outliers) from a constant signal but they deteriorate in trend periods. FIR median hybrid (FMH) filters are more flexible and also preserve shifts, but they are much more vulnerable to outliers. Application of robust regression methods, in particular of the repeated median, has been suggested for removing subsequent outliers from a signal with trends. A fast algorithm for updating the repeated median in linear time using quadratic space is given in Bernholt and Fried (2003). We construct repeated median hybrid filters to combine the robustness properties of the repeated median with the edge preservation ability of FMH filters. An algorithm for updating the repeated median is presented which needs only linear space. We also investigate analytical properties of these filters and compare their performance via simulations. --Signal extraction,Drifts,Jumps,Outliers,Update algorithm
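
    The repeated median (RM) fit underlying these filters can be sketched directly from its definition: the slope is the median over i of the median over j ≠ i of the pairwise slopes, and the level is the median of the slope-corrected values. The naive sketch below costs O(n^2) per window; it is neither the linear-time update of Bernholt and Fried (2003) nor the linear-space update or hybrid filters of this paper. Function names are illustrative assumptions; time offsets -k..k are centred so that the intercept is the signal level at the window centre.

```python
from statistics import median
from typing import List, Sequence, Tuple


def repeated_median_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Siegel's repeated median fit; assumes distinct x values. Returns (level, slope)."""
    n = len(xs)
    inner_medians = []
    for i in range(n):
        slopes = [(ys[j] - ys[i]) / (xs[j] - xs[i]) for j in range(n) if j != i]
        inner_medians.append(median(slopes))
    slope = median(inner_medians)
    level = median([ys[i] - slope * xs[i] for i in range(n)])
    return level, slope


def repeated_median_filter(series: Sequence[float], k: int) -> List[float]:
    """Level estimate at each centre t with a full window of width 2k + 1."""
    offsets = list(range(-k, k + 1))  # centred time axis
    out = []
    for t in range(k, len(series) - k):
        level, _ = repeated_median_line(offsets, series[t - k : t + k + 1])
        out.append(level)
    return out
```

    On a linear trend containing one outlier, every window still returns the trend value at its centre, which a running median of the same width would not do in general.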

    Computing the Least Quartile Difference Estimator in the Plane

    A common problem in linear regression is that strongly aberrant values can heavily influence the results. The least quartile difference (LQD) regression estimator is highly robust, since it can resist up to almost 50% strongly deviating data values without becoming extremely biased. Additionally, unlike many other robust regression methods, it performs well on Gaussian data. However, the LQD is not yet widely used because of the high computational effort required by common algorithms, e.g. the subset algorithm of Rousseeuw and Leroy. For computing the LQD estimator for n data points in the plane, we propose a randomized algorithm with expected running time O(n^2 log^2 n) and an approximation algorithm with a running time of roughly O(n^2 log n). These algorithms can be expected to considerably increase the practical relevance of the LQD estimator.
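
    To make the objective concrete, the following naive sketch computes the LQD slope for simple regression, assuming the common definition: minimize the k-th smallest absolute difference of residuals over all pairs, with h = floor((n + p + 1) / 2), k = C(h, 2) and p = 2. Because the intercept cancels in pairwise residual differences, the objective depends only on the slope and is piecewise linear in it, so its minimum lies at a kink of some |(y_i - y_j) - β(x_i - x_j)| or at a crossing of two such curves. Checking all of them takes roughly O(n^6 log n) time, far slower than the algorithms proposed in the paper. The function names and the median-residual intercept are illustrative assumptions.

```python
from itertools import combinations
from math import comb
from statistics import median
from typing import List, Sequence, Tuple


def lqd_objective(beta: float, diffs: List[Tuple[float, float]], k: int) -> float:
    """k-th smallest |(y_i - y_j) - beta * (x_i - x_j)| over all pairs i < j."""
    return sorted(abs(dy - beta * dx) for dx, dy in diffs)[k - 1]


def lqd_line_naive(xs: Sequence[float], ys: Sequence[float], p: int = 2) -> Tuple[float, float]:
    """Exact LQD slope by checking every breakpoint; assumes distinct x values.

    Returns (intercept, slope).
    """
    n = len(xs)
    h = (n + p + 1) // 2
    k = comb(h, 2)
    diffs = [(xs[i] - xs[j], ys[i] - ys[j]) for i, j in combinations(range(n), 2)]
    candidates = set()
    # Kinks (zeros) of the individual curves |dy - beta * dx|.
    for dx, dy in diffs:
        candidates.add(dy / dx)
    # Crossings |dy1 - beta * dx1| = |dy2 - beta * dx2| (equal or opposite sign).
    for (dx1, dy1), (dx2, dy2) in combinations(diffs, 2):
        if dx1 != dx2:
            candidates.add((dy1 - dy2) / (dx1 - dx2))
        if dx1 + dx2 != 0:
            candidates.add((dy1 + dy2) / (dx1 + dx2))
    slope = min(candidates, key=lambda b: lqd_objective(b, diffs, k))
    # The intercept cancels in pairwise differences; estimating it as the
    # median residual is an assumption, not part of the LQD definition.
    intercept = median([y - slope * x for x, y in zip(xs, ys)])
    return intercept, slope
```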

    Detecting high-order interactions of single nucleotide polymorphisms using genetic programming

    Motivation: High-order interactions of single nucleotide polymorphisms (SNPs), rather than individual SNPs, are assumed to be responsible for complex diseases such as cancer. Therefore, one of the major goals of genetic association studies concerned with such genotype data is the identification of these high-order interactions. This search is further impeded by the fact that these interactions are often explanatory only for a relatively small subgroup of patients. Unfortunately, most of the feature selection methods proposed in the literature fail at this task, since they can identify only individual variables or low-order interactions, or they try to find rules that are explanatory for a high percentage of the observations. In this paper, we present a procedure based on genetic programming and multi-valued logic that enables the identification of high-order interactions of categorical variables such as SNPs. This method, called GPAS (Genetic Programming for Association Studies), can be used not only for feature selection but also for discrimination. Results: In an application to the genotype data from the GENICA study, an association study concerned with sporadic breast cancer, GPAS identifies high-order interactions of SNPs that lead to a considerably increased breast cancer risk for different subsets of patients and that are not found by other feature selection methods. As an application to a subset of the HapMap data shows, GPAS is not restricted to association studies comprising several tens of SNPs but can also be employed to analyze whole-genome data.
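
    GPAS evolves logic expressions over categorical variables with genetic programming; the sketch below covers only one ingredient, namely how a single candidate interaction of multi-valued genotype variables might be represented and scored by the fraction of cases and controls it covers. The 0/1/2 genotype coding, the literal encoding, the function names, and the coverage measure are hypothetical illustrations, not GPAS's actual representation or fitness function.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical literal: (SNP index, set of allowed genotype codes 0/1/2).
Literal = Tuple[int, Set[int]]


def matches(genotype: Sequence[int], rule: List[Literal]) -> bool:
    """True if the subject satisfies every literal of the conjunction."""
    return all(genotype[snp] in allowed for snp, allowed in rule)


def rule_coverage(
    rule: List[Literal],
    cases: List[Sequence[int]],
    controls: List[Sequence[int]],
) -> Tuple[float, float]:
    """Fraction of cases and fraction of controls covered by the rule."""
    hit_cases = sum(matches(g, rule) for g in cases)
    hit_controls = sum(matches(g, rule) for g in controls)
    return hit_cases / len(cases), hit_controls / len(controls)
```

    A rule such as "SNP 0 has genotype 2 AND SNP 1 has genotype 1 or 2" may cover only a subgroup of cases while covering almost no controls, which is the kind of interaction the abstract describes as relevant for a small subgroup of patients.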

    Constrained Minkowski Sums: A Geometric Framework for Solving Interval Problems in Computational Biology Efficiently

    In this paper, we introduce the notion of a constrained Minkowski sum: for two (finite) point sets $P, Q \subseteq \mathbb{R}^2$ and a set of $k$ inequalities $Ax \geq b$, it is defined as the point set $(P \oplus Q)_{Ax \geq b} = \{x = p + q \mid p \in P, q \in Q, Ax \geq b\}$. We show that typical interval problems from computational biology can be solved by computing a set containing the vertices of the convex hull of an appropriately constrained Minkowski sum. We provide an algorithm for computing such a set with running time $O(N \log N)$, where $N = |P| + |Q|$, if $k$ is fixed. For the special case $(P \oplus Q)_{x_1 \geq \beta}$, where $P$ and $Q$ consist of points with integer $x_1$-coordinates whose absolute values are bounded by $O(N)$, we even achieve a linear running time $O(N)$. We thereby obtain a linear running time for many interval problems from the literature and improve upon the best known running times for some of them. The main advantage of the presented approach is that it provides a general framework within which a broad variety of interval problems can be modeled and solved.
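
    The definition can be checked with a brute-force sketch: form all |P|·|Q| sums, keep those satisfying every inequality of Ax ≥ b, and take their convex hull (here with Andrew's monotone chain). This takes O(|P||Q| log(|P||Q|)) time and is only a reference implementation of the definition, not the paper's O(N log N) algorithm. Function names and the row-wise encoding of A are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def constrained_minkowski_sum(
    P: Sequence[Point], Q: Sequence[Point], A: Sequence[Tuple[float, float]], b: Sequence[float]
) -> List[Point]:
    """All p + q (p in P, q in Q) that satisfy every row inequality a . x >= b_i."""
    out = []
    for px, py in P:
        for qx, qy in Q:
            x0, x1 = px + qx, py + qy
            if all(a0 * x0 + a1 * x1 >= bi for (a0, a1), bi in zip(A, b)):
                out.append((x0, x1))
    return out


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, c: Point) -> float:
        return (a[0] - o[0]) * (c[1] - o[1]) - (a[1] - o[1]) * (c[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # drop duplicated endpoints
```

    For example, with the single constraint $x_1 \geq 1$ (A = [(1, 0)], b = [1]), the sums falling left of the line are discarded before the hull is computed.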

    Epistemic Beliefs in Science – A Systematic Integration of Evidence From Multiple Studies

    Recent research has integrated developmental and dimensional perspectives on epistemic beliefs by implementing an approach in which profiles of learners' epistemic beliefs are modeled across multiple dimensions. Variability in study characteristics has impeded the comparison of profiles of epistemic beliefs and their relations with external variables across studies. We examined this comparability by integrating data on epistemic beliefs about the source, certainty, development, and justification of knowledge in science from six studies comprising N = 10,932 German students from elementary to upper secondary school. Applying latent profile analyses to these data, we found that profiles of epistemic beliefs that were previously conceptualized were robust across multiple samples. We found indications that profiles of epistemic beliefs homogenize over the course of students' education, are related to school tracking, and demonstrate robust relations with students' personal characteristics and socioeconomic background. We discuss implications for the theory, assessment, and education of epistemic beliefs. © 2022, The Author(s)

    MOOCs in higher education – Motives and expectations of university teachers and students

    University teachers play an important role in the discourse on Massive Open Online Courses (MOOCs) in higher education, as potential users and developers. How well MOOCs match students' expectations is also crucial for embedding them broadly in teaching. A questionnaire study with 445 university teachers and 1,644 students from Schleswig-Holstein shows that teachers with and without MOOC experience have similar motivational structures and attitudes toward digital educational offerings. Unlike students, teachers favor collaborative, seminar-like formats. For students, both clear structures and freedom of choice are important.

    Appendix to the article: MOOCs in higher education – Motives and expectations of university teachers and students

    Appendix to the article "MOOCs in higher education – Motives and expectations of university teachers and students".