41 research outputs found

Robustness concepts for sliced inverse regression

Author: Genschel Ulrike
Publication venue
Publication date: 18/07/2005
Field of study

A typical difficulty in nonparametric regression with a large number of regressor variables is the so-called curse of dimensionality: as the dimension of the regressor space increases, more data are needed to fill the space densely enough to estimate the underlying regression function accurately. As a remedy, various dimension reduction procedures, such as SIR, SIR II (Li (1991)), SAVE (Cook and Weisberg (1991), Cook (2000)), and MAVE (Xia et al. (2002)), have been proposed for identifying an appropriate, smaller subspace of the original regressor space before fitting the regression function. Because the estimation of a regression curve or link function ultimately relies on correctly identifying the linear combinations that span the dimension reduction subspace, it is important to understand the robustness properties of a dimension reduction procedure, that is, how sensitive the procedure and its subspace estimate are to data contamination. This thesis investigates in detail the robustness properties of the dimension reduction procedure SIR (Li (1991)). In particular, we emphasize the finite-sample behavior of SIR under data contamination, considering various types of contamination (i.e., directions of contamination) that may produce a “worst case” subspace estimate. We demonstrate that the contamination scenarios that produce bad subspace estimates in SIR depend on the covariance structure of the regressor variables as well as on the dimension K of the final dimension reduction subspace. We also show that the type of data contamination that causes SIR to yield an erroneous subspace estimate can change depending on whether the covariance of the regressors is known.
Initial efforts to define a breakdown point concept for dimension reduction procedures in the finite-sample case go back to the dissertation of Hilker (1997) and involved canonical correlations as a “distance measure” between estimated and true regression subspaces (cf. Hilker (1997); Becker (2001); Gather, Hilker and Becker (2002)). Hilker's work stipulated that breakdown occurs if one basis vector of an estimated subspace is orthogonal to the true subspace. However, this formulation of breakdown in dimension reduction has some drawbacks. For one, it is arguably worse to estimate and select the entire orthogonal complement of the true regression subspace, so the previous concept of breakdown may not be adequate. Another problem is that breakdown classically involves an underlying metric in its definition, but canonical correlations as a measure of “closeness” between spaces do not constitute a metric. The dissertation develops an alternative definition of breakdown in dimension reduction in the finite-sample case and investigates an upper bound for the breakdown point in this situation. This formulation uses a metric based on the Frobenius norm to measure the distance between subspaces and defines breakdown under data contamination as the case in which the distance between the estimated regression subspace and the true subspace is maximal under the metric. Because a subspace is characterized by its projection matrix, a suitable metric between subspaces is obtained by applying a matrix norm to the difference of two projection matrices. This gives a geometrically meaningful definition of the finite-sample breakdown point for methods such as SIR. The thesis also contains a simulation study that numerically supports our theoretical findings.

A well-known phenomenon in the estimation of nonparametric regression models is the so-called curse of dimensionality. It states that as the number of explanatory variables, i.e., the dimension of the regressor space, increases, the amount of data needed to estimate the underlying model adequately grows exponentially. To circumvent this problem, there are dimension reduction procedures that aim at a substantial reduction of the dimension of the regressor space. Examples of such procedures are SIR, SIR II (Li, 1991), SAVE (Cook and Weisberg (1991), Cook (2000)), and MAVE (Xia et al. (2002)), which estimate a subspace of the original regressor space called the e.d.r. space. Correct identification of this subspace is consequently decisive for the subsequent fitting of the regression model, and knowledge of how sensitive such dimension reduction procedures are to data contamination is therefore of particular interest. The central question of this dissertation is a detailed analysis of the robustness properties of the dimension reduction procedure SIR (Li, 1991). Particular attention is paid to the behavior of the procedure in the finite-sample case under data contamination. The aim of the work is to show which kind of data contamination causes a so-called “worst case” estimate of the e.d.r. space. It turns out that knowledge of both the covariance structure of the regressor vector and the dimension K of the e.d.r. space matters for the estimation. The thesis shows that the direction in which the data must be contaminated to obtain a “worst case” estimate depends decisively on whether the covariance matrix of the regressor vector is known or unknown.

Furthermore, initial results on a suitable definition of breakdown behavior in the finite-sample case from the dissertation of Hilker (1997) are analyzed and extended to the multidimensional case. It turned out that the canonical correlation distance measure used by Hilker, as well as the breakdown point definition he introduced, are no longer suitable for the extension to the multidimensional case. An alternative breakdown point definition for the finite-sample case is therefore proposed, based on a metric suitable for subspaces. The results obtained in the dissertation are supported by a simulation study.
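
The projection-matrix distance described in the abstract can be illustrated with a minimal sketch. This is not code from the thesis: the function names are hypothetical, the orthonormal basis is built with plain Gram-Schmidt, and the distance is taken as the unnormalized Frobenius norm of the difference of the two projection matrices, which is an assumption about the exact form of the metric.

```python
import math

def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: return an orthonormal basis of span(vectors)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > tol:  # skip vectors that are linearly dependent
            basis.append([wi / norm for wi in w])
    return basis

def projection_matrix(vectors):
    """Orthogonal projection matrix onto span(vectors)."""
    basis = orthonormal_basis(vectors)
    p = len(vectors[0])
    return [[sum(b[i] * b[j] for b in basis) for j in range(p)]
            for i in range(p)]

def subspace_distance(span_a, span_b):
    """Frobenius norm of the difference of the two projection matrices."""
    pa, pb = projection_matrix(span_a), projection_matrix(span_b)
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(pa, pb)
                         for x, y in zip(row_a, row_b)))

# Two orthogonal directions in R^3 versus two bases of the same plane.
d_orthogonal = subspace_distance([[1, 0, 0]], [[0, 1, 0]])
d_same = subspace_distance([[1, 0, 0], [0, 1, 0]], [[1, 1, 0], [1, -1, 0]])
```

For two K-dimensional subspaces the squared distance equals 2K minus twice the trace of the product of the projection matrices, so under this unnormalized form the largest possible value is the square root of 2K, reached when the subspaces are orthogonal. The distance depends only on the subspaces, not on the bases chosen to describe them.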

Eldorado - Ressourcen aus und für Lehre, Studium und Forschung

A Robust Approach to Automatically Locating Grooves in 3D Bullet Land Scans

Author: Genschel Ulrike
Hofmann Heike
Rice Kiegan
Publication venue: Iowa State University Digital Repository
Publication date: 30/12/2019
Field of study

Land engraved areas (LEAs) provide evidence to address the same source–different source problem in forensic firearms examination. Collecting 3D images of bullet LEAs requires capturing portions of the neighboring groove engraved areas (GEAs). Analyzing LEA and GEA data separately is imperative for the accuracy of automated comparison methods such as the one developed by Hare et al. (Ann Appl Stat 2017;11, 2332). Existing standard statistical modeling techniques often fail to adequately separate LEA and GEA data due to the atypical structure of 3D bullet data. We developed a method for automated removal of GEA data based on robust locally weighted regression (LOESS). This automated method was tested on high‐resolution 3D scans of LEAs from two bullet test sets with a total of 622 LEA scans. Our robust LOESS method outperforms a previously proposed “rollapply” method. We conclude that our method is a major improvement upon rollapply, but that further validation is needed before the method can be applied in a fully automated fashion.

Digital Repository @ Iowa State University (ISU)

rotations: An R Package for SO(3) Data

Author: Bryan Stanfill
Heike Hofmann
Ulrike Genschel
Publication venue
Publication date: 11/04/2020
Field of study

In this article we introduce the rotations package, which provides users with the ability to simulate, analyze and visualize three-dimensional rotation data. More specifically, it includes four commonly used distributions from which to simulate data, four estimators of the central orientation, six confidence region estimation procedures and two approaches to visualizing rotation data. All of these features are available for two different parameterizations of rotations: three-by-three matrices and quaternions. In addition, two datasets are included that illustrate the use of rotation data in practice.
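
To make the two parameterizations concrete, the sketch below converts a quaternion into the equivalent three-by-three rotation matrix. It is written in Python for illustration and does not use or mirror the API of the rotations R package; the function name and the (w, x, y, z) component order are assumptions.

```python
import math

def quaternion_to_matrix(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix.

    The quaternion is normalized first, so any nonzero input works.
    """
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

# A rotation by 90 degrees about the z-axis: (cos 45°, 0, 0, sin 45°).
half = math.radians(90) / 2
rot_z = quaternion_to_matrix((math.cos(half), 0.0, 0.0, math.sin(half)))
```

The quaternions q and -q describe the same rotation, so both map to the same matrix; the matrix form is unique, while the quaternion form is more compact.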

CiteSeerX

A Permutation Test for Correlated Errors in Adjacent Questionnaire Items

Author: Genschel Ulrike
Hildreth Laura A.
Lesser Virginia
Lorenz Frederick
Publication venue: 'Informa UK Limited'
Publication date
Field of study

Response patterns are important to survey researchers because of the insight they provide into the thought processes respondents use to answer survey questions. In this paper we propose the use of structural equation modeling to examine response patterns and develop a permutation test to quantify the likelihood of observing a specific response pattern. Of interest is a response pattern in which the response to the current item is conditioned on the respondent’s answer to the immediately preceding item. This pattern manifests itself in the error structure of the survey items as larger error correlations for adjacent items than for non-adjacent items. We illustrate the proposed method using data from the 2002 Oregon Survey of Roads and Highways and report SAS code that can be easily modified to examine other response patterns of interest.
Keywords: permutation test, general-specific questions, correlated errors, response patterns in survey
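
The idea of testing whether adjacent items have larger error correlations can be sketched with a simplified stand-in. The paper's test works within a structural equation model and is reported as SAS code; the Python sketch below instead assumes an error correlation matrix is already given, uses a hypothetical statistic (mean adjacent correlation minus mean non-adjacent correlation), and enumerates every item ordering, which is only feasible for a handful of items.

```python
from itertools import combinations, permutations

def adjacency_statistic(corr, order):
    """Mean correlation of adjacent item pairs minus that of all other pairs.

    `order` is the questionnaire order of the item indices.
    """
    adjacent = {frozenset(pair) for pair in zip(order, order[1:])}
    adj, non_adj = [], []
    for i, j in combinations(range(len(order)), 2):
        (adj if frozenset((i, j)) in adjacent else non_adj).append(corr[i][j])
    return sum(adj) / len(adj) - sum(non_adj) / len(non_adj)

def exact_permutation_p_value(corr):
    """Share of item orderings whose statistic is at least the observed one."""
    n = len(corr)  # needs at least 3 items so non-adjacent pairs exist
    observed = adjacency_statistic(corr, tuple(range(n)))
    stats = [adjacency_statistic(corr, perm) for perm in permutations(range(n))]
    extreme = sum(s >= observed - 1e-12 for s in stats)
    return observed, extreme / len(stats)

# Made-up example: 5 items, correlation 0.5 for neighbors and 0.1 otherwise.
n_items = 5
corr = [[1.0 if i == j else (0.5 if abs(i - j) == 1 else 0.1)
         for j in range(n_items)] for i in range(n_items)]
observed, p_value = exact_permutation_p_value(corr)
```

In this made-up example only the original ordering and its reverse reach the observed statistic, so the p-value is 2 out of 120 orderings. With more items, a random sample of orderings would replace the full enumeration.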

ScholarsArchive@OSU

The Power in Groups: Using Cluster Analysis to Critically Quantify Women’s STEM Enrollment

Author: Gansemer-Topf Ann M.
Genschel Ulrike
Nguyen Xuan Hien
Sourwine Jasmine
Wang Yuchen
Publication venue: 'IntechOpen'
Publication date: 07/04/2022
Field of study

Despite efforts to close the gender gap in science, technology, engineering, and math (STEM), disparities still exist, especially in math-intensive STEM (MISTEM) majors. Females and males receive similar academic preparation and, overall, perform similarly, yet females continue to enroll in STEM majors less frequently than males. In examining academic preparation, most research considers performance measures individually, ignoring the possible interrelationships between these measures. We address this problem by using hierarchical agglomerative clustering, a statistical technique that identifies groups (i.e., clusters) of students who are similar on multiple factors. We first applied this technique to readily available institutional data to determine whether we could identify distinct groups. The results showed that it was possible to identify nine unique groups. We then examined differences in STEM enrollment by group and by gender. We found that the proportion of females differed by group, and the gap between males and females also varied by group. Overall, males enrolled in STEM at a higher proportion than females and did so regardless of the strength of their academic preparation. Our results provide a novel yet feasible approach to examining gender differences in STEM enrollment in postsecondary education.
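
Hierarchical agglomerative clustering, as named in the abstract, can be sketched in a few lines. The abstract does not state the linkage rule, distance, or variables used, so this sketch assumes average linkage with Euclidean distance; the function name and the toy data are made up for illustration.

```python
import math

def average_linkage_clusters(points, n_clusters):
    """Merge the two closest clusters until `n_clusters` remain.

    Closeness is the average Euclidean distance between members
    (average linkage). Returns clusters as lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        a, b = min(
            ((x, y) for x in range(len(clusters))
             for y in range(x + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return clusters

# Toy data: two made-up preparation measures per student.
students = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
groups = average_linkage_clusters(students, n_clusters=2)
```

Each student starts as a separate cluster and the closest pair of clusters is merged at every step, so stopping at a chosen number of clusters yields the groups. In practice the measures would usually be standardized first so that no single variable dominates the distance.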

IntechOpen

Digital Repository @ Iowa State University (ISU)