61 research outputs found

    Robust regularized singular value decomposition with application to mortality data

    We develop a robust regularized singular value decomposition (RobRSVD) method for analyzing two-way functional data. The research is motivated by the application of modeling human mortality as a smooth two-way function of age group and year. The RobRSVD is formulated as a penalized loss minimization problem in which a robust loss function measures the reconstruction error of a low-rank matrix approximation of the data, and an appropriately defined two-way roughness penalty function ensures smoothness along each of the two functional domains. By viewing the minimization problem as two conditional regularized robust regressions, we develop a fast iteratively reweighted least squares algorithm to implement the method. Our implementation naturally incorporates missing values. Furthermore, our formulation allows rigorous derivation of leave-one-row/column-out cross-validation and generalized cross-validation criteria, which enable computationally efficient data-driven penalty parameter selection. The advantages of the new robust method over nonrobust ones are shown via extensive simulation studies and the mortality rate application. Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/13-AOAS649.
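    The idea of treating the fit as two conditional robust regressions can be illustrated with a minimal sketch. It is not the authors' RobRSVD: it omits the two-way roughness penalty and the cross-validation criteria, fits only a rank-one term, and uses Huber weights with a MAD scale as an assumed choice of robust loss. The names `huber_weights` and `robust_rank_one` are hypothetical. Missing cells (NaN) simply receive zero weight.

```python
import numpy as np

def huber_weights(residuals, k=1.345):
    """Huber IRLS weights; the scale is a MAD-type estimate (an assumed choice)."""
    scale = np.median(np.abs(residuals)) / 0.6745 + 1e-12
    r = np.abs(residuals) / scale
    return np.where(r <= k, 1.0, k / np.maximum(r, 1e-12))

def robust_rank_one(X, n_iter=50, k=1.345):
    """Robust rank-one fit X ~ outer(u, v) by alternating weighted least squares.

    No smoothness penalty is applied here; NaN entries are treated as missing.
    """
    mask = ~np.isnan(X)
    X0 = np.where(mask, X, 0.0)
    # Start from the ordinary (nonrobust) leading singular pair.
    U, s, Vt = np.linalg.svd(X0, full_matrices=False)
    u, v = U[:, 0] * s[0], Vt[0].copy()
    for _ in range(n_iter):
        R = X0 - np.outer(u, v)
        W = np.zeros_like(X0)
        W[mask] = huber_weights(R[mask], k)      # missing cells keep weight 0
        # Regression of each row on v, given the current weights.
        u = (W * X0) @ v / (W @ v**2 + 1e-12)
        # Regression of each column on the updated u.
        v = (W * X0).T @ u / (W.T @ u**2 + 1e-12)
        # Keep v at unit length so the scale lives in u.
        norm = np.linalg.norm(v)
        u, v = u * norm, v / norm
    return u, v
```

    Calling `u, v = robust_rank_one(X)` on a matrix with a few gross outliers should give a reconstruction `np.outer(u, v)` that is much less distorted than the leading term of an ordinary SVD.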

    Functional singular value decomposition and multi-resolution anomaly detection

    This dissertation has two major parts. The first part discusses the connections and differences between the statistical tool of Principal Component Analysis (PCA) and the related numerical method of Singular Value Decomposition (SVD), and related visualization methods. The second part proposes a Multi-Resolution Anomaly Detection (MRAD) method for time series with long range dependence (LRD). PCA is a popular method in multivariate analysis and in Functional Data Analysis (FDA). Compared to PCA, SVD is more general, because it not only provides a direct approach to calculate the principal components (PCs), but also simultaneously yields the PCAs for both the row and the column spaces. SVD has been used directly to explore and analyze data sets, and has been shown to be an insightful analysis tool in many fields. However, the connections and differences between PCA and SVD have seldom been explored from a statistical viewpoint. Here we explore them, and extend the usual SVD method to variations including different centerings based on various types of means. A generalized scree plot is developed to provide a visual aid for selection of different centerings. Several matrix views of the SVD components are introduced to explore different features in data, including SVD surface plots, image plots, rotation movies, and curve movies. These methods visualize both column and row information of a two-way matrix simultaneously, relate the matrix to relevant curves, and show local variations and interactions between columns and rows. Several toy examples are designed to compare the different types of centerings, and three real applications are used to illustrate the matrix views.
    In the field of Internet traffic anomaly detection, different types of network anomalies exist at different time scales. This motivates anomaly detection methods that effectively exploit multiscale properties. Because time series of Internet measurements exhibit long range dependence (LRD) and self-similarity (SS), the classical outlier detection methods based on short-range dependent time series may not be suitable for identifying network anomalies. Based on a time series collected at a single scale (the finest scale), we aggregate to form time series of various scales, and propose an MRAD procedure to find anomalies which appear at different time scales. We show that this MRAD method is more conservative than a typical outlier detection method based on a given scale, and has larger power on average than any single scale outlier detection method under some reasonable assumptions. The asymptotic distribution of the test statistic is developed as well. An MRAD map is developed to show candidate anomalies and the corresponding significance probabilities (p values). This method can be easily extended to be implemented in real time. Simulations and real examples are reported as well, to illustrate the usefulness of the MRAD method. Keywords: Principal Component Analysis, Functional Data Analysis, Exploratory Data Analysis, Network Intrusion Detection, Outlier detection, Level Shift, Multiscale analysis, Long Range Dependence, Multiple Comparison, p values, Time Series, false discovery rate
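    The aggregation step described above (forming coarser-scale series from the finest-scale one and screening each scale) can be sketched as follows. This is only an illustration under stated assumptions: the block sizes, the robust z-score rule, and the threshold are hypothetical stand-ins, not the MRAD test statistic or its multiple-comparison adjustment, and the names `aggregate` and `multiscale_flags` are invented for this example.

```python
import numpy as np

def aggregate(x, block):
    """Sum consecutive non-overlapping blocks; a trailing partial block is dropped."""
    x = np.asarray(x, dtype=float)
    n = (len(x) // block) * block
    return x[:n].reshape(-1, block).sum(axis=1)

def multiscale_flags(x, blocks=(1, 2, 4, 8), z=4.0):
    """For each block size, return indices of aggregated points with a large robust z-score.

    The median/MAD z-score is an assumed, simple per-scale outlier rule.
    """
    flags = {}
    for b in blocks:
        a = aggregate(x, b)
        med = np.median(a)
        mad = np.median(np.abs(a - med)) / 0.6745 + 1e-12
        flags[b] = np.flatnonzero(np.abs(a - med) / mad > z)
    return flags
```

    A modest but sustained level shift can be invisible at the finest scale, where it is small relative to point-to-point variation, yet stand out after aggregation, because the shift accumulates over a block while the fluctuations partly cancel.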

    Agreement Study Using Gesture Description Analysis

    Choosing adequate gestures for touchless interfaces is a challenging task that has a direct impact on human-computer interaction. Such gestures are commonly determined by the designer or by ad hoc, rule-based, or agreement-based methods. Previous approaches to assessing agreement grouped the gestures into equivalence classes and ignored the integral properties that are shared between them. In this work, we propose a generalized framework that inherently incorporates the gesture descriptors into the agreement analysis (GDA). In contrast to previous approaches, we represent gestures using binary description vectors and allow them to be partially similar. In this context, we introduce a new metric referred to as Soft Agreement Rate (SAR) to measure the level of agreement and provide a mathematical justification for this metric. Further, we performed computational experiments to study the behavior of SAR and demonstrate that existing agreement metrics are a special case of our approach. Our method was evaluated and tested through a guessability study conducted with a group of neurosurgeons. Nevertheless, our formulation can be applied to any other user-elicitation study. Results show that the level of agreement obtained by SAR is 2.64 times higher than that obtained with the previous metrics. Finally, we show that our approach complements the existing agreement techniques by generating an artificial lexicon based on the most agreed properties.
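    The abstract does not give the SAR formula, so the following is only a sketch of the general idea of partial similarity between binary description vectors. It assumes, hypothetically, that agreement is the mean pairwise similarity among participants' proposals and that similarity is the Jaccard index; `jaccard`, `soft_agreement`, and `hard_agreement` are illustrative names. Under that assumption, replacing the similarity with an exact-match indicator gives the fraction of participant pairs that proposed identical gestures, which is how a hard, equivalence-class style of agreement appears as a special case.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two equal-length binary vectors (1.0 if both are all zero)."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 if union == 0 else inter / union

def soft_agreement(vectors, sim=jaccard):
    """Mean pairwise similarity over all unordered pairs of proposals."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two proposals")
    return sum(sim(a, b) for a, b in pairs) / len(pairs)

def hard_agreement(vectors):
    """Same average, but a pair only counts when the two proposals are identical."""
    return soft_agreement(vectors, sim=lambda a, b: 1.0 if a == b else 0.0)
```

    For three proposals `(1, 1, 0)`, `(1, 1, 0)`, and `(1, 0, 0)`, only one of the three pairs matches exactly, while the other two pairs still share one of two active descriptors, so the soft score is higher than the hard one.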

    No-Shows to Primary Care Appointments: Subsequent Acute Care Utilization among Diabetic Patients

    Background: Patients who no-show to primary care appointments interrupt clinicians’ efforts to provide continuity of care. Prior literature reveals no-shows among diabetic patients are common. The purpose of this study is to assess whether no-shows to primary care appointments are associated with increased risk of future emergency department (ED) visits or hospital admissions among diabetics. Methods: A prospective cohort study was conducted using data from 8,787 adult diabetic patients attending outpatient clinics associated with a medical center in Indiana. The outcomes examined were hospital admissions or ED visits in the 6 months (182 days) following the patient’s last scheduled primary care appointment. The Andersen-Gill extension of the Cox proportional hazard model was used to assess risk separately for hospital admissions and ED visits. Adjustment was made for variables associated with no-show status and acute care utilization such as gender, age, race, insurance and co-morbid status. The interaction between utilization of the acute care service in the six months prior to the appointment and no-show was computed for each model. Results: The six-month rate of hospital admissions following the last scheduled primary care appointment was 0.22 (s.d. = 0.83) for no-shows and 0.14 (s.d. = 0.63) for those who attended (p < 0.0001). No-show was associated with greater risk for hospitalization only among diabetics with a hospital admission in the prior six months. Among diabetic patients with a prior hospital admission, those who no-showed were at 60% greater risk for subsequent hospital admission (HR = 1.60, CI = 1.17–2.18) than those who attended their appointment. The six-month rate of ED visits following the last scheduled primary care appointment was 0.56 (s.d. = 1.48) for no-shows and 0.38 (s.d. = 1.05) for those who attended (p < 0.0001); after adjustment for covariates, no-show status was not significantly related to subsequent ED utilization.