    Association of National Football League Fan Attendance With County-Level COVID-19 Incidence in the 2020-2021 Season

    Importance The 2020-2021 National Football League (NFL) season had some games with fans and others without. The exposed group (ie, games with fans) and the unexposed group (games without fans) could therefore be compared to better understand the association between fan attendance and local incidence of COVID-19.
    Objective To assess whether NFL games with varying degrees of in-person attendance were associated with increased COVID-19 cases in the counties where the games were held, as well as in contiguous counties, compared with games without in-person attendance, over 7-, 14-, and 21-day follow-up windows.
    Design, Setting, and Participants This cross-sectional study used data for all 32 NFL teams across the entirety of the 2020-2021 season. Separate daily time series of COVID-19 total cases and case rates were generated using 7-, 14-, and 21-day simple moving averages for every team and were plotted against the actual values to detect potential spikes (outliers) in incidence following games, for the county in which games took place, for contiguous counties, and for both combined. Outliers flagged in the period following games were recorded. Poisson exact tests evaluated differences in spike incidence between exposed and unexposed games, as well as across games with different levels of attendance. The data were analyzed between February 2021 and March 2021.
    Exposures Games with fan attendance vs games with no fan attendance, as well as the number of fans in attendance for games with fans.
    Main Outcomes and Measures The main outcome was estimation of COVID-19 cases and rates at the county and contiguous-county level at 7-, 14-, and 21-day intervals for fan-attended games and non-fan-attended games, further investigated by stratifying fan-attended games by the number of persons attending.
    Results The study included a total of 269 NFL game dates. Of these games, 117 were assigned to the exposed group (fans attended), and the remaining 152 games comprised the unexposed group (unattended). Fan attendance ranged from 748 to 31 700 persons. Fan attendance was associated with episodic spikes in COVID-19 cases and rates in the 14-day window in-county (cases: rate ratio [RR], 1.36; 95% CI, 1.00-1.87), in contiguous counties (cases: RR, 1.31; 95% CI, 1.00-1.72; rates: RR, 1.41; 95% CI, 1.13-1.76), and in the pooled counties group (cases: RR, 1.34; 95% CI, 1.01-1.79; rates: RR, 1.72; 95% CI, 1.29-2.28), as well as in the 21-day window in-county (cases: RR, 1.49; 95% CI, 1.21-1.83; rates: RR, 1.50; 95% CI, 1.26-1.78), in contiguous counties (cases: RR, 1.37; 95% CI, 1.14-1.65; rates: RR, 1.45; 95% CI, 1.24-1.71), and in the pooled counties group (cases: RR, 1.41; 95% CI, 1.11-1.79; rates: RR, 1.70; 95% CI, 1.35-2.15). Games with fewer than 5000 fans were not associated with any spikes, but counties where teams had 20 000 fans in attendance had 2.23 times the rate of spikes in COVID-19 (95% CI, 1.53 to ∞).
    Conclusions and Relevance In this cross-sectional study of the presence of fans at NFL home games during the 2020-2021 season, fan attendance was associated with increased levels of COVID-19 in the counties in which the venues are located, as well as in surrounding counties. The spikes in COVID-19 for crowds of over 20 000 people suggest that large events should be handled with extreme caution during public health emergencies in which vaccines, on-site testing, and other countermeasures are not readily available to the public.
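
    The moving-average spike-detection step described in the design can be sketched in a few lines of Python. This is an illustrative reconstruction, not the study's code: the data, the flagging rule (a day exceeding the trailing simple moving average by k trailing standard deviations), and all names are assumptions.

        import pandas as pd

        def flag_spikes(daily_cases: pd.Series, window: int = 14, k: float = 3.0) -> pd.Series:
            """Flag days whose count exceeds the trailing simple moving
            average (SMA) by more than k trailing standard deviations."""
            sma = daily_cases.rolling(window).mean().shift(1)  # trailing SMA, excluding today
            sd = daily_cases.rolling(window).std().shift(1)
            return daily_cases > sma + k * sd

        # Illustrative usage on synthetic county counts (not study data):
        cases = pd.Series([10.0, 12, 11, 13, 12, 14, 60, 15])
        print(flag_spikes(cases, window=5, k=2.0))  # flags only the 60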

    Anomaly detection and explanation in big data

    Data quality tests are used to validate the data stored in databases and data warehouses and to detect violations of syntactic and semantic constraints. Domain experts grapple with capturing all the important constraints and checking that they are satisfied. Constraints are often identified in an ad hoc manner, based on knowledge of the application domain and the needs of the stakeholders. They can exist over single or multiple attributes, as well as over records involving time series and sequences, and constraints over multiple attributes can capture both linear and non-linear relationships among them. We propose ADQuaTe, a data quality test framework that automatically (1) discovers different types of constraints from the data, (2) marks records that violate the constraints as suspicious, and (3) explains the violations. Domain knowledge is required to determine whether the suspicious records are actually faulty. The framework can incorporate feedback from domain experts to improve the accuracy of constraint discovery and anomaly detection. We instantiate ADQuaTe in two ways to detect anomalies in non-sequence and sequence data. The first instantiation (ADQuaTe2) uses an unsupervised autoencoder for constraint discovery in non-sequence data. ADQuaTe2 analyzes records in isolation to discover constraints among the attributes. We evaluate the effectiveness of ADQuaTe2 using real-world non-sequence datasets from the human health and plant diagnosis domains. We demonstrate that ADQuaTe2 can discover new constraints that were previously unspecified in existing data quality tests and can report both previously detected and new faults in the data. We also use non-sequence datasets from the UCI repository to evaluate the improvement in the accuracy of ADQuaTe2 after incorporating ground-truth knowledge and retraining the autoencoder model. The second instantiation (IDEAL) uses an unsupervised LSTM-autoencoder for constraint discovery in sequence data. IDEAL analyzes the correlations and dependencies among data records to discover constraints. We evaluate the effectiveness of IDEAL using datasets from Yahoo servers, NASA Shuttle, and the Colorado State University Energy Institute. We demonstrate that IDEAL can detect previously known anomalies in these datasets. Using mutation analysis, we show that IDEAL can detect different types of injected faults. We also demonstrate that the accuracy of the approach improves after incorporating ground-truth knowledge about the injected faults and retraining the LSTM-autoencoder model. The novelty of this research lies in the development of a domain-independent framework that effectively and efficiently discovers different types of constraints from the data, detects and explains anomalous data, and minimizes false alarms through an interactive learning process.
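
    The autoencoder-based instantiation follows the standard reconstruction-error recipe: train on (mostly normal) records, then mark records the model reconstructs poorly as suspicious. Below is a minimal sketch of that recipe using Keras; the architecture, threshold, and placeholder data are assumptions, not the ADQuaTe2 implementation.

        import numpy as np
        from tensorflow import keras

        def build_autoencoder(n_features: int) -> keras.Model:
            # The narrow bottleneck forces the model to learn the
            # inter-attribute constraints present in normal records.
            inp = keras.Input(shape=(n_features,))
            z = keras.layers.Dense(4, activation="relu")(inp)
            out = keras.layers.Dense(n_features)(z)
            model = keras.Model(inp, out)
            model.compile(optimizer="adam", loss="mse")
            return model

        X = np.random.rand(1000, 8).astype("float32")  # placeholder records
        ae = build_autoencoder(8)
        ae.fit(X, X, epochs=10, batch_size=32, verbose=0)

        # High reconstruction error means the record violates the learned
        # constraints; such records go to a domain expert for review.
        err = np.mean((X - ae.predict(X, verbose=0)) ** 2, axis=1)
        suspicious = err > np.quantile(err, 0.99)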

    Methods for event time series prediction and anomaly detection

    Event time series are sequences of events occurring in continuous time. They arise in many real-world problems and may represent, for example, posts in social media, administrations of medications to patients, or adverse events such as episodes of atrial fibrillation or earthquakes. In this work, we study and develop methods for prediction and anomaly detection on event time series. We study two general approaches. The first converts event time series to regular time series of counts via time discretization; for it, we develop methods relying on (a) nonparametric time series decomposition and (b) dynamic linear models for regular time series. The second models the events in continuous time directly; for it, we develop methods relying on point processes. For prediction, we develop a new model based on point processes that combines the advantages of existing models: it is flexible enough to capture complex dependency structures between events without sacrificing applicability in common scenarios. For anomaly detection, we develop methods that can detect new types of anomalies in continuous time and that show advantages over time discretization.
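
    A minimal sketch of the first approach's discretization step, which turns event timestamps in continuous time into a regular time series of counts (the bin width and all names are illustrative assumptions):

        import numpy as np

        def events_to_counts(event_times: np.ndarray, bin_width: float) -> np.ndarray:
            """Bin event timestamps into a regular series of counts per window."""
            n_bins = max(int(np.ceil(event_times.max() / bin_width)), 1)
            edges = np.linspace(0.0, n_bins * bin_width, n_bins + 1)
            counts, _ = np.histogram(event_times, bins=edges)
            return counts

        # e.g. medication administrations at these times (in hours), binned daily:
        times = np.array([1.5, 2.0, 26.0, 27.5, 51.0])
        print(events_to_counts(times, bin_width=24.0))  # -> [2 2 1]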

    TFAD: A Decomposition Time Series Anomaly Detection Architecture with Time-Frequency Analysis

    Time series anomaly detection is a challenging problem due to complex temporal dependencies and limited labeled data. Although a number of algorithms, both traditional and deep models, have been proposed, most focus on time-domain modeling and do not fully utilize the information in the frequency domain of the time series data. In this paper, we propose a Time-Frequency analysis based time series Anomaly Detection model, or TFAD for short, to exploit both the time and frequency domains for improved performance. In addition, we incorporate time series decomposition and data augmentation mechanisms in the designed time-frequency architecture to further improve performance and interpretability. Empirical studies on widely used benchmark datasets show that our approach obtains state-of-the-art performance in univariate and multivariate time series anomaly detection tasks. Code is provided at https://github.com/DAMO-DI-ML/CIKM22-TFAD. Comment: Accepted by the ACM International Conference on Information and Knowledge Management (CIKM 2022).
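
    The core idea, scoring a window in the frequency domain as well as the time domain, can be illustrated with a plain FFT-based feature extractor. This is a generic sketch of time-frequency featurization, not the TFAD architecture from the linked repository; all names are illustrative.

        import numpy as np

        def time_frequency_features(window: np.ndarray, n_freq: int = 16) -> np.ndarray:
            """Concatenate simple time-domain statistics with the leading
            frequency-domain magnitudes of a univariate window."""
            time_feats = np.array([window.mean(), window.std()])
            spectrum = np.abs(np.fft.rfft(window - window.mean()))
            return np.concatenate([time_feats, spectrum[:n_freq]])

        # A change in periodicity that is hard to see pointwise in the
        # time domain shows up as a shifted peak in the spectrum:
        t = np.arange(128)
        normal = np.sin(2 * np.pi * 4 * t / 128)      # 4 cycles per window
        anomalous = np.sin(2 * np.pi * 10 * t / 128)  # 10 cycles per window
        print(np.argmax(time_frequency_features(normal)[2:]))     # -> 4
        print(np.argmax(time_frequency_features(anomalous)[2:]))  # -> 10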

    Featured Anomaly Detection Methods and Applications

    Anomaly detection is a fundamental research topic that has been widely investigated. From critical industrial systems, e.g., network intrusion detection systems, to people's daily activities, e.g., mobile fraud detection, anomaly detection has become the first vital resort to protect and secure public and personal property. Although anomaly detection methods have been under consistent development over the years, the explosive growth of data volume and the continued dramatic variation of data patterns pose great challenges for anomaly detection systems and fuel demand for more intelligent anomaly detection methods with distinct characteristics to cope with various needs. To this end, this thesis starts by presenting a thorough review of existing anomaly detection strategies and methods, elaborating their advantages and disadvantages. Afterward, four distinctive anomaly detection methods, especially for time series, are proposed, each aimed at a specific need of anomaly detection under different scenarios, e.g., enhanced accuracy, interpretable results, and self-evolving models. Experiments are presented and analysed to offer a better understanding of the performance of the methods and their distinct features. More specifically, the key contributions of this thesis are as follows:
    1) Support Vector Data Description (SVDD) is investigated as a primary method for accurate anomaly detection. The applicability of SVDD to noisy time series datasets is carefully examined, and it is demonstrated that relaxing the decision boundary of SVDD consistently yields better accuracy in network time series anomaly detection. A theoretical analysis of the parameter utilised in the model is also presented to ensure the validity of relaxing the decision boundary.
    2) To support a clear explanation of detected time series anomalies, i.e., anomaly interpretation, the periodic pattern of time series data is treated as contextual information and integrated into SVDD. The resulting formulation maintains multiple discriminants, which help in distinguishing the root causes of anomalies.
    3) To further analyse a dataset for anomaly detection and interpretation, Convex Hull Data Description (CHDD) is developed to perform one-class classification together with data clustering. CHDD approximates the convex hull of a given dataset with extreme points that constitute a dictionary of data representatives. Using the dictionary, CHDD can represent and cluster all the normal data instances, so that anomaly detection comes with a degree of interpretation.
    4) Beyond better accuracy and interpretability, solutions for anomaly detection over streaming data with evolving patterns are also researched. Within the framework of Reinforcement Learning (RL), a time series anomaly detector is designed that is continually trained to cope with evolving patterns. Because the detector is trained with labeled time series, it avoids the cumbersome work of threshold setting and the uncertain definitions of anomalies in time series anomaly detection tasks.
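
    As a rough stand-in for the SVDD method of contribution 1), scikit-learn's OneClassSVM (closely related to SVDD under an RBF kernel) shows the one-class setup on sliding windows of a series. This is an illustrative sketch, not the thesis's models; the window width, nu, and the injected anomaly are assumptions.

        import numpy as np
        from sklearn.svm import OneClassSVM

        def sliding_windows(series: np.ndarray, width: int) -> np.ndarray:
            # Embed the series so each row is one window of consecutive points.
            return np.lib.stride_tricks.sliding_window_view(series, width)

        rng = np.random.default_rng(0)
        train = np.sin(np.linspace(0, 20 * np.pi, 2000)) + 0.1 * rng.standard_normal(2000)
        test = train.copy()
        test[1000:1010] += 3.0  # injected anomaly

        # nu upper-bounds the fraction of training windows treated as outliers;
        # lowering it relaxes the decision boundary, in the spirit of the
        # relaxation studied in the thesis.
        clf = OneClassSVM(kernel="rbf", nu=0.01, gamma="scale")
        clf.fit(sliding_windows(train, 32))
        labels = clf.predict(sliding_windows(test, 32))  # -1 marks anomalous windows
        print(np.where(labels == -1)[0][:5])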

    Spatiotemporal anomaly detection: streaming architecture and algorithms

    Anomaly detection is the science of identifying one or more rare or unexplainable samples or events in a dataset or data stream. The field has been extensively studied by mathematicians, statisticians, economists, engineers, and computer scientists. One open research question remains the design of distributed cloud-based architectures and algorithms that can accurately identify anomalies in previously unseen, unlabeled, streaming, multivariate spatiotemporal data. With streaming data, time is of the essence, and insights are perishable. Real-world streaming spatiotemporal data originate from many sources, including mobile phones, supervisory control and data acquisition (SCADA) enabled devices, the internet of things (IoT), distributed sensor networks, and social media. Baseline experiments are performed on four non-streaming, static multivariate anomaly detection datasets using unsupervised offline traditional machine learning (TML) and unsupervised neural network techniques. Multiple architectures, including autoencoders, generative adversarial networks, convolutional networks, and recurrent networks, are adapted for experimentation. Extensive experimentation demonstrates that neural networks produce superior detection accuracy over TML techniques. These same neural network architectures can be extended to process unlabeled spatiotemporal streams using online learning. Space and time relationships are further exploited to provide additional insights and increased anomaly detection accuracy. A novel domain-independent architecture and set of algorithms called the Spatiotemporal Anomaly Detection Environment (STADE) is formulated. STADE is based on a federated learning architecture. STADE's streaming algorithms are based on geographically unique, persistently executing neural networks trained using online stochastic gradient descent (SGD). STADE is designed to be pluggable, meaning that alternative algorithms may be substituted or combined to form an ensemble. STADE incorporates a Stream Anomaly Detector (SAD) and a Federated Anomaly Detector (FAD). The SAD executes at multiple locations on streaming data, while the FAD executes at a single server and identifies global patterns and relationships among the site anomalies. Each STADE site streams anomaly scores to the centralized FAD server for further spatiotemporal dependency analysis and logging. The FAD is based on recent advances in DNN-based federated learning. A STADE testbed is implemented to facilitate globally distributed experimentation using low-cost, commercial cloud infrastructure provided by Microsoft™. STADE testbed sites are situated in the cloud within each continent: Africa, Asia, Australia, Europe, North America, and South America. Communication occurs over the commercial internet. Three STADE case studies are investigated: the first processes commercial air traffic flows, the second processes global earthquake measurements, and the third processes social media (i.e., Twitter™) feeds. These case studies confirm that STADE is a viable architecture for the near-real-time identification of anomalies in streaming data originating from (possibly) computationally disadvantaged, geographically dispersed sites. Moreover, the addition of the FAD provides enhanced anomaly detection capability. Since STADE is domain-independent, these findings can be easily extended to additional application domains and use cases.
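
    The per-site streaming component can be illustrated with a tiny online model trained by SGD: each site scores every incoming point by its prediction error, updates its weights incrementally, and streams the scores onward. This is a generic sketch of the online-SGD pattern, not the STADE SAD/FAD code; the linear predictor and all names are assumptions.

        import numpy as np

        class OnlineAnomalyScorer:
            """One-step-ahead linear predictor trained by online SGD;
            the anomaly score of each point is its squared prediction error."""

            def __init__(self, lag: int = 8, lr: float = 0.01):
                self.w = np.zeros(lag)
                self.lag = lag
                self.lr = lr
                self.history = []

            def update(self, x: float) -> float:
                if len(self.history) < self.lag:
                    self.history.append(x)
                    return 0.0
                h = np.asarray(self.history[-self.lag:])
                err = x - float(self.w @ h)
                self.w += self.lr * err * h  # one SGD step on squared error
                self.history.append(x)
                return err * err

        # Each site would run one scorer and stream scores to a central server.
        scorer = OnlineAnomalyScorer()
        stream = np.sin(np.arange(500) / 5.0)
        stream[300] = 5.0  # injected spike
        scores = [scorer.update(float(x)) for x in stream]
        print(int(np.argmax(scores)))  # near 300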

    The 8th International Conference on Time Series and Forecasting

    The aim of ITISE 2022 is to create a friendly environment that can lead to the establishment or strengthening of scientific collaborations and exchanges among attendees. ITISE 2022 is therefore soliciting high-quality original research papers (including significant works in progress) on any aspect of time series analysis and forecasting, in order to motivate the generation and use of new knowledge, computational techniques, and methods for forecasting in a wide range of fields.