
    Robust Multivariate Autoregression for Anomaly Detection in Dynamic Product Ratings

    User-provided rating data about products and services is a key feature of websites such as Amazon, TripAdvisor, and Yelp. Since these ratings are rather static but might change over time, a temporal analysis of rating distributions provides deeper insights into the evolution of a product's quality. Given a time series of rating distributions, in this work we answer the following questions: (1) How can we detect the base behavior of users regarding a product's evaluation over time? (2) How can we detect points in time where the rating distribution differs from this base behavior, e.g., due to attacks or spontaneous changes in the product's quality? To achieve these goals, we model the base behavior of users regarding a product as a latent multivariate autoregressive process. This latent behavior is mixed with a sparse anomaly signal, which together yield the observed data. We propose an efficient algorithm for solving our objective, and we present interesting findings on various real-world datasets.
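
The decomposition described in this abstract (observed data = latent autoregressive base behavior + sparse anomaly signal) can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes the autoregressive matrix `A` is already known, uses a first-order model, and applies a simple greedy soft-thresholding filter. The names `split_base_and_anomaly`, `A`, and `lam` are hypothetical and chosen only for illustration.

```python
def mat_vec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def soft_threshold(v, lam):
    """Shrink v toward zero by lam; values within [-lam, lam] become 0."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0


def split_base_and_anomaly(observed, A, lam):
    """Greedy split of an observed series into base behavior and sparse anomalies.

    Assumption (not from the paper): the VAR(1) matrix A is given, and the
    first observation is anomaly-free.
    """
    cleaned = [list(observed[0])]
    anomalies = [[0.0] * len(observed[0])]
    for x in observed[1:]:
        # Predict the next base state from the previous cleaned state.
        pred = mat_vec(A, cleaned[-1])
        # Large residuals are attributed to the sparse anomaly signal.
        s = [soft_threshold(xi - pi, lam) for xi, pi in zip(x, pred)]
        anomalies.append(s)
        cleaned.append([xi - si for xi, si in zip(x, s)])
    return cleaned, anomalies
```

Time steps whose anomaly vector contains a nonzero entry are the candidate anomalous points; a larger `lam` makes the anomaly signal sparser. Soft-thresholding is used here because it is the standard proximal step for an L1 sparsity penalty, which is one common way to encourage a sparse signal.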

    System Support For Stream Processing In Collaborative Cloud-Edge Environment

    Stream processing is a critical technique for processing huge amounts of data in real time. Cloud computing has been used for stream processing because of its virtually unlimited computation resources. At the same time, we are entering the era of the Internet of Everything (IoE). The emerging edge computing paradigm benefits low-latency applications by leveraging computation resources in the proximity of data sources. Billions of sensors and actuators are being deployed worldwide, and huge amounts of data generated by things are immersed in our daily life. It has become essential for organizations to be able to stream and analyze data and to provide low-latency analytics on streaming data. However, cloud computing is inefficient at processing all data in a centralized environment in terms of network bandwidth cost and response latency. Although edge computing offloads computation from the cloud to the edge of the Internet, there is no data sharing and processing framework that efficiently utilizes computation resources in both the cloud and the edge. Furthermore, the heterogeneity of edge devices makes the development of collaborative cloud-edge applications more difficult. To explore and attack the challenges of stream processing systems in a collaborative cloud-edge environment, in this dissertation we design and develop a series of systems to support stream processing applications in hybrid cloud-edge analytics. Specifically, we develop a hierarchical and hybrid outlier detection model for multivariate time series streams that automatically selects the best model for different time series. We optimize one stream processing system (i.e., Spark Streaming) to reduce the end-to-end latency. To facilitate the development of collaborative cloud-edge applications, we propose and implement a new computing framework, Firework, that allows stakeholders to share and process data by leveraging both the cloud and the edge. A vision-based cloud-edge application is implemented to demonstrate the capabilities of Firework. By combining all these studies, we provide comprehensive system support for stream processing in a collaborative cloud-edge environment.

    Toward Understanding Causes of Anomaly in Dynamic Restaurant Rating

    Rating scores and text reviews are the most common features provided in online review systems to gather the opinions shared by users. Product rating distributions usually evolve dynamically over time and are potentially accompanied by unusual changes, namely anomalies, which might be caused by changes in product quality or by spamming attacks. In this preliminary study, we analyze the time series of rating score distributions using data collected from Yelp restaurants, and we apply Principal Component Analysis (PCA) to detect anomalous time points. By manually checking the corresponding review texts, we further investigate the underlying reasons leading to anomalous rating scores. The potential reasons we identified include food/service quality change, user preference, and review spam. Our study is envisioned to help business owners respond in a timely manner to unusual feedback and manage their businesses more efficiently.
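
One common way to use PCA for detecting anomalous time points is to score each time step by its reconstruction error from the leading principal component. The sketch below illustrates that idea only; the abstract does not say which PCA-based criterion the study uses. It assumes a reference window of rating distributions considered normal, keeps a single component, and finds it by power iteration. The function names and the `threshold` parameter are hypothetical.

```python
import math


def top_component(rows, iters=100):
    """Mean and first principal direction of rows, via power iteration."""
    n, dim = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[x - m for x, m in zip(r, mu)] for r in rows]
    # Arbitrary start vector, normalized to unit length.
    v = [float(j + 1) for j in range(dim)]
    scale = math.sqrt(sum(a * a for a in v))
    v = [a / scale for a in v]
    for _ in range(iters):
        # w = C v, computed as sum over rows of (r . v) r (C not formed).
        w = [0.0] * dim
        for r in centered:
            proj = sum(a * b for a, b in zip(r, v))
            for j in range(dim):
                w[j] += proj * r[j]
        norm = math.sqrt(sum(a * a for a in w))
        if norm == 0.0:
            break  # no variance along v; keep the current unit vector
        v = [a / norm for a in w]
    return mu, v


def reconstruction_error(row, mu, v):
    """Distance between row and its projection onto the line mu + c * v."""
    c = [x - m for x, m in zip(row, mu)]
    proj = sum(a * b for a, b in zip(c, v))
    resid = [a - proj * b for a, b in zip(c, v)]
    return math.sqrt(sum(a * a for a in resid))


def pca_anomalies(reference, series, threshold):
    """Indices in series whose reconstruction error exceeds threshold."""
    mu, v = top_component(reference)
    return [t for t, row in enumerate(series)
            if reconstruction_error(row, mu, v) > threshold]
```

Fitting the component on a separate reference window is a deliberate choice in this sketch: a single extreme time point included in the fit can dominate the leading component and thereby hide itself.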

    Comparing vector autoregressive (VAR) estimation with combine white noise (CWN) estimation

    The purpose of this study is to compare an existing model, the VAR model, with the new Combine White Noise (CWN) model. VAR models have not been able to model the conditional heteroscedasticity and the leverage effect exhibited by the data. Likewise, GARCH family models cannot model the leverage effect. The CWN model has proved more efficient and addresses these weaknesses. CWN has the minimum information criteria and a high log-likelihood when compared with VAR estimation. The value of the determinant of the residual covariance matrix indicates that CWN estimation is efficient. It passes Levene's test of equal variances. CWN has the minimum forecast errors, which indicates forecast accuracy. All of its outcomes outperform those of VAR by a wide margin.

    Analytic Case Study Using Unsupervised Event Detection in Multivariate Time Series Data

    Analysis of cyber-physical systems (CPS) has emerged as a critical domain for providing US Air Force and Space Force leadership with decision advantage in air, space, and cyberspace. Legacy methods have been outpaced by evolving battlespaces and global peer-level challengers. Automation provides one way to decrease the time that analysis currently takes. This thesis presents an event detection automation system (EDAS) which utilizes deep learning models, distance metrics, and static thresholding to detect events. The EDAS automation is evaluated in a two-part case study with CPS domain experts. Part 1 uses the current methods for CPS analysis, with a qualitative pre-survey, and tasks participants with annotating events in their natural setting. Part 2 asks participants to perform annotation with the assistance of EDAS's pre-annotations. Results from Part 1 and Part 2 exhibit low inter-coder agreement for both human-derived and automation-assisted event annotations. Qualitative analysis of survey results showed low trust and confidence in the event detection automation. One interpretation of the low confidence is that the low inter-coder agreement means that the humans do not share the same idea of what an annotation product should be.
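
The combination of a model, a distance metric, and a static threshold mentioned in this abstract can be sketched generically as follows. This is not the EDAS implementation: the predictions are assumed to come from some already-trained model, the Euclidean distance stands in for whichever metric the thesis uses, and the name `detect_events` and its parameters are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def detect_events(observed, predicted, threshold):
    """Return inclusive (start, end) index pairs where the distance between
    observation and model prediction stays above a fixed threshold."""
    n = min(len(observed), len(predicted))
    events = []
    start = None
    for t in range(n):
        above = euclidean(observed[t], predicted[t]) > threshold
        if above and start is None:
            start = t  # an event begins
        elif not above and start is not None:
            events.append((start, t - 1))  # the event ended at the previous step
            start = None
    if start is not None:
        events.append((start, n - 1))  # event still open at the end of the series
    return events
```

Consecutive above-threshold time steps are merged into one event so that the output resembles interval annotations rather than isolated points.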

    Improvement of Vector Autoregression (VAR) estimation using Combine White Noise (CWN) technique

    Previous studies revealed that the Exponential Generalized Autoregressive Conditional Heteroscedastic (EGARCH) model outperformed Vector Autoregression (VAR) when data exhibit heteroscedasticity. However, EGARCH estimation is not efficient when the data have a leverage effect. Therefore, in this study the weaknesses of VAR and EGARCH were modelled using Combine White Noise (CWN). The CWN model was developed by integrating the white noise of VAR with EGARCH using Bayesian Model Averaging (BMA) to improve VAR estimation. First, the standardized residuals of EGARCH errors (heteroscedastic variance) were decomposed into equal variances and defined as white noise series. Next, this series was transformed into the CWN model through BMA. The CWN was validated in a comparison study based on simulation and on real Gross Domestic Product (GDP) data sets from four countries. The data were simulated by incorporating three sample sizes with low, moderate and high values of leverage and skewness. The CWN model was compared with three existing models (VAR, EGARCH and Moving Average (MA)). Standard error, log-likelihood, information criteria and forecast error measures were used to evaluate the performance of the models. The simulation findings showed that CWN outperformed the three models when using a sample size of 200 with high leverage and moderate skewness. Similar results were obtained for the real data sets, where CWN outperformed the three models with high leverage and moderate skewness using the France GDP data. The CWN also outperformed the three models on the GDP data sets of the other three countries. The CWN was the most accurate model, at about 70 percent, as compared with the VAR, EGARCH and MA models. These simulated and real data findings indicate that CWN is more accurate and provides a better alternative for modelling heteroscedastic data with a leverage effect.