
    Detecting Flow Anomalies in Distributed Systems

    Deep within the networks of distributed systems, one often finds anomalies that degrade efficiency and performance. These anomalies are difficult to detect because distributed systems may not have sufficient sensors to monitor the flow of traffic between the interconnected nodes of the network. Without early detection and correction, these anomalies may worsen over time and could eventually cause disastrous outcomes in the system. Using only coarse-grained information from the two end points of network flows, we propose a network transmission model and a localization algorithm to detect the location of anomalies within distributed systems and rank them using a proposed metric. We evaluate our approach on passengers' records from an urbanized city's public transportation system and correlate our findings with passengers' postings on social media microblogs. Our experiments show that the metric derived using our localization algorithm gives a better ranking of anomalies than standard deviation measures from statistical models. Our case studies also demonstrate that transportation events reported in social media microblogs match the locations of our detected anomalies, suggesting that our algorithm performs well in locating anomalies within distributed systems.

    Doctor of Philosophy

    Recent advancements in sensing technology have generated an enormous amount of data in various fields and industries, including transportation. Public transportation systems, as a critical component of the transportation ecosystem, have also experienced substantial data growth. The availability of big data not only improves traditional transit service monitoring, but also enables high-resolution transit performance analysis that guides decision making. However, the potential of these datasets has not yet been fully explored because of several challenges, such as residual noise in data records and limited computational power. This dissertation addresses three of those challenges: how to incorporate and analyze data that are missing due to a lack of electronic footage, how to enable high-resolution performance measurements that require extensive computation, and how to interpret the high-resolution results. The first challenge was addressed in a quest to recover missing data on the different fare payment methods without electronic footage, and to measure their impact (among other factors) on bus Dwell Time (DT). Integrating information from multiple data sources, a combined approach of optimization and regression analysis was developed that offers a data-driven evaluation of existing fare payment structures and their individual effects on DT. Using the 35M bus rapid transit line operated by the Utah Transit Authority as a case study, the method demonstrates robustness and strong predictive power in DT modeling. We then introduce a new algorithm that is computationally elegant and mathematically efficient to address the second challenge of run-time reduction. An open-source toolbox written in C++ was developed to implement the algorithm. The toolbox was tested on the City of St. George's transit network to showcase dynamic transit accessibility analysis. The experimental evidence shows a significant reduction in computational time.
    To address the third challenge of interpreting the high-resolution transit accessibility results, the algorithm from the previous study was applied to the Salt Lake City network to compute travel times at multiple departure times throughout the day. A series of indicators that are intuitive to interpret were developed to determine the varying causes of poor transit accessibility and to identify areas with immediate needs for service improvements. This dissertation demonstrates that utilizing newly available datasets not only improves the resolution and accuracy of transit service assessments, but also enables a comprehensive study of the various factors (stop characteristics) that affect transit service efficiency, as well as the quantification of critical decision-making indices that reveal transit service effectiveness, neither of which was possible before. Findings from this research are expected to lead to methodological advancements in data-driven approaches to public transit studies, and to help transform the transit management mindset into a model of data-driven, sensing, and smart urban systems.

    Detecting Outliers in Data with Correlated Measures

    Advances in sensor technology have enabled the collection of large-scale datasets. Such datasets can be extremely noisy and often contain a significant number of outliers that result from sensor malfunction or human operation faults. To utilize such data in real-world applications, it is critical to detect outliers so that models built from these datasets are not skewed by them. In this paper, we propose a new outlier detection method that utilizes the correlations in the data (e.g., taxi trip distance vs. trip time). Unlike existing outlier detection methods, we build a robust regression model that explicitly models the outliers and detects them simultaneously with the model fitting. We validate our approach on real-world datasets against methods specifically designed for each dataset as well as state-of-the-art outlier detectors. Our outlier detection method achieves better performance, demonstrating the robustness and generality of our method. Finally, we report interesting case studies on some outliers that result from atypical events. Comment: 10 pages