    Routine pattern discovery and anomaly detection in individual travel behavior

    Discovering patterns and detecting anomalies in individual travel behavior is a crucial problem in both research and practice. In this paper, we address this problem by building a probabilistic framework to model individual spatiotemporal travel behavior data (e.g., trip records and trajectory data). We develop a two-dimensional latent Dirichlet allocation (LDA) model to characterize the generative mechanism of the spatiotemporal trip records of each traveler. This model introduces two separate factor matrices for the spatial dimension and the temporal dimension, respectively, and uses a two-dimensional core structure at the individual level to effectively model the joint interactions and complex dependencies. The model can efficiently summarize travel behavior patterns on both spatial and temporal dimensions from very sparse trip sequences in an unsupervised way. In this way, complex travel behavior can be modeled as a mixture of representative and interpretable spatiotemporal patterns. By applying the trained model to future/unseen spatiotemporal records of a traveler, we can detect her behavior anomalies by scoring those observations using perplexity. We demonstrate the effectiveness of the proposed modeling framework on a real-world license plate recognition (LPR) data set. The results confirm the advantage of statistical learning methods in modeling sparse individual travel behavior data. Pattern discovery and anomaly detection applications of this type can provide useful insights for traffic monitoring, law enforcement, and individual travel behavior profiling.
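
    The perplexity scoring described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the trained model yields, for one traveler, a core matrix over (temporal topic, spatial topic) pairs plus one factor matrix per dimension, and that a record's probability is the core-weighted product of the two factors. All names and toy numbers are hypothetical.

```python
import numpy as np

def joint_distribution(core, time_topics, space_topics):
    """Assumed form: p(t, s) = sum_{j,k} core[j, k] * time_topics[j, t] * space_topics[k, s]."""
    return time_topics.T @ core @ space_topics  # shape (n_time_slots, n_locations)

def perplexity(core, time_topics, space_topics, records, eps=1e-12):
    """Perplexity of (time_slot, location) records; higher means less expected."""
    joint = joint_distribution(core, time_topics, space_topics)
    rec = np.asarray(records)
    probs = joint[rec[:, 0], rec[:, 1]]
    return float(np.exp(-np.mean(np.log(probs + eps))))

# Toy traveler (hypothetical): 2 temporal topics x 3 time slots,
# 2 spatial topics x 3 locations. Each row is a probability distribution.
time_topics = np.array([[0.8, 0.1, 0.1],
                        [0.1, 0.1, 0.8]])
space_topics = np.array([[0.90, 0.05, 0.05],
                         [0.05, 0.05, 0.90]])
# Core: this traveler pairs temporal topic 0 with spatial topic 0, and 1 with 1.
core = np.array([[0.5, 0.0],
                 [0.0, 0.5]])

routine = [(0, 0), (2, 2)]   # usual time/place combinations
unusual = [(0, 2), (2, 0)]   # usual places, but at swapped times

routine_score = perplexity(core, time_topics, space_topics, routine)
unusual_score = perplexity(core, time_topics, space_topics, unusual)
```

    Under these assumptions the swapped records receive a higher perplexity than the routine ones, which is the signal an anomaly threshold would act on. How the threshold is chosen is not stated in the abstract.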

    Methods for Large Scale Hydraulic Fracture Monitoring

    In this paper we propose computationally efficient and robust methods for estimating the moment tensor and location of micro-seismic event(s) for large search volumes. Our contribution is two-fold. First, we propose a novel joint-complexity measure, namely the sum of nuclear norms, which, while imposing sparsity on the number of fractures (locations) over a large spatial volume, also captures the rank-1 nature of the induced wavefield pattern. This wavefield pattern is modeled as the outer product of the source signature with the amplitude pattern across the receivers from a seismic source. A rank-1 factorization of the estimated wavefield pattern at each location can therefore be used to estimate the seismic moment tensor using knowledge of the array geometry. In contrast to existing work, this approach allows us to drop any other assumption on the source signature. Second, we exploit the recently proposed first-order incremental projection algorithms for a fast and efficient implementation of the resulting optimization problem and develop a hybrid stochastic and deterministic algorithm, which results in significant computational savings. Comment: arXiv admin note: text overlap with arXiv:1305.006
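
    The two ingredients named in this abstract, the sum of nuclear norms and the rank-1 factorization of a wavefield pattern, can be illustrated with NumPy. This is a sketch under stated assumptions: each candidate location is represented by a (time samples x receivers) matrix, and the function names and toy values are hypothetical. It does not reproduce the authors' optimization algorithm or the moment tensor estimation step.

```python
import numpy as np

def sum_of_nuclear_norms(blocks):
    """Sum of nuclear norms over per-location wavefield matrices.

    The nuclear norm (sum of singular values) favors low-rank blocks, and
    summing over locations favors few non-zero blocks.
    """
    return float(sum(np.linalg.norm(X, ord="nuc") for X in blocks))

def rank1_factor(X):
    """Best rank-1 factorization X ~ outer(signature, amplitudes) via the SVD.

    The split of scale and sign between the two factors is arbitrary;
    here the singular value is absorbed into the signature.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return s[0] * U[:, 0], Vt[0]

# Toy example (hypothetical values): one active location, one empty location.
signature = np.array([1.0, -2.0, 0.5, 0.0])   # source signature over time
amplitudes = np.array([0.3, -0.1, 0.2])       # amplitude pattern across receivers
active = np.outer(signature, amplitudes)
empty = np.zeros_like(active)

penalty = sum_of_nuclear_norms([active, empty])
est_signature, est_amplitudes = rank1_factor(active)
```

    For an exactly rank-1 block the nuclear norm equals the product of the two factor norms, and the empty block contributes nothing to the penalty.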

    Automatic Objects Removal for Scene Completion

    With the explosive growth of web-based cameras and mobile devices, billions of photographs are uploaded to the internet. We can trivially collect a huge number of photo streams for various goals, such as 3D scene reconstruction and other big data applications. However, using them is not an easy task because the retrieved photos are neither aligned nor calibrated. Furthermore, when unexpected foreground objects such as people and vehicles occlude the scene, it is even more challenging to find feature correspondences and reconstruct realistic scenes. In this paper, we propose a structure-based image completion algorithm for object removal that produces visually plausible content with consistent structure and scene texture. We use an edge matching technique to infer the potential structure of the unknown region. Driven by the estimated structure, texture synthesis is performed automatically along the estimated curves. We evaluate the proposed method on different types of images, from highly structured indoor environments to natural scenes. Our experimental results demonstrate satisfactory performance that can potentially be used for subsequent big data processing, such as 3D scene reconstruction and location recognition. Comment: 6 pages, IEEE International Conference on Computer Communications (INFOCOM 14), Workshop on Security and Privacy in Big Data, Toronto, Canada, 201

    A method for extracting travel patterns using data polishing

    With recent developments in ICT, interest in using large amounts of accumulated data for traffic policy planning has increased significantly. In recent years, data polishing has been proposed as a new method of big data analysis. Data polishing is a graphical clustering method, which can be used to extract patterns that are similar or related to each other by identifying the cluster structures present in the data. The purpose of this study is to identify the travel patterns of railway passengers by applying data polishing to smart card data collected in Kagawa Prefecture, Japan. To this end, we consider 9,008,709 data points collected over a period of 15 months, ranging from December 1st, 2013 to February 28th, 2015. This dataset includes various types of information, including trip histories and types of passengers. This study implements data polishing to cluster 4,667,520 combinations of information regarding individual rides in terms of the day of the week, the time of the day, passenger types, and origin and destination stations. Via the analysis, 127 characteristic travel patterns are identified in aggregate.

    On the dynamics of interdomain routing in the Internet

    The routes used in the Internet's interdomain routing system are a rich information source that could be exploited to answer a wide range of questions. However, analyzing routes is difficult, because the fundamental object of study is a set of paths. In this dissertation, we present new analysis tools -- metrics and methods -- for analyzing paths, and apply them to study interdomain routing in the Internet over long periods of time. Our contributions are threefold. First, we build on an existing metric (Routing State Distance) to define a new metric that allows us to measure the similarity between two prefixes with respect to the state of the global routing system. Applying this metric over time yields a measure of how the set of paths to each prefix varies at a given timescale. Second, we present PathMiner, a system to extract large-scale routing events from background noise and identify the AS (Autonomous System) or AS-link most likely responsible for the event. PathMiner is distinguished from previous work by its ability to identify and analyze large-scale events that may recur many times over long timescales. We show that it is scalable, being able to extract significant events from multiple years of routing data at a daily granularity. Finally, we equip Routing State Distance with a new set of tools for identifying and characterizing unusually routed ASes. At the micro level, we use our tools to identify clusters of ASes that have the most unusual routing at each time. We also show that analysis of individual ASes can expose business and engineering strategies of the organizations owning the ASes. These strategies are often related to content delivery or service replication. At the macro level, we show that the set of ASes with the most unusual routing defines discernible and interpretable phases of the Internet's evolution. Furthermore, we show that our tools can be used to provide a quantitative measure of the "flattening" of the Internet.
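
    As a rough illustration of the existing metric this abstract builds on, the sketch below computes a Routing State Distance between two prefixes. It assumes, and this is an assumption rather than something the abstract states, that the distance counts the ASes whose next hop toward the two prefixes differs. The new metric defined in the dissertation is not reproduced here, and the AS names, prefixes, and table layout are hypothetical.

```python
def routing_state_distance(next_hop, prefix_a, prefix_b):
    """Count the ASes whose next hop differs between two prefixes.

    next_hop maps each AS to a dict {prefix: next-hop AS}. An AS with no
    route to a prefix is treated as having next hop None (an assumption).
    """
    return sum(
        1
        for routes in next_hop.values()
        if routes.get(prefix_a) != routes.get(prefix_b)
    )

# Hypothetical routing state: three observing ASes, three documentation prefixes.
P1, P2, P3 = "192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"
next_hop = {
    "AS1": {P1: "AS2", P2: "AS2", P3: "AS3"},
    "AS2": {P1: "AS4", P2: "AS4", P3: "AS4"},
    "AS3": {P1: "AS1", P2: "AS4", P3: "AS5"},
}

d12 = routing_state_distance(next_hop, P1, P2)  # only AS3 routes them differently
d13 = routing_state_distance(next_hop, P1, P3)  # AS1 and AS3 route them differently
```

    Two prefixes that every AS routes identically are at distance zero, so small distances suggest prefixes that are treated alike by the routing system at that point in time.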

    Computational methods to predict and enhance decision-making with biomedical data.

    The proposed research applies machine learning techniques to healthcare applications. The core idea is to use intelligent techniques to develop automatic methods for analyzing healthcare data. Different classification and feature extraction techniques are applied to various clinical datasets. The datasets include brain MR images, breathing curves from vessels around tumor cells over time, breathing curves extracted from patients with successful or rejected lung transplants, and records of lung cancer patients diagnosed in the US from 2004 to 2009, extracted from the SEER database. The novel idea for brain MR image segmentation is a multi-scale technique to segment blood vessel tissue from similar tissues in the brain. By analyzing the vascularization of the cancer tissue over time and the behavior of the vessels (arteries and veins over time), a new feature extraction technique was developed, and classification techniques were used to rank the vascularization of each tumor type. Lung transplantation is a critical surgery for which predicting the acceptance or rejection of the transplant would be very important. A review of classification techniques on the SEER database was developed to analyze the survival rates of lung cancer patients, and the best feature vector for predicting the most similar patients is analyzed.