Routine pattern discovery and anomaly detection in individual travel behavior
Discovering patterns and detecting anomalies in individual travel behavior is
a crucial problem in both research and practice. In this paper, we address this
problem by building a probabilistic framework to model individual
spatiotemporal travel behavior data (e.g., trip records and trajectory data).
We develop a two-dimensional latent Dirichlet allocation (LDA) model to
characterize the generative mechanism of spatiotemporal trip records of each
traveler. This model introduces two separate factor matrices for the spatial
dimension and the temporal dimension, respectively, and uses a two-dimensional
core structure at the individual level to effectively model the joint
interactions and complex dependencies. This model can efficiently summarize
travel behavior patterns on both spatial and temporal dimensions from very
sparse trip sequences in an unsupervised way. In this way, complex travel
behavior can be modeled as a mixture of representative and interpretable
spatiotemporal patterns. By applying the trained model to future/unseen
spatiotemporal records of a traveler, we can detect behavioral anomalies by
scoring those observations with perplexity. We demonstrate the effectiveness
of the proposed modeling framework on a real-world license plate recognition
(LPR) data set. The results confirm the advantage of statistical learning
methods in modeling sparse individual travel behavior data. Such pattern
discovery and anomaly detection applications can provide useful insights for
traffic monitoring, law enforcement, and individual travel behavior profiling.
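The perplexity-scoring step can be sketched with an off-the-shelf LDA implementation. This is only an illustrative stand-in: the paper's two-dimensional model, with separate spatial and temporal factor matrices and an individual-level core, is not available in standard libraries, and the trip counts below are synthetic.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Toy "trip record" matrix: rows = travelers' observation periods,
# columns = discretized (location, time-slot) cells, entries = visit
# counts.  Plain LDA substitutes here for the paper's 2-D variant.
rng = np.random.default_rng(0)
X_train = rng.poisson(2.0, size=(50, 30))

lda = LatentDirichletAllocation(n_components=5, random_state=0)
lda.fit(X_train)

# Score unseen records one at a time: higher perplexity means the
# observation is poorly explained by the learned routine patterns,
# i.e. more anomalous.
X_new = rng.poisson(2.0, size=(5, 30))
scores = [lda.perplexity(X_new[i:i + 1]) for i in range(len(X_new))]
```

In practice a threshold on these scores (e.g. a high empirical quantile over historical records) would flag anomalous days.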
Methods for Large Scale Hydraulic Fracture Monitoring
In this paper we propose computationally efficient and robust methods for
estimating the moment tensor and location of micro-seismic event(s) for large
search volumes. Our contribution is two-fold. First, we propose a novel
joint-complexity measure, namely the sum of nuclear norms, which, while imposing
sparsity on the number of fractures (locations) over a large spatial volume,
also captures the rank-1 nature of the induced wavefield pattern. This
wavefield pattern is modeled as the outer-product of the source signature with
the amplitude pattern across the receivers from a seismic source. A rank-1
factorization of the estimated wavefield pattern at each location can therefore
be used to estimate the seismic moment tensor using the knowledge of the array
geometry. In contrast to existing work, this approach allows us to drop any
other assumptions about the source signature. Second, we exploit the recently
proposed first-order incremental projection algorithms for a fast and efficient
implementation of the resulting optimization problem and develop a hybrid
stochastic & deterministic algorithm, which results in significant
computational savings.
Comment: arXiv admin note: text overlap with arXiv:1305.006
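The rank-1 factorization step can be illustrated with a plain SVD. The wavefield and noise below are synthetic; the leading singular pair recovers the source signature and the receiver amplitude pattern up to sign and scale, which would then feed a moment tensor inversion given the array geometry (not shown).

```python
import numpy as np

# Synthetic rank-1 wavefield at one candidate source location:
# rows = time samples (source signature), columns = receivers
# (amplitude pattern across the array).  All values are made up.
rng = np.random.default_rng(1)
s_true = np.sin(np.linspace(0.0, 4.0 * np.pi, 200))   # source signature
a_true = rng.normal(size=12)                          # receiver amplitudes
W = np.outer(s_true, a_true) + 0.01 * rng.normal(size=(200, 12))

# Rank-1 factorization via truncated SVD: the leading left singular
# vector estimates the signature, the leading right singular vector
# the amplitude pattern (each up to sign and scale).
U, sv, Vt = np.linalg.svd(W, full_matrices=False)
s_est = U[:, 0] * sv[0]
a_est = Vt[0]
```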
Automatic Objects Removal for Scene Completion
With the explosive growth of web-based cameras and mobile devices, billions
of photographs are uploaded to the internet. We can trivially collect a huge
number of photo streams for various goals, such as 3D scene reconstruction and
other big data applications. However, this is not an easy task, because the
retrieved photos are neither aligned nor calibrated. Furthermore, with the
occlusion of unexpected foreground objects such as people and vehicles, it is even
more challenging to find feature correspondences and reconstruct realistic
scenes. In this paper, we propose a structure based image completion algorithm
for object removal that produces visually plausible content with consistent
structure and scene texture. We use an edge matching technique to infer the
potential structure of the unknown region. Driven by the estimated structure,
texture synthesis is performed automatically along the estimated curves. We
evaluate the proposed method on different types of images, from highly
structured indoor environments to natural scenes. Our experimental results
demonstrate satisfactory performance that can be potentially used for
subsequent big data processing: 3D scene reconstruction and location
recognition.
Comment: 6 pages, IEEE International Conference on Computer Communications
(INFOCOM 14), Workshop on Security and Privacy in Big Data, Toronto, Canada,
201
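The authors' structure-guided completion (edge matching followed by texture synthesis along estimated curves) is involved; as a far simpler point of comparison, the sketch below fills a masked hole by naive diffusion (repeated neighbor averaging). It conveys the fill-from-the-boundary idea but none of the structure or texture modeling.

```python
import numpy as np

def diffuse_inpaint(img, mask, iters=300):
    """Fill masked pixels by repeatedly averaging their 4-neighbors."""
    out = img.astype(float).copy()
    out[mask] = out[~mask].mean()              # crude initialization
    for _ in range(iters):
        avg = (np.roll(out, 1, 0) + np.roll(out, -1, 0) +
               np.roll(out, 1, 1) + np.roll(out, -1, 1)) / 4.0
        out[mask] = avg[mask]                  # update only the hole
    return out

# A flat image with a square hole fills back to the flat value.
img = np.full((16, 16), 5.0)
mask = np.zeros_like(img, dtype=bool)
mask[5:10, 5:10] = True
filled = diffuse_inpaint(img, mask)
```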
A method for extracting travel patterns using data polishing
With recent developments in ICT, interest in using large amounts of accumulated data for traffic policy planning has increased significantly. In recent years, data polishing has been proposed as a new method of big data analysis. Data polishing is a graph clustering method that extracts patterns that are similar or related to each other by identifying the cluster structures present in the data. The purpose of this study is to identify the travel patterns of railway passengers by applying data polishing to smart card data collected in Kagawa Prefecture, Japan. To this end, we consider 9,008,709 data points collected over a period of 15 months, from December 1st, 2013 to February 28th, 2015. This dataset includes various types of information, including trip histories and passenger types. This study applies data polishing to cluster 4,667,520 combinations of information regarding individual rides in terms of the day of the week, the time of day, passenger type, and origin and destination stations. Through this analysis, 127 characteristic travel patterns are identified in aggregate.
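Data polishing iteratively rewires a graph so that dense but noisy pseudo-cliques crystallize into explicit clusters. The toy version below uses Jaccard similarity of closed neighborhoods as the rewiring rule, iterated to a fixed point; this is a simplification for illustration, not the published algorithm, and the graph is invented.

```python
def polish(adj, threshold=0.5, max_iter=10):
    """Toy 'polishing': connect u and v when their closed neighborhoods
    are Jaccard-similar; repeat until the graph stops changing."""
    for _ in range(max_iter):
        new = {v: set() for v in adj}
        for u in adj:
            for v in adj:
                if u >= v:
                    continue
                nu, nv = adj[u] | {u}, adj[v] | {v}
                if len(nu & nv) / len(nu | nv) >= threshold:
                    new[u].add(v)
                    new[v].add(u)
        if new == adj:
            break
        adj = new
    return adj

# Two dense groups joined by one noisy edge (2-3): polishing keeps the
# within-group edges and drops the spurious bridge.
g = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
polished = polish(g)
```

In the study's setting, nodes would be ride-attribute combinations (day, time, passenger type, OD pair) and the surviving clusters the extracted travel patterns.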
On the dynamics of interdomain routing in the Internet
The routes used in the Internet's interdomain routing system are a rich
information source that could be exploited to answer a wide range of
questions. However, analyzing routes is difficult, because the fundamental
object of study is a set of paths. In this dissertation, we present new
analysis tools -- metrics and methods -- for analyzing paths, and apply them
to study interdomain routing in the Internet over long periods of time.
Our contributions are threefold. First, we build on an existing metric (Routing
State Distance) to define a new metric that allows us to measure the similarity
between two prefixes with respect to the state of the global routing system.
Applying this metric over time yields a measure of how the set of paths to each
prefix varies at a given timescale. Second, we present PathMiner, a system to
extract large scale routing events from background noise and identify the AS
(Autonomous System) or AS-link most likely responsible for the event. PathMiner
is distinguished from previous work in its ability to identify and analyze
large-scale events that may re-occur many times over long timescales. We show
that it is scalable, being able to extract significant events from multiple
years of routing data at a daily granularity. Finally, we equip Routing State
Distance with a new set of tools for identifying and characterizing
unusually-routed ASes. At the micro level, we use our tools to identify
clusters of ASes that have the most unusual routing at each time. We also show
that analysis of individual ASes can expose business and engineering strategies
of the organizations owning the ASes. These strategies are often related to
content delivery or service replication. At the macro level, we show that the
set of ASes with the most unusual routing defines discernible and interpretable
phases of the Internet's evolution. Furthermore, we show that our tools can be
used to provide a quantitative measure of the "flattening" of the Internet.
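The core of a Routing-State-Distance-style comparison can be sketched in a few lines, under the simplifying assumption that each vantage AS contributes a single next hop per prefix; the dictionaries and AS names below are made up for the example.

```python
def rsd(routes_p, routes_q):
    """Count vantage ASes whose route (here reduced to a next hop)
    toward prefix p differs from their route toward prefix q."""
    shared = set(routes_p) & set(routes_q)
    return sum(1 for v in shared if routes_p[v] != routes_q[v])

# Two prefixes routed identically from AS1 and AS3 but differently
# from AS2 are at distance 1 in this simplified metric.
p = {"AS1": "AS10", "AS2": "AS20", "AS3": "AS30"}
q = {"AS1": "AS10", "AS2": "AS99", "AS3": "AS30"}
distance = rsd(p, q)
```

Tracking such distances over snapshots gives the per-timescale variability measure the dissertation describes.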
Computational methods to predict and enhance decision-making with biomedical data.
The proposed research applies machine learning techniques to healthcare applications. The core idea is to use intelligent techniques to develop automatic methods for analyzing healthcare data. Different classification and feature extraction techniques are applied to various clinical datasets. The datasets include: brain MR images; breathing curves from vessels around tumor cells over time; breathing curves extracted from patients with successful or rejected lung transplants; and lung cancer patients diagnosed in the US from 2004-2009, extracted from the SEER database. The novel idea in brain MR image segmentation is to develop a multi-scale technique to separate blood vessel tissue from similar tissues in the brain. By analyzing the vascularization of the cancer tissue over time and the behavior of the vessels (arteries and veins), a new feature extraction technique was developed, and classification techniques were used to rank the vascularization of each tumor type. Lung transplantation is a critical surgery for which predicting the acceptance or rejection of the transplant would be very important. A review of classification techniques on the SEER database was conducted to analyze the survival rates of lung cancer patients, and the best feature vector for predicting the most similar patients is analyzed.
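None of the clinical datasets are public in this form, so the sketch below substitutes synthetic feature vectors and a toy label; it illustrates only the generic extract-features, classify, and cross-validate loop the abstract describes, not any of the specific techniques.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for clinical feature vectors: 200 "patients",
# 8 extracted features, and a toy binary outcome that depends on the
# first two features (purely illustrative, not SEER data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Cross-validated accuracy of one candidate classifier; a real review
# would compare several models and feature subsets this way.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
```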