Understanding Mobile Traffic Patterns of Large Scale Cellular Towers in Urban Environment
Understanding mobile traffic patterns of large scale cellular towers in urban
environment is extremely valuable for Internet service providers, mobile users,
and government managers of a modern metropolis. This paper aims at extracting and
modeling the traffic patterns of large scale towers deployed in a metropolitan
city. To achieve this goal, we need to address several challenges, including
lack of appropriate tools for processing large scale traffic measurement data,
unknown traffic patterns, as well as handling complicated factors of urban
ecology and human behaviors that affect traffic patterns. Our core contribution
is a powerful model which combines three dimensional information (time,
locations of towers, and traffic frequency spectrum) to extract and model the
traffic patterns of thousands of cellular towers. Our empirical analysis
reveals the following important observations. First, only five basic
time-domain traffic patterns exist among the 9,600 cellular towers. Second,
each extracted traffic pattern maps to one type of geographical
location related to urban ecology, including residential area, business
district, transport, entertainment, and comprehensive area. Third, our
frequency-domain traffic spectrum analysis suggests that the traffic of any
tower among the 9,600 can be constructed using a linear combination of four
primary components corresponding to human activity behaviors. We believe that
the proposed traffic pattern extraction and modeling methodology, combined
with the empirical analysis of the mobile traffic, paves the way toward a deep
understanding of the traffic patterns of large scale cellular towers in a
modern metropolis. Comment: To appear at IMC 201
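The frequency-domain finding above — any tower's traffic expressed as a linear combination of four primary components — can be sketched as a least-squares fit. Everything here is invented for illustration (synthetic components and traffic, not the paper's data or its actual decomposition method):

```python
import numpy as np

rng = np.random.default_rng(0)

T = 168                                   # one week of hourly samples
t = np.arange(T)
# Four assumed primary components (stand-ins for human-activity rhythms).
components = np.vstack([
    np.sin(2 * np.pi * t / 24),           # daily cycle
    np.cos(2 * np.pi * t / 24),           # phase-shifted daily cycle
    np.sin(2 * np.pi * t / 168),          # weekly cycle
    np.ones(T),                           # baseline load
])

# Synthetic "tower traffic" built from the components plus small noise.
true_w = np.array([2.0, 0.5, 1.0, 3.0])
traffic = true_w @ components + 0.01 * rng.standard_normal(T)

# Recover the mixing weights by ordinary least squares.
w, *_ = np.linalg.lstsq(components.T, traffic, rcond=None)
print(np.round(w, 2))
```

With low noise the fitted weights recover the true mixing coefficients, which is the sense in which a tower's trace is "constructed" from the primary components.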
Statistical Traffic State Analysis in Large-scale Transportation Networks Using Locality-Preserving Non-negative Matrix Factorization
Statistical traffic data analysis is a hot topic in traffic management and
control. In this field, current research focuses on analyzing traffic
flows of individual links or local regions in a transportation network. Less
attention is paid to the global view of traffic states over the entire
network, which is important for modeling large-scale traffic scenes. Our aim is
precisely to propose a new methodology for extracting spatio-temporal traffic
patterns, ultimately for modeling large-scale traffic dynamics, and long-term
traffic forecasting. We attack this issue by utilizing Locality-Preserving
Non-negative Matrix Factorization (LPNMF) to derive low-dimensional
representation of network-level traffic states. Clustering is performed on the
compact LPNMF projections to unveil typical spatial patterns and temporal
dynamics of network-level traffic states. We have tested the proposed method on
simulated traffic data generated for a large-scale road network, and reported
experimental results validate the ability of our approach to extract
meaningful large-scale space-time traffic patterns. Furthermore, the derived
clustering results provide an intuitive understanding of the spatio-temporal
characteristics of traffic flows in the large-scale network, and a basis for
potential long-term forecasting. Comment: IET Intelligent Transport Systems (2013)
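The pipeline above — factorize network-level traffic into a low-dimensional non-negative representation, then cluster the projections — can be sketched with plain NMF (Lee–Seung multiplicative updates) standing in for LPNMF; the locality-preserving graph term is omitted, and the data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(42)
n_snapshots, n_links, rank = 60, 40, 3

# Synthetic network-level traffic: rows are time snapshots, columns are
# links, built from three latent traffic regimes plus small noise.
basis = rng.uniform(0.1, 1.0, size=(rank, n_links))
labels_true = rng.integers(0, rank, size=n_snapshots)
X = basis[labels_true] + 0.05 * rng.uniform(size=(n_snapshots, n_links))

# NMF: X ~ W @ H with W, H non-negative (multiplicative updates).
W = rng.uniform(size=(n_snapshots, rank))
H = rng.uniform(size=(rank, n_links))
for _ in range(300):
    H *= (W.T @ X) / (W.T @ W @ H + 1e-9)
    W *= (X @ H.T) / (W @ H @ H.T + 1e-9)

# Assign each snapshot to its dominant component — a crude stand-in for
# the clustering step performed on the compact projections.
clusters = W.argmax(axis=1)
print(W.shape, np.bincount(clusters, minlength=rank))
```

Each row of W is the low-dimensional representation of one network snapshot; in the paper's method the clustering would operate on the LPNMF projections instead.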
Model development and validation of a ceramic core injection molding process
In this study, a thermal-fluid finite element analysis software package called ProCAST was used to validate computer modeling of a ceramic injection molding process by comparing the modeling results with experimental data. The experiments were performed in an industrial environment under actual manufacturing conditions. The computer modeling was performed at the Advanced Casting Simulation and Mold Design Laboratory at the University of Tennessee, Knoxville. Thermocouples were used to collect the experimental cooling curve data. The results from the computer simulations were then validated by comparing them to the experimental temperature data. The ceramic parts that were modeled are highly intricate three-dimensional parts. The research modeled filling and heat transfer with solidification and included three-dimensional, transient, and non-Newtonian effects. Five different computer simulations were run with varying interfacial thermal boundary conditions and a uniform steady inlet velocity. Various filling patterns and shear rate heating were observed as different interfacial thermal boundary conditions were used in the computer simulations. The modeling results that agree most with the experimental results have heat transfer coefficients of 1,800 W/(m² K) and 2,200 W/(m² K).
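The validation step — comparing a simulated cooling curve against thermocouple measurements — can be illustrated with a toy discrepancy metric. The exponential cooldown curves and all numbers below are invented (a lumped-capacitance stand-in, not ProCAST output or the study's data):

```python
import numpy as np

t = np.linspace(0.0, 60.0, 61)          # time, seconds
T_ambient, T_melt = 25.0, 1200.0        # hypothetical temperatures, deg C

def cooling_curve(tau):
    # Lumped-capacitance cooldown toward ambient with time constant tau.
    return T_ambient + (T_melt - T_ambient) * np.exp(-t / tau)

measured = cooling_curve(tau=20.0)      # stand-in for thermocouple data
simulated = cooling_curve(tau=22.0)     # stand-in for an FEA prediction

# Root-mean-square deviation between model and experiment: the kind of
# scalar used to rank simulations run with different boundary conditions.
rmse = np.sqrt(np.mean((simulated - measured) ** 2))
print(round(float(rmse), 1))
```

Re-running such a comparison for each trial heat transfer coefficient and keeping the lowest RMSE mirrors how the best-agreeing interfacial boundary conditions would be selected.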
Quantifying alternative splicing from paired-end RNA-sequencing data
RNA-sequencing has revolutionized biomedical research and, in particular, our
ability to study gene alternative splicing. The problem has important
implications for human health, as alternative splicing may be involved in
malfunctions at the cellular level and multiple diseases. However, the
high-dimensional nature of the data and the existence of experimental biases
pose serious data analysis challenges. We find that the standard data summaries
used to study alternative splicing are severely limited, as they ignore a
substantial amount of valuable information. Current data analysis methods are
based on such summaries and are hence suboptimal. Further, they have limited
flexibility in accounting for technical biases. We propose novel data summaries
and a Bayesian modeling framework that overcome these limitations and determine
biases in a nonparametric, highly flexible manner. These summaries adapt
naturally to the rapid improvements in sequencing technology. We provide
efficient point estimates and uncertainty assessments. The approach allows us to
study alternative splicing patterns for individual samples and can also be the
basis for downstream analyses. We found a severalfold improvement in estimation
mean square error compared with popular approaches in simulations, and substantially
higher consistency between replicates in experimental data. Our findings
indicate the need for adjusting the routine summarization and analysis of
alternative splicing RNA-seq studies. We provide a software implementation in
the R package casper. Comment: Published at http://dx.doi.org/10.1214/13-AOAS687 in
the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org). With correction.
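A toy example of why model-based estimation can beat the raw data summaries in mean square error (this is not the casper model — just a hypothetical Beta-Binomial shrinkage estimator for an exon-inclusion proportion, with invented counts):

```python
import numpy as np

rng = np.random.default_rng(1)
true_psi = 0.5        # true inclusion proportion (near the prior mean)
n_reads = 10          # few reads per event, the hard low-coverage regime
trials = 5000

# Simulated read counts supporting inclusion across many replicates.
inclusion = rng.binomial(n_reads, true_psi, size=trials)

mle = inclusion / n_reads                          # raw proportion
posterior_mean = (inclusion + 2) / (n_reads + 4)   # Beta(2, 2) shrinkage

mse_mle = np.mean((mle - true_psi) ** 2)
mse_bayes = np.mean((posterior_mean - true_psi) ** 2)
print(mse_bayes < mse_mle)
```

With few reads and a truth near the prior mean, the shrinkage estimator's lower variance outweighs its bias, yielding a smaller MSE — the same qualitative effect the abstract reports for its Bayesian framework against standard summaries.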
Scale-dependent heterogeneity in fracture data sets and grayscale images
Lacunarity is a technique developed for multiscale analysis of spatial data and can quantify scale-dependent heterogeneity in a dataset. The present research is based on characterizing fracture data of various types by invoking lacunarity as a concept that can not only be applied to both fractal and non-fractal binary data but can also be extended to analyzing non-binary data sets comprising a spectrum of values between 0 and 1. Lacunarity has been variously modified in characterizing fracture data from maps and scanlines in tackling five different problems. In Chapter 2, it is shown that normalized lacunarity curves can differentiate between maps (2-dimensional binary data) belonging to the same fractal-fracture system and that clustering increases with decreasing spatial scale. Chapter 4 analyzes spacing data from scanlines (1-dimensional binary data) and employs log-transformed lacunarity curves along with their 1st derivatives in identifying the presence of fracture clusters and their spatial organization. This technique is extended to 1-dimensional non-binary data in Chapter 5, where spacing is integrated with aperture values and a lacunarity ratio is invoked in addressing the question of whether large fractures occur within clusters. Finally, Chapter 6 investigates whether lacunarity can find differences in clustering along various directions of a fracture network, thus identifying differentially-clustered fracture sets. In addition to fracture data, Chapter 3 employs lacunarity in identifying clustering and multifractal behavior in synthetic and natural 2-dimensional non-binary patterns in the form of soil thin sections. Future avenues for research include estimation of 2-dimensional clustering from 1-dimensional samples (e.g., scanlines and well data), forward modeling of fracture networks using lacunarity, and the possible application of lacunarity in delineating shapes of other geologic patterns such as channel beds.
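The gliding-box lacunarity used throughout can be sketched for the 1-dimensional binary (scanline-like) case. For box size r, lacunarity is the ratio E[m²] / E[m]² over the masses m of all gliding boxes; the data below are invented toy sequences:

```python
import numpy as np

def lacunarity(x, r):
    # Gliding-box lacunarity of a 1-D binary sequence at box size r.
    x = np.asarray(x, dtype=float)
    # Mass of every length-r gliding box (moving-window sum).
    masses = np.convolve(x, np.ones(r), mode="valid")
    mean = masses.mean()
    # E[m^2] / E[m]^2 = var/mean^2 + 1; equals 1 for translation-
    # invariant (perfectly homogeneous) patterns.
    return (masses.var() / mean**2 + 1.0) if mean > 0 else np.nan

clustered = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
uniform   = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0])

# Clustered fractures give box masses with high variance, hence higher
# lacunarity than an evenly spaced pattern at the same density.
print(lacunarity(clustered, 4), lacunarity(uniform, 4))
```

Sweeping r and plotting lacunarity against box size gives the lacunarity curves (and, after log transform, the derivatives) that the chapters above analyze.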
Multivariate Mixed Data Mining with Gifi System using Genetic Algorithm and Information Complexity
Statistical analysis depends heavily on the quality and type of the data set. There are three types of data: continuous, categorical, and mixed. Of these, statistical modeling of mixed data has long been challenging, because most traditional statistical techniques are defined either for purely continuous or purely categorical data, but not for mixed data. In reality, most data sets are neither purely continuous nor purely categorical but come in mixed form, which makes statistical analysis difficult. For instance, in the medical sector, where classification of the data is very important, the presence of many categorical and continuous predictors can result in a poor model. In the insurance and finance sectors, large amounts of categorical and continuous data are collected on customers for targeted marketing, detection of suspicious insurance claims, actuarial modeling, risk analysis, modeling of financial derivatives, detection of profitable zones, etc.
In this work, we bring together several relatively new developments in statistical model selection and data mining, and address two problems. The first problem is to determine the optimal number of mixtures from multivariate Bernoulli distributed data using a genetic algorithm and Bozdogan's information complexity, ICOMP. We show that maximum likelihood values alone are not sufficient for determining the optimal number of mixtures. We also address the issue of high-dimensional binary data, using a genetic algorithm to determine the optimal predictors. Finally, we show the results of our algorithm on a simulated and two real data sets.
The second problem is to discover interesting patterns in a complicated mixed data set. Since mixed data are a combination of continuous and categorical variables, we transform the nonlinear categorical variables to a linear scale by a mechanism called the Gifi transformation [Gifi, 1989]. Once the nonlinear variables are transformed to a linear scale (Euclidean space), we apply several classical multivariate techniques on the transformed continuous data to identify unusual patterns. The advantage of this transformation is that it is a one-to-one mapping. Hence, the transformed set of continuous value(s) in the Gifi space can be remapped to a unique set of categorical value(s) in the original space. Once the data are transformed to the Gifi space, we implement various statistical techniques to identify interesting patterns. We also address the problem of high-dimensional data, using a genetic algorithm for variable selection and Bozdogan's information complexity (ICOMP) as our fitness function.
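The key property relied on above — a one-to-one mapping between categorical values and points in Euclidean space, so transformed rows can be mapped back exactly — can be illustrated with the crudest such coding, an indicator (one-hot) transform. This is only a stand-in: full Gifi optimal scaling learns category quantifications, which this sketch does not:

```python
import numpy as np

# Hypothetical categorical variable and its indicator coding.
categories = ["low", "medium", "high"]
cat_to_index = {c: i for i, c in enumerate(categories)}

def encode(values):
    # Map each categorical value to a unit vector in Euclidean space.
    out = np.zeros((len(values), len(categories)))
    out[np.arange(len(values)), [cat_to_index[v] for v in values]] = 1.0
    return out

def decode(coded):
    # The mapping is one-to-one, so decoding recovers the originals.
    return [categories[i] for i in coded.argmax(axis=1)]

sample = ["medium", "high", "low", "medium"]
coded = encode(sample)
print(decode(coded) == sample)   # exact round trip
```

Classical multivariate techniques can then run on the coded (continuous) columns, and any flagged rows can be mapped back to their original categorical values, as the abstract describes for the Gifi space.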
We present details of our newly-developed Matlab toolbox, called Gifi System, that implements everything presented, and can readily be extended to add new functionality. Finally, results on both simulated and real world data sets are presented and discussed.
Keywords: Gifi, homals, regression, multivariate logistic regression, fraud detection, medical diagnostics, supervised classification, unsupervised classification, variable selection, high dimensional data mining, stock market trading, detection of suspicious insurance claim estimates
Adaptive learning for event modeling and pattern classification
It is crucial to detect, characterize and model events of interest in a new propulsion system. As technology advances, the amount of data being generated increases significantly with respect to time. This increase substantially strains our ability to interpret the data at an equivalent rate. It demands efficient methodologies and algorithms in the development of automated event modeling and pattern recognition to detect and characterize events of interest and correlate them to the system performance. The fact that the information required to properly evaluate system performance and health is seldom known in advance further exacerbates this issue.
Event modeling and detection is essentially a discovery problem and involves the use of techniques from the pattern classification domain, specifically cluster analysis when a priori information is unknown. In this dissertation, a framework, the Adaptive Learning for Event Modeling and Characterization (ALEC) system, is proposed to deal with this problem. Within this framework, a wavelet-based hierarchical fuzzy clustering approach, which integrates several advanced technologies and overcomes the disadvantages of traditional clustering algorithms, is developed to make the implementation of the system effective and computationally efficient.
In separate but related research, a generalized multi-dimensional Gaussian membership function is constructed and formulated for fuzzy classification of blade engine damage modes among a group of engines with historical flight data, after Principal Component Analysis (PCA) is applied to reduce the excessive dimensionality. This approach can be effectively used for classification of patterns with overlapping structures, in which some patterns fall into more than one class or category.
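A minimal sketch of a multi-dimensional Gaussian membership function of the kind described: membership of a point in a class decays with its Mahalanobis distance to the class center, so a point between two modes belongs partly to both. The class centers, covariances, and the 2-D "PCA-reduced" space are all invented for illustration:

```python
import numpy as np

def gaussian_membership(x, mean, cov):
    # Membership in [0, 1]: exp(-0.5 * Mahalanobis distance squared).
    d = x - mean
    return float(np.exp(-0.5 * d @ np.linalg.solve(cov, d)))

# Two hypothetical damage-mode classes in a 2-D PCA-reduced space.
means = {"mode_A": np.array([0.0, 0.0]), "mode_B": np.array([3.0, 3.0])}
covs  = {"mode_A": np.eye(2),            "mode_B": np.eye(2)}

x = np.array([1.5, 1.5])   # a point midway between the two modes
memberships = {c: gaussian_membership(x, means[c], covs[c]) for c in means}
print(memberships)
```

Unlike a hard classifier, the memberships need not sum to one or pick a single winner, which is what makes this formulation suitable for the overlapping damage-mode patterns mentioned above.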