439,839 research outputs found

    Understanding Mobile Traffic Patterns of Large Scale Cellular Towers in Urban Environment

    Understanding the mobile traffic patterns of large-scale cellular towers in an urban environment is extremely valuable for Internet service providers, mobile users, and government managers of a modern metropolis. This paper aims at extracting and modeling the traffic patterns of large-scale towers deployed in a metropolitan city. To achieve this goal, we need to address several challenges, including the lack of appropriate tools for processing large-scale traffic measurement data, unknown traffic patterns, and the complicated factors of urban ecology and human behavior that affect traffic patterns. Our core contribution is a powerful model that combines three dimensions of information (time, tower location, and traffic frequency spectrum) to extract and model the traffic patterns of thousands of cellular towers. Our empirical analysis reveals the following important observations. First, only five basic time-domain traffic patterns exist among the 9,600 cellular towers. Second, each extracted traffic pattern maps to one type of geographical location related to urban ecology: residential area, business district, transport, entertainment, or comprehensive area. Third, our frequency-domain traffic spectrum analysis suggests that the traffic of any of the 9,600 towers can be constructed as a linear combination of four primary components corresponding to human activity behaviors. We believe that the proposed traffic pattern extraction and modeling methodology, combined with the empirical analysis of the mobile traffic, paves the way toward a deep understanding of the traffic patterns of large-scale cellular towers in a modern metropolis. Comment: To appear at IMC 201
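    The third observation, that any tower's traffic can be written as a linear combination of a few primary components, can be sketched with a toy least-squares fit. The component shapes and weights below are illustrative assumptions, not the paper's actual spectral components:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(168)  # one week of hourly samples

# Four assumed primary components: daily cycle, half-day cycle,
# weekly cycle, and a constant baseline (all hypothetical).
components = np.stack([
    np.sin(2 * np.pi * t / 24),
    np.sin(2 * np.pi * t / 12),
    np.sin(2 * np.pi * t / 168),
    np.ones(t.size, dtype=float),
])

# Synthetic "tower" traffic built from the components plus noise.
true_w = np.array([2.0, 0.5, 1.0, 3.0])
traffic = true_w @ components + 0.01 * rng.standard_normal(t.size)

# Recover the mixing weights by least squares.
w, *_ = np.linalg.lstsq(components.T, traffic, rcond=None)
print(np.round(w, 2))
```

    In the paper the components come from a frequency-domain analysis over all towers; here the fit simply verifies that the assumed weights are recoverable from noisy observations.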

    Statistical Traffic State Analysis in Large-scale Transportation Networks Using Locality-Preserving Non-negative Matrix Factorization

    Statistical traffic data analysis is a hot topic in traffic management and control. In this field, current research focuses on analyzing the traffic flows of individual links or local regions in a transportation network. Less attention is paid to the global view of traffic states over the entire network, which is important for modeling large-scale traffic scenes. Our aim is precisely to propose a new methodology for extracting spatio-temporal traffic patterns, ultimately for modeling large-scale traffic dynamics and long-term traffic forecasting. We attack this issue by utilizing Locality-Preserving Non-negative Matrix Factorization (LPNMF) to derive a low-dimensional representation of network-level traffic states. Clustering is performed on the compact LPNMF projections to unveil typical spatial patterns and temporal dynamics of network-level traffic states. We have tested the proposed method on simulated traffic data generated for a large-scale road network, and the reported experimental results validate the ability of our approach to extract meaningful large-scale space-time traffic patterns. Furthermore, the derived clustering results provide an intuitive understanding of the spatial-temporal characteristics of traffic flows in the large-scale network, and a basis for potential long-term forecasting. Comment: IET Intelligent Transport Systems (2013)
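    A rough sketch of the pipeline described above, substituting scikit-learn's plain NMF for the locality-preserving variant (LPNMF adds a graph-regularization term that scikit-learn does not provide); the synthetic traffic matrix and its two regimes are assumptions for illustration:

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Synthetic network-level traffic: rows = time snapshots, columns = links.
# Two hypothetical latent regimes (e.g. peak vs. off-peak) generate the data.
basis = rng.uniform(0.5, 1.5, size=(2, 40))
activations = np.repeat(np.eye(2), 50, axis=0)   # 50 snapshots per regime
X = activations @ basis + 0.05 * rng.random((100, 40))

# Low-dimensional representation of each network-wide snapshot.
H = NMF(n_components=2, init="nndsvda", random_state=0).fit_transform(X)

# Cluster the compact projections to recover typical traffic regimes.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(H)
```

    Clustering in the low-dimensional factor space, rather than on the raw link flows, is what makes the approach scale to network-level data.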

    Model development and validation of a ceramic core injection molding process

    In this study, a thermal-fluid finite element analysis software package called ProCAST was used to validate computer modeling of a ceramic injection molding process by comparing the modeling results with experimental data. The experiments were performed in an industrial environment under actual manufacturing conditions. The computer modeling was performed at the Advanced Casting Simulation and Mold Design Laboratory at the University of Tennessee, Knoxville. Thermocouples were used to collect the experimental cooling curve data. The results from the computer simulations were then validated by comparing them to the experimental temperature data. The ceramic parts that were modeled are highly intricate three-dimensional parts. The research modeled filling and heat transfer with solidification and included three-dimensional, transient, and non-Newtonian effects. Five different computer simulations were run with varying interfacial thermal boundary conditions and a uniform steady inlet velocity. Various filling patterns and shear-rate heating were observed as different interfacial thermal boundary conditions were used in the computer simulations. The modeling results that agree most closely with the experimental results have heat transfer coefficients of 1,800 W/(m² K) and 2,200 W/(m² K).
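    To illustrate how the interfacial heat transfer coefficient shapes a cooling curve, here is a lumped-capacitance toy model, not the full ProCAST finite element simulation; the material properties, geometry ratio, and temperatures are all assumed values:

```python
import numpy as np

def cooling_curve(h, t_end=60.0, dt=0.01):
    """Explicit-Euler Newton cooling across an interface with coefficient h."""
    rho, cp = 2500.0, 900.0      # assumed density (kg/m^3), heat capacity (J/kg K)
    area_over_volume = 200.0     # assumed A/V for a thin core (1/m)
    T_mold, T = 300.0, 1200.0    # assumed mold and initial part temperature (K)
    times = np.arange(0.0, t_end, dt)
    temps = np.empty_like(times)
    for i in range(times.size):
        temps[i] = T
        # dT/dt = -h * (A/V) / (rho * cp) * (T - T_mold)
        T += -h * area_over_volume / (rho * cp) * (T - T_mold) * dt
    return times, temps

# The two best-fitting coefficients from the study, compared on the toy model.
_, t1800 = cooling_curve(1800.0)
_, t2200 = cooling_curve(2200.0)
```

    The higher coefficient produces a faster-decaying cooling curve, which is the qualitative behavior the thermocouple comparison exploits when selecting the boundary condition.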

    Quantifying alternative splicing from paired-end RNA-sequencing data

    RNA-sequencing has revolutionized biomedical research and, in particular, our ability to study gene alternative splicing. The problem has important implications for human health, as alternative splicing may be involved in malfunctions at the cellular level and multiple diseases. However, the high-dimensional nature of the data and the existence of experimental biases pose serious data analysis challenges. We find that the standard data summaries used to study alternative splicing are severely limited, as they ignore a substantial amount of valuable information. Current data analysis methods are based on such summaries and are hence suboptimal. Further, they have limited flexibility in accounting for technical biases. We propose novel data summaries and a Bayesian modeling framework that overcome these limitations and determine biases in a nonparametric, highly flexible manner. These summaries adapt naturally to the rapid improvements in sequencing technology. We provide efficient point estimates and uncertainty assessments. The approach allows studying alternative splicing patterns for individual samples and can also be the basis for downstream analyses. We found a severalfold improvement in estimation mean square error compared to popular approaches in simulations, and substantially higher consistency between replicates in experimental data. Our findings indicate the need for adjusting the routine summarization and analysis of alternative splicing RNA-seq studies. We provide a software implementation in the R package casper. Comment: Published at http://dx.doi.org/10.1214/13-AOAS687 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org). With correction.
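    The core idea of splitting ambiguous reads among compatible isoforms can be sketched with a simple EM loop; this toy is far simpler than the paper's Bayesian framework in casper, and the read classes, compatibility matrix, and counts are invented for illustration:

```python
import numpy as np

# comp[c, k] = 1 if read class c is compatible with isoform k.
comp = np.array([[1.0, 0.0],   # class 0: reads unique to isoform A
                 [0.0, 1.0],   # class 1: reads unique to isoform B
                 [1.0, 1.0]])  # class 2: reads compatible with both
counts = np.array([60.0, 30.0, 10.0])  # hypothetical read counts per class

pi = np.full(2, 0.5)  # initial isoform proportions
for _ in range(100):
    # E-step: split each class's reads among its compatible isoforms
    # in proportion to the current proportion estimates.
    weights = comp * pi
    resp = weights / weights.sum(axis=1, keepdims=True)
    # M-step: re-estimate proportions from the expected assignments.
    expected = counts @ resp
    pi = expected / expected.sum()
```

    At the fixed point the ambiguous reads are split in proportion to the unique evidence, giving proportions of (2/3, 1/3) here; the paper replaces this point estimate with a full posterior and bias model.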

    Scale-dependent heterogeneity in fracture data sets and grayscale images

    Lacunarity is a technique developed for multiscale analysis of spatial data that can quantify scale-dependent heterogeneity in a dataset. The present research characterizes fracture data of various types by invoking lacunarity as a concept that can not only be applied to both fractal and non-fractal binary data but can also be extended to analyzing non-binary data sets comprising a spectrum of values between 0 and 1. Lacunarity has been variously modified to characterize fracture data from maps and scanlines in tackling five different problems. In Chapter 2, it is shown that normalized lacunarity curves can differentiate between maps (2-dimensional binary data) belonging to the same fractal-fracture system and that clustering increases with decreasing spatial scale. Chapter 4 analyzes spacing data from scanlines (1-dimensional binary data) and employs log-transformed lacunarity curves along with their 1st derivatives to identify the presence of fracture clusters and their spatial organization. This technique is extended to 1-dimensional non-binary data in Chapter 5, where spacing is integrated with aperture values and a lacunarity ratio is invoked to address the question of whether large fractures occur within clusters. Finally, Chapter 6 investigates whether lacunarity can find differences in clustering along various directions of a fracture network, thus identifying differentially-clustered fracture sets. In addition to fracture data, Chapter 3 employs lacunarity to identify clustering and multifractal behavior in synthetic and natural 2-dimensional non-binary patterns in the form of soil thin sections. Future avenues for research include estimation of 2-dimensional clustering from 1-dimensional samples (e.g., scanlines and well data), forward modeling of fracture networks using lacunarity, and the possible application of lacunarity in delineating the shapes of other geologic patterns such as channel beds.
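    The gliding-box lacunarity statistic can be sketched for 1-dimensional binary data such as a scanline: Lambda(r) = <M²>/<M>², where M is the box mass (count of occupied cells) in a window of size r slid along the sequence. The example sequences below are illustrative:

```python
import numpy as np

def lacunarity(x, r):
    """Gliding-box lacunarity of a binary sequence at box size r."""
    x = np.asarray(x, dtype=float)
    masses = np.convolve(x, np.ones(r), mode="valid")  # mass of every box position
    m1 = masses.mean()
    m2 = (masses ** 2).mean()
    return m2 / m1 ** 2

# Same number of ones (four) in both sequences, different spatial organization:
clustered = np.array([1, 1, 1, 1] + [0] * 12)
uniform = np.array([1, 0, 0, 0] * 4)
```

    A clustered scanline yields Lambda well above 1 at small box sizes, while an evenly spaced one stays at 1, which is the contrast the normalized and log-transformed lacunarity curves in the dissertation exploit.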

    Multivariate Mixed Data Mining with Gifi System using Genetic Algorithm and Information Complexity

    Statistical analysis is very much dependent on the quality and type of a data set. There are three types of data - continuous, categorical, and mixed. Of these three, statistical modeling of mixed data has long been a challenging job. This is due to the fact that most traditional statistical techniques are defined either for purely continuous data or for purely categorical data, but not for mixed data. In reality, most data sets are neither continuous nor categorical in a pure sense but are in mixed form, which makes statistical analysis quite difficult. For instance, in the medical sector, where classification of the data is very important, the presence of many categorical and continuous predictors results in a poor model. In the insurance and finance sectors, much categorical and continuous data is collected on customers for targeted marketing, detection of suspicious insurance claims, actuarial modeling, risk analysis, modeling of financial derivatives, detection of profitable zones, etc. In this work, we bring together several relatively new developments in statistical model selection and data mining, and we address two problems. The first problem is to determine the optimal number of mixtures from multivariate Bernoulli distributed data using a genetic algorithm and Bozdogan's information complexity, ICOMP. We show that the maximum likelihood values alone are not sufficient for determining the optimal number of mixtures. We also address the issue of high-dimensional binary data, using a genetic algorithm to determine the optimal predictors. Finally, we show the results of our algorithm on one simulated and two real data sets. The second problem is discovering interesting patterns in a complicated mixed data set. Since mixed data are a combination of continuous and categorical variables, we transform the nonlinear categorical variables to a linear scale by a mechanism called the Gifi transformation [Gifi, 1989]. Once the nonlinear variables are transformed to a linear scale (Euclidean space), we apply several classical multivariate techniques to the transformed continuous data to identify unusual patterns. The advantage of this transformation is that it has a one-to-one mapping mechanism. Hence, the transformed set of continuous value(s) in the Gifi space can be remapped to a unique set of categorical value(s) in the original space. Once the data is transformed to the Gifi space, we implement various statistical techniques to identify interesting patterns. We also address the problem of high-dimensional data, using a genetic algorithm for variable selection and Bozdogan's information complexity (ICOMP) as our fitness function. We present details of our newly-developed Matlab toolbox, called Gifi System, which implements everything presented and can readily be extended to add new functionality. Finally, results on both simulated and real-world data sets are presented and discussed. Keywords: Gifi, homals, regression, multivariate logistic regression, fraud detection, medical diagnostics, supervised classification, unsupervised classification, variable selection, high dimensional data mining, stock market trading, detection of suspicious insurance claim estimates
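    The first problem, selecting the number of Bernoulli mixture components, can be sketched with a small EM fit scored per candidate K; BIC is used below as a simple stand-in for the ICOMP criterion the work actually employs, and the data are synthetic:

```python
import numpy as np

def fit_bernoulli_mixture(X, K, n_iter=150, seed=0):
    """EM for a K-component multivariate Bernoulli mixture; returns (loglik, BIC)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(K, 1.0 / K)
    theta = rng.uniform(0.25, 0.75, size=(K, d))
    for _ in range(n_iter):
        # E-step: per-point, per-component log joint probabilities.
        log_joint = (X @ np.log(theta).T
                     + (1.0 - X) @ np.log(1.0 - theta).T + np.log(pi))
        m = log_joint.max(axis=1, keepdims=True)
        log_norm = m + np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True))
        resp = np.exp(log_joint - log_norm)
        # M-step: update mixing weights and Bernoulli parameters.
        nk = resp.sum(axis=0)
        pi = nk / n
        theta = np.clip((resp.T @ X) / nk[:, None], 1e-6, 1.0 - 1e-6)
    loglik = log_norm.sum()
    n_params = (K - 1) + K * d
    return loglik, -2.0 * loglik + n_params * np.log(n)

# Synthetic binary data with two clearly separated components.
rng = np.random.default_rng(2)
X = np.vstack([rng.random((150, 8)) < 0.9,
               rng.random((150, 8)) < 0.1]).astype(float)
scores = {k: fit_bernoulli_mixture(X, k)[1] for k in (1, 2, 3)}
best_k = min(scores, key=scores.get)
```

    This illustrates the abstract's point that raw maximum likelihood alone cannot pick K (likelihood never decreases as K grows); a complexity-penalized criterion such as ICOMP, or BIC in this sketch, is needed, and the genetic algorithm in the dissertation searches over predictor subsets on top of this.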

    Adaptive learning for event modeling and pattern classification

    It is crucial to detect, characterize, and model events of interest in a new propulsion system. As technology advances, the amount of data being generated increases significantly over time. This increase substantially strains our ability to interpret the data at an equivalent rate. It demands efficient methodologies and algorithms for automated event modeling and pattern recognition to detect and characterize events of interest and correlate them with system performance. The fact that the information required to properly evaluate system performance and health is seldom known in advance further exacerbates this issue. Event modeling and detection is essentially a discovery problem and involves the use of techniques from the pattern classification domain, specifically cluster analysis when a priori information is unavailable. In this dissertation, a framework called the Adaptive Learning for Event Modeling and Characterization (ALEC) system is proposed to deal with this problem. Within this framework, a wavelet-based hierarchical fuzzy clustering approach, which integrates several advanced technologies and overcomes the disadvantages of traditional clustering algorithms, is developed to make the implementation of the system effective and computationally efficient. In a separate but related line of research, a generalized multi-dimensional Gaussian membership function is constructed and formulated for fuzzy classification of blade engine damage modes among a group of engines with historical flight data, after Principal Component Analysis (PCA) is applied to reduce the excessive dimensionality. This approach can be effectively used for classification of patterns with overlapping structures, in which some patterns fall into more than one class or category.
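    A minimal sketch of a multivariate Gaussian membership function in a PCA-reduced space, assuming synthetic class means and covariances; the dissertation's generalized formulation is more elaborate:

```python
import numpy as np

def gaussian_membership(x, mean, cov):
    # mu(x) = exp(-0.5 * (x - mean)^T cov^{-1} (x - mean)): equals 1 at the
    # class center and decays with Mahalanobis distance from it.
    diff = x - mean
    return float(np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)))

# Two hypothetical damage-mode classes in a 2-D PCA-reduced space.
class_stats = [
    (np.array([0.0, 0.0]), np.eye(2)),
    (np.array([3.0, 0.0]), np.eye(2)),
]

x = np.array([1.0, 0.0])  # an observation between the two class centers
memberships = np.array([gaussian_membership(x, m, c) for m, c in class_stats])
fuzzy = memberships / memberships.sum()  # normalized fuzzy class labels
```

    Unlike a crisp classifier, the point receives a nonzero degree of membership in both classes, which is how overlapping damage-mode patterns can fall into more than one category.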