12,629 research outputs found
A General Spatio-Temporal Clustering-Based Non-local Formulation for Multiscale Modeling of Compartmentalized Reservoirs
Representing the reservoir as a network of discrete compartments with
neighbor and non-neighbor connections is a fast, yet accurate method for
analyzing oil and gas reservoirs. Automatic and rapid detection of coarse-scale
compartments with distinct static and dynamic properties is an integral part of
such high-level reservoir analysis. In this work, we present a hybrid framework
specific to reservoir analysis for the automatic detection of clusters in space
using spatial and temporal field data, coupled with a physics-based multiscale
modeling approach. The novel approach couples a physics-based non-local modeling
framework with data-driven clustering techniques to provide fast and accurate
multiscale modeling of compartmentalized reservoirs. This research also adds to
the literature by presenting a comprehensive study of spatio-temporal clustering
for reservoir applications that accounts for the clustering complexities, the
intrinsically sparse and noisy nature of the data, and the interpretability of
the outcome.
Keywords: Artificial Intelligence; Machine Learning; Spatio-Temporal
Clustering; Physics-Based Data-Driven Formulation; Multiscale Modeling
Mutual information based clustering of market basket data for profiling users
The attractiveness and commercial success of web sites depend heavily on the added value visitors find there. Here, individual, automatically obtained and maintained user profiles are the key to user satisfaction. Using a cooking information site as an example, this contribution shows how user profiles might be obtained from the category information provided by cooking recipes. It is shown that metric distance functions and standard clustering procedures lead to erroneous results. Instead, we propose a new mutual information based clustering approach and outline its implications for the example of user profiling.
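To make the mutual information idea concrete, the following minimal sketch estimates the mutual information between two discrete label sequences from their co-occurrence counts. The function name `mutual_information` and the toy inputs are assumptions made for illustration; the abstract does not specify how the authors compute or use the quantity inside their clustering procedure.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences.

    Hypothetical helper for illustration only; not the authors' procedure.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # co-occurrence counts
    px = Counter(xs)              # marginal counts of xs
    py = Counter(ys)              # marginal counts of ys
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


# Toy indicators: does a basket contain category A / category B? (made-up data)
has_a = [0, 1, 0, 1]
has_b = [0, 1, 0, 1]
print(mutual_information(has_a, has_b))
```

Unlike a metric distance on raw category vectors, this quantity depends only on how often labels co-occur, so it is unaffected by how the categories happen to be encoded.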
Growing Regression Forests by Classification: Applications to Object Pose Estimation
In this work, we propose a novel node splitting method for regression trees
and incorporate it into the regression forest framework. Unlike traditional
binary splitting, where the splitting rule is selected from a predefined set of
binary splitting rules via trial-and-error, the proposed node splitting method
first finds clusters of the training data which at least locally minimize the
empirical loss without considering the input space. Then splitting rules which
preserve the found clusters as much as possible are determined by casting the
problem into a classification problem. Consequently, our new node splitting
method enjoys more freedom in choosing the splitting rules, resulting in more
efficient tree structures. In addition to the Euclidean target space, we
present a variant which can naturally deal with a circular target space by the
proper use of circular statistics. We apply the regression forest employing our
node splitting to head pose estimation (Euclidean target space) and car
direction estimation (circular target space) and demonstrate that the proposed
method significantly outperforms state-of-the-art methods (38.5% and 22.5%
error reduction, respectively). Comment: Paper accepted by ECCV 201
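For the circular target space mentioned above, ordinary averaging breaks down near the wrap-around point (the arithmetic mean of 350° and 10° is 180°, the opposite direction). The sketch below shows the standard circular mean and circular distance; the function names are assumptions for illustration, and the abstract does not state which circular statistics the authors actually use.

```python
import math


def circular_mean(angles_deg):
    """Mean direction in degrees, in [0, 360), via summed unit vectors.

    Undefined when the vectors cancel out (e.g. 0 and 180 degrees).
    """
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0


def circular_distance(a_deg, b_deg):
    """Smallest absolute angular difference between two directions, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


# Two car directions on either side of the wrap-around point (made-up values)
print(circular_distance(350.0, 10.0))
print(circular_distance(circular_mean([350.0, 10.0]), 0.0))
```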
Development of Multi-Locus Variable Number Tandem Repeat Analysis for Outbreak Detection of Neisseria meningitidis
Neisseria meningitidis is a major cause of septicemia and meningitis worldwide. Traditional typing methods like pulsed-field gel electrophoresis (PFGE) for identifying outbreaks are subjective and time-consuming. Multi-locus variable number tandem repeats analysis (MLVA) is an objective typing method amenable to automation that has been used to type other bacterial pathogens. This report describes the development of MLVA for outbreak detection of N. meningitidis. Tandem Repeats Finder software was used to identify variable number tandem repeats (VNTRs) from 3 sequenced N. meningitidis genomes. PCR amplification of identified VNTRs was performed on DNA from 7 serogroup representative isolates. PCR products were sequenced and repeats were manually counted. VNTR loci identified by this screen were evaluated on a collection of 46 outbreak and sporadic serogroup C isolates. Alleles at each locus were concatenated to define the MLVA type for each isolate. Minimum spanning tree (MST) analysis was performed to determine the genetic relationships among the isolates. The genetic distance was defined as the summed tandem repeat difference (STRD) between isolates' MLVA types. Outbreak clusters were defined by an STRD less than or equal to 3. These data were compared to PFGE data to determine the utility of MLVA for outbreak detection. Twenty-one VNTR loci with variable copy numbers among the sequenced genomes were identified that met the established criteria of short repeat length and consensus sequence > 85%. Seven VNTR loci were reliably amplified among the 7 serogroups tested. These loci had repeat lengths between 4 and 20 nucleotides and exhibited between 10 and 26 alleles among 61 isolates belonging to 7 different serogroups. MST analysis with 7 loci differentiated serogroups, discriminated sporadic isolates and identified 7 out of 8 serogroup C outbreaks. In summary, MLVA with 5 VNTR loci distinguished N. meningitidis isolates from 7 different serogroups and sporadic isolates within each serogroup. In addition, MLVA identified 88% of PFGE-defined serogroup C outbreaks. Further investigation of these and other outbreak-associated isolates is necessary to define the optimal combination of VNTR loci and to evaluate MST analysis criteria in order to determine the utility of MLVA for N. meningitidis outbreak detection.
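The distance and cluster rule described in this abstract can be sketched as follows. Two assumptions are made here and are not confirmed by the abstract: the STRD is taken to be the sum of absolute repeat-count differences across loci, and isolates are grouped by single linkage at the threshold (which gives the same groups as cutting minimum-spanning-tree edges longer than the threshold). The isolate names and repeat counts are invented for illustration.

```python
from itertools import combinations


def strd(type_a, type_b):
    """Summed tandem repeat difference, assumed to be the sum of absolute
    per-locus repeat-count differences between two MLVA types."""
    if len(type_a) != len(type_b):
        raise ValueError("MLVA types must cover the same loci")
    return sum(abs(a - b) for a, b in zip(type_a, type_b))


def outbreak_clusters(mlva_types, threshold=3):
    """Group isolates by single linkage: link two isolates if STRD <= threshold."""
    parent = {name: name for name in mlva_types}

    def find(x):
        # Union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(mlva_types, 2):
        if strd(mlva_types[a], mlva_types[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in mlva_types:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical isolates typed at 7 loci (repeat counts are made up)
isolates = {
    "A": (5, 12, 3, 8, 10, 4, 7),
    "B": (5, 13, 3, 8, 10, 4, 6),
    "C": (9, 2, 6, 8, 1, 4, 7),
    "D": (5, 13, 3, 9, 10, 4, 6),
}
print(outbreak_clusters(isolates))
```

With these toy values, A, B and D fall within an STRD of 3 of one another and form one candidate cluster, while C stays on its own as a sporadic isolate.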
Challenges in Short Text Classification: The Case of Online Auction Disclosure
Text classification is an important research problem in many fields. We examine a special case of textual content, namely short text. Examples of short text appear in a number of contexts such as online reviews, chat messages, Twitter feeds, etc. In this research, we examine short text for the purpose of classification in internet auctions. The “ask seller a question” forum of a large horizontal intermediary auction platform is used to conduct this research. We describe our approach to classification by examining various solution methods to the problem. The unsupervised K-Medoids clustering algorithm provides useful but limited insights into keyword extraction, while the supervised Naïve Bayes algorithm achieves, on average, around 65% classification accuracy. We then present a score-assigning approach to this issue which outperforms the other two methods. Finally, we discuss how our approach to short text classification can be used to analyse the effectiveness of internet auctions.
Integrated data-driven techniques for environmental pollution monitoring
The adverse health effects of tropospheric ozone around urban zones indicate a substantial risk for many segments of the population. This necessitates short-term forecasting so that evasive action can be taken on days conducive to ozone formation. Therefore it is important to study the ozone formation mechanisms and predict the ozone levels in a geographic region. Multivariate statistical techniques provide a very effective framework for the classification and monitoring of systems with multiple variables. Cluster analysis, sequence analysis and hidden Markov models (HMMs) are statistical methods which have been used in a wide range of studies to model the data structure. In this dissertation, we propose to formulate, implement and apply a data-driven computational framework for air quality monitoring and forecasting with application to ozone formation. The proposed framework integrates, in a unique way, advanced statistical data processing and analysis tools to investigate ozone formation mechanisms and predict the ozone levels in a geographic region. This dissertation focuses on cluster analysis for identification and classification of underlying mechanisms of a system and HMMs for predicting the occurrence of an extreme event in a system. The usefulness of the proposed methodology in air quality monitoring is demonstrated by applying it to study the ozone problem in the Houston, Texas and Baton Rouge, Louisiana regions. Hierarchical clustering is used to visualize air flow patterns at two time scales relevant for ozone buildup. First, clustering is performed at the hourly time scale to identify surface flow patterns. Then, sequencing is performed at the daily time scale to identify groups of days sharing similar diurnal cycles for the surface flow. Selection of appropriate numbers of air flow patterns allowed inference of regional transport and dispersion patterns for understanding population exposure to ozone.
This dissertation proposes to build HMMs for ozone prediction using air quality and meteorological measurements obtained from a network of surface monitors. A case study of the Houston, Texas region for the 2004 and 2005 ozone seasons indicates the capability of HMMs as a simpler forecasting tool.
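As a rough illustration of how an HMM can flag a likely extreme event, the sketch below runs the normalized forward algorithm to obtain the probability of each hidden state given the observations so far. The two regimes ("low", "high"), the observation symbols and every probability are invented for this example; the abstract does not describe the dissertation's actual model structure or parameters.

```python
def forward_filter(observations, start_p, trans_p, emit_p):
    """Normalized forward algorithm for a discrete HMM.

    Returns P(hidden state at the final time step | all observations).
    """
    states = list(start_p)
    # Initialization with the first observation
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    total = sum(alpha.values())
    alpha = {s: v / total for s, v in alpha.items()}
    # Recursion: propagate through the transitions, then weight by emission
    for obs in observations[1:]:
        alpha = {
            s: emit_p[s][obs] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
        total = sum(alpha.values())
        alpha = {s: v / total for s, v in alpha.items()}
    return alpha


# Hypothetical two-regime model; all numbers are made up for illustration
start_p = {"low": 0.8, "high": 0.2}
trans_p = {
    "low": {"low": 0.9, "high": 0.1},
    "high": {"low": 0.3, "high": 0.7},
}
emit_p = {
    "low": {"mild": 0.8, "hot": 0.2},
    "high": {"mild": 0.3, "hot": 0.7},
}

posterior = forward_filter(["hot", "hot", "hot"], start_p, trans_p, emit_p)
print(posterior)
```

Normalizing at each step keeps the values as proper probabilities and avoids numerical underflow on long observation sequences.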
- …