    A General Spatio-Temporal Clustering-Based Non-local Formulation for Multiscale Modeling of Compartmentalized Reservoirs

    Representing the reservoir as a network of discrete compartments with neighbor and non-neighbor connections is a fast yet accurate method for analyzing oil and gas reservoirs. Automatic and rapid detection of coarse-scale compartments with distinct static and dynamic properties is an integral part of such high-level reservoir analysis. In this work, we present a hybrid framework specific to reservoir analysis for automatic detection of clusters in space using spatial and temporal field data, coupled with a physics-based multiscale modeling approach. In this novel hybrid approach, we couple a physics-based non-local modeling framework with data-driven clustering techniques to provide fast and accurate multiscale modeling of compartmentalized reservoirs. This research also adds to the literature by presenting a comprehensive study of spatio-temporal clustering for reservoir applications that carefully considers the clustering complexities, the intrinsically sparse and noisy nature of the data, and the interpretability of the outcome. Keywords: Artificial Intelligence; Machine Learning; Spatio-Temporal Clustering; Physics-Based Data-Driven Formulation; Multiscale Modeling

    Mutual information based clustering of market basket data for profiling users

    The attraction and commercial success of web sites depend heavily on the additional value visitors find there. Individual user profiles that are obtained and maintained automatically are the key to user satisfaction. Using a cooking information site as an example, this contribution shows how user profiles can be obtained from the category information provided by cooking recipes. It is shown that metrical distance functions and standard clustering procedures lead to erroneous results. Instead, we propose a new mutual information based clustering approach and outline its implications for the example of user profiling.
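
The abstract does not say how mutual information enters the clustering, so the following is only a minimal sketch of the underlying quantity: the mutual information between two category indicators across user baskets. The basket contents and the `indicator` helper are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information in bits between two equally long label sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of each (x, y) pair
    px = Counter(xs)               # marginal counts of x
    py = Counter(ys)               # marginal counts of y
    # sum over observed pairs of p(x,y) * log2( p(x,y) / (p(x) p(y)) )
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


# Hypothetical baskets: the recipe categories each user looked at.
baskets = [
    {"pasta", "vegetarian"},
    {"pasta", "vegetarian"},
    {"dessert"},
    {"dessert", "baking"},
]


def indicator(category):
    """Presence or absence of a category in each basket (illustrative helper)."""
    return [category in basket for basket in baskets]


mi_pasta_vegetarian = mutual_information(indicator("pasta"), indicator("vegetarian"))
mi_pasta_baking = mutual_information(indicator("pasta"), indicator("baking"))
```

Categories that always occur together share the most information, while statistically independent categories score zero. A clustering procedure could use such scores as a similarity measure in place of a metric distance.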

    Growing Regression Forests by Classification: Applications to Object Pose Estimation

    In this work, we propose a novel node splitting method for regression trees and incorporate it into the regression forest framework. Unlike traditional binary splitting, where the splitting rule is selected from a predefined set of binary splitting rules via trial-and-error, the proposed node splitting method first finds clusters of the training data which at least locally minimize the empirical loss without considering the input space. Then splitting rules which preserve the found clusters as much as possible are determined by casting the problem into a classification problem. Consequently, our new node splitting method enjoys more freedom in choosing the splitting rules, resulting in more efficient tree structures. In addition to the Euclidean target space, we present a variant which can naturally deal with a circular target space by the proper use of circular statistics. We apply the regression forest employing our node splitting to head pose estimation (Euclidean target space) and car direction estimation (circular target space) and demonstrate that the proposed method significantly outperforms state-of-the-art methods (38.5% and 22.5% error reduction respectively). Comment: Paper accepted by ECCV 201
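
The two-stage idea described above can be illustrated with a toy sketch: first cluster the targets alone, then look for a rule on the inputs that reproduces the cluster labels. The choices here are assumptions made for illustration (scalar targets, two clusters found with a simple 2-means loop, and a one-feature threshold rule standing in for the classifier); the abstract does not specify the clustering method or the classifier used in the paper.

```python
def two_means_1d(targets, iters=20):
    """Cluster scalar targets into two groups (Lloyd's algorithm); return 0/1 labels."""
    centers = [min(targets), max(targets)]
    labels = [0] * len(targets)
    for _ in range(iters):
        # Assign each target to the nearer center, ignoring the inputs entirely.
        labels = [
            0 if abs(t - centers[0]) <= abs(t - centers[1]) else 1
            for t in targets
        ]
        # Move each center to the mean of its members.
        for k in (0, 1):
            members = [t for t, lab in zip(targets, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels


def best_threshold(features, labels):
    """Find the threshold on a scalar feature that best preserves the cluster labels."""
    values = sorted(set(features))
    best_thr, best_acc = None, -1.0
    for a, b in zip(values, values[1:]):
        thr = (a + b) / 2
        agree = sum((x > thr) == bool(lab) for x, lab in zip(features, labels))
        # Either side of the threshold may correspond to either cluster.
        acc = max(agree, len(labels) - agree) / len(labels)
        if acc > best_acc:
            best_thr, best_acc = thr, acc
    return best_thr, best_acc


# Hypothetical training data at one tree node.
features = [1, 2, 3, 7, 8, 9]
targets = [1.0, 1.2, 0.9, 5.1, 4.8, 5.0]

cluster_labels = two_means_1d(targets)
threshold, agreement = best_threshold(features, cluster_labels)
```

Because the clusters are found in the target space first, the splitting rule is chosen to separate groups that already have low within-group target spread, rather than being picked from a fixed candidate set by trial and error.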

    Development of Multi-Locus Variable Number Tandem Repeat Analysis for Outbreak Detection of Neisseria meningitidis

    Neisseria meningitidis is a major cause of septicemia and meningitis worldwide. Traditional typing methods like pulsed-field gel electrophoresis (PFGE) for identifying outbreaks are subjective and time consuming. Multi-locus variable number tandem repeats analysis (MLVA) is an objective typing method amenable to automation that has been used to type other bacterial pathogens. This report describes the development of MLVA for outbreak detection of N. meningitidis. Tandem Repeats Finder software was used to identify variable number tandem repeats (VNTRs) from 3 sequenced N. meningitidis genomes. PCR amplification of identified VNTRs was performed on DNA from 7 serogroup representative isolates. PCR products were sequenced and repeats were manually counted. VNTR loci identified by this screen were evaluated on a collection of 46 outbreak and sporadic serogroup C isolates. Alleles at each locus were concatenated to define the MLVA type for each isolate. Minimum spanning tree (MST) analysis was performed to determine the genetic relationships among the isolates. The genetic distance was defined as the summed tandem repeat difference (STRD) between the isolates' MLVA types. Outbreak clusters were defined by a STRD less than or equal to 3. These data were compared to PFGE data to determine the utility of MLVA for outbreak detection. Twenty-one VNTR loci with variable copy numbers among the sequenced genomes were identified that met the established criteria of short repeat length and consensus sequence > 85%. Seven VNTR loci were reliably amplified among the 7 serogroups tested. These loci had repeat lengths between 4 and 20 nucleotides and exhibited between 10 and 26 alleles among 61 isolates belonging to 7 different serogroups. MST analysis with 7 loci differentiated serogroups, discriminated sporadic isolates and identified 7 out of 8 serogroup C outbreaks. In summary, MLVA with 5 VNTR loci distinguished N. meningitidis isolates from 7 different serogroups and sporadic isolates within each serogroup. In addition, MLVA identified 88% of PFGE-defined serogroup C outbreaks. Further investigation of these and other outbreak-associated isolates is necessary to define the optimal combination of VNTR loci and to evaluate MST analysis criteria in order to determine the utility of MLVA for N. meningitidis outbreak detection.
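
The distance and cluster rule described above can be sketched in a few lines. This sketch assumes that an MLVA type is a tuple of repeat counts, one per locus, and that an outbreak cluster is a group of isolates connected by links with STRD of at most 3, which gives the same groups as cutting every minimum spanning tree edge longer than 3. The isolate names and repeat counts are hypothetical.

```python
from itertools import combinations


def strd(type_a, type_b):
    """Summed tandem repeat difference between two MLVA types."""
    if len(type_a) != len(type_b):
        raise ValueError("MLVA types must cover the same loci")
    return sum(abs(a - b) for a, b in zip(type_a, type_b))


def outbreak_clusters(mlva_types, threshold=3):
    """Group isolates linked, directly or through others, by STRD <= threshold."""
    parent = {name: name for name in mlva_types}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(mlva_types, 2):
        if strd(mlva_types[a], mlva_types[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in mlva_types:
        groups.setdefault(find(name), set()).add(name)
    # Isolates with no close neighbor are treated as sporadic, not as clusters.
    return [members for members in groups.values() if len(members) > 1]


# Hypothetical repeat counts at 7 loci for four isolates.
isolates = {
    "iso1": (4, 10, 7, 3, 12, 5, 8),
    "iso2": (4, 11, 7, 3, 12, 5, 8),
    "iso3": (4, 11, 7, 5, 12, 5, 8),
    "iso4": (9, 2, 15, 3, 6, 5, 1),
}
clusters = outbreak_clusters(isolates)
```

Here `iso1`, `iso2` and `iso3` fall into one cluster because each is within an STRD of 3 of another member, while `iso4` stays sporadic.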

    Challenges in Short Text Classification: The Case of Online Auction Disclosure

    Text classification is an important research problem in many fields. We examine a special case of textual content, namely short text. Examples of short text appear in a number of contexts such as online reviews, chat messages, Twitter feeds, etc. In this research, we examine short text for the purpose of classification in internet auctions. The “ask seller a question” forum of a large horizontal intermediary auction platform is used to conduct this research. We describe our approach to classification by examining various solution methods to the problem. The unsupervised K-Medoids clustering algorithm provides useful but limited insights into keyword extraction, while the supervised Naïve Bayes algorithm achieves, on average, around 65% classification accuracy. We then present a score assigning approach to this issue which outperforms the other two methods. Finally, we discuss how our approach to short text classification can be used to analyse the effectiveness of internet auctions.

    Integrated data-driven techniques for environmental pollution monitoring

    The adverse health effects of tropospheric ozone around urban zones indicate a substantial risk for many segments of the population. This necessitates short-term forecasts so that evasive action can be taken on days conducive to ozone formation. Therefore it is important to study the ozone formation mechanisms and predict the ozone levels in a geographic region. Multivariate statistical techniques provide a very effective framework for the classification and monitoring of systems with multiple variables. Cluster analysis, sequence analysis and hidden Markov models (HMMs) are statistical methods which have been used in a wide range of studies to model the data structure. In this dissertation, we propose to formulate, implement and apply a data-driven computational framework for air quality monitoring and forecasting with application to ozone formation. The proposed framework integrates, in a unique way, advanced statistical data processing and analysis tools to investigate ozone formation mechanisms and predict the ozone levels in a geographic region. This dissertation focuses on cluster analysis for identification and classification of underlying mechanisms of a system and HMMs for predicting the occurrence of an extreme event in a system. The usefulness of the proposed methodology in air quality monitoring is demonstrated by applying it to study the ozone problem in the Houston, Texas and Baton Rouge, Louisiana regions. Hierarchical clustering is used to visualize air flow patterns at two time scales relevant for ozone buildup. First, clustering is performed at the hourly time scale to identify surface flow patterns. Then, sequencing is performed at the daily time scale to identify groups of days sharing similar diurnal cycles for the surface flow. Selection of appropriate numbers of air flow patterns allowed inference of regional transport and dispersion patterns for understanding population exposure to ozone. This dissertation proposes to build HMMs for ozone prediction using air quality and meteorological measurements obtained from a network of surface monitors. The case study of the Houston, Texas region for the 2004 and 2005 ozone seasons indicates that HMMs are capable of serving as a simpler forecasting tool.
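
As a rough illustration of how an HMM can flag an upcoming extreme event, the sketch below runs the normalized forward recursion on a two-state model and then projects the state distribution one step ahead. Everything specific in it is an assumption for illustration: the two hidden states (index 0 for "normal", index 1 for "ozone-prone"), the discrete "low"/"high" observations, and all probabilities. None of these values come from the dissertation.

```python
def forward_filter(init, trans, emit, observations):
    """Return P(state | all observations so far) via the normalized forward recursion."""
    n = len(init)
    belief = [init[s] * emit[s][observations[0]] for s in range(n)]
    total = sum(belief)
    belief = [b / total for b in belief]
    for obs in observations[1:]:
        # Propagate the belief through the transition matrix ...
        predicted = [sum(belief[i] * trans[i][j] for i in range(n)) for j in range(n)]
        # ... then reweight by how well each state explains the new observation.
        belief = [predicted[j] * emit[j][obs] for j in range(n)]
        total = sum(belief)
        belief = [b / total for b in belief]
    return belief


def predict_next_state(belief, trans):
    """One-step-ahead state distribution given the current filtered belief."""
    n = len(belief)
    return [sum(belief[i] * trans[i][j] for i in range(n)) for j in range(n)]


# Hypothetical model parameters.
init = [0.8, 0.2]
trans = [[0.9, 0.1],
         [0.3, 0.7]]
emit = [{"low": 0.9, "high": 0.1},
        {"low": 0.2, "high": 0.8}]

belief = forward_filter(init, trans, emit, ["high", "high"])
next_day = predict_next_state(belief, trans)
```

In a forecasting setting, `next_day[1]` would be read as the probability that tomorrow falls in the ozone-prone state, and an alert could be raised when it exceeds a chosen threshold.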