12 research outputs found

    Report : data mining of life prediction data bases


    Physics Constrained Data-Driven Technique for Reservoir Proxy Model and Model Order Reduction

    In reservoir engineering, data-driven methodologies have been applied successfully to infer interwell connections and flow patterns in the subsurface, to reduce the order of reservoir simulation models, and to assist field development planning, including the history matching and performance prediction phases, for conventional and unconventional reservoirs. In this work, we propose to use data-driven methods to achieve two main objectives: (1) enhance model order reduction (MOR) techniques by accounting for sparsity in the data; and (2) develop reservoir simulation proxy models based solely on data. For the first objective, fast simulation algorithms based on reduced-order modeling have been developed to facilitate large-scale, computationally intensive reservoir simulation and optimization. Methods such as proper orthogonal decomposition (POD) and dynamic mode decomposition (DMD) have been used successfully to capture and predict the behavior of reservoir fluid flow efficiently. Non-intrusive techniques (e.g., DMD) are especially attractive because they are data-driven and do not require code modifications (equation free). To achieve our first objective, we apply the concept of sparsity from statistical learning: we further enhance the performance and reduce the dimension of standard DMD by investigating sparse approximations of the snapshots. The methods for the second objective fall into two categories: (1) building a proxy model by system identification; and (2) end-to-end production prediction with machine learning techniques. Although real-time data acquisition and analysis are becoming routine in many workflows (such as reservoir simulation), there is still a disconnect between raw data and traditional first-principles theory, in which conservation laws and phenomenological behavior are used to derive the underlying spatio-temporal evolution equations.
We propose to combine sparsity-promoting methods and machine learning techniques to find the governing equations from the spatio-temporal data series produced by a reservoir simulator. The idea is to connect data with the physical interpretation of the dynamical system. We achieve this by identifying the nonlinear ODE system of our discretized reservoir system. In addition, because production prediction is the ultimate goal of much reservoir simulation and modeling, various types of reservoir simulation have been developed to build efficient and accurate models that provide the most information about reserves and aid the decision-making process. The other proxy model we developed benefits from the evolution of machine learning techniques and the increasing availability of extensive historical data. The recurrent neural network (RNN) has proved useful for modeling sequence data. We apply RNNs to control parameter data and synthetic historical production data for better reservoir characterization and prediction. All of the above-mentioned MOR and proxy model developments will be tested on single- and two-phase fluid flow reservoir simulation problems.
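
The abstract above relies on DMD as a non-intrusive, snapshot-based reduction. As a rough illustration only, the following is a minimal sketch of standard (exact) DMD in NumPy, not the sparsity-enhanced variant the abstract proposes. The function name `dmd`, the toy 2x2 matrix `A`, and the initial state are all assumptions made for this example; a reservoir simulator would supply the snapshot matrix instead.

```python
import numpy as np

def dmd(snapshots, rank):
    """Standard exact DMD. snapshots has shape (n_states, n_times)."""
    # Pair each snapshot with its successor in time.
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    # Truncated SVD of the earlier snapshots.
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    # Reduced operator: A_tilde = U^H Y V S^-1 (division scales columns by 1/s).
    A_tilde = U.conj().T @ Y @ Vh.conj().T / s
    eigvals, W = np.linalg.eig(A_tilde)
    # Exact DMD modes: Phi = Y V S^-1 W.
    modes = Y @ Vh.conj().T / s @ W
    return eigvals, modes

# Hypothetical toy system x_{k+1} = A x_k standing in for simulator output.
A = np.array([[0.9, 0.1], [0.0, 0.5]])
states = [np.array([1.0, 1.0])]
for _ in range(9):
    states.append(A @ states[-1])
snapshots = np.column_stack(states)  # shape (2, 10)

eigvals, modes = dmd(snapshots, rank=2)
print(np.sort(eigvals.real))
```

Because the toy data come from a linear map, the recovered eigenvalues should match those of `A`; for nonlinear simulator output, DMD gives only a best-fit linear approximation.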

    A Knowledge Mining Approach for Effective Customer Relationship Management

    The problem with existing customer relationship management (CRM) systems is not a lack of information but the inability to differentiate useful information from chatter or even disinformation, and to exploit the richness of these heterogeneous information sources. This paper describes an improved text mining approach for automatically extracting association rules from collections of textual documents. It discovers association rules from keyword features extracted from the documents. The main contribution of the technique is that, in selecting the most discriminative keywords for association rule generation, the system combines syntactic and semantic relevance in its information retrieval scheme, which is integrated with XML technology. Experiments showed that the extracted association rules contain important features that form a sound basis for effective customer relationship management decisions. Compared with an existing system that uses the GARW algorithm, the improved text mining approach significantly reduces the number of large itemsets, and therefore reduces the generated rules to the more interesting ones, owing to the newly introduced semantic analysis component. It also reduces execution time compared with the GARW algorithm.
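
To make the notion of association rules over keyword features concrete, here is a minimal sketch that computes support and confidence for keyword pairs by brute force. It is neither GARW nor the paper's semantic selection scheme; the function names, thresholds, and the four toy documents are hypothetical.

```python
from itertools import combinations

def support(itemset, docs):
    """Fraction of documents that contain every keyword in itemset."""
    return sum(itemset <= d for d in docs) / len(docs)

def pair_rules(docs, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for keyword pairs."""
    keywords = sorted(set().union(*docs))
    rules = []
    for a, b in combinations(keywords, 2):
        s = support({a, b}, docs)
        if s < min_support:
            continue  # the pair is not a large (frequent) itemset
        for x, y in ((a, b), (b, a)):
            conf = s / support({x}, docs)  # confidence of the rule x -> y
            if conf >= min_confidence:
                rules.append((x, y, s, conf))
    return rules

# Hypothetical keyword sets extracted from four customer messages.
docs = [
    {"refund", "delay", "shipping"},
    {"refund", "delay"},
    {"shipping", "delay"},
    {"refund", "billing"},
]
rules = pair_rules(docs, min_support=0.5, min_confidence=0.7)
print(rules)  # [('shipping', 'delay', 0.5, 1.0)]
```

Only one rule survives both thresholds here, which mirrors the abstract's point that tighter keyword selection leaves fewer, more interesting rules.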

    Final Report : learning system for life prediction of infrastructure

    The project has further developed two programs for the industry partners, related to service life prediction and salt deposition. The program for the Queensland Department of Main Roads, which predicts salt deposition on different bridge structures at any point in Queensland, has been further refined by considering more variables. It was found that the height of the bridge significantly affects salt deposition levels only when very close to the coast. However, the effect of natural cleaning of salt by rainfall was incorporated into the program. The user interface allows selection of a location in Queensland, followed by a bridge component. The program then predicts the annual salt deposition rate and rates the likely severity of the environment. The service life prediction program for the Queensland Department of Public Works has been expanded to include 10 common building components in a variety of environments. Data mining procedures have been used to develop the program and increase the usefulness of the application. A Query Based Learning System (QBLS) has been developed, based on a data-centric model with extensions to support user interaction. The program draws on a number of sources of information about the service life of building components. These include the Delphi survey, the CSIRO Holistic model and a school survey. During the project, the Holistic model was modified for each building component and databases were generated for the locations of all Queensland schools. Experiments were carried out to verify the modelling and provide parameters for it. These included instrumentation of a downpipe, measurements of pH and chloride levels in leaf litter, EIS measurements and chromate leaching from Colorbond materials, and dose tests to measure corrosion rates of new materials. A further database was also generated for inclusion in the program through a large school survey.
Over 30 schools in a range of environments, from tropical coastal to temperate inland, were visited and the condition of the building components was rated on a scale of 0-5. The data was analysed and used to calculate an average service life for each component/material combination in those environments where sufficient examples were available.

    Unsupervised Spectral Ranking For Anomaly Detection

    Anomaly detection is the problem of finding deviations from expected normal patterns. A wide variety of applications, such as fraud detection for credit cards and insurance, medical image monitoring, network intrusion detection, and military surveillance, can be viewed as anomaly detection. Obtaining accurate labels, especially labels for anomalous cases, is costly and time consuming, if not practically infeasible, which makes supervised approaches less desirable for anomaly detection. In this thesis, we propose a novel unsupervised spectral ranking method for anomaly detection (SRA). Based on the first non-principal eigenvectors of Laplacian matrices, the proposed SRA can generate an anomaly ranking either with respect to a single majority class or with respect to multiple majority classes. The ranking type is chosen according to whether the percentage of instances in the smaller class (positive or negative) is larger than the expected upper bound of the anomaly ratio. We justify the proposed spectral ranking by establishing a connection between the unsupervised support vector machine optimization problem and the spectral Laplacian optimization problem. Using both synthetic and real data sets, we show that the proposed SRA is a meaningful and effective alternative to state-of-the-art unsupervised anomaly ranking methods. In addition, we show that, in certain scenarios, the unsupervised SRA method surpasses state-of-the-art unsupervised anomaly ranking methods in performance and in robustness to parameter tuning. Finally, we demonstrate that choosing an appropriate similarity measure remains crucial when applying the proposed SRA algorithm.
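
The idea of ranking by the first non-principal eigenvector of a graph Laplacian can be sketched loosely as follows, for the single-majority-class case only. This is not the thesis's SRA procedure: the Gaussian similarity, the bandwidth `sigma`, the degree rescaling, and the rule that orients the majority sign as negative are all assumptions chosen for illustration.

```python
import numpy as np

def spectral_anomaly_scores(points, sigma=1.0):
    """Higher score = more anomalous relative to a single majority class."""
    pts = np.asarray(points, dtype=float)
    # Gaussian similarity matrix (assumed kernel; self-similarity is 1).
    sq_dists = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma**2))
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # Symmetric normalized Laplacian: I - D^-1/2 W D^-1/2.
    L = np.eye(len(pts)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; column 1 is the
    # first non-principal eigenvector.
    _, vecs = np.linalg.eigh(L)
    z = d_inv_sqrt * vecs[:, 1]
    # The eigenvector sign is arbitrary; orient it so the majority is negative.
    if (z > 0).sum() > (z < 0).sum():
        z = -z
    return z

# Hypothetical data: a tight 4x5 grid of normal points plus one distant point.
cluster = [(0.1 * i, 0.1 * j) for i in range(4) for j in range(5)]
points = cluster + [(2.0, 2.0)]
scores = spectral_anomaly_scores(points)
print(int(np.argmax(scores)))  # index of the top-ranked anomaly
```

Changing `sigma` changes the similarity graph and can change the ranking, which is consistent with the abstract's remark that the choice of similarity measure remains crucial.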

    Measures and adjustments of pattern frequency distributions

    Frequent pattern mining over large databases is fundamental to many data mining applications, where pattern frequency distribution plays a central role. Various approaches have been proposed for pattern mining with respectable computational performance. However, the appropriate evaluation of pattern frequentness and the refinement of the mining result set have been largely ignored. This has created a set of problems in conventional mining approaches, which are identified in this thesis. Most conventional mining approaches evaluate pattern frequentness with an ill-formed "support" measure, and generate patterns in full enumeration mode, which produces an excessive number of patterns in an application. Consequently, the mining result sets exhibit, among other issues, overfitting and underfitting, probability anomaly, and bias for generated against original observations. Even worse, these results are delivered to users without any refinement. Overcoming these drawbacks is challenging, since the problems are philosophical rather than computational, and their resolution therefore demands a well-established theory to reform the mining foundations and to pursue graceful knowledge degeneration. Based on the problems identified, this thesis first proposes a reformulation of the frequentness measure, which effectively resolves the probability anomaly and other related issues. To deal with the full enumeration mode, we first explore a set of properties governing raw pattern frequency distributions, such that a number of important mining parameters can be predetermined. Based on these explorations, an approach to adjust the raw pattern frequency distributions is established and its theoretical merits are justified. This refinement theory shows that unconditional pattern reduction is achievable before domain constraints are imposed. The thesis then presents a maximum likelihood pattern sampling model and strategies to realize the adjustment.
Findings presented in this thesis are based on known set theory, combinatorics, and probability theory; they are theoretically fundamental and applicable to any item-based or keyword-based pattern mining and to the improvement of mining effectiveness. We expect that these findings will pave the way to replacing full enumeration pattern generation with a selective generation mode, which would then radically change the state of the art of pattern mining.
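
The complaint about full enumeration can be seen in a small sketch of the conventional approach: every itemset is enumerated and kept if its support clears a threshold. This shows only the conventional "support" measure that the thesis criticizes, not its reformulated measure or sampling model; the function name and toy transactions are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Brute-force full enumeration of itemsets with conventional support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            # Support = fraction of transactions containing the whole itemset.
            count = sum(set(candidate) <= t for t in transactions)
            if count / n >= min_support:
                result[candidate] = count / n
    return result

# Three identical four-item transactions plus one single-item transaction.
transactions = [{"a", "b", "c", "d"}] * 3 + [{"a"}]
patterns = frequent_itemsets(transactions, min_support=0.75)
print(len(patterns))  # 15
```

Two distinct observations yield 2^4 - 1 = 15 reported patterns, so the output grows exponentially with transaction width even when the underlying data carry little information.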

    Discover, recycle and reuse frequent patterns in association rule mining

    Ph.D., Doctor of Philosophy

    Investigating the Evolution of Electronic Markets

    Markets evolve through ‘entrepreneurial’ intervention which is based on intuition and on timely information. An electronic market has been constructed in the laboratory as a collaborative virtual environment to identify timely entrepreneurial information for e-markets. This information is distilled from individual signals in the markets themselves and from signals observed on the Internet. Distributed, concurrent, time-constrained data mining methods are managed using business process management technology to extract timely, reliable information from this inherently unreliable environment.

    Effect of Industry 4.0 on Reducing the Occupational Fatigue Downtime using Constrained Data Mining Clustering Aided with Rule Induction
