
    Learning Energy Consumption and Demand Models through Data Mining for Reverse Engineering

    Power plants have traditionally estimated energy demand from historical energy-use data for the region(s) they supply. Regression analysis, artificial neural networks and Bayesian methods are the most common approaches for analysing these data. Such data and techniques do not generate reliable results. Consequently, excess energy has to be generated to prevent blackouts, the causes of energy surges are not easily determined, and the potential energy-use reductions from energy-efficiency solutions usually do not translate into actual reductions. The paper highlights the weaknesses of traditional techniques and lays out a framework to improve the prediction of energy demand by combining energy-use models of equipment, physical systems and buildings with the proposed data mining algorithms for reverse engineering. The research team first analyses data samples from large, complex energy datasets and then presents a set of computationally efficient data mining algorithms for reverse engineering. To develop a structural system model for reverse engineering, two focus groups that relate directly to the cause and effect variables are developed. The research findings include testing different sets of reverse engineering algorithms, understanding their output patterns, and modifying the algorithms to improve the accuracy of their outputs.
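For contrast with the proposed framework, the traditional regression baseline the abstract criticizes can be sketched in a few lines. This is a minimal sketch assuming a single explanatory variable (daily temperature) and made-up data; the names `fit_linear`, `temps` and `demand` are hypothetical, and this is not the authors' reverse-engineering method.

```python
from statistics import mean


def fit_linear(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for demand = a + b * x; returns (a, b)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("explanatory values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx          # slope: extra demand per unit of x
    a = y_bar - b * x_bar  # intercept
    return a, b


# Hypothetical history: daily mean temperature (degrees C) vs. regional demand (MWh).
temps = [10.0, 15.0, 20.0, 25.0, 30.0]
demand = [500.0, 520.0, 560.0, 610.0, 650.0]
a, b = fit_linear(temps, demand)
forecast = a + b * 28.0  # point forecast for a 28-degree day
print(f"demand ~ {a:.1f} + {b:.2f} * temp; forecast(28) = {forecast:.1f} MWh")
```

Such a fit only encodes the historical correlation between temperature and demand; it cannot say which equipment or building systems cause a surge, which is the gap the abstract's framework targets.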

    Customer lifetime value : an integrated data mining approach

    Customer Lifetime Value (CLV), a measure of the profit-generating potential, or value, of a customer, is increasingly considered a touchstone for customer relationship management. As the guide and benchmark for Customer Relationship Management (CRM) applications, CLV analysis has received increasing attention from both marketing practitioners and researchers in different domains. The central challenge in predicting CLV is the precise calculation of a customer's length of service (LOS). Several statistical approaches address this problem, and researchers have used them to perform survival analysis in different domains. However, classical survival analysis techniques such as the Kaplan-Meier estimator, which offers a fully non-parametric estimate, ignore covariates completely and assume that churn behavior is stationary over time, which makes them less practical. Furthermore, segments of customers whose lifetimes and covariate effects vary widely are not necessarily easy to detect. As in many other applications, data mining has recently emerged as a compelling analysis tool for CLV. Data mining methods offer an interesting alternative because they are less restrictive than conventional statistical approaches. Customer databases contain histories of vital events such as the acquisition and cancellation of products and services, and these historical data are used to build predictive models for customer retention, cross-selling and other database marketing endeavors. In this research project we investigate combining these statistical approaches with data mining methods to improve performance on the CLV problem in a real business context. Part of the research effort focuses on precisely predicting customers' LOS in a real-world business.
Using conventional statistical approaches and data mining methods in tandem, we demonstrate how data mining tools can be apt complements to classical statistical models, resulting in a CLV prediction model that is both accurate and understandable. We also evaluate the ability of the proposed integrated method to extract interesting business domain knowledge within the scope of the CLV problem. In particular, several data mining methods are discussed and evaluated according to their prediction accuracy and the interpretability of their results. The research findings point to a data mining method combined with survival analysis approaches as a robust tool for modeling CLV and for assisting management decision-making. A calling plan strategy for the telecommunication industry is designed based on the predicted survival time and the calculated CLV, and it is used to investigate further business knowledge derived from the calculated CLV.
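The Kaplan-Meier limitation mentioned above is easiest to see in code: the estimator uses only each customer's duration and churn flag, with no covariates. The sketch below pairs it with a simple discounted-margin CLV formula. The CLV formula, the monthly granularity and all names (`kaplan_meier`, `survival_at`, `lifetime_value`) are illustrative assumptions, not the project's integrated model.

```python
def kaplan_meier(durations: list[int], churned: list[bool]) -> list[tuple[int, float]]:
    """Kaplan-Meier estimate of S(t), returned as (event time, survival) steps.

    durations: months each customer has been observed.
    churned:   True if the customer churned at that time, False if still active (censored).
    """
    pairs = sorted(zip(durations, churned))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        events = removed = 0
        # Group all customers whose observation ends at time t.
        while i < len(pairs) and pairs[i][0] == t:
            events += pairs[i][1]
            removed += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= removed  # churned and censored customers both leave the risk set
    return curve


def survival_at(curve: list[tuple[int, float]], t: int) -> float:
    """Step-function lookup: last estimate at or before t (1.0 before any churn)."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


def lifetime_value(curve, monthly_margin: float, monthly_rate: float, horizon: int) -> float:
    """Hypothetical CLV: expected discounted margin over `horizon` months."""
    return sum(
        monthly_margin * survival_at(curve, t) / (1 + monthly_rate) ** t
        for t in range(1, horizon + 1)
    )


# Toy cohort of six customers (made-up numbers).
curve = kaplan_meier([3, 5, 5, 8, 12, 12], [True, True, False, True, False, True])
clv = lifetime_value(curve, monthly_margin=10.0, monthly_rate=0.01, horizon=12)
```

Because every customer shares one survival curve, two customers with very different usage patterns receive the same CLV here, which is the gap that covariate-aware data mining models are meant to close.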

    An approach for data mining of electronic health record data for suicide risk management: Database analysis for clinical decision support

    Background: In an electronic health context, combining traditional structured clinical assessment methods and routine electronic health-based data capture may be a reliable method to build a dynamic clinical decision-support system (CDSS) for suicide prevention. Objective: The aim of this study was to describe the data mining module of a Web-based CDSS and to identify suicide repetition risk in a sample of suicide attempters. Methods: We analyzed a database of 2802 suicide attempters. Clustering methods were used to identify groups of similar patients, and regression trees were applied to estimate the number of suicide attempts among these patients. Results: We identified 3 groups of patients using clustering methods. In addition, relevant risk factors explaining the number of suicide attempts were highlighted by regression trees. Conclusions: Data mining techniques can help to identify different groups of patients at risk of suicide reattempt. The findings of this study can be combined with Web-based and smartphone-based data to improve dynamic decision making for clinicians. This study received a Hospital Clinical Research Grant (PHRC 2009) from the French Health Ministry. None of the funding sources had any involvement in the study design; collection, analysis, or interpretation of data; writing of the report; or the decision to submit the paper for publication. This study was funded partially by Instituto de Salud Carlos III (ISCIII PI13/02200; PI16/01852), Delegación del Gobierno para el Plan Nacional de Drogas (20151073), and the American Foundation for Suicide Prevention (LSRG-1-005-16)
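The abstract does not specify the clustering algorithm or the tree settings, so the sketch below shows only the core step of a CART-style regression tree: choosing the single threshold on one numeric feature that minimises squared error in the two child nodes. The data are synthetic and unrelated to the patient database, and `best_split` is a hypothetical helper name; a full tree would apply it recursively to each child.

```python
def sse(values: list[float]) -> float:
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Choose the threshold on one numeric feature that minimises the total SSE
    of the two child nodes, as in a single node of a CART-style regression tree.
    Returns (threshold, mean of left child, mean of right child)."""
    pairs = sorted(zip(xs, ys))
    best_err, best = float("inf"), None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between identical feature values
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        err = sse(left) + sse(right)
        if err < best_err:
            threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
            best_err = err
            best = (threshold, sum(left) / len(left), sum(right) / len(right))
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best


# Synthetic example: a made-up numeric feature and a made-up count outcome.
feature = [1, 2, 3, 10, 11, 12]
count = [0, 1, 0, 3, 4, 3]
threshold, left_mean, right_mean = best_split(feature, count)
```

Each leaf predicts the mean outcome of its training rows, which is what makes the resulting risk factors readable as simple "if feature exceeds threshold" rules.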

    Fused visualization of complex information spaces

    University of Technology, Sydney. Faculty of Information Technology. With the rapid growth of information analysis and data mining technologies, the massive data sets available for access have been merged and refined into many kinds of information, including raw data and all kinds of analytical results. As data sets become increasingly complex, current visual analytical techniques no longer satisfy the needs of data exploration and analysis. This situation raises two challenges for the current state of information visualization: 1) because of the complexity of the information, it is sometimes impossible to model intricate information well with a single visual metaphor in a single visualization; 2) each existing visualization method has its own limitations in satisfying domain-specific requirements when dealing with complex data sets. The proposed fused visualization methodology attempts to address these issues by combining multiple existing visualization techniques in a single visualization, exploiting the strengths and reducing the weaknesses of the existing methods. We have successfully applied this methodology to each stage of the proposed Analytical Information Visualization. In particular, three fused visualization techniques are developed to improve on existing techniques. First, a fused visual metaphor that combines two visual metaphors in a single visualization allows users to navigate spatially referenced information across the two metaphors. Second, a fused layout algorithm that combines two graph drawing methods achieves fast convergence of the geometric layout for the force-directed layout algorithm. Third, a fused viewing technique that combines 1D and 2D distortion-based viewing methods in one browser resolves the problem of inefficient space utilization. Moreover, the fused layout algorithm has been evaluated against other existing force-directed layout algorithms.
Two case studies applying our techniques to an outbreak management system and an online bookstore, respectively, have been delivered.
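The abstract does not say which two graph drawing methods the fused layout combines. For orientation, here is a minimal sketch of a standard force-directed baseline in the Fruchterman-Reingold style, the family of algorithms it was evaluated against. The circular start, the 0.95 cooling factor, the iteration count and the function name `force_directed_layout` are all assumptions made for illustration.

```python
import math


def force_directed_layout(
    n: int, edges: list[tuple[int, int]], size: float = 1.0, iterations: int = 100
) -> list[tuple[float, float]]:
    """Fruchterman-Reingold-style spring-electrical layout for nodes 0..n-1."""
    k = size / math.sqrt(n)  # ideal edge length for the drawing area
    # Deterministic start: nodes evenly spaced on a circle.
    pos = [
        [0.5 * size * math.cos(2 * math.pi * i / n), 0.5 * size * math.sin(2 * math.pi * i / n)]
        for i in range(n)
    ]
    temperature = size / 10  # maximum displacement per iteration
    for _ in range(iterations):
        disp = [[0.0, 0.0] for _ in range(n)]
        # Repulsion between every pair of nodes: k^2 / d.
        for i in range(n):
            for j in range(i + 1, n):
                dx, dy = pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[i][0] += dx / d * f
                disp[i][1] += dy / d * f
                disp[j][0] -= dx / d * f
                disp[j][1] -= dy / d * f
        # Attraction along edges: d^2 / k.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node along its net force, capped by the temperature.
        for i in range(n):
            length = math.hypot(*disp[i])
            if length > 0:
                step = min(length, temperature)
                pos[i][0] += disp[i][0] / length * step
                pos[i][1] += disp[i][1] / length * step
        temperature *= 0.95  # cooling schedule
    return [(x, y) for x, y in pos]


# A 4-cycle with one chord.
layout = force_directed_layout(4, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)])
```

The all-pairs repulsion loop costs O(n^2) per iteration, so the number of iterations needed to converge dominates the running time; this is why convergence speed is the natural yardstick for comparing layout algorithms of this kind.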

    A Dependency Parsing Approach to Biomedical Text Mining

    Biomedical research is currently facing a new type of challenge: an excess of information, both in terms of raw data from experiments and in the number of scientific publications describing their results. Mirroring the focus on data mining techniques to address the issues of structured data, there has recently been great interest in the development and application of text mining techniques to make more effective use of the knowledge contained in biomedical scientific publications, which is accessible only in the form of natural human language. This thesis describes research done within the broader scope of projects aiming to develop methods, tools and techniques for text mining tasks in general and for the biomedical domain in particular. More specifically, the work described here aims to extract information from statements concerning relations between biomedical entities, such as protein-protein interactions. The approach combines full parsing (syntactic analysis of the entire structure of sentences) with machine learning, aiming to develop reliable methods that can also be generalized to other domains. The five papers at the core of this thesis describe research on a number of distinct but related topics in text mining. In the first of these studies, we assessed the applicability of two popular general English parsers to biomedical text mining and, finding their performance limited, identified several specific challenges to accurate parsing of domain text. In a follow-up study focusing on parsing issues related to specialized domain terminology, we evaluated three lexical adaptation methods. We found that the accurate resolution of unknown words can considerably improve parsing performance, and we introduced a domain-adapted parser that reduced the error rate of the original by 10% while also roughly halving parsing time.
To establish the relative merits of parsers that differ in the applied formalisms and the representation given to their syntactic analyses, we have also developed an evaluation methodology, considering different approaches to establishing comparable dependency-based evaluation results. We introduced a methodology for creating highly accurate conversions between different parse representations, demonstrating the feasibility of unifying diverse syntactic schemes under a shared, application-oriented representation. In addition to allowing formalism-neutral evaluation, we argue that such unification can also increase the value of parsers for domain text mining. As a further step in this direction, we analysed the characteristics of publicly available biomedical corpora annotated for protein-protein interactions and created tools for converting them into a shared form, thus also contributing to the unification of text mining resources. The introduced unified corpora allowed us to perform a task-oriented comparative evaluation of biomedical text mining corpora. This evaluation established clear limits on the comparability of results for text mining methods evaluated on different resources, prompting further efforts toward standardization. To support this and other research, we have also designed and annotated BioInfer, the first domain corpus of its size combining annotation of syntax and biomedical entities with a detailed annotation of their relationships. The corpus represents a major design and development effort of the research group, with manual annotation that identifies over 6000 entities, 2500 relationships and 28,000 syntactic dependencies in 1100 sentences. In addition to combining these key annotations for a single set of sentences, BioInfer was also the first domain resource to introduce a representation of entity relations that is supported by ontologies and able to capture complex, structured relationships.
Part I of this thesis presents a summary of this research in the broader context of a text mining system, and Part II contains reprints of the five included publications.
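Dependency-based parser evaluation is commonly reported as unlabeled and labeled attachment scores (UAS and LAS). The sketch below assumes that gold and predicted parses already share token indices and a label set, which is exactly what a conversion to a shared representation is meant to provide; the conversion itself is not shown. The sentence, the labels and the helper names `Arc` and `attachment_scores` are illustrative, not taken from the thesis.

```python
from typing import NamedTuple


class Arc(NamedTuple):
    head: int   # 1-based index of the governing token; 0 marks the root
    label: str  # dependency relation label


def attachment_scores(gold: list[Arc], pred: list[Arc]) -> tuple[float, float]:
    """Unlabeled (UAS) and labeled (LAS) attachment scores for one sentence."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and predicted parses must cover the same non-empty token list")
    uas_hits = sum(g.head == p.head for g, p in zip(gold, pred))
    las_hits = sum(g == p for g, p in zip(gold, pred))  # head and label must both match
    return uas_hits / len(gold), las_hits / len(gold)


# Illustrative sentence: "RAD51 interacts with BRCA2" (labels in Universal Dependencies style).
gold = [Arc(2, "nsubj"), Arc(0, "root"), Arc(4, "case"), Arc(2, "obl")]
pred = [Arc(2, "nsubj"), Arc(0, "root"), Arc(4, "case"), Arc(2, "nmod")]
uas, las = attachment_scores(gold, pred)  # BRCA2 is attached correctly but mislabeled
```

If the two parses came from different syntactic schemes, the same correct analysis could be scored as wrong purely because of label or attachment conventions, which is why formalism-neutral evaluation depends on accurate conversions.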

    One-Class Classification: Taxonomy of Study and Review of Techniques

    One-class classification (OCC) algorithms aim to build classification models when the negative class is absent, poorly sampled or not well defined. This situation constrains the learning of efficient classifiers, because the class boundary must be defined using knowledge of the positive class alone. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general OCC problem through a taxonomy of study for OCC problems, based on the availability of training data, the algorithms used and the application domains. We further delve into each category of the proposed taxonomy and present a comprehensive literature review of OCC algorithms, techniques and methodologies, focusing on their significance, limitations and applications. We conclude by discussing some open research problems in the field of OCC and presenting our vision for future research. Comment: 24 pages + 11 pages of references, 8 figures
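To make "defining the class boundary with knowledge of the positive class alone" concrete, here is one of the simplest boundary-based ideas: accept a point if its distance to the positive-class centroid is within a chosen quantile of the training distances. It is chosen purely for illustration and is not an algorithm from the review; the class name `CentroidOneClass` and the default quantile of 0.95 are assumptions.

```python
import math


class CentroidOneClass:
    """Minimal one-class classifier trained on positive examples only.

    A point is accepted if it lies within the `quantile` of training distances
    to the positive-class centroid; everything else is treated as an outlier.
    """

    def __init__(self, quantile: float = 0.95):
        if not 0 < quantile <= 1:
            raise ValueError("quantile must be in (0, 1]")
        self.quantile = quantile

    def fit(self, points: list[tuple[float, ...]]) -> "CentroidOneClass":
        if not points:
            raise ValueError("need at least one positive example")
        dim = len(points[0])
        self.centroid = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
        dists = sorted(math.dist(p, self.centroid) for p in points)
        # Radius = empirical quantile of training distances (the learned class boundary).
        idx = min(len(dists) - 1, math.ceil(self.quantile * len(dists)) - 1)
        self.radius = dists[idx]
        return self

    def predict(self, point: tuple[float, ...]) -> bool:
        """True if `point` looks like the positive class, False if it is flagged as novel."""
        return math.dist(point, self.centroid) <= self.radius


# Positive examples only; no negative class is available at training time.
model = CentroidOneClass(quantile=0.95).fit([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)])
```

The quantile trades false alarms against missed outliers: lowering it shrinks the accepted region, which is the knob a practitioner tunes when no negative examples exist to validate against.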