888 research outputs found

    A Hybrid Mining Approach to Facilitate Health Insurance Decision: Case Study of Non-Traditional Data Mining Applications in Taiwan NHI Databases

    This study examines time-sensitive applications of data mining methods to facilitate claims review processing and provide policy information for insurance decision-making, using the Taiwan National Health Insurance (NHI) databases. To improve payment management, a hybrid mining approach is proposed that is grounded in existing knowledge of data mining projects and of the health insurance domain. By integrating data warehousing, online analytical processing, data mining techniques and traditional data analysis from the healthcare field, an easy-to-use decision support platform is built to facilitate the health insurance decision-making process. Lessons learned from the case study indicate that the hybrid mining approach is a reliable, powerful, and user-friendly platform for diversified payment decision support, and that it is also highly relevant to the practice and acceptance of evidence-based medicine. Researchers should, in future work, develop the hybrid mining approach in combination with their own application systems.

    Determination of Association Rules with Market Basket Analysis: Application in the Retail Sector

    Market basket analysis is the process of extracting purchasing trends from records in company databases, taking into account the products that customers buy in a single transaction. In this study, a market basket analysis was conducted on five and a half years of data from a large hardware company operating in the retail sector, and related product categories were identified. To determine the association rules, the Apriori and FP-Growth algorithms were run separately and their usefulness on this data set was compared. In addition, the data were divided into Data Set-1 and Data Set-2, and the consistency of the rules was assessed by comparing the rules extracted from the first data set with those derived from the second, which covers the consecutive time period.
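The comparison of rules across Data Set-1 and Data Set-2 can be illustrated with a minimal sketch. It covers only the pair-counting step that underlies frequent-itemset mining, not a full Apriori or FP-Growth implementation, and it does not reproduce the study's procedure. The transactions, the function names `frequent_pairs` and `shared_fraction`, and the 0.5 support threshold are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return {pair: support} for item pairs bought together often enough."""
    counts = Counter()
    for basket in transactions:
        # Sort so that each pair has a single canonical ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

def shared_fraction(pairs_1, pairs_2):
    """Share of pairs frequent in the first set that are also frequent in the second."""
    if not pairs_1:
        return 0.0
    return len(set(pairs_1) & set(pairs_2)) / len(pairs_1)

# Hypothetical toy baskets standing in for two consecutive time periods.
data_set_1 = [
    {"hammer", "nails"},
    {"hammer", "nails", "tape"},
    {"hammer", "tape"},
    {"nails", "screws"},
]
data_set_2 = [
    {"hammer", "nails"},
    {"hammer", "nails"},
    {"tape", "screws"},
    {"nails", "screws"},
]

pairs_1 = frequent_pairs(data_set_1, min_support=0.5)
pairs_2 = frequent_pairs(data_set_2, min_support=0.5)
consistency = shared_fraction(pairs_1, pairs_2)
```

A value of `consistency` close to 1 would suggest that the associations found in the earlier period persist in the later one; a real analysis would compare full rules with their confidence values rather than pairs alone.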

    Ant colony optimization approach for stacking configurations

    In data mining, classifiers are generated to predict the class labels of instances. An ensemble is a decision-making system that applies certain strategies to combine the predictions of different classifiers and generate a collective decision. Previous research has demonstrated, empirically and theoretically, that an ensemble classifier can be more accurate and stable than its component classifiers in most cases. Stacking is a well-known ensemble that adopts a two-level structure: base-level classifiers generate predictions, and a meta-level classifier makes the collective decision. A consequential problem is which learning algorithms should be used to generate the base-level and meta-level classifiers in the Stacking configuration. It is not easy to find a suitable configuration for a specific dataset. In some early works, the selection of a meta classifier and its training data was the major concern. Recently, researchers have tried to apply metaheuristic methods to optimize the configuration of the base classifiers and the meta classifier. Ant Colony Optimization (ACO), which is inspired by the foraging behavior of real ant colonies, is one of the most popular metaheuristics. In this work, we propose a novel ACO-Stacking approach that uses ACO to tackle the Stacking configuration problem. This work is the first to apply ACO to the Stacking configuration problem. Different implementations of the ACO-Stacking approach are developed. The first version identifies appropriate learning algorithms for generating the base-level classifiers while using a fixed algorithm to create the meta-level classifier. The second version simultaneously finds suitable learning algorithms for both the base-level classifiers and the meta-level classifier. Moreover, we study how different kinds of local information about the classifiers affect the classification results. Several pieces of local information collected in the initial phase of ACO-Stacking are considered, such as the precision and f-measure of each classifier and the correlative differences of paired classifiers. A series of experiments is performed to compare the ACO-Stacking approach with other ensembles on a number of datasets from different domains and of different sizes. The experiments show that the new approach can achieve promising results and gain advantages over other ensembles. The correlative differences of the classifiers could be the best local information in this approach. Under the agile ACO-Stacking framework, an application to a direct marketing problem is explored. A real-world database from a US-based catalog company, containing more than 100,000 customer marketing records, is used in the experiments. The results indicate that our approach can gain more cumulative response lift and cumulative profit lift in the top deciles. In conclusion, it is competitive with some well-known conventional and ensemble data mining methods.
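The two-level Stacking structure described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the ACO-Stacking method: there is no ant colony search, the base learners are simple one-feature threshold rules, the meta learner is a lookup table, and the meta level is trained on a separate hold-out split rather than on cross-validated predictions. All names (`train_stump`, `train_meta`, `fit_stacking`) and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def train_stump(X, y, j):
    """Base-level learner: threshold feature j at the midpoint of the two class means."""
    mean0 = sum(x[j] for x, label in zip(X, y) if label == 0) / y.count(0)
    mean1 = sum(x[j] for x, label in zip(X, y) if label == 1) / y.count(1)
    threshold = (mean0 + mean1) / 2
    high_class = 1 if mean1 > mean0 else 0
    return lambda x: high_class if x[j] > threshold else 1 - high_class

def train_meta(base_preds, y):
    """Meta-level learner: majority label observed for each tuple of base predictions."""
    votes = defaultdict(Counter)
    for preds, label in zip(base_preds, y):
        votes[preds][label] += 1
    table = {preds: c.most_common(1)[0][0] for preds, c in votes.items()}
    # Prediction tuples never seen in training fall back to a plain majority vote.
    return lambda preds: table.get(preds, Counter(preds).most_common(1)[0][0])

def fit_stacking(X_base, y_base, X_meta, y_meta):
    """Fit one stump per feature, then fit the meta learner on their hold-out predictions."""
    base = [train_stump(X_base, y_base, j) for j in range(len(X_base[0]))]
    meta_inputs = [tuple(clf(x) for clf in base) for x in X_meta]
    meta = train_meta(meta_inputs, y_meta)
    return lambda x: meta(tuple(clf(x) for clf in base))

# Hypothetical toy data: feature 0 separates the classes, feature 1 is mostly noise.
X_base = [(1.0, 5.0), (2.0, 1.0), (8.0, 4.0), (9.0, 3.0)]
y_base = [0, 0, 1, 1]
X_meta = [(1.5, 4.0), (2.5, 2.0), (7.5, 1.0), (8.5, 5.0)]
y_meta = [0, 0, 1, 1]

predict = fit_stacking(X_base, y_base, X_meta, y_meta)
```

In this toy setting the meta level learns to follow the stump on feature 0 and to disregard the noisy one. The configuration problem the abstract raises is the choice of which learners to place at each level, which the sketch fixes by hand.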

    Human Aspect on Chain of Custody (CoC) System Performance

    Tropical forests cover 24% of tropical land area. They are the most productive terrestrial ecosystems on earth and a high priority for biodiversity conservation. These forests store a substantial amount of carbon in biomass and soil, and they also regulate the transfer of carbon into the atmosphere as carbon dioxide (CO2). Indonesia has the third-largest tropical forest area in the world, after Brazil and Congo. For over 50 years, forest has been felled both legally and illegally. The high rate of forest degradation has resulted from unsustainable forest management, rampant illegal logging, forest area encroachment, conversion and natural disasters. All of this calls for rapid improvement of the management system for Indonesia’s forest resources (Holmes, 2002). Forest certification is one tool that can support the goal of sustainable forest management. Under the joint certification protocol currently operated in Indonesia by the Forest Stewardship Council (FSC) and the Indonesian Ecolabelling Institute (LEI), forest management units must be able to show the performance required by the LEI criteria and indicators as well as by the FSC principles and criteria to attain certification of their products. The gap between current practices and the performance required by forest certification schemes is still enormous. The performance of the LEI forest certification system depends heavily on the people involved in planning and operation. The certification system in question is chain of custody (CoC) certification. CoC operation involves activities such as tracing raw material from the forest to the factory, through shipping and manufacturing, to the final product. In all of these processes the human role is critical, although the specific roles differ from one process to another. In this paper we identify the human aspects and other factors that predominantly affect CoC system performance.

    Determinación de patrones predictivos de consumo en grandes volúmenes de datos, usando métodos de Data Mining (Determining Predictive Consumption Patterns in Large Data Volumes Using Data Mining Methods)

    This study aims to propose consumption patterns that improve the delivery of information sources for advertising targeted at customers' needs, improve the processing of large data volumes, and shorten the execution time of the process. The proposal will be implemented by consolidating consumption transactions in a data management system, cleaning and filtering the variables so that data mining techniques can be applied, and making the result available in a repository that facilitates analysis and access to the results. To this end, the study follows the experimental research methodology, which consists of five phases. The first phase states a knowledge problem, where the problem is chosen. The second formulates hypotheses, anticipating a result. The third produces a design suited to the hypothesis; the design reflects the researcher's work plan and its formal organization, includes several subprocesses, and describes in detail what must be done and how. The fourth phase is data collection and analysis, and the fifth is the drawing of conclusions. At the end of the study, the expected outcome is a documented process that meets the proposed objective and helps Marketing management improve its accurate advertising targeted at customer needs.

    Forecasting Financial Distress With Machine Learning – A Review

    Purpose: Evaluate academic research that takes multiple views on credit risk and artificial intelligence (AI), and the evolution of that research. Theoretical framework: The study is divided as follows. Section 1 introduces the article. Section 2 deals with credit risk and its relationship with computational models and techniques. Section 3 presents the methodology. Section 4 discusses the results and the challenges on the topic. Finally, Section 5 presents the conclusions. Design/methodology/approach: A systematic review of the literature was carried out, without restricting the time period, using the Web of Science and Scopus databases. Findings: The application of computational technology to credit risk analysis has drawn particular attention. It was found that there is constant demand for the identification and introduction of new variables, classifiers and more accurate methods. The effort to improve the interpretation of data and models is intense. Research, Practical & Social implications: The study contributes to the verification of the theory by providing information on the most used methods and techniques, and it offers a broad analysis that deepens knowledge of the factors and variables on the theme. It categorizes the lines of research and provides a summary of the literature that serves as a reference, in addition to suggesting future research. Originality/value: Research in the area of Artificial Intelligence and Machine Learning is recent and requires attention and investigation; this study therefore contributes to opening new perspectives for deepening work on this topic.

    Predicate based association rules mining with new interestingness measure

    Association Rule Mining (ARM) is one of the fundamental components of data mining; it discovers frequent itemsets and interesting relationships for predicting the associative and correlative behaviours of new data. However, traditional ARM techniques are based on the support-confidence framework, which discovers interesting association rules (ARs) using predefined minimum support (minsupp) and minimum confidence (minconf) thresholds. In addition, traditional AR techniques consider only frequent items and ignore rare ones. Thus, a new parameter-less predicate-based ARM technique was proposed to address these limitations, and it was enhanced to handle frequent and rare items at the same time. Furthermore, a new interestingness measure, called the g measure, was developed to select only highly interesting rules. In the proposed technique, interesting combinations were first selected by considering both the frequent and the rare items in a dataset. They were then mapped to pseudo-implications using predefined logical conditions. Later, inference rules were used to validate the pseudo-implications and to discover rules within the set of mapped pseudo-implications. The resulting set of interesting rules is referred to as the predicate-based association rules. The zoo, breast cancer, and car evaluation datasets were used for the experiments. The results were evaluated by comparison with various classification techniques, a traditional ARM technique and the coherent rule mining technique. The predicate-based rule mining approach achieved an accuracy of 93.33%. In addition, the results of the g measure were compared with the h value, a state-of-the-art interestingness measure developed for the coherent rule mining technique. Predicate rules were discovered with an average confidence of 0.754 for the zoo dataset and 0.949 for the breast cancer dataset, while the average confidence of the predicate rules found in the car evaluation dataset was 0.582. The results showed that a set of interesting and highly reliable rules was discovered, including frequent, rare and negative association rules with higher confidence values. This research resulted in a rule mining methodology that does not rely on the minsupp and minconf thresholds. Also, a complete set of association rules is discovered by the proposed technique. Finally, the interestingness measure property used to select combinations from the datasets makes it possible to reduce the exponential search for rules.
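The limitation of the support-confidence framework noted in this abstract, that rare items are ignored, can be seen in a small sketch. It shows only the traditional minsupp/minconf filter, not the predicate-based technique or the g measure; the transactions, the threshold values, and the helper names `support` and `passes_thresholds` are assumptions made for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    return sum(1 for t in transactions if items <= t) / len(transactions)

def passes_thresholds(transactions, antecedent, consequent, minsupp, minconf):
    """Traditional filter: keep the rule only if it meets both thresholds."""
    supp = support(transactions, antecedent | consequent)
    ante = support(transactions, antecedent)
    conf = supp / ante if ante else 0.0
    return supp >= minsupp and conf >= minconf

# Hypothetical baskets: a frequent pair and a rare pair that always co-occurs.
transactions = (
    [{"bread", "milk"}] * 6
    + [{"bread"}] * 2
    + [{"caviar", "vodka"}] * 2
)

# bread -> milk has support 0.6 and confidence 0.75.
frequent_rule_kept = passes_thresholds(
    transactions, {"bread"}, {"milk"}, minsupp=0.3, minconf=0.7
)
# caviar -> vodka has confidence 1.0 but support of only 0.2.
rare_rule_kept = passes_thresholds(
    transactions, {"caviar"}, {"vodka"}, minsupp=0.3, minconf=0.7
)
```

Here `rare_rule_kept` is false even though the rare rule holds in every transaction where its antecedent appears, while lowering `minsupp` far enough to admit it would also admit many uninformative rules. That trade-off is the motivation the abstract gives for a threshold-free approach.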

    Recent Advances in Social Data and Artificial Intelligence 2019

    The importance and usefulness of subjects and topics involving social data and artificial intelligence are becoming widely recognized. This book contains invited review, expository, and original research articles dealing with, and presenting state-of-the-art accounts of, the recent advances in the subjects of social data and artificial intelligence, and potentially their links to Cyberspace.