
    On the combination of omics data for prediction of binary outcomes

    Enrichment of predictive models with new biomolecular markers is an important task in high-dimensional omics applications. Increasingly, clinical studies include several sets of such omics markers, available for each patient, that measure different levels of biological variation. As a result, one of the main challenges in predictive research is the integration of different sources of omic biomarkers for the prediction of health traits. We review several approaches for the combination of omic markers in the context of binary outcome prediction, all based on double cross-validation and regularized regression models. We evaluate their performance in terms of calibration and discrimination, and we compare their performance with single-omic-source predictions. We illustrate the methods through the analysis of two real datasets. On the one hand, we consider the combination of two fractions of proteomic mass spectrometry for the calibration of a diagnostic rule for the detection of early-stage breast cancer. On the other hand, we consider transcriptomics and metabolomics as predictors of obesity using data from the Dietary, Lifestyle, and Genetic determinants of Obesity and Metabolic syndrome (DILGOM) study, a population-based cohort from Finland.
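
    To make the modelling recipe concrete, here is a minimal sketch (not the authors' code) of combining two omics blocks with a penalized logistic regression evaluated by double (nested) cross-validation, reporting discrimination (AUC) and a calibration-related loss (Brier score); the arrays X_transcriptomics, X_metabolomics and y are hypothetical placeholders, and scikit-learn is assumed.

        # Sketch only, not the authors' code: double (nested) cross-validation for a
        # ridge-penalized logistic regression that combines two omics blocks.
        # X_transcriptomics, X_metabolomics and y are hypothetical placeholder arrays.
        import numpy as np
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import GridSearchCV, cross_validate

        rng = np.random.default_rng(0)
        X_transcriptomics = rng.normal(size=(100, 200))    # stand-in for real data
        X_metabolomics = rng.normal(size=(100, 50))
        y = rng.integers(0, 2, size=100)                   # binary outcome

        # Simple combination strategy: concatenate the standardized blocks.
        X_combined = np.hstack([X_transcriptomics, X_metabolomics])

        # Inner CV tunes the penalty; outer CV gives unbiased estimates of
        # discrimination (AUC) and a calibration-related loss (Brier score).
        inner = GridSearchCV(
            make_pipeline(StandardScaler(), LogisticRegression(penalty="l2", max_iter=5000)),
            param_grid={"logisticregression__C": [0.01, 0.1, 1.0, 10.0]},
            scoring="roc_auc",
            cv=5,
        )
        outer = cross_validate(inner, X_combined, y, cv=5,
                               scoring=["roc_auc", "neg_brier_score"])
        print("AUC: %.3f  Brier: %.3f" % (outer["test_roc_auc"].mean(),
                                          -outer["test_neg_brier_score"].mean()))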

    Electronic fraud detection in the U.S. Medicaid Healthcare Program: lessons learned from other industries

    It is estimated that between $600 and $850 billion annually is lost to fraud, waste, and abuse in the US healthcare system, with $125 to $175 billion of this due to fraudulent activity (Kelley 2009). Medicaid, a state-run, federally-matched government program which accounts for roughly one-quarter of all healthcare expenses in the US, has been a particularly susceptible target for fraud in recent years. With escalating overall healthcare costs, payers, especially government-run programs, must seek savings throughout the system to maintain reasonable quality of care standards. As such, the need for effective fraud detection and prevention is critical. Electronic fraud detection systems are widely used in the insurance, telecommunications, and financial sectors. What lessons can be learned from these efforts and applied to improve fraud detection in the Medicaid health care program? In this paper, we conduct a systematic literature study to analyze the applicability of existing electronic fraud detection techniques in similar industries to the US Medicaid program.

    Non-Invasive Ambient Intelligence in Real Life: Dealing with Noisy Patterns to Help Older People

    This paper aims to contribute to the field of ambient intelligence from the perspective of real environments, where noise levels in datasets are significant, by showing how machine learning techniques can contribute to knowledge creation by promoting software sensors. The created knowledge can be made actionable to develop features that help deal with problems related to minimally labelled datasets. A case study is presented and analysed with the aim of inferring high-level rules that can help anticipate abnormal activities, and the potential benefits of integrating these technologies are discussed in this context. The contribution also aims to analyse the use of the models for knowledge transfer when different sensors with different settings contribute to the noise levels. Finally, based on the authors’ experience, a framework proposal for creating valuable and aggregated knowledge is depicted. This research was partially funded by Fundación Tecnalia Research & Innovation, and J.O.-M. also wants to acknowledge the support obtained from the EU RFCS program through project number 793505 ‘4.0 Lean system integrating workers and processes (WISEST)’ and from grant PRX18/00036 given by the Spanish Secretaría de Estado de Universidades, Investigación, Desarrollo e Innovación del Ministerio de Ciencia, Innovación y Universidades.
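
    As an illustration of the kind of high-level, human-readable rules such software sensors might yield, the following hedged sketch (not the paper's pipeline) fits a shallow decision tree to a small, noisy, entirely hypothetical activity dataset and prints the learned rules; the feature names and the label-noise rate are assumptions.

        # Illustrative only, not the paper's pipeline: infer human-readable rules
        # about abnormal days from a small, noisy, hypothetical sensor dataset.
        import numpy as np
        from sklearn.tree import DecisionTreeClassifier, export_text

        rng = np.random.default_rng(1)
        # Hypothetical daily features: night-time motion events and hours without
        # kitchen activity.
        X = np.column_stack([rng.poisson(5, 300), rng.uniform(0, 24, 300)])
        # A noisy, partly unreliable label marking "abnormal" days.
        y = ((X[:, 0] > 6) & (X[:, 1] > 12)).astype(int)
        y ^= (rng.random(300) < 0.05)          # simulate ~5% label noise

        # A shallow tree keeps the learned rules readable ("high-level rules").
        tree = DecisionTreeClassifier(max_depth=2, min_samples_leaf=15).fit(X, y)
        print(export_text(tree, feature_names=["night_motion", "hours_no_kitchen"]))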

    Consumer finance: challenges for operational research

    Consumer finance has become one of the most important areas of banking, both because of the amount of money being lent and its impact on the global economy, and because of the realisation that the credit crunch of 2008 was partly due to incorrect modelling of the risks in such lending. This paper reviews the development of credit scoring, the way of assessing risk in consumer finance, and what is meant by a credit score. It then outlines 10 challenges for Operational Research to support modelling in consumer finance. Some of these involve developing more robust risk assessment systems, whereas others are to expand the use of such modelling to deal with the current objectives of lenders and the new decisions they have to make in consumer finance.
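
    For readers unfamiliar with what a credit score actually is, the sketch below shows one common construction, offered here as an assumption rather than as the paper's prescription: a logistic regression fitted to applicant characteristics whose log-odds of being a good payer are rescaled to points using a points-to-double-the-odds convention; all data and scaling constants are hypothetical.

        # Sketch of one common scorecard construction (an assumption, not the
        # paper's method): logistic regression log-odds rescaled to points.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(2)
        X = rng.normal(size=(1000, 3))              # hypothetical applicant features
        y = (X @ np.array([-0.8, -0.5, 1.2]) + rng.normal(size=1000) > 0).astype(int)

        model = LogisticRegression().fit(X, y)      # y = 1 means "default"
        log_odds_good = -model.decision_function(X) # higher = lower estimated risk

        # Rescale to the familiar points convention: 600 points at odds 30:1,
        # with 20 points to double the odds (PDO). All constants are illustrative.
        pdo, base_points, base_odds = 20.0, 600.0, 30.0
        factor = pdo / np.log(2)
        offset = base_points - factor * np.log(base_odds)
        score = offset + factor * log_odds_good
        print("example credit scores:", np.round(score[:5]))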

    Operations research in consumer finance: challenges for operational research

    Consumer finance has become one of the most important areas of banking, both because of the amount of money being lent and its impact on the global economy, and because of the realisation that the credit crunch of 2008 was partly due to incorrect modelling of the risks in such lending. This paper reviews the development of credit scoring, the way of assessing risk in consumer finance, and what is meant by a credit score. It then outlines ten challenges for Operational Research to support modelling in consumer finance. Some of these involve developing more robust risk assessment systems, while others are to expand the use of such modelling to deal with the current objectives of lenders and the new decisions they have to make in consumer finance.

    Combining Clustering techniques and Formal Concept Analysis to characterize Interestingness Measures

    Formal Concept Analysis (FCA) is a data analysis method which enables the discovery of hidden knowledge in data. One kind of hidden knowledge extracted from data is association rules. Various quality measures have been reported in the literature to extract only relevant association rules. Given a dataset, the choice of a good quality measure remains a challenging task for a user. Given a matrix evaluating quality measures against semantic properties, this paper describes how FCA can highlight quality measures with similar behavior in order to help the user make this choice. The aim of this article is the discovery of clusters of interestingness measures (IM), which can validate those found by the hierarchical (AHC) and partitioning (k-means) clustering methods. Then, based on a theoretical study, proposed in recent work, of sixty-one interestingness measures according to nineteen properties, FCA describes several groups of measures.
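
    A rough sketch of the clustering side of that comparison, assuming scikit-learn and a randomly generated stand-in for the sixty-one by nineteen measure/property matrix (the real matrix comes from the cited study), might look as follows.

        # Illustrative only: cluster interestingness measures by their semantic
        # property profiles with k-means and agglomerative (AHC) clustering. The
        # real study uses a 61 x 19 measure/property matrix; random 0/1 values
        # stand in for it here.
        import numpy as np
        from sklearn.cluster import KMeans, AgglomerativeClustering
        from sklearn.metrics import adjusted_rand_score

        rng = np.random.default_rng(3)
        measures_by_properties = rng.integers(0, 2, size=(61, 19)).astype(float)

        kmeans_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(measures_by_properties)
        ahc_labels = AgglomerativeClustering(n_clusters=4).fit_predict(measures_by_properties)

        # Agreement between the two partitions hints at stable groups of measures,
        # which FCA concepts can then be used to characterise.
        print("ARI(k-means, AHC):", adjusted_rand_score(kmeans_labels, ahc_labels))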

    Threshold Choice Methods: the Missing Link

    Many performance metrics have been introduced for the evaluation of classification performance, with different origins and niches of application: accuracy, macro-accuracy, area under the ROC curve, the ROC convex hull, the absolute error, and the Brier score (with its decomposition into refinement and calibration). One way of understanding the relations among some of these metrics is the use of variable operating conditions (either in the form of misclassification costs or class proportions). Thus, a metric may correspond to some expected loss over a range of operating conditions. One dimension for the analysis has been precisely the distribution we take for this range of operating conditions, leading to some important connections in the area of proper scoring rules. However, we show that there is another dimension which has not received attention in the analysis of performance metrics. This new dimension is given by the decision rule, which is typically implemented as a threshold choice method when using scoring models. In this paper, we explore many old and new threshold choice methods: fixed, score-uniform, score-driven, rate-driven and optimal, among others. By calculating the loss of these methods for a uniform range of operating conditions we get the 0-1 loss, the absolute error, the Brier score (mean squared error), the AUC and the refinement loss, respectively. This provides a comprehensive view of performance metrics as well as a systematic approach to loss minimisation, namely: take a model, apply several threshold choice methods consistent with the information which is (and will be) available about the operating condition, and compare their expected losses. In order to assist in this procedure we also derive several connections between the aforementioned performance metrics, and we highlight the role of calibration in choosing the threshold choice method.
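
    To make the mechanics concrete, here is a small numerical sketch (not the paper's code) that averages a cost-weighted loss over a uniform range of cost proportions for a fixed, a score-driven and an empirically optimal threshold choice method, and compares the score-driven average with the Brier score; the convention used for the cost proportion c and the factor of two are assumptions that may differ from the paper's exact notation, and the synthetic scores are placeholders.

        # Numerical sketch only; the cost-proportion convention below is an
        # assumption and may not match the paper's exact notation.
        import numpy as np

        rng = np.random.default_rng(4)
        y = rng.integers(0, 2, 2000)                                    # true labels
        s = np.clip(0.35 + 0.3 * y + rng.normal(0, 0.25, 2000), 0, 1)   # model scores in [0, 1]

        def cost_loss(y_true, y_pred, c):
            # c weights false positives, 1 - c weights false negatives, with a
            # factor of 2 so the uniform average takes a clean form.
            fp = np.mean((y_true == 0) & (y_pred == 1))
            fn = np.mean((y_true == 1) & (y_pred == 0))
            return 2 * (c * fp + (1 - c) * fn)

        def expected_loss(threshold_for_c, cs=np.linspace(0.0, 1.0, 201)):
            # Average the loss over a uniform range of operating conditions c.
            return np.mean([cost_loss(y, (s >= threshold_for_c(c)).astype(int), c)
                            for c in cs])

        def optimal_threshold(c, grid=np.linspace(0.0, 1.0, 101)):
            # "Optimal" choice method: empirically best threshold for each c.
            return grid[int(np.argmin([cost_loss(y, (s >= t).astype(int), c)
                                       for t in grid]))]

        print("fixed t = 0.5:", expected_loss(lambda c: 0.5))
        print("score-driven :", expected_loss(lambda c: c))   # close to the Brier score
        print("optimal      :", expected_loss(optimal_threshold))
        print("Brier score  :", np.mean((s - y) ** 2))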

    Designing Web-enabled services to provide damage estimation maps caused by natural hazards

    The availability of building stock inventory data and demographic information is an important requirement for risk assessment studies when attempting to predict and estimate losses due to natural hazards such as earthquakes, storms, floods or tsunamis. The better this information is, the more accurately damage to structures and lifelines can be predicted and the better the expected impacts on the population can be estimated. When a disaster strikes, a map is often one of the first requirements for answering questions related to location, casualties and damage zones caused by the event. Maps of appropriate scale that represent relative and absolute damage distributions may be of great importance for rescuing lives and property, and for providing relief. However, this type of map is often difficult to obtain during the first hours or even days after the occurrence of a natural disaster. The Open Geospatial Consortium Web Services (OWS) specifications enable access to datasets and services in shared, distributed and interoperable environments through web-enabled services. In view of these advantages, in this paper we propose the use of OWS as a possible solution to issues related to the acquisition of suitable datasets for risk assessment studies. The design of web-enabled services was carried out using the municipality of Managua (Nicaragua) and the development of earthquake damage and loss estimation maps as a first case study. Four organizations located in different places are involved in this proposal and connected through web services, each one with a specific role.
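
    As a flavour of the interoperable access that the OWS specifications provide, the snippet below uses the OWSLib Python client to issue a WMS GetMap request against a hypothetical endpoint serving a damage-distribution layer; the URL, layer name and bounding box are placeholders rather than the actual services described in the paper.

        # Hypothetical endpoint, layer and extent; OWSLib is assumed to be installed.
        from owslib.wms import WebMapService

        WMS_URL = "https://example.org/geoserver/wms"     # placeholder service URL

        wms = WebMapService(WMS_URL, version="1.1.1")
        print("Advertised layers:", list(wms.contents))   # from GetCapabilities

        # GetMap request for a hypothetical damage-estimation layer around Managua.
        response = wms.getmap(
            layers=["managua:building_damage"],           # placeholder layer name
            srs="EPSG:4326",
            bbox=(-86.36, 12.05, -86.16, 12.22),          # rough Managua extent
            size=(800, 600),
            format="image/png",
            transparent=True,
        )
        with open("damage_map.png", "wb") as f:
            f.write(response.read())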