39,502 research outputs found
On the combination of omics data for prediction of binary outcomes
Enrichment of predictive models with new biomolecular markers is an important
task in high-dimensional omic applications. Increasingly, clinical studies
include several sets of such omics markers available for each patient,
measuring different levels of biological variation. As a result, one of the
main challenges in predictive research is the integration of different sources
of omic biomarkers for the prediction of health traits. We review several
approaches for the combination of omic markers in the context of binary outcome
prediction, all based on double cross-validation and regularized regression
models. We evaluate their performance in terms of calibration and
discrimination and we compare their performance with respect to single-omic
source predictions. We illustrate the methods through the analysis of two real
datasets. On the one hand, we consider the combination of two fractions of
proteomic mass spectrometry for the calibration of a diagnostic rule for the
detection of early-stage breast cancer. On the other hand, we consider
transcriptomics and metabolomics as predictors of obesity using data from the
Dietary, Lifestyle, and Genetic determinants of Obesity and Metabolic syndrome
(DILGOM) study, a population-based cohort from Finland.
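The double cross-validation scheme mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the two "omic" blocks are simulated, simple column concatenation stands in for a combination strategy (the abstract does not say which strategies are reviewed), and the penalty grid and fold counts are arbitrary assumptions. The inner loop tunes the regularization strength; the outer loop produces out-of-fold probabilities, which are scored for discrimination (AUC) and overall probabilistic accuracy (Brier score).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Simulated stand-ins for two omic sources (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
n = 120
omic_a = rng.normal(size=(n, 30))
omic_b = rng.normal(size=(n, 20))
linear = 1.5 * omic_a[:, :3].sum(axis=1) + 1.5 * omic_b[:, :2].sum(axis=1)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-linear))).astype(int)


def double_cv_probabilities(X, y, seed=0):
    """Out-of-fold probabilities from nested (double) cross-validation."""
    inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed + 1)
    # Ridge-penalized (L2) logistic regression; C is the inverse penalty strength.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=2000))
    search = GridSearchCV(
        model,
        {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]},
        cv=inner,
        scoring="neg_log_loss",
    )
    # The inner search is re-run inside every outer training fold.
    return cross_val_predict(search, X, y, cv=outer, method="predict_proba")[:, 1]


sources = {
    "omic_a": omic_a,
    "omic_b": omic_b,
    "combined": np.hstack([omic_a, omic_b]),
}
results = {}
for name, X in sources.items():
    p = double_cv_probabilities(X, y)
    results[name] = (roc_auc_score(y, p), brier_score_loss(y, p))

for name, (auc, brier) in results.items():
    print(f"{name}: AUC={auc:.3f}  Brier={brier:.3f}")
```

Because tuning happens only inside the outer training folds, the reported metrics are not optimistically biased by the choice of penalty, which makes single-source and combined predictions comparable.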
Electronic fraud detection in the U.S. Medicaid Healthcare Program: lessons learned from other industries
It is estimated that as much as 850 billion dollars annually is lost to fraud, waste, and abuse in the US healthcare system, with as much as 175 billion of this due to fraudulent activity (Kelley 2009). Medicaid, a state-run, federally-matched government program which accounts for roughly one-quarter of all healthcare expenses in the US, has been a particularly susceptible target for fraud in recent years. With escalating overall healthcare costs, payers, especially government-run programs, must seek savings throughout the system to maintain reasonable quality of care standards. As such, the need for effective fraud detection and prevention is critical. Electronic fraud detection systems are widely used in the insurance, telecommunications, and financial sectors. What lessons can be learned from these efforts and applied to improve fraud detection in the Medicaid health care program? In this paper, we conduct a systematic literature study to analyze the applicability of existing electronic fraud detection techniques in similar industries to the US Medicaid program.
Non-Invasive Ambient Intelligence in Real Life: Dealing with Noisy Patterns to Help Older People
This paper aims to contribute to the field of ambient intelligence from the perspective of real environments, where noise levels in datasets are significant, by showing how machine learning techniques can contribute to knowledge creation by promoting software sensors. The created knowledge can be actionable for developing features that help deal with problems related to minimally labelled datasets. A case study is presented and analysed, looking to infer high-level rules that can help anticipate abnormal activities, and the potential benefits of integrating these technologies are discussed in this context. The contribution also aims to analyse the use of the models for knowledge transfer when different sensors with different settings contribute to the noise levels. Finally, based on the authors’ experience, a framework proposal for creating valuable and aggregated knowledge is depicted. This research was partially funded by Fundación Tecnalia Research & Innovation, and J.O.-M. also wants to recognise the support obtained from the EU RFCS program through project number 793505 ‘4.0 Lean system integrating workers and processes (WISEST)’ and from the grant PRX18/00036 given by the Spanish Secretaría de Estado de Universidades, Investigación, Desarrollo e Innovación del Ministerio de Ciencia, Innovación y Universidades.
Consumer finance: challenges for operational research
Consumer finance has become one of the most important areas of banking, both because of the amount of money being lent and the impact of such credit on the global economy, and because of the realisation that the credit crunch of 2008 was partly due to incorrect modelling of the risks in such lending. This paper reviews the development of credit scoring, the way of assessing risk in consumer finance, and what is meant by a credit score. It then outlines 10 challenges for Operational Research to support modelling in consumer finance. Some of these involve developing more robust risk assessment systems, whereas others are to expand the use of such modelling to deal with the current objectives of lenders and the new decisions they have to make in consumer finance.
Operations research in consumer finance: challenges for operational research
Consumer finance has become one of the most important areas of banking, both because of the amount of money being lent and the impact of such credit on the global economy, and because of the realisation that the credit crunch of 2008 was partly due to incorrect modelling of the risks in such lending. This paper reviews the development of credit scoring, the way of assessing risk in consumer finance, and what is meant by a credit score. It then outlines ten challenges for Operational Research to support modelling in consumer finance. Some of these are to develop more robust risk assessment systems, while others are to expand the use of such modelling to deal with the current objectives of lenders and the new decisions they have to make in consumer finance.
Combining Clustering techniques and Formal Concept Analysis to characterize Interestingness Measures
Formal Concept Analysis (FCA) is a data analysis method that enables the
discovery of hidden knowledge in data. One kind of hidden knowledge extracted
from data is association rules. Different quality measures have been reported
in the literature for extracting only relevant association rules. Given a
dataset, the choice of a good quality measure remains a challenging task for a
user. Given an evaluation matrix of quality measures according to semantic
properties, this paper describes how FCA can highlight quality measures with
similar behavior in order to help the user make this choice. The aim of this
article is the discovery of clusters of Interestingness Measures (IM) that can
validate those found by the hierarchical and partitioning clustering methods
AHC and k-means. Then, based on a theoretical study of sixty-one
interestingness measures according to nineteen properties, proposed in a recent
study, FCA describes several groups of measures.
Comment: 13 pages, 2 figures
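To make the idea of grouping measures by shared properties concrete, the sketch below enumerates the formal concepts of a tiny binary context. The measure names `m1` to `m4`, the property names `p1` to `p3`, and the matrix itself are hypothetical placeholders, not the sixty-one measures or nineteen properties of the study. The brute-force enumeration over attribute subsets is an assumption chosen for brevity and is only practical for small contexts.

```python
from itertools import combinations

# Hypothetical measures-by-properties context: measure -> properties it satisfies.
context = {
    "m1": {"p1", "p2"},
    "m2": {"p1", "p2"},
    "m3": {"p2", "p3"},
    "m4": {"p3"},
}


def extent(attrs, context):
    """All objects (measures) that have every attribute in attrs."""
    return frozenset(obj for obj, props in context.items() if attrs <= props)


def intent(objs, context, all_attrs):
    """All attributes (properties) shared by every object in objs."""
    shared = set(all_attrs)
    for obj in objs:
        shared &= context[obj]
    return frozenset(shared)


def formal_concepts(context):
    """Enumerate (extent, intent) pairs by closing every attribute subset."""
    all_attrs = frozenset().union(*context.values())
    concepts = set()
    for size in range(len(all_attrs) + 1):
        for subset in combinations(sorted(all_attrs), size):
            ext = extent(frozenset(subset), context)
            concepts.add((ext, intent(ext, context, all_attrs)))
    return concepts


concepts = formal_concepts(context)
for ext, itt in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(sorted(ext), sorted(itt))
```

Measures with identical property rows, such as `m1` and `m2` here, always appear together in every extent, which is the sense in which a concept lattice exposes groups of measures with similar behavior.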
Threshold Choice Methods: the Missing Link
Many performance metrics have been introduced for the evaluation of
classification performance, with different origins and niches of application:
accuracy, macro-accuracy, area under the ROC curve, the ROC convex hull, the
absolute error, and the Brier score (with its decomposition into refinement and
calibration). One way of understanding the relation among some of these metrics
is the use of variable operating conditions (either in the form of
misclassification costs or class proportions). Thus, a metric may correspond to
some expected loss over a range of operating conditions. One dimension for the
analysis has been precisely the distribution we take for this range of
operating conditions, leading to some important connections in the area of
proper scoring rules. However, we show that there is another dimension which
has not received attention in the analysis of performance metrics. This new
dimension is given by the decision rule, which is typically implemented as a
threshold choice method when using scoring models. In this paper, we explore
many old and new threshold choice methods: fixed, score-uniform, score-driven,
rate-driven and optimal, among others. By calculating the loss of these methods
for a uniform range of operating conditions we get the 0-1 loss, the absolute
error, the Brier score (mean squared error), the AUC and the refinement loss
respectively. This provides a comprehensive view of performance metrics as well
as a systematic approach to loss minimisation, namely: take a model, apply
several threshold choice methods consistent with the information which is (and
will be) available about the operating condition, and compare their expected
losses. In order to assist in this procedure we also derive several connections
between the aforementioned performance metrics, and we highlight the role of
calibration in choosing the threshold choice method.
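Two of the correspondences stated in this abstract can be checked numerically. The sketch below makes several assumptions not spelled out in the abstract: scores are estimated probabilities of class 1, an example is predicted as class 1 when its score exceeds the threshold, the operating condition is a cost proportion `c` in [0, 1] that weights false positives (with `1 - c` weighting false negatives), and the loss carries a normalizing factor of 2. The toy scores and labels are invented for illustration. Under these assumptions, averaging over uniformly distributed `c` gives the 0-1 loss for a fixed threshold of 0.5 and the Brier score for the score-driven choice `t = c`.

```python
import numpy as np

# Toy data (hypothetical): estimated probabilities of class 1 and true labels.
scores = np.array([0.10, 0.40, 0.35, 0.80, 0.65, 0.90])
labels = np.array([0, 0, 1, 1, 0, 1])


def brier_score(scores, labels):
    """Mean squared error between predicted probabilities and labels."""
    return float(np.mean((scores - labels) ** 2))


def cost_loss(scores, labels, threshold, c):
    """Cost-weighted loss at one operating condition c and one threshold."""
    predicted = (scores > threshold).astype(int)
    false_pos = np.mean((labels == 0) & (predicted == 1))
    false_neg = np.mean((labels == 1) & (predicted == 0))
    return 2.0 * (c * false_pos + (1.0 - c) * false_neg)


def expected_loss(scores, labels, method, n_grid=2000):
    """Average loss over a uniform grid of cost proportions (midpoint rule)."""
    cost_grid = (np.arange(n_grid) + 0.5) / n_grid
    losses = []
    for c in cost_grid:
        # Score-driven: the threshold follows the operating condition.
        # Fixed: the threshold ignores it.
        threshold = c if method == "score_driven" else 0.5
        losses.append(cost_loss(scores, labels, threshold, c))
    return float(np.mean(losses))


error_rate = float(np.mean((scores > 0.5).astype(int) != labels))
print("fixed threshold  :", expected_loss(scores, labels, "fixed"), "vs 0-1 loss", error_rate)
print("score-driven     :", expected_loss(scores, labels, "score_driven"),
      "vs Brier", brier_score(scores, labels))
```

The score-driven result follows because a class-0 example with score `s` is misclassified only while `c < s`, contributing the integral of `2c` from 0 to `s`, which is `s²`; the class-1 case gives `(1 - s)²` by symmetry.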
Designing Web-enabled services to provide damage estimation maps caused by natural hazards
The availability of building stock inventory data and demographic information is an important requirement for risk assessment studies when attempting to predict and estimate losses due to natural hazards such as earthquakes, storms, floods or tsunamis. The better this information is, the more accurately damage to structures and lifelines can be predicted and the better the expected impacts on the population can be estimated. When a disaster strikes, a map is often one of the first requirements for answering questions related to location, casualties and damage zones caused by the event. Maps of appropriate scale that represent relative and absolute damage distributions may be of great importance for saving lives and property, and for providing relief. However, maps of this type are often difficult to obtain during the first hours or even days after the occurrence of a natural disaster. The Open Geospatial Consortium Web Services (OWS) Specifications enable access to datasets and services using shared, distributed and interoperable environments through web-enabled services. In this paper we propose the use of OWS, in view of these advantages, as a possible solution for issues related to suitable dataset acquisition for risk assessment studies. The design of web-enabled services was carried out using the municipality of Managua (Nicaragua) and the development of damage and loss estimation maps caused by earthquakes as a first case study. Four organizations located in different places are involved in this proposal and connected through web services, each one with a specific role.