Temporal video transcoding from H.264/AVC-to-SVC for digital TV broadcasting
Mobile digital TV environments demand flexible video compression such as scalable video coding (SVC) because of varying bandwidths and device capabilities. Since existing infrastructures rely heavily on H.264/AVC video compression, network providers could adapt currently H.264/AVC-encoded video to SVC. This adaptation needs to be done efficiently to reduce processing power and operational cost. This paper proposes two techniques to convert an H.264/AVC bitstream without scalability, in both the Baseline (P-picture-based) and Main (B-picture-based) Profiles, into a scalable bitstream with temporal scalability, as part of a framework for low-complexity video adaptation for digital TV broadcasting. Our approaches accelerate inter-prediction, reducing the coding complexity of the mode-decision and motion-estimation tasks of the encoder stage by reusing information available after the H.264/AVC decoding stage. The results show that when our techniques are applied, complexity is reduced by 98% while coding efficiency is maintained.
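The abstract's core idea, reusing information recovered by the decoder to prune the re-encoder's search, can be sketched roughly as follows. All function names, mode labels, and thresholds here are illustrative assumptions, not the paper's actual algorithm.

```python
# Hypothetical sketch: reuse the motion vector and partition mode recovered by
# the H.264/AVC decoder to narrow the mode-decision search when re-encoding a
# macroblock for the temporally scalable bitstream. Names/thresholds invented.

def narrowed_mode_candidates(decoded_mb_mode, decoded_mv):
    """Return a reduced candidate mode set for re-encoding one macroblock.

    decoded_mb_mode: partition mode recovered from the incoming bitstream
                     (e.g. "16x16", "16x8", "8x16", "8x8").
    decoded_mv:      (dx, dy) motion vector recovered for that macroblock.
    """
    # Near-zero motion: SKIP/16x16 almost always win, so only those are
    # re-evaluated instead of the full H.264/AVC mode set.
    if abs(decoded_mv[0]) <= 1 and abs(decoded_mv[1]) <= 1:
        return ["SKIP", "16x16"]
    # Otherwise keep the decoded partition plus its immediate neighbours in
    # the partition hierarchy, pruning the rest of the search space.
    hierarchy = ["16x16", "16x8", "8x16", "8x8"]
    i = hierarchy.index(decoded_mb_mode)
    return hierarchy[max(0, i - 1):i + 2]

print(narrowed_mode_candidates("16x8", (12, -3)))  # ['16x16', '16x8', '8x16']
print(narrowed_mode_candidates("8x8", (0, 0)))     # ['SKIP', '16x16']
```

The complexity saving comes from evaluating two or three candidate modes instead of the full set for every macroblock.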
A predictive analytics approach to reducing avoidable hospital readmission
Hospital readmission has become a critical metric of quality and cost of
healthcare. Medicare anticipates that nearly $17 billion is paid out on the 20%
of patients who are readmitted within 30 days of discharge. Although several
interventions such as transition care management and discharge reengineering
have been practiced in recent years, the effectiveness and sustainability
depends on how well they can identify and target patients at high risk of
rehospitalization. Based on the literature, most current risk prediction models
fail to reach an acceptable accuracy level; none of them considers patient's
history of readmission and impacts of patient attribute changes over time; and
they often do not discriminate between planned and unnecessary readmissions.
Tackling such drawbacks, we develop a new readmission metric based on
administrative data that can identify potentially avoidable readmissions from
all other types of readmission. We further propose a tree based classification
method to estimate the predicted probability of readmission that can directly
incorporate patient's history of readmission and risk factors changes over
time. The proposed methods are validated with 2011-12 Veterans Health
Administration data from inpatients hospitalized for heart failure, acute
myocardial infarction, pneumonia, or chronic obstructive pulmonary disease in
the State of Michigan. Results show improved discrimination power compared to
the literature (c-statistics > 80%) and good calibration.
Comment: 30 pages, 4 figures, 7 tables
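The key modelling idea above, letting a tree-based classifier see the patient's own readmission history and time-varying risk factors as features, can be illustrated with a small sketch. The data, feature names, and coefficients below are synthetic assumptions, not the Veterans Health Administration data or the authors' model.

```python
# Illustrative sketch (not the authors' code): a decision tree whose feature
# set includes prior readmissions and the change in a risk factor over time,
# mimicking the paper's use of readmission history. All data are synthetic.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 500
prior_readmits = rng.poisson(0.7, n)      # patient's history of readmission
age = rng.normal(70, 10, n)               # risk factor at discharge
delta_lab = rng.normal(0, 1, n)           # change in a lab value over time
# Synthetic label: risk grows with readmission history and worsening labs.
p = 1 / (1 + np.exp(-(0.9 * prior_readmits + 0.5 * delta_lab - 0.5)))
y = (rng.random(n) < p).astype(int)
X = np.column_stack([prior_readmits, age, delta_lab])

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
proba = clf.predict_proba(X)[:, 1]        # predicted readmission probability
print(round(clf.score(X, y), 2))
```

A shallow tree like this also exposes which history features drive the predicted probability, which matters when targeting transition-care interventions.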
Illuminant Estimation using Ensembles of Multivariate Regression Trees
White balancing is a fundamental step in the image processing pipeline. The
process involves estimating the chromaticity of the illuminant or light source
and using the estimate to correct the image to remove any color cast. Given the
importance of the problem, there has been much previous work on illuminant
estimation. Recently, an approach based on ensembles of univariate regression
trees that are fit using the squared-error loss function has been proposed and
shown to give excellent performance. In this paper, we show that a simpler and
more accurate ensemble model can be learned by (i) using multivariate
regression trees to take into account that the chromaticity components of the
illuminant are correlated and constrained, and (ii) fitting each tree by
directly minimizing a loss function of interest---such as recovery angular
error or reproduction angular error---rather than indirectly using the
squared-error loss function as a surrogate. We show empirically that overall
our method leads to improved performance on diverse image sets.
Comment: 20 pages
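Point (i) above, a multivariate tree predicting the correlated chromaticity components jointly, can be sketched with scikit-learn's multi-output regression trees, together with the recovery angular error metric the paper names. Note this sketch still fits with squared error; the paper's point (ii), minimizing angular error directly inside the tree-fitting procedure, is not something off-the-shelf libraries provide. Features and data are synthetic.

```python
# Sketch, under stated assumptions: a multi-output regression tree predicting
# all three illuminant components at once, scored by recovery angular error.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def recovery_angular_error(est, true):
    """Angle in degrees between estimated and ground-truth illuminant RGBs."""
    cos = np.sum(est * true, axis=1) / (
        np.linalg.norm(est, axis=1) * np.linalg.norm(true, axis=1))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

rng = np.random.default_rng(1)
X = rng.random((300, 8))                          # synthetic image features
w = rng.random((8, 3))
Y = X @ w + 0.01 * rng.standard_normal((300, 3))  # correlated illuminant RGB

# One tree, three correlated outputs -- versus three separate univariate trees.
tree = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, Y)
err = recovery_angular_error(tree.predict(X), Y)
print(round(float(err.mean()), 2))
```

Because the three outputs share every split, the tree cannot produce chromaticity estimates whose components are fit independently of one another, which is the constraint the paper exploits.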
Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model
We aim to produce predictive models that are not only accurate, but are also
interpretable to human experts. Our models are decision lists, which consist of
a series of if...then... statements (e.g., if high blood pressure, then stroke)
that discretize a high-dimensional, multivariate feature space into a series of
simple, readily interpretable decision statements. We introduce a generative
model called Bayesian Rule Lists that yields a posterior distribution over
possible decision lists. It employs a novel prior structure to encourage
sparsity. Our experiments show that Bayesian Rule Lists has predictive accuracy
on par with the current top algorithms for prediction in machine learning. Our
method is motivated by recent developments in personalized medicine, and can be
used to produce highly accurate and interpretable medical scoring systems. We
demonstrate this by producing an alternative to the CHADS2 score, actively
used in clinical practice for estimating the risk of stroke in patients who
have atrial fibrillation. Our model is as interpretable as CHADS2, but more
accurate.
Comment: Published at http://dx.doi.org/10.1214/15-AOAS848 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
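A decision list of the kind described above is simple enough to sketch directly: rules are tried in order and the first matching condition determines the prediction. The rules and probabilities below are invented for illustration; the paper's method infers a posterior distribution over such lists rather than a single hand-written one.

```python
# Minimal decision-list evaluator (illustrative, not Bayesian Rule Lists
# itself). Each rule is (set of required attributes, predicted probability);
# the final default rule fires when nothing else matches.

def predict(decision_list, default_p, patient):
    """patient: set of present attributes, e.g. {'hypertension', 'age>=75'}."""
    for condition, prob in decision_list:
        if condition <= patient:          # every attribute in the rule holds
            return prob
    return default_p                      # fall-through default rule

rules = [
    ({"hypertension", "age>=75"}, 0.28),  # illustrative probabilities only
    ({"diabetes"}, 0.15),
]
print(predict(rules, 0.05, {"hypertension", "age>=75", "male"}))  # 0.28
print(predict(rules, 0.05, {"male"}))                             # 0.05
```

The interpretability claim rests on exactly this structure: a clinician can read the ordered if-then rules top to bottom, unlike the internals of an ensemble.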
Bayesian Additive Regression Trees using Bayesian Model Averaging
Bayesian Additive Regression Trees (BART) is a statistical sum of trees
model. It can be considered a Bayesian version of machine learning tree
ensemble methods in which the individual trees are the base learners. However, for
data sets where the number of variables p is large, the algorithm can become
prohibitively expensive computationally.
Another method which is popular for high dimensional data is random forests,
a machine learning algorithm which grows trees using a greedy search for the
best split points. However, as it is not a statistical model, it cannot produce
probabilistic estimates or predictions.
We propose an alternative algorithm for BART called BART-BMA, which uses
Bayesian Model Averaging and a greedy search algorithm to produce a model which
is much more efficient than BART for datasets with large p. BART-BMA
incorporates elements of both BART and random forests to offer a model-based
algorithm which can deal with high-dimensional data.
We have found that BART-BMA can be run in a reasonable time on a standard
laptop for the "small n, large p" scenario which is common in many areas of
bioinformatics. We showcase this method using simulated data and data from two
real proteomic experiments; one to distinguish between patients with
cardiovascular disease and controls and another to classify aggressive from
non-aggressive prostate cancer. We compare our results to their main
competitors.
Open source code written in R and Rcpp to run BART-BMA can be found at:
https://github.com/BelindaHernandez/BART-BMA.gi
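The sum-of-trees structure underlying BART can be sketched with a greedy stagewise fit, where each new tree explains the residual left by the trees before it. This is a loose illustration only: it omits BART's priors and BART-BMA's Bayesian model averaging over candidate sum-of-tree models, and all sizes below are arbitrary.

```python
# Loose sketch of a greedy sum-of-trees fit in a "large p" setting.
# BART-BMA additionally weights candidate models by posterior probability
# (Bayesian model averaging), which is omitted here for brevity.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def greedy_sum_of_trees(X, y, n_trees=10, depth=2):
    trees, resid = [], y.astype(float).copy()
    for _ in range(n_trees):
        t = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X, resid)
        trees.append(t)
        resid -= t.predict(X)            # each tree explains what is left over
    return trees

def predict(trees, X):
    return sum(t.predict(X) for t in trees)

rng = np.random.default_rng(2)
X = rng.random((200, 50))                # n = 200, p = 50: "small n, large p"
y = 3 * X[:, 0] + np.sin(6 * X[:, 1])    # only 2 of the 50 variables matter
trees = greedy_sum_of_trees(X, y)
print(round(float(np.mean((predict(trees, X) - y) ** 2)), 3))
```

Because each tree is shallow and fit greedily rather than sampled by MCMC, this kind of search is what keeps the cost manageable when p is large.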
Best-scored Random Forest Density Estimation
This paper presents a brand new nonparametric density estimation strategy
named the best-scored random forest density estimation whose effectiveness is
supported by both solid theoretical analysis and significant experimental
performance. The terminology best-scored stands for selecting one density tree
with the best estimation performance out of a certain number of purely random
density tree candidates and we then name the selected one the best-scored
random density tree. In this manner, the ensemble of these selected trees that
is the best-scored random density forest can achieve even better estimation
results than simply integrating trees without selection. From the theoretical
perspective, by decomposing the error term into two, we are able to carry out
the following analysis: First of all, we establish the consistency of the
best-scored random density trees under the L1-norm. Secondly, we provide their
convergence rates under the L1-norm with respect to three different tail
assumptions, respectively. Thirdly, the convergence rates under the
L-infinity-norm are presented. Last but not least, we also achieve the above
convergence rates analysis for the best-scored random density forest. When
conducting comparative experiments with other state-of-the-art density
estimation approaches on both synthetic and real data sets, it turns out that
our algorithm has not only significant advantages in terms of estimation
accuracy over other methods, but also stronger resistance to the curse of
dimensionality.
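The "best-scored" selection step can be illustrated in one dimension: generate several purely random density trees (here simplified to histograms with random split points), score each on held-out data, and keep the winner. The construction below is an illustrative simplification under synthetic data, not the paper's estimator.

```python
# 1-D sketch of the best-scored idea: among purely random density trees,
# keep the one with the best held-out log-likelihood; a forest would then
# average several such winners grown from independent random candidates.
import numpy as np

def random_density_tree(train, rng, n_splits=8):
    # Random partition of [0, 1] -- the "purely random" tree candidate.
    edges = np.sort(np.concatenate(([0.0, 1.0], rng.random(n_splits))))
    counts, _ = np.histogram(train, bins=edges)
    dens = counts / (len(train) * np.diff(edges))   # piecewise-constant density
    return edges, dens

def log_likelihood(tree, data):
    edges, dens = tree
    idx = np.clip(np.searchsorted(edges, data, side="right") - 1,
                  0, len(dens) - 1)
    return float(np.sum(np.log(dens[idx] + 1e-12)))

rng = np.random.default_rng(3)
data = rng.beta(2, 5, 1000)              # synthetic data supported on [0, 1]
train, valid = data[:800], data[800:]
candidates = [random_density_tree(train, rng) for _ in range(20)]
best = max(candidates, key=lambda t: log_likelihood(t, valid))   # best-scored
print(log_likelihood(best, valid) > log_likelihood(candidates[0], valid)
      or best is candidates[0])
```

Selecting before averaging is the point: the ensemble of winners can outperform an ensemble that simply integrates all random candidates without selection.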
Electricity clustering framework for automatic classification of customer loads
Clustering in energy markets is a topic of high significance for expert and intelligent systems. The main contribution of this paper is the proposal of a new clustering framework for the automatic classification of electricity customers' loads. An automatic selection of the clustering classification algorithm is also highlighted. Finally, new customers can be assigned to a predefined set of clusters in the classification phase. The computation time of the proposed framework is less than that of previous classification techniques, which enables a complete electric company sample to be processed in a matter of minutes on a personal computer. The high accuracy of the predicted classification results verifies the performance of the clustering technique. This classification phase is of significant assistance in interpreting the results, and the simplicity of the clustering phase is sufficient to demonstrate the quality of the complete mining framework.
Funding: Ministerio de Economía y Competitividad TEC2013-40767-R; Ministerio de Economía y Competitividad IDI-2015004
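The two phases described above, clustering load profiles and then assigning new customers to the predefined clusters, can be sketched with k-means. The load shapes, noise level, and cluster count below are synthetic assumptions, not the paper's data or its automatically selected algorithm.

```python
# Illustrative sketch of the two phases: cluster hourly load profiles, then
# assign a new customer to the nearest predefined cluster (classification).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
hours = np.arange(24)
residential = np.exp(-((hours - 20) ** 2) / 18)   # evening-peak load shape
industrial = (hours >= 8) & (hours <= 17)         # working-hours plateau
loads = np.vstack([
    residential + 0.05 * rng.standard_normal((30, 24)),
    industrial + 0.05 * rng.standard_normal((30, 24)),
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(loads)  # clustering
new_customer = residential + 0.05 * rng.standard_normal(24)
label = km.predict(new_customer[None, :])[0]                     # classification
print(label == km.labels_[0])   # lands in the residential cluster
```

The classification phase is cheap (one nearest-centroid lookup per customer), which is what makes processing a full company sample fast once the clusters exist.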
See5 Algorithm versus Discriminant Analysis. An Application to the Prediction of Insolvency in Spanish Non-life Insurance Companies
Prediction of insurance company insolvency has arisen as an important problem in the field of financial research, due to the need to protect the general public while minimizing the costs associated with this problem. Most methods applied in the past to tackle this question are traditional statistical techniques that use financial ratios as explanatory variables. However, these variables do not usually satisfy statistical assumptions, which complicates the application of the mentioned methods. In this paper, a comparative study of the performance of a well-known parametric statistical technique (Linear Discriminant Analysis) and a non-parametric machine learning technique (See5) is carried out. We have applied the two methods to the problem of predicting the insolvency of Spanish non-life insurance companies on the basis of a set of financial ratios. Results indicate a higher performance of the machine learning technique, which shows that this method can be a useful tool to evaluate the insolvency of insurance firms.
Keywords: Insolvency, Insurance Companies, Discriminant Analysis, See5.
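The comparison above can be reproduced in miniature. See5 is proprietary, so scikit-learn's CART-style decision tree stands in for it here, and the "financial ratios" are synthetic skewed features with a deliberately non-linear insolvency rule, echoing the paper's point that such data violate the distributional assumptions LDA relies on.

```python
# Hedged sketch of the tree-vs-LDA comparison on synthetic, non-Gaussian
# "financial ratio" features (DecisionTreeClassifier stands in for See5).
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(5)
n = 400
ratios = rng.lognormal(0.0, 0.6, (n, 4))           # skewed, non-normal ratios
# Insolvency driven by a non-linear interaction of two ratios.
insolvent = (ratios[:, 0] * ratios[:, 1] > 1.5).astype(int)

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(ratios, insolvent)
lda = LinearDiscriminantAnalysis().fit(ratios, insolvent)
print(round(tree.score(ratios, insolvent), 2),
      round(lda.score(ratios, insolvent), 2))
```

When the class boundary is an interaction of ratios rather than a linear combination, the axis-aligned splits of a tree adapt where a single linear discriminant cannot.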
Efficient Local Unfolding with Ancestor Stacks
The most successful unfolding rules used nowadays in the partial evaluation
of logic programs are based on well quasi orders (wqo) applied over (covering)
ancestors, i.e., a subsequence of the atoms selected during a derivation.
Ancestor (sub)sequences are used to increase the specialization power of
unfolding while still guaranteeing termination and also to reduce the number of
atoms for which the wqo has to be checked. Unfortunately, maintaining the
structure of the ancestor relation during unfolding introduces significant
overhead. We propose an efficient, practical local unfolding rule based on the
notion of covering ancestors which can be used in combination with a wqo and
allows a stack-based implementation without losing any opportunities for
specialization. Using our technique, certain non-leftmost unfoldings are
allowed as long as local unfolding is performed, i.e., we cover depth-first
strategies.
Comment: 32 pages, 7 figures
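The stack discipline described above can be caricatured in a few lines: during depth-first unfolding, an atom is only unfolded if no ancestor on the stack flags it under the ordering check. The size-based comparison below is a toy stand-in for a real well quasi order such as homeomorphic embedding, and the atom encoding is invented for illustration.

```python
# Toy sketch of the stack-based ancestor check. An atom is unsafe to unfold
# when some ancestor on the stack has the same predicate and is no larger,
# signalling potential non-termination. (Real systems use a wqo such as
# homeomorphic embedding; plain term size is a simplification.)

def size(atom):
    pred, args = atom
    return 1 + sum(len(str(a)) for a in args)

def safe_to_unfold(atom, ancestor_stack):
    pred, _ = atom
    return all(not (a[0] == pred and size(a) <= size(atom))
               for a in ancestor_stack)

# Ancestor stack grown during a depth-first derivation of rev/2 (toy encoding).
stack = [("rev", (["a", "b"], [])), ("rev", (["b"], ["a"]))]
print(safe_to_unfold(("rev", (["b"], ["a", "x"])), stack))  # False: flagged
print(safe_to_unfold(("app", ([], ["a"])), stack))          # True: new predicate
```

Keeping only a stack of ancestors, pushed on call and popped on return, is what removes the overhead of maintaining the full ancestor relation as a tree.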
Random effects regression trees for the analysis of INVALSI data
Mixed or multilevel models exploit random effects to deal with hierarchical data, where statistical units are clustered in groups and cannot be assumed to be independent. Sometimes the assumption of linear dependence of a response on a set of explanatory variables is not plausible, and model specification becomes a challenging task. Regression trees can be helpful to capture non-linear effects of the predictors. This method was extended to clustered data by modelling the fixed effects with a decision tree while accounting for the random effects with a linear mixed model in a separate step (Hajjem & Larocque, 2011; Sela & Simonoff, 2012). Random effect regression trees are shown to be less sensitive to parametric assumptions and to provide improved predictive power compared to linear models with random effects and to regression trees without random effects. We propose a new random effect model, called the Tree embedded linear mixed model, where the regression function is piecewise-linear, consisting of the sum of a tree component and a linear component. This model can deal with non-linear effects, interaction effects, and cluster mean dependencies. The proposal is the mixed-effect version of the semi-linear regression trees (Vannucci, 2019; Vannucci & Gottard, 2019). Model fitting is obtained by an iterative two-stage estimation procedure, where the fixed and the random effects are jointly estimated. The proposed model allows a decomposition of the effect of a given predictor within and between clusters. We show via a simulation study and an application to INVALSI data that these extensions improve the predictive performance of the model in the presence of quasi-linear relationships, avoiding overfitting and facilitating interpretability.
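The iterative two-stage estimation of a piecewise-linear regression function, the sum of a tree component and a linear component, can be sketched by alternating the two fits on each other's residuals. This rough sketch omits the cluster-level random effects entirely and uses synthetic data; it only illustrates the tree-plus-linear decomposition.

```python
# Rough sketch of the iterative two-stage idea (random effects omitted):
# alternate fitting the linear component on the tree's residuals and the
# tree component on the linear part's residuals.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(6)
X = rng.random((300, 3))
# Truth = linear trend in x0 plus a step (non-linear) effect in x1.
y = 2 * X[:, 0] + 1.5 * (X[:, 1] > 0.5) + 0.1 * rng.standard_normal(300)

lin = LinearRegression()
tree = DecisionTreeRegressor(max_depth=2, random_state=0)
tree_part = np.zeros_like(y)
for _ in range(5):                        # iterate the two stages
    lin.fit(X, y - tree_part)             # linear component on tree residuals
    tree.fit(X, y - lin.predict(X))       # tree component on linear residuals
    tree_part = tree.predict(X)

mse = float(np.mean((lin.predict(X) + tree_part - y) ** 2))
print(round(mse, 3))
```

The appeal of the decomposition is interpretive: the linear part carries the smooth, quasi-linear trends while the shallow tree isolates thresholds and interactions, instead of forcing the tree to approximate both.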