21,887 research outputs found
Planar Ultrametric Rounding for Image Segmentation
We study the problem of hierarchical clustering on planar graphs. We
formulate this in terms of an LP relaxation of ultrametric rounding. To solve
this LP efficiently we introduce a dual cutting plane scheme that uses minimum
cost perfect matching as a subroutine in order to efficiently explore the space
of planar partitions. We apply our algorithm to the problem of hierarchical
image segmentation
How good are detection proposals, really?
Current top performing Pascal VOC object detectors employ detection proposals
to guide the search for objects thereby avoiding exhaustive sliding window
search across images. Despite the popularity of detection proposals, it is
unclear which trade-offs are made when using them during object detection. We
provide an in depth analysis of ten object proposal methods along with four
baselines regarding ground truth annotation recall (on Pascal VOC 2007 and
ImageNet 2013), repeatability, and impact on DPM detector performance. Our
findings show common weaknesses of existing methods, and provide insights to
choose the most adequate method for different settings
Machine-learning-based vs. manually designed approaches to anaphor resolution: the best of two worlds
In the last years, much effort went into the design of robust anaphor resolution algorithms. Many algorithms are based on antecedent filtering and preference strategies that are manually designed. Along a different line of research, corpus-based approaches have been investigated that employ machine-learning techniques for deriving strategies automatically. Since the knowledge-engineering effort for designing and optimizing the strategies is reduced, the latter approaches are considered particularly attractive. Since, however, the hand-coding of robust antecedent filtering strategies such as syntactic disjoint reference and agreement in person, number, and gender constitutes a once-for-all effort, the question arises whether at all they should be derived automatically. In this paper, it is investigated what might be gained by combining the best of two worlds: designing the universally valid antecedent filtering strategies manually, in a once-for-all fashion, and deriving the (potentially genre-specific) antecedent selection strategies automatically by applying machine-learning techniques. An anaphor resolution system ROSANA-ML, which follows this paradigm, is designed and implemented. Through a series of formal evaluations, it is shown that, while exhibiting additional advantages, ROSANAML reaches a performance level that compares with the performance of its manually designed ancestor ROSANA
Predicting diabetes-related hospitalizations based on electronic health records
OBJECTIVE: To derive a predictive model to identify patients likely to be hospitalized during the following year due to complications attributed to Type II diabetes. METHODS: A variety of supervised machine learning classification methods were tested and a new method that discovers hidden patient clusters in the positive class (hospitalized) was developed while, at the same time, sparse linear support vector machine classifiers were derived to separate positive samples from the negative ones (non-hospitalized). The convergence of the new method was established and theoretical guarantees were proved on how the classifiers it produces generalize to a test set not seen during training. RESULTS: The methods were tested on a large set of patients from the Boston Medical Center - the largest safety net hospital in New England. It is found that our new joint clustering/classification method achieves an accuracy of 89% (measured in terms of area under the ROC Curve) and yields informative clusters which can help interpret the classification results, thus increasing the trust of physicians to the algorithmic output and providing some guidance towards preventive measures. While it is possible to increase accuracy to 92% with other methods, this comes with increased computational cost and lack of interpretability. The analysis shows that even a modest probability of preventive actions being effective (more than 19%) suffices to generate significant hospital care savings. CONCLUSIONS: Predictive models are proposed that can help avert hospitalizations, improve health outcomes and drastically reduce hospital expenditures. The scope for savings is significant as it has been estimated that in the USA alone, about $5.8 billion are spent each year on diabetes-related hospitalizations that could be prevented.Accepted manuscrip
Towards Automated Performance Bug Identification in Python
Context: Software performance is a critical non-functional requirement,
appearing in many fields such as mission critical applications, financial, and
real time systems. In this work we focused on early detection of performance
bugs; our software under study was a real time system used in the
advertisement/marketing domain.
Goal: Find a simple and easy to implement solution, predicting performance
bugs.
Method: We built several models using four machine learning methods, commonly
used for defect prediction: C4.5 Decision Trees, Na\"{\i}ve Bayes, Bayesian
Networks, and Logistic Regression.
Results: Our empirical results show that a C4.5 model, using lines of code
changed, file's age and size as explanatory variables, can be used to predict
performance bugs (recall=0.73, accuracy=0.85, and precision=0.96). We show that
reducing the number of changes delivered on a commit, can decrease the chance
of performance bug injection.
Conclusions: We believe that our approach can help practitioners to eliminate
performance bugs early in the development cycle. Our results are also of
interest to theoreticians, establishing a link between functional bugs and
(non-functional) performance bugs, and explicitly showing that attributes used
for prediction of functional bugs can be used for prediction of performance
bugs
- âŠ