Cross-Modal Data Programming Enables Rapid Medical Machine Learning
Labeling training datasets has become a key barrier to building medical
machine learning models. One strategy is to generate training labels
programmatically, for example by applying natural language processing pipelines
to text reports associated with imaging studies. We propose cross-modal data
programming, which generalizes this intuitive strategy in a
theoretically-grounded way that enables simpler, clinician-driven input,
reduces required labeling time, and improves with additional unlabeled data. In
this approach, clinicians generate training labels for models defined over a
target modality (e.g., images or time series) by writing rules over an auxiliary
modality (e.g., text reports). The resulting technical challenge is to estimate
the accuracies and correlations of these rules; we extend a recent
unsupervised generative modeling technique to handle this cross-modal setting
in a provably consistent way. Across four applications in radiography, computed
tomography, and electroencephalography, and using only several hours of
clinician time, our approach matches or exceeds the efficacy of
physician-months of hand-labeling with statistical significance, demonstrating
a fundamentally faster and more flexible way of building machine learning
models in medicine.
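The abstract does not spell out the estimator, so the following is only a hedged illustration of the underlying idea. It uses a simple method-of-moments ("triplet") estimator, which is a swapped-in technique and not the authors' generative model. Rules written over text reports vote +1 or -1, their accuracies are estimated from agreement rates alone without any ground truth, and the votes are combined into probabilistic labels. Those labels would then be attached to the paired images or signals (the target modality) to train the downstream model. The sketch assumes at least three rules, no abstentions, balanced classes, rules that are conditionally independent given the true label (so rule correlations, which the paper does model, are ignored), and better-than-random rules. The names `simulate`, `estimate_accuracies`, and `aggregate` are hypothetical.

```python
import math
import random
from itertools import combinations


def simulate(n, true_p, seed=0):
    """Synthetic votes of hypothetical report rules with known accuracies true_p."""
    rng = random.Random(seed)
    ys, votes = [], []
    for _ in range(n):
        y = rng.choice((-1, 1))
        ys.append(y)
        votes.append([y if rng.random() < p else -y for p in true_p])
    return ys, votes


def estimate_accuracies(votes):
    """Estimate a_i = E[l_i * y] for each rule from pairwise agreement only.

    Under conditional independence, E[l_i * l_j] = a_i * a_j, so
    a_i = sqrt(E[l_i l_j] * E[l_i l_k] / E[l_j l_k]) for any other rules j, k.
    """
    n, m = len(votes), len(votes[0])
    mom = {}
    for i, j in combinations(range(m), 2):
        mom[(i, j)] = mom[(j, i)] = sum(r[i] * r[j] for r in votes) / n
    acc = []
    for i in range(m):
        j, k = [x for x in range(m) if x != i][:2]
        a = math.sqrt(abs(mom[(i, j)] * mom[(i, k)] / mom[(j, k)]))
        acc.append(min(max(a, 0.01), 0.99))  # clip so log-odds stay finite
    return acc


def aggregate(votes, acc):
    """Return P(y = +1) per item using log-odds weights log((1 + a) / (1 - a))."""
    weights = [math.log((1 + a) / (1 - a)) for a in acc]
    return [1 / (1 + math.exp(-sum(w * v for w, v in zip(weights, row))))
            for row in votes]


if __name__ == "__main__":
    ys, votes = simulate(20000, [0.9, 0.8, 0.7])
    acc = estimate_accuracies(votes)
    print(acc)
    print(aggregate(votes, acc)[:5])
```

With rule accuracies of 0.9, 0.8, and 0.7, the estimated values of a should land close to 2p - 1, roughly 0.8, 0.6, and 0.4. No labeled data is used.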
Predicting and Evaluating Software Model Growth in the Automotive Industry
The size of a software artifact influences the software quality and impacts
the development process. In industry, when software size exceeds certain
thresholds, memory errors accumulate and development tools might not be able to
cope anymore, resulting in lengthy program start-up times, failing builds, or
memory problems at unpredictable times. Thus, foreseeing critical growth in
software modules is in high demand in industrial practice. Predicting the
time when the size grows to the level where maintenance is needed prevents
unexpected efforts and helps to spot problematic artifacts before they become
critical.
Although the literature offers a vast number of prediction approaches, it is
unclear how well they fit the prerequisites and expectations of practice. In
this paper, we perform an industrial case study at an automotive manufacturer
to explore the applicability and usability of prediction approaches in practice. In
a first step, we collect the most relevant prediction approaches from the
literature, including both statistical and machine learning approaches.
Furthermore, we elicit expectations towards predictions from practitioners
using a survey and stakeholder workshops. At the same time, we measure the
size of 48 software artifacts by mining four years of revision history,
resulting in 4,547 data points. In the last step, we assess the applicability
of state-of-the-art prediction approaches using the collected data by
systematically analyzing how well they fulfill the practitioners' expectations.
Our main contribution is a comparison of commonly used prediction approaches
in a real-world industrial setting while considering stakeholder expectations.
We show that the approaches provide significantly different results regarding
prediction accuracy and that the statistical approaches fit our data best.
Connecting Software Metrics across Versions to Predict Defects
Accurate software defect prediction could help software practitioners
allocate test resources to defect-prone modules effectively and efficiently. In
the last decades, much effort has been devoted to building accurate defect
prediction models, including developing quality defect predictors and modeling
techniques. However, currently widely used defect predictors such as code metrics
and process metrics cannot adequately describe how software modules change over
the course of project evolution, which we believe is important for defect
prediction. To address this problem, we propose using the
Historical Version Sequence of Metrics (HVSM) in continuous software versions
as defect predictors. Furthermore, we leverage a Recurrent Neural Network (RNN),
a popular modeling technique, taking HVSM as input, to build software defect
prediction models. The experimental results show that, in most cases, the
proposed HVSM-based RNN model has a significantly better effort-aware ranking
effectiveness than the commonly used baseline models.
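The abstract does not detail the network architecture or the evaluation measure, so the following is a hedged sketch under stated assumptions:

- A module's HVSM is built by stacking its metric vectors across consecutive versions, oldest first. Versions in which the module did not yet exist are zero-padded, which is an assumption.
- The sequence is run through a single-layer Elman RNN to produce a defect-proneness score.
- Results are ranked with one common effort-aware measure: the share of defective modules found when modules are inspected in order of predicted risk per line of code, stopping once 20% of the total code has been reviewed.

The RNN weights are random and training is omitted. Metric values are assumed to be pre-normalized. All names (`build_hvsm`, `ElmanRNN`, `recall_at_effort`) are hypothetical.

```python
import math
import random


def build_hvsm(versions, module, n_features):
    """Historical version sequence of metrics for one module, oldest first.

    `versions[v]` maps module name -> list of (normalized) metric values.
    Versions where the module is absent are zero-padded (an assumption).
    """
    return [v.get(module, [0.0] * n_features) for v in versions]


class ElmanRNN:
    """Single-layer Elman RNN scorer with random, untrained weights."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        r = lambda: rng.uniform(-0.5, 0.5)
        self.wx = [[r() for _ in range(n_in)] for _ in range(n_hidden)]
        self.wh = [[r() for _ in range(n_hidden)] for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden
        self.v = [r() for _ in range(n_hidden)]

    def score(self, seq):
        """h_t = tanh(Wx x_t + Wh h_{t-1} + b); output sigmoid(v . h_T)."""
        h = [0.0] * len(self.b)
        for x in seq:
            h = [math.tanh(sum(w * xi for w, xi in zip(self.wx[j], x))
                           + sum(w * hk for w, hk in zip(self.wh[j], h))
                           + self.b[j])
                 for j in range(len(h))]
        return 1 / (1 + math.exp(-sum(vj * hj for vj, hj in zip(self.v, h))))


def recall_at_effort(modules, scores, loc, defective, effort=0.2):
    """Share of defective modules found when inspecting modules by
    score / LOC (descending) until `effort` of the total LOC is used."""
    order = sorted(modules, key=lambda m: scores[m] / loc[m], reverse=True)
    budget = effort * sum(loc[m] for m in modules)
    spent, found = 0, 0
    for m in order:
        if spent + loc[m] > budget:
            break
        spent += loc[m]
        found += defective[m]
    total = sum(defective[m] for m in modules)
    return found / total if total else 0.0


if __name__ == "__main__":
    versions = [{"a": [0.1, 0.2, 0.3]},
                {"a": [0.2, 0.2, 0.3], "b": [0.5, 0.1, 0.0]}]
    rnn = ElmanRNN(n_in=3, n_hidden=4)
    for name in ("a", "b"):
        print(name, rnn.score(build_hvsm(versions, name, 3)))
```

Ranking by score per line of code, rather than by raw score, is what makes the evaluation effort-aware: a small module with a moderate score can be worth inspecting before a large module with a high one.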