    On Using Active Learning and Self-Training when Mining Performance Discussions on Stack Overflow

    Abundant data is the key to successful machine learning, but supervised learning requires annotated data that are often hard to obtain. In a classification task with limited resources, Active Learning (AL) promises to guide annotators to the examples that bring the most value to a classifier. AL can be successfully combined with self-training, i.e., extending the training set with the unlabelled examples about which the classifier is most certain. We report our experience of using AL in a systematic manner to train an SVM classifier for Stack Overflow posts discussing the performance of software components. We show that the training examples deemed most valuable to the classifier are also the most difficult for humans to annotate. Despite carefully evolved annotation criteria, we report low inter-rater agreement, but we also propose mitigation strategies. Finally, based on one annotator's work, we show that self-training can improve the classification accuracy. We conclude the paper by discussing implications for future text miners aspiring to use AL and self-training. Comment: Preprint of a paper accepted for the Proc. of the 21st International Conference on Evaluation and Assessment in Software Engineering, 2017.
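
    To make the workflow above concrete, here is a minimal, hypothetical sketch of pool-based active learning combined with self-training for an SVM text classifier, using scikit-learn. The `oracle` callback, thresholds, and batch sizes are illustrative assumptions, not the authors' actual setup.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

def active_learn_with_self_training(texts, labels, seed_idx, oracle,
                                    n_rounds=10, batch=10, conf_threshold=0.95):
    """Hypothetical AL + self-training loop; seed_idx must cover both classes."""
    X = TfidfVectorizer().fit_transform(texts)
    labelled = set(seed_idx)
    y = {i: labels[i] for i in seed_idx}

    for _ in range(n_rounds):
        idx = sorted(labelled)
        clf = SVC(kernel="linear", probability=True).fit(X[idx], [y[i] for i in idx])

        pool = [i for i in range(X.shape[0]) if i not in labelled]
        if not pool:
            break
        proba = clf.predict_proba(X[pool])
        margin = np.abs(proba[:, 1] - proba[:, 0])

        # Active learning: send the least certain examples to the human annotator.
        for j in np.argsort(margin)[:batch]:
            i = pool[j]
            y[i] = oracle(i)
            labelled.add(i)

        # Self-training: absorb highly confident predictions without annotation.
        for j in np.where(proba.max(axis=1) >= conf_threshold)[0]:
            i = pool[j]
            if i not in labelled:
                y[i] = clf.classes_[np.argmax(proba[j])]
                labelled.add(i)
    return clf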

    Suggestive Annotation: A Deep Active Learning Framework for Biomedical Image Segmentation

    Image segmentation is a fundamental problem in biomedical image analysis. Recent advances in deep learning have achieved promising results on many biomedical image segmentation benchmarks. However, due to large variations in biomedical images (different modalities, image settings, objects, noise, etc.), applying deep learning to a new application usually requires a new set of training data. This can incur a great deal of annotation effort and cost, because only biomedical experts can annotate effectively, and there are often too many instances in images (e.g., cells) to annotate. In this paper, we aim to address the following question: with limited effort (e.g., time) for annotation, which instances should be annotated in order to attain the best performance? We present a deep active learning framework that combines a fully convolutional network (FCN) with active learning to significantly reduce annotation effort by making judicious suggestions on the most effective annotation areas. We utilize uncertainty and similarity information provided by the FCN and formulate a generalized version of the maximum set cover problem to determine the most representative and uncertain areas for annotation. Extensive experiments using the 2015 MICCAI Gland Challenge dataset and a lymph node ultrasound image segmentation dataset show that, using the annotation suggestions produced by our method, state-of-the-art segmentation performance can be achieved with only 50% of the training data. Comment: Accepted at MICCAI 2017.
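
    A rough sketch of the suggestion step described above, under the assumption that per-region `uncertainty` scores and a pairwise `similarity` matrix have already been extracted from the FCN: a greedy heuristic for the generalized maximum set cover objective picks, among the most uncertain regions, a subset that best covers the unannotated pool. This is not the authors' implementation.

```python
import numpy as np

def suggest_regions(uncertainty, similarity, n_candidates=50, n_select=8):
    """Greedy max-cover heuristic over FCN-derived scores (illustrative only).

    uncertainty: (N,) uncertainty score per candidate region.
    similarity:  (N, N) pairwise similarity between regions.
    """
    # Step 1: keep only the most uncertain regions as candidates.
    candidates = list(np.argsort(uncertainty)[::-1][:n_candidates])

    # Step 2: greedily add the candidate with the largest marginal gain in pool
    # coverage, where coverage of region i by a set S is max_{s in S} sim(i, s).
    selected, coverage = [], np.zeros(similarity.shape[0])
    for _ in range(min(n_select, len(candidates))):
        gains = [np.maximum(coverage, similarity[:, c]).sum() - coverage.sum()
                 for c in candidates]
        best = candidates.pop(int(np.argmax(gains)))
        coverage = np.maximum(coverage, similarity[:, best])
        selected.append(best)
    return selected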

    Optimism in Active Learning with Gaussian Processes

    In the context of Active Learning for classification, the classification error depends on the joint distribution of samples and their labels, which is initially unknown. Minimizing this error requires estimating the distribution, and estimating it online involves a trade-off between exploration and exploitation. This is a common problem in machine learning, for which multi-armed bandit theory, building upon Optimism in the Face of Uncertainty, has proven very efficient in recent years. We introduce two novel algorithms that combine Optimism in the Face of Uncertainty with Gaussian Processes for the Active Learning problem. An evaluation on real-world datasets shows that these new algorithms compare favourably to state-of-the-art methods.
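
    The toy sketch below illustrates the general idea of an optimistic (UCB-style) acquisition rule built on a Gaussian Process, not the two algorithms proposed in the paper: labels are regressed as +/-1 values, and the next query favours pool points whose optimistic chance of being misclassified is highest. The `beta` weight and the RBF kernel are arbitrary choices.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def next_query(X_labelled, y_labelled, X_pool, beta=2.0):
    """Pick the index of the next pool point to label (toy UCB-style rule)."""
    # Regress on +/-1 labels: |mean| acts as confidence, std as uncertainty.
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0))
    gp.fit(X_labelled, y_labelled)
    mean, std = gp.predict(X_pool, return_std=True)

    # Optimism in the face of uncertainty: favour points that are both close
    # to the decision boundary (small |mean|) and poorly explored (large std).
    score = beta * std - np.abs(mean)
    return int(np.argmax(score))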

    A Monte Carlo study of the three-dimensional Coulomb frustrated Ising ferromagnet

    We have investigated by Monte Carlo simulation the phase diagram of a three-dimensional Ising model with nearest-neighbor ferromagnetic interactions and small, but long-range (Coulombic), antiferromagnetic interactions. We have developed an efficient cluster algorithm and used different lattice sizes and geometries, which allows us to obtain the main characteristics of the temperature-frustration phase diagram. Our finite-size scaling analysis confirms that the melting of the lamellar phases into the paramagnetic phase is driven first-order by the fluctuations. Transitions between ordered phases with different modulation patterns are observed in some regions of the diagram, in agreement with a recent mean-field analysis. Comment: 14 pages, 10 figures, submitted to Phys. Rev.
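
    For readers unfamiliar with the model, the sketch below writes down a plain single-spin-flip Metropolis update for the Hamiltonian described in the abstract, H = -J sum_<ij> s_i s_j + Q sum_{i<j} s_i s_j / r_ij. It is only meant to make the model concrete and is not the efficient cluster algorithm developed in the paper; the neighbour table and the 1/r matrix (with whatever periodic-image convention is used) are assumed to be precomputed.

```python
import numpy as np

def metropolis_sweep(spins, neighbours, inv_dist, J, Q, T, rng):
    """One naive Metropolis sweep (not the paper's cluster algorithm).

    spins:      (N,) array of +/-1 Ising spins on a periodic cubic lattice.
    neighbours: (N, 6) indices of each site's nearest neighbours.
    inv_dist:   (N, N) matrix of 1/r_ij with a zero diagonal.
    """
    N = spins.size
    for _ in range(N):
        i = rng.integers(N)
        # Ferromagnetic nearest-neighbour part of the energy change on flipping s_i.
        dE_short = 2.0 * J * spins[i] * spins[neighbours[i]].sum()
        # Long-range Coulombic antiferromagnetic part.
        dE_long = -2.0 * Q * spins[i] * np.dot(inv_dist[i], spins)
        dE = dE_short + dE_long
        if dE <= 0.0 or rng.random() < np.exp(-dE / T):
            spins[i] = -spins[i]
    return spins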

    Active Sampling-based Binary Verification of Dynamical Systems

    Nonlinear, adaptive, or otherwise complex control techniques are increasingly relied upon to ensure the safety of systems operating in uncertain environments. However, the nonlinearity of the resulting closed-loop system complicates verifying that the system does in fact satisfy its safety requirements at all possible operating conditions. While analytical proof-based techniques and finite abstractions can be used to provably verify the closed-loop system's response at different operating conditions, they often produce conservative approximations due to restrictive assumptions and are difficult to construct in many applications. In contrast, popular statistical verification techniques relax those restrictions and instead rely upon simulations to construct statistical or probabilistic guarantees. This work presents a data-driven statistical verification procedure that constructs statistical learning models from simulated training data to separate the set of possible perturbations into "safe" and "unsafe" subsets. Binary evaluations of closed-loop requirement satisfaction at various realizations of the uncertainties are obtained through temporal logic robustness metrics, which are then used to construct predictive models of requirement satisfaction over the full set of possible uncertainties. As the accuracy of these predictive statistical models is inherently coupled to the quality of the training data, an active learning algorithm selects additional sample points in order to maximize the expected change in the data-driven model and thus, indirectly, minimize the prediction error. Various case studies demonstrate the closed-loop verification procedure and highlight improvements in prediction error over both existing analytical and statistical verification techniques. Comment: 23 pages
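
    A hedged sketch of the verification loop described above: simulate the closed-loop system at sampled perturbations, convert each run to a binary safe/unsafe label via a robustness metric, fit a probabilistic classifier over the perturbation set, and actively pick the next simulation where the prediction is least certain (a simple stand-in for the paper's expected-model-change criterion). `simulate` and `robustness` are placeholders for the user's model and temporal logic specification.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier

def verify(perturbations, simulate, robustness, n_init=20, n_iters=30, seed=0):
    """Actively build a safe/unsafe predictor over a finite perturbation set.

    perturbations: (N, d) array of candidate uncertainty realizations.
    simulate:      callable returning a closed-loop trajectory for one realization.
    robustness:    callable mapping a trajectory to a signed robustness value.
    """
    rng = np.random.default_rng(seed)
    queried = list(rng.choice(len(perturbations), n_init, replace=False))
    # Note: the initial batch must contain both safe and unsafe outcomes.
    labels = [robustness(simulate(perturbations[i])) > 0 for i in queried]

    clf = GaussianProcessClassifier()
    for _ in range(n_iters):
        clf.fit(perturbations[queried], labels)
        p_safe = clf.predict_proba(perturbations)[:, 1]

        # Query where the predictor is least certain (proxy for expected model change).
        uncertainty = 0.5 - np.abs(p_safe - 0.5)
        uncertainty[queried] = -np.inf
        i = int(np.argmax(uncertainty))
        queried.append(i)
        labels.append(robustness(simulate(perturbations[i])) > 0)
    return clf  # clf.predict_proba(...) gives P(safe) across the whole set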

    An analysis of a manufacturing process using the GERT approach

    Graphical Evaluation and Review Technique for analyzing manufacturing processes.

    Higgs signals and hard photons at the Next Linear Collider: the ZZ-fusion channel in the Standard Model

    In this paper, we extend the analyses carried out in a previous article for $WW$-fusion to the case of Higgs production via $ZZ$-fusion within the Standard Model at the Next Linear Collider, in the presence of electromagnetic radiation due to real photon emission. Calculations are carried out at tree level, and rates of the leading-order (LO) processes $e^+e^- \to e^+e^- H \to e^+e^- b\bar{b}$ and $e^+e^- \to e^+e^- H \to e^+e^- WW \to e^+e^-\,\mathrm{jjjj}$ are compared to those of the next-to-leading-order (NLO) reactions $e^+e^- \to e^+e^- H(\gamma) \to e^+e^- b\bar{b}\gamma$ and $e^+e^- \to e^+e^- H(\gamma) \to e^+e^- WW(\gamma) \to e^+e^-\,\mathrm{jjjj}\gamma$, in the case of energetic and isolated photons. Comment: 12 pages, LaTeX, 5 PostScript figures embedded using epsfig and bitmapped at 100dpi; the complete paper including high-definition figures is available at ftp://axpa.hep.phy.cam.ac.uk/stefano/cavendish_9611.ps or at http://www.hep.phy.cam.ac.uk/theory/papers

    Multiple-scattering effects on incoherent neutron scattering in glasses and viscous liquids

    Incoherent neutron scattering experiments are simulated for simple dynamic models: a glass (with a smooth distribution of harmonic vibrations) and a viscous liquid (described by schematic mode-coupling equations). In most situations multiple scattering has little influence upon spectral distributions, but it completely distorts the wavenumber-dependent amplitudes. This explains an anomaly observed in recent experiments.

    Discovering Valuable Items from Massive Data

    Suppose there is a large collection of items, each with an associated cost and an inherent utility that is revealed only once we commit to selecting it. Given a budget on the cumulative cost of the selected items, how can we pick a subset of maximal value? This task generalizes several important problems such as multi-armed bandits, active search and the knapsack problem. We present an algorithm, GP-Select, which utilizes prior knowledge about similarity between items, expressed as a kernel function. GP-Select uses Gaussian process prediction to balance exploration (estimating the unknown value of items) and exploitation (selecting items of high value). We extend GP-Select to be able to discover sets that simultaneously have high utility and are diverse. Our preference for diversity can be specified as an arbitrary monotone submodular function that quantifies the diminishing returns obtained when selecting similar items. Furthermore, we exploit the structure of the model updates to achieve an order of magnitude (up to 40X) speedup in our experiments without resorting to approximations. We provide strong guarantees on the performance of GP-Select and apply it to three real-world case studies of industrial relevance: (1) Refreshing a repository of prices in a Global Distribution System for the travel industry, (2) Identifying diverse, binding-affine peptides in a vaccine design task and (3) Maximizing clicks in a web-scale recommender system by recommending items to users.
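
    As a rough illustration of the explore/exploit idea, the sketch below keeps a Gaussian Process over item utilities (with a kernel expressing item similarity), repeatedly commits to the affordable item with the highest upper confidence bound, observes its revealed utility, and updates the model. The diversity term and the fast-update machinery of GP-Select are deliberately omitted, and all names here are illustrative.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def select_under_budget(features, costs, reveal_utility, budget, kernel, beta=2.0):
    """Sequentially pick items under a budget using a GP upper confidence bound.

    features:       (N, d) array of item descriptors (feeds the similarity kernel).
    costs:          (N,) array of per-item costs.
    reveal_utility: callable giving an item's true utility once it is selected.
    kernel:         e.g. an RBF from sklearn.gaussian_process.kernels.
    """
    chosen, X_obs, y_obs = [], [], []
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)

    while True:
        affordable = [i for i in range(len(costs))
                      if i not in chosen and costs[i] <= budget]
        if not affordable:
            break
        if X_obs:
            gp.fit(np.array(X_obs), np.array(y_obs))
            mean, std = gp.predict(features[affordable], return_std=True)
        else:  # no observations yet: treat all items as equally (un)known
            mean, std = np.zeros(len(affordable)), np.ones(len(affordable))

        # Exploit the predicted value, explore where the model is uncertain.
        best = affordable[int(np.argmax(mean + beta * std))]
        utility = reveal_utility(best)      # revealed only after committing
        chosen.append(best)
        budget -= costs[best]
        X_obs.append(features[best])
        y_obs.append(utility)
    return chosen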