    Iteration and labelled iteration

    We analyse the conventional sum-based representation of iteration from the perspective of programmers, and show that the syntax it suggests is fundamentally not a good representation of Java-style iteration with for, while, break, and continue. We present an alternative syntax, which we call “labelled iteration”, where loops are identified using labels. Both languages are analysed: we give denotational and operational semantics, adequacy proofs for both languages, and a translation function from sum-based iteration to labelled iteration.
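
    The paper's own syntax and semantics are not reproduced here; the following minimal Java sketch only illustrates the style of labelled iteration the abstract refers to, in which break and continue name the loop they act on rather than implicitly targeting the innermost one.

        public class LabelledIterationDemo {
            public static void main(String[] args) {
                int[][] grid = { {1, 3, 2}, {5, -1, 6}, {7, 8, 9} };

                outer:                                     // the label names the enclosing loop
                for (int i = 0; i < grid.length; i++) {
                    for (int j = 0; j < grid[i].length; j++) {
                        if (grid[i][j] < 0) {
                            System.out.println("negative entry at (" + i + "," + j + ")");
                            break outer;                   // exit both loops at once
                        }
                        if (grid[i][j] % 2 == 0) {
                            continue outer;                // skip the rest of this row
                        }
                        System.out.println("odd entry: " + grid[i][j]);
                    }
                }
            }
        }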

    Deep Active Learning for Classifying Cancer Pathology Reports

    Background: Automated text classification has many important applications in the clinical setting; however, obtaining labelled data for training machine learning and deep learning models is often difficult and expensive. Active learning techniques may mitigate this challenge by reducing the amount of labelled data required to effectively train a model. In this study, we analyze the effectiveness of 11 active learning algorithms on classifying subsite and histology from cancer pathology reports using a Convolutional Neural Network as the text classification model. Results: We compare the performance of each active learning strategy using two differently sized datasets and two different classification tasks. Our results show that on all tasks and dataset sizes, all active learning strategies except diversity-sampling strategies outperformed random sampling, i.e., no active learning. On our large dataset (15K initial labelled samples, with 15K additional labelled samples added at each iteration of active learning), there was no clear winner between the different active learning strategies. On our small dataset (1K initial labelled samples, with 1K additional labelled samples added at each iteration of active learning), marginal and ratio uncertainty sampling performed better than all other active learning techniques. We found that compared to random sampling, active learning strongly helps performance on rare classes by focusing on underrepresented classes. Conclusions: Active learning can save annotation cost by helping human annotators efficiently and intelligently select which samples to label. Our results show that a dataset constructed using effective active learning techniques requires less than half the amount of labelled data to achieve the same performance as a dataset constructed using random sampling.
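
    As a rough illustration of the marginal uncertainty sampling the abstract highlights, the Java sketch below selects the unlabelled samples whose top two predicted class probabilities are closest together. The class name and the probability inputs are hypothetical; this is not the study's implementation.

        import java.util.ArrayList;
        import java.util.Comparator;
        import java.util.List;

        /** Sketch of margin-based active learning: the smaller the gap between the
         *  two most probable classes, the more uncertain (and valuable) the sample. */
        public class MarginSampling {

            /** probs[i] holds the model's predicted class distribution for sample i. */
            public static List<Integer> selectBatch(double[][] probs, int batchSize) {
                List<Integer> indices = new ArrayList<>();
                for (int i = 0; i < probs.length; i++) indices.add(i);

                // Sort by the margin between the top two classes, most uncertain first.
                indices.sort(Comparator.comparingDouble((Integer i) -> margin(probs[i])));
                return indices.subList(0, Math.min(batchSize, indices.size()));
            }

            private static double margin(double[] p) {
                double best = Double.NEGATIVE_INFINITY, second = Double.NEGATIVE_INFINITY;
                for (double v : p) {
                    if (v > best) { second = best; best = v; }
                    else if (v > second) { second = v; }
                }
                return best - second;
            }

            public static void main(String[] args) {
                double[][] probs = { {0.90, 0.05, 0.05}, {0.40, 0.35, 0.25}, {0.50, 0.30, 0.20} };
                System.out.println(selectBatch(probs, 2));   // prints the two most uncertain indices
            }
        }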

    Computing Probabilistic Bisimilarity Distances via Policy Iteration

    A transformation mapping a labelled Markov chain to a simple stochastic game is presented. In the resulting simple stochastic game, each vertex corresponds to a pair of states of the labelled Markov chain. The value of a vertex of the simple stochastic game is shown to be equal to the probabilistic bisimilarity distance, a notion due to Desharnais, Gupta, Jagadeesan and Panangaden, of the corresponding pair of states of the labelled Markov chain. Bacci, Bacci, Larsen and Mardare introduced an algorithm to compute the probabilistic bisimilarity distances for a labelled Markov chain. A modification of a basic version of their algorithm for a labelled Markov chain is shown to be the policy iteration algorithm applied to the corresponding simple stochastic game. Furthermore, it is shown that this algorithm takes exponential time in the worst case.
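
    The paper's construction on simple stochastic games is not reproduced here. As a generic reminder of what policy iteration does, the sketch below alternates policy evaluation and greedy policy improvement on a small, made-up Markov decision process; this is a swapped-in setting for illustration only, not the game or the distances studied in the paper.

        import java.util.Arrays;

        /** Generic policy-iteration skeleton: evaluate the current policy, then
         *  improve it greedily, until no state changes its chosen action. */
        public class PolicyIterationSketch {

            // trans[s][a][t] = transition probability, reward[s][a] = immediate reward.
            static final double[][][] TRANS = {
                { {0.8, 0.2}, {0.1, 0.9} },
                { {0.5, 0.5}, {0.0, 1.0} }
            };
            static final double[][] REWARD = { {1.0, 0.0}, {0.0, 2.0} };
            static final double GAMMA = 0.9;

            public static void main(String[] args) {
                int[] policy = new int[TRANS.length];     // start with action 0 everywhere
                boolean stable = false;
                while (!stable) {
                    double[] value = evaluate(policy);    // policy evaluation
                    stable = true;
                    for (int s = 0; s < policy.length; s++) {   // policy improvement
                        int best = bestAction(s, value);
                        if (best != policy[s]) { policy[s] = best; stable = false; }
                    }
                }
                System.out.println("optimal policy: " + Arrays.toString(policy));
            }

            /** Iterative evaluation: fixed point of the Bellman equation for this policy. */
            static double[] evaluate(int[] policy) {
                double[] v = new double[policy.length];
                for (int iter = 0; iter < 1000; iter++) {
                    double[] next = new double[v.length];
                    for (int s = 0; s < v.length; s++) next[s] = q(s, policy[s], v);
                    v = next;
                }
                return v;
            }

            static int bestAction(int s, double[] v) {
                int best = 0;
                for (int a = 1; a < TRANS[s].length; a++) {
                    if (q(s, a, v) > q(s, best, v)) best = a;
                }
                return best;
            }

            /** One-step lookahead value of taking action a in state s. */
            static double q(int s, int a, double[] v) {
                double sum = REWARD[s][a];
                for (int t = 0; t < v.length; t++) sum += GAMMA * TRANS[s][a][t] * v[t];
                return sum;
            }
        }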

    AutoSimulate: (Quickly) Learning Synthetic Data Generation

    Simulation is increasingly being used for generating large labelled datasets in many machine learning problems. Recent methods have focused on adjusting simulator parameters with the goal of maximising accuracy on a validation task, usually relying on REINFORCE-like gradient estimators. However, these approaches are very expensive, as they treat the entire data generation, model training, and validation pipeline as a black box and require multiple costly objective evaluations at each iteration. We propose an efficient alternative for optimal synthetic data generation, based on a novel differentiable approximation of the objective. This allows us to optimize the simulator, which may be non-differentiable, requiring only one objective evaluation at each iteration with little overhead. We demonstrate on a state-of-the-art photorealistic renderer that the proposed method finds the optimal data distribution faster (up to 50×), with significantly reduced training data generation (up to 30×) and better accuracy (+8.7%) on real-world test datasets than previous methods. Comment: ECCV 202

    Construct, Merge, Solve and Adapt: Application to the repetition-free longest common subsequence problem

    In this paper we present the application of a recently proposed general algorithm for combinatorial optimization to the repetition-free longest common subsequence problem. The applied algorithm, which is labelled Construct, Merge, Solve & Adapt, generates sub-instances based on merging the solution components found in randomly constructed solutions. These sub-instances are subsequently solved by means of an exact solver. Moreover, the considered sub-instances change dynamically, as new solution components are added at each iteration and existing solution components are removed on the basis of indicators of their usefulness. The results of applying this algorithm to the repetition-free longest common subsequence problem show that it generally outperforms competing approaches from the literature. Moreover, they show that the algorithm is competitive with CPLEX for small and medium-size problem instances, whereas it outperforms CPLEX for larger problem instances.
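
    The construct/merge/solve/adapt loop described in the abstract can be summarised by the Java skeleton below. The component type, the probabilistic constructor, the "exact solver", and the toy objective in main are hypothetical placeholders, not the paper's implementation.

        import java.util.HashMap;
        import java.util.HashSet;
        import java.util.Map;
        import java.util.Set;

        /** Skeleton of a Construct, Merge, Solve & Adapt loop over solution components. */
        public class CmsaSkeleton<C> {

            public interface Constructor<T> { Set<T> constructRandomSolution(); }
            public interface ExactSolver<T> { Set<T> solve(Set<T> subInstance); }

            private final Map<C, Integer> age = new HashMap<>();   // sub-instance with component ages

            public Set<C> run(Constructor<C> construct, ExactSolver<C> solver,
                              int iterations, int solutionsPerIteration, int maxAge) {
                Set<C> best = null;
                for (int it = 0; it < iterations; it++) {
                    // Construct + merge: components of randomly built solutions join the sub-instance.
                    for (int k = 0; k < solutionsPerIteration; k++) {
                        for (C c : construct.constructRandomSolution()) age.putIfAbsent(c, 0);
                    }
                    // Solve: an exact solver works on the restricted, merged sub-instance.
                    Set<C> incumbent = solver.solve(new HashSet<>(age.keySet()));
                    if (best == null || incumbent.size() > best.size()) best = incumbent;  // toy objective

                    // Adapt: age every component, reset those the solver used, drop stale ones.
                    age.replaceAll((c, a) -> incumbent.contains(c) ? 0 : a + 1);
                    age.values().removeIf(a -> a > maxAge);
                }
                return best;
            }

            public static void main(String[] args) {
                java.util.Random rnd = new java.util.Random(42);
                CmsaSkeleton<Integer> cmsa = new CmsaSkeleton<>();
                // Toy instance: components are integers; the "exact solver" just keeps even ones.
                Set<Integer> result = cmsa.run(
                    () -> new HashSet<>(java.util.List.of(rnd.nextInt(10), rnd.nextInt(10))),
                    sub -> { Set<Integer> s = new HashSet<>(sub); s.removeIf(x -> x % 2 != 0); return s; },
                    5, 3, 2);
                System.out.println(result);
            }
        }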

    Equilibrium States in Numerical Argumentation Networks

    Given an argumentation network with initial values for the arguments, we look for algorithms which can yield extensions compatible with such initial values. We find that the best way of tackling this problem is to offer an iteration formula that takes the initial values and the attack relation and iterates a sequence of intermediate values that eventually converges, leading to an extension. The properties surrounding the application of the iteration formula and its connection with other numerical and non-numerical techniques proposed by others are thoroughly investigated in this paper.
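
    The paper's iteration formula is not reproduced here. As a purely hypothetical illustration of iterating argument values to a fixed point, the Java sketch below repeatedly dampens each argument's initial value by the strength of its strongest attacker until the values stop changing; both the update rule and the example network are assumptions made for illustration.

        import java.util.Arrays;

        public class ArgumentIteration {
            public static void main(String[] args) {
                // attacks[i][j] == true means argument j attacks argument i.
                boolean[][] attacks = {
                    {false, true,  false},
                    {false, false, true },
                    {false, false, false}
                };
                double[] initial = {0.9, 0.8, 0.7};   // initial values of the arguments
                double[] value = initial.clone();

                for (int iter = 0; iter < 1000; iter++) {
                    double[] next = new double[value.length];
                    double change = 0.0;
                    for (int i = 0; i < value.length; i++) {
                        double strongestAttack = 0.0;
                        for (int j = 0; j < value.length; j++) {
                            if (attacks[i][j]) strongestAttack = Math.max(strongestAttack, value[j]);
                        }
                        // Hypothetical update: the initial value, damped by the strongest attacker.
                        next[i] = initial[i] * (1.0 - strongestAttack);
                        change = Math.max(change, Math.abs(next[i] - value[i]));
                    }
                    value = next;
                    if (change < 1e-9) break;   // converged to an equilibrium state
                }
                System.out.println(Arrays.toString(value));
            }
        }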

    Size versus truthfulness in the House Allocation problem

    We study the House Allocation problem (also known as the Assignment problem), i.e., the problem of allocating a set of objects among a set of agents, where each agent has ordinal preferences (possibly involving ties) over a subset of the objects. We focus on truthful mechanisms without monetary transfers for finding large Pareto optimal matchings. It is straightforward to show that no deterministic truthful mechanism can approximate a maximum cardinality Pareto optimal matching with ratio better than 2. We thus consider randomised mechanisms. We give a natural and explicit extension of the classical Random Serial Dictatorship Mechanism (RSDM) specifically for the House Allocation problem where preference lists can include ties. We thus obtain a universally truthful randomised mechanism for finding a Pareto optimal matching and show that it achieves an approximation ratio of e/(e-1). The same bound holds even when agents have priorities (weights) and our goal is to find a maximum weight (as opposed to maximum cardinality) Pareto optimal matching. On the other hand, we give a lower bound of 18/13 on the approximation ratio of any universally truthful Pareto optimal mechanism in settings with strict preferences. In the case that the mechanism must additionally be non-bossy, we show, under an additional technical assumption and by utilising a result of Bade, that an improved lower bound of e/(e-1) holds. This lower bound is tight since RSDM for strict preference lists is non-bossy. We moreover interpret our problem in terms of the classical secretary problem and prove that our mechanism provides the best randomised strategy for the administrator who interviews the applicants. Comment: To appear in Algorithmica (preliminary version appeared in the Proceedings of EC 2014).
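
    For readers unfamiliar with the mechanism the paper extends, the Java sketch below shows the classical Random Serial Dictatorship for strict preference lists: draw a random order of agents and let each one take its most preferred object that is still free. The paper's extension to preference lists with ties is not reproduced here, and the names and example data are illustrative only.

        import java.util.ArrayList;
        import java.util.Arrays;
        import java.util.Collections;
        import java.util.HashSet;
        import java.util.List;
        import java.util.Set;

        public class RandomSerialDictatorship {

            /** prefs.get(a) is agent a's strictly ordered list of acceptable objects. */
            public static int[] allocate(List<List<Integer>> prefs) {
                int n = prefs.size();
                int[] assignment = new int[n];
                Arrays.fill(assignment, -1);                 // -1 means the agent stays unmatched

                List<Integer> order = new ArrayList<>();
                for (int a = 0; a < n; a++) order.add(a);
                Collections.shuffle(order);                  // the random "dictator" order

                Set<Integer> taken = new HashSet<>();
                for (int agent : order) {
                    for (int obj : prefs.get(agent)) {
                        if (taken.add(obj)) {                // first acceptable object still free
                            assignment[agent] = obj;
                            break;
                        }
                    }
                }
                return assignment;
            }

            public static void main(String[] args) {
                List<List<Integer>> prefs = List.of(List.of(0, 1), List.of(0), List.of(1, 2));
                System.out.println(Arrays.toString(allocate(prefs)));
            }
        }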