
    Disentangling human error from the ground truth in segmentation of medical images

    Recent years have seen increasing use of supervised learning methods for segmentation tasks. However, the predictive performance of these algorithms depends on the quality of labels. This problem is particularly pertinent in the medical image domain, where both the annotation cost and inter-observer variability are high. In a typical label acquisition process, different human experts provide their estimates of the "true" segmentation labels under the influence of their own biases and competence levels. Treating these noisy labels blindly as the ground truth limits the performance that automatic segmentation algorithms can achieve. In this work, we present a method for jointly learning, purely from noisy observations, the reliability of individual annotators and the true segmentation label distributions, using two coupled CNNs. The separation of the two is achieved by encouraging the estimated annotators to be maximally unreliable while achieving high fidelity with the noisy training data. We first define a toy segmentation dataset based on MNIST and study the properties of the proposed algorithm. We then demonstrate the utility of the method on three public medical imaging segmentation datasets with simulated (when necessary) and real diverse annotations: 1) MSLSC (multiple-sclerosis lesions); 2) BraTS (brain tumours); 3) LIDC-IDRI (lung abnormalities). In all cases, our method outperforms competing methods and relevant baselines, particularly in cases where the number of annotations is small and the amount of disagreement is large. The experiments also show a strong ability to capture the complex spatial characteristics of annotators' mistakes. Our code is available at \url{https://github.com/moucheng2017/LearnNoisyLabelsMedicalImages}.
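    The forward noise model behind this kind of approach (a per-annotator confusion matrix applied to an estimated true label distribution, with the matrix trace penalised so that annotators are assumed no more reliable than the data demands) can be illustrated in a few lines. The class count, probabilities, and matrix below are hypothetical stand-ins, not values from the paper; this is a sketch of the idea, not the authors' implementation.

```python
import numpy as np

# Estimated true label distribution for one pixel (3 hypothetical classes).
p = np.array([0.7, 0.2, 0.1])

# Annotator confusion matrix: A[j, i] = P(annotator reports class j | true class i).
# Columns sum to 1, so A maps a probability vector to a probability vector.
A = np.array([[0.90, 0.20, 0.10],
              [0.05, 0.70, 0.20],
              [0.05, 0.10, 0.70]])

# Predicted distribution of this annotator's noisy label for the pixel.
observed = A @ p

# Trace term: minimising trace(A) pushes the estimated annotator toward
# maximal unreliability, which is what disentangles A from p in training.
trace_penalty = np.trace(A)
```

    In the full method, both A (one matrix per annotator, per pixel) and p are outputs of the two coupled CNNs and are fit jointly against the noisy labels; the snippet only shows the algebra of a single pixel.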

    Tight Complexity Bounds for Counting Generalized Dominating Sets in Bounded-Treewidth Graphs

    We investigate how efficiently a well-studied family of domination-type problems can be solved on bounded-treewidth graphs. For sets σ, ρ of non-negative integers, a (σ,ρ)-set of a graph G is a set S of vertices such that |N(u) ∩ S| ∈ σ for every u ∈ S, and |N(v) ∩ S| ∈ ρ for every v ∉ S. The problem of finding a (σ,ρ)-set (of a certain size) unifies standard problems such as Independent Set, Dominating Set, Independent Dominating Set, and many others. For all pairs of finite or cofinite sets (σ,ρ), we determine (under standard complexity assumptions) the best possible value c_{σ,ρ} such that there is an algorithm that counts (σ,ρ)-sets in time c_{σ,ρ}^tw · n^{O(1)} (if a tree decomposition of width tw is given in the input). For example, for the Exact Independent Dominating Set problem (also known as Perfect Code), corresponding to σ = {0} and ρ = {1}, we improve the 3^tw · n^{O(1)} algorithm of [van Rooij, 2020] to 2^tw · n^{O(1)}. Despite the unusually delicate definition of c_{σ,ρ}, we show that our algorithms are most likely optimal, i.e., for any pair (σ,ρ) of finite or cofinite sets where the problem is non-trivial, and any ε > 0, a (c_{σ,ρ} − ε)^tw · n^{O(1)} algorithm counting the number of (σ,ρ)-sets would violate the Counting Strong Exponential-Time Hypothesis (#SETH). For finite sets σ and ρ, our lower bounds also extend to the decision version, showing that our algorithms are optimal in this setting as well. In contrast, for many cofinite sets, we show that further significant improvements for the decision and optimization versions are possible using the technique of representative sets.
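    For intuition, membership as a (σ,ρ)-set can be checked directly from the definition. The brute-force counter below (exponential in n and only for tiny graphs, not the treewidth-based algorithm of the paper) counts Perfect Codes (σ = {0}, ρ = {1}) and independent sets on the 6-cycle; the graph and function names are illustrative.

```python
from itertools import combinations

def count_sigma_rho_sets(n, edges, sigma, rho):
    """Count (sigma, rho)-sets of a graph on vertices 0..n-1 by exhaustive search."""
    nbrs = {v: set() for v in range(n)}
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    count = 0
    for k in range(n + 1):
        for S in combinations(range(n), k):
            s = set(S)
            # u in S must see a sigma-allowed number of S-neighbours,
            # v outside S a rho-allowed number.
            ok = all(len(nbrs[u] & s) in sigma for u in s) and \
                 all(len(nbrs[v] & s) in rho for v in range(n) if v not in s)
            count += ok
    return count

cycle6 = [(i, (i + 1) % 6) for i in range(6)]

# Perfect Code (sigma={0}, rho={1}): exactly the three antipodal pairs of C6.
print(count_sigma_rho_sets(6, cycle6, {0}, {1}))        # -> 3

# Independent Set: rho unconstrained (any count up to the max degree 2).
print(count_sigma_rho_sets(6, cycle6, {0}, {0, 1, 2}))  # -> 18 (Lucas number L6)
```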

    A physics-based machine learning technique rapidly reconstructs the wall-shear stress and pressure fields in coronary arteries

    With the global rise of cardiovascular disease, including atherosclerosis, there is a high demand for accurate diagnostic tools that can be used during a short consultation. From a pathology standpoint, abnormal blood flow patterns have been demonstrated to be strong predictors of atherosclerotic lesion incidence, location, progression, and rupture. Prediction of patient-specific blood flow patterns can hence enable fast clinical diagnosis. However, the current state of the art relies on 3D-imaging-based Computational Fluid Dynamics (CFD), whose high computational cost renders it impractical. In this work, we present a novel method to expedite the reconstruction of 3D pressure and shear stress fields using a combination of a reduced-order CFD modelling technique and non-linear regression tools from the Machine Learning (ML) paradigm. Specifically, we develop a proof-of-concept automated pipeline that uses randomised perturbations of an atherosclerotic pig coronary artery to produce a large dataset of unique mesh geometries with variable blood flow. A total of 1,407 geometries were generated from seven reference arteries and were used to simulate blood flow using the CFD solver Abaqus. This CFD dataset was then post-processed using the mesh-domain common-base Proper Orthogonal Decomposition (cPOD) method to obtain eigenfunctions and principal coefficients, the latter being the product of the individual mesh flow solutions with the POD eigenvectors. Being a data-reduction method, the POD enables the data to be represented using only the ten most significant modes, which cumulatively capture greater than 95% of the variance in flow features due to mesh variations. Next, the node coordinate data of the meshes were embedded in a two-dimensional coordinate system using the t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm. The reduced dataset of t-SNE coordinates and the corresponding vectors of POD coefficients were then used to train a Random Forest Regressor (RFR) model. The same methodology was applied to both the volumetric pressure solution and the wall shear stress. The predicted patterns of blood pressure and shear stress in unseen arterial geometries were compared with the ground-truth CFD solutions on "unseen" meshes. The new method was able to reliably reproduce the 3D coronary artery haemodynamics in less than 10 s.
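    The data-reduction step can be sketched with a snapshot SVD, the standard way to compute POD modes: stack the flow solutions as columns, centre them, and keep the leading modes until the retained singular-value energy passes 95%. The mesh size, snapshot count, and intrinsic rank below are synthetic stand-ins, not the paper's 1,407-geometry dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "snapshots": 200 flow solutions on 500 mesh nodes,
# built with intrinsic rank 5 plus a little noise.
modes_true = rng.normal(size=(500, 5))
coeffs_true = rng.normal(size=(5, 200))
X = modes_true @ coeffs_true + 0.01 * rng.normal(size=(500, 200))

# POD via SVD of the mean-centred snapshot matrix.
X_mean = X.mean(axis=1, keepdims=True)
Xc = X - X_mean
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Cumulative energy fraction; keep the fewest modes reaching 95%.
energy = np.cumsum(s**2) / np.sum(s**2)
k = int(np.searchsorted(energy, 0.95)) + 1

# Principal coefficients (one vector per snapshot) and the reconstruction.
coeffs = U[:, :k].T @ Xc
X_hat = X_mean + U[:, :k] @ coeffs
```

    In the pipeline described above, the per-snapshot coefficient vectors (here `coeffs`) are what the Random Forest Regressor learns to predict from the t-SNE geometry coordinates; the reconstruction then only needs the retained modes.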

    Measuring Strategic Uncertainty in Coordination Games

    Lecture at the first SFB/TR 15 meeting, Gummersbach, July 18-20, 2004. This paper explores the predictability of behavior in coordination games with multiple equilibria. In a laboratory experiment, we measure subjects' certainty equivalents for three coordination games and one lottery. Attitudes towards strategic uncertainty in coordination games are related to risk aversion, experience seeking, gender, and age. From the distribution of certainty equivalents among participating students, we estimate probabilities for successful coordination in a wide range of coordination games. For many games, the success of coordination is predictable with a reasonable error rate. The best response of a risk-neutral player is close to the global-game solution. Comparing choices in coordination games with revealed risk aversion, we estimate subjective probabilities for successful coordination. In games with a low coordination requirement, most subjects underestimate the probability of success. In games with a high coordination requirement, most subjects overestimate this probability. The data indicate that subjects have probabilistic beliefs about the success or failure of coordination rather than beliefs about the individual behavior of other players.
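    One way to read the estimation step: treat each subject's certainty equivalent as a threshold for preferring the coordination game over a sure payment, then resample groups from the elicited distribution and check how often the coordination quorum is met. The payoffs, group size, and quorum below are hypothetical, chosen purely for illustration; this is a sketch of the idea, not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical certainty equivalents (in EUR) elicited from 60 subjects.
ce = rng.normal(loc=7.0, scale=2.0, size=60)

safe_payoff = 6.5   # sure payment offered instead of playing
quorum = 0.7        # fraction of the group needed for coordination to succeed

def p_success(ce, safe, quorum, n_group=8, draws=10_000):
    """Estimate P(successful coordination) by resampling groups of subjects."""
    # A subject enters the coordination game iff its certainty equivalent
    # for that game exceeds the safe payoff.
    enter = ce >= safe
    # Resample groups with replacement and check the quorum in each draw.
    picks = rng.choice(enter, size=(draws, n_group))
    return (picks.mean(axis=1) >= quorum).mean()

p = p_success(ce, safe_payoff, quorum)
```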

    The distribution of hatching time in Anopheles gambiae

    BACKGROUND: Knowledge of the ecological differences between the molecular forms of Anopheles gambiae and their sibling species, An. arabiensis, might lead to an understanding of their unique contributions to disease transmission, to better vector control, and to an understanding of the evolutionary forces that have separated them. METHODS: The distributions of hatching time of eggs of wild An. gambiae and An. arabiensis females were compared in different water types. Early and late hatchers of the S molecular form were compared with respect to their total protein content, sex ratio, development success, developmental time, and adult body size. RESULTS: Overall, the distribution of hatching time was strongly skewed to the right, with 89% of the eggs hatching during the second and third day post oviposition, 10% hatching during the next four days, and the remaining 1% hatching over the subsequent week. Slight but significant differences were found between the species and between the molecular forms in all water types. Differences in hatching time distribution were also found among water types (in each species and molecular form), suggesting that the eggs change their hatching time in response to chemical factors in the water. Early hatchers were similar to late hatchers except that they developed faster and produced smaller adults. CONCLUSION: Differences in hatching time and speed of development among eggs of the same batch may be adaptive if catastrophic events such as larval-site desiccation are not rare and the site's quality is unpredictable. The egg is not passive, and its hatching time depends on water factors. Differences in hatching time between the species and molecular forms were slight, probably reflecting that conditions in their larval sites are rather similar.

    Self-Organization, Layered Structure, and Aggregation Enhance Persistence of a Synthetic Biofilm Consortium

    Microbial consortia constitute a majority of the earth's biomass, but little is known about how these cooperating communities persist despite competition among community members. Theory suggests that non-random spatial structures contribute to the persistence of mixed communities; when particular structures form, they may provide associated community members with a growth advantage over unassociated members. If true, this has implications for the rise and persistence of multi-cellular organisms. However, this theory is difficult to study because we rarely observe initial instances of non-random physical structure in natural populations. Using two engineered strains of Escherichia coli that constitute a synthetic symbiotic microbial consortium, we fortuitously observed such spatial self-organization. This consortium forms a biofilm and, after several days, adopts a defined layered structure that is associated with two unexpected, measurable growth advantages. First, the consortium cannot successfully colonize a new, downstream environment until it self-organizes in the initial environment; in other words, the structure enhances the ability of the consortium to survive environmental disruptions. Second, when the layered structure forms in downstream environments, the consortium accumulates significantly more biomass than it did in the initial environment; in other words, the structure enhances the global productivity of the consortium. We also observed that the layered structure only assembles in downstream environments that are colonized by aggregates from a previous, structured community. These results demonstrate roles for self-organization and aggregation in the persistence of multi-cellular communities, and also illustrate a role for the techniques of synthetic biology in elucidating fundamental biological principles.

    Prediction and treatment of asthma in preschool children at risk: study design and baseline data of a prospective cohort study in general practice (ARCADE)

    Background: Asthma is a difficult diagnosis to establish in preschool children. A few years ago, our group presented a prediction rule for young children at risk of asthma in general practice. Before this prediction rule can safely be used in practice, cross-validation is required. In addition, general practitioners face many therapeutic management decisions in children at risk of asthma. The objectives of the study are: (1) identification of predictors for asthma in preschool children at risk of asthma, with the aim of cross-validating an earlier derived prediction rule; and (2) comparison of the effects of different treatment strategies in preschool children. Design: In this prospective cohort study, one- to five-year-old children at risk of developing asthma were selected from general practices. At risk was defined as 'visited the general practitioner with recurrent coughing (≥ 2 visits), wheezing (≥ 1) or shortness of breath (≥ 1) in the previous 12 months'. All children in this prospective cohort study will be followed until the age of six. For our prediction rule, demographic data, data with respect to clinical history, and additional tests (specific immunoglobulin E (IgE), fractional exhaled nitric oxide (FENO), peak expiratory flow (PEF)) are collected. History of airway-specific medication use, symptom severity, and health-related quality of life (QoL) are collected to estimate the effect of different treatment intensities (as expressed in GINA levels) using recently developed statistical techniques. In total, 1,938 children at risk of asthma were selected from general practice and 771 children (40%) were enrolled. At the time of writing, follow-up for all 5-year-olds and the majority of the 4-year-olds is complete. The total and specific IgE measurements at baseline were carried out for 87% of the children. Response rates to the repeated questionnaires varied from 93% at baseline to 73% after 18 months of follow-up; 89% and 87% of the children performed PEF and FENO measurements, respectively. Discussion: In this study, a prediction rule for asthma in young children, to be used in (general) practice, will be cross-validated. Our study will also provide more insight into the effect of treatment of asthma in preschool children.

    Tight Complexity Bounds for Counting Generalized Dominating Sets in Bounded-Treewidth Graphs

    We investigate how efficiently a well-studied family of domination-type problems can be solved on bounded-treewidth graphs. For sets σ, ρ of non-negative integers, a (σ,ρ)-set of a graph G is a set S of vertices such that |N(u) ∩ S| ∈ σ for every u ∈ S, and |N(v) ∩ S| ∈ ρ for every v ∉ S. The problem of finding a (σ,ρ)-set (of a certain size) unifies standard problems such as Independent Set, Dominating Set, Independent Dominating Set, and many others. For almost all pairs of finite or cofinite sets (σ,ρ), we determine (under standard complexity assumptions) the best possible value c_{σ,ρ} such that there is an algorithm that counts (σ,ρ)-sets in time c_{σ,ρ}^tw · n^{O(1)} (if a tree decomposition of width tw is given in the input). Let s_max denote the largest element of σ if σ is finite, or the largest missing integer plus 1 if σ is cofinite; r_max is defined analogously for ρ. Surprisingly, c_{σ,ρ} is often significantly smaller than the natural bound s_max + r_max + 2 achieved by existing algorithms [van Rooij, 2020]. Toward defining c_{σ,ρ}, we say that (σ,ρ) is m-structured if there is a pair (α, β) such that every integer in σ equals α mod m, and every integer in ρ equals β mod m. Then, setting
    - c_{σ,ρ} = max{s_max, r_max} + 1 if (σ,ρ) is m-structured for some m ≥ 3, or 2-structured with s_max ≠ r_max, or 2-structured with s_max = r_max being odd,
    - c_{σ,ρ} = max{s_max, r_max} + 2 if (σ,ρ) is 2-structured, but not m-structured for any m ≥ 3, and s_max = r_max is even, and
    - c_{σ,ρ} = s_max + r_max + 2 if (σ,ρ) is not m-structured for any m ≥ 2,
    we provide algorithms counting (σ,ρ)-sets in time c_{σ,ρ}^tw · n^{O(1)}. For example, for the Exact Independent Dominating Set problem (also known as Perfect Code), corresponding to σ = {0} and ρ = {1}, this improves the 3^tw · n^{O(1)} algorithm of van Rooij to 2^tw · n^{O(1)}. Despite the unusually delicate definition of c_{σ,ρ}, we show that our algorithms are most likely optimal, i.e., for any pair (σ,ρ) of finite or cofinite sets where the problem is non-trivial (except those having cofinite σ with ρ = Z≥0), and any ε > 0, a (c_{σ,ρ} − ε)^tw · n^{O(1)} algorithm counting the number of (σ,ρ)-sets would violate the Counting Strong Exponential-Time Hypothesis (#SETH). For finite sets σ and ρ, our lower bounds also extend to the decision version, showing that our algorithms are optimal in this setting as well. In contrast, for many cofinite sets, we show that further significant improvements for the decision and optimization versions are possible using the technique of representative sets.
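    The case analysis defining c_{σ,ρ} is mechanical and can be checked in code. The sketch below implements it for finite non-empty σ and ρ only (the cofinite case needs the largest-missing-integer convention for s_max and r_max); the function names are illustrative, not from the paper. Note it suffices to test moduli m up to max(s_max, r_max, 3): two distinct elements of a set differ by at most its maximum, so any larger m could only structure pairs of singletons, which m = 3 already covers.

```python
def is_structured(sigma, rho, m):
    """(sigma, rho) is m-structured iff each set is constant modulo m."""
    return len({s % m for s in sigma}) == 1 and len({r % m for r in rho}) == 1

def c_value(sigma, rho):
    """c_{sigma,rho} per the stated case analysis, for finite non-empty sets."""
    s_max, r_max = max(sigma), max(rho)
    bound = max(s_max, r_max, 3)
    struct_big = any(is_structured(sigma, rho, m) for m in range(3, bound + 1))
    struct_two = is_structured(sigma, rho, 2)
    if struct_big or (struct_two and (s_max != r_max or s_max % 2 == 1)):
        return max(s_max, r_max) + 1
    if struct_two:  # 2-structured only, with s_max == r_max even
        return max(s_max, r_max) + 2
    return s_max + r_max + 2

# Perfect Code: sigma={0}, rho={1} gives base 2, matching the improved
# 2^tw algorithm over the natural bound 0 + 1 + 2 = 3.
print(c_value({0}, {1}))      # -> 2
print(c_value({0, 2}, {1}))   # -> 3 (2-structured with s_max != r_max)
print(c_value({0, 1}, {0}))   # -> 3 (not m-structured for any m >= 2)
```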