
    Suggestive Annotation: A Deep Active Learning Framework for Biomedical Image Segmentation

    Image segmentation is a fundamental problem in biomedical image analysis. Recent advances in deep learning have achieved promising results on many biomedical image segmentation benchmarks. However, due to large variations in biomedical images (different modalities, image settings, objects, noise, etc.), applying deep learning to a new application usually requires a new set of training data. This can incur a great deal of annotation effort and cost, because only biomedical experts can annotate effectively, and images often contain too many instances (e.g., cells) to annotate exhaustively. In this paper, we aim to address the following question: with limited effort (e.g., time) for annotation, which instances should be annotated in order to attain the best performance? We present a deep active learning framework that combines fully convolutional networks (FCNs) and active learning to significantly reduce annotation effort by making judicious suggestions on the most effective annotation areas. We utilize uncertainty and similarity information provided by the FCN and formulate a generalized version of the maximum set cover problem to determine the most representative and uncertain areas for annotation. Extensive experiments using the 2015 MICCAI Gland Challenge dataset and a lymph node ultrasound image segmentation dataset show that, using annotation suggestions from our method, state-of-the-art segmentation performance can be achieved with only 50% of the training data. Comment: Accepted at MICCAI 2017
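
    The suggestion step can be read as a two-stage greedy procedure: keep the most uncertain candidate regions, then choose the subset of them that best represents the remaining data under a generalized maximum set cover (facility-location style) objective. The Python sketch below is a minimal illustration of that idea; the function name, parameter values, and scoring details are assumptions, not the paper's exact implementation.

```python
import numpy as np

def suggest_annotations(uncertainty, similarity, k_uncertain=16, k_select=8):
    """Greedily pick representative regions among the most uncertain ones.

    uncertainty : (n,) per-region uncertainty score (e.g., from an FCN ensemble)
    similarity  : (n, n) pairwise similarity between region descriptors
    """
    # Stage 1: keep the K most uncertain candidate regions.
    candidates = list(np.argsort(uncertainty)[::-1][:k_uncertain])

    # Stage 2: greedily maximize sum_j max_{i in S} similarity[i, j],
    # a generalized maximum set cover objective over all regions.
    selected = []
    covered = np.zeros(similarity.shape[1])
    for _ in range(k_select):
        gains = [(np.maximum(covered, similarity[c]).sum(), c)
                 for c in candidates if c not in selected]
        _, best = max(gains)
        selected.append(best)
        covered = np.maximum(covered, similarity[best])
    return selected
```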

    Resampling-based confidence regions and multiple tests for a correlated random vector

    We derive non-asymptotic confidence regions for the mean of a random vector whose coordinates have an unknown dependence structure. The random vector is supposed to be either Gaussian or to have a symmetric bounded distribution, and we observe n i.i.d. copies of it. The confidence regions are built using a data-dependent threshold based on a weighted bootstrap procedure. We consider two approaches, the first based on a concentration argument and the second on a direct bootstrapped quantile. The first allows us to deal with a very large class of resampling weights, while our results for the second are restricted to Rademacher weights; in practice, however, the second method seems more accurate. Our results are motivated by multiple testing problems, and we show in simulations that our procedures outperform the Bonferroni procedure (union bound) as soon as the observed vector has sufficiently correlated coordinates. Comment: submitted to COLT
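
    For the quantile approach with Rademacher weights, the data-dependent threshold can be sketched in a few lines: resample the centered observations with random signs and take a quantile of the resulting sup-norm statistics. This is a minimal illustration of the bootstrapped-quantile idea only; the paper's exact calibration and remainder terms differ.

```python
import numpy as np

def rademacher_sup_threshold(Y, alpha=0.05, B=1000, seed=None):
    """Threshold t such that {mu : max_k |Ybar_k - mu_k| <= t} is an
    approximate (1 - alpha) sup-norm confidence region.

    Y : (n, K) array of n i.i.d. observations of a K-dimensional vector.
    """
    rng = np.random.default_rng(seed)
    n, _ = Y.shape
    centered = Y - Y.mean(axis=0)                 # Y_i - Ybar
    stats = np.empty(B)
    for b in range(B):
        eps = rng.choice([-1.0, 1.0], size=n)     # Rademacher weights
        stats[b] = np.abs(eps @ centered / n).max()
    return np.quantile(stats, 1 - alpha)
```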

    Transcriptional adaptation of Mycobacterium tuberculosis within macrophages: Insights into the phagosomal environment

    Little is known about the biochemical environment in phagosomes harboring an infectious agent. To assess the state of this organelle we captured the transcriptional responses of Mycobacterium tuberculosis (MTB) in macrophages from wild-type and nitric oxide (NO) synthase 2–deficient mice before and after immunologic activation. The intraphagosomal transcriptome was compared with the transcriptome of MTB in standard broth culture and during growth in diverse conditions designed to simulate features of the phagosomal environment. Genes expressed differentially as a consequence of intraphagosomal residence included an interferon-γ- and NO-induced response that intensifies an iron-scavenging program, converts the microbe from aerobic to anaerobic respiration, and induces a dormancy regulon. Induction of genes involved in the activation and β-oxidation of fatty acids indicated that fatty acids furnish carbon and energy. Induction of σE-dependent, sodium dodecyl sulfate–regulated genes and genes involved in mycolic acid modification pointed to damage and repair of the cell envelope. Sentinel genes within the intraphagosomal transcriptome were induced similarly by MTB in the lungs of mice. The microbial transcriptome thus served as a bioprobe of the MTB phagosomal environment.

    Analyzing 2D gel images using a two-component empirical Bayes model

    Background: Two-dimensional polyacrylamide gel electrophoresis (2D gel, 2D PAGE, 2-DE) is a powerful tool for analyzing the proteome of an organism. Differential analysis of 2D gel images aims at finding proteins that change under different conditions, which leads to large-scale hypothesis testing as in microarray data analysis. Two-component empirical Bayes (EB) models have been widely discussed for large-scale hypothesis testing and applied in the context of genomic data, but they have not been implemented for the differential analysis of 2D gel data. In the literature, the mixture and null densities of the test statistics are estimated separately, and the estimation of the mixture density does not take into account assumptions about the null density. Thus, there is no guarantee that the estimated null component will be no greater than the mixture density, as it should be.
    Results: We present an implementation of a two-component EB model for the analysis of 2D gel images. In contrast to the published estimation method, we propose to estimate the mixture and null densities simultaneously using a constrained estimation approach, which relies on an iteratively re-weighted least-squares algorithm. The assumption about the null density is thereby taken into account naturally in the estimation of the mixture density. This strategy is illustrated using a set of 2D gel images from a factorial experiment, and the proposed approach is validated on a set of simulated gels.
    Conclusions: The two-component EB model is very useful for large-scale hypothesis testing, and in proteomic analysis the theoretical null density is often not appropriate. We demonstrate how to implement a two-component EB model for analyzing a set of 2D gel images and show that it is necessary to estimate the mixture density and empirical null component simultaneously. The proposed constrained estimation method always yields valid estimates and more stable results, and it can be applied in other contexts where large-scale hypothesis testing occurs.
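
    The two-component model behind this analysis writes the density of the test statistics as f(z) = p0 f0(z) + (1 - p0) f1(z) and the local false discovery rate as lfdr(z) = p0 f0(z) / f(z), which is only meaningful when p0 f0 <= f. The sketch below shows the unconstrained version of this computation with simple plug-in density estimates; the paper's contribution is to enforce the constraint inside a simultaneous, iteratively re-weighted least-squares fit of f and an empirical null, for which the clipping here is only a crude stand-in.

```python
import numpy as np
from scipy import stats

def local_fdr(z, p0=0.9, bw=0.3):
    """lfdr(z) = p0 * f0(z) / f(z), clipped at 1.

    f is a kernel density estimate of the mixture; f0 is the
    theoretical N(0, 1) null (the paper fits an empirical null).
    """
    f = stats.gaussian_kde(z, bw_method=bw)(z)    # mixture density at each z
    f0 = stats.norm.pdf(z)                        # null density
    return np.minimum(1.0, p0 * f0 / np.maximum(f, 1e-12))
```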

    Anomalous diffusion in a symbolic model

    In this work we investigate some statistical properties of symbolic sequences generated by a numerical procedure in which the symbols are repeated following a power-law probability density. In this analysis, we consider the sum of n symbols to represent the position of a particle in erratic movement. This approach reveals a rich diffusive scenario characterized by non-Gaussian distributions and, depending on the power-law exponent and on the procedure used to build the walker, superdiffusion, subdiffusion, or normal diffusion. Additionally, we compare the numerical data with the continuous-time random walk framework, finding good agreement. Because of its simplicity and flexibility, this model is a candidate for describing real systems governed by power-law probability densities. Comment: Accepted for publication in Physica Scripta
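
    A procedure of this kind is easy to simulate: draw a step symbol, repeat it k times with P(k) ~ k^(-mu), and read the particle position off the running sum. The sketch below is one such construction under stated assumptions (a binary alphabet and a Pareto-tailed repeat length); the paper's exact procedure may differ.

```python
import numpy as np

def symbolic_walk(n_steps, mu=2.5, seed=None):
    """Position of a walker whose step symbols repeat with power-law statistics."""
    rng = np.random.default_rng(seed)
    steps = []
    while len(steps) < n_steps:
        k = int(rng.pareto(mu - 1.0)) + 1          # repeat length, P(k) ~ k**(-mu)
        steps.extend([rng.choice([-1.0, 1.0])] * k)
    return np.cumsum(steps[:n_steps])

# Ensemble spread: <x^2> ~ t^alpha separates subdiffusion (alpha < 1),
# normal diffusion (alpha = 1) and superdiffusion (alpha > 1).
final_positions = [symbolic_walk(10_000, mu=2.5, seed=s)[-1] for s in range(500)]
print(np.var(final_positions))
```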

    A Recurrent Neural Network Survival Model: Predicting Web User Return Time

    The size of a website's active user base directly affects its value, so it is important to monitor and influence a user's likelihood of returning to a site. Essential to this is predicting when a user will return. Current state-of-the-art approaches to this problem come in two flavors: (1) Recurrent Neural Network (RNN) based solutions and (2) survival analysis methods. We observe that both techniques are severely limited when applied to this problem: survival models can only incorporate aggregate representations of users instead of automatically learning a representation directly from a raw time series of user actions, while RNNs can automatically learn features but cannot be directly trained with examples of non-returning users, who have no target value for their return time. We develop a novel RNN survival model that removes these limitations. We demonstrate that this model can successfully be applied to return time prediction on a large e-commerce dataset, with better discrimination between returning and non-returning users than either method applied in isolation. Comment: Accepted into ECML PKDD 2018; 8 figures and 1 table
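
    A common way to realize such a model is to let the RNN map each user's raw action sequence to the parameters of a parametric return-time distribution and train with a censored likelihood, so that non-returning users contribute through the survival function rather than a missing target. The PyTorch sketch below follows that general recipe under stated assumptions (a GRU encoder and a Weibull output); it is not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class RNNSurvival(nn.Module):
    """Map a user's action sequence to Weibull return-time parameters."""

    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)           # log-scale, log-shape

    def forward(self, x):                          # x: (batch, time, features)
        _, h = self.rnn(x)
        scale, shape = self.head(h[-1]).exp().unbind(-1)
        return scale, shape

def censored_weibull_nll(scale, shape, t, observed):
    """Censored negative log-likelihood: density term for users who
    returned at time t (observed = 1), survival term for users who
    have not returned by t (observed = 0)."""
    log_surv = -((t / scale) ** shape)             # log S(t)
    log_dens = (torch.log(shape) - torch.log(scale)
                + (shape - 1.0) * torch.log(t / scale) + log_surv)
    return -(observed * log_dens + (1.0 - observed) * log_surv).mean()
```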

    A constrained polynomial regression procedure for estimating the local False Discovery Rate

    Background: In the context of genomic association studies, where a large number of statistical tests are performed simultaneously, the local False Discovery Rate (lFDR), which quantifies the evidence for a specific gene's association with a clinical or biological variable of interest, is a relevant criterion for taking the multiple testing problem into account. The lFDR not only allows an inference to be made for each gene through its specific value, but also provides an estimate of Benjamini-Hochberg's False Discovery Rate (FDR) for subsets of genes.
    Results: Within the framework of estimation procedures that make no distributional assumption under the alternative hypothesis, a new and efficient procedure for estimating the lFDR is described. A simulation study indicated good performance for the proposed estimator in comparison with four published ones, and the five procedures were applied to real datasets.
    Conclusion: A novel and efficient procedure for estimating the lFDR was developed and evaluated.
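
    To make the quantity concrete: under a uniform null for p-values, lFDR(p) = pi0 / f(p), where f is the marginal p-value density and no assumption is placed on the alternative component of f. The sketch below estimates f by an ordinary polynomial fit to histogram counts and reads pi0 off the fitted density near p = 1; the constraints and the exact regression scheme of the paper's procedure are not reproduced here, so everything beyond the lFDR formula itself is an assumption.

```python
import numpy as np

def lfdr_from_pvalues(p, degree=5, nbins=50):
    """lFDR(p) = pi0 * f0(p) / f(p) with uniform null f0 = 1, clipped to [0, 1]."""
    heights, edges = np.histogram(p, bins=nbins, density=True)
    mids = 0.5 * (edges[:-1] + edges[1:])
    coefs = np.polyfit(mids, heights, deg=degree)            # polynomial density fit
    f = np.maximum(np.polyval(coefs, p), 1e-8)
    pi0 = float(np.clip(np.polyval(coefs, 1.0), 0.0, 1.0))   # density is ~pi0 near 1
    return np.clip(pi0 / f, 0.0, 1.0)
```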

    Classes of Multiple Decision Functions Strongly Controlling FWER and FDR

    This paper provides two general classes of multiple decision functions where each member of the first class strongly controls the family-wise error rate (FWER), while each member of the second class strongly controls the false discovery rate (FDR). These classes offer the possibility that a multiple decision function optimal with respect to a pre-specified criterion, such as the missed discovery rate (MDR), could be found within them. Such multiple decision functions can be utilized in multiple testing, in particular, though not exclusively, in the analysis of high-dimensional microarray data sets. Comment: 19 pages
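
    For orientation, the two classical procedures below are the best-known examples of what strong FWER control and FDR control look like as concrete decision functions; the paper's classes are far more general, and these serve only as reference points.

```python
import numpy as np

def bonferroni_reject(pvals, alpha=0.05):
    """Strong FWER control: reject H_i iff p_i <= alpha / m."""
    return pvals <= alpha / len(pvals)

def benjamini_hochberg_reject(pvals, alpha=0.05):
    """FDR control: reject the k smallest p-values,
    where k = max { i : p_(i) <= i * alpha / m }."""
    m = len(pvals)
    order = np.argsort(pvals)
    passed = np.flatnonzero(pvals[order] <= alpha * np.arange(1, m + 1) / m)
    reject = np.zeros(m, dtype=bool)
    if passed.size:
        reject[order[:passed[-1] + 1]] = True
    return reject
```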

    A Simple Iterative Algorithm for Parsimonious Binary Kernel Fisher Discrimination

    By applying recent results in optimization theory variously known as optimization transfer or majorize/minimize (MM) algorithms, an algorithm for binary kernel Fisher discriminant analysis is introduced that makes use of a non-smooth penalty on the coefficients to provide a parsimonious solution. The problem is converted into a smooth optimization that can be solved iteratively with no greater overhead than iteratively re-weighted least squares. The resulting algorithm is simple and easily programmed, and is shown to perform, in terms of both accuracy and parsimony, as well as or better than a number of leading machine learning algorithms on two well-studied and substantial benchmarks.
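
    The MM idea is that the non-smooth penalty |a| is bounded above by the quadratic a^2 / (2c) + c / 2 at any c = |a_t| > 0, so each iteration only has to solve a re-weighted ridge problem. The sketch below applies this to the least-squares formulation of kernel Fisher discrimination with an L1 penalty; it illustrates the general MM/IRLS recipe under stated assumptions rather than the paper's exact algorithm.

```python
import numpy as np

def sparse_kfd(K, y, lam=1.0, n_iter=50, eps=1e-6):
    """L1-penalized kernel Fisher discriminant via majorize/minimize.

    K : (n, n) kernel matrix;  y : (n,) labels in {-1, +1}.
    Each iteration is a weighted ridge solve, so the per-step cost is
    that of iteratively re-weighted least squares.
    """
    n = K.shape[0]
    alpha = np.linalg.solve(K.T @ K + lam * np.eye(n), K.T @ y)  # ridge start
    for _ in range(n_iter):
        # Majorize |a_j| by a_j**2 / (2 |a_j^t|) + |a_j^t| / 2 at the current iterate.
        d = 1.0 / np.maximum(np.abs(alpha), eps)
        alpha = np.linalg.solve(K.T @ K + 0.5 * lam * np.diag(d), K.T @ y)
    return alpha
```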

    Re-sampling strategy to improve the estimation of the number of null hypotheses in FDR control under strong correlation structures

    Background: When conducting multiple hypothesis tests, it is important to control the number of false positives, or the False Discovery Rate (FDR). However, there is a tradeoff between controlling the FDR and maximizing power. Several methods, such as the q-value method, have been proposed to estimate the proportion of true null hypotheses among the tested hypotheses and to use this estimate in the control of the FDR. These methods usually depend on the assumption that the test statistics are independent (or only weakly correlated). However, many types of data, for example microarray data, often contain large-scale correlation structures. Our objective was to develop methods that control the FDR while maintaining greater power in highly correlated datasets by improving the estimation of the proportion of null hypotheses.
    Results: We showed that when strong correlation exists among the data, which is common in microarray datasets, the estimate of the proportion of null hypotheses can be highly variable, resulting in a high level of variation in the FDR. We therefore developed a re-sampling strategy that reduces this variation by breaking the correlations between gene expression values, and then uses a conservative strategy of selecting the upper quartile of the re-sampling estimates to obtain strong control of the FDR.
    Conclusion: In simulation studies and perturbations of actual microarray datasets, our method, compared to competing methods such as the q-value, generated slightly biased estimates of the proportion of null hypotheses but with lower mean square errors. When selecting genes while controlling the same FDR level, our method has, on average, a significantly lower false discovery rate in exchange for a minor reduction in power.
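
    One way to realize this strategy in code is sketched below: each resample bootstraps every gene's values within group independently of the other genes (which roughly preserves the per-gene marginals while breaking inter-gene correlation), the proportion of nulls is re-estimated on each resample with Storey's lambda estimator, and the upper quartile of the estimates is returned as the conservative choice. The per-gene within-group bootstrap and the use of Storey's estimator are assumptions about details the abstract leaves open.

```python
import numpy as np
from scipy import stats

def pi0_storey(pvals, lam=0.5):
    """Storey's estimator: pi0 ~ #{p > lam} / ((1 - lam) * m)."""
    return min(1.0, float((pvals > lam).mean() / (1.0 - lam)))

def pi0_upper_quartile(X, groups, B=100, seed=None):
    """Conservative estimate of the proportion of null hypotheses.

    X : (genes, samples) expression matrix;
    groups : boolean (samples,) two-group labels.
    """
    rng = np.random.default_rng(seed)
    idx_a, idx_b = np.flatnonzero(groups), np.flatnonzero(~groups)
    estimates = []
    for _ in range(B):
        Xp = np.empty_like(X)
        for i in range(X.shape[0]):               # independently per gene
            Xp[i, idx_a] = rng.choice(X[i, idx_a], size=idx_a.size)
            Xp[i, idx_b] = rng.choice(X[i, idx_b], size=idx_b.size)
        _, p = stats.ttest_ind(Xp[:, idx_a], Xp[:, idx_b], axis=1)
        estimates.append(pi0_storey(p))
    return float(np.quantile(estimates, 0.75))
```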