
    Beyond Pixels: A Comprehensive Survey from Bottom-up to Semantic Image Segmentation and Cosegmentation

    Image segmentation refers to the process of dividing an image into nonoverlapping, meaningful regions according to human perception, and it has been a classic topic since the early days of computer vision. A great deal of research has been conducted and has resulted in many applications. However, while many segmentation algorithms exist, only a few sparse and outdated summaries are available, and an overview of recent achievements and open issues is lacking. We aim to provide a comprehensive review of the recent progress in this field. Covering 180 publications, we give an overview of broad segmentation topics, including not only classic bottom-up approaches but also recent developments in superpixels, interactive methods, object proposals, semantic image parsing, and image cosegmentation. In addition, we review the existing influential datasets and evaluation metrics. Finally, we suggest some design choices and research directions for future work in image segmentation. Comment: submitted to the Elsevier Journal of Visual Communication and Image Representation

    On Modular Training of Neural Acoustics-to-Word Model for LVCSR

    End-to-end (E2E) automatic speech recognition (ASR) systems directly map acoustics to words using a unified model. Previous work has mostly focused on E2E training of a single model that integrates the acoustic and language models into one. Although E2E training benefits from sequence modeling and a simplified decoding pipeline, it usually requires a large amount of transcribed acoustic data, and traditional acoustic and language modelling techniques cannot be utilized. In this paper, a novel modular training framework for E2E ASR is proposed that trains neural acoustic and language models separately during the training stage while still performing end-to-end inference at decoding time. Specifically, an acoustics-to-phoneme model (A2P) and a phoneme-to-word model (P2W) are trained on acoustic data and text data, respectively. A phone synchronous decoding (PSD) module is inserted between A2P and P2W to reduce sequence lengths without loss of precision. Finally, the modules are integrated into an acoustics-to-word model (A2W) and jointly optimized on acoustic data to retain the advantages of sequence modeling. Experiments on a 300-hour Switchboard task show significant improvement over the direct A2W model. Both training and decoding efficiency also benefit from the proposed method. Comment: accepted by ICASSP201
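
    A minimal sketch of the modular pipeline described above, assuming hypothetical toy models: a stand-in A2P produces frame-level phoneme posteriors, a PSD step collapses repeats and blanks to shorten the sequence, and a stand-in P2W maps the resulting phoneme string to words. None of the shapes or models reflect the paper's actual networks.

```python
# Toy sketch of the modular A2P -> PSD -> P2W pipeline described above.
# All models and shapes are hypothetical placeholders, not the paper's networks.
import numpy as np

BLANK = 0  # CTC-style blank symbol assumed for the A2P output

def a2p(frames: np.ndarray, n_phones: int = 40) -> np.ndarray:
    """Stand-in acoustics-to-phoneme model: frame-level phoneme posteriors."""
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(len(frames), n_phones + 1))  # +1 for blank
    return np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

def psd(posteriors: np.ndarray) -> list[int]:
    """Phone-synchronous decoding: take the best phone per frame, then
    collapse repeats and drop blanks so the sequence gets shorter."""
    best = posteriors.argmax(axis=1)
    collapsed, prev = [], None
    for p in best:
        if p != prev and p != BLANK:
            collapsed.append(int(p))
        prev = p
    return collapsed

def p2w(phones: list[int]) -> list[str]:
    """Stand-in phoneme-to-word model: here just a dummy lookup."""
    return [f"word_{p % 5}" for p in phones]

acoustic_frames = np.zeros((120, 80))      # 120 frames of 80-dim features
words = p2w(psd(a2p(acoustic_frames)))     # modular A2P -> PSD -> P2W
print(words[:5])
```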

    Co-Sparse Textural Similarity for Image Segmentation

    We propose an algorithm for segmenting natural images based on texture and color information, which leverages the co-sparse analysis model within a convex multilabel optimization framework. As a key ingredient of this method, we introduce a novel textural similarity measure that builds upon the co-sparse representation of image patches. We propose a Bayesian approach to merge textural similarity with information about color and location. Combined with recently developed convex multilabel optimization methods, this leads to an efficient algorithm for both supervised and unsupervised segmentation that is easily parallelized on graphics hardware. The approach provides competitive results in unsupervised segmentation and outperforms state-of-the-art interactive segmentation methods on the Graz Benchmark.
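
    A rough sketch of what a co-sparse textural similarity between patches could look like: apply an analysis operator to each patch, threshold the responses, and compare the resulting co-sparse codes. The random operator and the cosine comparison below are illustrative assumptions, not the measure defined in the paper.

```python
# Minimal sketch of a co-sparse textural similarity between image patches.
# The analysis operator here is random (a learned one would be used in
# practice), and the similarity is one plausible choice, not the paper's.
import numpy as np

rng = np.random.default_rng(1)
patch_size = 8
omega = rng.normal(size=(2 * patch_size**2, patch_size**2))  # analysis operator

def cosparse_code(patch: np.ndarray, thresh: float = 0.5) -> np.ndarray:
    """Analysis responses with small entries zeroed out (co-sparse support)."""
    z = omega @ patch.ravel()
    z[np.abs(z) < thresh] = 0.0
    return z

def textural_similarity(p: np.ndarray, q: np.ndarray) -> float:
    """Cosine similarity of the thresholded analysis responses."""
    zp, zq = cosparse_code(p), cosparse_code(q)
    denom = np.linalg.norm(zp) * np.linalg.norm(zq) + 1e-12
    return float(zp @ zq / denom)

a = rng.normal(size=(patch_size, patch_size))
b = a + 0.1 * rng.normal(size=(patch_size, patch_size))
print(textural_similarity(a, b))   # close to 1 for similar texture patches
```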

    Semi-supervised emotion lexicon expansion with label propagation and specialized word embeddings

    There are two main approaches to automatically extracting affective orientation: lexicon-based and corpus-based. In this work, we argue that these two methods are compatible and show that combining them can improve the accuracy of emotion classifiers. In particular, we introduce a novel variant of the Label Propagation algorithm that is tailored to distributed word representations, apply batch gradient descent to accelerate the optimization of label propagation and make it feasible for large graphs, and propose a reproducible method for emotion lexicon expansion. We conclude that label propagation can expand an emotion lexicon in a meaningful way and that the expanded emotion lexicon can be leveraged to improve the accuracy of an emotion classifier.
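
    A small sketch of label propagation over a word-embedding similarity graph, with the smoothness objective minimized by batch gradient descent while seed labels stay clamped. The embeddings, seed lexicon, and learning rate are placeholders; the paper's variant differs in its details.

```python
# Sketch of label propagation over an embedding similarity graph, optimized
# with batch gradient descent. Embeddings and seed lexicon are random toys.
import numpy as np

rng = np.random.default_rng(2)
n_words, dim, n_emotions = 200, 50, 4
emb = rng.normal(size=(n_words, dim))

# Cosine-similarity graph (only positive similarities kept, no self-loops).
norm = emb / np.linalg.norm(emb, axis=1, keepdims=True)
W = np.clip(norm @ norm.T, 0.0, None)
np.fill_diagonal(W, 0.0)

# Seed lexicon: the first 20 words have known emotion labels (one-hot).
labels = np.zeros((n_words, n_emotions))
seed = np.arange(20)
labels[seed, rng.integers(0, n_emotions, size=20)] = 1.0
F = labels.copy()                   # label scores to be propagated

L = np.diag(W.sum(axis=1)) - W      # graph Laplacian
lr = 1e-3
for _ in range(200):
    grad = 4.0 * L @ F              # gradient of sum_ij W_ij ||F_i - F_j||^2
    F -= lr * grad
    F[seed] = labels[seed]          # keep seed labels clamped

expanded = F.argmax(axis=1)         # predicted emotion for unlabeled words
print(expanded[:10])
```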

    Context-Aware Query Selection for Active Learning in Event Recognition

    Activity recognition is a challenging problem with many practical applications. In addition to visual features, recent approaches have benefited from the use of context, e.g., inter-relationships among the activities and objects. However, these approaches require data to be labeled and entirely available beforehand, and they are not designed to be updated continuously, which makes them unsuitable for surveillance applications. In contrast, we propose a continuous-learning framework for context-aware activity recognition from unlabeled video, which has two distinct advantages over existing methods. First, it employs a novel active-learning technique that not only exploits the informativeness of the individual activities but also utilizes their contextual information during query selection; this leads to a significant reduction in expensive manual annotation effort. Second, the learned models can be adapted online as more data becomes available. We formulate a conditional random field model that encodes the context and devise an information-theoretic approach that utilizes the entropy and mutual information of the nodes to compute the set of most informative queries, which are labeled by a human. These labels are combined with graphical inference techniques for incremental updates. We provide a theoretical formulation of the active learning framework with an analytic solution. Experiments on six challenging datasets demonstrate that our framework achieves superior performance with significantly less manual labeling. Comment: To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
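
    A toy sketch of the entropy part of the query-selection step: score each node of the graphical model by the entropy of its approximate marginal and send the most uncertain nodes to a human annotator. The marginals below are random placeholders, and the mutual-information term used in the paper is omitted.

```python
# Entropy-based query selection for active learning over a graphical model.
# Marginals are random placeholders; the paper additionally uses mutual
# information between context-linked nodes when ranking queries.
import numpy as np

def entropy(p: np.ndarray) -> np.ndarray:
    """Shannon entropy of each row of a matrix of class marginals."""
    return -(p * np.log(p + 1e-12)).sum(axis=1)

rng = np.random.default_rng(3)
n_nodes, n_classes = 100, 6
marginals = rng.dirichlet(np.ones(n_classes), size=n_nodes)  # per-activity beliefs

budget = 5
scores = entropy(marginals)
queries = np.argsort(scores)[-budget:]   # most uncertain nodes to label
print("ask a human to label activity nodes:", queries)
```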

    Modeling Precursors for Event Forecasting via Nested Multi-Instance Learning

    Forecasting events such as civil unrest movements, disease outbreaks, financial market movements, and government elections from open-source indicators such as news feeds and social media streams is an important and challenging problem. From the perspective of human analysts and policy makers, forecasting algorithms need to provide supporting evidence and identify the causes related to the event of interest. We develop a novel multiple-instance-learning-based approach that jointly tackles the problems of identifying evidence-based precursors and forecasting events into the future. Specifically, given a collection of streaming news articles from multiple sources, we develop a nested multiple instance learning approach to forecast significant societal events across three countries in Latin America. Our algorithm is able to identify news articles that serve as precursors for a protest. Our empirical evaluation shows the strengths of the proposed approach in filtering candidate precursors, forecasting the occurrence of events with a lead time, and predicting the characteristics of different events, in comparison with several other formulations. We demonstrate through case studies the effectiveness of our proposed model in filtering candidate precursors for inspection by a human analyst. Comment: The conference version of the paper has been submitted for publication
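
    A toy sketch of nested multiple-instance pooling for event forecasting: articles act as instances inside per-day bags, and the days form the outer bag that yields the event probability. The linear scorer and the max/mean pooling choices are illustrative assumptions, not the paper's learned model.

```python
# Nested multiple-instance pooling: article -> day bag -> event bag.
# The scorer and pooling choices are placeholders, not the paper's model.
import numpy as np

rng = np.random.default_rng(4)
w = rng.normal(size=64)              # hypothetical article-level scorer

def article_score(x: np.ndarray) -> float:
    """Precursor probability for one article feature vector."""
    return float(1.0 / (1.0 + np.exp(-(x @ w))))

def day_score(articles: list[np.ndarray]) -> float:
    """Inner MIL pooling: a day is as suspicious as its strongest article."""
    return max(article_score(a) for a in articles)

def event_score(days: list[list[np.ndarray]]) -> float:
    """Outer pooling over the days preceding the potential event."""
    return float(np.mean([day_score(d) for d in days]))

days = [[rng.normal(size=64) for _ in range(5)] for _ in range(10)]
print("forecasted protest probability:", event_score(days))
# Articles with the highest article_score are the candidate precursors.
```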

    Spatially Constrained Location Prior for Scene Parsing

    Semantic context is an important and useful cue for scene parsing in complicated natural images with substantial variation in objects and the environment. This paper proposes a Spatially Constrained Location Prior (SCLP) for effective modelling of global and local semantic context in a scene in terms of inter-class spatial relationships. Unlike existing studies that focus on either the relative or the absolute location prior of objects, the SCLP incorporates both relative and absolute location priors by calculating object co-occurrence frequencies in spatially constrained image blocks. The SCLP is general and can be used in conjunction with various visual-feature-based prediction models, such as artificial neural networks and support vector machines (SVMs), to enforce spatial contextual constraints on class labels. Using SVM classifiers and a linear regression model, we demonstrate that incorporating the SCLP achieves superior performance compared to state-of-the-art methods on the Stanford Background and SIFT Flow datasets. Comment: authors' pre-print version of an article published in IJCNN 201
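
    A short sketch of how a spatially constrained location prior could be assembled from training label maps: count which classes appear in each block of a fixed grid (absolute prior) and how often class pairs co-occur across block pairs (relative prior). The grid size and toy label maps are placeholders, not the paper's configuration.

```python
# Build absolute and relative location priors from toy training label maps.
# Grid size, class count, and the random label maps are illustrative only.
import numpy as np

rng = np.random.default_rng(5)
n_classes, grid = 5, 4                       # 4x4 grid of image blocks
label_maps = rng.integers(0, n_classes, size=(20, 64, 64))  # toy training maps

absolute = np.zeros((grid * grid, n_classes))
cooccur = np.zeros((grid * grid, grid * grid, n_classes, n_classes))

def block_classes(label_map: np.ndarray, bi: int, bj: int) -> np.ndarray:
    """Classes present inside one spatially constrained image block."""
    h, w = label_map.shape
    bh, bw = h // grid, w // grid
    return np.unique(label_map[bi*bh:(bi+1)*bh, bj*bw:(bj+1)*bw])

for lm in label_maps:
    present = [block_classes(lm, i, j) for i in range(grid) for j in range(grid)]
    for b, cls in enumerate(present):
        absolute[b, cls] += 1                        # absolute location prior
        for b2, cls2 in enumerate(present):
            for c in cls:
                cooccur[b, b2, c, cls2] += 1         # inter-block co-occurrence

absolute /= absolute.sum(axis=1, keepdims=True)      # P(class | block)
print(absolute[0])
```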

    Generating Multi-label Discrete Patient Records using Generative Adversarial Networks

    Access to electronic health record (EHR) data has motivated computational advances in medical research. However, various concerns, particularly over privacy, can limit access to and collaborative use of EHR data. Sharing synthetic EHR data could mitigate this risk. In this paper, we propose a new approach, the medical Generative Adversarial Network (medGAN), to generate realistic synthetic patient records. Given real patient records as input, medGAN can generate high-dimensional discrete variables (e.g., binary and count features) via a combination of an autoencoder and generative adversarial networks. We also propose minibatch averaging to efficiently avoid mode collapse, and we increase learning efficiency with batch normalization and shortcut connections. To demonstrate feasibility, we show that medGAN generates synthetic patient records that achieve performance comparable to real data in many experiments, including distribution statistics, predictive modeling tasks, and a medical expert review. We also empirically observe limited privacy risk in both identity and attribute disclosure when using medGAN. Comment: Accepted at Machine Learning in Health Care (MLHC) 201
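
    A minimal sketch of the minibatch-averaging idea mentioned above: the discriminator sees every sample concatenated with the mean of its minibatch, which makes a mode-collapsed generator easy to spot. Layer sizes and the toy batch are assumptions, not the architecture from the paper.

```python
# Minibatch averaging in the discriminator: each sample is concatenated with
# its minibatch mean. Layer sizes are arbitrary placeholders.
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        # Input is [sample, minibatch mean], hence 2 * dim features.
        self.net = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mb_mean = x.mean(dim=0, keepdim=True).expand_as(x)  # minibatch averaging
        return self.net(torch.cat([x, mb_mean], dim=1))

batch = torch.rand(32, 100)                  # toy batch of patient features
print(Discriminator(dim=100)(batch).shape)   # -> torch.Size([32, 1])
```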

    Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs

    We approach structured output prediction by optimizing a deep value network (DVN) to precisely estimate the task loss on different output configurations for a given input. Once the model is trained, we perform inference by gradient descent on continuous relaxations of the output variables to find outputs with promising scores from the value network. When applied to image segmentation, the value network takes an image and a segmentation mask as inputs and predicts a scalar estimating the intersection over union between the input and ground-truth masks. For multi-label classification, the DVN's objective is to correctly predict the F1 score for any potential label configuration. The DVN framework achieves state-of-the-art results on multi-label prediction and image segmentation benchmarks. Comment: Published at ICML 201
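
    A small sketch of DVN-style inference: starting from a relaxed output in [0, 1], take gradient-ascent steps on the score predicted by the value network and threshold at the end. The tiny untrained network below merely stands in for a trained DVN.

```python
# Inference by gradient ascent on a value network's predicted score.
# value_net is an untrained placeholder standing in for a trained DVN v(x, y).
import torch
import torch.nn as nn

n_labels = 10
value_net = nn.Sequential(
    nn.Linear(n_labels + 32, 64), nn.ReLU(), nn.Linear(64, 1)
)

x = torch.randn(1, 32)                                    # input features
y = torch.full((1, n_labels), 0.5, requires_grad=True)    # relaxed output in [0, 1]

optimizer = torch.optim.SGD([y], lr=0.1)
for _ in range(50):
    optimizer.zero_grad()
    score = value_net(torch.cat([y, x], dim=1))
    (-score).sum().backward()        # ascend on the predicted score
    optimizer.step()
    with torch.no_grad():
        y.clamp_(0.0, 1.0)           # keep the relaxation in [0, 1]

prediction = (y.detach() > 0.5).int()  # final multi-label output
print(prediction)
```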

    Prediction of Solar Flare Size and Time-to-Flare Using Support Vector Machine Regression

    We study the prediction of solar flare size and time-to-flare using 38 features describing the magnetic complexity of the photospheric magnetic field. This work uses support vector regression to formulate a mapping from the 38-dimensional feature space to a continuous-valued label representing flare size or time-to-flare. When we consider flaring regions only, we find an average error in estimating flare size of approximately half a Geostationary Operational Environmental Satellite (GOES) class. When we additionally consider non-flaring regions, we find an increased average error of approximately three-quarters of a GOES class. We also consider thresholding the regressed flare size for the experiment containing both flaring and non-flaring regions and find a true positive rate of 0.69 and a true negative rate of 0.86 for flare prediction. The results for both of these size regression experiments are consistent across a wide range of predictive time windows, indicating that the magnetic complexity features may be persistent in appearance long before flare activity. This is supported by our larger errors of roughly 40 hours in the time-to-flare regression problem. The 38 magnetic complexity features considered here appear to have discriminative potential for flare size, but their persistence in time makes them less discriminative for the time-to-flare problem. Comment: http://iopscience.iop.org/article/10.1088/0004-637X/812/1/51/met
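
    A brief sketch of the regression-then-threshold setup described above: fit support vector regression from the 38 magnetic-complexity features to flare size, then threshold the regressed size for a binary flare / no-flare decision. The synthetic data and the threshold value are placeholders, not the study's.

```python
# Support vector regression of flare size, then thresholding for a binary
# flare / no-flare decision. The data and threshold are synthetic placeholders.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(6)
n_regions, n_features = 500, 38
X = rng.normal(size=(n_regions, n_features))
true_size = X[:, 0] + 0.1 * rng.normal(size=n_regions)   # toy flare-size label

model = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X, true_size)
pred_size = model.predict(X)

threshold = 0.0                            # hypothetical flare / no-flare cut
pred_flare, is_flare = pred_size > threshold, true_size > threshold
tpr = (pred_flare & is_flare).sum() / is_flare.sum()
tnr = (~pred_flare & ~is_flare).sum() / (~is_flare).sum()
print(f"TPR={tpr:.2f}, TNR={tnr:.2f}")
```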