2,295 research outputs found

    Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos

    Full text link
    Every moment counts in action recognition. A comprehensive understanding of human activity in video requires labeling every frame according to the actions occurring, placing multiple labels densely over a video sequence. To study this problem we extend the existing THUMOS dataset and introduce MultiTHUMOS, a new dataset of dense labels over unconstrained internet videos. Modeling multiple, dense labels benefits from temporal relations within and across classes. We define a novel variant of long short-term memory (LSTM) deep networks for modeling these temporal relations via multiple input and output connections. We show that this model improves action labeling accuracy and further enables deeper understanding tasks ranging from structured retrieval to action prediction.Comment: To appear in IJC

    Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning

    Get PDF
    This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 4.0) license, which permits others to distribute, remix, adapt and build upon this work, for commercial use, provided the original work is properly cited. See: http://creativecommons.org/ licenses/by/4.0

    Semantic Information G Theory and Logical Bayesian Inference for Machine Learning

    Get PDF
    An important problem with machine learning is that when label number n\u3e2, it is very difficult to construct and optimize a group of learning functions, and we wish that optimized learning functions are still useful when prior distribution P(x) (where x is an instance) is changed. To resolve this problem, the semantic information G theory, Logical Bayesian Inference (LBI), and a group of Channel Matching (CM) algorithms together form a systematic solution. MultilabelMultilabel A semantic channel in the G theory consists of a group of truth functions or membership functions. In comparison with likelihood functions, Bayesian posteriors, and Logistic functions used by popular methods, membership functions can be more conveniently used as learning functions without the above problem. In Logical Bayesian Inference (LBI), every label’s learning is independent. For Multilabel learning, we can directly obtain a group of optimized membership functions from a big enough sample with labels, without preparing different samples for different labels. A group of Channel Matching (CM) algorithms are developed for machine learning. For the Maximum Mutual Information (MMI) classification of three classes with Gaussian distributions on a two-dimensional feature space, 2-3 iterations can make mutual information between three classes and three labels surpass 99% of the MMI for most initial partitions. For mixture models, the Expectation-Maxmization (EM) algorithm is improved and becomes the CM-EM algorithm, which can outperform the EM algorithm when mixture ratios are imbalanced, or local convergence exists. The CM iteration algorithm needs to combine neural networks for MMI classifications on high-dimensional feature spaces. LBI needs further studies for the unification of statistics and logic
    • …
    corecore