
    Neural Language Modeling with Visual Features

    Multimodal language models attempt to incorporate non-linguistic features into the language modeling task. In this work, we extend a standard recurrent neural network (RNN) language model with features derived from videos. We train our models on data that is two orders of magnitude larger than the datasets used in prior work. We perform a thorough exploration of model architectures for combining visual and text features. Our experiments on two corpora (YouCookII and 20bn-something-something-v2) show that the best-performing architecture consists of middle fusion of visual and text features, yielding over 25% relative improvement in perplexity. We report analysis that provides insights into why our multimodal language model improves upon a standard RNN language model.

    Fast Modeling Methods for Complex System with Separable Features

    Data-driven modeling plays an increasingly important role in different areas of engineering. For most existing methods, such as genetic programming (GP), the convergence speed might be too slow for large-scale problems with a large number of variables. Fortunately, in many applications, the target models are separable in some sense. In this paper, we analyze different types of separability of some real-world engineering equations and establish a mathematical model of a generalized separable system (GS system). To obtain the structure of the GS system, two concepts, namely block and factor, are introduced, and a special method, block and factor detection, is proposed, in which the target model is decomposed into a number of blocks, and further into minimal blocks and factors. Compared to conventional GP, the new method can greatly reduce the search space. The minimal blocks and factors are optimized and assembled with a global optimization search engine, low dimensional simplex evolution (LDSE). An extensive comparison between the proposed method and a state-of-the-art data-driven fitting tool, Eureqa, is presented on several man-made problems. Test results indicate that the proposed method is more effective and efficient in all the investigated cases.
    Comment: arXiv admin note: substantial text overlap with arXiv:1706.0228

    Learning Motion in Feature Space: Locally-Consistent Deformable Convolution Networks for Fine-Grained Action Detection

    Fine-grained action detection is an important task with numerous applications in robotics and human-computer interaction. Existing methods typically utilize a two-stage approach including extraction of local spatio-temporal features followed by temporal modeling to capture long-term dependencies. While most recent papers have focused on the latter (long-temporal modeling), here, we focus on producing features capable of modeling fine-grained motion more efficiently. We propose a novel locally-consistent deformable convolution, which utilizes the change in receptive fields and enforces a local coherency constraint to capture motion information effectively. Our model jointly learns spatio-temporal features (instead of using independent spatial and temporal streams). The temporal component is learned from the feature space instead of pixel space, e.g. optical flow. The produced features can be flexibly used in conjunction with other long-temporal modeling networks, e.g. ST-CNN, DilatedTCN, and ED-TCN. Overall, our proposed approach robustly outperforms the original long-temporal models on two fine-grained action datasets: 50 Salads and GTEA, achieving F1 scores of 80.22% and 75.39% respectively.
    Comment: Accepted at ICCV 2019 as oral

    Mixed-Effect Modeling for Longitudinal Prediction of Cancer Tumor

    In this paper, a mixed-effect modeling scheme is proposed to construct a predictor for different features of cancer tumors. For this purpose, a set of features is extracted from two groups of patients with the same type of cancer but with two medical outcomes: 1) survived and 2) passed away. The goal is to build different models for the two groups, where in each group, the patient-specific behavior of individuals can be characterized. These models are then used as predictors to forecast the future state of patients with a given history or initial state. To this end, a leave-one-out cross-validation method is used to measure the prediction accuracy of each patient-specific model. Experiments show that, compared to fixed-effect modeling (regression), mixed-effect modeling has superior performance on some of the extracted features and similar or worse performance on the others.
    Comment: arXiv admin note: substantial text overlap with arXiv:1803.0424
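    As a rough illustration of the leave-one-out evaluation mentioned in this abstract, the sketch below scores a plain least-squares line, which corresponds to the fixed-effect baseline rather than the mixed-effect model. The function names `fit_line` and `loocv_mse` and the toy data are assumptions made for this example, not taken from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loocv_mse(xs, ys):
    """Leave-one-out cross-validation: hold out each point once,
    fit on the rest, and average the squared prediction errors."""
    squared_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        prediction = slope * xs[i] + intercept
        squared_errors.append((prediction - ys[i]) ** 2)
    return sum(squared_errors) / len(squared_errors)


# Hypothetical toy measurements (not data from the paper).
times = [0.0, 1.0, 2.0, 3.0, 4.0]
sizes = [1.1, 2.9, 5.2, 6.8, 9.1]
print(loocv_mse(times, sizes))
```

    A mixed-effect variant would add per-patient random terms to the fit; the held-out scoring loop would stay the same.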

    Restricted Indian Buffet Processes

    Latent feature models are a powerful tool for modeling data with globally-shared features. Nonparametric exchangeable models such as the Indian Buffet Process offer modeling flexibility by letting the number of latent features be unbounded. However, current models impose implicit distributions over the number of latent features per data point, and these implicit distributions may not match our knowledge about the data. In this paper, we demonstrate how the Restricted Indian Buffet Process circumvents this restriction, allowing arbitrary distributions over the number of features in an observation. We discuss several alternative constructions of the model and use the insights gained to develop Markov Chain Monte Carlo and variational methods for simulation and posterior inference.
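    For context on the "implicit distributions" this abstract refers to, here is a minimal sketch of the standard (unrestricted) Indian Buffet Process in its usual culinary construction. It is not the restricted model from the paper; `sample_ibp`, `poisson`, and the parameter values are assumptions chosen for illustration. Under this construction the number of features in each row is marginally Poisson(alpha), which is the constraint the restricted variant is said to relax.

```python
import math
import random


def poisson(lam, rng):
    """Poisson(lam) draw via Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(num_customers, alpha, rng):
    """Sample a binary feature matrix (list of rows) from the standard IBP."""
    dish_counts = []  # dish_counts[k] = number of earlier customers with dish k
    choices = []      # choices[i] = dish indices taken by customer i
    for i in range(1, num_customers + 1):
        # Take each existing dish k with probability m_k / i.
        taken = [k for k, m in enumerate(dish_counts) if rng.random() < m / i]
        # Then try Poisson(alpha / i) brand-new dishes.
        num_new = poisson(alpha / i, rng)
        first_new = len(dish_counts)
        taken.extend(range(first_new, first_new + num_new))
        dish_counts.extend([0] * num_new)
        for k in taken:
            dish_counts[k] += 1
        choices.append(taken)
    num_dishes = len(dish_counts)
    matrix = [[0] * num_dishes for _ in range(num_customers)]
    for i, taken in enumerate(choices):
        for k in taken:
            matrix[i][k] = 1
    return matrix


Z = sample_ibp(num_customers=6, alpha=2.0, rng=random.Random(0))
for row in Z:
    print(row)
```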

    Theoretical Modeling Expressions for Networked Enzymatic Signal Processing Steps as Logic Gates Optimized by Filtering

    We describe modeling approaches to a "network" of connected enzyme-catalyzed reactions, with added (bio)chemical processes that introduce biochemical filtering steps into the functioning of such a biocatalytic cascade. Theoretical expressions are derived that allow simple, few-parameter modeling of processes concatenated in such cascades, both with and without filtering. The modeling approach captures and explains features identified in earlier studies of enzymatic processes considered as potential "network components" for multi-step information/signal processing systems.
    Comment: arXiv admin note: substantial text overlap with arXiv:1312.423

    Predictive Modeling with Delayed Information: a Case Study in E-commerce Transaction Fraud Control

    In Business Intelligence, accurate predictive modeling is the key to providing adaptive decisions. In this research we studied predictive modeling problems motivated by real-world cases that Microsoft data scientists encountered while dealing with e-commerce transaction fraud control decisions using transaction streaming data in an uncertain probabilistic decision environment. The values of most online transaction-related features are returned instantly, while the true fraud labels are returned only after a stochastic delay. Using partially mature data directly for predictive modeling in an uncertain probabilistic decision environment would lead to significant inaccuracy in risk decision-making. To improve estimation of the probabilistic prediction environment, which leads to more accurate predictive modeling, two frameworks, Current Environment Inference (CEI) and Future Environment Inference (FEI), are proposed. These frameworks generate decision-environment-related features using long-term fully mature and short-term partially mature data, and the values of those features are estimated using a variety of learning methods, including linear regression, random forest, gradient boosted tree, artificial neural network, and recurrent neural network. Performance tests were conducted using e-commerce transaction data from Microsoft. Testing results suggest that the proposed frameworks significantly improve the accuracy of decision environment estimation.

    Modeling Images using Transformed Indian Buffet Processes

    Latent feature models are attractive for image modeling, since images generally contain multiple objects. However, many latent feature models ignore that objects can appear at different locations or require pre-segmentation of images. While the transformed Indian buffet process (tIBP) provides a method for modeling transformation-invariant features in unsegmented binary images, its current form is inappropriate for real images because of its computational cost and modeling assumptions. We combine the tIBP with likelihoods appropriate for real images and develop an efficient inference method, using the cross-correlation between images and features, that is theoretically and empirically faster than existing inference techniques. Our method discovers reasonable components and achieves effective image reconstruction in natural images.
    Comment: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)

    Anchored Bayesian Gaussian Mixture Models

    Finite mixtures are a flexible modeling tool for irregularly shaped densities and samples from heterogeneous populations. When modeling with mixtures using an exchangeable prior on the component features, the component labels are arbitrary and are indistinguishable in posterior analysis. This makes it impossible to attribute any meaningful interpretation to the marginal posterior distributions of the component features. We propose a model in which a small number of observations are assumed to arise from some of the labeled component densities. The resulting model is not exchangeable, allowing inference on the component features without post-processing. Our method assigns meaning to the component labels at the modeling stage and can be justified as a data-dependent informative prior on the labelings. We show that our method produces interpretable results, often (but not always) similar to those resulting from relabeling algorithms, with the added benefit that the marginal inferences originate directly from a well specified probability model rather than a post hoc manipulation. We provide asymptotic results leading to practical guidelines for model selection that are motivated by maximizing prior information about the class labels and demonstrate our method on real and simulated data.
    Comment: 65 pages, 11 figures, 11 tables

    Kronecker PCA Based Spatio-Temporal Modeling of Video for Dismount Classification

    We consider the application of KronPCA spatio-temporal modeling techniques [Greenewald et al 2013, Tsiligkaridis et al 2013] to the extraction of spatio-temporal features for video dismount classification. KronPCA performs a low-rank type of dimensionality reduction that is adapted to spatio-temporal data and is characterized by the T-frame multiframe mean and covariance of p spatial features. For further regularization and improved inverse estimation, we also use the diagonally corrected KronPCA shrinkage methods we presented in [Greenewald et al 2013]. We apply this very general method to the modeling of the multivariate temporal behavior of HOG features extracted from pedestrian bounding boxes in video, with gender classification in a challenging dataset chosen as a specific application. The learned covariances for each class are used to extract spatio-temporal features, which are then classified, achieving competitive classification performance.
    Comment: 8 pages. To appear in Proceedings of SPIE DSS. arXiv admin note: text overlap with arXiv:1402.556