Neural Language Modeling with Visual Features
Multimodal language models attempt to incorporate non-linguistic features for
the language modeling task. In this work, we extend a standard recurrent neural
network (RNN) language model with features derived from videos. We train our
models on data that is two orders of magnitude larger than datasets used in
prior work. We perform a thorough exploration of model architectures for
combining visual and text features. Our experiments on two corpora (YouCookII
and 20bn-something-something-v2) show that the best performing architecture
consists of middle fusion of visual and text features, yielding over 25%
relative improvement in perplexity. We report analysis that provides insights
into why our multimodal language model improves upon a standard RNN language
model.
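As a rough illustration of the metric quoted in this abstract, the sketch below computes perplexity from per-token log-probabilities and the relative improvement between two models. The function names and the probability values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative natural-log likelihood per token."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def relative_improvement(baseline_ppl, new_ppl):
    """Fractional reduction in perplexity relative to the baseline."""
    return (baseline_ppl - new_ppl) / baseline_ppl


# Hypothetical per-token probabilities for the same three-token sentence.
text_only = [math.log(p) for p in (0.10, 0.20, 0.05)]
multimodal = [math.log(p) for p in (0.15, 0.25, 0.08)]

base = perplexity(text_only)
fused = perplexity(multimodal)
print(f"text-only: {base:.2f}, multimodal: {fused:.2f}, "
      f"relative improvement: {relative_improvement(base, fused):.1%}")
```

A relative improvement of 0.25 under this definition corresponds to the "25% relative improvement" phrasing: the new perplexity is three quarters of the baseline.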
Fast Modeling Methods for Complex System with Separable Features
Data-driven modeling plays an increasingly important role in different areas
of engineering. For most existing methods, such as genetic programming (GP),
the convergence speed might be too slow for large scale problems with a large
number of variables. Fortunately, in many applications, the target models are
separable in some sense. In this paper, we analyze different types of
separability of some real-world engineering equations and establish a
mathematical model of a generalized separable system (GS system). In order to get
the structure of the GS system, two concepts, namely block and factor, are
introduced, and a special method, block and factor detection, is also proposed,
in which the target model is decomposed into a number of blocks, further into
minimal blocks and factors. Compared to the conventional GP, the new method can
make large reductions to the search space. The minimal blocks and factors are
optimized and assembled with a global optimization search engine, low
dimensional simplex evolution (LDSE). An extensive study between the proposed
method and a state-of-the-art data-driven fitting tool, Eureqa, has been
presented with several man-made problems. Test results indicate that the
proposed method is more effective and efficient under all the investigated
cases.
Comment: arXiv admin note: substantial text overlap with arXiv:1706.0228
Learning Motion in Feature Space: Locally-Consistent Deformable Convolution Networks for Fine-Grained Action Detection
Fine-grained action detection is an important task with numerous applications
in robotics and human-computer interaction. Existing methods typically utilize
a two-stage approach including extraction of local spatio-temporal features
followed by temporal modeling to capture long-term dependencies. While most
recent papers have focused on the latter (long-temporal modeling), here, we
focus on producing features capable of modeling fine-grained motion more
efficiently. We propose a novel locally-consistent deformable convolution,
which utilizes the change in receptive fields and enforces a local coherency
constraint to capture motion information effectively. Our model jointly learns
spatio-temporal features (instead of using independent spatial and temporal
streams). The temporal component is learned from the feature space instead of
pixel space, e.g. optical flow. The produced features can be flexibly used in
conjunction with other long-temporal modeling networks, e.g. ST-CNN,
DilatedTCN, and ED-TCN. Overall, our proposed approach robustly outperforms the
original long-temporal models on two fine-grained action datasets: 50 Salads
and GTEA, achieving F1 scores of 80.22% and 75.39% respectively.
Comment: Accepted at ICCV 2019 as ora
Mixed-Effect Modeling for Longitudinal Prediction of Cancer Tumor
In this paper, a mixed-effect modeling scheme is proposed to construct a
predictor for different features of cancer tumor. For this purpose, a set of
features is extracted from two groups of patients with the same type of cancer
but with two medical outcomes: 1) survived and 2) passed away. The goal is to
build different models for the two groups, where in each group,
patient-specific behavior of individuals can be characterized. These models
are then used as predictors to forecast the future state of patients with a given
history or initial state. To this end, a leave-one-out cross validation method
is used to measure the prediction accuracy of each patient-specific model.
Experiments show that compared to fixed-effect modeling (regression),
mixed-effect modeling has a superior performance on some of the extracted
features and similar or worse performance on the others.
Comment: arXiv admin note: substantial text overlap with arXiv:1803.0424
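The leave-one-out evaluation described in this abstract can be sketched as follows. This is a minimal illustration that assumes an ordinary least-squares line as the predictor (a fixed-effect stand-in, not the authors' mixed-effect model); `fit_line`, `loocv_mse`, and the data values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # All x values identical: fall back to predicting the mean.
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loocv_mse(xs, ys):
    """Hold out each point in turn, fit on the rest, average the squared errors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        errors.append((ys[i] - (slope * xs[i] + intercept)) ** 2)
    return sum(errors) / len(errors)


# Hypothetical measurements of one feature over time, for illustration only.
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
values = [1.1, 2.9, 5.2, 6.8, 9.1, 11.0]
print(f"leave-one-out MSE: {loocv_mse(times, values):.4f}")
```

Comparing two modeling schemes then amounts to running the same held-out loop with each predictor and comparing the resulting errors.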
Restricted Indian Buffet Processes
Latent feature models are a powerful tool for modeling data with
globally-shared features. Nonparametric exchangeable models such as the Indian
Buffet Process offer modeling flexibility by letting the number of latent
features be unbounded. However, current models impose implicit distributions
over the number of latent features per data point, and these implicit
distributions may not match our knowledge about the data. In this paper, we
demonstrate how the Restricted Indian Buffet Process circumvents this
restriction, allowing arbitrary distributions over the number of features in an
observation. We discuss several alternative constructions of the model and use
the insights gained to develop Markov Chain Monte Carlo and variational methods
for simulation and posterior inference.
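The "implicit distribution" mentioned in this abstract can be seen by simulating the standard (unrestricted) Indian Buffet Process: under the usual sequential construction, each data point ends up with a Poisson(alpha)-distributed number of features. The sketch below simulates only that standard process, not the restricted variant; `sample_poisson`, `sample_ibp`, and the parameter values are hypothetical names and choices made for illustration.

```python
import math
import random


def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(num_points, alpha, rng):
    """Return one set of feature indices per data point (standard IBP)."""
    feature_counts = []  # feature_counts[k] = number of points owning feature k
    rows = []
    for i in range(1, num_points + 1):
        # Reuse existing feature k with probability m_k / i.
        owned = {k for k, m_k in enumerate(feature_counts)
                 if rng.random() < m_k / i}
        for k in owned:
            feature_counts[k] += 1
        # Introduce Poisson(alpha / i) brand-new features.
        for _ in range(sample_poisson(alpha / i, rng)):
            owned.add(len(feature_counts))
            feature_counts.append(1)
        rows.append(owned)
    return rows


rng = random.Random(0)
rows = sample_ibp(num_points=10, alpha=2.0, rng=rng)
print([len(row) for row in rows])  # number of features per data point
```

Because the per-point feature count is tied to alpha in this way, the standard process cannot express, say, "exactly three features per observation", which is the kind of prior knowledge the restricted model is meant to accommodate.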
Theoretical Modeling Expressions for Networked Enzymatic Signal Processing Steps as Logic Gates Optimized by Filtering
We describe modeling approaches to a "network" of connected enzyme-catalyzed
reactions, with added (bio)chemical processes that introduce biochemical
filtering steps into the functioning of such a biocatalytic cascade.
Theoretical expressions are derived that allow simple, few-parameter modeling
of processes concatenated in such cascades, both with and without filtering.
The modeling approach captures and explains features identified in earlier
studies of enzymatic processes considered as potential "network components" for
multi-step information/signal processing systems.
Comment: arXiv admin note: substantial text overlap with arXiv:1312.423
Predictive Modeling with Delayed Information: a Case Study in E-commerce Transaction Fraud Control
In Business Intelligence, accurate predictive modeling is the key for
providing adaptive decisions. We studied predictive modeling problems in this
research which was motivated by real-world cases that Microsoft data scientists
encountered while dealing with e-commerce transaction fraud control decisions
using transaction streaming data in an uncertain probabilistic decision
environment. The values of most online-transaction-related features return
instantly, while the true fraud labels only return after a stochastic delay.
Using partially mature data directly for predictive modeling in an uncertain
probabilistic decision environment would lead to significant inaccuracy on risk
decision-making. To estimate the probabilistic prediction environment more
accurately, which in turn leads to more accurate predictive modeling, two frameworks,
Current Environment Inference (CEI) and Future Environment Inference (FEI), are
proposed. These frameworks generated decision-environment-related features
using long-term fully mature and short-term partially mature data, and the
values of those features were estimated using a variety of learning methods,
including linear regression, random forest, gradient boosted tree, artificial
neural network, and recurrent neural network. Performance tests were conducted
using some e-commerce transaction data from Microsoft. Testing results
suggested that the proposed frameworks significantly improved the accuracy of
decision environment estimation.
Modeling Images using Transformed Indian Buffet Processes
Latent feature models are attractive for image modeling, since images
generally contain multiple objects. However, many latent feature models ignore
that objects can appear at different locations or require pre-segmentation of
images. While the transformed Indian buffet process (tIBP) provides a method
for modeling transformation-invariant features in unsegmented binary images,
its current form is inappropriate for real images because of its computational
cost and modeling assumptions. We combine the tIBP with likelihoods appropriate
for real images and develop an efficient inference method, using the cross-correlation
between images and features, that is theoretically and empirically faster than
existing inference techniques. Our method discovers reasonable components and
achieves effective image reconstruction in natural images.
Comment: Appears in Proceedings of the 29th International Conference on
Machine Learning (ICML 2012
Anchored Bayesian Gaussian Mixture Models
Finite mixtures are a flexible modeling tool for irregularly shaped densities
and samples from heterogeneous populations. When modeling with mixtures using
an exchangeable prior on the component features, the component labels are
arbitrary and are indistinguishable in posterior analysis. This makes it
impossible to attribute any meaningful interpretation to the marginal posterior
distributions of the component features. We propose a model in which a small
number of observations are assumed to arise from some of the labeled component
densities. The resulting model is not exchangeable, allowing inference on the
component features without post-processing. Our method assigns meaning to the
component labels at the modeling stage and can be justified as a data-dependent
informative prior on the labelings. We show that our method produces
interpretable results, often (but not always) similar to those resulting from
relabeling algorithms, with the added benefit that the marginal inferences
originate directly from a well specified probability model rather than a post
hoc manipulation. We provide asymptotic results leading to practical guidelines
for model selection that are motivated by maximizing prior information about
the class labels and demonstrate our method on real and simulated data.
Comment: 65 pages, 11 figures, 11 table
Kronecker PCA Based Spatio-Temporal Modeling of Video for Dismount Classification
We consider the application of KronPCA spatio-temporal modeling techniques
[Greenewald et al 2013, Tsiligkaridis et al 2013] to the extraction of
spatiotemporal features for video dismount classification. KronPCA performs a
low-rank type of dimensionality reduction that is adapted to spatio-temporal
data and is characterized by the T frame multiframe mean and covariance of p
spatial features. For further regularization and improved inverse estimation,
we also use the diagonally corrected KronPCA shrinkage methods we presented in
[Greenewald et al 2013]. We apply this very general method to the modeling of
the multivariate temporal behavior of HOG features extracted from pedestrian
bounding boxes in video, with gender classification in a challenging dataset
chosen as a specific application. The learned covariances for each class are
used to extract spatiotemporal features which are then classified, achieving
competitive classification performance.
Comment: 8 pages. To appear in Proceeding of SPIE DSS. arXiv admin note: text
overlap with arXiv:1402.556