The Long-Short Story of Movie Description
Generating descriptions for videos has many applications including assisting
blind people and human-robot interaction. The recent advances in image
captioning as well as the release of large-scale movie description datasets
such as MPII Movie Description make it possible to study this task in more depth. Many of
the proposed methods for image captioning rely on pre-trained object classifier
CNNs and Long-Short Term Memory recurrent networks (LSTMs) for generating
descriptions. While image description focuses on objects, we argue that it is
important to distinguish verbs, objects, and places in the challenging setting
of movie description. In this work we show how to learn robust visual
classifiers from the weak annotations of the sentence descriptions. Based on
these visual classifiers we learn how to generate a description using an LSTM.
We explore different design choices to build and train the LSTM and achieve the
best performance to date on the challenging MPII-MD dataset. We compare and
analyze our approach and prior work along various dimensions to better
understand the key challenges of the movie description task.
Much Ado About Time: Exhaustive Annotation of Temporal Data
Large-scale annotated datasets allow AI systems to learn from and build upon
the knowledge of the crowd. Many crowdsourcing techniques have been developed
for collecting image annotations. These techniques often implicitly rely on the
fact that a new input image takes a negligible amount of time to perceive. In
contrast, we investigate and determine the most cost-effective way of obtaining
high-quality multi-label annotations for temporal data such as videos. Watching
even a short 30-second video clip requires a significant time investment from a
crowd worker; thus, requesting multiple annotations following a single viewing
is an important cost-saving strategy. But how many questions should we ask per
video? We conclude that the optimal strategy is to ask as many questions as
possible in a HIT (up to 52 binary questions after watching a 30-second video
clip in our experiments). We demonstrate that while workers may not correctly
answer all questions, the cost-benefit analysis nevertheless favors consensus
from multiple such cheap-yet-imperfect iterations over more complex
alternatives. When compared with a one-question-per-video baseline, our method
is able to achieve a 10% improvement in recall (76.7% ours versus 66.7%
baseline) at comparable precision (83.8% ours versus 83.0% baseline) in about
half the annotation time (3.8 minutes ours compared to 7.1 minutes baseline).
We demonstrate the effectiveness of our method by collecting multi-label
annotations of 157 human activities on 1,815 videos. Comment: HCOMP 2016 Camera Read
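The consensus idea in the abstract above can be illustrated with a minimal sketch. It assumes simple majority voting over repeated binary answers, which the abstract does not specify; the function names and the toy answers are hypothetical. The last two lines only redo the arithmetic on the reported recall and timing figures.

```python
from collections import Counter


def majority_vote(answers):
    """Return the consensus label for one binary question.

    `answers` is a list of booleans from independent workers.
    Ties resolve to False (an arbitrary choice for this sketch).
    """
    counts = Counter(answers)
    return counts[True] > counts[False]


def consensus(per_worker_answers):
    """Aggregate several workers' answer sheets question by question."""
    return [majority_vote(list(column)) for column in zip(*per_worker_answers)]


# Hypothetical toy data: three workers, four binary questions each.
workers = [
    [True, False, True, False],
    [True, True, True, False],
    [False, False, True, False],
]
labels = consensus(workers)

# Arithmetic on the figures reported in the abstract.
recall_gain = 76.7 - 66.7  # difference in percentage points
time_ratio = 3.8 / 7.1     # fraction of the baseline annotation time
```

Under these assumptions, a single worker's mistake on a question is outvoted by the other two, which is the intuition behind preferring several cheap passes over one careful pass.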
Move Forward and Tell: A Progressive Generator of Video Descriptions
We present an efficient framework that can generate a coherent paragraph to
describe a given video. Previous works on video captioning usually focus on
video clips. They typically treat an entire video as a whole and generate the
caption conditioned on a single embedding. On the contrary, we consider videos
with rich temporal structures and aim to generate paragraph descriptions that
can preserve the story flow while being coherent and concise. Towards this
goal, we propose a new approach, which produces a descriptive paragraph by
assembling temporally localized descriptions. Given a video, it selects a
sequence of distinctive clips and generates sentences thereon in a coherent
manner. Particularly, the selection of clips and the production of sentences
are done jointly and progressively driven by a recurrent network -- what to
describe next depends on what has been said before. Here, the recurrent
network is learned via self-critical sequence training with both sentence-level
and paragraph-level rewards. On the ActivityNet Captions dataset, our method
demonstrated the capability of generating high-quality paragraph descriptions
for videos. Compared to those by other methods, the descriptions produced by
our method are often more relevant, more coherent, and more concise. Comment: Accepted by ECCV 201
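For readers unfamiliar with self-critical sequence training, the commonly used form of its policy-gradient estimate is sketched below. The abstract does not give the authors' exact objective, so the notation is an assumption: $w^s$ is a sequence sampled from the model $p_\theta$, $\hat{w}$ is the greedily decoded sequence used as a baseline, and $r(\cdot)$ is the reward, which in this setting would combine sentence-level and paragraph-level terms in a way the abstract does not state.

```latex
\nabla_\theta L(\theta) \approx
  -\bigl( r(w^s) - r(\hat{w}) \bigr)\,
  \nabla_\theta \log p_\theta(w^s)
```

Samples that score above the greedy baseline have their log-probability increased, and samples that score below it are suppressed.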
Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding
Computer vision has a great potential to help our daily lives by searching
for lost keys, watering flowers or reminding us to take a pill. To succeed with
such tasks, computer vision methods need to be trained from real and diverse
examples of our daily dynamic scenes. While most of such scenes are not
particularly exciting, they typically do not appear on YouTube, in movies or TV
broadcasts. So how do we collect sufficiently many diverse but boring samples
representing our lives? We propose a novel Hollywood in Homes approach to
collect such data. Instead of shooting videos in the lab, we ensure diversity
by distributing and crowdsourcing the whole process of video creation from
script writing to video recording and annotation. Following this procedure we
collect a new dataset, Charades, with hundreds of people recording videos in
their own homes, acting out casual everyday activities. The dataset is composed
of 9,848 annotated videos with an average length of 30 seconds, showing
activities of 267 people from three continents. Each video is annotated by
multiple free-text descriptions, action labels, action intervals and classes of
interacted objects. In total, Charades provides 27,847 video descriptions,
66,500 temporally localized intervals for 157 action classes and 41,104 labels
for 46 object classes. Using this rich data, we evaluate and provide baseline
results for several tasks including action recognition and automatic
description generation. We believe that the realism, diversity, and casual
nature of this dataset will present unique challenges and new opportunities for
the computer vision community.
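The dataset counts quoted in the abstract above can be turned into per-video and per-person averages with a few lines of arithmetic. This is only a sketch over the reported totals; the variable names are illustrative, and the averages say nothing about how annotations are actually distributed across videos.

```python
# Totals taken from the abstract above.
videos = 9848
people = 267
descriptions = 27847
intervals = 66500
object_labels = 41104

# Simple averages; real per-video counts will vary around these values.
descriptions_per_video = descriptions / videos
intervals_per_video = intervals / videos
object_labels_per_video = object_labels / videos
videos_per_person = videos / people

for name, value in [
    ("descriptions per video", descriptions_per_video),
    ("action intervals per video", intervals_per_video),
    ("object labels per video", object_labels_per_video),
    ("videos per person", videos_per_person),
]:
    print(f"{name}: {value:.2f}")
```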
Evaluation of soil-tire interaction on a soil bin
A single-wheel tester was designed and fabricated, with attention to the size of the soil bin, to study soil-tire interactions in a controlled soil environment. The main parts of the single-wheel tester include a chassis, reduction gear unit, three-phase AC electric motor, hydraulic cylinder, tank, pump and valve, load cell, and tires. The experiment was designed with two levels of tire axle load (15 and 25 kN) and two inflation pressures (70 and 150 kPa). The tire (18.4/15-30) was run at a constant forward speed of 0.3 m/s, 13% slip, and 12% moisture content (d.b.) on clay loam soil. A statistical comparison was made of the cone index values measured in the undisturbed soil, at the center of the track, and at the edge of the track. A significant difference in cone index was found for all treatments. Inflation pressure at the center of the tire track and load at its edge had significant effects on cone index and dry bulk density. Keywords: cone index, inflation pressure, load, dry bulk density, soil bi
The effects of alfalfa particle size and acid treated protein on ruminal chemical composition, liquid, particulate, escapable and non escapable phases in Zel sheep
This study was conducted to investigate the effects of alfalfa particle size (long vs. fine) and canola meal treated with hydrochloric acid solution (untreated vs. treated) on ruminal chemical composition, liquid, particulate, escapable and non escapable phases in Zel sheep. Four ruminally cannulated sheep received a mixed diet (% of dry matter) consisting of 23.73 alfalfa, 8.70 canola meal, 39.56 wheat straw, 13.45 beet pulp, 13.45 barley grain, and 1 mineral-vitamin mixture. The experimental design was a 4 × 4 Latin square with 22-day periods. The diet was offered twice daily (09:00 and 21:00 h). The rumens were evacuated manually at 3, 7.5 and 12 h post-feeding, and total ruminal contents were separated into mat and liquids. Dry matter weight distribution of total recovered particles was determined by a wet-sieving procedure and used to partition ruminal mat and liquids among percentages of large (≥ 6.35 mm), medium (< 6.35 and ≥ 1.18 mm), and small (< 1.18 and ≥ 0.5 mm) particles. Lyophilized ruminal digesta were analyzed for chemical composition, especially CP, NDF and EE. No interactions (P > 0.05) between dietary particle size and acid level were observed for ruminal chemical composition, liquid, particulate, escapable and non escapable phases. Treatment of canola meal and an increase in particle size reduced the values of CP. Generally, the values of each nutrient decreased with time after feeding. Particle size and time post-feeding had a pronounced effect on the distribution of the different particle fractions, whereas acid level did not influence it. With increasing time after feeding, the percentage of particles ≥ 6.35 mm decreased, whereas the percentage of particles < 6.35 mm increased, illustrating intensive particle breakdown in the reticulo-rumen.
Particle size and time post-feeding had a pronounced effect on the total mass of ruminal digesta, ruminal mat and the liquid part: fine particles and 12 h post-feeding gave the lowest rumen mat. Time post-feeding and acid level did not influence pH significantly, whereas pH increased with increasing particle size. Key words: Canola meal, particle size, rumen mat, escapable, non escapable phase
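The 4 × 4 Latin square mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a plain cyclic construction with rows read as sheep and columns as periods; the abstract does not say how treatments were actually allocated or randomized, so this layout and the function name are hypothetical.

```python
from itertools import product

# The 2 x 2 factorial treatments named in the abstract.
treatments = [
    f"{size} alfalfa, {meal} canola meal"
    for size, meal in product(("long", "fine"), ("untreated", "treated"))
]


def cyclic_latin_square(items):
    """Build an n x n Latin square by cyclically shifting `items` one step per row."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


# Assumed layout: one row per sheep, one column per period.
square = cyclic_latin_square(treatments)

for sheep, row in enumerate(square, start=1):
    print(f"sheep {sheep}: " + " | ".join(row))
```

Each treatment appears exactly once in every row and every column, so every sheep receives every treatment and every period contains every treatment.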
Risk-Based Capacitor Placement in Distribution Networks
In this paper, the problem of sizing and placement of constant and switching capacitors in electrical distribution systems is modelled considering the load uncertainty. This model is formulated as a multicriteria mathematical problem. The risk of voltage violation is calculated, and the stability index is modelled using fuzzy logic and fuzzy equations. The instability risk is introduced as the deviation of our fuzzy-based stability index with respect to the stability margin. The capacitor placement objectives in our paper include: (i) minimizing investment and installation costs as well as loss cost; (ii) reducing the risk of voltage violation; and (iii) reducing the instability risk. The proposed mathematical model is solved using a multi-objective version of a genetic algorithm. The model is implemented on a distribution network, and the results of the experiment are discussed. The impacts of constant and switching capacitors are assessed separately and concurrently. Moreover, the impact of uncertainty on the multiple objectives is determined based on a sensitivity analysis. It is demonstrated that the greater the uncertainty, the higher the system cost, the voltage risk and the instability risk.
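A multi-objective search over the three objectives listed in the abstract above typically compares candidate solutions by Pareto dominance. The sketch below shows only that generic comparison step, under the assumption that all three objectives are minimized. It is not the authors' algorithm, and the candidate values are invented for illustration.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` on every objective and strictly better on one.

    All objectives are assumed to be minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates: (cost, voltage-violation risk, instability risk).
candidates = [
    (100.0, 0.30, 0.20),
    (120.0, 0.10, 0.15),
    (130.0, 0.12, 0.18),  # worse than the second candidate on all three objectives
    (90.0, 0.40, 0.25),
]
front = pareto_front(candidates)
```

Here the third candidate is dropped because the second beats it on cost and on both risks, while the other three represent genuine trade-offs.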
Estimation of genetic parameters for body weight at different ages in Mehraban sheep
The objective of the present study is to estimate genetic parameters of birth weight (BW, n = 3005), weaning weight (WW, n = 2800), 6-month weight (6MW, n = 2600), 9-month weight (9MW, n = 1990) and yearling weight (YW, n = 1450) of Mehraban sheep, collected during 1995 - 2007 at the Mehraban Sheep Breeding Station in Hamedan province, Iran. (Co)variance components and genetic parameters were estimated with univariate and multivariate animal models using the restricted maximum likelihood (REML) procedure. The effects of herd, lamb's sex, and year of birth were significant for all traits (P < 0.05). The estimates of direct heritability for BW, WW, 6MW, 9MW and YW were 0.30±0.05, 0.30±0.04, 0.35±0.05, 0.37±0.04 and 0.43±0.04, respectively. Maternal heritability estimates for these traits were 0.17±0.03, 0.18±0.03, 0.14±0.03, 0.12±0.03 and 0.10±0.02, respectively. The estimates of the direct genetic correlation between BW-WW, BW-6MW, BW-9MW, BW-YW, WW-6MW, WW-9MW, WW-YW, 6MW-9MW, 6MW-YW and 9MW-YW were 0.287±0.09, 0.305±0.09, 0.249±0.03, 0.136±0.07, 0.825±0.34, 0.713±0.05, 0.845±0.52, 0.862±0.06, 0.596±0.09 and 0.712±0.02, respectively. The estimates of the phenotypic correlation between traits were positive and ranged from 0.152 for BW-9MW to 0.835 for 9MW-YW. Key words: Mehraban sheep, heritability, genetic correlation, body weight traits
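As a reminder of what the heritability estimates in the abstract above measure, the usual definitions are sketched below. The abstract does not list the terms of the fitted animal model, so this assumes the simplest case: additive direct genetic variance $\sigma_a^2$, maternal genetic variance $\sigma_m^2$, residual variance $\sigma_e^2$, no direct-maternal covariance and no permanent environmental effect.

```latex
\sigma_p^2 = \sigma_a^2 + \sigma_m^2 + \sigma_e^2, \qquad
h^2 = \frac{\sigma_a^2}{\sigma_p^2}, \qquad
m^2 = \frac{\sigma_m^2}{\sigma_p^2}
```

Under these assumptions, the reported direct heritability of 0.30 for birth weight would mean that about 30% of the phenotypic variance is attributed to additive direct genetic effects.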