What's Cookin'? Interpreting Cooking Videos using Text, Speech and Vision
We present a novel method for aligning a sequence of instructions to a video
of someone carrying out a task. In particular, we focus on the cooking domain,
where the instructions correspond to the recipe. Our technique relies on an HMM
to align the recipe steps to the (automatically generated) speech transcript.
We then refine this alignment using a state-of-the-art visual food detector,
based on a deep convolutional neural network. We show that our technique
outperforms simpler techniques based on keyword spotting. It also enables
interesting applications, such as automatically illustrating recipes with
keyframes, and searching within a video for events of interest.
Comment: To appear in NAACL 201
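The alignment idea in this abstract can be illustrated with a small sketch. It is not the authors' model: the abstract does not give the HMM's emission or transition probabilities, so the code below assumes a left-to-right HMM whose hidden states are recipe steps, a hypothetical word-overlap emission score (`emission_score`), and a hypothetical stay probability (`p_stay`). It decodes the most likely step for each transcript segment with the Viterbi algorithm.

```python
import math


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; a real system would do more)."""
    return set(text.lower().split())


def emission_score(step, segment, smoothing=0.1):
    """Hypothetical emission log-score: smoothed word overlap of step and segment."""
    overlap = len(tokenize(step) & tokenize(segment))
    return math.log((overlap + smoothing) / (len(tokenize(segment)) + 1))


def align(steps, segments, p_stay=0.6):
    """Viterbi decoding over a left-to-right HMM.

    Each state is a recipe step. From one segment to the next, the model either
    stays on the same step (p_stay) or advances to the next step (1 - p_stay).
    Returns one step index per transcript segment.
    """
    n_steps, n_segs = len(steps), len(segments)
    neg_inf = float("-inf")
    log_stay, log_next = math.log(p_stay), math.log(1.0 - p_stay)

    best = [[neg_inf] * n_steps for _ in range(n_segs)]
    back = [[0] * n_steps for _ in range(n_segs)]
    best[0][0] = emission_score(steps[0], segments[0])  # always start at step 0

    for t in range(1, n_segs):
        for k in range(n_steps):
            stay = best[t - 1][k] + log_stay
            advance = best[t - 1][k - 1] + log_next if k > 0 else neg_inf
            if advance > stay:
                best[t][k], back[t][k] = advance, k - 1
            else:
                best[t][k], back[t][k] = stay, k
            best[t][k] += emission_score(steps[k], segments[t])

    # Trace back from the best final state.
    k = max(range(n_steps), key=lambda j: best[n_segs - 1][j])
    path = [k]
    for t in range(n_segs - 1, 0, -1):
        k = back[t][k]
        path.append(k)
    path.reverse()
    return path


# Toy data, invented for illustration only.
steps = ["chop the onions", "fry the onions in oil", "add the tomatoes"]
segments = [
    "first we chop onions",
    "chop until fine",
    "now fry them in hot oil",
    "finally add tomatoes",
]
path = align(steps, segments)
print(path)
```

The left-to-right constraint encodes the assumption that the cook follows the recipe in order; the visual refinement stage mentioned in the abstract is not modelled here.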
Pyramidal Stochastic Graphlet Embedding for Document Pattern Classification
This is the author accepted manuscript. The final version is available from IEEE via the DOI in this record.
Document pattern classification methods using graphs have received a lot of attention because of their robust representation paradigm and rich theoretical background. However, the way documents are preserved and the process of delineating them as graphs introduce noise into the rendering of the underlying data, which makes the graph representation unstable. To address this unreliability, we propose Pyramidal Stochastic Graphlet Embedding (PSGE). Given a graph representing a document pattern, our method first computes a graph pyramid by successively reducing the base graph. Once the graph pyramid is computed, we apply Stochastic Graphlet Embedding (SGE) to each level of the pyramid and combine the resulting embeddings to obtain a global delineation of the original graph. Considering a pyramid of graphs rather than only the base graph extends the representational power of the graph embedding, which reduces the instability caused by noise and distortion. When combined with a support vector machine, PSGE outperforms state-of-the-art results in the recognition of handwritten words and graphical symbols.
Funding: European Union Horizon 2020; Ministerio de Educación, Cultura y Deporte, Spain; Ramon y Cajal Fellowship; CERCA Program/Generalitat de Cataluny
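The pyramid-then-embed pipeline described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the published PSGE: the abstract does not specify the reduction rule or the details of SGE, so the code assumes a greedy edge contraction (`coarsen`) for the pyramid and substitutes a much simpler stand-in for SGE, namely a normalized histogram of randomly sampled connected subgraphs of up to three nodes (`embed_level`). All function names and parameters are hypothetical.

```python
import random
from collections import Counter

# Possible (node count, edge count) signatures of connected graphlets up to size 3.
KEYS = [(1, 0), (2, 1), (3, 2), (3, 3)]


def coarsen(adj):
    """Assumed reduction: merge each node with its first unmerged neighbour."""
    merged = {}
    for u in sorted(adj):
        if u in merged:
            continue
        merged[u] = u
        for v in sorted(adj[u]):
            if v not in merged:
                merged[v] = u
                break
    coarse = {root: set() for root in set(merged.values())}
    for u, neighbours in adj.items():
        for v in neighbours:
            if merged[u] != merged[v]:  # drop edges inside a merged pair
                coarse[merged[u]].add(merged[v])
                coarse[merged[v]].add(merged[u])
    return coarse


def sample_graphlet(adj, rng, size=3):
    """Grow a random connected node set and return its (nodes, edges) signature."""
    nodes = {rng.choice(sorted(adj))}
    while len(nodes) < size:
        frontier = sorted({v for u in nodes for v in adj[u]} - nodes)
        if not frontier:
            break
        nodes.add(rng.choice(frontier))
    edges = sum(1 for u in nodes for v in adj[u] if v in nodes) // 2
    return (len(nodes), edges)


def embed_level(adj, rng, samples=200):
    """Stand-in for SGE: normalized histogram of sampled graphlet signatures."""
    counts = Counter(sample_graphlet(adj, rng) for _ in range(samples))
    return [counts[key] / samples for key in KEYS]


def pyramid_embedding(adj, levels=3, samples=200, seed=0):
    """Embed every pyramid level and concatenate the per-level vectors."""
    rng = random.Random(seed)
    vector = []
    for _ in range(levels):
        vector.extend(embed_level(adj, rng, samples))
        adj = coarsen(adj)
    return vector


# Toy graph: a 6-cycle given as an undirected adjacency dict.
cycle = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
vec = pyramid_embedding(cycle)
print(len(vec), vec)
```

The resulting fixed-length vector could then be passed to any standard classifier, such as the support vector machine mentioned in the abstract.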