Segmentation of Continuous Document Flow by a modified Backward-Forward algorithm
This paper describes a method for segmenting a continuous document flow. A document flow is a list of successive scanned pages, fed into a production chain, that represents several documents without any explicit separation mark between them. To separate the documents for recognition, the content of successive pages must be analyzed and the boundary pages of each document identified. The proposed method is similar to the variable horizon models (VHM), or multi-grams, used in speech recognition. It consists of maximizing the likelihood of the flow given the Markov models of all its constituent elements. Because computing this likelihood over the whole flow is NP-complete, the solution is to evaluate it within windows of reduced observations. The first results, obtained on homogeneous flows of invoices, reach more than 75% precision and 90% recall.
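The idea of choosing document boundaries by maximizing a flow likelihood can be illustrated with a minimal sketch. This is not the authors' modified Backward-Forward algorithm: it is a plain dynamic program, the page labels "H" (header), "B" (body) and "T" (totals) are hypothetical, the scoring function `toy_log_likelihood` is an invented stand-in for the Markov models, and the `max_len` bound on document length only loosely plays the role of the reduced observation window.

```python
import math

def toy_log_likelihood(segment):
    """Hypothetical score for 'segment is exactly one document'.

    Assumption: a document starts with a header page "H", ends with a
    totals page "T", and has body pages "B" in between.
    """
    score = math.log(0.9) if segment[0] == "H" else math.log(0.1)
    score += math.log(0.9) if segment[-1] == "T" else math.log(0.1)
    for page in segment[1:-1]:
        score += math.log(0.8) if page == "B" else math.log(0.2)
    return score

def segment_flow(pages, score, max_len):
    """Split pages into documents so that the summed score is maximal.

    best[i] holds the best total score for the first i pages, and
    back[i] the start index of the last document in that solution.
    """
    n = len(pages)
    best = [-math.inf] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        # Only look back max_len pages: documents longer than that are
        # never considered, which keeps the search local.
        for start in range(max(0, end - max_len), end):
            candidate = best[start] + score(pages[start:end])
            if candidate > best[end]:
                best[end] = candidate
                back[end] = start
    # Walk the back-pointers to recover (start, end) page ranges.
    cuts = []
    end = n
    while end > 0:
        cuts.append((back[end], end))
        end = back[end]
    cuts.reverse()
    return cuts

flow = list("HBTHTHBBT")
documents = segment_flow(flow, toy_log_likelihood, max_len=4)
```

Here `documents` is a list of half-open page ranges, one per recovered document. Bounding the look-back keeps the cost linear in the flow length for a fixed `max_len`.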
SAIVT-QUT@TRECVid 2012: Interactive surveillance event detection
In this paper, we propose an approach that attempts to solve the problem of surveillance event detection, assuming that the definition of the events is known. To facilitate the discussion, we first define two concepts. The event of interest is the event that the user asks the system to detect, and the background activities are any other events in the video corpus. This is an unsolved problem due to many factors, as listed below: 1) Occlusions and clustering: Surveillance scenes of significant interest, at locations such as airports, railway stations and shopping centers, are often crowded, and occlusions and clustering of people are frequently encountered. This significantly affects the feature extraction step; for instance, trajectories generated by object tracking algorithms are usually not robust in such situations. 2) The requirement for real-time detection: The system should process the video fast enough in both the feature extraction and the detection steps to allow real-time operation. 3) Massive size of the training data set: Suppose an event lasts for 1 minute in a video with a frame rate of 25 fps; the number of frames for this event is 60 × 25 = 1500. If we want a training data set with many positive instances of the event, the video is likely to be very large (i.e. hundreds of thousands of frames or more). How to handle such a large data set is a problem frequently encountered in this application. 4) Difficulty in separating the event of interest from background activities: The events of interest often co-exist with a set of background activities. Temporal ground truth is typically very ambiguous, as it does not distinguish the event of interest from a wide range of co-existing background activities. However, it is not practical to annotate the locations of the events in large amounts of video data.
This problem becomes more serious in the detection of multi-agent interactions, since the location of these events often cannot be constrained to within a bounding box. 5) Challenges in determining the temporal boundaries of the events: An event can occur at any arbitrary time with an arbitrary duration. The temporal segmentation of events is difficult and ambiguous, and is also affected by other factors such as occlusions.
SplineCNN: Fast Geometric Deep Learning with Continuous B-Spline Kernels
We present Spline-based Convolutional Neural Networks (SplineCNNs), a variant
of deep neural networks for irregular structured and geometric input, e.g.,
graphs or meshes. Our main contribution is a novel convolution operator based
on B-splines, that makes the computation time independent from the kernel size
due to the local support property of the B-spline basis functions. As a result,
we obtain a generalization of the traditional CNN convolution operator by using
continuous kernel functions parametrized by a fixed number of trainable
weights. In contrast to related approaches that filter in the spectral domain,
the proposed method aggregates features purely in the spatial domain. In
addition, SplineCNN allows entire end-to-end training of deep architectures,
using only the geometric structure as input, instead of handcrafted feature
descriptors. For validation, we apply our method on tasks from the fields of
image graph classification, shape correspondence and graph node classification,
and show that it outperforms or is on par with state-of-the-art approaches while being
significantly faster and having favorable properties like domain-independence. Comment: Presented at CVPR 201
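The local support property mentioned in the abstract can be illustrated with a minimal one-dimensional sketch. It assumes degree-1 (linear) B-splines on an open, uniformly spaced knot grid over [0, 1] and a scalar weight per control point. The function names are hypothetical, and the operator described in the paper is more general than this (multi-dimensional pseudo-coordinates, feature channels), so this should be read as an illustration of the principle only.

```python
import math

def linear_bspline_basis(u, kernel_size):
    """Non-zero degree-1 B-spline basis functions at u in [0, 1].

    Returns (index, value) pairs. There are always exactly two of them,
    however large kernel_size is; that is the local support property.
    Assumes kernel_size >= 2.
    """
    v = u * (kernel_size - 1)            # position on the knot grid
    lower = min(int(math.floor(v)), kernel_size - 2)
    frac = v - lower                     # distance from the lower knot
    return [(lower, 1.0 - frac), (lower + 1, frac)]

def kernel_value(u, weights):
    """Continuous kernel g(u) as a weighted sum of the active bases."""
    basis = linear_bspline_basis(u, len(weights))
    return sum(weights[index] * value for index, value in basis)

weights = [0.0, 1.0, 4.0, 9.0]           # one trainable weight per knot
middle = kernel_value(0.5, weights)      # blends weights[1] and weights[2]
```

Evaluating `kernel_value` touches two weights whether `weights` has four entries or four hundred, which is why the cost of one evaluation does not grow with the kernel size in this sketch.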
Symbol Emergence in Robotics: A Survey
Humans can learn the use of language through physical interaction with their
environment and semiotic communication with other people. It is very important
to obtain a computational understanding of how humans can form a symbol system
and obtain semiotic skills through their autonomous mental development.
Recently, many studies have been conducted on the construction of robotic
systems and machine-learning methods that can learn the use of language through
embodied multimodal interaction with their environment and other systems.
Understanding human social interactions and developing a robot that can
smoothly communicate with human users in the long term requires an
understanding of the dynamics of symbol systems and is crucially important. The
embodied cognition and social interaction of participants gradually change a
symbol system in a constructive manner. In this paper, we introduce a field of
research called symbol emergence in robotics (SER). SER is a constructive
approach towards an emergent symbol system. The emergent symbol system is
socially self-organized through both semiotic communications and physical
interactions with autonomous cognitive developmental agents, i.e., humans and
developmental robots. Specifically, we describe some state-of-the-art research
topics concerning SER, e.g., multimodal categorization, word discovery, and a
double articulation analysis, that enable a robot to obtain words and their
embodied meanings from raw sensory-motor information, including visual
information, haptic information, auditory information, and acoustic speech
signals, in a totally unsupervised manner. Finally, we suggest future
directions of research in SER. Comment: submitted to Advanced Robotic
Top-Down Processing: Top-Down Network Combines Back-Propagation with Attention
Early neural network models relied exclusively on bottom-up processing going
from the input signals to higher-level representations. Many recent models also
incorporate top-down networks going in the opposite direction. Top-down
processing in deep learning models plays two primary roles: learning and
directing attention. These two roles are accomplished in current models through
distinct mechanisms. While top-down attention is often implemented by extending
the model's architecture with additional units that propagate information from
high to low levels of the network, learning is typically accomplished by an
external learning algorithm such as back-propagation. In the current work, we
present an integration of the two functions above, which appear unrelated,
using a single unified mechanism. We propose a novel symmetric bottom-up
top-down network structure that can integrate standard bottom-up networks with
a symmetric top-down counterpart, allowing each network to guide and influence
the other. The same top-down network is used for both learning, via
back-propagating feedback signals, and at the same time also for top-down
attention, by guiding the bottom-up network to perform a selected task. We show
that our method achieves competitive performance on a standard multi-task
learning benchmark. Yet, we rely on standard single-task architectures and
optimizers, without any task-specific parameters. Additionally, our learning
algorithm addresses in a new way some neuroscience issues that arise in
biological modeling of learning in the brain.