23,564 research outputs found
Exploring Deep-Sea Data
The Monterey Bay Aquarium Research Institute (MBARI) has collected and archived deep-sea video from remotely operated vehicle dives since 1988. The video archive contains footage and data on the biological, chemical, geological, and physical aspects of deep regions of the Pacific. MBARI developed a software system, the Video Annotation and Reference System, to create, store, and retrieve video annotations. The system is based on a hierarchical catalog of biological, geological, and technical terms that allows consistent and rapid classification of objects seen on video. Based on knowledge collected by the annotation process, MBARI staff developed a web-based Deep-Sea Guide to the organisms and geologic features recorded on remotely operated vehicle dives into the deep sea. The searchable guide provides information about biological taxonomy, geology, and habitats, and displays dynamic histograms and useful statistics derived from the video annotations
Highly efficient low-level feature extraction for video representation and retrieval.
PhDWitnessing the omnipresence of digital video media, the research community has
raised the question of its meaningful use and management. Stored in immense
multimedia databases, digital videos need to be retrieved and structured in an
intelligent way, relying on the content and the rich semantics involved. Current
Content Based Video Indexing and Retrieval systems face the problem of the semantic
gap between the simplicity of the available visual features and the richness of user
semantics.
This work focuses on the issues of efficiency and scalability in video indexing and
retrieval to facilitate a video representation model capable of semantic annotation. A
highly efficient algorithm for temporal analysis and key-frame extraction is developed.
It is based on the prediction information extracted directly from the compressed domain
features and the robust scalable analysis in the temporal domain. Furthermore,
a hierarchical quantisation of the colour features in the descriptor space is presented.
Derived from the extracted set of low-level features, a video representation model that
enables semantic annotation and contextual genre classification is designed.
Results demonstrate the efficiency and robustness of the temporal analysis algorithm
that runs in real time maintaining the high precision and recall of the detection task.
Adaptive key-frame extraction and summarisation achieve a good overview of the
visual content, while the colour quantisation algorithm efficiently creates hierarchical
set of descriptors. Finally, the video representation model, supported by the genre
classification algorithm, achieves excellent results in an automatic annotation system by
linking the video clips with a limited lexicon of related keywords
Context-aware person identification in personal photo collections
Identifying the people in photos is an important need for users of photo management systems. We present MediAssist, one such system which facilitates browsing, searching and semi-automatic annotation of personal photos, using analysis of both image content and the context in which the photo is captured. This semi-automatic annotation includes annotation of the identity of people in photos. In this paper, we focus on such person annotation, and propose person identification techniques based on a combination of context and content. We propose language modelling and nearest neighbor approaches to context-based person identification, in addition to novel face color and image color content-based features (used alongside face recognition and body patch features). We conduct a comprehensive empirical study of these techniques using the real private photo collections of a number of users, and show that combining context- and content-based analysis improves performance over content or context alone
A Formal Framework for Linguistic Annotation
`Linguistic annotation' covers any descriptive or analytic notations applied
to raw language data. The basic data may be in the form of time functions --
audio, video and/or physiological recordings -- or it may be textual. The added
notations may include transcriptions of all sorts (from phonetic features to
discourse structures), part-of-speech and sense tagging, syntactic analysis,
`named entity' identification, co-reference annotation, and so on. While there
are several ongoing efforts to provide formats and tools for such annotations
and to publish annotated linguistic databases, the lack of widely accepted
standards is becoming a critical problem. Proposed standards, to the extent
they exist, have focussed on file formats. This paper focuses instead on the
logical structure of linguistic annotations. We survey a wide variety of
existing annotation formats and demonstrate a common conceptual core, the
annotation graph. This provides a formal framework for constructing,
maintaining and searching linguistic annotations, while remaining consistent
with many alternative data structures and file formats.Comment: 49 page
SAINE: Scientific Annotation and Inference Engine of Scientific Research
We present SAINE, an Scientific Annotation and Inference ENgine based on a
set of standard open-source software, such as Label Studio and MLflow. We show
that our annotation engine can benefit the further development of a more
accurate classification. Based on our previous work on hierarchical discipline
classifications, we demonstrate its application using SAINE in understanding
the space for scholarly publications. The user study of our annotation results
shows that user input collected with the help of our system can help us better
understand the classification process. We believe that our work will help to
foster greater transparency and better understand scientific research. Our
annotation and inference engine can further support the downstream meta-science
projects. We welcome collaboration and feedback from the scientific community
on these projects. The demonstration video can be accessed from
https://youtu.be/yToO-G9YQK4. A live demo website is available at
https://app.heartex.com/user/signup/?token=e2435a2f97449fa1 upon free
registration.Comment: Under review in IJCNLP-AACL Demo 202
Recommended from our members
Automatic parsing of sports videos with grammars
Motivated by the analogies between languages and sports videos, we introduce a novel
approach for video parsing with grammars. It utilizes compiler techniques for integrating both semantic
annotation and syntactic analysis to generate a semantic index of events and a table of content for a given
sports video. The video sequence is first segmented and annotated by event detection with domain
knowledge. A grammar-based parser is then used to identify the structure of the video content.
Meanwhile, facilities for error handling are introduced which are particularly useful when the results of
automatic parsing need to be adjusted. As a case study, we have developed a system for video parsing in
the particular domain of TV diving programs. Experimental results indicate the proposed approach is
effectiv
Unsupervised decoding of long-term, naturalistic human neural recordings with automated video and audio annotations
Fully automated decoding of human activities and intentions from direct
neural recordings is a tantalizing challenge in brain-computer interfacing.
Most ongoing efforts have focused on training decoders on specific, stereotyped
tasks in laboratory settings. Implementing brain-computer interfaces (BCIs) in
natural settings requires adaptive strategies and scalable algorithms that
require minimal supervision. Here we propose an unsupervised approach to
decoding neural states from human brain recordings acquired in a naturalistic
context. We demonstrate our approach on continuous long-term
electrocorticographic (ECoG) data recorded over many days from the brain
surface of subjects in a hospital room, with simultaneous audio and video
recordings. We first discovered clusters in high-dimensional ECoG recordings
and then annotated coherent clusters using speech and movement labels extracted
automatically from audio and video recordings. To our knowledge, this
represents the first time techniques from computer vision and speech processing
have been used for natural ECoG decoding. Our results show that our
unsupervised approach can discover distinct behaviors from ECoG data, including
moving, speaking and resting. We verify the accuracy of our approach by
comparing to manual annotations. Projecting the discovered cluster centers back
onto the brain, this technique opens the door to automated functional brain
mapping in natural settings
A Data-Driven Approach for Tag Refinement and Localization in Web Videos
Tagging of visual content is becoming more and more widespread as web-based
services and social networks have popularized tagging functionalities among
their users. These user-generated tags are used to ease browsing and
exploration of media collections, e.g. using tag clouds, or to retrieve
multimedia content. However, not all media are equally tagged by users. Using
the current systems is easy to tag a single photo, and even tagging a part of a
photo, like a face, has become common in sites like Flickr and Facebook. On the
other hand, tagging a video sequence is more complicated and time consuming, so
that users just tag the overall content of a video. In this paper we present a
method for automatic video annotation that increases the number of tags
originally provided by users, and localizes them temporally, associating tags
to keyframes. Our approach exploits collective knowledge embedded in
user-generated tags and web sources, and visual similarity of keyframes and
images uploaded to social sites like YouTube and Flickr, as well as web sources
like Google and Bing. Given a keyframe, our method is able to select on the fly
from these visual sources the training exemplars that should be the most
relevant for this test sample, and proceeds to transfer labels across similar
images. Compared to existing video tagging approaches that require training
classifiers for each tag, our system has few parameters, is easy to implement
and can deal with an open vocabulary scenario. We demonstrate the approach on
tag refinement and localization on DUT-WEBV, a large dataset of web videos, and
show state-of-the-art results.Comment: Preprint submitted to Computer Vision and Image Understanding (CVIU
- …