Video retrieval using dialogue, keyframe similarity and video objects
There are several different approaches to video retrieval which vary in sophistication and in the level of their deployment. Some are well known; others are not yet within our reach for any kind of large volume of video. In particular, object-based video retrieval, where an object from within a video is used for retrieval, is often particularly desirable from a searcher's perspective. In this paper we introduce Físchlár-Simpsons, a system providing retrieval from an archive of video using any combination of text searching, keyframe image matching, shot-level browsing, and object-based retrieval. The system is driven by user feedback and interaction rather than the conventional search/browse/search metaphor, and its purpose is to explore how users can use detected objects in a shot as part of a retrieval task.
Ranking Archived Documents for Structured Queries on Semantic Layers
Archived collections of documents (like newspaper and web archives) serve as
important information sources in a variety of disciplines, including Digital
Humanities, Historical Science, and Journalism. However, the absence of
efficient and meaningful exploration methods remains a major hurdle to
turning them into usable sources of information. A semantic layer is
an RDF graph that describes metadata and semantic information about a
collection of archived documents, which in turn can be queried through a
semantic query language (SPARQL). This allows running advanced queries by
combining metadata of the documents (like publication date) and content-based
semantic information (like entities mentioned in the documents). However, the
results returned by such structured queries can be numerous and moreover they
all equally match the query. In this paper, we deal with this problem and
formalize the task of "ranking archived documents for structured queries on
semantic layers". Then, we propose two ranking models for the problem at hand
which jointly consider: i) the relatedness of documents to entities, ii) the
timeliness of documents, and iii) the temporal relations among the entities.
The experimental results on a new evaluation dataset show the effectiveness of
the proposed models and allow us to understand their limitations.
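The three ranking signals above can be illustrated with a minimal sketch. The combination rule, field names, and scoring functions below are illustrative assumptions for a linear model over entity relatedness and timeliness, not the paper's actual ranking models:

```python
from datetime import date

# Hypothetical sketch: score an archived document for a structured query by
# combining (i) its relatedness to the query entities and (ii) its timeliness
# with respect to a temporal focus. Weights and functions are assumptions.

def entity_relatedness(doc_entities, query_entities):
    """Jaccard overlap between the document's and the query's entities."""
    doc, query = set(doc_entities), set(query_entities)
    return len(doc & query) / len(doc | query) if doc | query else 0.0

def timeliness(doc_date, focus_date, half_life_days=365):
    """Decay the score exponentially with distance from a focus date."""
    distance = abs((doc_date - focus_date).days)
    return 0.5 ** (distance / half_life_days)

def score(doc, query_entities, focus_date, w_rel=0.6, w_time=0.4):
    return (w_rel * entity_relatedness(doc["entities"], query_entities)
            + w_time * timeliness(doc["date"], focus_date))

docs = [
    {"id": "d1", "entities": {"Obama", "Merkel"}, "date": date(2015, 6, 1)},
    {"id": "d2", "entities": {"Obama"}, "date": date(2009, 1, 20)},
]
ranked = sorted(docs, key=lambda d: score(d, {"Obama"}, date(2015, 1, 1)),
                reverse=True)
```

Note how the two signals trade off: an older document that matches the query entities exactly can still outrank a recent but less related one.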
Bridging Vision and Language over Time with Neural Cross-modal Embeddings
Giving computers the ability to understand multimedia content is one of the goals
of Artificial Intelligence systems. While humans excel at this task, it remains a challenge,
requiring bridging vision and language, which inherently have heterogeneous
computational representations. Cross-modal embeddings are used to tackle this challenge,
by learning a common space that unifies these representations. However, to grasp
the semantics of an image, one must look beyond the pixels and consider its semantic
and temporal context, defined respectively by the image's textual descriptions
and its time dimension. As such, external causes (e.g. emerging events) change the
way humans interpret and describe the same visual element over time, leading to the
evolution of visual-textual correlations.
In this thesis we investigate models that capture patterns of visual and textual interactions
over time, by incorporating time in cross-modal embeddings: 1) in a relative manner,
where by using pairwise temporal correlations to aid data structuring, we obtained a
model that provides better visual-textual correspondences on dynamic corpora, and 2) in
a diachronic manner, where the temporal dimension is fully preserved, thus capturing
visual-textual correlations evolution under a principled approach that jointly models
vision+language+time. Rich insights stemming from data evolution were extracted from
a 20-year large-scale dataset. Additionally, towards improving the effectiveness of these
embedding learning models, we proposed a novel loss function that increases the expressiveness
of the standard triplet loss by making it adaptive to the data at hand. With our
adaptive triplet loss, in which triplet-specific constraints are inferred and scheduled, we
achieved state-of-the-art performance on the standard cross-modal retrieval task.
Improving the quality of the personalized electronic program guide
As Digital TV subscribers are offered more and more channels, it is becoming increasingly difficult for them to locate the right programme information at the right time. The personalized Electronic Programme Guide (pEPG) is one solution to this problem; it leverages artificial intelligence and user profiling techniques to learn about the viewing preferences of individual users in order to compile personalized viewing guides that fit their individual preferences. Very often the limited availability of profiling information is a key limiting factor in such personalized recommender systems. For example, it is well known that collaborative filtering approaches suffer significantly from the sparsity problem, which exists because the expected item-overlap between profiles is usually very low. In this article we address the sparsity problem in the Digital TV domain. We propose the use of data mining techniques as a way of supplementing meagre ratings-based profile knowledge with additional item-similarity knowledge that can be automatically discovered by mining user profiles. We argue that this new similarity knowledge can significantly enhance the performance of a recommender system in even the sparsest of profile spaces. Moreover, we provide an extensive evaluation of our approach using two large-scale, state-of-the-art online systems: PTVPlus, a personalized TV listings portal, and Físchlár, an online digital video library system.
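The idea of mining item-similarity knowledge from user profiles can be sketched with a simple co-occurrence measure. The confidence-style measure, threshold, and programme names below are illustrative assumptions; the article itself uses data-mining (association-rule style) techniques whose exact form is not reproduced here:

```python
from itertools import combinations
from collections import Counter

# Hypothetical sketch: programmes that frequently co-occur in the same users'
# liked-programme profiles are treated as similar, supplementing sparse
# ratings-based knowledge.

def mine_similarities(profiles, min_support=2):
    """Return {(a, b): confidence} for programme pairs co-liked by users.

    profiles: iterable of sets, each the programmes one user liked.
    """
    pair_counts = Counter()
    item_counts = Counter()
    for liked in profiles:
        item_counts.update(liked)
        pair_counts.update(combinations(sorted(liked), 2))
    sims = {}
    for (a, b), n_ab in pair_counts.items():
        if n_ab >= min_support:
            # confidence that a user liking one also likes the other
            sims[(a, b)] = n_ab / min(item_counts[a], item_counts[b])
    return sims

profiles = [
    {"Friends", "Frasier", "News"},
    {"Friends", "Frasier"},
    {"News", "Weather"},
]
sims = mine_similarities(profiles)
# Only the (Frasier, Friends) pair reaches min_support here.
```

Such mined pair similarities can then stand in for missing rating overlap when comparing otherwise sparse profiles.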
Adaptive Training of Video Sets for Image Recognition on Mobile Phones
We present an enhancement towards adaptive video training for PhoneGuide, a digital museum guidance system for ordinary camera-equipped mobile phones. It enables museum visitors to identify exhibits by capturing photos of them. In this article, a combined solution of object recognition and pervasive tracking is extended to a client-server system for improving data acquisition and for supporting scale-invariant object recognition.
Diversification Based Static Index Pruning - Application to Temporal Collections
Nowadays, web archives preserve the history of large portions of the web. As
media are shifting from printed to digital editions, accessing these huge
information sources is drawing increasing attention from national and
international institutions, as well as from the research community. These
collections are intrinsically big, leading to index files that do not fit into
memory and to increased query response times. Decreasing the index size is a
direct way to decrease this query response time.
Static index pruning methods reduce the size of indexes by removing a part of
the postings. In the context of web archives, it is necessary to remove
postings while preserving the temporal diversity of the archive. None of the
existing pruning approaches take (temporal) diversification into account.
In this paper, we propose a diversification-based static index pruning
method. It differs from existing pruning approaches by integrating
diversification within the pruning process. We aim to prune the index while
preserving retrieval effectiveness and diversity, maximizing a
given IR evaluation metric such as DCG during pruning. We show how to apply this approach in the
context of web archives. Finally, we show on two collections that search
effectiveness in temporal collections after pruning can be improved using our
approach rather than diversity-oblivious approaches.
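The contrast with diversity-oblivious pruning can be sketched concretely. Below, instead of keeping a term's globally top-scoring postings, the pruned index keeps the best posting per time period, so the archive's temporal span stays covered. The per-period quota and the scoring are illustrative assumptions, not the paper's method:

```python
# Hypothetical sketch of temporally diversified static index pruning for one
# term's posting list. Each posting is (doc_id, year, score).

def prune_postings(postings, keep_per_period=1):
    """Keep the top-scoring posting(s) of each year, dropping the rest."""
    by_year = {}
    for doc_id, year, score in postings:
        by_year.setdefault(year, []).append((score, doc_id, year))
    kept = []
    for year, plist in sorted(by_year.items()):
        plist.sort(reverse=True)  # highest score first within the year
        kept.extend((d, y, s) for s, d, y in plist[:keep_per_period])
    return kept

postings = [("d1", 2001, 0.9), ("d2", 2001, 0.8), ("d3", 2009, 0.3)]
pruned = prune_postings(postings)  # keeps d1 (2001) and d3 (2009)
```

A diversity-oblivious top-2 pruning of the same list would keep d1 and d2 and drop 2009 entirely, leaving temporal queries about 2009 unanswerable from the pruned index.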
Music Generation by Deep Learning - Challenges and Directions
In addition to traditional tasks such as prediction, classification and
translation, deep learning is receiving growing attention as an approach for
music generation, as witnessed by recent research groups such as Magenta at
Google and CTRL (Creator Technology Research Lab) at Spotify. The motivation is
in using the capacity of deep learning architectures and training techniques to
automatically learn musical styles from arbitrary musical corpora and then to
generate samples from the estimated distribution. However, a direct application
of deep learning to generate content rapidly reaches limits as the generated
content tends to mimic the training set without exhibiting true creativity.
Moreover, deep learning architectures do not offer direct ways for controlling
generation (e.g., imposing some tonality or other arbitrary constraints).
Furthermore, deep learning architectures alone are closed automata that
generate music autonomously without human user interaction, far from the
objective of interactively assisting musicians to compose and refine music.
Issues such as: control, structure, creativity and interactivity are the focus
of our analysis. In this paper, we select some limitations of a direct
application of deep learning to music generation, analyze why these issues
remain unresolved, and discuss possible approaches to address them. Various
recent systems are cited as examples of promising directions.

Comment: 17 pages. arXiv admin note: substantial text overlap with
arXiv:1709.01620. Accepted for publication in Special Issue on Deep learning
for music and audio, Neural Computing & Applications, Springer Nature, 201