Digital Image Access & Retrieval
The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March of 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval.
published or submitted for publicatio
FrameNet annotation for multimodal corpora: devising a methodology for the semantic representation of text-image interactions in audiovisual productions
Multimodal analyses have been growing in importance within several approaches to
Cognitive Linguistics and applied fields such as Natural Language Understanding. Nonetheless,
fine-grained semantic representations of multimodal objects are still lacking, especially in terms
of integrating areas such as Natural Language Processing and Computer Vision, which are key
for the implementation of multimodality in Computational Linguistics. In this dissertation, we
propose a methodology for extending FrameNet annotation to the multimodal domain, since
FrameNet can provide fine-grained semantic representations, particularly with a database
enriched by Qualia and other interframal and intraframal relations, as is the case of FrameNet
Brasil. To make FrameNet Brasil able to conduct multimodal analysis, we outlined the
hypothesis that, similarly to the way in which words in a sentence evoke frames and organize
their elements in the syntactic locality accompanying them, visual elements in video shots may
also evoke frames and organize their elements on the screen or work complementarily with the
frame evocation patterns of the sentences narrated simultaneously to their appearance on screen,
providing different profiling and perspective options for meaning construction. The corpus
annotated for testing the hypothesis is composed of episodes of a Brazilian TV Travel Series
critically acclaimed as an exemplar of good practices in audiovisual composition. The TV genre
chosen also configures a novel experimental setting for research on integrated image and text
comprehension, since, in this corpus, text is not a direct description of the image sequence but
correlates with it indirectly in a myriad of ways. The dissertation also reports on an eye-tracking
experiment conducted to validate the approach proposed for text-oriented annotation. The
experiment showed that it is not possible to determine that text impacts gaze directly, a result
taken as support for an approach that values the combination of modes. Last, we present
the Frame2 dataset, the product of the annotation task carried out for the corpus following both
the methodology and guidelines proposed. The results achieved demonstrate that, at least for
this TV genre but possibly also for others, a fine-grained semantic annotation tackling the
diverse correlations that take place in a multimodal setting provides a new perspective on
multimodal comprehension modeling. Moreover, multimodal annotation also enriches the
development of FrameNets, to the extent that correlations found between modalities can attest to
the modeling choices made by those building frame-based resources.
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superio
Query engine of novelty in video streams
Prior research on novelty detection has primarily focused on algorithms to detect novelty for a given application domain. Effective storage, indexing and retrieval of novel events (beyond detection) have largely been ignored as a problem in their own right. In light of recent advances in counter-terrorism efforts and link discovery initiatives, the need for effective data management of novel events has become apparent. Automatically detecting novel events in video data streams is an extremely challenging task. The aim of this thesis is to provide evidence that the notion of novelty in video as perceived by a human is extremely subjective and therefore algorithmically ill-defined. Though it comes as no surprise that current machine-based parametric learning systems are far from perfect at accurately mimicking human novelty perception, such systems have recently been very successful in exhaustively capturing novelty in video once the novelty function is well defined by a human expert. So, how effective are these machine-based novelty detection systems compared to human novelty detection? In this paper we outline an experimental evaluation of human versus machine-based novelty systems in terms of qualitative performance. We then quantify this evaluation using a variety of metrics based on the location of novel events, the number of novel events found in the video, etc. We begin by describing a machine-based system for detecting novel events in video data streams. We then discuss the issues of designing an indexing strategy, or Manga (a comic-book representation is termed manga in Japanese), to effectively determine the most representative novel frames for a video sequence. We then evaluate the performance of the machine-based novelty detection system against human novelty detection and present the results.
The distance metrics we suggest for novelty comparison may eventually aid a variety of end-users in effectively driving the indexing, retrieval and analysis of large video databases. It should also be noted that the techniques we describe in this paper are based on low-level features extracted from video, such as color, intensity and focus of attention. The video processing component of this framework does not include any semantic processing such as object detection in video. We conjecture that such advances, though beyond the scope of this particular paper, would undoubtedly benefit machine-based novelty detection systems and experimentally validate this. We believe that developing a novelty detection system that works in conjunction with the human expert will lead to a more user-centered data mining approach for such domains. JPEG 2000 is a new image format that compresses images better than other formats such as JPEG, GIF, and PNG. The main reason this format merits investigation is that it allows metadata to be embedded within the image itself. The embedded data can essentially be anything, such as text, audio, video, or images. Currently, image annotations are stored and collected side by side with the images rather than inside them. Even though this method is very common, it carries many risks and flaws. Imagine that medical images were annotated by doctors to describe a tumor within the brain, and then some of the annotations were suddenly lost. Without these annotations, the images themselves would be useless. Embedding the annotations within the image guarantees that the description and the image will never be separated. The metadata embedded within the image has no influence on the image itself. In this thesis we initially develop a metric to index novelty by comparing it to traditional indexing techniques and to human perception.
In the second phase of this thesis, we will investigate the emerging technology of JPEG 2000 and show that novelty stored in this format will outperform traditional image structures. One of the contributions of this thesis is the development of metrics to measure the performance and quality of query results for JPEG 2000 versus traditional image formats. Since JPEG 2000 is a new technology, there are no existing metrics to measure this type of performance with traditional images
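The abstract above mentions comparing human and machine novelty detection with metrics based on the number and location of novel events, but it does not define those metrics. The following Python sketch is only one plausible reading, under stated assumptions: events are represented as frame indices, and the function names `count_difference` and `mean_nearest_distance`, along with the sample data, are hypothetical and not taken from the thesis.

```python
from typing import Sequence


def count_difference(machine: Sequence[int], human: Sequence[int]) -> int:
    """Absolute difference in the number of novel events reported.

    Assumption: each event is identified by a single frame index.
    """
    return abs(len(machine) - len(human))


def mean_nearest_distance(machine: Sequence[int], human: Sequence[int]) -> float:
    """Average distance, in frames, from each human-marked event
    to the closest machine-detected event.

    This is an illustrative location-based measure, not the thesis's metric.
    """
    if not machine or not human:
        raise ValueError("both event lists must be non-empty")
    total = sum(min(abs(h - m) for m in machine) for h in human)
    return total / len(human)


# Hypothetical frame indices of novel events in one video.
machine_events = [10, 52, 300]
human_events = [12, 50, 120, 310]

print(count_difference(machine_events, human_events))
print(mean_nearest_distance(machine_events, human_events))
```

A lower value from either function would indicate closer agreement between the machine and the human annotator. Note that the location measure is asymmetric: it asks how well the machine covers the human-marked events, not the reverse.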
- …