Editorial: Web of Science and Scopus impact in IJMIR
Computer Systems, Imagery and Media
Multimedia Information Retrieval nelle biblioteche
The paper aims to introduce libraries to the view that operating within the terms of traditional Information Retrieval (IR), only through textual language, is limitative, and that considering broader criteria, as those of Multimedia Information Retrieval (MIR), is necessary. The paper stresses the story of MIR fundamental principles, from early years of questioning on documentation to today’s theories on semantic means. New issues for a LIS methodology of processing and searching multimedia documents are theoretically argued, introducing MIR as a holistic whole composed by content-based and semantic information retrieval methodologies. MIR offers a better information searching way: every kind of digital document can be analyzed and retrieved through the elements of language appropriate to its own nature. MIR approach directly handles the concrete content of documents, also considering semantic aspects. Paper conclusions remark the organic integration of the revolutionary contentual conception of information processing with an improved semantics conception, gathering and composing advantages of both systems for accessing to information.L'articolo vuole introdurre le biblioteche alla prospettiva che operare entro i termini dell'Information Retrieval (IR) tradizionale mediante il solo uso del linguaggio testuale è limitativo, e che prendere in considerazione i criteri più ampi del Multimedia Information Retrieval (MIR) è invece necessario. L'articolo illustra la storia dei principi fondamentali del MIR, a partire dai primi anni di dibattito sulla documentazione fino alle teorie odierne sui significati semantici. Vengono dibattute nuovi argomentazioni teoriche per una metodologia LIS di trattamento e ricerca di documenti multimediali, proponendo il MIR come un tutto olistico composto da metolodogie di information retrieval semantico e basato sul contenuto. 
Il MIR offre modalità di ricerca migliori: ogni tipologia di documento digitale può essere analizzata e recuperata attraverso elementi del linguaggio appropriato alla sua specifica natura. L'approccio del MIR si basa sulla gestione diretta del contenuto dei documenti, considerando anche gli aspetti semantici. Le conclusioni dell'articolo rimarcano l'integrazione organica della rivoluzione della concezione di tipo contenutistico del trattamento dell'informazione con una concezione semantica migliorata, raccogliendo e componendo i vantaggi di entrambi i sistemi per l'accesso all'informazione
TagBook: A Semantic Video Representation without Supervision for Event Detection
We consider the problem of event detection in video for scenarios where only
few, or even zero examples are available for training. For this challenging
setting, the prevailing solutions in the literature rely on a semantic video
representation obtained from thousands of pre-trained concept detectors.
Different from existing work, we propose a new semantic video representation
that is based on freely available social tagged videos only, without the need
for training any intermediate concept detectors. We introduce a simple
algorithm that propagates tags from a video's nearest neighbors, similar in
spirit to the ones used for image retrieval, but redesign it for video event
detection by including video source set refinement and varying the video tag
assignment. We call our approach TagBook and study its construction,
descriptiveness and detection performance on the TRECVID 2013 and 2014
multimedia event detection datasets and the Columbia Consumer Video dataset.
Despite its simple nature, the proposed TagBook video representation is
remarkably effective for few-example and zero-example event detection, even
outperforming very recent state-of-the-art alternatives building on supervised
representations.

Comment: accepted for publication as a regular paper in the IEEE Transactions on Multimedia
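The tag-propagation step described above can be sketched as follows. This is a minimal illustration, assuming cosine similarity over global feature vectors and a fixed tag vocabulary; it omits the paper's video source set refinement and tag-assignment variants.

```python
import numpy as np

def tagbook(query_feat, neighbor_feats, neighbor_tags, vocab, k=3):
    """Build a TagBook-style representation for a query video by
    propagating tags from its k nearest social videos (cosine
    similarity).  Simplified sketch of the idea, not the paper's
    exact algorithm."""
    # Cosine similarity between the query and every candidate video.
    q = query_feat / np.linalg.norm(query_feat)
    feats = neighbor_feats / np.linalg.norm(neighbor_feats, axis=1, keepdims=True)
    sims = feats @ q
    top = np.argsort(sims)[::-1][:k]
    # Accumulate similarity-weighted tag indicator vectors.
    rep = np.zeros(len(vocab))
    index = {t: i for i, t in enumerate(vocab)}
    for n in top:
        for tag in neighbor_tags[n]:
            rep[index[tag]] += sims[n]
    # L1-normalise so videos with many neighbors stay comparable.
    return rep / (rep.sum() or 1.0)
```

The resulting vector over the tag vocabulary can then be matched against an event query, e.g. by cosine similarity with the event's textual description, without training any concept detectors.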
Characterization and classification of semantic image-text relations
The beneficial, complementary nature of visual and textual information for conveying information is widely known, for example, in entertainment, news, advertisements, science, and education. While the complex interplay of image and text in forming semantic meaning has been studied thoroughly in linguistics and communication sciences for several decades, computer vision and multimedia research has largely only scratched the surface of the problem. An exception is previous work that introduced the two metrics Cross-Modal Mutual Information and Semantic Correlation in order to model complex image-text relations. In this paper, we motivate the necessity of an additional metric called Status in order to cover complex image-text relations more completely. This set of metrics enables us to derive a novel categorization of eight semantic image-text classes based on three dimensions. In addition, we demonstrate how to automatically gather and augment a dataset for these classes from the Web. Further, we present a deep learning system to automatically predict each of the three metrics, as well as a system to directly predict the eight image-text classes. Experimental results show the feasibility of the approach, whereby the predict-all approach outperforms the cascaded approach of the metric classifiers.
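One simple way to see how three dimensions can give rise to eight classes is to enumerate binarised combinations of the metrics. The value labels below are illustrative placeholders, not the paper's definitions of the metrics or its class names.

```python
from itertools import product

# Three binarised dimensions for an image-text pair; the labels are
# hypothetical stand-ins for the paper's graded metric values.
DIMENSIONS = {
    "cross_modal_mutual_information": ("low", "high"),
    "semantic_correlation": ("low", "high"),
    "status": ("equal", "unequal"),
}

def enumerate_classes():
    """Return the 2 x 2 x 2 = 8 candidate image-text classes."""
    return list(product(*DIMENSIONS.values()))
```

A cascaded system would predict each dimension separately and look the class up in this grid, whereas the predict-all system maps an image-text pair to one of the eight classes directly.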
Information fusion in content based image retrieval: A comprehensive overview
An ever-increasing share of communication between people involves pictures, owing to the ready availability of powerful cameras on smartphones and of cheap storage space. The rising popularity of social networking applications such as Facebook, Twitter, and Instagram, and of instant messaging applications such as WhatsApp and WeChat, is clear evidence of this phenomenon, driven by the opportunity to share in real time a pictorial representation of the context each individual is living in. The media rapidly exploited this phenomenon, using the same channels either to publish their reports or to gather additional information on an event through the community of users. While the real-time use of images is managed through metadata associated with the image (e.g., the timestamp, the geolocation, tags), their retrieval from an archive can be far from trivial, as an image bears a rich semantic content that goes beyond the description provided by its metadata. After more than 20 years of research on Content-Based Image Retrieval (CBIR), the enormous increase in the number and variety of images available in digital format continues to challenge the research community. Any approach aiming to face these challenges must rely on different image representations that are conveniently fused in order to adapt to the subjectivity of image semantics. This paper offers a journey through the main information fusion ingredients that a recipe for the design of a CBIR system should include to meet the demanding needs of users.
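As one concrete instance of such fusion, a minimal late-fusion sketch combines per-representation similarity scores by min-max normalisation followed by a weighted sum; the choice of representations and weights below is an assumption for illustration, and many other fusion rules appear in the CBIR literature.

```python
import numpy as np

def late_fusion(score_lists, weights=None):
    """Fuse similarity scores from several image representations
    (e.g. a colour histogram and a deep feature) computed against the
    same ranked set of database images.  Each score list is min-max
    normalised to [0, 1], then combined as a weighted sum."""
    fused = np.zeros(len(score_lists[0]))
    weights = weights or [1.0 / len(score_lists)] * len(score_lists)
    for scores, w in zip(score_lists, weights):
        s = np.asarray(scores, dtype=float)
        rng = s.max() - s.min()
        # Guard against a constant score list (zero range).
        s = (s - s.min()) / rng if rng else np.zeros_like(s)
        fused += w * s
    return fused
```

Normalisation matters here because raw scores from different representations live on incomparable scales; without it, the representation with the largest numeric range would dominate the ranking.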
Dynamicity and Durability in Scalable Visual Instance Search
Visual instance search involves retrieving from a collection of images the
ones that contain an instance of a visual query. Systems designed for visual
instance search face the major challenge of scalability: a collection of a few
million images used for instance search typically creates a few billion
features that must be indexed. Furthermore, as real image collections grow
rapidly, systems must also provide dynamicity, i.e., be able to handle on-line
insertions while concurrently serving retrieval operations. Durability, which
is the ability to recover correctly from software and hardware crashes, is the
natural complement of dynamicity. Durability, however, has rarely been
integrated within scalable and dynamic high-dimensional indexing solutions.
This article addresses the issue of dynamicity and durability for scalable
indexing of very large and rapidly growing collections of local features for
instance retrieval. By extending the NV-tree, a scalable disk-based
high-dimensional index, we show how to implement the ACID properties of
transactions which ensure both dynamicity and durability. We present a detailed
performance evaluation of the transactional NV-tree: (i) We show that the
insertion throughput is excellent despite the overhead for enforcing the ACID
properties; (ii) We also show that this transactional index is truly scalable
using a standard image benchmark embedded in collections of up to 28.5 billion
high-dimensional vectors; these are the largest single-server evaluations reported in the literature.
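The interplay of durability and dynamicity described above can be illustrated with a toy write-ahead log: every insertion is flushed to the log before being applied in memory, so a crash between the two steps can be repaired at restart by replaying the log. This is a generic WAL sketch under assumed file names, not the NV-tree's actual ACID implementation.

```python
import json
import os

class DurableIndex:
    """Toy write-ahead-logged feature index.  A plain dict stands in
    for the high-dimensional index structure; the point is only the
    log-then-apply protocol that gives durability."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.index = {}  # id -> feature vector (in-memory stand-in)
        self._recover()

    def _recover(self):
        # Replay the log so committed insertions survive a restart.
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    rec = json.loads(line)
                    self.index[rec["id"]] = rec["vec"]

    def insert(self, vec_id, vec):
        # 1. Append the record and force it to disk: once fsync
        #    returns, the insertion is durable.
        with open(self.log_path, "a") as f:
            f.write(json.dumps({"id": vec_id, "vec": vec}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        # 2. Apply to the in-memory structure; lookups keep working
        #    concurrently in a real dynamic index.
        self.index[vec_id] = vec
```

A production system would add checkpointing and log truncation so recovery time stays bounded as the collection grows; the sketch keeps only the core ordering guarantee.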