144 research outputs found

    Unsupervised Visual and Textual Information Fusion in Multimedia Retrieval - A Graph-based Point of View

    Full text link
    Multimedia collections are growing more than ever in size and diversity. Effective multimedia retrieval systems are thus critical for accessing these datasets from the end-user perspective and in a scalable way. We are interested in repositories of image/text multimedia objects and we study multimodal information fusion techniques in the context of content-based multimedia information retrieval. We focus on graph-based methods, which have been shown to provide state-of-the-art performance. We particularly examine two such methods: cross-media similarities and random walk based scores. From a theoretical viewpoint, we propose a unifying graph-based framework which encompasses the two aforementioned approaches. Our proposal allows us to highlight the core features one should consider when using a graph-based technique for the combination of visual and textual information. We compare cross-media and random walk based results using three different real-world datasets. From a practical standpoint, our extended empirical analysis allows us to provide insights and guidelines about the use of graph-based methods for multimodal information fusion in content-based multimedia information retrieval. Comment: An extended version of the paper "Visual and Textual Information Fusion in Multimedia Retrieval using Semantic Filtering and Graph based Methods", by J. Ah-Pine, G. Csurka and S. Clinchant, submitted to ACM Transactions on Information Systems.
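
    As a rough illustration of the random-walk side of such graph-based fusion, the Python sketch below scores collection items by a random walk with restart on a graph whose edges average visual and textual similarities. It is a minimal sketch under assumed inputs (precomputed similarity matrices and a single query node) and does not reproduce the authors' exact cross-media or semantic-filtering formulations.

    import numpy as np

    def random_walk_scores(S_visual, S_textual, query_idx, alpha=0.8, n_iter=50):
        """Score documents by a random walk with restart on a fused similarity graph.

        S_visual, S_textual: (n, n) pairwise similarity matrices for the two modalities.
        query_idx: index of the query object in the collection.
        alpha: probability of following an edge (1 - alpha = restart probability).
        """
        # Fuse the two modalities by simple averaging (one of many possible choices).
        S = 0.5 * (S_visual + S_textual)
        np.fill_diagonal(S, 0.0)

        # Row-normalize to obtain a stochastic transition matrix.
        row_sums = S.sum(axis=1, keepdims=True)
        P = np.divide(S, row_sums, out=np.zeros_like(S), where=row_sums > 0)

        # Restart distribution concentrated on the query node.
        r = np.zeros(S.shape[0])
        r[query_idx] = 1.0

        # Power iteration for the random walk with restart.
        scores = r.copy()
        for _ in range(n_iter):
            scores = alpha * (P.T @ scores) + (1 - alpha) * r
        return scores  # higher score = closer to the query in the fused graph

    Ranking the collection then amounts to sorting by the returned scores, e.g. np.argsort(-scores).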

    Image retrieval using the combination of text-based and content-based algorithms

    Get PDF
    Image retrieval is an important research field which has received great attention in recent decades. In this paper, we present an approach to image retrieval based on the combination of text-based and content-based features. Keywords are used as text-based features, while color and texture are used as content-based features. A query in this system consists of some keywords and an input image. First, images are retrieved based on the input keywords. Then, visual features are extracted to retrieve the ideal output images. Color features are extracted using color moments, and texture features using the color co-occurrence matrix. The COREL image database has been used for our experiments. The experimental results show that the performance of the combination of text- and content-based features is much higher than that of either applied separately.
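
    The following Python sketch illustrates the general two-stage idea described above: filter the collection by keywords, then rank the remaining images by distance between color-moment features. The data layout ('keywords', 'image', 'id') and the distance measure are illustrative assumptions rather than the paper's exact implementation, and the texture (color co-occurrence matrix) component is omitted for brevity.

    import numpy as np

    def color_moments(image):
        """First three color moments (mean, std, skewness) per channel.

        image: (H, W, 3) array, e.g. in HSV or RGB; returns a 9-dimensional feature.
        """
        feats = []
        for c in range(3):
            channel = image[:, :, c].astype(np.float64).ravel()
            mean = channel.mean()
            std = channel.std()
            skew = np.cbrt(((channel - mean) ** 3).mean())
            feats.extend([mean, std, skew])
        return np.array(feats)

    def retrieve(query_keywords, query_image, collection, top_k=10):
        """Two-stage retrieval: keyword filtering, then ranking by visual distance.

        collection: list of dicts with 'keywords' (set of str), 'image' (array), 'id'.
        """
        # Stage 1: keep only images whose annotations share a keyword with the query.
        candidates = [d for d in collection if d['keywords'] & set(query_keywords)]

        # Stage 2: rank candidates by Euclidean distance between color-moment features.
        q = color_moments(query_image)
        ranked = sorted(candidates,
                        key=lambda d: np.linalg.norm(color_moments(d['image']) - q))
        return [d['id'] for d in ranked[:top_k]]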

    Conceptual modeling of multimedia databases

    Get PDF
    The gap between the semantic content of multimedia data and its underlying physical representation is one of the main problems in modern multimedia research in general and, in particular, in the field of multimedia database modeling. We believe that one of the principal reasons for this problem is the attempt to conceptually represent multimedia data in a way that is similar to its low-level representation by applications dealing with encoding standards, feature-based multimedia analysis, etc. In our opinion, such conceptual representation of multimedia contributes to the semantic gap by separating the representation of multimedia information from the representation of the universe of discourse of the application to which the multimedia information pertains. In this research work we address the problem of conceptual modeling of multimedia data in a way that deals with the above-mentioned limitations. First, we introduce two different paradigms of conceptual understanding of the essence of multimedia data, namely: multimedia as data and multimedia as metadata. The multimedia-as-data paradigm, which views multimedia data as the subject of modeling in its own right, is inherent to so-called multimedia-centric applications, where multimedia information itself represents the main part of the universe of discourse. Examples of such applications are digital photo collections or digital movie archives. On the other hand, the multimedia-as-metadata paradigm, which is inherent to so-called multimedia-enhanced applications, views multimedia data as just another (optional) source of information about whatever universe of discourse the application pertains to. An example of a multimedia-enhanced application is a human-resource database augmented with employee photos. Here the universe of discourse is the totality of company employees, while their photos simply represent an additional (possibly optional) kind of information describing the universe of discourse. The multimedia conceptual modeling approach that we present in this work addresses multimedia-centric applications as well as, in particular, multimedia-enhanced applications. The model that we propose builds upon MADS (Modeling Application Data with Spatio-temporal features), a rich conceptual model defined in our laboratory, which is characterized in particular by structural completeness, spatio-temporal modeling capabilities, and multirepresentation support. The proposed multimedia model is provided in the form of a new modeling dimension of MADS, whose orthogonality principle allows the new multimedia modeling dimension to be integrated with the already existing modeling features of MADS. The following multimedia modeling constructs are provided: multimedia datatypes, simple and complex representational constraints (relationships), a multimedia partitioning mechanism, and multimedia multirepresentation features. Following the description of our conceptual multimedia modeling approach based on MADS, we present the peculiarities of logical multimedia modeling and of conceptual-to-logical inter-layer transformations. We provide a set of mapping guidelines intended to help the schema designer come up with rich logical multimedia document representations of the application domain that conform to the conceptual multimedia schema. The practical interest of our research is illustrated by a mock-up application, which has been developed to support the theoretical ideas described in this work. In particular, we show how the abstract conceptual set-based representations of multimedia data elements, as well as simple and complex multimedia representational relationships, can be implemented using Oracle DBMS.

    Proceedings of the Eighth Italian Conference on Computational Linguistics CLiC-it 2021

    Get PDF
    The eighth edition of the Italian Conference on Computational Linguistics (CLiC-it 2021) was held at Università degli Studi di Milano-Bicocca from 26th to 28th January 2022. After the 2020 edition, which was held in fully virtual mode due to the health emergency related to Covid-19, CLiC-it 2021 was the first opportunity for the Italian Computational Linguistics research community to meet in person after more than one year of full or partial lockdown.

    Digital History and Hermeneutics

    Get PDF
    To do history in the digital age, we need to investigate the “digital kitchen” as the place where the “raw” is transformed into the “cooked”. The novel field of digital hermeneutics provides a critical and reflexive frame for digital humanities research through the acquisition of digital literacy and skills. The Doctoral Training Unit "Digital History and Hermeneutics" applies this new digital practice by reflecting on digital tools and methods.

    Machine Learning Algorithm for the Scansion of Old Saxon Poetry

    Get PDF
    Several scholars have designed tools to perform the automatic scansion of poetry in many languages, but none of these tools deal with Old Saxon or Old English. This project aims to be a first attempt to create a tool for these languages. We implemented a Bidirectional Long Short-Term Memory (BiLSTM) model to perform the automatic scansion of Old Saxon and Old English poems. Since this model uses supervised learning, we manually annotated the Heliand manuscript and used the resulting corpus as a labeled dataset to train the model. In the evaluation, the algorithm reached 97% accuracy and a 99% weighted average for precision, recall and F1 score. In addition, we tested the model on some verses from the Old Saxon Genesis and some from The Battle of Brunanburh, and we observed that the model predicted almost all Old Saxon metrical patterns correctly but misclassified the majority of the Old English input verses.
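
    A BiLSTM scansion model of this kind can be pictured as a per-syllable sequence labeler. The PyTorch sketch below is a minimal, hypothetical illustration: the input encoding (integer syllable ids), the label set and the hyperparameters are assumptions made for illustration and are not taken from the paper.

    import torch
    import torch.nn as nn

    class ScansionBiLSTM(nn.Module):
        """Tag each syllable of a verse with a metrical label (e.g. lift vs. drop)."""

        def __init__(self, vocab_size, n_labels, emb_dim=64, hidden_dim=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
            self.bilstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                                  bidirectional=True)
            self.classify = nn.Linear(2 * hidden_dim, n_labels)

        def forward(self, syllable_ids):
            # syllable_ids: (batch, seq_len) integer-encoded syllables of a verse.
            x = self.embed(syllable_ids)
            h, _ = self.bilstm(x)            # (batch, seq_len, 2 * hidden_dim)
            return self.classify(h)          # per-syllable label logits

    # Training would minimize cross-entropy between predicted and annotated labels:
    # loss = nn.CrossEntropyLoss(ignore_index=-100)(logits.transpose(1, 2), gold_labels)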

    Visual Concept Detection in Images and Videos

    Get PDF
    The rapidly increasing proliferation of digital images and videos leads to a situation where content-based search in multimedia databases becomes more and more important. A prerequisite for effective image and video search is to analyze and index media content automatically. Current approaches in the field of image and video retrieval focus on semantic concepts serving as an intermediate description to bridge the “semantic gap” between the data representation and the human interpretation. Due to the large complexity and variability in the appearance of visual concepts, the detection of arbitrary concepts represents a very challenging task. In this thesis, the following aspects of visual concept detection systems are addressed: First, enhanced local descriptors for mid-level feature coding are presented. Based on the observation that scale-invariant feature transform (SIFT) descriptors with different spatial extents yield large performance differences, a novel concept detection system is proposed that combines feature representations for different spatial extents using multiple kernel learning (MKL). A multi-modal video concept detection system is presented that relies on Bag-of-Words representations for visual and in particular for audio features. Furthermore, a method for the SIFT-based integration of color information, called color moment SIFT, is introduced. Comparative experimental results demonstrate the superior performance of the proposed systems on the Mediamill and on the VOC Challenge. Second, an approach is presented that systematically utilizes results of object detectors. Novel object-based features are generated based on object detection results using different pooling strategies. For videos, detection results are assembled into object sequences, and a shot-based confidence score as well as further features, such as position, frame coverage or movement, are computed for each object class. These features are used as additional input for the support vector machine (SVM)-based concept classifiers. Thus, other related concepts can also profit from object-based features. Extensive experiments on the Mediamill, VOC and TRECVid Challenges show significant improvements in retrieval performance not only for the object classes, but in particular also for a large number of indirectly related concepts. Moreover, it has been demonstrated that a few object-based features are beneficial for a large number of concept classes. On the VOC Challenge, the additional use of object-based features led to a superior image classification performance of 63.8% mean average precision (AP). Furthermore, the generalization capabilities of concept models are investigated. It is shown that different source and target domains lead to a severe loss in concept detection performance. In these cross-domain settings, object-based features achieve a significant performance improvement. Since it is inefficient to run a large number of single-class object detectors, it is additionally demonstrated how a concurrent multi-class object detection system can be constructed to speed up the detection of many object classes in images. Third, a novel, purely web-supervised learning approach for modeling heterogeneous concept classes in images is proposed. Tags and annotations of multimedia data in the WWW are rich sources of information that can be employed for learning visual concepts. The presented approach is aimed at continuous long-term learning of appearance models and at improving these models periodically. For this purpose, several components have been developed: a crawling component, a multi-modal clustering component for spam detection and subclass identification, a novel learning component called “random savanna”, a validation component, an updating component, and a scalability manager. Only a single word describing the visual concept is required to initiate the learning process. Experimental results demonstrate the capabilities of the individual components. Finally, a generic concept detection system is applied to support interdisciplinary research efforts in the fields of psychology and media science. The psychological research question addressed in the field of behavioral sciences is whether and how playing violent content in computer games may induce aggression. Therefore, novel semantic concepts, most notably “violence”, are detected in computer game videos to gain insights into the interrelationship of violent game events and the brain activity of a player. Experimental results demonstrate the excellent performance of the proposed automatic concept detection approach for such interdisciplinary research.
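
    To make the Bag-of-Words plus SVM pipeline underlying much of this work concrete, the following Python sketch builds a visual vocabulary from local descriptors, encodes images as visual-word histograms, and trains a per-concept SVM with object-based features simply concatenated. It uses scikit-learn as an assumed stand-in for the actual tooling; the function names, the clustering choice and the kernel are illustrative assumptions, not the thesis's exact configuration.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import SVC

    def build_codebook(descriptor_sets, n_words=1000):
        """Cluster local descriptors (e.g. SIFT) into a visual vocabulary."""
        all_desc = np.vstack(descriptor_sets)
        return KMeans(n_clusters=n_words, n_init=4, random_state=0).fit(all_desc)

    def bow_histogram(descriptors, codebook):
        """Encode one image as a normalized histogram of visual-word assignments."""
        words = codebook.predict(descriptors)
        hist = np.bincount(words, minlength=codebook.n_clusters).astype(np.float64)
        return hist / max(hist.sum(), 1.0)

    def train_concept_classifier(bow_features, object_features, labels):
        """One binary SVM per concept; object-based features are concatenated."""
        X = np.hstack([bow_features, object_features])
        return SVC(kernel='rbf', C=1.0, probability=True).fit(X, labels)

    In this simplified view, combining modalities reduces to feature concatenation before a single SVM, whereas the thesis uses multiple kernel learning to weight the different feature representations.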

    Hypertext Semiotics in the Commercialized Internet

    Get PDF
    Hypertext theory uses the same terminology that has been studied in semiotic research for decades, e.g. sign, text, communication, code, metaphor, paradigm, syntax, etc. Building on the results that have proven successful in applying semiotic principles and methods to computer science, such as Computer Semiotics, Computational Semiotics and Semiotic Interface Engineering, this dissertation lays out a systematic approach for all researchers willing to look at hypertext from a semiotic perspective. By linking existing hypertext models with the results of semiotics at all sensory levels of textual, auditory, visual, tactile and olfactory perception, the author sketches prolegomena of a theory of hypertext semiotics rather than presenting an entirely new hypertext model. An introduction to the history of hypertext, from its prehistory to its current state of development and the present developments in the commercialized World Wide Web, forms the frame for this approach, which may be regarded as a foundation for bridging media semiotics and computer semiotics. While computer semioticians know that the computer is a semiotic machine, and experts in artificial intelligence research emphasize the role of semiotics in the development of the next hypertext generation, this work draws on a broader methodological basis. Accordingly, the topics range from hypertext applications, paradigms and structures, through navigation, web design and web augmentation, to an interdisciplinary spectrum of detailed analyses, e.g. of the pointing device of the web browser, the at-sign and the so-called emoticons. The term ''icon'' is rejected as an inappropriate name for the small images known from graphical user interfaces and used in hypertexts, and these images are replaced by a new generation of powerful Graphic Link Markers. These results are considered in the context of the commercialization of the Internet. Besides identifying the main problems of eCommerce from the perspective of hypertext semiotics, the author addresses information goods and the current obstacles for the New Economy, such as restrictive legislation on copyright and intellectual property. These anachronistic restrictions rest on the problematic assumption that the value of information is also determined by scarcity. A semiotic analysis of iMarketing techniques, such as banner advertising, keywords and link injection, as well as excursuses on the browser war and the Toywar, round off the dissertation.