    Parallelization Strategies for Graph-Code-Based Similarity Search

    The volume of multimedia assets in collections is growing exponentially, and information retrieval is becoming more complex. The indexing and retrieval of multimedia content is generally implemented with feature graphs, which contain semantic information on multimedia assets. Machine learning can produce detailed semantic information on multimedia assets, reflected in a high volume of nodes and edges in the feature graphs. While this level of detail increases the effectiveness of retrieval, it also, together with growing collections, increases processing time. Addressing this problem, Multimedia Feature Graphs (MMFGs) and Graph Codes (GCs) have proven to be fast and effective structures for information retrieval. However, the huge volume of data still requires more processing time. As Graph Code algorithms were designed to be parallelizable, different paths of parallelization can be employed to evaluate the scalability options of Graph Code processing. These include horizontal and vertical scaling with Graphics Processing Units (GPUs), multicore Central Processing Units (CPUs), and distributed computing. In this paper, we show how different parallelization strategies based on Graph Codes can be combined to provide a significant improvement in efficiency. Our modeling work shows excellent scalability, with a theoretical speedup of 16,711 on a top-of-the-line Nvidia H100 GPU with 16,896 cores. Our experiments with a mid-range GPU achieve a speedup of 225, lending credence to the theoretical figure. Thus, Graph Codes provide fast and effective multimedia indexing and retrieval, even in billion-scale use cases.
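
    Comparing a query Graph Code against every Graph Code in a collection is embarrassingly parallel, which is what the horizontal scaling path exploits. The following is a minimal sketch of that idea, assuming (hypothetically) that each Graph Code is a fixed-size integer matrix and that similarity is the fraction of matching cells; the paper's actual encoding and metrics are not reproduced here.

```python
# Minimal sketch of data-parallel Graph Code comparison. Assumptions (not
# from the paper): each Graph Code is a fixed-size integer matrix, and
# similarity is the fraction of matching cells against the query.
import numpy as np
from multiprocessing import Pool

GC_DIM = 64  # assumed Graph Code matrix dimension

def similarity(pair):
    """Compare one candidate Graph Code against the query."""
    query, candidate = pair
    return float(np.mean(query == candidate))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    query = rng.integers(0, 4, size=(GC_DIM, GC_DIM))
    collection = [rng.integers(0, 4, size=(GC_DIM, GC_DIM)) for _ in range(10_000)]

    # Every comparison is independent, so throughput scales with core count;
    # on a GPU the same loop maps to one thread (or block) per candidate.
    with Pool() as pool:
        scores = pool.map(similarity, [(query, c) for c in collection])

    best = max(range(len(scores)), key=scores.__getitem__)
    print(f"best match: index {best}, similarity {scores[best]:.3f}")
```

    The per-candidate independence is also what makes the theoretical speedup roughly proportional to the number of available cores.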

    Temporal relations in visual semantics of verbs

    Numerous temporal relations of verbal actions have been analysed in terms of various grammatical means of expressing verbal temporalisation, such as tense, aspect, duration and iteration. Here the temporal relations within verb semantics, particularly ordered pairs of verb entailment, are studied using Allen's interval-based temporal formalism. Their application to the compositional visual definitions in our intelligent storytelling system, CONFUCIUS, is presented, including the representation of procedural events, achievement events and lexical causatives. In applying these methods we consider both language and visual modalities, since CONFUCIUS is a multimodal system.
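
    For illustration, Allen's thirteen basic interval relations, the formalism named above, can be computed directly from interval endpoints. The sketch below is a generic implementation; the intervals and the verb entailment pair are invented examples, not data from CONFUCIUS.

```python
# Generic implementation of Allen's thirteen basic interval relations;
# the intervals and verb pair below are invented for illustration.
from dataclasses import dataclass

@dataclass
class Interval:
    start: float
    end: float

def allen_relation(x: Interval, y: Interval) -> str:
    """Classify the temporal relation between intervals x and y."""
    if x.end < y.start: return "before"
    if y.end < x.start: return "after"
    if x.end == y.start: return "meets"
    if y.end == x.start: return "met-by"
    if (x.start, x.end) == (y.start, y.end): return "equals"
    if x.start == y.start: return "starts" if x.end < y.end else "started-by"
    if x.end == y.end: return "finishes" if x.start > y.start else "finished-by"
    if y.start < x.start and x.end < y.end: return "during"
    if x.start < y.start and y.end < x.end: return "contains"
    return "overlaps" if x.start < y.start else "overlapped-by"

# Entailment pair "win" -> "play": the winning event finishes the playing event.
play = Interval(0.0, 10.0)
win = Interval(9.0, 10.0)
print(allen_relation(win, play))  # -> finishes
```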

    Dynamic conceptualization in a mechanical-object assembly environment

    Wachsmuth I, Jung B. Dynamic conceptualization in a mechanical-object assembly environment. In: Mc Kevitt P, ed. Integration of Natural Language Processing and Vision Processing, Vol. IV. Dordrecht, The Netherlands: Kluwer Academic Publishers; 1996: 191-214.

    Mobile Multimodal Dynamic Output Morphing Tourist Systems

    TeleMorph dynamically generates multimedia presentations using output modalities that are determined by the bandwidth available on a mobile device's wireless connection. To demonstrate the effectiveness of this approach, TeleTuras, a tourist information guide for the city of Derry, will implement the solution provided by TeleMorph. This paper concentrates on the motivation for, and issues surrounding, intelligent tourist systems.
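
    As a rough illustration of the underlying idea, output modalities can be selected by thresholding the measured bandwidth. The thresholds and modality sets below are assumptions for the sketch, not values from TeleMorph.

```python
# Bandwidth-driven modality selection in the spirit of TeleMorph; the
# thresholds and modality sets are assumptions, not values from the paper.
def select_modalities(bandwidth_kbps: float) -> list:
    """Pick the richest presentation the current wireless link can sustain."""
    if bandwidth_kbps >= 2000:
        return ["video", "audio", "graphics", "text"]
    if bandwidth_kbps >= 500:
        return ["audio", "graphics", "text"]
    if bandwidth_kbps >= 100:
        return ["graphics", "text"]
    return ["text"]  # degrade gracefully to text-only output

for bw in (3000, 800, 150, 40):
    print(f"{bw:>5} kbps -> {select_modalities(bw)}")
```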

    Pragmatic Linguistic Constraint Models for Large-Vocabulary Speech Processing

    Current systems for speech recognition suffer from uncertainty: rather than delivering a uniquely identified word, each input segment is associated with a set of recognition candidates or word-hypotheses. Thus an input sequence of sounds or images leads not to an unambiguous sequence of words but to a lattice of word-hypotheses. To choose the best candidate from each word-hypothesis set (i.e. to find the best route through the lattice), linguistic context needs to be taken into account at several levels: lexis and morphology, parts-of-speech, phrase structure, semantics and pragmatics. We believe that an intuitively simple, naive model will suffice at each level; the sophistication required for full Natural Language Understanding (NLU) (e.g. the Alvey Natural Language Toolkit (ANLT)) is inappropriate for real-time language recognition. We describe here models of each linguistic level which are simple but robust and computationally straightforward (hence 'pragmatic' in the everyday sense).
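
    The best route through such a lattice can be found with standard dynamic programming. The sketch below uses a toy bigram table as a stand-in for the paper's layered constraint models; the lattice, confidences, and vocabulary are invented for illustration.

```python
# Toy dynamic-programming search through a word-hypothesis lattice; the
# lattice, confidences, and bigram table are invented for illustration.
import math

# Each segment maps its word-hypotheses to acoustic confidences.
lattice = [
    {"recognise": 0.6, "wreck a nice": 0.4},
    {"speech": 0.7, "beach": 0.3},
]

# Naive bigram plausibilities standing in for the layered constraint models.
bigram = {("recognise", "speech"): 0.9, ("wreck a nice", "beach"): 0.8}

def best_path(lattice, bigram, floor=1e-6):
    """Viterbi-style search: keep the best-scoring path per candidate word."""
    paths = {w: (math.log(p), [w]) for w, p in lattice[0].items()}
    for segment in lattice[1:]:
        nxt = {}
        for word, acoustic in segment.items():
            nxt[word] = max(
                (score + math.log(bigram.get((prev, word), floor))
                 + math.log(acoustic), seq + [word])
                for prev, (score, seq) in paths.items()
            )
        paths = nxt
    return max(paths.values())

score, words = best_path(lattice, bigram)
print(words, f"(log score {score:.2f})")  # -> ['recognise', 'speech']
```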

    Explainable Multimedia Feature Fusion for Medical Applications

    Due to the exponential growth of medical information in the form of, e.g., text, images, Electrocardiograms (ECGs), X-rays, and multimedia, the management of a patient's data has become a huge challenge. In particular, the extraction of features from various formats and their representation in a homogeneous way are areas of interest in medical applications. Multimedia Information Retrieval (MMIR) frameworks, like the Generic Multimedia Analysis Framework (GMAF), can contribute to solving this problem when adapted to the special requirements and modalities of medical applications. In this paper, we demonstrate how typical multimedia processing techniques can be extended and adapted to medical applications and how these applications benefit from employing a Multimedia Feature Graph (MMFG) and specialized, efficient indexing structures in the form of Graph Codes. These Graph Codes are transformed into feature-relevant Graph Codes by employing a modified Term Frequency Inverse Document Frequency (TFIDF) algorithm, which further supports the value ranges and Boolean operations required in the medical context. On this basis, various metrics for the calculation of similarity, recommendations, and automated inferencing and reasoning can be applied in support of diagnostics. Finally, the presentation of these new facilities in the form of explainability is introduced and demonstrated. Thus, in this paper, we show how Graph Codes contribute new querying options for diagnosis and how Explainable Graph Codes can help in readily understanding medical multimedia formats.
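
    As a simplified illustration of the weighting step, the sketch below applies plain TF-IDF to the feature terms of a Graph Code. The paper's modified algorithm additionally handles value ranges and Boolean operations, which are omitted here, and the medical feature terms are invented.

```python
# Plain TF-IDF over Graph Code feature terms; the medical terms are invented,
# and the paper's value-range and Boolean extensions are omitted here.
import math

# Each "document" is the multiset of feature terms in one patient's Graph Code.
graph_codes = [
    ["ecg", "ecg", "arrhythmia", "xray"],
    ["xray", "fracture", "fracture"],
    ["ecg", "xray", "report"],
]

def tfidf(term, doc, corpus):
    """Weight a feature term by frequency in-document and rarity in-corpus."""
    tf = doc.count(term) / len(doc)
    df = sum(term in d for d in corpus)
    return tf * math.log(len(corpus) / df)

# Rare, repeated findings ("fracture") outweigh ubiquitous ones ("xray").
doc = graph_codes[1]
for term in sorted(set(doc)):
    print(f"{term:10s} {tfidf(term, doc, graph_codes):.3f}")
```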

    Communicative rhythm in gesture and speech

    Wachsmuth I. Communicative rhythm in gesture and speech. In: Mc Kevitt P, Ó Nualláin S, Mulvihill C, eds. Language, Vision and Music. Amsterdam: Benjamins; 2002: 117-132.

    Led by the fundamental role that rhythms apparently play in speech and gestural communication among humans, this study was undertaken to substantiate a biologically motivated model for synchronizing speech and gesture input in human-computer interaction. Our approach presents a novel method which conceptualizes a multimodal user interface on the basis of timed agent systems. We use multiple agents to poll presemantic information from different sensory channels (speech and hand gestures) and integrate it into multimodal data structures that can be processed by an application system which is itself based on agent systems. This article was previously published under the same title in: Annelies Braffort et al. (Eds.) (1999), Gesture-Based Communication in Human-Computer Interaction, Lecture Notes in Artificial Intelligence, Vol. 1739, Berlin: Springer-Verlag, and is reprinted here by kind permission of Springer-Verlag.
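
    A toy sketch of the timed-integration idea follows: presemantic events polled from separate speech and gesture channels are fused when their timestamps fall within one rhythmic window. The window length, event format, and example events are assumptions, not details from the article.

```python
# Toy sketch of timed integration: presemantic events polled from separate
# speech and gesture channels are fused when their timestamps fall within
# one rhythmic window. Window length and event format are assumptions.
from dataclasses import dataclass

WINDOW = 0.5  # assumed rhythm interval in seconds

@dataclass
class Event:
    channel: str
    payload: str
    t: float  # timestamp in seconds

speech = [Event("speech", "put-that", 1.02), Event("speech", "there", 2.41)]
gesture = [Event("gesture", "point(obj3)", 0.95), Event("gesture", "point(loc7)", 2.50)]

def fuse(speech_events, gesture_events, window=WINDOW):
    """Pair each speech event with gestures arriving within the same window."""
    return [(s.payload, g.payload)
            for s in speech_events
            for g in gesture_events
            if abs(s.t - g.t) <= window]

print(fuse(speech, gesture))
# -> [('put-that', 'point(obj3)'), ('there', 'point(loc7)')]
```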