
    Video browsing interfaces and applications: a review

    We present a comprehensive review of the state of the art in video browsing and retrieval systems, with special emphasis on interfaces and applications. There has been a significant increase in activity (e.g., storage, retrieval, and sharing) employing video data in the past decade, both for personal and professional use. The ever-growing amount of video content available for human consumption and the inherent characteristics of video data (which, if presented in its raw format, is rather unwieldy and costly) have become driving forces for the development of more effective solutions to present video contents and allow rich user interaction. As a result, there are many contemporary research efforts toward developing better video browsing solutions, which we summarize. We review more than 40 different video browsing and retrieval interfaces and classify them into three groups: applications that use video-player-like interaction, video retrieval applications, and browsing solutions based on video surrogates. For each category, we present a summary of existing work, highlight the technical aspects of each solution, and compare them against each other.

    Detection and Generalization of Spatio-temporal Trajectories for Motion Imagery

    In today's world of vast information availability, users often confront large, unorganized amounts of data with limited tools for managing them. Motion imagery datasets have become an increasingly popular means of exposing and disseminating information. Commonly, moving objects are of primary interest in modeling such datasets. Users may require different levels of detail, mainly for visualization and further processing purposes, according to the application at hand. In this thesis we exploit the geometric attributes of objects for dataset summarization by using a series of image processing and neural network tools. In order to form data summaries we select representative time instances through the segmentation of an object's spatio-temporal trajectory lines. High movement variation instances are selected through a new hybrid self-organizing map (SOM) technique to describe a single spatio-temporal trajectory. Multiple objects move in diverse yet classifiable patterns. In order to group corresponding trajectories we utilize an abstraction mechanism that investigates a vague moving relevance between the data in space and time. Thus, we introduce the spatio-temporal neighborhood unit as a variable generalization surface. By altering the unit's dimensions, scaled generalization is accomplished. Common complications in tracking applications, including occlusion, noise, information gaps and unconnected segments of data sequences, are addressed through the hybrid-SOM analysis. Nevertheless, entangled data sequences, in which there is no information on which data entry belongs to which trajectory, are frequently encountered. A multidimensional classification technique that combines geometric and backpropagation neural network implementation is used to distinguish between trajectory data. Furthermore, modeling and summarization of two-dimensional phenomena evolving in time brings forward the novel concept of spatio-temporal helixes as compact event representations. The phenomena models are composed of SOM movement nodes (spines) and cardinality shape-change descriptors (prongs). While we focus on the analysis of motion imagery (MI) datasets, the framework can be generalized to function with other types of spatio-temporal datasets. Multiple scale generalization is allowed in a dynamic significance-based scale rather than a constant one. The constructed summaries are not just a visualization product; they also support further processing for metadata creation, indexing, and querying. Experimentation, comparisons and error estimations for each technique support the analyses discussed.

    Deep Features and Clustering Based Keyframes Selection with Security

    The digital world is developing more quickly than ever. Multimedia processing and distribution, however, have become vulnerable because of the enormous quantity and significance of the vital information involved. Therefore, extensive technologies and algorithms are required for the safe transmission of messages, images, and video files. This paper proposes a secure framework that integrates video summarization and image encryption. The proposed cryptosystem framework comprises three parts. The informative frames are first extracted using an efficient and lightweight technique that makes use of the processing capabilities of the color histogram-clustering (RGB-HSV) approach. Each frame of a video is represented by deep features, which are based on an enhanced pre-trained Inception-v3 network. After that, the summary is obtained using the K-means optimal clustering algorithm. The representative keyframes are then extracted using the highest possible entropy nodes of the clusters. Experimental validation on two well-known standard datasets demonstrates the proposed method's superiority to numerous state-of-the-art approaches. Finally, the proposed framework performs an efficient image encryption and decryption algorithm by employing a general linear group function GLn(F). The analysis and testing outcomes prove the superiority of the proposed adaptive RSA.
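    The clustering and keyframe-selection steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the per-frame feature vectors stand in for the Inception-v3 features, the per-frame histograms used for entropy are an assumption, and the names `kmeans`, `shannon_entropy`, and `select_keyframes` are hypothetical. Centroids are seeded with the first k points purely to keep the example deterministic.

```python
import math


def shannon_entropy(histogram):
    """Shannon entropy (in bits) of a histogram of non-negative counts."""
    total = sum(histogram)
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in histogram:
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50):
    """Plain Lloyd's k-means; returns one cluster label per point.

    Assumption: centroids start at the first k points (deterministic seeding).
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_keyframes(features, histograms, k):
    """Cluster frames by feature vector, then keep the highest-entropy frame per cluster."""
    labels = kmeans(features, k)
    keyframes = []
    for cluster in range(k):
        members = [i for i, label in enumerate(labels) if label == cluster]
        if members:
            keyframes.append(
                max(members, key=lambda i: shannon_entropy(histograms[i]))
            )
    return sorted(keyframes)
```

    Calling `select_keyframes(features, histograms, k)` returns the sorted indices of the chosen frames, one per non-empty cluster. A production system would more likely use a library k-means with randomized or k-means++ seeding.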

    Semantics-aware image understanding

    The abstract is in the attachment.

    Learning image‐text associations


    Group Analysis of Self-organizing Maps based on Functional MRI using Restricted Frechet Means

    Studies of functional MRI data are increasingly concerned with the estimation of differences in spatio-temporal networks across groups of subjects or experimental conditions. Unsupervised clustering and independent component analysis (ICA) have been used to identify such spatio-temporal networks. While these approaches have been useful for estimating these networks at the subject-level, comparisons over groups or experimental conditions require further methodological development. In this paper, we tackle this problem by showing how self-organizing maps (SOMs) can be compared within a Frechean inferential framework. Here, we summarize the mean SOM in each group as a Frechet mean with respect to a metric on the space of SOMs. We consider the use of different metrics, and introduce two extensions of the classical sum of minimum distance (SMD) between two SOMs, which take into account the spatio-temporal pattern of the fMRI data. The validity of these methods is illustrated on synthetic data. Through these simulations, we show that the three metrics of interest behave as expected, in the sense that the ones capturing temporal, spatial and spatio-temporal aspects of the SOMs are more likely to reach significance under simulated scenarios characterized by temporal, spatial and spatio-temporal differences, respectively. In addition, a re-analysis of a classical experiment on visually-triggered emotions demonstrates the usefulness of this methodology. In this study, the multivariate functional patterns typical of the subjects exposed to pleasant and unpleasant stimuli are found to be more similar than the ones of the subjects exposed to emotionally neutral stimuli. Taken together, these results indicate that our proposed methods can cast new light on existing data by adopting a global analytical perspective on functional MRI paradigms. Comment: 23 pages, 5 figures, 4 tables. Submitted to Neuroimage.
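    As a rough illustration of the quantities named in this abstract, the sketch below computes one common form of the sum of minimum distances between two sets of SOM prototype vectors, and a Frechet mean restricted to the sample (the group member that minimizes the sum of squared distances to all members). The 0.5 normalization, the Euclidean base distance, and the function names are assumptions; the paper's spatio-temporal extensions of the SMD are not reproduced here.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length prototype vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def sum_of_minimum_distances(som_a, som_b, dist=euclidean):
    """Symmetric SMD between two SOMs given as lists of prototype vectors.

    Each prototype is matched to its nearest prototype in the other map,
    and the two directed sums are averaged (an assumed normalization).
    """
    a_to_b = sum(min(dist(a, b) for b in som_b) for a in som_a)
    b_to_a = sum(min(dist(a, b) for a in som_a) for b in som_b)
    return 0.5 * (a_to_b + b_to_a)


def restricted_frechet_mean(soms, metric=sum_of_minimum_distances):
    """Index of the sample SOM minimizing the sum of squared distances to the group."""
    def cost(i):
        return sum(metric(soms[i], other) ** 2 for other in soms)

    return min(range(len(soms)), key=cost)
```

    Restricting the minimization to the observed maps avoids searching the full space of SOMs, at the price of returning a group member and not an averaged map.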

    Temporal contextual descriptors and applications to emotion analysis.

    Current trends in technology suggest that the next generation of services and devices will allow smarter customization and automatic context recognition. Computers learn the behavior of the users and can offer them customized services depending on the context, location, and preferences. One of the most important challenges in human-machine interaction is the proper understanding of human emotions by machines and automated systems. In recent years, the progress made in machine learning and pattern recognition has led to the development of algorithms that are able to learn the detection and identification of human emotions from experience. These algorithms use different modalities such as image, speech, and physiological signals to analyze and learn human emotions. In many settings, the vocal information might be more available than other modalities because of the widespread presence of voice sensors in phones, cars, and computer systems in general. In emotion analysis from speech, an audio utterance is represented by an ordered (in time) sequence of features or a multivariate time series. Typically, the sequence is further mapped into a global descriptor representative of the entire utterance/sequence. This descriptor is used for classification and analysis. In classic approaches, statistics are computed over the entire sequence and used as a global descriptor. This often results in the loss of temporal ordering from the original sequence. Emotion is a succession of acoustic events. By discarding the temporal ordering of these events in the mapping, the classic approaches cannot detect acoustic patterns that lead to a certain emotion. In this dissertation, we propose a novel feature mapping framework. The proposed framework maps a temporally ordered sequence of acoustic features into data-driven global descriptors that integrate the temporal information from the original sequence. The framework contains three mapping algorithms. These algorithms integrate the temporal information implicitly and explicitly in the descriptor's representation. In the first algorithm, the Temporal Averaging Algorithm, we average the data temporally using leaky integrators to produce a global descriptor that implicitly integrates the temporal information from the original sequence. In order to integrate the discrimination between classes in the mapping, we propose the Temporal Response Averaging Algorithm, which combines the temporal averaging step of the previous algorithm and unsupervised learning to produce data-driven temporal contextual descriptors. In the third algorithm, we use the topology-preserving property of Self-Organizing Maps and the continuous nature of speech to map a temporal sequence into an ordered trajectory representing the behavior over time of the input utterance on a 2-D map of emotions. The temporal information is integrated explicitly in the descriptor, which makes it easier to monitor emotions in long speeches. The proposed mapping framework maps speech data of different lengths to the same equivalent representation, which alleviates the problem of dealing with variable-length temporal sequences. This is advantageous in real-time settings where the size of the analysis window can be variable. Using the proposed feature mapping framework, we build a novel data-driven speech emotion detection and recognition system that indexes speech databases to facilitate the classification and retrieval of emotions. We test the proposed system using two datasets. The first corpus is acted. We show that the proposed mapping framework outperforms the classic approaches while providing descriptors that are suitable for the analysis and visualization of humans' emotions in speech data. The second corpus is an authentic dataset. In this dissertation, we evaluate the performance of our system using a collection of debates. For that purpose, we propose a novel debate collection that is one of the first initiatives in the literature. We show that the proposed system is able to learn human emotions from debates.
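    The idea of temporal averaging with a leaky integrator, mentioned for the first algorithm, can be sketched as follows. This is only an illustration of a generic leaky integrator, not the dissertation's algorithm: the update rule `state = (1 - alpha) * state + alpha * frame`, the zero initial state, the single leak rate `alpha`, and the function name are all assumptions.

```python
def leaky_integrate(sequence, alpha=0.5):
    """Map a variable-length sequence of feature frames to one fixed-length vector.

    Each frame is a list of floats. Recent frames weigh more than older ones,
    so the result depends on the order of the frames, unlike a plain mean.
    """
    if not sequence:
        raise ValueError("sequence must contain at least one frame")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    state = [0.0] * len(sequence[0])  # assumed zero initial state
    for frame in sequence:
        state = [(1.0 - alpha) * s + alpha * x for s, x in zip(state, frame)]
    return state
```

    Two utterances containing the same frames in a different order yield different descriptors, while utterances of any length map to a vector with the same dimensionality as a single frame.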