893 research outputs found

    Highly efficient low-level feature extraction for video representation and retrieval.

    PhD thesis. Witnessing the omnipresence of digital video media, the research community has raised the question of its meaningful use and management. Stored in immense multimedia databases, digital videos need to be retrieved and structured in an intelligent way, relying on the content and the rich semantics involved. Current Content Based Video Indexing and Retrieval systems face the problem of the semantic gap between the simplicity of the available visual features and the richness of user semantics. This work focuses on the issues of efficiency and scalability in video indexing and retrieval to facilitate a video representation model capable of semantic annotation. A highly efficient algorithm for temporal analysis and key-frame extraction is developed. It is based on the prediction information extracted directly from the compressed domain features and the robust scalable analysis in the temporal domain. Furthermore, a hierarchical quantisation of the colour features in the descriptor space is presented. Derived from the extracted set of low-level features, a video representation model that enables semantic annotation and contextual genre classification is designed. Results demonstrate the efficiency and robustness of the temporal analysis algorithm, which runs in real time while maintaining high precision and recall in the detection task. Adaptive key-frame extraction and summarisation achieve a good overview of the visual content, while the colour quantisation algorithm efficiently creates a hierarchical set of descriptors. Finally, the video representation model, supported by the genre classification algorithm, achieves excellent results in an automatic annotation system by linking the video clips with a limited lexicon of related keywords.
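The abstract does not say how its hierarchical colour quantisation works. As a rough illustration of the general idea only, the sketch below builds colour histograms at several resolutions by keeping progressively more high-order bits of each RGB channel. This uniform bit-truncation scheme, along with the names `quantise`, `histogram` and `hierarchy`, is an assumption made for illustration and is not the thesis's method.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Pixel = Tuple[int, int, int]  # 8-bit RGB triple (assumed input format)


def quantise(pixel: Pixel, bits: int) -> Pixel:
    """Keep only the top `bits` bits of each 8-bit channel."""
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    return tuple(channel >> (8 - bits) for channel in pixel)


def histogram(pixels: Iterable[Pixel], bits: int) -> Counter:
    """Count pixels per quantised colour bin at the given resolution."""
    return Counter(quantise(p, bits) for p in pixels)


def hierarchy(pixels: Iterable[Pixel], levels=(1, 2, 3)) -> Dict[int, Counter]:
    """Build one histogram per level, from coarse (few bits) to fine."""
    pixels = list(pixels)  # allow several passes over the data
    return {bits: histogram(pixels, bits) for bits in levels}


if __name__ == "__main__":
    frame = [(255, 0, 0), (250, 10, 5), (0, 0, 255)]  # toy "key frame"
    for bits, hist in hierarchy(frame).items():
        print(bits, dict(hist))
```

Each coarser level can be obtained by merging bins of the finer one, so descriptors at different levels stay comparable. A descriptor-space quantiser such as the one the abstract mentions would likely adapt its bins to the data rather than use fixed bit truncation.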

    The act of viewing: indeterminacy and interpretation in narrative film

    This thesis argues that the presentation of narrative in film involves a fundamental indeterminacy, derived from the status of the event in film. I elaborate this idea of indeterminacy through Gilles Deleuze’s ontology of the filmic image and Daniel Frampton’s phenomenology of film-thinking. I analyse various manifestations of narrative indeterminacy, looking at examples from silent-era, classical and contemporary cinema from around the world, both within the studio model and outside of it. I look at how we may theorise narrative agency in light of this indeterminacy and its various forms, proposing an alternative to previous models of filmic narration, as well as examining the implications of indeterminacy for a viewer’s activity in understanding narrative and how this relates to narrative agency. Here I use Wolfgang Iser’s reader-response theory and his theory of literary indeterminacy to propose that this act of viewing is fundamentally interpretive, exploring the extent to which a filmic equivalent to Iser’s implied reader may be identified, and the implications of this for conceptions of the relationship between the various types of viewer proposed throughout film theory. What emerges from this is a theory of the act of viewing that attends to the particular status of the event in the moving image of film and the indeterminacy that follows from this in a manner that previous theories do not, proposing an alternative to David Bordwell’s theory of narrative comprehension and the related dismissal of interpretation. I suggest how viewer activity can be theorised alongside – rather than instead of – the 'passive' spectators of ideologically oriented film theory, and that what is required is attention to this intersection of viewing positions in film theory

    Speech Mode Classification using the Fusion of CNNs and LSTM Networks

    Speech mode classification is an area that has not been as widely explored in the field of sound classification as others such as environmental sounds, music genre, and speaker identification. But what is speech mode? While mode is defined as the way or manner in which something occurs or is expressed or done, speech mode is defined as the style in which speech is delivered by a person. There are some reports on classifying speech modes, such as whispering and talking with a normal phonetic sound, using conventional methods. However, to the best of our knowledge, deep learning-based methods have not been reported in the open literature for this classification scenario. Specifically, in this work we assess the performance of image-based classification algorithms on this challenging speech mode classification problem, including the use of pre-trained deep neural networks, namely AlexNet, ResNet18 and SqueezeNet. Thus, we compare the classification efficiency of a set of deep learning-based (DL-based) classifiers, and we also assess the impact of different 2D image representations (spectrograms, mel-spectrograms, and their image-based fusion) on classification accuracy. These representations are generated from the original audio signals and used as input to the networks. Next, we compare the accuracy of the DL-based classifiers to that of a set of machine learning (ML) classifiers that take Mel-Frequency Cepstral Coefficient (MFCC) features as input. Then, after determining the most efficient sampling rate for our classification problem (i.e. 32 kHz), we study the performance of our proposed method of combining CNNs with LSTM (Long Short-Term Memory) networks. For this purpose, we use the features extracted from the deep networks of the previous step. We conclude our study by evaluating the role of sampling rates on classification accuracy by generating two sets of 2D image representations – one with 32 kHz and the other with 16 kHz sampling.
    Experimental results show that, after cross-validation, the accuracy of the DL-based approaches is 15% higher than that of the ML ones, with SqueezeNet yielding an accuracy of more than 91% at 32 kHz, whether we use transfer learning, feature-level fusion or score-level fusion (92.5%). Our proposed method using LSTMs further increased that accuracy by more than 3%, resulting in an average accuracy of 95.7%.
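The abstract mentions score-level fusion without saying how scores are combined. One common way to do it is to average the per-class probabilities produced by separate classifiers and pick the highest fused score. The sketch below shows that generic scheme; the equal default weights, the two class labels, the example scores and the names `fuse_scores` and `predict` are all assumptions for illustration, not the paper's configuration.

```python
from typing import List, Optional, Sequence


def fuse_scores(
    score_sets: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Weighted average of per-class score vectors, one vector per model."""
    if not score_sets:
        raise ValueError("need at least one score vector")
    n_classes = len(score_sets[0])
    if any(len(scores) != n_classes for scores in score_sets):
        raise ValueError("all score vectors must have the same length")
    if weights is None:
        weights = [1.0] * len(score_sets)  # assumed: equal trust in each model
    if len(weights) != len(score_sets):
        raise ValueError("one weight per score vector is required")
    total = sum(weights)
    return [
        sum(w * scores[c] for w, scores in zip(weights, score_sets)) / total
        for c in range(n_classes)
    ]


def predict(fused: Sequence[float], labels: Sequence[str]) -> str:
    """Return the label with the highest fused score."""
    best = max(range(len(fused)), key=lambda c: fused[c])
    return labels[best]


if __name__ == "__main__":
    labels = ["normal", "whisper"]       # hypothetical speech modes
    spectrogram_scores = [0.6, 0.4]      # made-up output of one network
    mel_scores = [0.2, 0.8]              # made-up output of another network
    fused = fuse_scores([spectrogram_scores, mel_scores])
    print(fused, predict(fused, labels))
```

Feature-level fusion, by contrast, would concatenate the feature vectors from the two representations before a single classifier sees them. The averaging above only combines final scores.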

    Pattern Recognition

    A wealth of advanced pattern recognition algorithms is emerging at the intersection of effective visual feature technologies and the study of the human-brain cognition process. Effective visual features are made possible through rapid developments in appropriate sensor equipment, novel filter designs, and viable information processing architectures, while the understanding of the human-brain cognition process broadens the ways in which computers can perform pattern recognition tasks. The present book collects representative research from around the globe, focusing on low-level vision, filter design, features and image descriptors, data mining and analysis, and biologically inspired algorithms. The 27 chapters covered in this book disclose recent advances and new ideas in promoting the techniques, technology and applications of pattern recognition.

    Shklovsky in the Cinema, 1926-1932

    The following research project is grounded in the interrelated contexts of the Russian intelligentsia’s ambivalent engagement with post-revolutionary culture and cinema’s rise as an artistic medium and instrument of Russian cultural development. By examining Viktor Shklovsky’s earliest activities in the Soviet film industry, this project will explore how narrative, aesthetic, and ideological programmes were repeatedly and variously moulded, undermined, and complicated by the twentieth-century Russian avant-garde interest in dissolving creative boundaries between the domains of the ‘internal’ (embracing private, individual, and domestic concerns) and ‘external’ (their public, communal, and social counterparts) in a bid ‘to turn space outwards’ (vyvorachivat´ prostranstvo vovne). These critical enquiries will lend themselves to an investigation of how the behaviours of Shklovsky, his colleagues, and his artistic creations were affected by internal and/or external loci of control and how these activities were reconciled (if at all) in a society where the relationship between freedom and necessity was in a constant state of fluctuation. This research aims not simply to establish the extent and significance of Shklovsky’s influence on cinema as an individual, but rather to utilise his personal narrative for an assessment of the levels of interaction between theory and practice and between the verbal and the visual as integral to the intelligentsia movement. The project will investigate the part that Shklovsky played in conceptualising the boundaries, exchanges, and conflicts that arose between different artistic media and the critical institutions that developed around them, before considering how these relations changed as Soviet culture entered and emerged from the period of Cultural Revolution. 
In addition, an exploration of the effects of personal and professional tensions between different ideological groups will not only develop a better understanding of Shklovsky’s role in the cinema as theorist, critic, polemicist, screenwriter, and ‘creative administrator’, but will also help to establish the similarities and/or disparities between his film-works and contemporary cultural experience

    The historical relationship of musical form and the moving image in the current context of the digitisation of media

    Contemporary developments in the medium of the moving picture, particularly in relation to the general digitisation of media, are bringing about substantial changes to long-held conceptions of both its theory and its practice. This thesis asserts that a significant factor in these, both historically and in terms of potential development, is the influence of musical form. Currently underappreciated, the strong interrelationship of musical form and film goes back to the very early days of cinema. The consideration of information in multidirectional form (mosaic; rhizomatic; database), rather than linearly, is directly relatable to concepts in seminal media-studies that equate multilinearity to the acoustic, and linearity to the visual (sound coming to us from all around, and vision from one direction only). The traditional role of music, the art of sound, as the quintessential expression of multiplicity, is an important subject for consideration in this context, and in terms of its ongoing formal relationship to the moving image. Taking a long historical view of this relationship, the research aims to provide a useful perspective on the 'pre-history' of current multimedia/intermedia, thereby indicating certain nascent directions of innovation