76 research outputs found

    Π-Avida - A Personalized Interactive Audio and Video Portal

    We describe a system for recording, storing and distributing multimedia data streams. For each modality (audio, speech, video), characteristic features are extracted and used to classify the content into a range of topic categories. Using data mining techniques, classifier models are determined from training data. These models are able to assign existing and new multimedia documents to one or several topic categories. We describe the features used as inputs for these classifiers. We demonstrate that the classification of audio material may be improved by using phonemes and syllables instead of words. Finally, we show that the categorization performance mainly depends on the quality of speech recognition and that the simple video features we tested are of only marginal utility.

    Automatic Analysis of Image Sequences Using Statistical Methods for Pattern Recognition

    In this thesis, new methods for the automatic recognition of the content of image sequences are presented. Solutions to the following video sequence analysis tasks are developed: temporal decomposition of an image sequence into scenes and classification of those scenes, and the recognition of people and their movements in the image sequence. The temporal segmentation of an image sequence and the classification of the segments can be used for image sequences with a given content structure, such as broadcast news. Such image sequences have a defined chronology of scenes, which belong to certain content classes. The content classes and their chronology are represented by nested Hidden Markov Models during recognition. Another application of Hidden Markov Models is the classification of movements of objects in the image sequence. The recognition of human gestures for human-computer interaction is investigated. The recognition system is capable of recognizing a set of predefined gestures that are performed in the viewing area of a camera. The system is able to identify undefined movements and can distinguish them from the gestures. The final task is recognizing people visible in image sequences, which is done by recognizing their faces. The indexing of faces is composed of two sub-tasks: detection of the faces and recognition of the faces. It is shown that face-based video indexing can be used to find known persons in the image sequence as well as to group the people in the sequence in an unsupervised manner.

    Fast and accurate vanishing point estimation on structured roads

    We propose a method for estimating the vanishing point of structured roads directly in the image plane, using the parallel nature of road markings together with intelligent preprocessing and data reduction steps. The resulting vanishing point enables estimation of the image-to-world projection, which is then used to perform subsequent tasks such as object detection. The major advantages of the proposed method are its modest computational requirements, its independence of the camera model used, and the absence of a calibration phase.
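The abstract does not spell out how the vanishing point is computed. As a rough illustration of the general idea only, the sketch below estimates a vanishing point as the least-squares intersection of the lines through a set of road-marking segments. This is a generic textbook technique, not necessarily the authors' method; the function name `estimate_vanishing_point`, the segment format, and the sample coordinates are all assumptions made for this example.

```python
import math


def estimate_vanishing_point(segments):
    """Least-squares intersection of the lines through the given segments.

    Each segment is (x1, y1, x2, y2) in image coordinates (hypothetical
    input format). Every line is written as a*x + b*y = c with a unit
    normal (a, b); we minimise the sum of squared point-to-line distances.
    """
    saa = sab = sbb = sac = sbc = 0.0
    for x1, y1, x2, y2 in segments:
        dx, dy = x2 - x1, y2 - y1
        length = math.hypot(dx, dy)
        if length == 0.0:
            continue  # degenerate segment carries no direction information
        a, b = -dy / length, dx / length  # unit normal of the line
        c = a * x1 + b * y1
        saa += a * a
        sab += a * b
        sbb += b * b
        sac += a * c
        sbc += b * c
    # Solve the 2x2 normal equations with Cramer's rule.
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("lines are parallel in the image; no finite intersection")
    x = (sac * sbb - sab * sbc) / det
    y = (saa * sbc - sab * sac) / det
    return x, y


# Two made-up lane-marking segments that converge towards (320, 200).
vp = estimate_vanishing_point([(0, 480, 160, 340), (640, 480, 480, 340)])
print(vp)
```

With more than two segments the same normal equations average out noise in the individual markings, which is one reason a least-squares formulation is a common starting point.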

    Content-Based Video Indexing Of TV Broadcast News Using Hidden Markov Models

    This paper presents a new approach to content-based video indexing using Hidden Markov Models (HMMs). In this approach, one feature vector is calculated for each image of the video sequence. These feature vectors are modeled and classified using HMMs. This approach has many advantages compared to other video indexing approaches. The system has automatic learning capabilities: it is trained by presenting manually indexed video sequences. To improve the system, we use a video model that allows the classification of complex video sequences. The presented approach works three times faster than real time. We tested our system on TV broadcast news. The rate of 97.3% correctly classified frames shows the efficiency of our system.
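To make the idea of labelling frames with an HMM concrete, here is a minimal sketch of Viterbi decoding over a sequence of per-frame observations. It is an illustration under simplifying assumptions, not the system described in the abstract: the observations are discrete symbols rather than feature vectors, and the state names, symbols, and probabilities are invented for the example.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely state sequence for the observations."""
    if not observations:
        return []
    # best[t][s]: best log-probability of any path ending in state s at time t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []  # back[t][s]: predecessor of s on the best path at time t + 1
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Backtrack from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def to_log(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}


# Hypothetical two-class news model: "anchor" shots are mostly static,
# "report" shots mostly contain motion. All numbers are made up.
states = ["anchor", "report"]
log_start = {"anchor": math.log(0.8), "report": math.log(0.2)}
log_trans = to_log({"anchor": {"anchor": 0.9, "report": 0.1},
                    "report": {"anchor": 0.1, "report": 0.9}})
log_emit = to_log({"anchor": {"static": 0.9, "motion": 0.1},
                   "report": {"static": 0.2, "motion": 0.8}})

frames = ["static", "static", "motion", "motion"]
path = viterbi(frames, states, log_start, log_trans, log_emit)
print(path)
```

Working in log-probabilities avoids numerical underflow on long frame sequences, which matters when every image of a video contributes one observation.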

    Logical structure recognition for heterogeneous periodical collections

    This work introduces a practical method for performing logical layout analysis on heterogeneous periodical collections. The described module is incorporated into the Fraunhofer document image understanding system and has been successfully used as part of mass digitization projects on more than 500,000 scanned pages. Our primary targets are documents with complex layouts such as newspapers; however, the described methods can easily be adapted to non-periodical publications. While encouraging, experimental results obtained on a heterogeneous set of digitized newspaper and chronicle pages spanning about 70 years reflect the high complexity of the generic, automated layout analysis problem. Our results allow the identification of promising areas for future investigation and provide a baseline for current in-the-wild document logical structure recognition.

    Enhancing Railway Detection by Priming Neural Networks with Project Exaptations

    When integrating railway constructions and refurbishments into an existing infrastructure, it is beneficial to have knowledge of the exact state, geometry, and placement of the connected assets. While new constructions and the maintenance of existing lines can directly use existing digital models and incorporate them into their processes, existing railways often predate digital technologies. This gap in digital models forces the planning processes of new constructions and refurbishments to rely primarily on non-automated and analogue workflows. With a multitude of asset types, layouts and country-specific standards, the automatic generation of adequate detection models is complicated and needs to be tailored to the current project environment, generating considerable overhead. Addressing this issue, this paper presents the concept of priming. Priming increases the adaptation performance to highly volatile, low-data environments by leveraging previous, existing CAD projects. We introduce a translation scheme that converts the existing 3D models into realistic, project-specific, synthetic surveys, and a complementary, dialled-in training routine. When applied to a convolutional neural network, we show that the primed training converges faster and with greater stability, especially when using sparse training data. Our experiments show that priming can reduce the time for network adaptation by over 50%, while also improving resilience to underrepresented object types.