421 research outputs found

    Prediction and Tracking of Moving Objects in Image Sequences

    We employ a prediction model for moving object velocity and location estimation derived from Bayesian theory. The optical flow of a given moving object depends on the history of its previous values. A joint optical flow estimation and moving object segmentation algorithm is used for the initialization of the tracking algorithm. The segmentation of the moving objects is determined by appropriately classifying the unlabeled and the occluding regions. Segmentation and optical flow tracking are used for predicting future frames.

    Binary morphological shape-based interpolation applied to 3-D tooth reconstruction

    In this paper we propose an interpolation algorithm using a mathematical morphology morphing approach. The aim of this algorithm is to reconstruct the n-dimensional object from a group of (n-1)-dimensional sets representing sections of that object. The morphing transformation modifies pairs of consecutive sets such that they approach each other in shape and size. The interpolated set is achieved when the two consecutive sets are made idempotent by the morphing transformation. We prove the convergence of the morphological morphing. The entire object is modeled by successively interpolating a certain number of intermediary sets between every two consecutive given sets. We apply the interpolation algorithm to 3-D tooth reconstruction.

    Multimodal decision-level fusion for person authentication

    In this paper, the use of clustering algorithms for decision-level data fusion is proposed. Person authentication results coming from several modalities (e.g., still image, speech) are combined by using the fuzzy k-means (FKM) and fuzzy vector quantization (FVQ) algorithms and a median radial basis function (MRBF) network. The quality measure of the modalities' data is used for fuzzification. Two modifications of the FKM and FVQ algorithms, based on a novel fuzzy vector distance definition, are proposed to handle the fuzzy data and utilize the quality measure. Simulations show that fuzzy clustering algorithms perform better than the classical clustering algorithms and other known fusion algorithms. MRBF performs better especially when two modalities are combined. Moreover, the use of the quality measure via the proposed modified algorithms increases the performance of the fusion system.
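
    As a rough illustration of the fuzzy clustering step, the sketch below computes standard fuzzy k-means memberships for one vector of authentication scores. It is not the modified FKM or FVQ algorithm of the abstract: the Euclidean distance, the fuzzifier value m = 2, and the two hypothetical cluster centers ("impostor" and "client") are all assumptions made for the example.

```python
import math

def fkm_memberships(point, centers, m=2.0):
    """Standard fuzzy k-means membership of `point` in each cluster.

    Assumptions (not from the abstract): Euclidean distance and fuzzifier m > 1.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    on_center = [i for i, d in enumerate(dists) if d == 0.0]
    if on_center:
        return [1.0 / len(on_center) if i in on_center else 0.0
                for i in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
            for d_i in dists]

# Hypothetical one-dimensional fused score and two cluster centers.
memberships = fkm_memberships((1.0,), [(0.0,), (3.0,)])
```

    The memberships always sum to one, so a decision rule could, for instance, accept the claim when the membership in the client cluster is the larger of the two.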

    A framework for dialogue detection in movies

    In this paper, we investigate a novel framework for dialogue detection that is based on indicator functions. An indicator function specifies whether a particular actor is present at each time instant. Two dialogue detection rules are developed and assessed. The first rule compares the value of the cross-correlation function at zero time lag to a threshold. The second rule compares the cross-power in a particular frequency band to a threshold. Experiments are carried out to validate the feasibility of these dialogue detection rules by using ground-truth indicator functions determined by human observers from six different movies. A total of 25 dialogue scenes and another 8 non-dialogue scenes are employed. The probabilities of false alarm and detection are estimated by cross-validation, where 70% of the available scenes are used to learn the thresholds employed in the dialogue detection rules and the remaining 30% of the scenes are used for testing. An almost perfect dialogue detection is reported for every distinct threshold. © Springer-Verlag Berlin Heidelberg 2006
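
    The first detection rule can be sketched as follows, assuming that indicator functions are sampled as equal-length 0/1 sequences. The normalization by sequence length and the direction of the threshold comparison are assumptions; the abstract states only that the zero-lag value is compared to a threshold. The function names are hypothetical.

```python
def zero_lag_cross_correlation(a, b):
    """Cross-correlation of two indicator sequences at zero time lag.

    Assumption: both sequences are sampled at the same instants and the
    estimate is normalized by the number of samples.
    """
    if len(a) != len(b) or not a:
        raise ValueError("indicator sequences must be non-empty and equal in length")
    return sum(x * y for x, y in zip(a, b)) / len(a)

def is_dialogue(a, b, threshold, dialogue_if_above=True):
    """Threshold rule; the comparison direction is an assumption, so it is a flag."""
    value = zero_lag_cross_correlation(a, b)
    return value > threshold if dialogue_if_above else value < threshold

# Hypothetical indicator sequences for two actors over four time instants.
actor_a = [1, 1, 0, 0]
actor_b = [1, 0, 0, 0]
value = zero_lag_cross_correlation(actor_a, actor_b)
```

    In the setup described above, the threshold itself would be learned from the 70% training portion of the scenes rather than fixed by hand.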

    Projection distortion analysis for flattened image mosaicing from straight uniform generalized cylinders

    This paper presents a new approach for reconstructing images mapped or painted on straight uniform generalized cylinders (SUGC). A set of monocular images is taken from different viewpoints in order to be mosaiced and to represent the entire scene in detail. The expressions of the SUGC's projection axis are derived from two cross-sections projected onto the image plane. Based on these axes we derive the SUGC localization in the camera coordinate system. We explain how we can find a virtual image representation when the intersection of the two axes is matched to the image center. We analyze the perspective distortions when flattening a scene which is mapped on a SUGC. We evaluate the lower and the upper bounds of the necessary number of views in order to represent the entire scene from a SUGC, by considering the distortions produced by perspective projection. A region-matching-based mosaicing method is proposed to be applied on the flattened images in order to obtain the complete scene. The mosaiced scene is visualized on a new synthetic surface by a mapping procedure. The proposed algorithm is used for the representation of mural paintings located on SUGCs with closed cross-sections (circles for columns) or open cross-sections (ellipses or parabolas for vaults). © 2001 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved

    Knowledge Distillation-driven Communication Framework for Neural Networks: Enabling Efficient Student-Teacher Interactions

    This paper presents a novel framework for facilitating communication and knowledge exchange among neural networks, leveraging the roles of both students and teachers. In our proposed framework, each node represents a neural network, capable of acting as either a student or a teacher. When new data is introduced and a network has not been trained on it, the node assumes the role of a student, initiating a communication process. The student node communicates with potential teachers, identifying those networks that have already been trained on the incoming data. Subsequently, the student node employs knowledge distillation techniques to learn from the teachers and gain insights from their accumulated knowledge. This approach enables efficient and effective knowledge transfer within the neural network ecosystem, enhancing learning capabilities and fostering collaboration among diverse networks. Experimental results demonstrate the efficacy of our framework in improving overall network performance and knowledge utilization
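
    A minimal sketch of the two steps described above, teacher selection and distillation, is given below. The abstract does not specify the distillation technique, so the temperature-softened softmax with a Kullback-Leibler loss used here is an assumption, as are the node names, the dataset identifiers, and every function name.

```python
import math

def find_teachers(trained_on, dataset_id):
    """Return the nodes that have already been trained on `dataset_id`.

    `trained_on` is a hypothetical registry: node name -> set of dataset ids.
    """
    return sorted(name for name, seen in trained_on.items() if dataset_id in seen)

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from the softened teacher distribution to the student's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0.0)

# Hypothetical registry: node "c" has not seen dataset "d2" and acts as student.
registry = {"a": {"d1", "d2"}, "b": {"d2"}, "c": {"d1"}}
teachers = find_teachers(registry, "d2")
loss = distillation_loss([0.2, 0.1, 0.0], [2.0, 0.5, -1.0])
```

    The loss is zero when the student reproduces the teacher's logits and positive otherwise, so minimizing it pulls the student toward the teacher's output distribution.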

    Audio-assisted movie dialogue detection

    An audio-assisted system is investigated that detects whether a movie scene is a dialogue or not. The system is based on actor indicator functions, that is, functions which specify whether an actor speaks at a certain time instant. In particular, the cross-correlation and the magnitude of the corresponding cross-power spectral density of a pair of indicator functions are input to various classifiers, such as voted perceptrons, radial basis function networks, random trees, and support vector machines, for dialogue/non-dialogue detection. To boost classifier efficiency, AdaBoost is also exploited. The aforementioned classifiers are trained using ground truth indicator functions determined by human annotators for 41 dialogue and another 20 non-dialogue audio instances. For testing, actual indicator functions are derived by applying audio activity detection and actor clustering to audio recordings. 23 instances are randomly chosen among the aforementioned 41 dialogue instances, 17 of which correspond to dialogue scenes and 6 to non-dialogue ones. Accuracy ranging between 0.739 and 0.826 is reported.
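
    The two classifier inputs named above can be sketched as follows for a pair of sampled 0/1 indicator sequences. The biased cross-correlation estimate, the periodogram-style cross-spectrum computed with a plain DFT, and all function names are assumptions for illustration; the abstract does not state which estimators are used.

```python
import cmath

def cross_correlation(a, b, max_lag):
    """Biased estimate r[lag] = (1/N) * sum_t a[t + lag] * b[t]."""
    n = len(a)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0
        for t in range(n):
            if 0 <= t + lag < n:
                total += a[t + lag] * b[t]
        result[lag] = total / n
    return result

def dft(x):
    """Direct discrete Fourier transform (O(N^2), adequate for short sequences)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def cross_power_magnitude(a, b):
    """Magnitude of the periodogram-style cross-spectrum A[k] * conj(B[k]) / N."""
    n = len(a)
    spec_a, spec_b = dft(a), dft(b)
    return [abs(spec_a[k] * spec_b[k].conjugate()) / n for k in range(n)]

# Hypothetical indicator sequences of two actors who alternate turns.
actor_a = [1, 0, 1, 0]
actor_b = [0, 1, 0, 1]
features = list(cross_correlation(actor_a, actor_b, 1).values())
features += cross_power_magnitude(actor_a, actor_b)
```

    The resulting feature list is what would be passed to a classifier; the classifiers themselves are outside the scope of this sketch.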

    Audio-assisted movie dialogue detection

    An audio-assisted system is investigated that detects whether a movie scene is a dialogue or not. The system is based on actor indicator functions, that is, functions which specify whether an actor speaks at a certain time instant. In particular, the cross-correlation and the magnitude of the corresponding cross-power spectral density of a pair of indicator functions are input to various classifiers, such as voted perceptrons, radial basis function networks, random trees, and support vector machines, for dialogue/non-dialogue detection. To boost classifier efficiency, AdaBoost is also exploited. The aforementioned classifiers are trained using ground truth indicator functions determined by human annotators for 41 dialogue and another 20 non-dialogue audio instances. For testing, actual indicator functions are derived by applying audio activity detection and actor clustering to audio recordings. 23 instances are randomly chosen among the aforementioned 41 dialogue instances, 17 of which correspond to dialogue scenes and 6 to non-dialogue ones. Accuracy ranging between 0.739 and 0.826 is reported. © 2008 IEEE