    Acoustic Echo Cancellation for Human-Robot Communications

    This master's thesis presents a new, efficient method of acoustic echo cancellation targeted at speech recognition for robots. The proposed algorithm features a new double-talk detector (DTD), an enhanced initialization, and a new noise estimation method. The DTD algorithm is based on the normalized cross-correlation method, uses noise power estimation to be more robust in noisy environments, and reacts more accurately to double-talk. The new initialization method switches between two different DTD algorithms to prevent problems during filter convergence: the simple yet robust Geigel DTD is used while the adaptive filter converges, and the program switches to the newly developed DTD after convergence. Finally, the new noise estimation algorithm relies on the output auto-correlation to correctly estimate the noise. To improve speech recognition performance, center clipping is applied to the output of the echo canceler to further remove the residual echo. White noise is also added to the output signal to make the signal power more stable, which helps the speech recognition engine. The proposed algorithm was evaluated on a large set of sequences, and the results show that it can increase the word recognition rate by up to 80%.
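
The two classical building blocks named in this abstract, the Geigel detector and center clipping, can be sketched as follows. This is a minimal illustration rather than the thesis implementation: the function names, the `window` length, the `threshold` of 0.5, and the shrinking form of the clipper are all assumptions made for the example.

```python
from collections import deque


def geigel_double_talk(far_end, mic, window=4, threshold=0.5):
    """Return one boolean per sample: True where double-talk is declared.

    Geigel rule: flag sample n when |mic[n]| exceeds `threshold` times the
    largest |far_end| value among the most recent `window` samples.
    The default window and threshold are illustrative assumptions.
    """
    recent = deque(maxlen=window)  # sliding window of far-end magnitudes
    flags = []
    for x, d in zip(far_end, mic):
        recent.append(abs(x))
        flags.append(abs(d) > threshold * max(recent))
    return flags


def center_clip(signal, clip_level):
    """Zero samples within +/- clip_level and shrink the rest toward zero."""
    clipped = []
    for e in signal:
        if e > clip_level:
            clipped.append(e - clip_level)
        elif e < -clip_level:
            clipped.append(e + clip_level)
        else:
            clipped.append(0.0)  # small residual echo is removed entirely
    return clipped
```

In an echo canceler, the detector's flags would typically be used to freeze adaptive-filter updates during double-talk, and the clipper would be applied to the canceler's error signal.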

    Pedestrian localization, tracking and behavior analysis from multiple cameras

    Video surveillance is currently undergoing rapid growth. However, while thousands of cameras are being installed in public places all over the world, computer programs that could reliably detect and track people in order to analyze their behavior are not yet operational. The challenges are numerous: low image quality, suboptimal scene lighting, changing appearances of pedestrians, occlusions by the environment and between people, and complex interacting trajectories in crowds. In this thesis, we propose a complete approach for detecting and tracking an unknown number of interacting people from multiple cameras located at eye level. Our system works reliably in spite of significant occlusions and delivers metrically accurate trajectories for each tracked individual. Furthermore, we develop a method for representing the most common types of motion in a specific environment and learning them automatically from image data. We demonstrate that a generative model for detection can effectively handle occlusions in each time frame independently, even when the only data available comes from the output of a simple background subtraction algorithm and when the number of individuals is unknown a priori. We then advocate that multi-people tracking can be achieved by detecting people in individual frames and then linking detections across frames. We formulate the linking step as a problem of finding the most probable state of a hidden Markov process given the set of images and frame-independent detections. We first propose to solve this problem by optimizing trajectories independently with Dynamic Programming. In a second step, we reformulate the problem as a constrained flow optimization, resulting in a convex problem that can be solved using standard Linear Programming techniques and is far simpler formally and algorithmically than existing techniques. We show that the particular structure of this framework lets us solve it equivalently using the k-shortest paths algorithm, which leads to a much faster optimization. Finally, we introduce a novel behavioral model to describe pedestrian motions, which is able to capture sophisticated motion patterns resulting from the mixture of different categories of random trajectories. Due to its simplicity, this model can be learned from video sequences in a totally unsupervised manner through an Expectation-Maximization procedure. We show that this behavior model can be used to make tracking systems more robust in ambiguous situations. Moreover, we demonstrate its ability to characterize and detect atypical individual motions.
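
The flow view of the linking step can be illustrated with a small sketch. It is not the thesis algorithm: it swaps in plain successive shortest paths (Bellman-Ford on a residual graph) in place of a dedicated k-shortest paths routine, and the 1-D locations, the log-odds detection cost, `max_jump`, and `track_cost` are all assumptions introduced for the example.

```python
import math


def link_detections(frames, max_jump=1, track_cost=2.0):
    """Link per-frame detections into tracks via successive shortest paths.

    frames[t] is a list of (location, probability) pairs, probability in (0, 1).
    Returns a list of tracks, each a list of (frame_index, location).
    """
    edges = []   # each edge: [to_node, capacity, cost, index_of_reverse_edge]
    graph = {}   # node -> indices into `edges`; forward edges have even indices

    def add_edge(u, v, cost):
        graph.setdefault(u, []).append(len(edges))
        edges.append([v, 1, cost, len(edges) + 1])
        graph.setdefault(v, []).append(len(edges))
        edges.append([u, 0, -cost, len(edges) - 1])

    for t, detections in enumerate(frames):
        for i, (loc, p) in enumerate(detections):
            # Splitting a detection into in/out nodes joined by a unit-capacity
            # edge keeps two tracks from sharing it. Assumed cost: negative
            # log-odds, so confident detections lower the total cost.
            add_edge(("in", t, i), ("out", t, i), -math.log(p / (1.0 - p)))
            add_edge("source", ("in", t, i), track_cost)  # a track may start here
            add_edge(("out", t, i), "sink", 0.0)          # or end here
            if t + 1 < len(frames):
                for j, (next_loc, _) in enumerate(frames[t + 1]):
                    if abs(next_loc - loc) <= max_jump:
                        add_edge(("out", t, i), ("in", t + 1, j), 0.0)

    while True:
        # Bellman-Ford copes with the negative costs in the residual graph.
        dist = {node: math.inf for node in graph}
        dist["source"] = 0.0
        via = {}  # node -> index of the edge used to reach it
        for _ in range(len(graph)):
            changed = False
            for u in graph:
                if dist[u] == math.inf:
                    continue
                for e in graph[u]:
                    v, capacity, cost, _ = edges[e]
                    if capacity > 0 and dist[u] + cost < dist[v] - 1e-12:
                        dist[v], via[v], changed = dist[u] + cost, e, True
            if not changed:
                break
        if dist.get("sink", math.inf) >= -1e-12:
            break  # one more track would not lower the total cost
        node = "sink"
        while node != "source":  # push one unit of flow along the path
            e = via[node]
            edges[e][1] -= 1
            edges[edges[e][3]][1] += 1
            node = edges[edges[e][3]][0]

    def next_node(u):
        for e in graph[u]:
            if e % 2 == 0 and edges[e][1] == 0:  # saturated forward edge
                return edges[e][0]

    tracks = []
    for e in graph.get("source", []):
        if edges[e][1] == 0:  # this source edge carries a track
            track, node = [], edges[e][0]
            while node != "sink":
                if node[0] == "in":
                    track.append((node[1], frames[node[1]][node[2]][0]))
                node = next_node(node)
            tracks.append(track)
    return tracks
```

With these assumed costs, a weak detection is kept when it bridges two strong ones on the same path, while an isolated weak detection does not pay for its own `track_cost` and is dropped. The number of tracks is not fixed in advance: the loop stops as soon as an additional path would no longer reduce the total cost.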

    Robust People Tracking with Global Trajectory Optimization

    Given three or four synchronized videos taken at eye level and from different angles, we show that we can effectively use dynamic programming to accurately follow up to six individuals across thousands of frames in spite of significant occlusions. We also derive metrically accurate trajectories for each of them. Our main contribution is to show that multi-person tracking can be reliably achieved by processing individual trajectories separately over long sequences, provided that a reasonable heuristic is used to rank these individuals and avoid confusing them with one another. In this way, we achieve robustness by finding optimal trajectories over many frames while avoiding the combinatorial explosion that would result from simultaneously dealing with all the individuals.
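
The per-person dynamic programming step can be sketched as a Viterbi search over a discretized set of locations. This is a simplified illustration under stated assumptions, not the paper's system: it handles a single person, uses a hypothetical table `detection_probs` of strictly positive per-frame probabilities, and replaces the motion model with a hard `max_step` limit. The ranking heuristic for multiple people is not shown.

```python
import math


def best_trajectory(detection_probs, max_step=1):
    """Viterbi search for the most probable path of one person.

    detection_probs[t][k] is an assumed probability (> 0) that the person
    is at location k in frame t. Moves of more than `max_step` locations
    between consecutive frames are forbidden.
    """
    n_locations = len(detection_probs[0])
    # score[k]: best log-probability of any path ending at k in the current frame
    score = [math.log(p) for p in detection_probs[0]]
    back_pointers = []
    for frame in detection_probs[1:]:
        new_score, pointers = [], []
        for k in range(n_locations):
            lo = max(0, k - max_step)
            hi = min(n_locations, k + max_step + 1)
            # Best reachable predecessor of location k.
            prev = max(range(lo, hi), key=lambda j: score[j])
            new_score.append(score[prev] + math.log(frame[k]))
            pointers.append(prev)
        score = new_score
        back_pointers.append(pointers)
    # Backtrack from the best final location.
    k = max(range(n_locations), key=lambda j: score[j])
    path = [k]
    for pointers in reversed(back_pointers):
        k = pointers[k]
        path.append(k)
    return path[::-1]
```

Because the whole sequence is optimized at once, a single misleading frame cannot pull the path to an unreachable location: the result is the best path that respects the step limit over all frames.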

    Principled Detection-by-Classification from Multiple Views

    Machine-learning-based classification techniques have been shown to be effective at detecting objects in complex scenes. However, the final results are often obtained from the alarms produced by the classifiers through post-processing that typically relies on ad hoc heuristics: spatially close alarms are assumed to be triggered by the same target and are grouped together. Here we replace those heuristics with a principled Bayesian approach, which uses knowledge about both the classifier response model and the scene geometry to combine multiple classification answers. We demonstrate its effectiveness for multi-view pedestrian detection. We estimate the marginal probabilities of presence of people at any location in a scene, given the responses of classifiers evaluated in each view. Our approach naturally takes into account both the occlusions and the very low metric accuracy of the classifiers due to their invariance to translation and scale. Results show that our method produces one order of magnitude fewer false positives than a method that is representative of typical state-of-the-art approaches. Moreover, the framework we propose is generic and could be applied to any detection-by-classification task.
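
The basic idea of combining classifier answers from several views through Bayes' rule can be illustrated with a deliberately simplified sketch. It assumes the views are conditionally independent given presence and ignores occlusions and scene geometry, both of which the paper's model accounts for. The function name and the `hit_rate` and `false_alarm_rate` parameters are hypothetical.

```python
def presence_posterior(prior, responses, hit_rate, false_alarm_rate):
    """Posterior probability that a person occupies one location.

    responses: one boolean per view, True if that view's classifier fired.
    hit_rate: assumed P(fires | person present).
    false_alarm_rate: assumed P(fires | person absent).
    Views are treated as conditionally independent given presence.
    """
    weight_present = prior        # prior times likelihood under "present"
    weight_absent = 1.0 - prior   # prior times likelihood under "absent"
    for fired in responses:
        weight_present *= hit_rate if fired else 1.0 - hit_rate
        weight_absent *= false_alarm_rate if fired else 1.0 - false_alarm_rate
    return weight_present / (weight_present + weight_absent)
```

For instance, with a prior of 0.5, a hit rate of 0.8, and a false-alarm rate of 0.2, one firing view and one silent view cancel out and leave the posterior at the prior, whereas two firing views reinforce each other.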

    Image-Based Mobile Service: Automatic Text Extraction and Translation

    We present a new mobile service for the translation of text from images taken by consumer-grade cell-phone cameras. Such a capability represents a new paradigm for users, in which a simple image provides the basis for a service. The ubiquity and ease of use of cell-phone cameras enable images to be acquired and transmitted anywhere and at any time a user wishes, and the service delivers rapid and accurate translation over the phone’s MMS and SMS facilities. Target text is extracted completely automatically, requiring no bounding box delineation or related user intervention. The service uses localization, binarization, text deskewing, and optical character recognition (OCR) in its analysis. Once the text is translated, an SMS message is sent to the user with the result. Further novelties are that no software installation is required on the handset, any service provider or camera phone can be used, and the entire service is implemented on the server side.

    Cell Phones as Imaging Sensors

    Camera phones are ubiquitous, and consumers have been adopting them faster than any other technology in modern history. When connected to a network, though, they are capable of more than just picture taking: suddenly, they gain access to the power of the cloud. We exploit this capability by providing a series of image-based personal advisory services. These are designed to work with any handset over any cellular carrier using commonly available Multimedia Messaging Service (MMS) and Short Message Service (SMS) features. Targeted at the unsophisticated consumer, these applications must be quick and easy to use, not requiring download capabilities or preplanning. Thus, all application processing occurs in the back-end system (i.e., as a cloud service) and not on the handset itself. By presenting an image to an advisory service in the cloud, a user receives information that can be acted upon immediately. Two of our examples involve color assessment (selecting cosmetics and home décor paint palettes); the third provides the ability to extract text from a scene. In the case of the color imaging applications, we have shown that our service rivals the advice quality of experts. The result of this capability is a new paradigm for mobile interactions: image-based information services exploiting the ubiquity of camera phones.

    Multi-Camera Tracking and Atypical Motion Detection with Behavioral Maps

    We introduce a novel behavioral model to describe pedestrian motions, which is able to capture sophisticated motion patterns resulting from the mixture of different categories of random trajectories. Due to its simplicity, this model can be learned from video sequences in a totally unsupervised manner through an Expectation-Maximization procedure. When integrated into a complete multi-camera tracking system, it improves the tracking performance in ambiguous situations, compared to a standard ad hoc isotropic Markovian motion model. Moreover, it can be used to compute a score that characterizes atypical individual motions. Experiments on outdoor video sequences demonstrate both the improvement of tracking performance when compared to a state-of-the-art tracking system and the reliability of the atypical motion detection.

    Multiple Object Tracking using Flow Linear Programming

    Multi-object tracking can be achieved by detecting objects in individual frames and then linking detections across frames. Such an approach can be made very robust to the occasional detection failure: if an object is not detected in a frame but is in the previous and following ones, a correct trajectory will nevertheless be produced. Conversely, a false-positive detection in a few frames will be ignored. However, when dealing with a multiple-target problem, the linking step results in a difficult optimization problem in the space of all possible families of trajectories. This is usually dealt with by sampling or greedy search based on variants of Dynamic Programming, which can easily miss the global optimum. In this paper, we show that reformulating that step as a constrained flow optimization problem results in a convex problem that can be solved using standard Linear Programming techniques. In addition, this new approach is far simpler formally and algorithmically than existing techniques and lets us demonstrate excellent performance in two very different contexts.