
    Machine Analysis of Facial Expressions


    Automated Multi-Modal Search and Rescue using Boosted Histogram of Oriented Gradients

    Unmanned Aerial Vehicles (UAVs) provide a platform for many automated tasks, and with ever-increasing advances in computing, these tasks can become more complex. This thesis expands the use of UAVs toward Search and Rescue (SAR), where a UAV can assist first responders by searching for a lost person and relaying possible search areas back to SAR teams. To identify a person from an aerial perspective, low-level Histogram of Oriented Gradients (HOG) feature descriptors are computed over a segmented region, provided by thermal data, to increase classification speed. The thesis also introduces a dataset for the Bird's-Eye-View (BEV) perspective and tests the viability of low-level HOG feature descriptors on it. These descriptors, known as Boosted Histogram of Oriented Gradients (BHOG) features, discretize gradients over cells and blocks of varying sizes and are trained with a cascaded Gentle AdaBoost classifier on the compiled BEV dataset. Classification is supported by multiple sensing modes, using color and thermal video: the thermal video is segmented to indicate Regions of Interest (ROIs), which are mapped to the color video where classification occurs. The ROIs reduce the classification time needed on the aerial platform by eliminating a per-frame sliding window. Testing reveals that, using only color data and a classifier trained on a person's profile, the average recall is 78%, while thermal detection yields an average recall of 76% with a 2x speed-up on 240x320-resolution video. BEV testing reveals that higher resolutions are favored, with recall rates of 71% using BHOG features and 92% using Haar features; at lower resolutions, the recall rates are 42% and 55% for BHOG and Haar features, respectively.
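    The cell-level orientation histogram at the heart of HOG (and the varying cell/block sizes that BHOG boosts over) can be sketched in a few lines. The function below is a simplified illustration, not the thesis implementation: it omits overlapping blocks, block normalisation, and the cascaded Gentle AdaBoost stage.

```python
import numpy as np

def cell_hog(image, cell_size=8, n_bins=9):
    """Histogram of unsigned gradient orientations per cell.

    Simplified sketch of the HOG building block: real HOG adds
    overlapping blocks and block normalisation, and BHOG varies the
    cell/block sizes and boosts weak classifiers over the responses.
    """
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)                          # gradient magnitude
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0    # unsigned orientation
    cells_y = image.shape[0] // cell_size
    cells_x = image.shape[1] // cell_size
    hist = np.zeros((cells_y, cells_x, n_bins))
    bin_width = 180.0 / n_bins
    for cy in range(cells_y):
        for cx in range(cells_x):
            win = (slice(cy * cell_size, (cy + 1) * cell_size),
                   slice(cx * cell_size, (cx + 1) * cell_size))
            bins = (ang[win] / bin_width).astype(int) % n_bins
            for b in range(n_bins):
                # Magnitude-weighted vote into each orientation bin.
                hist[cy, cx, b] = mag[win][bins == b].sum()
    return hist

# A vertical edge yields horizontal gradients, i.e. the 0-degree bin.
img = np.zeros((16, 16))
img[:, 8:] = 1.0
h = cell_hog(img, cell_size=8)
```

    A real detector would concatenate and normalise these histograms before feeding them to the boosted cascade.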

    The 9th Conference of PhD Students in Computer Science


    Intelligent summarization of sports videos using automatic saliency detection

    The aim of this thesis is to present an efficient and intelligent way of creating sports summary videos by automatically identifying the highlights, or salient events, from one or more video feeds using computer vision techniques and combining them into a video summary of the game. The thesis presents a twofold solution: (1) identification of salient parts from single or multiple video feeds of a sports event; (2) remixing of the video by extracting and merging segments, adding effects (such as slow-motion replay) and mixing audio. The project applies machine learning and computer vision methods to identify regions of interest in the video frames and to detect action areas and scoring attempts. These methods were developed for basketball, but they may be tweaked or enhanced for other sports such as football or hockey. For creating the summary videos, various video processing techniques were explored to add visual effects that improve their quality. The goal has been to deliver a fully automated, fast and robust system that can work with large high-definition video files.
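    The highlight-selection step can be illustrated with a toy version: given a per-frame saliency score (in the thesis this would come from learned detection of action areas and scoring attempts; here it is just an assumed numeric array), greedily pick the highest-scoring non-overlapping segments.

```python
import numpy as np

def select_highlights(scores, fps=30, clip_len_s=4, n_clips=3):
    """Greedily pick the highest-scoring non-overlapping clips from a
    per-frame saliency score (a stand-in for learned event detection)."""
    clip_len = clip_len_s * fps
    # Smooth over a clip-length window so peaks reflect sustained action.
    smoothed = np.convolve(scores, np.ones(clip_len) / clip_len, mode="same")
    taken = np.zeros(len(scores), dtype=bool)
    clips = []
    order = np.argsort(smoothed)[::-1]          # frame indices, best first
    for _ in range(n_clips):
        for idx in order:
            start = max(0, idx - clip_len // 2)
            end = min(len(scores), start + clip_len)
            if not taken[start:end].any():      # skip overlapping windows
                clips.append((start, end))
                taken[start:end] = True
                break
    return sorted(clips)

# Three synthetic "scoring attempts" at frames 500, 1500 and 2500.
scores = np.zeros(3000)
scores[[500, 1500, 2500]] = 100.0
clips = select_highlights(scores)
```

    The selected frame ranges would then be cut from the source footage and concatenated, with replay effects applied, to form the summary.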

    Taking the bite out of automated naming of characters in TV video

    We investigate the problem of automatically labelling appearances of characters in TV or film material with their names. This is tremendously challenging due to the huge variation in the imaged appearance of each character and the weakness and ambiguity of the available annotation. However, we demonstrate that high precision can be achieved by combining multiple sources of information, both visual and textual. The principal novelties that we introduce are: (i) automatic generation of time-stamped character annotation by aligning subtitles and transcripts; (ii) strengthening the supervisory information by identifying when characters are speaking. In addition, we incorporate complementary cues of face matching and clothing matching to propose common annotations for face tracks, and consider choices of classifier which can potentially correct errors made in the automatic extraction of training data from the weak textual annotation. Results are presented on episodes of the TV series "Buffy the Vampire Slayer".
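    The subtitle/transcript alignment idea — subtitles carry timestamps but no speaker names, while transcripts carry names but no timestamps — can be sketched with a toy fuzzy matcher. The data and matching approach below are illustrative assumptions, not the paper's method, which aligns the two text streams more carefully.

```python
from difflib import SequenceMatcher

def align(subtitles, transcript):
    """Give each timestamped (but anonymous) subtitle the speaker of the
    best-matching (named but untimed) transcript line.

    subtitles:  list of (start_s, end_s, text)
    transcript: list of (speaker, text)
    """
    out = []
    for start, end, text in subtitles:
        # Fuzzy match, since subtitles and transcripts differ slightly.
        speaker, _ = max(
            transcript,
            key=lambda line: SequenceMatcher(None, text.lower(),
                                             line[1].lower()).ratio(),
        )
        out.append((start, end, speaker, text))
    return out

subs = [(10.0, 12.5, "We have to stop the Master."),
        (13.0, 15.0, "Not without a plan, Buffy.")]
script = [("BUFFY", "We have to stop the Master!"),
          ("GILES", "Not without a plan, Buffy.")]
result = align(subs, script)
```

    The resulting time-stamped speaker labels are what lets the paper decide which face track should receive which name.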

    Sonar image interpretation for sub-sea operations

    Mine Counter-Measure (MCM) missions are conducted to neutralise underwater explosives. Automatic Target Recognition (ATR) assists operators by increasing the speed and accuracy of data review, and ATR embedded on vehicles enables adaptive missions that increase the speed of data acquisition. This thesis addresses three challenges: the speed of data processing, the robustness of ATR to environmental conditions, and the large quantities of data required to train an algorithm. The main contribution is a novel ATR algorithm that uses features derived from the projection of 3D boxes to produce a set of 2D templates, whose responses are independent of grazing angle, range and target orientation. Integer skewed integral images are derived to accelerate the calculation of the template responses. Compared to the Haar cascade algorithm, for a single model of sonar and cylindrical targets, the algorithm reduces the Probability of False Alarm (PFA) by 80% at a Probability of Detection (PD) of 85%. When the algorithm is trained on target data from another model of sonar, the PD is only 6% lower even though no representative target data was used for training. The second major contribution is an adaptive ATR algorithm that uses local sea-floor characteristics to address the robustness of ATR with respect to the local environment. A dual-tree wavelet decomposition of the sea-floor and a Markov Random Field (MRF) based graph-cut algorithm are used to segment the terrain, and a Neural Network (NN) is then trained to filter ATR results based on the local sea-floor context. It is shown, for the Haar cascade algorithm, that the PFA can be reduced by 70% at a PD of 85%. Speed of data processing is addressed using novel pre-processing techniques: the standard three-class MRF for sonar image segmentation is formulated using graph-cuts, so a 1.2-million-pixel image is segmented in 1.2 seconds. Additionally, local estimation of class models is introduced to remove range-dependent segmentation quality. Finally, an A* graph search is developed to remove the surface return, a line of saturated pixels often detected as false alarms by ATR. The A* search identifies the surface return in 199 of 220 test images with a runtime of 2.1 seconds, and is robust to the presence of ripples and rocks.
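    The integral-image idea that makes the template responses cheap — any axis-aligned rectangle sum in constant time after one pass over the image — can be shown for the standard (unskewed) case; the thesis's integer skewed variant applies the same principle to slanted templates.

```python
import numpy as np

def integral_image(img):
    """Summed-area table: ii[y, x] = sum of img[0:y+1, 0:x+1]."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Inclusive rectangle sum in O(1) from the integral image."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]       # remove rows above
    if left > 0:
        total -= ii[bottom, left - 1]     # remove columns to the left
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]    # re-add doubly removed corner
    return total

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
```

    A template response is then a handful of such rectangle sums, which is why the per-window cost is independent of template size.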

    Face recognition-based real-time system for surveillance

    The ability to automatically recognize human faces from dynamic facial images is important in security, surveillance and the health/independent-living domains. Specific applications include access control to secure environments, identification of individuals at a particular place and intruder detection. This research proposes a real-time camera-based system for surveillance. The process is broken into two steps: (1) face detection and (2) face recognition to identify particular persons. In the first step, the system tracks and selects the faces of detected persons; an efficient recognition algorithm is then used to match the detected faces against a known database. The proposed approach exploits the Viola-Jones method for face detection, the Kanade-Lucas-Tomasi algorithm as a feature tracker and Principal Component Analysis (PCA) for face recognition. The system can be deployed in restricted areas, such as the office or house of a suspicious person or the entrance of a sensitive installation, and works almost perfectly under reasonable lighting conditions and image depths.
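    The PCA recognition stage ("eigenfaces") can be sketched in NumPy; detection and tracking are assumed to have already produced aligned face crops (in OpenCV, e.g. `CascadeClassifier` for Viola-Jones and `calcOpticalFlowPyrLK` for KLT). The tiny random "faces" below are illustrative stand-ins, not real data.

```python
import numpy as np

def train_eigenfaces(faces, n_components=2):
    """PCA ('eigenfaces') over flattened, aligned face crops."""
    X = faces.reshape(len(faces), -1).astype(float)
    mean = X.mean(axis=0)
    # SVD of the centred data gives the principal axes directly.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = vt[:n_components]
    return mean, basis, (X - mean) @ basis.T

def recognize(face, mean, basis, gallery_coords, labels):
    """Nearest neighbour in the eigenface subspace."""
    coord = (face.reshape(-1).astype(float) - mean) @ basis.T
    dists = np.linalg.norm(gallery_coords - coord, axis=1)
    return labels[int(np.argmin(dists))]

# Tiny illustrative "faces": two random 8x8 crops plus noisy repeats.
rng = np.random.default_rng(0)
a, b = rng.random((8, 8)), rng.random((8, 8))
faces = np.stack([a, b, a + 0.01, b + 0.01])
labels = ["alice", "bob", "alice", "bob"]
mean, basis, coords = train_eigenfaces(faces)
```

    In a deployed system, a distance threshold on the nearest neighbour would additionally reject faces not present in the database.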

    Video-based Pedestrian Intention Recognition and Path Prediction for Advanced Driver Assistance Systems

    Advanced Driver Assistance Systems (ADAS) play a very important role in future vehicles in increasing safety for the driver, the passengers and vulnerable road users such as pedestrians and cyclists. Systems of this kind attempt, within a limited scope, to avoid collisions in dangerous situations involving an inattentive driver and pedestrian by triggering automatic emergency braking. Due to the high variability of pedestrian movement patterns, existing systems are designed conservatively, drastically reducing possible false-trigger rates by restricting themselves to manageable environments, e.g. scenarios in which pedestrians suddenly stop and thereby de-escalate the situation. To overcome this problem, reliable pedestrian intention recognition and path prediction are of great value. This thesis describes the complete processing chain of a stereo-video based system for pedestrian intention estimation and path prediction, which is used in a subsequent function decision for automatic emergency braking. In the first of three main components, a real-time method is proposed that localises pedestrians' heads and estimates their pose in low-resolution images of complex and highly dynamic inner-city scenarios. Single-frame estimates are derived from the probability outputs of eight trained head-pose-specific detectors applied to the image region of a pedestrian candidate. Further robustness in head localisation is achieved by incorporating stereo depth information. In addition, the head positions and poses are smoothed over time by means of a particle filter.
    For pedestrian intention estimation, the use of a robust and powerful machine learning approach is investigated in different scenarios. For time series of observations, this approach is able to model the inner substructures of a particular intention class and, additionally, to capture the extrinsic dynamics between different intention classes. The method integrates meaningful features extracted from the pedestrian dynamics as well as context information in the form of the human head pose. Finally, a path prediction method is presented that steers the prediction steps of a multiple-motion-model filter over a time horizon of approximately one second by incorporating the estimated pedestrian intentions. By helping the filter to choose the appropriate motion model, the resulting path prediction error can be reduced significantly. A large variety of scenarios is covered, including pedestrians crossing laterally or stopping, and persons who initially walk along the sidewalk but then suddenly turn towards the road.
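    The intention-gated prediction idea — the estimated intention selects which motion model drives the filter's prediction step — can be illustrated with a deliberately simplified two-model switch. The model set, decay factor and horizon below are assumptions for illustration, not the thesis's filter.

```python
import numpy as np

def predict_path(pos, vel, intention, horizon_s=1.0, dt=0.1):
    """Toy intention-gated motion-model switch: 'crossing' keeps a
    constant-velocity model, 'stopping' decays the velocity toward
    standstill at every prediction step."""
    pos = np.asarray(pos, dtype=float)
    vel = np.asarray(vel, dtype=float)
    path = []
    for _ in range(int(horizon_s / dt)):
        if intention == "stopping":
            vel = vel * 0.6     # assumed per-step deceleration factor
        pos = pos + vel * dt    # simple Euler prediction step
        path.append(pos.copy())
    return np.array(path)

# A pedestrian at the curb moving 1.5 m/s toward the road (x axis).
crossing = predict_path([0.0, 0.0], [1.5, 0.0], "crossing")
stopping = predict_path([0.0, 0.0], [1.5, 0.0], "stopping")
```

    The gap between the two one-second endpoints shows why choosing the right model matters: a crossing assumption places the pedestrian well into the lane, a stopping assumption keeps them near the curb.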