
    Machine Analysis of Facial Expressions


    Automated Multi-Modal Search and Rescue using Boosted Histogram of Oriented Gradients

    Unmanned Aerial Vehicles (UAVs) provide a platform for many automated tasks, and with ever-increasing advances in computing, these tasks can become more complex. This thesis expands the use of UAVs toward Search and Rescue (SAR), where a UAV can assist first responders by searching for a lost person and relaying possible search areas back to SAR teams. To identify a person from an aerial perspective, low-level Histogram of Oriented Gradients (HOG) feature descriptors are computed over a segmented region, provided by thermal data, to increase classification speed. The thesis also introduces a dataset for the Bird's-Eye-View (BEV) perspective and tests the viability of low-level HOG feature descriptors on it. These descriptors, known as Boosted Histogram of Oriented Gradients (BHOG) features, discretize gradients over cells and blocks of varying sizes and are trained with a cascaded Gentle AdaBoost classifier on the compiled BEV dataset. Classification is supported by multiple sensing modes, using color and thermal video: the thermal video is segmented to indicate Regions of Interest (ROIs), which are mapped to the color video where classification occurs. The ROIs reduce the classification time needed on the aerial platform by eliminating a per-frame sliding window. Testing reveals that, using only color data and a classifier trained on a person's profile, the average recall is 78%, while thermal detection yields an average recall of 76% with a 2x speed-up on 240x320-resolution video. BEV testing reveals that higher resolutions are favored, with recall rates of 71% using BHOG features and 92% using Haar features; at lower resolutions, the recall rates are 42% and 55% for BHOG and Haar features, respectively.
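    The cell-level orientation histogram at the heart of HOG (and the varying cell/block sizes that BHOG boosts over) can be sketched in a few lines. The function below is a simplified illustration, not the thesis implementation: it omits overlapping blocks, block normalisation, and the cascaded Gentle AdaBoost stage.

```python
import numpy as np

def cell_hog(image, cell_size=8, n_bins=9):
    """Histogram of unsigned gradient orientations per cell.

    Simplified sketch of the HOG building block: real HOG adds
    overlapping blocks and block normalisation, and BHOG varies the
    cell/block sizes and boosts weak classifiers over the responses.
    """
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)                          # gradient magnitude
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0    # unsigned orientation
    cells_y = image.shape[0] // cell_size
    cells_x = image.shape[1] // cell_size
    hist = np.zeros((cells_y, cells_x, n_bins))
    bin_width = 180.0 / n_bins
    for cy in range(cells_y):
        for cx in range(cells_x):
            win = (slice(cy * cell_size, (cy + 1) * cell_size),
                   slice(cx * cell_size, (cx + 1) * cell_size))
            bins = (ang[win] / bin_width).astype(int) % n_bins
            for b in range(n_bins):
                # Magnitude-weighted vote into each orientation bin.
                hist[cy, cx, b] = mag[win][bins == b].sum()
    return hist

# A vertical edge yields horizontal gradients, i.e. the 0-degree bin.
img = np.zeros((16, 16))
img[:, 8:] = 1.0
h = cell_hog(img, cell_size=8)
```

    A real detector would concatenate and normalise these histograms before feeding them to the boosted cascade.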

    The 9th Conference of PhD Students in Computer Science


    Intelligent summarization of sports videos using automatic saliency detection

    The aim of this thesis is to present an efficient and intelligent way of creating sports summary videos by automatically identifying the highlights, or salient events, from one or more video feeds using computer vision techniques and combining them into a video summary of the game. The thesis presents a twofold solution: (1) identification of salient parts from single or multiple video feeds of a sports event; (2) remixing of the video by extracting and merging segments, adding effects (such as slow-motion replay) and mixing audio. The project applies machine learning and computer vision methods to identify regions of interest in the video frames and to detect action areas and scoring attempts. These methods were developed for basketball, but they may be tweaked or enhanced for other sports such as football or hockey. For creating the summary videos, various video processing techniques were explored to add visual effects that improve their quality. The goal has been to deliver a fully automated, fast and robust system that can work with large high-definition video files.
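    The highlight-selection step can be illustrated with a toy version: given a per-frame saliency score (in the thesis this would come from learned detection of action areas and scoring attempts; here it is just an assumed numeric array), greedily pick the highest-scoring non-overlapping segments.

```python
import numpy as np

def select_highlights(scores, fps=30, clip_len_s=4, n_clips=3):
    """Greedily pick the highest-scoring non-overlapping clips from a
    per-frame saliency score (a stand-in for learned event detection)."""
    clip_len = clip_len_s * fps
    # Smooth over a clip-length window so peaks reflect sustained action.
    smoothed = np.convolve(scores, np.ones(clip_len) / clip_len, mode="same")
    taken = np.zeros(len(scores), dtype=bool)
    clips = []
    order = np.argsort(smoothed)[::-1]          # frame indices, best first
    for _ in range(n_clips):
        for idx in order:
            start = max(0, idx - clip_len // 2)
            end = min(len(scores), start + clip_len)
            if not taken[start:end].any():      # skip overlapping windows
                clips.append((start, end))
                taken[start:end] = True
                break
    return sorted(clips)

# Three synthetic "scoring attempts" at frames 500, 1500 and 2500.
scores = np.zeros(3000)
scores[[500, 1500, 2500]] = 100.0
clips = select_highlights(scores)
```

    The selected frame ranges would then be cut from the source footage and concatenated, with replay effects applied, to form the summary.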

    Taking the bite out of automated naming of characters in TV video

    We investigate the problem of automatically labelling appearances of characters in TV or film material with their names. This is tremendously challenging due to the huge variation in the imaged appearance of each character and the weakness and ambiguity of the available annotation. However, we demonstrate that high precision can be achieved by combining multiple sources of information, both visual and textual. The principal novelties that we introduce are: (i) automatic generation of time-stamped character annotation by aligning subtitles and transcripts; (ii) strengthening the supervisory information by identifying when characters are speaking. In addition, we incorporate complementary cues of face matching and clothing matching to propose common annotations for face tracks, and consider choices of classifier which can potentially correct errors made in the automatic extraction of training data from the weak textual annotation. Results are presented on episodes of the TV series "Buffy the Vampire Slayer".
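    The subtitle/transcript alignment idea — subtitles carry timestamps but no speaker names, while transcripts carry names but no timestamps — can be sketched with a toy fuzzy matcher. The data and matching approach below are illustrative assumptions, not the paper's method, which aligns the two text streams more carefully.

```python
from difflib import SequenceMatcher

def align(subtitles, transcript):
    """Give each timestamped (but anonymous) subtitle the speaker of the
    best-matching (named but untimed) transcript line.

    subtitles:  list of (start_s, end_s, text)
    transcript: list of (speaker, text)
    """
    out = []
    for start, end, text in subtitles:
        # Fuzzy match, since subtitles and transcripts differ slightly.
        speaker, _ = max(
            transcript,
            key=lambda line: SequenceMatcher(None, text.lower(),
                                             line[1].lower()).ratio(),
        )
        out.append((start, end, speaker, text))
    return out

subs = [(10.0, 12.5, "We have to stop the Master."),
        (13.0, 15.0, "Not without a plan, Buffy.")]
script = [("BUFFY", "We have to stop the Master!"),
          ("GILES", "Not without a plan, Buffy.")]
result = align(subs, script)
```

    The resulting time-stamped speaker labels are what lets the paper decide which face track should receive which name.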

    Sonar image interpretation for sub-sea operations

    Mine Counter-Measure (MCM) missions are conducted to neutralise underwater explosives. Automatic Target Recognition (ATR) assists operators by increasing the speed and accuracy of data review, and ATR embedded on vehicles enables adaptive missions that increase the speed of data acquisition. This thesis addresses three challenges: the speed of data processing, the robustness of ATR to environmental conditions, and the large quantities of data required to train an algorithm. The main contribution is a novel ATR algorithm that uses features derived from the projection of 3D boxes to produce a set of 2D templates, whose responses are independent of grazing angle, range and target orientation. Integer skewed integral images are derived to accelerate the calculation of the template responses. Compared to the Haar cascade algorithm, for a single model of sonar and cylindrical targets, the algorithm reduces the Probability of False Alarm (PFA) by 80% at a Probability of Detection (PD) of 85%. When the algorithm is trained on target data from another model of sonar, the PD is only 6% lower even though no representative target data was used for training. The second major contribution is an adaptive ATR algorithm that uses local sea-floor characteristics to address the robustness of ATR with respect to the local environment. A dual-tree wavelet decomposition of the sea-floor and a Markov Random Field (MRF) based graph-cut algorithm are used to segment the terrain, and a Neural Network (NN) is then trained to filter ATR results based on the local sea-floor context. It is shown, for the Haar cascade algorithm, that the PFA can be reduced by 70% at a PD of 85%. Speed of data processing is addressed using novel pre-processing techniques: the standard three-class MRF for sonar image segmentation is formulated using graph-cuts, so a 1.2-million-pixel image is segmented in 1.2 seconds. Additionally, local estimation of class models is introduced to remove range-dependent segmentation quality. Finally, an A* graph search is developed to remove the surface return, a line of saturated pixels often detected as false alarms by ATR. The A* search identifies the surface return in 199 of 220 test images with a runtime of 2.1 seconds, and is robust to the presence of ripples and rocks.
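    The integral-image idea that makes the template responses cheap — any axis-aligned rectangle sum in constant time after one pass over the image — can be shown for the standard (unskewed) case; the thesis's integer skewed variant applies the same principle to slanted templates.

```python
import numpy as np

def integral_image(img):
    """Summed-area table: ii[y, x] = sum of img[0:y+1, 0:x+1]."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Inclusive rectangle sum in O(1) from the integral image."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]       # remove rows above
    if left > 0:
        total -= ii[bottom, left - 1]     # remove columns to the left
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]    # re-add doubly removed corner
    return total

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
```

    A template response is then a handful of such rectangle sums, which is why the per-window cost is independent of template size.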

    Face recognition-based real-time system for surveillance

    The ability to automatically recognize human faces from dynamic facial images is important in security, surveillance and the health/independent-living domains. Specific applications include access control to secure environments, identification of individuals at a particular place and intruder detection. This research proposes a real-time camera-based system for surveillance. The process is broken into two steps: (1) face detection and (2) face recognition to identify particular persons. In the first step, the system tracks and selects the faces of detected persons; an efficient recognition algorithm is then used to match the detected faces against a known database. The proposed approach exploits the Viola-Jones method for face detection, the Kanade-Lucas-Tomasi algorithm as a feature tracker and Principal Component Analysis (PCA) for face recognition. The system can be deployed in restricted areas, such as the office or house of a suspicious person or the entrance of a sensitive installation, and works almost perfectly under reasonable lighting conditions and image depths.
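    The PCA recognition stage ("eigenfaces") can be sketched in NumPy; detection and tracking are assumed to have already produced aligned face crops (in OpenCV, e.g. `CascadeClassifier` for Viola-Jones and `calcOpticalFlowPyrLK` for KLT). The tiny random "faces" below are illustrative stand-ins, not real data.

```python
import numpy as np

def train_eigenfaces(faces, n_components=2):
    """PCA ('eigenfaces') over flattened, aligned face crops."""
    X = faces.reshape(len(faces), -1).astype(float)
    mean = X.mean(axis=0)
    # SVD of the centred data gives the principal axes directly.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = vt[:n_components]
    return mean, basis, (X - mean) @ basis.T

def recognize(face, mean, basis, gallery_coords, labels):
    """Nearest neighbour in the eigenface subspace."""
    coord = (face.reshape(-1).astype(float) - mean) @ basis.T
    dists = np.linalg.norm(gallery_coords - coord, axis=1)
    return labels[int(np.argmin(dists))]

# Tiny illustrative "faces": two random 8x8 crops plus noisy repeats.
rng = np.random.default_rng(0)
a, b = rng.random((8, 8)), rng.random((8, 8))
faces = np.stack([a, b, a + 0.01, b + 0.01])
labels = ["alice", "bob", "alice", "bob"]
mean, basis, coords = train_eigenfaces(faces)
```

    In a deployed system, a distance threshold on the nearest neighbour would additionally reject faces not present in the database.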

    Video-based Pedestrian Intention Recognition and Path Prediction for Advanced Driver Assistance Systems

    Advanced Driver Assistance Systems (ADAS) play a very important role in future vehicles in increasing safety for the driver, the passengers and vulnerable road users such as pedestrians and cyclists. Systems of this kind attempt, within a limited scope, to avoid collisions in dangerous situations involving an inattentive driver and pedestrian by triggering automatic emergency braking. Due to the high variability of pedestrian movement patterns, existing systems are designed conservatively, drastically reducing possible false-trigger rates by restricting themselves to manageable environments, e.g. scenarios in which pedestrians suddenly stop and thereby de-escalate the situation. To overcome this problem, reliable pedestrian intention recognition and path prediction are of great value. This thesis describes the complete processing chain of a stereo-video based system for pedestrian intention estimation and path prediction, which is used in a subsequent function decision for automatic emergency braking. In the first of three main components, a real-time method is proposed that localises pedestrians' heads and estimates their pose in low-resolution images of complex and highly dynamic inner-city scenarios. Single-frame estimates are derived from the probability outputs of eight trained head-pose-specific detectors applied to the image region of a pedestrian candidate. Further robustness in head localisation is achieved by incorporating stereo depth information. In addition, the head positions and poses are smoothed over time by means of a particle filter.
    For pedestrian intention estimation, the use of a robust and powerful machine learning approach is investigated in different scenarios. For time series of observations, this approach is able to model the inner substructures of a particular intention class and, additionally, to capture the extrinsic dynamics between different intention classes. The method integrates meaningful features extracted from the pedestrian dynamics as well as context information in the form of the human head pose. Finally, a path prediction method is presented that steers the prediction steps of a multiple-motion-model filter over a time horizon of approximately one second by incorporating the estimated pedestrian intentions. By helping the filter to choose the appropriate motion model, the resulting path prediction error can be reduced significantly. A large variety of scenarios is covered, including pedestrians crossing laterally or stopping, and persons who initially walk along the sidewalk but then suddenly turn towards the road.
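    The intention-gated prediction idea — the estimated intention selects which motion model drives the filter's prediction step — can be illustrated with a deliberately simplified two-model switch. The model set, decay factor and horizon below are assumptions for illustration, not the thesis's filter.

```python
import numpy as np

def predict_path(pos, vel, intention, horizon_s=1.0, dt=0.1):
    """Toy intention-gated motion-model switch: 'crossing' keeps a
    constant-velocity model, 'stopping' decays the velocity toward
    standstill at every prediction step."""
    pos = np.asarray(pos, dtype=float)
    vel = np.asarray(vel, dtype=float)
    path = []
    for _ in range(int(horizon_s / dt)):
        if intention == "stopping":
            vel = vel * 0.6     # assumed per-step deceleration factor
        pos = pos + vel * dt    # simple Euler prediction step
        path.append(pos.copy())
    return np.array(path)

# A pedestrian at the curb moving 1.5 m/s toward the road (x axis).
crossing = predict_path([0.0, 0.0], [1.5, 0.0], "crossing")
stopping = predict_path([0.0, 0.0], [1.5, 0.0], "stopping")
```

    The gap between the two one-second endpoints shows why choosing the right model matters: a crossing assumption places the pedestrian well into the lane, a stopping assumption keeps them near the curb.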