
    End-to-End Multiview Gesture Recognition for Autonomous Car Parking System

    The use of hand gestures can be the most intuitive human-machine interaction medium. Early approaches to hand gesture recognition used device-based methods, which rely on mechanical or optical sensors attached to a glove or markers and thereby hinder natural human-machine communication. Vision-based methods, on the other hand, are not restrictive and allow for more spontaneous communication without the need for an intermediary between human and machine. Vision-based gesture recognition has therefore been a popular area of research for the past thirty years. Hand gesture recognition finds application in many areas, particularly the automotive industry, where designers of advanced automotive human-machine interfaces (HMIs) are using gesture recognition to improve driver and vehicle safety. However, technological advances go beyond active/passive safety into convenience and comfort. In this context, one of America's big three automakers has partnered with the Centre for Pattern Analysis and Machine Intelligence (CPAMI) at the University of Waterloo to investigate expanding their product segment through machine learning, providing increased driver convenience and comfort, with the particular application of hand gesture recognition for autonomous car parking. In this thesis, we leverage state-of-the-art deep learning and optimization techniques to develop a vision-based multiview dynamic hand gesture recognizer for a self-parking system. We propose a 3DCNN gesture model architecture that we train on a publicly available hand gesture database. We apply transfer learning to fine-tune the pre-trained gesture model on custom-made data, which significantly improves the proposed system's performance in real-world environments. We adapt the architecture of the end-to-end solution to expand the state-of-the-art video classifier from a single-view input (fed by a monocular camera) to a multiview 360° feed provided by a six-camera module. Finally, we optimize the proposed solution to run on a resource-limited embedded platform (Nvidia Jetson TX2) used by automakers for vehicle-based features, without sacrificing the accuracy, robustness, or real-time operation of the system.
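
The abstract above describes pretraining a 3DCNN on a public gesture database and then fine-tuning it on custom data via transfer learning. Below is a minimal sketch of that fine-tuning pattern in PyTorch, using a small stand-in 3D CNN; the model, checkpoint name, and class counts are illustrative assumptions, not the thesis implementation.

```python
# Hedged sketch: fine-tune a pretrained 3D CNN on a small custom gesture set
# by freezing the feature extractor and replacing the classification head.
import torch
import torch.nn as nn

class Small3DCNN(nn.Module):
    """Minimal 3D CNN video classifier (stand-in for the pretrained gesture model)."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):                      # x: (batch, 3, frames, H, W)
        return self.classifier(self.features(x).flatten(1))

# 1) Model pretrained on a public gesture database (checkpoint name hypothetical).
model = Small3DCNN(num_classes=25)
# model.load_state_dict(torch.load("pretrained_gestures.pt"))

# 2) Transfer learning: freeze features, new head for the smaller custom label set.
for p in model.features.parameters():
    p.requires_grad = False
model.classifier = nn.Linear(32, 6)            # e.g. 6 parking-related gestures (assumed)

optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# 3) One fine-tuning step on a dummy batch of 16-frame clips.
clips = torch.randn(4, 3, 16, 112, 112)
labels = torch.randint(0, 6, (4,))
loss = criterion(model(clips), labels)
loss.backward()
optimizer.step()
```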

    Towards pedestrian-aware autonomous cars

    A Steering Wheel Mounted Grip Sensor: Design, Development and Evaluation

    Department of Human Factors Engineering. Driving is a commonplace but safety-critical daily activity for billions of people. It remains one of the leading causes of death worldwide, particularly among younger adults. In recent decades, a wide range of technologies, such as intelligent braking or speed-regulating systems, have been integrated into vehicles to improve safety; annually decreasing death rates testify to their success. A recent research focus in this area has been the development of systems that sense human states or activities during driving. This is valuable because human error remains a key reason underlying many vehicle accidents and incidents. Technologies that can intervene in response to information sensed about a driver may be able to detect, predict and ultimately prevent problems before they progress into accidents, thus avoiding the occurrence of critical situations rather than just mitigating their consequences. Commercial examples of this kind of technology include systems that monitor driver alertness or lane holding and prompt drivers who are sleepy or drifting off-lane. More exploratory research has sought to capture emotional state or stress/workload levels via physiological measurements such as heart rate variability (HRV), electrocardiogram (ECG) and electroencephalogram (EEG), or behavioral measurements of eye gaze or face pose. Other research has explicitly monitored user actions, such as head pose or foot movements, to infer intended actions (such as overtaking or lane change) and provide automatic assessments of the safety of these future behaviors, for example, providing a timely warning to a driver who is planning to overtake about a vehicle in his or her blind spot. Researchers have also explored how sensing hands on the wheel can be used to infer a driver's presence, identity or emotional state. This thesis extends this body of work through the design, development and evaluation of a steering wheel sensor platform that can directly detect a driver's hand pose all around the steering wheel. It argues that full steering hand pose is a potentially rich source of information about a driver's intended actions; for example, it proposes a link between hand posture on the wheel and subsequent turning or lane-change behavior. To explore this idea, this thesis describes the construction of a touch sensor in the form of a steering wheel cover. The cover integrates 32 equidistantly spaced touch-sensing electrodes (11.25° inter-sensor spacing) in the form of conductive ribbons (0.2" wide and 0.03" thick). Data from each ribbon is captured separately via a set of capacitive touch sensor microcontrollers every 64 ms. We connected this hardware platform to OpenDS, an open-source driving simulator, and ran two studies capturing hand pose during a sequential lane-change task and a slalom task. We analyzed the data to determine whether hand pose is a useful predictor of future turning behavior. For this, we classified a 5-lane road into 4 turn sizes and used machine-learning recognizers to predict the future turn size from the change in hand posture, in terms of hand-movement properties, in the early driving data. Because the driving task scenario of the first experiment did not closely match real-life turning tasks, we modified the scenario with a more appropriate task in the second experiment.
Class-wise prediction of the turn sizes did not achieve good accuracy in either experiment, but prediction accuracy improved when the four classes were reduced to two. In experiment 2, the turn sizes overlapped with one another, which made them very difficult to distinguish. We therefore also performed continuous prediction, and its accuracy was better than that of the class-wise prediction system in both experiments. In summary, this thesis designed, developed and evaluated a combined hardware and software system that senses the steering behavior of a driver by capturing grip pose. We assessed the value of this information via two studies that explored the relationship between wheel grip and future turning behaviors. The outcome of this study can inform the development of in-car sensing systems to support safer driving.
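
As a rough illustration of how the 32-channel grip readings captured every 64 ms could feed a turn-size classifier, here is a minimal sketch; the windowing, hand-movement features, and random-forest classifier are illustrative assumptions rather than the recognizers actually used in the thesis.

```python
# Hedged sketch: classify turn size from 32-channel steering-wheel grip data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

RIBBONS = 32          # touch-sensing electrodes around the wheel
SAMPLE_MS = 64        # capture interval per ribbon
WINDOW = 30           # ~1.9 s of early-driving data per example (assumed)

def grip_features(window: np.ndarray) -> np.ndarray:
    """Reduce a (WINDOW, RIBBONS) slice of capacitive readings to a feature vector:
    mean activation per ribbon plus overall hand-movement magnitude."""
    per_ribbon_mean = window.mean(axis=0)
    movement = np.abs(np.diff(window, axis=0)).sum()
    return np.concatenate([per_ribbon_mean, [movement]])

# Synthetic stand-in data: 200 windows labelled with one of four turn sizes.
rng = np.random.default_rng(0)
raw = rng.random((200, WINDOW, RIBBONS))
X = np.stack([grip_features(w) for w in raw])
y = rng.integers(0, 4, size=200)               # turn-size classes 0..3

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```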

    Articulatory Copy Synthesis Based on the Speech Synthesizer VocalTractLab

    Articulatory copy synthesis (ACS), a subarea of speech inversion, refers to the reproduction of natural utterances and involves both the physiological articulatory processes and their corresponding acoustic results. This thesis proposes two novel methods for the ACS of human speech using the articulatory speech synthesizer VocalTractLab (VTL) to address or mitigate the existing problems of speech inversion, such as non-unique mapping, acoustic variation among different speakers, and the time-consuming nature of the process. The first method involved finding appropriate VTL gestural scores for given natural utterances using a genetic algorithm. It consisted of two steps: gestural score initialization and optimization. In the first step, gestural scores were initialized from the given acoustic signals using speech recognition, grapheme-to-phoneme (G2P) conversion, and a VTL rule-based method for converting phoneme sequences to gestural scores. In the second step, the initial gestural scores were optimized by a genetic algorithm via an analysis-by-synthesis (ABS) procedure that sought to minimize the cosine distance between the acoustic features of the synthetic and natural utterances. The articulatory parameters were also regularized during the optimization process to restrict them to reasonable values. The second method was based on long short-term memory (LSTM) and convolutional neural networks, which were responsible for capturing the temporal dependence and the spatial structure of the acoustic features, respectively. Neural network regression models were trained that used acoustic features as inputs and produced articulatory trajectories as outputs. In addition, to cover as much of the articulatory and acoustic space as possible, the training samples were augmented by manipulating the phonation type, speaking effort, and vocal tract length of the synthetic utterances. Furthermore, two regularization methods were proposed: one based on the smoothness loss of articulatory trajectories and another based on the acoustic loss between original and predicted acoustic features. The best-performing genetic algorithm and convolutional LSTM systems (evaluated in terms of the difference between the estimated and reference VTL articulatory parameters) obtained average correlation coefficients of 0.985 and 0.983 for speaker-dependent utterances, respectively, and their reproduced speech achieved recognition accuracies of 86.25% and 64.69% for speaker-independent utterances of German words, respectively. When applied to German sentence utterances, as well as English and Mandarin Chinese word utterances, the neural-network-based ACS systems achieved recognition accuracies of 73.88%, 52.92%, and 52.41%, respectively. The results showed that both methods reproduced not only the articulatory processes but also the acoustic signals of the reference utterances. Moreover, the regularization methods led to more physiologically plausible articulatory processes and made the estimated articulatory trajectories more articulatorily preferred by VTL, thus reproducing more natural and intelligible speech. This study also found that the convolutional layers, when used in conjunction with batch normalization layers, automatically learned more distinctive features from log power spectrograms. Furthermore, the neural-network-based ACS systems trained using German data could be generalized to the utterances of other languages.
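
The genetic-algorithm method optimizes gestural scores by analysis-by-synthesis, minimizing the cosine distance between acoustic features of the synthetic and natural utterances while regularizing articulatory parameters. The sketch below shows what such an objective could look like; `synthesize` and `acoustic_features` are placeholders and do not reflect the actual VocalTractLab API.

```python
# Hedged sketch of an analysis-by-synthesis fitness function: acoustic mismatch
# (cosine distance) plus a penalty for implausible articulatory parameters.
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def range_penalty(params: np.ndarray, lo: np.ndarray, hi: np.ndarray) -> float:
    """Penalize articulatory parameters that fall outside their plausible range."""
    return float(np.sum(np.clip(lo - params, 0, None) + np.clip(params - hi, 0, None)))

def fitness(params, natural_features, lo, hi, synthesize, acoustic_features, lam=0.1):
    """Lower is better; a genetic algorithm would minimize this over gestural-score parameters."""
    synthetic_audio = synthesize(params)                 # placeholder for VTL synthesis
    synthetic_features = acoustic_features(synthetic_audio)
    return cosine_distance(synthetic_features, natural_features) + lam * range_penalty(params, lo, hi)
```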

    System Abstractions for Scalable Application Development at the Edge

    Recent years have witnessed an explosive growth of Internet of Things (IoT) devices, which collect or generate huge amounts of data. Given diverse device capabilities and application requirements, data processing takes place across a range of settings, from on-device to a nearby edge server/cloud and remote cloud. Consequently, edge-cloud coordination has been studied extensively from the perspectives of job placement, scheduling and joint optimization. Typical approaches focus on performance optimization for individual applications. This not only requires domain knowledge of the applications, but also leads to application-specific solutions. Application development and deployment over diverse scenarios thus incur repetitive manual effort. There are two overarching challenges in providing system-level support for application development at the edge. First, there is inherent heterogeneity at the device hardware level. The execution settings may range from a small cluster acting as an edge cloud to on-device inference on embedded devices, differing in hardware capability and programming environments. Further, application performance requirements vary significantly, making it even more difficult to map different applications to already heterogeneous hardware. Second, there are trends towards combining edge and cloud resources and incorporating multi-modal data. Together, these add further dimensions to the design space and increase the complexity significantly. In this thesis, we propose a novel framework to simplify application development and deployment over a continuum from edge to cloud. Our framework provides key connections between different dimensions of design considerations, corresponding to an application abstraction, a data abstraction and a resource management abstraction, respectively. First, our framework masks hardware heterogeneity with abstract resource types through containerization, and abstracts application processing pipelines into generic flow graphs. It further supports a notion of degradable computing for application scenarios at the edge that are driven by multimodal sensory input. Next, as video analytics is the killer app of edge computing, we include a generic data management service between video query systems and a video store to organize video data at the edge. We propose a video data unit abstraction based on a notion of distance between objects in the video, quantifying the semantic similarity among video data. Last, considering concurrent application execution, our framework supports multi-application offloading with device-centric control, with a userspace scheduler service that wraps over the operating system scheduler.
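
As a rough illustration of the abstractions described above (processing pipelines expressed as generic flow graphs whose stages request abstract resource types, with optionally degradable steps), here is a minimal sketch; the class names and placement scheme are assumptions, not the framework's actual interfaces.

```python
# Hedged sketch: a pipeline as a flow graph of stages bound to abstract resource
# types, which a scheduler maps onto concrete execution sites (device/edge/cloud).
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Stage:
    name: str
    fn: Callable                   # the processing step
    resource: str                  # abstract resource type, e.g. "gpu", "cpu-small"
    degradable: bool = False       # may be simplified or skipped under load

@dataclass
class FlowGraph:
    stages: List[Stage] = field(default_factory=list)

    def run(self, data, placement: Dict[str, str]):
        """Execute stages in order; `placement` maps abstract resource types
        to concrete sites (container on device, edge server, remote cloud)."""
        for stage in self.stages:
            site = placement.get(stage.resource, "local")
            print(f"running {stage.name} on {site}")
            data = stage.fn(data)
        return data

# Example: a small video-analytics pipeline deployed across device and edge.
pipeline = FlowGraph([
    Stage("decode", lambda x: x, "cpu-small"),
    Stage("detect_objects", lambda x: x, "gpu", degradable=True),
    Stage("aggregate", lambda x: x, "cpu-small"),
])
pipeline.run("frame batch", {"gpu": "edge-server", "cpu-small": "on-device"})
```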

    Automotive user interfaces for the support of non-driving-related activities

    Driving a car has changed a lot since the first car was invented. Today, drivers not only maneuver the car to their destination but also perform a multitude of additional activities in the car. This includes, for instance, activities related to assistive functions that are meant to increase driving safety and reduce the driver's workload. However, since drivers spend a considerable amount of time in the car, they often want to perform non-driving-related activities as well. In particular, these activities are related to entertainment, communication, and productivity. The driver's need for such activities has vastly increased, particularly due to the success of smartphones and other mobile devices. As long as the driver is in charge of performing the actual driving task, such activities can distract the driver and may result in severe accidents. Due to these special requirements of the driving environment, the driver ideally performs such activities using appropriately designed in-vehicle systems. The challenge for such systems is to enable flexible and easily usable non-driving-related activities while maintaining and increasing driving safety at the same time. The main contribution of this thesis is a set of guidelines and exemplary concepts for automotive user interfaces that offer safe, diverse, and easy-to-use means to perform non-driving-related activities besides the regular driving tasks. Using empirical methods that are commonly used in human-computer interaction, we investigate various aspects of automotive user interfaces with the goal of supporting the design and development of future interfaces that facilitate non-driving-related activities. The first aspect is related to using physiological data to infer information about the driver's workload. As a second aspect, we propose a multimodal interaction style to facilitate the interaction with multiple activities in the car. In addition, we introduce two concepts for the support of commonly used and demanded non-driving-related activities: for communication with the outside world, we investigate the driver's needs with regard to sharing ride details with remote persons in order to increase driving safety. Finally, we present a concept of time-adjusted activities (e.g., entertainment and productivity) which enables the driver to make use of times when only little attention is required. Starting with manual, non-automated driving, we also consider the rise of automated driving modes.
When cars were invented, they allowed the driver and potential passengers to get to a distant location. The only activities the driver was able and supposed to perform were related to maneuvering the vehicle, i.e., accelerating, decelerating, and steering the car. Today, drivers perform many activities that go beyond these driving tasks. This includes, for example, activities related to driving assistance, location-based information and navigation, entertainment, communication, and productivity. To perform these activities, drivers use functions provided by in-vehicle information systems. Many of these functions are meant to increase driving safety or to make the ride more enjoyable. The latter is important since people spend a considerable amount of time in their cars and want to perform activities similar to those they are accustomed to from using mobile devices. However, as long as the driver is responsible for driving, these activities can be distracting and put the driver, passengers, and the environment at risk.
One goal for the development of automotive user interfaces is therefore to enable easy and appropriate operation of in-vehicle systems such that driving tasks and non-driving-related activities can be performed easily and safely. The main contribution of this thesis is a set of guidelines and exemplary concepts for automotive user interfaces that offer safe, diverse, and easy-to-use means to also perform non-driving-related activities while driving. Using empirical methods that are commonly used in human-computer interaction, we approach various aspects of automotive user interfaces in order to support the design and development of future interfaces that also enable non-driving-related activities. Starting with manual, non-automated driving, we also consider the transition towards automated driving modes. As a first part, we look at the prerequisites that enable non-driving-related activities in the car. We propose guidelines for the design and development of automotive user interfaces that also support non-driving-related activities. This includes, for instance, rules on how to adapt or interrupt activities when the level of automation changes. To enable activities in the car, we propose a novel interaction concept that facilitates multimodal interaction by combining speech interaction and touch gestures. Moreover, we show how to infer information about the driver's state (especially mental workload) from physiological data. We conducted a real-world driving study to extract a data set with physiological and context data. This can help to better understand the driver state, to adapt interfaces to the driver and driving situations, and to adapt the route selection process. Second, we propose two concepts for supporting non-driving-related activities that are frequently used and demanded in the car. For telecommunication, we propose a concept to increase driving safety when communicating with the outside world. This concept enables the driver to share different types of information with remote parties. Thereby, the driver can choose between different levels of detail, ranging from abstract information such as "Alice is driving right now" up to sharing a video of the driving scene. We investigated drivers' needs on the go and derived guidelines for the design of communication-related functions in the car through an online survey and in-depth interviews. As a second aspect, we present an approach to offer time-adjusted entertainment and productivity tasks to the driver. The idea is to allow time-adjusted tasks during periods when the demand for the driver's attention is low, for instance at traffic lights or during a highly automated ride. Findings from a web survey and a case study demonstrate the feasibility of this approach. With the findings of this thesis, we aim to provide a basis for future research and development in the domain of automotive user interfaces and non-driving-related activities in the transition from manual driving to highly and fully automated driving.

    Robust and Efficient Activity Recognition from Videos

    With technological advancements in embedded system design, powerful cameras are now embedded in smartphones, and wireless cameras can be easily deployed at street corners, traffic lights, big stadiums, train stations, etc. Besides, the growth of online media, surveillance, and mobile cameras has resulted in an explosion of videos being uploaded to social media sites such as Facebook and YouTube. The availability of such a vast volume of videos has attracted the computer vision community to conduct much research on human activity recognition, since people are arguably the most interesting subjects of such videos. Automatic human activity recognition allows engineers and computer scientists to design smarter surveillance systems, semantically aware video indexes, and more natural human-computer interfaces. Despite the explosion of video data, the ability to automatically recognize and understand human activities is still rather limited. This is primarily due to multiple challenges inherent to the recognition task, namely large variability in human execution styles, the complexity of the visual stimuli in terms of camera motion, background clutter, viewpoint changes, etc., and the number of activities that can be recognized. In addition, the ability to predict the future actions of objects based on past observed video frames is very useful. Therefore, in this thesis, we explore four designs that address these problems: (1) We propose SBGAR, a semantics-based deep learning model for group activity recognition, which achieves higher accuracy and efficiency than existing group activity recognition methods. (2) Despite its high accuracy, SBGAR has some limitations, namely (i) it requires a large dataset with caption information, and (ii) its activity recognition model is independent of the caption generation model, so SBGAR may not perform well in some cases. To remove these limitations, we design ReHAR, a robust and efficient human activity recognition scheme that can recognize both single-person activities and group activities. (3) In many application scenarios, merely knowing what the moving agents are doing is not sufficient; predicting their future trajectories is also required. Thus, we propose GRIP, a graph-based interaction-aware motion intent prediction scheme. The scheme uses a graph to represent the relationships between objects, e.g., human joints or traffic agents, and predicts the motion intents of all observed objects simultaneously. (4) Action recognition and trajectory prediction schemes are typically deployed on resource-constrained devices, so any technique that can accelerate their computation is important. Hence, we propose DAC, a novel deep learning model decomposition method that is capable of factorizing an ordinary convolutional layer into two layers with far fewer parameters. DAC computes the weights of the newly generated layers directly from the weights of the original convolutional layer, so no training (or fine-tuning) or data is needed.
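
DAC factorizes an ordinary convolutional layer into two layers whose weights are computed directly from the original weights, with no training and no data. The abstract does not give the exact construction, so the sketch below uses a generic truncated-SVD factorization of a Conv2d into a k x k convolution followed by a 1 x 1 convolution to illustrate the same training-free idea; it is not the specific DAC algorithm.

```python
# Hedged sketch: training-free factorization of a Conv2d via truncated SVD.
import torch
import torch.nn as nn

def factorize_conv(conv: nn.Conv2d, rank: int) -> nn.Sequential:
    C_out, C_in, k, _ = conv.weight.shape
    W = conv.weight.data.reshape(C_out, C_in * k * k)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U, S, Vh = U[:, :rank], S[:rank], Vh[:rank]          # keep the top `rank` components

    # First layer: k x k conv producing `rank` intermediate channels.
    first = nn.Conv2d(C_in, rank, k, stride=conv.stride,
                      padding=conv.padding, bias=False)
    first.weight.data = Vh.reshape(rank, C_in, k, k)

    # Second layer: 1 x 1 conv mapping `rank` channels back to C_out, carrying the bias.
    second = nn.Conv2d(rank, C_out, 1, bias=conv.bias is not None)
    second.weight.data = (U * S).reshape(C_out, rank, 1, 1)
    if conv.bias is not None:
        second.bias.data = conv.bias.data.clone()
    return nn.Sequential(first, second)

conv = nn.Conv2d(64, 128, 3, padding=1)
approx = factorize_conv(conv, rank=32)
x = torch.randn(1, 64, 56, 56)
print(torch.mean((conv(x) - approx(x)) ** 2))            # low-rank approximation error
```

With rank r, the factorized pair stores r*(C_in*k*k) + C_out*r weights instead of C_out*C_in*k*k, so a smaller rank trades accuracy for parameter and computation savings.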