1,179 research outputs found

    R2S100K: Road-Region Segmentation Dataset For Semi-Supervised Autonomous Driving in the Wild

    Full text link
    Semantic understanding of roadways is a key enabling factor for safe autonomous driving. However, existing autonomous driving datasets provide well-structured urban roads while ignoring unstructured roadways containing distress, potholes, water puddles, and various kinds of road patches i.e., earthen, gravel etc. To this end, we introduce Road Region Segmentation dataset (R2S100K) -- a large-scale dataset and benchmark for training and evaluation of road segmentation in aforementioned challenging unstructured roadways. R2S100K comprises 100K images extracted from a large and diverse set of video sequences covering more than 1000 KM of roadways. Out of these 100K privacy respecting images, 14,000 images have fine pixel-labeling of road regions, with 86,000 unlabeled images that can be leveraged through semi-supervised learning methods. Alongside, we present an Efficient Data Sampling (EDS) based self-training framework to improve learning by leveraging unlabeled data. Our experimental results demonstrate that the proposed method significantly improves learning methods in generalizability and reduces the labeling cost for semantic segmentation tasks. Our benchmark will be publicly available to facilitate future research at https://r2s100k.github.io/

    Monitoring Dangerous Goods on Roads Using Computer Vision

    Get PDF
    The transportation of dangerous goods on roads poses significant risks, as accidents involving hazardous materials can be deadly and devastating, especially in densely populated areas and confined spaces like tunnels. In the event of an accident, fast emergency response is crucial. Vehicles carrying dangerous goods must be marked with orange hazmat plates, one attached to the front and another to the back of the vehicle. This establishes a universal way to recognize these trucks within ADR member states. This thesis aims to determine if it is possible to use computer vision to automatically monitor dangerous goods in an efficient manner based on the visual information provided by the hazmat plates. This research began with a review of relevant literature and background information. Then, a dataset was created for training deep learning-based object detectors to identify and locate trucks and hazmat plates in images. A method for classifying trucks as carrying dangerous goods was also developed. Finally, the effectiveness of the method was tested on a set of videos of highway traffic that were collected manually. The proposed approach for dangerous goods detection and monitoring in this thesis proved to be highly accurate. The method was evaluated using a set of highway traffic videos with varying levels of complexity, including scenarios where false positives could occur. The proposed method was able to identify all trucks carrying dangerous goods and avoid false positives. The method also proved to be highly efficient, as it could process 26 frames per second on a modern edge device

    Guided rewriting and constraint satisfaction for parallel GPU code generation

    Get PDF
    Graphics Processing Units (GPUs) are notoriously hard to optimise for manually due to their scheduling and memory hierarchies. What is needed are good automatic code generators and optimisers for such parallel hardware. Functional approaches such as Accelerate, Futhark and LIFT leverage a high-level algorithmic Intermediate Representation (IR) to expose parallelism and abstract the implementation details away from the user. However, producing efficient code for a given accelerator remains challenging. Existing code generators depend on the user input to choose a subset of hard-coded optimizations or automated exploration of implementation search space. The former suffers from the lack of extensibility, while the latter is too costly due to the size of the search space. A hybrid approach is needed, where a space of valid implementations is built automatically and explored with the aid of human expertise. This thesis presents a solution combining user-guided rewriting and automatically generated constraints to produce high-performance code. The first contribution is an automatic tuning technique to find a balance between performance and memory consumption. Leveraging its functional patterns, the LIFT compiler is empowered to infer tuning constraints and limit the search to valid tuning combinations only. Next, the thesis reframes parallelisation as a constraint satisfaction problem. Parallelisation constraints are extracted automatically from the input expression, and a solver is used to identify valid rewriting. The constraints truncate the search space to valid parallel mappings only by capturing the scheduling restrictions of the GPU in the context of a given program. A synchronisation barrier insertion technique is proposed to prevent data races and improve the efficiency of the generated parallel mappings. The final contribution of this thesis is the guided rewriting method, where the user encodes a design space of structural transformations using high-level IR nodes called rewrite points. These strongly typed pragmas express macro rewrites and expose design choices as explorable parameters. The thesis proposes a small set of reusable rewrite points to achieve tiling, cache locality, data reuse and memory optimisation. A comparison with the vendor-provided handwritten kernel ARM Compute Library and the TVM code generator demonstrates the effectiveness of this thesis' contributions. With convolution as a use case, LIFT-generated direct and GEMM-based convolution implementations are shown to perform on par with the state-of-the-art solutions on a mobile GPU. Overall, this thesis demonstrates that a functional IR yields well to user-guided and automatic rewriting for high-performance code generation

    Image-based Decision Support Systems: Technical Concepts, Design Knowledge, and Applications for Sustainability

    Get PDF
    Unstructured data accounts for 80-90% of all data generated, with image data contributing its largest portion. In recent years, the field of computer vision, fueled by deep learning techniques, has made significant advances in exploiting this data to generate value. However, often computer vision models are not sufficient for value creation. In these cases, image-based decision support systems (IB-DSSs), i.e., decision support systems that rely on images and computer vision, can be used to create value by combining human and artificial intelligence. Despite its potential, there is only little work on IB-DSSs so far. In this thesis, we develop technical foundations and design knowledge for IBDSSs and demonstrate the possible positive effect of IB-DSSs on environmental sustainability. The theoretical contributions of this work are based on and evaluated in a series of artifacts in practical use cases: First, we use technical experiments to demonstrate the feasibility of innovative approaches to exploit images for IBDSSs. We show the feasibility of deep-learning-based computer vision and identify future research opportunities based on one of our practical use cases. Building on this, we develop and evaluate a novel approach for combining human and artificial intelligence for value creation from image data. Second, we develop design knowledge that can serve as a blueprint for future IB-DSSs. We perform two design science research studies to formulate generalizable principles for purposeful design — one for IB-DSSs and one for the subclass of image-mining-based decision support systems (IM-DSSs). While IB-DSSs can provide decision support based on single images, IM-DSSs are suitable when large amounts of image data are available and required for decision-making. Third, we demonstrate the viability of applying IBDSSs to enhance environmental sustainability by performing life cycle assessments for two practical use cases — one in which the IB-DSS enables a prolonged product lifetime and one in which the IB-DSS facilitates an improvement of manufacturing processes. We hope this thesis will contribute to expand the use and effectiveness of imagebased decision support systems in practice and will provide directions for future research

    Deep learning in crowd counting: A survey

    Get PDF
    Counting high-density objects quickly and accurately is a popular area of research. Crowd counting has significant social and economic value and is a major focus in artificial intelligence. Despite many advancements in this field, many of them are not widely known, especially in terms of research data. The authors proposed a three-tier standardised dataset taxonomy (TSDT). The Taxonomy divides datasets into small-scale, large-scale and hyper-scale, according to different application scenarios. This theory can help researchers make more efficient use of datasets and improve the performance of AI algorithms in specific fields. Additionally, the authors proposed a new evaluation index for the clarity of the dataset: average pixel occupied by each object (APO). This new evaluation index is more suitable for evaluating the clarity of the dataset in the object counting task than the image resolution. Moreover, the authors classified the crowd counting methods from a data-driven perspective: multi-scale networks, single-column networks, multi-column networks, multi-task networks, attention networks and weak-supervised networks and introduced the classic crowd counting methods of each class. The authors classified the existing 36 datasets according to the theory of three-tier standardised dataset taxonomy and discussed and evaluated these datasets. The authors evaluated the performance of more than 100 methods in the past five years on different levels of popular datasets. Recently, progress in research on small-scale datasets has slowed down. There are few new datasets and algorithms on small-scale datasets. The studies focused on large or hyper-scale datasets appear to be reaching a saturation point. The combined use of multiple approaches began to be a major research direction. The authors discussed the theoretical and practical challenges of crowd counting from the perspective of data, algorithms and computing resources. The field of crowd counting is moving towards combining multiple methods and requires fresh, targeted datasets. Despite advancements, the field still faces challenges such as handling real-world scenarios and processing large crowds in real-time. Researchers are exploring transfer learning to overcome the limitations of small datasets. The development of effective algorithms for crowd counting remains a challenging and important task in computer vision and AI, with many opportunities for future research.BHF, AA/18/3/34220Hope Foundation for Cancer Research, RM60G0680GCRF, P202PF11;Sino‐UK Industrial Fund, RP202G0289LIAS, P202ED10, P202RE969Data Science Enhancement Fund, P202RE237Sino‐UK Education Fund, OP202006Fight for Sight, 24NN201Royal Society International Exchanges Cost Share Award, RP202G0230MRC, MC_PC_17171BBSRC, RM32G0178B

    Continuous Estimation of Smoking Lapse Risk from Noisy Wrist Sensor Data Using Sparse and Positive-Only Labels

    Get PDF
    Estimating the imminent risk of adverse health behaviors provides opportunities for developing effective behavioral intervention mechanisms to prevent the occurrence of the target behavior. One of the key goals is to find opportune moments for intervention by passively detecting the rising risk of an imminent adverse behavior. Significant progress in mobile health research and the ability to continuously sense internal and external states of individual health and behavior has paved the way for detecting diverse risk factors from mobile sensor data. The next frontier in this research is to account for the combined effects of these risk factors to produce a composite risk score of adverse behaviors using wearable sensors convenient for daily use. Developing a machine learning-based model for assessing the risk of smoking lapse in the natural environment faces significant outstanding challenges requiring the development of novel and unique methodologies for each of them. The first challenge is coming up with an accurate representation of noisy and incomplete sensor data to encode the present and historical influence of behavioral cues, mental states, and the interactions of individuals with their ever-changing environment. The next noteworthy challenge is the absence of confirmed negative labels of low-risk states and adequate precise annotations of high-risk states. Finally, the model should work on convenient wearable devices to facilitate widespread adoption in research and practice. In this dissertation, we develop methods that account for the multi-faceted nature of smoking lapse behavior to train and evaluate a machine learning model capable of estimating composite risk scores in the natural environment. We first develop mRisk, which combines the effects of various mHealth biomarkers such as stress, physical activity, and location history in producing the risk of smoking lapse using sequential deep neural networks. We propose an event-based encoding of sensor data to reduce the effect of noises and then present an approach to efficiently model the historical influence of recent and past sensor-derived contexts on the likelihood of smoking lapse. To circumvent the lack of confirmed negative labels (i.e., annotated low-risk moments) and only a few positive labels (i.e., sensor-based detection of smoking lapse corroborated by self-reports), we propose a new loss function to accurately optimize the models. We build the mRisk models using biomarker (stress, physical activity) streams derived from chest-worn sensors. Adapting the models to work with less invasive and more convenient wrist-based sensors requires adapting the biomarker detection models to work with wrist-worn sensor data. To that end, we develop robust stress and activity inference methodologies from noisy wrist-sensor data. We first propose CQP, which quantifies wrist-sensor collected PPG data quality. Next, we show that integrating CQP within the inference pipeline improves accuracy-yield trade-offs associated with stress detection from wrist-worn PPG sensors in the natural environment. mRisk also requires sensor-based precise detection of smoking events and confirmation through self-reports to extract positive labels. Hence, we develop rSmoke, an orientation-invariant smoking detection model that is robust to the variations in sensor data resulting from orientation switches in the field. We train the proposed mRisk risk estimation models using the wrist-based inferences of lapse risk factors. To evaluate the utility of the risk models, we simulate the delivery of intelligent smoking interventions to at-risk participants as informed by the composite risk scores. Our results demonstrate the envisaged impact of machine learning-based models operating on wrist-worn wearable sensor data to output continuous smoking lapse risk scores. The novel methodologies we propose throughout this dissertation help instigate a new frontier in smoking research that can potentially improve the smoking abstinence rate in participants willing to quit

    Approximate Computing Survey, Part I: Terminology and Software & Hardware Approximation Techniques

    Full text link
    The rapid growth of demanding applications in domains applying multimedia processing and machine learning has marked a new era for edge and cloud computing. These applications involve massive data and compute-intensive tasks, and thus, typical computing paradigms in embedded systems and data centers are stressed to meet the worldwide demand for high performance. Concurrently, the landscape of the semiconductor field in the last 15 years has constituted power as a first-class design concern. As a result, the community of computing systems is forced to find alternative design approaches to facilitate high-performance and/or power-efficient computing. Among the examined solutions, Approximate Computing has attracted an ever-increasing interest, with research works applying approximations across the entire traditional computing stack, i.e., at software, hardware, and architectural levels. Over the last decade, there is a plethora of approximation techniques in software (programs, frameworks, compilers, runtimes, languages), hardware (circuits, accelerators), and architectures (processors, memories). The current article is Part I of our comprehensive survey on Approximate Computing, and it reviews its motivation, terminology and principles, as well it classifies and presents the technical details of the state-of-the-art software and hardware approximation techniques.Comment: Under Review at ACM Computing Survey

    Blending the Material and Digital World for Hybrid Interfaces

    Get PDF
    The development of digital technologies in the 21st century is progressing continuously and new device classes such as tablets, smartphones or smartwatches are finding their way into our everyday lives. However, this development also poses problems, as these prevailing touch and gestural interfaces often lack tangibility, take little account of haptic qualities and therefore require full attention from their users. Compared to traditional tools and analog interfaces, the human skills to experience and manipulate material in its natural environment and context remain unexploited. To combine the best of both, a key question is how it is possible to blend the material world and digital world to design and realize novel hybrid interfaces in a meaningful way. Research on Tangible User Interfaces (TUIs) investigates the coupling between physical objects and virtual data. In contrast, hybrid interfaces, which specifically aim to digitally enrich analog artifacts of everyday work, have not yet been sufficiently researched and systematically discussed. Therefore, this doctoral thesis rethinks how user interfaces can provide useful digital functionality while maintaining their physical properties and familiar patterns of use in the real world. However, the development of such hybrid interfaces raises overarching research questions about the design: Which kind of physical interfaces are worth exploring? What type of digital enhancement will improve existing interfaces? How can hybrid interfaces retain their physical properties while enabling new digital functions? What are suitable methods to explore different design? And how to support technology-enthusiast users in prototyping? For a systematic investigation, the thesis builds on a design-oriented, exploratory and iterative development process using digital fabrication methods and novel materials. As a main contribution, four specific research projects are presented that apply and discuss different visual and interactive augmentation principles along real-world applications. The applications range from digitally-enhanced paper, interactive cords over visual watch strap extensions to novel prototyping tools for smart garments. While almost all of them integrate visual feedback and haptic input, none of them are built on rigid, rectangular pixel screens or use standard input modalities, as they all aim to reveal new design approaches. The dissertation shows how valuable it can be to rethink familiar, analog applications while thoughtfully extending them digitally. Finally, this thesis’ extensive work of engineering versatile research platforms is accompanied by overarching conceptual work, user evaluations and technical experiments, as well as literature reviews.Die Durchdringung digitaler Technologien im 21. Jahrhundert schreitet stetig voran und neue Geräteklassen wie Tablets, Smartphones oder Smartwatches erobern unseren Alltag. Diese Entwicklung birgt aber auch Probleme, denn die vorherrschenden berührungsempfindlichen Oberflächen berücksichtigen kaum haptische Qualitäten und erfordern daher die volle Aufmerksamkeit ihrer Nutzer:innen. Im Vergleich zu traditionellen Werkzeugen und analogen Schnittstellen bleiben die menschlichen Fähigkeiten ungenutzt, die Umwelt mit allen Sinnen zu begreifen und wahrzunehmen. Um das Beste aus beiden Welten zu vereinen, stellt sich daher die Frage, wie neuartige hybride Schnittstellen sinnvoll gestaltet und realisiert werden können, um die materielle und die digitale Welt zu verschmelzen. In der Forschung zu Tangible User Interfaces (TUIs) wird die Verbindung zwischen physischen Objekten und virtuellen Daten untersucht. Noch nicht ausreichend erforscht wurden hingegen hybride Schnittstellen, die speziell darauf abzielen, physische Gegenstände des Alltags digital zu erweitern und anhand geeigneter Designparameter und Entwurfsräume systematisch zu untersuchen. In dieser Dissertation wird daher untersucht, wie Materialität und Digitalität nahtlos ineinander übergehen können. Es soll erforscht werden, wie künftige Benutzungsschnittstellen nützliche digitale Funktionen bereitstellen können, ohne ihre physischen Eigenschaften und vertrauten Nutzungsmuster in der realen Welt zu verlieren. Die Entwicklung solcher hybriden Ansätze wirft jedoch übergreifende Forschungsfragen zum Design auf: Welche Arten von physischen Schnittstellen sind es wert, betrachtet zu werden? Welche Art von digitaler Erweiterung verbessert das Bestehende? Wie können hybride Konzepte ihre physischen Eigenschaften beibehalten und gleichzeitig neue digitale Funktionen ermöglichen? Was sind geeignete Methoden, um verschiedene Designs zu erforschen? Wie kann man Technologiebegeisterte bei der Erstellung von Prototypen unterstützen? Für eine systematische Untersuchung stützt sich die Arbeit auf einen designorientierten, explorativen und iterativen Entwicklungsprozess unter Verwendung digitaler Fabrikationsmethoden und neuartiger Materialien. Im Hauptteil werden vier Forschungsprojekte vorgestellt, die verschiedene visuelle und interaktive Prinzipien entlang realer Anwendungen diskutieren. Die Szenarien reichen von digital angereichertem Papier, interaktiven Kordeln über visuelle Erweiterungen von Uhrarmbändern bis hin zu neuartigen Prototyping-Tools für intelligente Kleidungsstücke. Um neue Designansätze aufzuzeigen, integrieren nahezu alle visuelles Feedback und haptische Eingaben, um Alternativen zu Standard-Eingabemodalitäten auf starren Pixelbildschirmen zu schaffen. Die Dissertation hat gezeigt, wie wertvoll es sein kann, bekannte, analoge Anwendungen zu überdenken und sie dabei gleichzeitig mit Bedacht digital zu erweitern. Dabei umfasst die vorliegende Arbeit sowohl realisierte technische Forschungsplattformen als auch übergreifende konzeptionelle Arbeiten, Nutzerstudien und technische Experimente sowie die Analyse existierender Forschungsarbeiten

    Novel neural architectures & algorithms for efficient inference

    Get PDF
    In the last decade, the machine learning universe embraced deep neural networks (DNNs) wholeheartedly with the advent of neural architectures such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), transformers, etc. These models have empowered many applications, such as ChatGPT, Imagen, etc., and have achieved state-of-the-art (SOTA) performance on many vision, speech, and language modeling tasks. However, SOTA performance comes with various issues, such as large model size, compute-intensive training, increased inference latency, higher working memory, etc. This thesis aims at improving the resource efficiency of neural architectures, i.e., significantly reducing the computational, storage, and energy consumption of a DNN without any significant loss in performance. Towards this goal, we explore novel neural architectures as well as training algorithms that allow low-capacity models to achieve near SOTA performance. We divide this thesis into two dimensions: \textit{Efficient Low Complexity Models}, and \textit{Input Hardness Adaptive Models}. Along the first dimension, i.e., \textit{Efficient Low Complexity Models}, we improve DNN performance by addressing instabilities in the existing architectures and training methods. We propose novel neural architectures inspired by ordinary differential equations (ODEs) to reinforce input signals and attend to salient feature regions. In addition, we show that carefully designed training schemes improve the performance of existing neural networks. We divide this exploration into two parts: \textsc{(a) Efficient Low Complexity RNNs.} We improve RNN resource efficiency by addressing poor gradients, noise amplifications, and BPTT training issues. First, we improve RNNs by solving ODEs that eliminate vanishing and exploding gradients during the training. To do so, we present Incremental Recurrent Neural Networks (iRNNs) that keep track of increments in the equilibrium surface. Next, we propose Time Adaptive RNNs that mitigate the noise propagation issue in RNNs by modulating the time constants in the ODE-based transition function. We empirically demonstrate the superiority of ODE-based neural architectures over existing RNNs. Finally, we propose Forward Propagation Through Time (FPTT) algorithm for training RNNs. We show that FPTT yields significant gains compared to the more conventional Backward Propagation Through Time (BPTT) scheme. \textsc{(b) Efficient Low Complexity CNNs.} Next, we improve CNN architectures by reducing their resource usage. They require greater depth to generate high-level features, resulting in computationally expensive models. We design a novel residual block, the Global layer, that constrains the input and output features by approximately solving partial differential equations (PDEs). It yields better receptive fields than traditional convolutional blocks and thus results in shallower networks. Further, we reduce the model footprint by enforcing a novel inductive bias that formulates the output of a residual block as a spatial interpolation between high-compute anchor pixels and low-compute cheaper pixels. This results in spatially interpolated convolutional blocks (SI-CNNs) that have better compute and performance trade-offs. Finally, we propose an algorithm that enforces various distributional constraints during training in order to achieve better generalization. We refer to this scheme as distributionally constrained learning (DCL). In the second dimension, i.e., \textit{Input Hardness Adaptive Models}, we introduce the notion of the hardness of any input relative to any architecture. In the first dimension, a neural network allocates the same resources, such as compute, storage, and working memory, for all the inputs. It inherently assumes that all examples are equally hard for a model. In this dimension, we challenge this assumption using input hardness as our reasoning that some inputs are relatively easy for a network to predict compared to others. Input hardness enables us to create selective classifiers wherein a low-capacity network handles simple inputs while abstaining from a prediction on the complex inputs. Next, we create hybrid models that route the hard inputs from the low-capacity abstaining network to a high-capacity expert model. We design various architectures that adhere to this hybrid inference style. Further, input hardness enables us to selectively distill the knowledge of a high-capacity model into a low-capacity model by cleverly discarding hard inputs during the distillation procedure. Finally, we conclude this thesis by sketching out various interesting future research directions that emerge as an extension of different ideas explored in this work
    corecore