    Efficient Visual Computing with Camera RAW Snapshots

    Conventional cameras capture image irradiance (RAW) on a sensor and convert it to RGB images using an image signal processor (ISP). The images can then be used for photography or visual computing tasks in a variety of applications, such as public safety surveillance and autonomous driving. One can argue that, since RAW images contain all the captured information, converting RAW to RGB with an ISP is unnecessary for visual computing. In this paper, we propose a novel ρ-Vision framework that performs high-level semantic understanding and low-level compression directly on RAW images, without the ISP subsystem that has been used for decades. Given the scarcity of available RAW image datasets, we first develop an unpaired CycleR2R network, based on unsupervised CycleGAN, to train modular unrolled ISP and inverse ISP (invISP) models using unpaired RAW and RGB images. We can then flexibly generate simulated RAW images (simRAW) from any existing RGB image dataset and finetune models originally trained in the RGB domain to process real-world camera RAW images. We demonstrate object detection and image compression in the RAW domain using a RAW-domain YOLOv3 and a RAW image compressor (RIC) on camera snapshots. Quantitative results show that RAW-domain task inference achieves better detection accuracy and compression efficiency than its RGB-domain counterpart. Furthermore, the proposed ρ-Vision generalizes across various camera sensors and different task-specific models. An added benefit of ρ-Vision is that it eliminates the need for an ISP, leading to potential reductions in computation and processing time.
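
    The following is a minimal, hypothetical sketch of the simRAW idea described above: an inverse-ISP generator converts labelled RGB images into simulated RAW frames, whose existing annotations are then reused to finetune a RAW-domain model. The `InvISP` and `RawDetector` modules here are illustrative stand-ins, not the authors' CycleR2R or RAW-domain YOLOv3.

```python
# Hypothetical sketch of simRAW generation + RAW-domain finetuning.
# `InvISP` and `RawDetector` are stand-in modules, not the paper's code.
import torch
import torch.nn as nn

class InvISP(nn.Module):
    """Toy inverse ISP: maps 3-channel RGB to a 1-channel Bayer-like RAW plane."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, rgb):
        return self.net(rgb)

class RawDetector(nn.Module):
    """Stand-in for a RAW-domain recognition head (e.g. a YOLO-style backbone)."""
    def __init__(self, num_classes=80):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, num_classes)

    def forward(self, raw):
        return self.head(self.backbone(raw))

inv_isp = InvISP().eval()                 # pretrained RGB->RAW generator (frozen)
detector = RawDetector()                  # model to finetune in the RAW domain

rgb_batch = torch.rand(4, 3, 64, 64)      # any labelled RGB dataset
labels = torch.randint(0, 80, (4,))       # toy class labels reused for simRAW

with torch.no_grad():
    sim_raw = inv_isp(rgb_batch)          # simRAW images inherit RGB annotations

logits = detector(sim_raw)
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()                           # only the RAW-domain model is updated
```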

    A Comprehensive Study on Object Detection Techniques in Unconstrained Environments

    Object detection is a crucial task in computer vision that aims to identify and localize objects in images or videos. Recent advancements in deep learning and Convolutional Neural Networks (CNNs) have significantly improved the performance of object detection techniques. This paper presents a comprehensive study of object detection techniques in unconstrained environments, covering the main challenges, datasets, and state-of-the-art approaches. Additionally, we present a comparative analysis of the methods and highlight their strengths and weaknesses. Finally, we provide future research directions to further improve object detection in unconstrained environments. (Comment: 9 pages, 3 figures, 2 tables.)

    Malarial Diagnosis with Deep Learning and Image Processing Approaches

    Malaria is a mosquito-borne disease that has killed an estimated half a million people worldwide since 2000. Thorough laboratory testing for malaria can be time-consuming and costly, requires trained laboratory personnel, and manual analysis is prone to error. Integrating denoising and image segmentation techniques with a Generative Adversarial Network (GAN) for data augmentation can enhance diagnostic performance. Various deep learning models, such as a CNN, ResNet50, and VGG19, have been used to recognise the Plasmodium parasite in thick blood smear images. The experimental results indicate that the VGG19 model performed best, achieving 98.46% and outperforming the other approaches. This study demonstrates the potential of artificial intelligence to improve the speed and precision of pathogen detection, which is more effective than manual analysis.
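
    As a rough illustration of the classification stage, the sketch below finetunes an ImageNet-pretrained VGG19 for a two-class (parasitised vs. uninfected) decision. The class count, frozen backbone, and toy batch are assumptions for illustration, not details taken from the study.

```python
# A minimal, hypothetical fine-tuning sketch for parasite classification with
# VGG19; the two-class setup and frozen backbone are assumptions.
import torch
import torch.nn as nn
from torchvision import models

model = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
for p in model.features.parameters():
    p.requires_grad = False                      # freeze convolutional backbone

model.classifier[6] = nn.Linear(4096, 2)         # parasitised vs. uninfected

optimizer = torch.optim.Adam(model.classifier[6].parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# Toy batch standing in for pre-processed (denoised, segmented) smear patches.
images = torch.rand(8, 3, 224, 224)
targets = torch.randint(0, 2, (8,))

logits = model(images)
loss = criterion(logits, targets)
loss.backward()
optimizer.step()
```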

    MatSpectNet: Material Segmentation Network with Domain-Aware and Physically-Constrained Hyperspectral Reconstruction

    Achieving accurate material segmentation from 3-channel RGB images is challenging due to the considerable variation in a material's appearance. Hyperspectral images, which are sets of spectral measurements sampled at multiple wavelengths, in principle offer distinctive information for material identification, since the intensity of electromagnetic radiation reflected by a surface depends on the material composition of the scene. However, existing hyperspectral datasets are limited in the number of images and material categories available for dense material segmentation, and collecting and annotating hyperspectral images with a spectral camera is prohibitively expensive. To address this, we propose a new model, MatSpectNet, that segments materials using hyperspectral images recovered from RGB images. The network leverages the principles of colour perception in modern cameras to constrain the reconstructed hyperspectral images and employs domain adaptation to generalise the hyperspectral reconstruction capability from a spectral recovery dataset to material segmentation datasets. The reconstructed hyperspectral images are further filtered using learned response curves and enhanced with human perception. The performance of MatSpectNet is evaluated on the LMD dataset as well as the OpenSurfaces dataset. Our experiments demonstrate that MatSpectNet attains a 1.60% increase in average pixel accuracy and a 3.42% improvement in mean class accuracy compared with the most recent publication. The project code is attached to the supplementary material and will be published on GitHub. (Comment: 7 pages main paper.)
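
    The colour-perception constraint can be pictured as projecting the reconstructed hyperspectral cube back to RGB through spectral response curves and penalising disagreement with the observed RGB input. The sketch below is a hedged illustration of that idea; the band count, random response matrix, and loss are placeholders rather than MatSpectNet's actual formulation.

```python
# Hedged sketch of a camera-response consistency constraint: project a
# reconstructed hyperspectral cube back to RGB with (learned or measured)
# spectral response curves and penalise mismatch with the input RGB.
# The response matrix and loss here are illustrative, not the paper's.
import torch

bands, height, width = 31, 8, 8                  # e.g. 400-700 nm in 10 nm steps
hsi = torch.rand(bands, height, width, requires_grad=True)   # reconstructed cube
rgb_input = torch.rand(3, height, width)          # original RGB observation

# Rows: R, G, B response curves sampled at the 31 bands (assumed, normalised).
response = torch.softmax(torch.randn(3, bands), dim=1)

# Physically motivated projection: each RGB channel is a weighted sum of bands.
rgb_projected = torch.einsum('cb,bhw->chw', response, hsi)

consistency_loss = torch.nn.functional.mse_loss(rgb_projected, rgb_input)
consistency_loss.backward()       # gradient constrains the hyperspectral estimate
```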

    Under construction: infrastructure and modern fiction

    In this dissertation, I argue that infrastructural development, with its technological promises but widening geographic disparities and social and environmental consequences, informs both the narrative content and aesthetic forms of modernist and contemporary Anglophone fiction. Despite its prevalent material forms—roads, rails, pipes, and wires—infrastructure poses particular formal and narrative problems, often receding into the background as mere setting. To address how literary fiction theorizes the experience of infrastructure requires reading “infrastructurally”: that is, paying attention to the seemingly mundane interactions between characters and their built environments. The writers central to this project—James Joyce, William Faulkner, Karen Tei Yamashita, and Mohsin Hamid—take up the representational challenges posed by infrastructure by bringing transit networks, sanitation systems, and electrical grids, and the histories of their development and use, into the foreground. These writers call attention to the political dimensions of built environments, revealing the ways infrastructures produce, reinforce, and perpetuate racial and socioeconomic fault lines. They also attempt to formalize the material relations of power inscribed by and within infrastructure; the novel itself becomes an imaginary counterpart to the technologies of infrastructure, a form that shapes and constrains what types of social action and affiliation are possible.

    Self-supervised AutoFlow

    Recently, AutoFlow has shown promising results on learning a training set for optical flow, but it requires ground truth labels in the target domain to compute its search metric. Observing a strong correlation between the ground truth search metric and self-supervised losses, we introduce self-supervised AutoFlow to handle real-world videos without ground truth labels. Using a self-supervised loss as the search metric, our self-supervised AutoFlow performs on par with AutoFlow on Sintel and KITTI, where ground truth is available, and performs better on the real-world DAVIS dataset. We further explore using self-supervised AutoFlow in the (semi-)supervised setting and obtain competitive results against the state of the art.
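
    To make the search-metric idea concrete, the sketch below shows one common form of self-supervised loss for optical flow: warp the second frame towards the first with the predicted flow and measure the photometric (L1) error. This is a generic illustration under assumed shapes, not the exact loss used by self-supervised AutoFlow.

```python
# Generic photometric self-supervised loss for optical flow, usable as a
# label-free score; names and shapes here are illustrative assumptions.
import torch
import torch.nn.functional as F

def photometric_loss(frame1, frame2, flow):
    """frame*: (B, 3, H, W); flow: (B, 2, H, W) in pixels."""
    b, _, h, w = frame1.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    grid = torch.stack((xs, ys), dim=0).float().expand(b, -1, -1, -1)
    coords = grid + flow                         # where each pixel moves to
    # Normalise coordinates to [-1, 1] for grid_sample.
    coords_x = 2 * coords[:, 0] / (w - 1) - 1
    coords_y = 2 * coords[:, 1] / (h - 1) - 1
    grid_norm = torch.stack((coords_x, coords_y), dim=-1)
    warped = F.grid_sample(frame2, grid_norm, align_corners=True)
    return (frame1 - warped).abs().mean()        # L1 photometric error

f1, f2 = torch.rand(2, 3, 32, 32), torch.rand(2, 3, 32, 32)
flow = torch.zeros(2, 2, 32, 32)
print(photometric_loss(f1, f2, flow))            # score a candidate training set
```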

    Current Challenges in the Application of Algorithms in Multi-institutional Clinical Settings

    The coronavirus disease pandemic has highlighted the importance of artificial intelligence in multi-institutional clinical settings. Particularly in situations where the healthcare system is overloaded and large volumes of data are generated, artificial intelligence has great potential to provide automated solutions and to unlock the untapped potential of acquired data, including in care, logistics, and diagnosis. For example, automated decision-support applications could tremendously help physicians in their daily clinical routine. Especially in radiology and oncology, the exponential growth of imaging data, driven by a rising number of patients, leads to a permanent overload of the healthcare system, making the use of artificial intelligence inevitable. However, the efficient and advantageous application of artificial intelligence in multi-institutional clinical settings faces several challenges, such as accountability and regulation hurdles, implementation challenges, and fairness considerations. This work focuses on the implementation challenges, which include the following questions: how can well-curated and standardized data be ensured, how do algorithms from other domains perform on multi-institutional medical datasets, and how can more robust and generalizable models be trained? The work also examines how to interpret results and whether correlations exist between model performance and the characteristics of the underlying data. Therefore, besides presenting a technical solution for manual data annotation and tagging of medical images, a real-world federated learning implementation for image segmentation is introduced. Experiments on a multi-institutional prostate magnetic resonance imaging dataset show that models trained by federated learning can achieve performance similar to training on pooled data. Furthermore, Natural Language Processing algorithms for semantic textual similarity, text classification, and text summarization are applied to multi-institutional, structured and free-text oncology reports. The results show that performance gains are achieved by customizing state-of-the-art algorithms to the peculiarities of the medical datasets, such as the occurrence of medications, numbers, or dates. In addition, performance is influenced by the characteristics of the data, such as lexical complexity. The generated results, human baselines, and retrospective human evaluations demonstrate that artificial intelligence algorithms have great potential for use in clinical settings. However, due to the difficulty of processing domain-specific data, a performance gap remains between the algorithms and medical experts. In the future, it is therefore essential to improve the interoperability and standardization of data, and to continue developing algorithms that perform well on medical, possibly domain-shifted, data from multiple clinical centers.
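
    To illustrate the federated learning setup at a toy scale, the sketch below runs a few federated-averaging rounds in which each "institution" updates a copy of the global segmentation model on private data and only the weights are shared. The model, the data, and the round count are assumptions for illustration, not the implementation described in the work.

```python
# Toy federated-averaging rounds for image segmentation, illustrating how
# models trained across institutions can approach pooled-data training.
# Client data, the model, and the number of rounds are assumptions.
import copy
import torch
import torch.nn as nn

def make_model():
    # Stand-in segmentation model: per-pixel binary logits.
    return nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(8, 1, 1))

def local_update(global_model, images, masks, epochs=1, lr=1e-3):
    model = copy.deepcopy(global_model)          # local copy; data stays on site
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        loss = nn.functional.binary_cross_entropy_with_logits(model(images), masks)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model.state_dict()                    # only weights leave the site

def federated_average(state_dicts):
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        avg[key] = torch.stack([sd[key] for sd in state_dicts]).mean(dim=0)
    return avg

global_model = make_model()
# Two "institutions" with private MRI-like data that never leaves the site.
clients = [(torch.rand(4, 1, 32, 32), torch.randint(0, 2, (4, 1, 32, 32)).float())
           for _ in range(2)]

for _ in range(3):                               # communication rounds
    local_states = [local_update(global_model, x, y) for x, y in clients]
    global_model.load_state_dict(federated_average(local_states))
```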

    Towards Object-Centric Scene Understanding

    Visual perception for autonomous agents continues to attract community attention due to disruptive technologies and the wide applicability of such solutions. Autonomous Driving (AD), a major application in this domain, promises to revolutionize our approach to mobility while bringing critical advantages in limiting accident fatalities. Fueled by recent advances in Deep Learning (DL), more computer vision tasks are being addressed using a learning paradigm. Deep Neural Networks (DNNs) have consistently pushed performance to unprecedented levels and demonstrated the ability of such approaches to generalize to an increasing number of difficult problems, such as 3D vision tasks. In this thesis, we address two main challenges arising from current approaches: the computational complexity of multi-task pipelines, and the increasing need for manual annotations. On the one hand, AD systems need to perceive the surrounding environment at different levels of detail and, subsequently, take timely actions; this multitasking further limits the time available for each perception task. On the other hand, the need for such systems to generalize to massively diverse situations requires large-scale datasets covering long-tailed cases. Such a requirement renders traditional supervised approaches, despite the data readily available in the AD domain, unsustainable in terms of annotation costs, especially for 3D tasks. Driven by the nature of the AD environment, whose complexity (unlike indoor scenes) is dominated by the presence of other scene elements (mainly cars and pedestrians), we focus on these challenges in object-centric tasks. We then situate our contributions appropriately within the fast-paced literature, supporting our claims with extensive experimental analysis that leverages up-to-date state-of-the-art results and community-adopted benchmarks.