1,942 research outputs found
Efficient Visual Computing with Camera RAW Snapshots
Conventional cameras capture image irradiance (RAW) on a sensor and convert it to RGB images using an image signal
processor (ISP). The images can then be used for photography or visual computing tasks in a variety of applications, such as public
safety surveillance and autonomous driving. One can argue that since RAW images contain all the captured information, the conversion
of RAW to RGB using an ISP is not necessary for visual computing. In this paper, we propose a novel ρ-Vision framework to perform
high-level semantic understanding and low-level compression using RAW images without the ISP subsystem used for decades.
Considering the scarcity of available RAW image datasets, we first develop an unpaired CycleR2R network based on unsupervised
CycleGAN to train modular unrolled ISP and inverse ISP (invISP) models using unpaired RAW and RGB images. We can then flexibly
generate simulated RAW images (simRAW) using any existing RGB image dataset and finetune different models originally trained in
the RGB domain to process real-world camera RAW images. We demonstrate object detection and image compression capabilities in
RAW-domain using a RAW-domain YOLOv3 and a RAW image compressor (RIC) on camera snapshots. Quantitative results reveal that
RAW-domain task inference provides better detection accuracy and compression efficiency than RGB-domain inference.
Furthermore, the proposed ρ-Vision generalizes across various camera sensors and different task-specific models. An added benefit of
employing ρ-Vision is the elimination of the need for an ISP, leading to potential reductions in computation and processing time.
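The unpaired CycleR2R training described above hinges on a cycle-consistency objective between the learned ISP (RAW to RGB) and invISP (RGB to RAW) mappings. The sketch below illustrates that objective with fixed toy mappings standing in for the trainable networks; the gamma-curve stand-ins and function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the learned modular ISP (RAW -> RGB) and inverse ISP
# (RGB -> RAW). Here they are fixed toy tone curves; in CycleR2R they
# are trainable networks optimized with adversarial + cycle losses.
def isp(raw):
    return np.clip(raw, 0, 1) ** (1 / 2.2)   # gamma-like tone curve

def inv_isp(rgb):
    return np.clip(rgb, 0, 1) ** 2.2         # inverse tone curve

def cycle_loss(raw_batch, rgb_batch):
    """L1 cycle consistency: RAW -> RGB -> RAW and RGB -> RAW -> RGB,
    computable from UNPAIRED batches of RAW and RGB images."""
    raw_cycle = inv_isp(isp(raw_batch))
    rgb_cycle = isp(inv_isp(rgb_batch))
    return (np.abs(raw_cycle - raw_batch).mean()
            + np.abs(rgb_cycle - rgb_batch).mean())

raw = rng.uniform(0, 1, size=(4, 32, 32))   # unpaired RAW samples
rgb = rng.uniform(0, 1, size=(4, 32, 32))   # unpaired RGB samples

loss = cycle_loss(raw, rgb)
print(f"cycle loss: {loss:.2e}")  # near zero: the toy pair is an exact inverse
```

With trainable mappings, this loss is minimized jointly with adversarial terms so that neither direction needs pixel-aligned RAW/RGB pairs.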
A Comprehensive Study on Object Detection Techniques in Unconstrained Environments
Object detection is a crucial task in computer vision that aims to identify
and localize objects in images or videos. The recent advancements in deep
learning and Convolutional Neural Networks (CNNs) have significantly improved
the performance of object detection techniques. This paper presents a
comprehensive study of object detection techniques in unconstrained
environments, including various challenges, datasets, and state-of-the-art
approaches. Additionally, we present a comparative analysis of the methods and
highlight their strengths and weaknesses. Finally, we provide some future
research directions to further improve object detection in unconstrained
environments.
Comment: 9 pages, 3 figures, 2 tables
Malarial Diagnosis with Deep Learning and Image Processing Approaches
Malaria is a mosquito-borne disease that has killed an estimated half a million people worldwide since 2000. Thorough laboratory testing for malaria can be time-consuming and costly, requires trained laboratory personnel, and manual analysis is prone to error. Integrating denoising and image segmentation techniques with a Generative Adversarial Network (GAN) for data augmentation can enhance diagnostic performance. Various deep learning models, such as a CNN, ResNet50, and VGG19, have been used to recognise the Plasmodium parasite in thick blood smear images. The experimental results indicate that the VGG19 model performed best, achieving 98.46% accuracy, compared to the other approaches. This study demonstrates the potential of artificial intelligence to improve the speed and precision of pathogen detection beyond manual analysis.
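As a rough illustration of the denoising and segmentation preprocessing the abstract mentions, the sketch below applies a 3x3 median filter and a global mean threshold to a synthetic patch. The synthetic data and the specific filter and threshold choices are assumptions for illustration only, not the study's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(1)

def median3x3(img):
    """Naive 3x3 median denoising; border pixels are left unchanged."""
    out = img.copy()
    h, w = img.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y, x] = np.median(img[y - 1:y + 2, x - 1:x + 2])
    return out

def threshold_segment(img):
    """Global threshold at the mean intensity: a crude binary mask."""
    return img > img.mean()

# Synthetic stand-in for a thick-smear patch: one bright blob (the
# "parasite") on a dark background, corrupted with salt noise.
img = np.zeros((32, 32))
img[10:16, 10:16] = 1.0
noise = rng.integers(0, 32, size=(40, 2))
img[noise[:, 0], noise[:, 1]] = 1.0

mask = threshold_segment(median3x3(img))
print("segmented pixels:", int(mask.sum()))
```

The median filter suppresses most isolated noise pixels, so the thresholded mask retains essentially the blob; a real pipeline would feed such masks or cleaned patches to the CNN classifiers.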
MatSpectNet: Material Segmentation Network with Domain-Aware and Physically-Constrained Hyperspectral Reconstruction
Achieving accurate material segmentation for 3-channel RGB images is
challenging due to the considerable variation in a material's appearance.
Hyperspectral images, which are sets of spectral measurements sampled at
multiple wavelengths, theoretically offer distinct information for material
identification, as variations in intensity of electromagnetic radiation
reflected by a surface depend on the material composition of a scene. However,
existing hyperspectral datasets are impoverished regarding the number of images
and material categories for the dense material segmentation task, and
collecting and annotating hyperspectral images with a spectral camera is
prohibitively expensive. To address this, we propose a new model, MatSpectNet, which segments materials using
hyperspectral images recovered from RGB images.
The network leverages the principles of colour perception in modern
cameras to constrain the reconstructed hyperspectral images and employs the
domain adaptation method to generalise the hyperspectral reconstruction
capability from a spectral recovery dataset to material segmentation datasets.
The reconstructed hyperspectral images are further filtered using learned
response curves and enhanced with human perception. The performance of
MatSpectNet is evaluated on the LMD dataset as well as the OpenSurfaces
dataset. Our experiments demonstrate that MatSpectNet attains a 1.60% increase
in average pixel accuracy and a 3.42% improvement in mean class accuracy
compared with the most recent publication. The project code is attached to the
supplementary material and will be published on GitHub.
Comment: 7 pages main paper
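A minimal sketch of the physical constraint behind this kind of hyperspectral reconstruction: an RGB value is the projection of a spectrum through the camera's response curves, so a recovered hyperspectral image can be checked by re-projecting it and comparing against the observed RGB. The Gaussian response curves and band count below are illustrative assumptions, not MatSpectNet's learned curves.

```python
import numpy as np

rng = np.random.default_rng(0)
B = 31  # number of spectral bands (hypothetically 400-700 nm in 10 nm steps)

# Hypothetical camera response curves: three smooth non-negative
# sensitivities over the B bands (Gaussian bumps at the R/G/B peaks).
bands = np.linspace(0.0, 1.0, B)
response = np.stack([np.exp(-((bands - c) ** 2) / 0.02)
                     for c in (0.8, 0.5, 0.2)])        # shape (3, B)

def project_to_rgb(spectra):
    """Colour formation: each RGB channel integrates the spectrum
    against that channel's response curve."""
    return spectra @ response.T                         # (..., B) -> (..., 3)

def reprojection_error(rgb, recovered):
    """Physical-consistency term: re-imaging the recovered spectra
    must reproduce the observed RGB values."""
    return np.abs(project_to_rgb(recovered) - rgb).mean()

true_spectra = rng.uniform(0, 1, size=(100, B))         # 100 pixels
rgb = project_to_rgb(true_spectra)

err_true = reprojection_error(rgb, true_spectra)        # consistent recovery
err_rand = reprojection_error(rgb, rng.uniform(0, 1, size=(100, B)))
print(f"consistent: {err_true:.2e}, random: {err_rand:.3f}")
```

A consistent recovery incurs no re-projection penalty, while an arbitrary spectrum is penalised; RGB-to-spectrum inversion is underdetermined, which is why such a physical constraint only narrows, rather than fixes, the solution.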
Under construction: infrastructure and modern fiction
In this dissertation, I argue that infrastructural development, with its technological promises but widening geographic disparities and social and environmental consequences, informs both the narrative content and aesthetic forms of modernist and contemporary Anglophone fiction. Despite its prevalent material forms (roads, rails, pipes, and wires), infrastructure poses particular formal and narrative problems, often receding into the background as mere setting. To address how literary fiction theorizes the experience of infrastructure requires reading "infrastructurally": that is, paying attention to the seemingly mundane interactions between characters and their built environments. The writers central to this project (James Joyce, William Faulkner, Karen Tei Yamashita, and Mohsin Hamid) take up the representational challenges posed by infrastructure by bringing transit networks, sanitation systems, and electrical grids, and the histories of their development and use, into the foreground. These writers call attention to the political dimensions of built environments, revealing the ways infrastructures produce, reinforce, and perpetuate racial and socioeconomic fault lines. They also attempt to formalize the material relations of power inscribed by and within infrastructure; the novel itself becomes an imaginary counterpart to the technologies of infrastructure, a form that shapes and constrains what types of social action and affiliation are possible.
It's about Time: Analytical Time Periodization
This paper presents a novel approach to the problem of time periodization, which involves dividing the time span of a complex dynamic phenomenon into periods that enclose different relatively stable states or development trends. The challenge lies in finding a division of the time span that takes into account the diverse behaviours of the phenomenon's multiple components while remaining simple and easy to interpret. Despite the importance of this problem, it has not received sufficient attention in the fields of visual analytics and data science. We use a real-world example from aviation and an additional usage scenario on analysing mobility trends during the COVID-19 pandemic to develop and test an analytical workflow that combines computational and interactive visual techniques. We highlight the differences between the two cases and show how they affect the use of different techniques. Through our investigation of possible variations of the time periodization problem, we discuss the potential of our approach to be used in various applications. Our contributions include defining and investigating a previously neglected problem type, developing a practical and reproducible approach to solving problems of this type, and uncovering its potential for formalization and the development of computational methods.
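One purely computational analogue of the periodization problem, finding a boundary between two relatively stable states, can be sketched as a minimal variance-based change-point search. This is only an illustration of the problem type, not the interactive visual-analytics workflow the paper develops, and the synthetic indicator below is an assumption.

```python
import numpy as np

def best_split(series, min_len=5):
    """Find the index splitting the series into two periods whose summed
    within-period variance is minimal -- a toy analogue of enclosing
    'relatively stable states' in separate periods."""
    n = len(series)
    best_i, best_cost = None, np.inf
    for i in range(min_len, n - min_len):
        cost = series[:i].var() * i + series[i:].var() * (n - i)
        if cost < best_cost:
            best_i, best_cost = i, cost
    return best_i

# Synthetic indicator with a regime change at t = 60, loosely echoing a
# mobility index collapsing at a lockdown date.
rng = np.random.default_rng(2)
series = np.concatenate([rng.normal(1.0, 0.1, 60),
                         rng.normal(0.2, 0.1, 40)])
print("detected period boundary:", best_split(series))
```

Recursing on each resulting segment would yield a multi-period division; the hard part the paper addresses is doing this for many interrelated components while keeping the result interpretable.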
Self-supervised AutoFlow
Recently, AutoFlow has shown promising results on learning a training set for
optical flow, but requires ground truth labels in the target domain to compute
its search metric. Observing a strong correlation between the ground truth
search metric and self-supervised losses, we introduce self-supervised AutoFlow
to handle real-world videos without ground truth labels. Using self-supervised
loss as the search metric, our self-supervised AutoFlow performs on par with
AutoFlow on Sintel and KITTI where ground truth is available, and performs
better on the real-world DAVIS dataset. We further explore using
self-supervised AutoFlow in the (semi-)supervised setting and obtain
competitive results against the state of the art.
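At its core, the self-supervised loss used as a search metric is a photometric consistency measure: warp the second frame by a candidate flow and compare it with the first, with no ground-truth flow involved. The sketch below uses integer flows and wrap-around warping, both simplifying assumptions to keep the toy case exact.

```python
import numpy as np

def warp(img, flow):
    """Backward-warp img by an integer flow field (nearest neighbour,
    wrap-around borders -- simplifications for this toy example)."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = (ys + flow[..., 1].astype(int)) % h
    src_x = (xs + flow[..., 0].astype(int)) % w
    return img[src_y, src_x]

def photometric_loss(frame1, frame2, flow):
    """Self-supervised metric: how well frame2, warped by a candidate
    flow, reconstructs frame1. No ground-truth labels are needed."""
    return np.abs(frame1 - warp(frame2, flow)).mean()

# frame2 is frame1 shifted 3 pixels to the right, so the true backward
# flow from frame1 to frame2 is (+3, 0) everywhere.
rng = np.random.default_rng(3)
frame1 = rng.uniform(size=(16, 16))
frame2 = np.roll(frame1, 3, axis=1)

true_flow = np.zeros((16, 16, 2)); true_flow[..., 0] = 3
zero_flow = np.zeros((16, 16, 2))
print("loss(true flow):", photometric_loss(frame1, frame2, true_flow))
print("loss(zero flow):", photometric_loss(frame1, frame2, zero_flow))
```

Because the correct flow drives this loss to its minimum without labels, it can stand in for the ground-truth search metric when tuning a training set on unlabeled real-world video.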
Current Challenges in the Application of Algorithms in Multi-institutional Clinical Settings
The Coronavirus disease pandemic has highlighted the importance of artificial intelligence in multi-institutional clinical settings. Particularly in situations where the healthcare system is overloaded, and a lot of data is generated, artificial intelligence has great potential to provide automated solutions and to unlock the untapped potential of acquired data. This includes the areas of care, logistics, and diagnosis. For example, automated decision support applications could tremendously help physicians in their daily clinical routine. Especially in radiology and oncology, the exponential growth of imaging data, triggered by a rising number of patients, leads to a permanent overload of the healthcare system, making the use of artificial intelligence inevitable. However, the efficient and advantageous application of artificial intelligence in multi-institutional clinical settings faces several challenges, such as accountability and regulation hurdles, implementation challenges, and fairness considerations. This work focuses on the implementation challenges, which include the following questions: How to ensure well-curated and standardized data, how do algorithms from other domains perform on multi-institutional medical datasets, and how to train more robust and generalizable models? Also, questions of how to interpret results and whether there exist correlations between the performance of the models and the characteristics of the underlying data are part of the work. Therefore, besides presenting a technical solution for manual data annotation and tagging for medical images, a real-world federated learning implementation for image segmentation is introduced. Experiments on a multi-institutional prostate magnetic resonance imaging dataset showcase that models trained by federated learning can achieve similar performance to training on pooled data. 
Furthermore, Natural Language Processing algorithms for the tasks of semantic textual similarity, text classification, and text summarization are applied to multi-institutional, structured and free-text oncology reports. The results show that performance gains are achieved by customizing state-of-the-art algorithms to the peculiarities of the medical datasets, such as the occurrence of medications, numbers, or dates. In addition, performance influences are observed depending on the characteristics of the data, such as lexical complexity. The generated results, human baselines, and retrospective human evaluations demonstrate that artificial intelligence algorithms have great potential for use in clinical settings. However, due to the difficulty of processing domain-specific data, there still exists a performance gap between the algorithms and the medical experts. In the future, it is therefore essential to improve the interoperability and standardization of data, as well as to continue working on algorithms so that they perform well on medical, and possibly domain-shifted, data from multiple clinical centers.
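The federated learning setup described above can be sketched, in its simplest federated-averaging (FedAvg) form, as a size-weighted average of locally trained parameters. The toy models and dataset sizes below are assumptions for illustration, not the work's actual segmentation networks.

```python
import numpy as np

def fed_avg(client_params, client_sizes):
    """One FedAvg aggregation round: each parameter tensor is averaged
    across clients, weighted by local dataset size. Only parameters are
    exchanged; the raw data never leaves an institution."""
    total = sum(client_sizes)
    n_params = len(client_params[0])
    return [sum(p[k] * (n / total) for p, n in zip(client_params, client_sizes))
            for k in range(n_params)]

# Three hypothetical institutions, each with a locally trained toy model
# (a 4x4 weight matrix and a bias vector) and a different dataset size.
rng = np.random.default_rng(4)
clients = [[rng.normal(size=(4, 4)), rng.normal(size=4)] for _ in range(3)]
sizes = [100, 300, 600]

global_model = fed_avg(clients, sizes)
print("aggregated parameter shapes:", [p.shape for p in global_model])
```

In practice the aggregated model is broadcast back to the sites for further local training, and the round repeats; this is how performance comparable to pooled-data training can be approached without centralizing patient data.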
Towards Object-Centric Scene Understanding
Visual perception for autonomous agents continues to attract community attention due to the disruptive technologies and the wide applicability of such solutions. Autonomous Driving (AD), a major application in this domain, promises to revolutionize our approach to mobility while bringing critical advantages in limiting accident fatalities.
Fueled by recent advances in Deep Learning (DL), more computer vision tasks are being addressed using a learning paradigm. Deep Neural Networks (DNNs) succeeded consistently in pushing performances to unprecedented levels and demonstrating the ability of such approaches to generalize to an increasing number of difficult problems, such as 3D vision tasks.
In this thesis, we address two main challenges arising from the current approaches. Namely, the computational complexity of multi-task pipelines, and the increasing need for manual annotations. On the one hand, AD systems need to perceive the surrounding environment on different levels of detail and, subsequently, take timely actions. This multitasking further limits the time available for each perception task. On the other hand, the need for universal generalization of such systems to massively diverse situations requires the use of large-scale datasets covering long-tailed cases. Such requirement renders the use of traditional supervised approaches, despite the data readily available in the AD domain, unsustainable in terms of annotation costs, especially for 3D tasks.
Driven by the nature of the AD environment, whose complexity (unlike that of indoor scenes) is dominated by the presence of other scene elements (mainly cars and pedestrians), we focus on the above-mentioned challenges in object-centric tasks. We then situate our contributions appropriately in a fast-paced literature, while supporting our claims with extensive experimental analysis leveraging up-to-date state-of-the-art results and community-adopted benchmarks.