Cashew dataset generation using augmentation and RaLSGAN and a transfer learning based tinyML approach towards disease detection
Cashew is one of the most extensively consumed nuts in the world, and it is
also known as a cash crop. A tree may generate a substantial yield in a few
months and has a lifetime of around 70 to 80 years. Yet, alongside these
benefits, its cultivation faces certain constraints. Apart from parasites and
algae, anthracnose is the most common disease affecting the trees. In cashew,
the dense canopy of the tree makes the disease harder to diagnose than in
short crops. Hence, we present a
dataset that exclusively consists of healthy and diseased cashew leaves and
fruits. The dataset is authenticated by adding RGB color transformation to
highlight diseased regions, photometric and geometric augmentations, and
RaLSGAN to enlarge the initial collection of images and boost performance in
real-time situations when working with a constrained dataset. Further, transfer
learning is used to test the classification efficiency of the dataset using
algorithms such as MobileNet and Inception. TensorFlow Lite is utilized to
deploy these models for real-time disease diagnosis using drones. Several
post-training optimization strategies are applied, and the resulting model
sizes are compared. The models prove their effectiveness by delivering high
accuracy (up to 99%) together with reduced memory and latency, making them
ideal for use in resource-constrained applications.
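The memory savings from post-training optimization can be illustrated with a minimal sketch (hypothetical weight arrays, not the authors' actual models): quantizing float32 weights to int8, as TensorFlow Lite's post-training quantization does, cuts storage roughly fourfold at the cost of a small, bounded reconstruction error.

```python
import numpy as np

# Hypothetical float32 weight tensor standing in for one trained layer.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.1, size=(256, 128)).astype(np.float32)

# Affine post-training quantization to int8: w ~ scale * (q - zero_point).
w_min, w_max = weights.min(), weights.max()
scale = (w_max - w_min) / 255.0
zero_point = np.round(-w_min / scale) - 128
q = np.clip(np.round(weights / scale + zero_point), -128, 127).astype(np.int8)

# Dequantize to measure the accuracy cost of the compression.
deq = (q.astype(np.float32) - zero_point) * scale
max_err = np.abs(weights - deq).max()

print(weights.nbytes // q.nbytes)  # 4x smaller in memory
print(max_err < scale)             # error bounded by one quantization step
```

In practice the TFLite converter also quantizes activations and fuses operations, so measured on-device savings differ from this back-of-the-envelope figure.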
Image restoration by the super-resolution method using convolutional neural networks
The main goal of the super-resolution method is to create a higher-resolution image from a lower-resolution image. High-resolution images provide a high pixel density and hence more detail from the original scene. The need for high resolution is widespread in computer vision techniques, pattern recognition applications, and general image analysis. However, high-resolution images are not always available, because the conversion and processing methods require extremely powerful processors, and the equipment for capturing high-resolution images is expensive. These problems can be overcome with relatively inexpensive image processing algorithms, which led to the concept of super-resolution. Its advantage is lower cost, since existing low-resolution imaging systems are readily available. High resolution is essential in medical imaging for diagnosis. Many applications, such as surveillance, forensics, and satellite imaging, require zooming into a specific image area, where high resolution becomes essential. The method presented in this paper uses a convolutional neural network to produce super-resolution images and directly performs the conversion from a low-resolution image to an image close to the original. To reduce inference time, the proposed method performs most computations in low-resolution space, while its downsampling does not lead to information loss. The main task of the neural network is to reconstruct the distorted image and to find the ideal reconstruction function; with it, a neural network of simple structure creates high-quality images with better performance in terms of resolution and signal-to-noise ratio, while spending less time on image restoration. During the experiments, we determined an algorithm by which the proposed neural network can reconstruct images with different types of distortion.
The super-resolution method is implemented using the Python 3.6 programming language and the TensorFlow and TensorLayer modules for convolutional neural networks. Plots of the signal-to-noise ratio, structural similarity, and loss are obtained using the tensorboardX module.
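The idea of keeping most computation in low-resolution space can be sketched with the sub-pixel (depth-to-space) upsampling step commonly used in such super-resolution networks. The following is an illustrative numpy sketch, not the paper's exact architecture:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange an (H, W, C*r*r) tensor into (H*r, W*r, C).

    The network computes r*r feature channels per low-resolution pixel;
    this lossless rearrangement then produces the high-resolution image,
    so all convolutions can run in the cheap low-resolution space.
    """
    h, w, crr = x.shape
    c = crr // (r * r)
    x = x.reshape(h, w, r, r, c)
    x = x.transpose(0, 2, 1, 3, 4)  # interleave sub-pixel rows and columns
    return x.reshape(h * r, w * r, c)

# A 2x2 low-resolution map with 4 channels becomes a 4x4 one-channel image.
lr = np.arange(16, dtype=np.float32).reshape(2, 2, 4)
hr = pixel_shuffle(lr, 2)
print(hr.shape)  # (4, 4, 1)
```

Because the operation is a pure rearrangement, every input value appears exactly once in the output, which matches the abstract's claim that the reduced-resolution processing loses no information.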
A Systematic Literature Review of Drone Utility in Railway Condition Monitoring
Raj Bridgelall is the program director for the Upper Great Plains Transportation Institute (UGPTI) Center for Surface Mobility Applications & Real-time Simulation environments (SMARTSeSM).
Drones have recently become a new tool in railway inspection and monitoring (RIM) worldwide, but there is still a lack of information about their specific benefits and costs. This study conducts a systematic literature review (SLR) of the applications, opportunities, and challenges of using drones for RIM. The SLR technique yielded 47 articles filtered from 7,900 publications from 2014 to 2022. The SLR found that the key motivations for using drones in RIM are to reduce costs, improve safety, save time, improve mobility, increase flexibility, and enhance reliability. Nearly all the applications fit into the categories of defect identification, situation assessment, rail network mapping, infrastructure asset monitoring, track condition monitoring, and obstruction detection. The authors assessed the open technical, safety, and regulatory challenges. The authors also contributed a cost analysis framework, identified factors that affect drone performance in RIM, and offered implications for new theories, management, and impacts to society.
The authors conducted this work with support from North Dakota State University and the Mountain-Plains Consortium, a University Transportation Center funded by the U.S. Department of Transportation.
https://www.ugpti.org/about/staff/viewbio.php?id=7
Computer Vision Detection of Explosive Ordnance: A High-Performance 9N235/9N210 Cluster Submunition Detector
The detection of explosive ordnance (EO) objects is experiencing a period of innovation driven by the convergence of new technologies, including artificial intelligence (AI) and machine learning, open-source intelligence (OSINT) processing, and remote mobility capabilities such as drones and robotics. Advances are being made on at least two tracks: the automated searching of photographic image archives, and the real-time detection of objects in the field. Different technologies are responsive to different types of EO detection challenges, such as objects that are buried, semi-buried, or partially damaged. Computer vision, a type of AI that enables computers and systems to derive meaningful information from digital images, videos, and other visual inputs and to take actions or make recommendations based on that information, is a promising technology that can greatly enhance humanitarian mine action (HMA), as well as support evidentiary documentation of the use of EO prohibited under international humanitarian law. This article describes a computer vision algorithm creation workflow developed to automate the detection of the 9N235/9N210 cluster submunition, a heavily deployed munition in the Ukraine conflict. The six-step process described here incorporates photography, photogrammetry, 3D rendering, 3D printing, and deep convolutional neural networks. The resulting high-performance detector can be deployed for searching and filtering images generated as part of OSINT investigations and, soon, for real-time field detection objectives.
Scene representation and matching for visual localization in hybrid camera scenarios
Scene representation and matching are crucial steps in a variety of tasks ranging from 3D reconstruction to virtual/augmented/mixed reality applications and robotics. While approaches exist that tackle these tasks, they mostly overlook the efficiency of the scene representation, which is fundamental in resource-constrained systems and for increasing computing speed. They also normally assume projective cameras, while performance on systems based on other camera geometries remains suboptimal. This dissertation contributes a new efficient scene representation method that dramatically reduces the number of 3D points. The approach sets up an optimization problem for the automated selection of the most relevant points to retain. This leads to a constrained quadratic program, which is solved optimally with a newly introduced variant of the sequential minimal optimization method. In addition, a new initialization approach is introduced for fast convergence. Extensive experimentation on public benchmark datasets demonstrates that the approach quickly produces a compressed scene representation while delivering accurate pose estimates.
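The point-selection step above leads to a constrained quadratic program solved by a sequential-minimal-optimization variant. A generic SMO-style solver for a simplex-constrained QP (an illustrative sketch under assumed notation, not the dissertation's actual formulation) updates two coordinates at a time so the equality constraint is preserved exactly:

```python
import numpy as np

def smo_simplex_qp(Q, c, iters=2000, seed=0):
    """Minimize 1/2 x^T Q x - c^T x over {x >= 0, sum(x) = 1} using
    SMO-style pairwise updates: each step moves mass between two
    coordinates, which keeps the equality constraint satisfied."""
    n = len(c)
    x = np.full(n, 1.0 / n)               # feasible uniform start
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        i, j = rng.choice(n, size=2, replace=False)
        g = Q @ x - c                      # gradient of the objective
        curv = Q[i, i] + Q[j, j] - 2 * Q[i, j]
        if curv <= 1e-12:
            continue                       # flat direction, skip the pair
        d = -(g[i] - g[j]) / curv          # optimal step along e_i - e_j
        d = np.clip(d, -x[i], x[j])        # keep both coordinates >= 0
        x[i] += d
        x[j] -= d
    return x

# Small illustrative instance (hypothetical relevance model for 5 points).
rng = np.random.default_rng(1)
A = rng.normal(size=(5, 5))
Q = A.T @ A + np.eye(5)                    # positive definite
c = rng.normal(size=5)
x = smo_simplex_qp(Q, c)
print(x.sum())  # ~1.0: a soft selection weight per candidate 3D point
```

Each pairwise step is the exact minimizer along its feasible direction, so the objective never increases; the actual method in the dissertation adds problem-specific working-set selection and initialization.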
The dissertation also contributes new methods for scene matching that go beyond projective cameras. Alternative camera geometries, such as fisheye cameras, produce images with very high distortion, making current image feature point detectors and descriptors less effective, since they are designed for projective cameras. New deep-learning-based methods are introduced to address this problem: feature detectors and descriptors that overcome distortion effects and perform feature matching more effectively between pairs of fisheye images, as well as between hybrid pairs of fisheye and perspective images. Due to the limited availability of fisheye-perspective image datasets, three datasets were collected for training and testing the methods. The results demonstrate an increase in detection and matching rates, outperforming current state-of-the-art methods.
Fearless Luminance Adaptation: A Macro-Micro-Hierarchical Transformer for Exposure Correction
Photographs taken with less-than-ideal exposure settings often display poor
visual quality. Since the correction procedures vary significantly, it is
difficult for a single neural network to handle all exposure problems.
Moreover, the inherent limitations of convolutions hinder the model's ability
to restore faithful color and details in extremely over-/under-exposed regions.
To overcome these limitations, we propose a Macro-Micro-Hierarchical
transformer, which consists of a macro attention to capture long-range
dependencies, a micro attention to extract local features, and a hierarchical
structure for coarse-to-fine correction. Specifically, the complementary
macro-micro attention designs enhance locality while allowing global
interactions. The hierarchical structure enables the network to correct
exposure errors of different scales layer by layer. Furthermore, we propose a
contrast constraint and couple it seamlessly in the loss function, where the
corrected image is pulled towards the positive sample and pushed away from the
dynamically generated negative samples. Thus the remaining color distortion and
loss of detail can be removed. We also extend our method as an image enhancer
for low-light face recognition and low-light semantic segmentation. Experiments
demonstrate that our approach obtains more attractive results than
state-of-the-art methods, both quantitatively and qualitatively.
Comment: Accepted by ACM MM 202
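The contrast constraint described above, which pulls the corrected image toward the positive sample and pushes it away from dynamically generated negatives, has the shape of a standard triplet-style contrastive term. The following is a minimal numpy sketch with hypothetical pixel-space distances, not the paper's exact loss:

```python
import numpy as np

def contrast_loss(anchor, positive, negatives, margin=1.0):
    """Triplet-style contrast term: pull the corrected image (anchor)
    toward the well-exposed positive, push it away from negatives."""
    d_pos = np.mean((anchor - positive) ** 2)
    d_negs = [np.mean((anchor - n) ** 2) for n in negatives]
    # Hinge: penalize only when a negative is closer than positive + margin.
    return sum(max(0.0, d_pos - d_n + margin) for d_n in d_negs)

# Toy 4x4 "images": corrected output, ground truth, two degraded negatives.
rng = np.random.default_rng(0)
gt = rng.random((4, 4))
corrected = gt + 0.01 * rng.standard_normal((4, 4))
negatives = [np.clip(gt * 2.0, 0, 1), gt * 0.3]  # over-/under-exposed
loss = contrast_loss(corrected, gt, negatives)
```

In the paper the comparison is made in a feature space rather than raw pixels, but the pull/push structure is the same.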
Share and multiply: modeling communication and generated traffic in private WhatsApp groups
Group-based communication is a highly popular communication paradigm, which is especially prominent in mobile instant messaging (MIM) applications such as WhatsApp. Chat groups in MIM applications facilitate the sharing of various types of messages (e.g., text, voice, image, video) among a large number of participants. Since each message has to be transmitted to every other member of the group, which multiplies the traffic, group communication has a massive impact on the underlying networks. However, most chat groups are private, and network operators cannot obtain deep insights into MIM communication via network measurements due to end-to-end encryption. Thus, traffic generation is not well understood, given that it depends on the sizes of communication groups, the speed of communication, and the exchanged message types. In this work, we provide a large data set of 5,956 private WhatsApp chat histories, which contains over 76 million messages from more than 117,000 users. We describe and model the properties of chat groups and users, and the communication within these chat groups, which gives unprecedented insights into private MIM communication. In addition, we conduct exemplary measurements for the most popular message types, which empower the provided models to estimate the traffic over time in a chat group.
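The traffic multiplication described above can be sketched as a simple model (hypothetical message sizes, not the paper's measured values): a message of size s sent to a group of n members must be delivered to each of the other n - 1 members.

```python
# Estimate generated downstream traffic for a group chat, assuming each
# message is delivered once to every member except the sender.
def group_traffic(message_sizes_bytes, group_size):
    return sum(s * (group_size - 1) for s in message_sizes_bytes)

# Hypothetical example: three ~1 KB texts and one ~200 KB image
# shared in a 50-member group.
msgs = [1_000, 1_000, 1_000, 200_000]
print(group_traffic(msgs, 50))  # 203000 * 49 = 9947000 bytes
```

Real deployments complicate this with server-side fan-out, media caching, and retransmissions, which is why the paper builds its models from measured chat histories rather than this idealized count.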
Efficient image-based rendering
Recent advancements in real-time ray tracing and deep learning have significantly enhanced the realism of computer-generated images. However, conventional 3D computer graphics (CG) can still be time-consuming and resource-intensive, particularly when creating photo-realistic simulations of complex or animated scenes. Image-based rendering (IBR) has emerged as an alternative approach that utilizes pre-captured images from the real world to generate realistic images in real time, eliminating the need for extensive modeling. Although IBR has its advantages, it faces challenges in providing the same level of control over scene attributes as traditional CG pipelines and in accurately reproducing complex scenes and objects with different materials, such as transparent objects. This thesis addresses these issues by harnessing the power of deep learning and incorporating the fundamental principles of graphics and physically based rendering. It offers an efficient solution that enables interactive manipulation of real-world dynamic scenes captured from sparse views, lighting positions, and times, as well as a physically based approach that facilitates accurate reproduction of the view-dependency effects resulting from the interaction between transparent objects and their surrounding environment. Additionally, this thesis develops a visibility metric that can identify artifacts in reconstructed IBR images without observing the reference image, thereby contributing to the design of an effective IBR acquisition pipeline. Lastly, a perception-driven rendering technique is developed to provide high-fidelity visual content in virtual reality displays while retaining computational efficiency.
CNN Injected Transformer for Image Exposure Correction
Capturing images with incorrect exposure settings fails to deliver a
satisfactory visual experience. Only when the exposure is properly set, can the
color and details of the images be appropriately preserved. Previous exposure
correction methods based on convolutions often produce exposure deviation in
images as a consequence of the restricted receptive field of convolutional
kernels. This issue arises because convolutions are not capable of capturing
long-range dependencies in images accurately. To overcome this challenge, we
can apply the Transformer to address the exposure correction problem,
leveraging its capability in modeling long-range dependencies to capture global
representation. However, solely relying on the window-based Transformer leads
to visually disturbing blocking artifacts due to the application of
self-attention in small patches. In this paper, we propose a CNN Injected
Transformer (CIT) to harness the individual strengths of CNN and Transformer
simultaneously. Specifically, we construct the CIT by utilizing a window-based
Transformer to exploit the long-range interactions among different regions in
the entire image. Within each CIT block, we incorporate a channel attention
block (CAB) and a half-instance normalization block (HINB) to assist the
window-based self-attention to acquire the global statistics and refine local
features. In addition to the hybrid architecture design for exposure
correction, we apply a set of carefully formulated loss functions to improve
the spatial coherence and rectify potential color deviations. Extensive
experiments demonstrate that our image exposure correction method outperforms
state-of-the-art approaches in terms of both quantitative and qualitative
metrics.
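The channel attention block (CAB) mentioned above follows the common squeeze-and-excitation pattern: pool each channel to a global statistic, pass the statistics through a small bottleneck MLP, and rescale the channels. The following is a minimal numpy sketch with hypothetical weights, not the paper's trained parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """Squeeze-and-excitation style channel attention on an (H, W, C) map:
    global-average-pool per channel, run a bottleneck MLP, and rescale
    each channel by its attention weight in (0, 1)."""
    s = x.mean(axis=(0, 1))                     # squeeze: global statistics
    a = sigmoid(w2 @ np.maximum(w1 @ s, 0.0))   # excitation: bottleneck MLP
    return x * a                                # reweight channels

# Toy feature map with 8 channels and a reduction-factor-2 bottleneck.
rng = np.random.default_rng(0)
x = rng.random((4, 4, 8))
w1 = rng.standard_normal((4, 8)) * 0.1   # hypothetical bottleneck weights
w2 = rng.standard_normal((8, 4)) * 0.1
y = channel_attention(x, w1, w2)
print(y.shape)  # (4, 4, 8)
```

This is exactly the "global statistics" the abstract says the CAB supplies to the window-based self-attention, which on its own only sees local patches.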
The Journal of Conventional Weapons Destruction Issue 27.2
Updates on recent enhancements to IMAS. Food security and its connection to mine action as it applies to Ukraine. Digital EORE as a small NGO in mine action. A case study on moving beyond do no harm in environmental mainstreaming in mine action. Efforts of JICA and CMAC in fostering South-South cooperation in mine action. UAV Lidar imaging in mine action to detect and map minefields in Angola. Land disputes and rights in mine action. Computer vision detection of explosive ordnance.