Proceedings of the 8th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2023)
This volume gathers the papers presented at the Detection and Classification of Acoustic Scenes and Events 2023 Workshop (DCASE2023), held in Tampere, Finland, on 21–22 September 2023.
Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images
Stable diffusion, a generative model used in text-to-image synthesis,
frequently encounters resolution-induced composition problems when generating
images of varying sizes. This issue primarily stems from the model being
trained on pairs of single-scale images and their corresponding text
descriptions. Moreover, direct training on images of unlimited sizes is
unfeasible, as it would require an immense number of text-image pairs and
entail substantial computational expenses. To overcome these challenges, we
propose a two-stage pipeline named Any-Size-Diffusion (ASD), designed to
efficiently generate well-composed images of any size, while minimizing the
need for high-memory GPU resources. Specifically, the initial stage, dubbed Any
Ratio Adaptability Diffusion (ARAD), leverages a selected set of images with a
restricted range of ratios to optimize the text-conditional diffusion model,
thereby improving its ability to adjust composition to accommodate diverse
image sizes. To support the creation of images at any desired size, we further
introduce a technique called Fast Seamless Tiled Diffusion (FSTD) at the
subsequent stage. This method allows for the rapid enlargement of the ASD
output to any high-resolution size, avoiding seaming artifacts or memory
overloads. Experimental results on the LAION-COCO and MM-CelebA-HQ benchmarks
demonstrate that ASD can produce well-structured images of arbitrary sizes,
cutting down the inference time by 2x compared to the traditional tiled
algorithm.
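The tiled enlargement described in this abstract depends on covering a large canvas with overlapping tiles. The following is a minimal sketch of that covering step only, not of FSTD itself; the function names, the 512-pixel tile size, and the 64-pixel overlap are illustrative assumptions and do not come from the paper.

```python
def tile_origins(length, tile, overlap):
    """Start offsets of tiles covering [0, length) along one axis.

    Neighbouring tiles share at least `overlap` pixels. All values are
    hypothetical parameters chosen for illustration.
    """
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    if length <= tile:
        return [0]  # a single tile already covers the axis
    stride = tile - overlap
    origins = list(range(0, length - tile, stride))
    origins.append(length - tile)  # final tile sits flush with the edge
    return origins


def tile_boxes(width, height, tile=512, overlap=64):
    """(left, top, right, bottom) boxes covering a width x height canvas."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_origins(height, tile, overlap)
        for x in tile_origins(width, tile, overlap)
    ]
```

Each box could then be processed independently and blended in the overlap region, which is one common way to keep memory use bounded; how ASD actually avoids seams is described in the paper, not here.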
Offline and Online Interactive Frameworks for MRI and CT Image Analysis in the Healthcare Domain : The Case of COVID-19, Brain Tumors and Pancreatic Tumors
Medical imaging depicts the organs, tissues, and structures beneath the skin and bones, and records information on normal anatomy for abnormality detection and diagnosis. In this thesis, tools and techniques are used to automate the analysis of medical images, with emphasis on detecting brain tumors in brain MRIs, COVID-19 infections in lung CT images, and pancreatic tumors in pancreatic CT images. The image processing methods used include filtering and thresholding models, geometry models, graph models, region-based analysis, connected component analysis, machine learning models, and recent deep learning models. The research considers the following problems for medical images: abnormality detection, abnormal region segmentation, an interactive user interface that presents the detection and segmentation results while receiving feedback from healthcare professionals to improve the analysis procedure, and finally report generation. Complete interactive systems that combine conventional models, machine learning, and deep learning methods for different types of medical abnormalities are proposed and developed in this thesis. The experimental results are promising, and the methods incorporated into the proposed solutions were selected by observing and comparing their performance metrics. Although separate systems have currently been developed for brain tumors, COVID-19, and pancreatic cancer, their success shows promising potential for combining them into a generalized system that analyzes different types of medical images, collected from any organ, to detect any type of abnormality.
Aiding the conservation of two wooden Buddhist sculptures with 3D imaging and spectroscopic techniques
The conservation of Buddhist sculptures that were transferred to Europe at some point during their lifetime raises numerous questions: while these objects historically served a religious, devotional purpose, many of them currently belong to museums or private collections, where they are detached from their original context and often adapted to western taste.
A scientific study was carried out to address questions from the curators of the Museo d'Arte Orientale of Turin about whether these artifacts might be forgeries or replicas, and how they may have been transformed over time. Several analytical techniques were used to identify materials and to study the production technique, ultimately aiming to distinguish the original materials from those added in later interventions.
Deep Learning based Novel Anomaly Detection Methods for Diabetic Retinopathy Screening
Official Doctoral Program in Computing (Programa Oficial de Doutoramento en Computación). 5009V01. [Abstract] Computer-Aided Screening (CAS) systems are gaining popularity in disease diagnosis. Modern CAS systems exploit data-driven machine learning algorithms, including supervised and unsupervised methods. In medical imaging, annotating pathological samples is much harder and more time-consuming than annotating healthy samples. As a result, healthy samples are abundant, while annotated and labelled pathological samples are scarce. Unsupervised anomaly detection algorithms can be used to develop CAS systems from the widely available healthy samples, especially when a disease/no-disease decision is what matters for screening. This thesis proposes unsupervised machine learning methodologies for anomaly detection in retinal fundus images. A novel patch-based image reconstructor architecture for diabetic retinopathy (DR) detection is presented that addresses the shortcomings of standard autoencoder-based reconstructors. Furthermore, a full-size, image-based anomaly map generation methodology is presented, in which potential DR lesions can be visualized at the pixel level. Afterwards, a novel methodology is proposed to extend the patch-based architecture to a fully convolutional architecture for one-shot full-size image reconstruction. Finally, a novel methodology for supervised DR classification is proposed that utilizes the anomaly maps.
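As an illustration of the pixel-level anomaly map idea, here is a minimal sketch that assumes the map is the absolute difference between an input image and its reconstruction. This is one common formulation, and the thesis may compute its maps differently. The function names and the max-based image score are hypothetical.

```python
def anomaly_map(image, reconstruction):
    """Per-pixel absolute reconstruction error.

    `image` and `reconstruction` are equally sized 2D lists of intensities.
    A reconstructor trained only on healthy samples is assumed to rebuild
    healthy regions well, so large errors hint at potential lesions.
    """
    if len(image) != len(reconstruction) or any(
        len(a) != len(b) for a, b in zip(image, reconstruction)
    ):
        raise ValueError("image and reconstruction must have the same shape")
    return [
        [abs(p - q) for p, q in zip(row_img, row_rec)]
        for row_img, row_rec in zip(image, reconstruction)
    ]


def anomaly_score(amap):
    """Image-level score: the largest per-pixel error (an assumed choice)."""
    return max(max(row) for row in amap)
```

For example, `anomaly_map([[10, 20], [30, 200]], [[10, 22], [30, 120]])` gives `[[0, 2], [0, 80]]`, so the bottom-right pixel would stand out in a visualization.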
WiFi-Based Human Activity Recognition Using Attention-Based BiLSTM
Recently, significant efforts have been made to explore human activity recognition (HAR) techniques that use information gathered by existing indoor wireless infrastructures through WiFi signals, without requiring the monitored subject to carry a dedicated device. The key intuition is that different activities introduce different multi-paths in WiFi signals and generate different patterns in the time series of channel state information (CSI). In this paper, we propose and evaluate a full pipeline for a CSI-based human activity recognition framework for 12 activities in three different spatial environments using two deep learning models: ABiLSTM and CNN-ABiLSTM. Evaluation experiments have demonstrated that the proposed models outperform state-of-the-art models. The experiments also show that the proposed models can be applied to other environments with different configurations, albeit with some caveats. The proposed ABiLSTM model achieves an overall accuracy of 94.03%, 91.96%, and 92.59% across the three target environments, while the proposed CNN-ABiLSTM model reaches an accuracy of 98.54%, 94.25%, and 95.09% across those same environments.
Quilt-1M: One Million Image-Text Pairs for Histopathology
Recent accelerations in multi-modal applications have been made possible with
the plethora of image and text data available online. However, the scarcity of
analogous data in the medical field, specifically in histopathology, has halted
comparable progress. To enable similar representation learning for
histopathology, we turn to YouTube, an untapped resource of videos, offering
hours of valuable educational histopathology videos from expert
clinicians. From YouTube, we curate Quilt: a large-scale vision-language
dataset consisting of image and text pairs. Quilt was automatically
curated using a mixture of models, including large language models, handcrafted
algorithms, human knowledge databases, and automatic speech recognition. In
comparison, the most comprehensive datasets curated for histopathology amass
only around K samples. We combine Quilt with datasets from other sources,
including Twitter, research papers, and the internet in general, to create an
even larger dataset: Quilt-1M, with 1M paired image-text samples, marking it
as the largest vision-language histopathology dataset to date. We demonstrate
the value of Quilt-1M by fine-tuning a pre-trained CLIP model. Our model
outperforms state-of-the-art models on both zero-shot and linear probing tasks
for classifying new histopathology images across diverse patch-level
datasets of different sub-pathologies and cross-modal retrieval tasks.
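The zero-shot classification mentioned in this abstract can be sketched as follows, assuming the usual CLIP-style recipe: embed the image and one text prompt per class, then pick the class whose text embedding has the highest cosine similarity to the image embedding. The embeddings, labels, and function names below are hypothetical placeholders, not the authors' code or data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_u * norm_v)


def zero_shot_label(image_emb, text_embs):
    """Return the label whose text embedding is closest to the image.

    `text_embs` maps a class label to the embedding of its text prompt;
    in practice both embeddings would come from the fine-tuned encoders.
    """
    return max(text_embs, key=lambda label: cosine(image_emb, text_embs[label]))
```

With toy two-dimensional embeddings, `zero_shot_label([1.0, 0.0], {"tumor": [0.9, 0.1], "normal": [0.0, 1.0]})` selects `"tumor"`, since that prompt points in nearly the same direction as the image embedding.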
State of the Art on Diffusion Models for Visual Computing
The field of visual computing is rapidly advancing due to the emergence of
generative artificial intelligence (AI), which unlocks unprecedented
capabilities for the generation, editing, and reconstruction of images, videos,
and 3D scenes. In these domains, diffusion models are the generative AI
architecture of choice. Within the last year alone, the literature on
diffusion-based tools and applications has seen exponential growth and relevant
papers are published across the computer graphics, computer vision, and AI
communities with new works appearing daily on arXiv. This rapid growth of the
field makes it difficult to keep up with all recent developments. The goal of
this state-of-the-art report (STAR) is to introduce the basic mathematical
concepts of diffusion models, implementation details and design choices of the
popular Stable Diffusion model, and to give an overview of important aspects of
these generative AI tools, including personalization, conditioning, and inversion,
among others. Moreover, we give a comprehensive overview of the rapidly growing
literature on diffusion-based generation and editing, categorized by the type
of generated medium, including 2D images, videos, 3D objects, locomotion, and
4D scenes. Finally, we discuss available datasets, metrics, open challenges,
and social implications. This STAR provides an intuitive starting point to
explore this exciting topic for researchers, artists, and practitioners alike.