86 research outputs found

    Proceedings of the 8th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2023)

    This volume gathers the papers presented at the Detection and Classification of Acoustic Scenes and Events 2023 Workshop (DCASE2023), Tampere, Finland, during 21–22 September 2023

    Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images

    Stable diffusion, a generative model used in text-to-image synthesis, frequently encounters resolution-induced composition problems when generating images of varying sizes. This issue primarily stems from the model being trained on pairs of single-scale images and their corresponding text descriptions. Moreover, direct training on images of unlimited sizes is unfeasible, as it would require an immense number of text-image pairs and entail substantial computational expenses. To overcome these challenges, we propose a two-stage pipeline named Any-Size-Diffusion (ASD), designed to efficiently generate well-composed images of any size, while minimizing the need for high-memory GPU resources. Specifically, the initial stage, dubbed Any Ratio Adaptability Diffusion (ARAD), leverages a selected set of images with a restricted range of ratios to optimize the text-conditional diffusion model, thereby improving its ability to adjust composition to accommodate diverse image sizes. To support the creation of images at any desired size, we further introduce a technique called Fast Seamless Tiled Diffusion (FSTD) at the subsequent stage. This method allows for the rapid enlargement of the ASD output to any high-resolution size, avoiding seaming artifacts or memory overloads. Experimental results on the LAION-COCO and MM-CelebA-HQ benchmarks demonstrate that ASD can produce well-structured images of arbitrary sizes, cutting down the inference time by 2x compared to the traditional tiled algorithm

    Offline and Online Interactive Frameworks for MRI and CT Image Analysis in the Healthcare Domain : The Case of COVID-19, Brain Tumors and Pancreatic Tumors

    Medical imaging represents the organs, tissues, and structures beneath the outer layers of skin and bone, and stores information on normal anatomical structures for abnormality detection and diagnosis. In this thesis, tools and techniques are used to automate the analysis of medical images, emphasizing the detection of brain tumor anomalies from brain MRIs, COVID-19 infections from lung CT images, and pancreatic tumors from pancreatic CT images. Image processing methods such as filtering and thresholding models, geometry models, graph models, region-based analysis, connected component analysis, machine learning models, and recent deep learning models are used. The following problems for medical images are considered in this research: abnormality detection, abnormal region segmentation, an interactive user interface that presents the detection and segmentation results while receiving feedback from healthcare professionals to improve the analysis procedure, and finally report generation. Complete interactive systems containing conventional models, machine learning, and deep learning methods for different types of medical abnormalities have been proposed and developed in this thesis. The experimental results show promising outcomes that have led to the incorporation of the methods into the proposed solutions, based on observations of the performance metrics and their comparisons. Although separate systems have currently been developed for brain tumors, COVID-19, and pancreatic cancer, the success of the developed systems shows promising potential to combine them into a generalized system for analyzing medical images of different types, collected from any organ, to detect any type of abnormality

    Aiding the conservation of two wooden Buddhist sculptures with 3D imaging and spectroscopic techniques

    The conservation of Buddhist sculptures that were transferred to Europe at some point during their lifetime raises numerous questions: while these objects historically served a religious, devotional purpose, many of them currently belong to museums or private collections, where they are detached from their original context and often adapted to western taste. A scientific study was carried out to address questions from Museo d'Arte Orientale of Turin curators in terms of whether these artifacts might be forgeries or replicas, and how they may have transformed over time. Several analytical techniques were used for materials identification and to study the production technique, ultimately aiming to discriminate the original materials from those added within later interventions

    Deep Learning based Novel Anomaly Detection Methods for Diabetic Retinopathy Screening

    Programa Oficial de Doutoramento en Computación. 5009V01 [Abstract] Computer-Aided Screening (CAS) systems are gaining popularity in disease diagnosis. Modern CAS systems exploit data-driven machine learning algorithms, including supervised and unsupervised methods. In medical imaging, annotating pathological samples is much harder and more time-consuming than annotating healthy samples. Therefore, there is always an abundance of healthy samples and a scarcity of annotated and labelled pathological samples. Unsupervised anomaly detection algorithms can be used to develop CAS systems from the largely available healthy samples, especially when the disease/no-disease decision is important for screening. This thesis proposes unsupervised machine learning methodologies for anomaly detection in retinal fundus images. A novel patch-based image reconstructor architecture for diabetic retinopathy (DR) detection is presented that addresses the shortcomings of standard autoencoder-based reconstructors. Furthermore, a full-size image-based anomaly map generation methodology is presented, in which potential DR lesions can be visualized at the pixel level. Afterwards, a novel methodology is proposed to extend the patch-based architecture to a fully convolutional architecture for one-shot full-size image reconstruction. Finally, a novel methodology for supervised DR classification is proposed that utilizes the anomaly maps

    WiFi-Based Human Activity Recognition Using Attention-Based BiLSTM

    Recently, significant efforts have been made to explore human activity recognition (HAR) techniques that use information gathered by existing indoor wireless infrastructures through WiFi signals, without requiring the monitored subject to carry a dedicated device. The key intuition is that different activities introduce different multi-paths in WiFi signals and generate different patterns in the time series of channel state information (CSI). In this paper, we propose and evaluate a full pipeline for a CSI-based human activity recognition framework for 12 activities in three different spatial environments using two deep learning models: ABiLSTM and CNN-ABiLSTM. Evaluation experiments have demonstrated that the proposed models outperform state-of-the-art models. The experiments also show that the proposed models can be applied to other environments with different configurations, albeit with some caveats. The proposed ABiLSTM model achieves overall accuracies of 94.03%, 91.96%, and 92.59% across the three target environments, while the proposed CNN-ABiLSTM model reaches accuracies of 98.54%, 94.25%, and 95.09% across those same environments

    Quilt-1M: One Million Image-Text Pairs for Histopathology

    Recent accelerations in multi-modal applications have been made possible with the plethora of image and text data available online. However, the scarcity of analogous data in the medical field, specifically in histopathology, has halted comparable progress. To enable similar representation learning for histopathology, we turn to YouTube, an untapped resource of videos, offering 1,087 hours of valuable educational histopathology videos from expert clinicians. From YouTube, we curate Quilt: a large-scale vision-language dataset consisting of 768,826 image and text pairs. Quilt was automatically curated using a mixture of models, including large language models, handcrafted algorithms, human knowledge databases, and automatic speech recognition. In comparison, the most comprehensive datasets curated for histopathology amass only around 200K samples. We combine Quilt with datasets from other sources, including Twitter, research papers, and the internet in general, to create an even larger dataset: Quilt-1M, with 1M paired image-text samples, marking it as the largest vision-language histopathology dataset to date. We demonstrate the value of Quilt-1M by fine-tuning a pre-trained CLIP model. Our model outperforms state-of-the-art models on both zero-shot and linear probing tasks for classifying new histopathology images across 13 diverse patch-level datasets of 8 different sub-pathologies and cross-modal retrieval tasks

    State of the Art on Diffusion Models for Visual Computing

    The field of visual computing is rapidly advancing due to the emergence of generative artificial intelligence (AI), which unlocks unprecedented capabilities for the generation, editing, and reconstruction of images, videos, and 3D scenes. In these domains, diffusion models are the generative AI architecture of choice. Within the last year alone, the literature on diffusion-based tools and applications has seen exponential growth, and relevant papers are published across the computer graphics, computer vision, and AI communities, with new works appearing daily on arXiv. This rapid growth of the field makes it difficult to keep up with all recent developments. The goal of this state-of-the-art report (STAR) is to introduce the basic mathematical concepts of diffusion models and the implementation details and design choices of the popular Stable Diffusion model, as well as to give an overview of important aspects of these generative AI tools, including personalization, conditioning, and inversion, among others. Moreover, we give a comprehensive overview of the rapidly growing literature on diffusion-based generation and editing, categorized by the type of generated medium, including 2D images, videos, 3D objects, locomotion, and 4D scenes. Finally, we discuss available datasets, metrics, open challenges, and social implications. This STAR provides an intuitive starting point to explore this exciting topic for researchers, artists, and practitioners alike

    Occupancy Analysis of the Outdoor Football Fields
