62 research outputs found

    3D Detection and Characterisation of ALMA Sources through Deep Learning

    We present a Deep-Learning (DL) pipeline developed for the detection and characterization of astronomical sources within simulated Atacama Large Millimeter/submillimeter Array (ALMA) data cubes. The pipeline is composed of six DL models: a Convolutional Autoencoder for source detection within the spatial domain of the integrated data cubes, a Recurrent Neural Network (RNN) for denoising and peak detection within the frequency domain, and four Residual Neural Networks (ResNets) for source characterization. The combination of spatial and frequency information improves completeness while decreasing spurious signal detection. To train and test the pipeline, we developed a simulation algorithm able to generate realistic ALMA observations, i.e. both sky model and dirty cubes. The algorithm always simulates a central source surrounded by fainter ones scattered within the cube. Some sources were spatially superimposed in order to test the pipeline's deblending capabilities. The detection performance of the pipeline was compared to that of other methods, achieving significant improvements. Source morphologies are detected with subpixel accuracy, with mean residual errors of 10^-3 pixel (0.1 mas) on positions and 10^-1 mJy/beam on flux estimations. Projection angles and flux densities are also recovered within 10% of the true values for 80% and 73% of all sources in the test set, respectively. While our pipeline is fine-tuned for ALMA data, the technique is applicable to other interferometric observatories, such as SKA, LOFAR, VLBI, and VLTI.
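    The abstract names the building blocks but not their wiring; below is a minimal, hypothetical PyTorch sketch of how a spatial convolutional autoencoder and a frequency-domain RNN could be combined in the way described. All layer sizes, shapes, and module names are illustrative assumptions, not the authors' architecture.

```python
# A sketch of the two detection stages: a convolutional autoencoder over the
# spatially integrated cube, and an RNN over per-pixel frequency spectra.
import torch
import torch.nn as nn

class SpatialAutoencoder(nn.Module):
    """Scores source candidates in the 2D integrated map."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1), nn.Sigmoid(),
        )

    def forward(self, x):                   # x: (B, 1, H, W) integrated map
        return self.decoder(self.encoder(x))   # per-pixel detection probability

class SpectralRNN(nn.Module):
    """Denoises a 1D spectrum and scores each channel for emission peaks."""
    def __init__(self, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(input_size=1, hidden_size=hidden,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, s):                   # s: (B, n_channels, 1) spectrum
        out, _ = self.rnn(s)
        return torch.sigmoid(self.head(out)).squeeze(-1)   # per-channel peak probability

# Candidates from the spatial stage are kept only if the spectral stage also
# finds a peak: combining the two domains suppresses spurious detections.
cube = torch.randn(2, 128, 64, 64)                  # (batch, channels, H, W) dirty cube
det_map = SpatialAutoencoder()(cube.sum(dim=1, keepdim=True))
spectrum = cube[:, :, 32, 32].unsqueeze(-1)         # spectrum at one candidate pixel
peaks = SpectralRNN()(spectrum)
```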

    Development of a real-time classifier for the identification of the Sit-To-Stand motion pattern

    The Sit-to-Stand (STS) movement has significant importance in clinical practice, since it is an indicator of lower limb functionality. As an optimal trade-off between cost and accuracy, accelerometers have recently been used to synchronously recognise the STS transition in various Human Activity Recognition-based tasks. However, beyond the mere identification of the entire action, a major challenge remains the recognition of clinically relevant phases inside the STS motion pattern, due to the intrinsic variability of the movement. This work presents the development process of a deep-learning model aimed at recognising specific clinically valid phases in the STS, relying on a pool of 39 young and healthy participants performing the task under self-paced (SP) and controlled-speed (CT) conditions. The movements were registered using a total of 6 inertial sensors, and the accelerometric data were labelled into four sequential STS phases according to the Ground Reaction Force profiles acquired through a force plate. The optimised architecture combined convolutional and recurrent neural networks into a hybrid approach and was able to correctly identify the four STS phases, both under SP and CT movements, relying on the single sensor placed on the chest. The overall accuracy estimate (median [95% confidence interval]) for the hybrid architecture was 96.09 [95.37 – 96.56] in SP trials and 95.74 [95.39 – 96.21] in CT trials. Moreover, the prediction delays (45 ± 33 ms) were compatible with the temporal characteristics of the dataset, sampled at 10 Hz (100 ms). These results support the implementation of the proposed model in the development of digital rehabilitation solutions able to synchronously recognise the STS movement pattern, with the aim of effectively evaluating and correcting its execution.
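    As a sketch of the hybrid convolutional-recurrent design described above, the following PyTorch snippet shows 1D convolutions extracting local features from a tri-axial chest accelerometer stream and an LSTM assigning one of the four STS phases to each timestep. Layer sizes and window lengths are illustrative assumptions, not the authors' configuration.

```python
# Hybrid CNN + RNN per-timestep phase classifier (illustrative sizes).
import torch
import torch.nn as nn

class HybridSTSClassifier(nn.Module):
    def __init__(self, n_phases=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(3, 32, kernel_size=5, padding=2), nn.ReLU(),   # 3-axis acceleration
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.lstm = nn.LSTM(64, 64, batch_first=True)
        self.out = nn.Linear(64, n_phases)          # one of four sequential STS phases

    def forward(self, x):                           # x: (B, T, 3), T samples at 10 Hz
        f = self.conv(x.transpose(1, 2)).transpose(1, 2)   # (B, T, 64)
        h, _ = self.lstm(f)
        return self.out(h)                          # per-timestep phase logits

logits = HybridSTSClassifier()(torch.randn(8, 50, 3))      # 5 s windows -> (8, 50, 4)
```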

    Vision-based context-aware assistance for minimally invasive surgery

    A context-aware surgical system is a system that can collect surgical data and analyze the operating environment to guide responses for surgeons at any given time, improving the efficiency, augmenting the performance, and lowering the risk of minimally invasive surgery (MIS). It enables various applications throughout the whole patient care pathway, such as medical resource scheduling and report generation. Automatic understanding of surgical activities is an essential component for building a context-aware surgical system. However, analyzing surgical activities is a challenging task, because the operating environment is considerably complicated. Previous methods either require additional devices or have limited ability to capture discriminating features from surgical data. This thesis aims to solve the challenges of surgical activity analysis and provide context-aware assistance for MIS. In our study, we consider surgical visual data as the only input, because videos and images carry high-dimensional and representative features and are much easier to access than other data formats, for example kinematic information or motion trajectories. Following the granularity of surgical activity in a top-down manner, we first propose an attention-based multi-task framework to assess the expertise level and evaluate six standards for surgeons with different skill levels in three fundamental surgical robotic tasks, namely suturing, knot tying, and needle passing. Second, we present a symmetric dilated convolution structure embedded with a self-attention kernel to jointly detect and segment fine-grained surgical gestures in surgical videos. In addition, we use a transformer encoder-decoder architecture with reinforcement learning to generate surgical instructions based on images. Overall, this thesis develops a series of novel deep learning frameworks to extract high-level semantic information from surgical video and image content to assist MIS, pushing the boundaries towards an integrated context-aware system through the patient care pathway.
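    The second contribution, a symmetric dilated convolution structure with a self-attention kernel, can be illustrated with a minimal PyTorch sketch: dilation increases on the way in and decreases symmetrically on the way out, with self-attention relating frames across the whole sequence. Dimensions, layer counts, and the assumed per-frame feature input are illustrative, not the thesis model.

```python
# Symmetric dilated temporal convolutions with a self-attention block in the middle.
import torch
import torch.nn as nn

class SymmetricDilatedSegmenter(nn.Module):
    def __init__(self, in_dim=128, dim=64, n_gestures=10):
        super().__init__()
        dilations = [1, 2, 4, 8]
        self.proj = nn.Conv1d(in_dim, dim, 1)
        # increasing dilation on the way in, decreasing on the way out (symmetric)
        self.enc = nn.ModuleList(nn.Conv1d(dim, dim, 3, padding=d, dilation=d) for d in dilations)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.dec = nn.ModuleList(nn.Conv1d(dim, dim, 3, padding=d, dilation=d)
                                 for d in reversed(dilations))
        self.head = nn.Conv1d(dim, n_gestures, 1)

    def forward(self, x):                      # x: (B, T, in_dim) per-frame video features
        h = self.proj(x.transpose(1, 2))
        for conv in self.enc:
            h = torch.relu(conv(h)) + h        # residual dilated convolutions
        a = h.transpose(1, 2)
        a, _ = self.attn(a, a, a)              # self-attention over the full sequence
        h = a.transpose(1, 2)
        for conv in self.dec:
            h = torch.relu(conv(h)) + h
        return self.head(h).transpose(1, 2)    # per-frame gesture logits

logits = SymmetricDilatedSegmenter()(torch.randn(2, 300, 128))   # (2, 300, 10)
```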

    Phenotyping functional brain dynamics: A deep learning perspective on psychiatry

    This thesis explores the potential of deep learning (DL) techniques combined with multi-site functional magnetic resonance imaging (fMRI) to enable automated diagnosis and biomarker discovery for psychiatric disorders. This marks a shift from the convention in the field of applying standard machine learning techniques on hand-crafted features from a single cohort. To enable this, we have focused on three main strategies: utilizing minimally pre-processed data to maintain spatio-temporal dynamics, developing sample-efficient DL models, and applying emerging DL training techniques like self-supervised and transfer learning to leverage large population-based datasets. Our empirical results suggest that DL models can sometimes outperform existing machine learning methods in diagnosing Autism Spectrum Disorder (ASD) and Major Depressive Disorder (MDD) from resting-state fMRI data, despite the smaller datasets and the high data dimensionality. Nonetheless, the generalization performance of these models is currently insufficient for clinical use, raising questions about the feasibility of applying supervised DL for diagnosis or biomarker discovery due to the highly heterogeneous nature of the disorders. Our findings suggest that normative modeling on functional brain dynamics provides a promising alternative to the current paradigm
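    The normative-modelling alternative mentioned in the conclusion can be made concrete with a toy NumPy sketch: fit a per-feature normative distribution on a healthy reference cohort and express each patient as deviation scores. A real normative model would regress on covariates such as age, sex, and acquisition site; everything below is an illustrative assumption.

```python
# Toy normative model: per-feature mean/std on a reference cohort, z-scores for patients.
import numpy as np

rng = np.random.default_rng(0)
reference = rng.normal(size=(500, 100))      # (subjects, brain-dynamics features), healthy cohort
patients = rng.normal(loc=0.3, size=(20, 100))

mu, sigma = reference.mean(axis=0), reference.std(axis=0)
z = (patients - mu) / sigma                  # per-feature deviation from the norm
extreme = (np.abs(z) > 1.96).mean(axis=1)    # fraction of atypical features per patient
print(extreme.round(2))
```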

    Sequential learning and shared representation for sensor-based human activity recognition

    Human activity recognition based on sensor data has rapidly attracted considerable research attention due to its wide range of applications, including senior monitoring, rehabilitation, and healthcare. These applications require accurate systems of human activity recognition to track and understand human behaviour. Yet, developing such accurate systems poses critical challenges: such systems struggle to learn from temporal sequential sensor data due to the variations and complexity of human activities. The main challenges are accuracy and robustness, owing to the diversity and similarity of human activities, the skewed distribution of human activities, and the lack of a rich quantity of well-curated human activity data. This thesis addresses these challenges by developing robust deep sequential learning models that boost the performance of human activity recognition, handle imbalanced class problems, and reduce the need for large amounts of annotated data. This thesis develops a set of new networks specifically designed for the challenges in building better HAR systems compared to the existing methods. First, this thesis proposes robust, sequential deep learning models to accurately recognise human activities and boost the performance of human activity recognition systems, against the current methods, on data collected from smart home and wearable sensors. The proposed methods integrate convolutional neural networks and different attention mechanisms to efficiently process human activity data and capture significant information for recognising human activities. Next, the thesis proposes methods to address imbalanced class problems in human activity recognition systems. Joint learning of sequential deep learning algorithms, i.e., long short-term memory and convolutional neural networks, is proposed to boost the performance of human activity recognition, particularly for infrequent human activities. In addition, we also propose a data-level solution to imbalanced class problems by extending the synthetic minority over-sampling technique (SMOTE), which we name iSMOTE, to accurately label the generated synthetic samples. These methods have enhanced the results for minority human activities and outperformed the current state-of-the-art methods. In this thesis, sequential deep learning networks are also proposed to reduce the dependency on a rich quantity of well-curated human activity data through transfer learning. A multi-domain learning network is proposed to process data from multiple domains, transfer knowledge across different but related domains of human activities, and mitigate isolated learning paradigms using a shared representation. The advantage of the proposed method is, firstly, to reduce the need and effort for labelled data in the target domain: the proposed network uses training data of the target domain with restricted size together with the full training data of the source domain, yet provides better performance than using the full training data in a single-domain setting. Secondly, the proposed method can be used for small datasets. Lastly, the proposed multi-domain learning network reduces training time by rendering a generic model for related domains, compared to fitting a model for each domain separately. In addition, the thesis proposes a self-supervised model to reduce the need for a considerable amount of annotated human activity data. The self-supervised method is pre-trained on unlabelled data and fine-tuned on a small amount of labelled data for supervised learning. The proposed self-supervised pre-training network renders human activity representations that are semantically meaningful and provides a good initialization for supervised fine-tuning. The developed network enhances the performance of human activity recognition while minimizing the need for a considerable amount of labelled data. The proposed models are evaluated on multiple public benchmark datasets of sensor-based human activities and compared with existing state-of-the-art methods. The experimental results show that the proposed networks boost the performance of human activity recognition systems
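    The data-level rebalancing builds on SMOTE, whose interpolation step is sketched below in NumPy: each synthetic minority sample lies on the segment between a minority sample and one of its k nearest minority neighbours. The iSMOTE labelling of the generated samples, the thesis's actual contribution, is not reproduced here.

```python
# Basic SMOTE interpolation over the minority class only (illustrative).
import numpy as np

def smote(X, n_new, k=5, seed=0):
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X))
        d = np.linalg.norm(X - X[i], axis=1)
        nn = np.argsort(d)[1:k + 1]          # k nearest minority neighbours (skip self)
        j = rng.choice(nn)
        lam = rng.random()                   # random point on the segment
        out.append(X[i] + lam * (X[j] - X[i]))
    return np.array(out)

minority = np.random.default_rng(1).normal(size=(30, 8))   # rare-activity feature vectors
synthetic = smote(minority, n_new=70)                      # rebalance the class
```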

    Automated liver tissues delineation based on machine learning techniques: A survey, current trends and future orientations

    There is no denying how machine learning and computer vision have grown in recent years. Their greatest advantages lie in their automation, suitability, and ability to generate astounding results in a matter of seconds in a reproducible manner, aided by the ubiquitous advancements in the computing capabilities of current graphics processing units and the highly efficient implementation of such techniques. Hence, in this paper, we survey the key studies published between 2014 and 2020, showcasing the different machine learning algorithms researchers have used to segment the liver, hepatic tumors, and hepatic vasculature structures. We divide the surveyed studies based on the tissue of interest (hepatic parenchyma, hepatic tumors, or hepatic vessels), highlighting the studies that tackle more than one task simultaneously. Additionally, the machine learning algorithms are classified as either supervised or unsupervised, and further partitioned if the number of works that fall under a certain scheme is significant. Moreover, different datasets and challenges found in the literature and on websites, containing masks of the aforementioned tissues, are thoroughly discussed, highlighting the organizers' original contributions and those of other researchers. The metrics used extensively in the literature are also reviewed, stressing their relevance to the task at hand. Finally, critical challenges and future directions are emphasized for innovative researchers to tackle, exposing gaps that need addressing, such as the scarcity of studies on the vessel segmentation challenge and why this absence needs to be dealt with in an accelerated manner.
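    Although the abstract does not name the metrics it reviews, the Dice similarity coefficient is the measure most liver-segmentation studies report, so a minimal NumPy version is sketched here purely as an illustration.

```python
# Dice similarity coefficient between two binary segmentation masks.
import numpy as np

def dice(pred, truth):
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    return 2.0 * inter / denom if denom else 1.0   # 1.0 for two empty masks

a = np.zeros((64, 64), dtype=int); a[10:40, 10:40] = 1   # predicted liver mask
b = np.zeros((64, 64), dtype=int); b[15:45, 12:42] = 1   # ground-truth mask
print(round(dice(a, b), 3))
```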

    Deployment of Deep Neural Networks on Dedicated Hardware Accelerators

    Deep Neural Networks (DNNs) have established themselves as powerful tools for a wide range of complex tasks, for example computer vision or natural language processing. DNNs are notoriously demanding on compute resources, and as a result, dedicated hardware accelerators are developed for all kinds of use cases. Different accelerators provide solutions from hyper-scaling cloud environments for the training of DNNs to inference devices in embedded systems. They implement intrinsics for complex operations directly in hardware; a common example is intrinsics for matrix multiplication. However, there exists a gap between the ecosystems of applications for deep learning practitioners and hardware accelerators: how DNNs can efficiently utilize the specialized hardware intrinsics is still mainly defined by human hardware and software experts. Methods to automatically utilize hardware intrinsics in DNN operators are a subject of active research. Existing literature often works with transformation-driven approaches, which aim to establish a sequence of program rewrites and data-layout transformations such that the hardware intrinsic can be used to compute the operator. However, the complexity of this task has not yet been explored, especially for less frequently used operators like Capsule Routing. And not only is the implementation of DNN operators with intrinsics challenging; their optimization on the target device is also difficult. Hardware-in-the-loop tools are often used for this problem: they use latency measurements of implementation candidates to find the fastest one. However, specialized accelerators can have memory and programming limitations, so that not every arithmetically correct implementation is a valid program for the accelerator. These invalid implementations can lead to unnecessarily long optimization times. This work investigates the complexity of transformation-driven processes to automatically embed hardware intrinsics into DNN operators. It is explored with a custom, graph-based intermediate representation (IR). While operators like Fully Connected Layers can be handled with reasonable effort, increasing operator complexity or advanced data-layout transformations can lead to scaling issues. Building on these insights, this work proposes a novel method to embed hardware intrinsics into DNN operators, based on a dataflow analysis. The dataflow embedding method allows the exploration of how intrinsics and operators match without explicit transformations. From the results, it can derive the data layout and program structure necessary to compute the operator with the intrinsic. A prototype implementation for a dedicated hardware accelerator demonstrates state-of-the-art performance for a wide range of convolutions, while being agnostic to the data layout. For some operators in the benchmark, the presented method can also generate alternative implementation strategies to improve hardware utilization, resulting in a geo-mean speed-up of ×2.813 while reducing the memory footprint. Lastly, by curating the initial set of possible implementations for the hardware-in-the-loop optimization, the median time-to-solution is reduced by a factor of ×2.40. At the same time, the possibility of prolonged searches due to a bad initial set of implementations is reduced, improving the optimization's robustness by ×2.35
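    The transformation-driven embedding the text describes can be illustrated with the textbook example of mapping a convolution onto a matrix-multiplication intrinsic: an im2col data-layout transformation rewrites the operator so that a single matmul, the part an accelerator intrinsic would execute, computes it. The NumPy sketch below is a generic illustration, not the thesis's IR or dataflow method; a real backend would also tile the operands to the intrinsic's fixed dimensions.

```python
# Convolution as matrix multiplication via an im2col data-layout transformation.
import numpy as np

def im2col(x, kh, kw):
    """Unfold (C, H, W) into a (H_out*W_out, C*kh*kw) matrix of patches."""
    c, h, w = x.shape
    ho, wo = h - kh + 1, w - kw + 1
    cols = np.empty((ho * wo, c * kh * kw))
    for i in range(ho):
        for j in range(wo):
            cols[i * wo + j] = x[:, i:i + kh, j:j + kw].ravel()
    return cols, ho, wo

x = np.random.rand(3, 8, 8)            # input feature map (C, H, W)
w = np.random.rand(16, 3, 3, 3)        # 16 filters of shape (C, kh, kw)
cols, ho, wo = im2col(x, 3, 3)
y = cols @ w.reshape(16, -1).T         # <- the part a matmul intrinsic would run
y = y.T.reshape(16, ho, wo)            # back to (K, H_out, W_out) layout
```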

    Automatic sleep staging of EEG signals: recent development, challenges, and future directions

    Modern deep learning holds great potential to transform clinical studies of human sleep. Teaching a machine to carry out routine tasks would mean a tremendous reduction in workload for clinicians. Sleep staging, a fundamental step in sleep practice, is a suitable task for this and is the focus of this article. Recently, automatic sleep-staging systems have been trained to mimic manual scoring, leading to performance similar to that of human sleep experts, at least on the scoring of healthy subjects. Despite tremendous progress, we have not seen automatic sleep scoring adopted widely in clinical environments. This review aims to provide the shared view of the authors on the most recent state-of-the-art developments in automatic sleep staging, the challenges that still need to be addressed, and the future directions needed for automatic sleep scoring to achieve clinical value
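    A minimal PyTorch sketch of the sequence-to-sequence design common to many of the systems such reviews cover: a CNN encodes each 30-second EEG epoch and a recurrent layer models stage transitions across neighbouring epochs. All dimensions are illustrative assumptions.

```python
# Epoch encoder (CNN) + sequence model (GRU) for per-epoch sleep stage logits.
import torch
import torch.nn as nn

class SleepStager(nn.Module):
    def __init__(self, n_stages=5):
        super().__init__()
        self.epoch_encoder = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=50, stride=6), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),                 # one feature vector per epoch
        )
        self.sequence_model = nn.GRU(32, 64, batch_first=True, bidirectional=True)
        self.head = nn.Linear(128, n_stages)         # W, N1, N2, N3, REM

    def forward(self, x):                            # x: (B, n_epochs, samples_per_epoch)
        b, n, t = x.shape
        f = self.epoch_encoder(x.reshape(b * n, 1, t)).reshape(b, n, 32)
        h, _ = self.sequence_model(f)
        return self.head(h)                          # per-epoch stage logits

logits = SleepStager()(torch.randn(2, 20, 3000))     # 20 epochs of 30 s at 100 Hz
```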

    Data-driven depth and 3D architectural layout estimation of an interior environment from monocular panoramic input

    Recent years have seen significant interest in the automatic 3D reconstruction of indoor scenes, leading to a distinct and very active sub-field within 3D reconstruction. The main objective is to convert rapidly measured data representing real-world indoor environments into models encompassing geometric, structural, and visual abstractions. This thesis focuses on the particular subject of extracting geometric information from single panoramic images, using either visual data alone or sparse registered depth information. The appeal of this setup lies in the efficiency and cost-effectiveness of data acquisition using 360° images. The challenge, however, is that creating a comprehensive model from mostly visual input is extremely difficult, due to noise, missing data, and clutter. My research has concentrated on leveraging prior information, in the form of architectural and data-driven priors derived from large annotated datasets, to develop end-to-end deep learning solutions for specific tasks in the structured reconstruction pipeline. My first contribution consists in a deep neural network architecture for estimating a depth map from a single monocular indoor panorama, operating directly on the equirectangular projection. Leveraging the characteristics of indoor 360-degree images and recognizing the impact of gravity on indoor scene design, the network efficiently encodes the scene into vertical spherical slices. By exploiting long- and short-term relationships among these slices, it recovers an equirectangular depth map directly from the corresponding RGB image. My second contribution generalizes the approach to handle multimodal input, also covering the situation in which the equirectangular input image is paired with a sparse depth map, as provided by common capture setups. Depth is inferred using an efficient single-branch network with a dynamic gating system, processing both dense visual data and sparse geometric data. Additionally, a new augmentation strategy enhances the model's robustness to various types of sparsity, including those from structured light sensors and LiDAR setups. While the first two contributions focus on per-pixel geometric information, my third contribution addresses the recovery of the 3D shape of permanent room surfaces from a single panoramic image. Unlike previous methods, this approach tackles the problem in 3D, expanding the reconstruction space. It employs a graph convolutional network to directly infer the room structure as a 3D mesh, deforming a graph-encoded tessellated sphere mapped to the spherical panorama. Gravity-aligned features are actively incorporated using a projection layer with multi-head self-attention, and specialized losses guide plausible solutions in the presence of clutter and occlusions. The benchmarks on publicly available data show that all three methods provide significant improvements over the state-of-the-art.
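    The vertical-slice idea of the first contribution can be sketched in a few lines of PyTorch: the equirectangular image is split into gravity-aligned columns, each column is embedded, and a bidirectional recurrent layer relates slices around the full 360° view before a depth column is decoded for each. Sizes and layers are assumptions, not the thesis architecture.

```python
# Vertical-slice depth estimation from an equirectangular panorama (illustrative).
import torch
import torch.nn as nn

class SliceDepthNet(nn.Module):
    def __init__(self, height=256, feat=128):
        super().__init__()
        self.encode = nn.Linear(3 * height, feat)           # embed one RGB column
        self.across = nn.LSTM(feat, feat, batch_first=True, bidirectional=True)
        self.decode = nn.Linear(2 * feat, height)           # depth for the same column

    def forward(self, img):                  # img: (B, 3, H, W) equirectangular RGB
        b, c, h, w = img.shape
        slices = img.permute(0, 3, 1, 2).reshape(b, w, c * h)   # one vector per column
        f = torch.relu(self.encode(slices))
        f, _ = self.across(f)                # relate slices around the full 360° view
        depth = self.decode(f)               # (B, W, H)
        return depth.permute(0, 2, 1)        # (B, H, W) equirectangular depth map

depth = SliceDepthNet()(torch.randn(1, 3, 256, 512))    # -> (1, 256, 512)
```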

    MRI Artefact Augmentation: Robust Deep Learning Systems and Automated Quality Control

    Quality control (QC) of magnetic resonance imaging (MRI) is essential to establish whether a scan or dataset meets a required set of standards. In MRI, many potential artefacts must be identified so that problematic images can either be excluded or accounted for in further image processing or analysis. To date, the gold standard for the identification of these issues is visual inspection by experts. A primary source of MRI artefacts is patient movement, which can affect clinical diagnosis and impact the accuracy of Deep Learning systems. In this thesis, I present a method to simulate motion artefacts from artefact-free images to augment convolutional neural networks (CNNs), increasing training appearance variability and robustness to motion artefacts. I show that models trained with artefact augmentation generalise better and are more robust to real-world artefacts, with negligible cost to performance on clean data. I argue that it is often better to optimise frameworks end-to-end with artefact augmentation rather than learning to retrospectively remove artefacts, thus enforcing robustness to artefacts at the feature-level representation of the data. The labour-intensive and subjective nature of QC has increased interest in automated methods. To address this, I approach MRI quality estimation as the uncertainty in performing a downstream task, using probabilistic CNNs to predict segmentation uncertainty as a function of the input data. Extending this framework, I introduce a novel decoupled uncertainty model, enabling separate uncertainty predictions for different types of image degradation. Trained with an extended k-space artefact augmentation pipeline, the model provides informative measures of uncertainty on problematic real-world scans classified by QC raters and enables sources of segmentation uncertainty to be identified. Suitable quality for algorithmic processing may differ from an image's perceptual quality. Exploring this, I pose MRI visual quality assessment as an image restoration task. Using Bayesian CNNs to recover clean images from noisy data, I show that the uncertainty indicates the possible recoverability of an image. A multi-task network combining uncertainty-aware artefact recovery with tissue segmentation highlights the distinction between visual and algorithmic quality, implying that, depending on the downstream task, less data should be discarded for purely visual quality reasons.
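    One common way to simulate the motion artefacts used for augmentation is sketched below in NumPy, under the assumption of in-plane rigid motion: since movement during acquisition corrupts individual k-space lines, random phase ramps (pixel shifts) are applied to a random subset of lines of a clean image. Parameters are illustrative, not the thesis pipeline.

```python
# k-space motion artefact simulation for data augmentation (illustrative).
import numpy as np

def add_motion_artefact(img, n_corrupt=12, max_shift=4.0, seed=0):
    rng = np.random.default_rng(seed)
    k = np.fft.fftshift(np.fft.fft2(img))           # clean image -> k-space
    h, w = img.shape
    ky, kx = np.meshgrid(np.fft.fftshift(np.fft.fftfreq(h)),
                         np.fft.fftshift(np.fft.fftfreq(w)), indexing="ij")
    for row in rng.choice(h, size=n_corrupt, replace=False):
        dx, dy = rng.uniform(-max_shift, max_shift, size=2)   # per-line patient shift
        # Fourier shift theorem: a translation is a phase ramp on that k-space line
        k[row] *= np.exp(-2j * np.pi * (kx[row] * dx + ky[row] * dy))
    return np.abs(np.fft.ifft2(np.fft.ifftshift(k)))          # artefacted training image

clean = np.random.rand(128, 128)        # stand-in for an artefact-free slice
corrupted = add_motion_artefact(clean)
```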