
    Super Resolution of Wavelet-Encoded Images and Videos

    In this dissertation, we address the multiframe super resolution reconstruction problem for wavelet-encoded images and videos. The goal of multiframe super resolution is to obtain one or more high resolution images by fusing a sequence of degraded or aliased low resolution images of the same scene. Since the low resolution images may be unaligned, a registration step is required before super resolution reconstruction. Therefore, we first explore in-band (i.e., wavelet-domain) image registration and then investigate super resolution. Our motivation for analyzing the image registration and super resolution problems in the wavelet domain is the growing trend toward wavelet-encoded imaging and wavelet encoding for image/video compression. Owing to the drawbacks of the widely used discrete cosine transform in image and video compression, a considerable amount of literature is devoted to wavelet-based methods. However, since wavelets are shift-variant, existing methods cannot utilize wavelet subbands efficiently. To overcome this drawback, we establish and explore the direct relationship between the subbands under a translational shift, for both image registration and super resolution. We then employ our in-band methodology in a motion compensated video compression framework to demonstrate the effective use of wavelet subbands. Super resolution can also be used as a post-processing step in video compression to decrease the size of the video files to be compressed, with downsampling added as a pre-processing step. Therefore, we present a video compression scheme that uses super resolution to reconstruct the high frequency information lost during downsampling. In addition, super resolution is a crucial post-processing step for satellite imagery, because it is hard to update imaging devices after a satellite is launched. Thus, we also demonstrate the use of our methods in enhancing the resolution of pansharpened multispectral images.
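
    A central technical point above is that the discrete wavelet transform is shift-variant: the subbands of a translated image are not simply translated copies of the original subbands, which is why naive per-subband registration fails and why an explicit subband relationship is derived instead. Below is a minimal sketch of the shift-variance itself, assuming NumPy and PyWavelets; the image, wavelet, and shifts are arbitrary placeholders, not the dissertation's setup.

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
img = rng.random((64, 64))                    # placeholder low-resolution frame

# One-level 2D Haar DWT (periodized, so circular shifts behave cleanly).
cA, _ = pywt.dwt2(img, "haar", mode="periodization")

# Even (2-pixel) shift: the approximation subband is just shifted by 1 coefficient.
cA_even, _ = pywt.dwt2(np.roll(img, 2, axis=1), "haar", mode="periodization")
print(np.abs(cA_even - np.roll(cA, 1, axis=1)).max())   # ~1e-16

# Odd (1-pixel) shift: no integer subband shift reproduces it -> shift-variance.
cA_odd, _ = pywt.dwt2(np.roll(img, 1, axis=1), "haar", mode="periodization")
print(min(np.abs(cA_odd - np.roll(cA, k, axis=1)).max() for k in range(cA.shape[1])))
# clearly nonzero for every candidate shift k
```

    The dissertation's in-band methods instead model how such non-integer subband shifts mix into the coefficients; the sketch above only illustrates why that modeling is needed.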

    Dirichlet belief networks for topic structure learning

    Recently, considerable research effort has been devoted to developing deep architectures for topic models to learn topic structures. Although several deep models have been proposed to learn better topic proportions of documents, how to leverage the benefits of deep structures for learning the word distributions of topics has not yet been rigorously studied. Here we propose a new multi-layer generative process on the word distributions of topics, where each layer consists of a set of topics and each topic is drawn from a mixture of the topics of the layer above. As the topics in all layers can be directly interpreted by words, the proposed model is able to discover interpretable topic hierarchies. As a self-contained module, our model can be flexibly adapted to different kinds of topic models to improve their modelling accuracy and interpretability. Extensive experiments on text corpora demonstrate the advantages of the proposed model. Comment: accepted in NIPS 201
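
    As a concrete reading of the multi-layer generative process described above, each lower-layer topic can be drawn from a Dirichlet distribution whose mean is a mixture of the topics in the layer above. The NumPy sketch below illustrates this; the number of layers, topics per layer, vocabulary size, concentration parameter, and mixing-weight prior are all placeholder assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
V = 1000                     # vocabulary size (placeholder)
layer_sizes = [5, 20, 50]    # topics per layer, top to bottom (placeholder)
gamma = 50.0                 # Dirichlet concentration (placeholder)

# Top layer: topics drawn from a symmetric Dirichlet over the vocabulary.
topics = [rng.dirichlet(np.full(V, 0.1), size=layer_sizes[0])]

# Each lower-layer topic ~ Dirichlet(gamma * mixture of the topics above),
# so every topic in every layer stays a distribution over words and is
# therefore directly interpretable.
for size in layer_sizes[1:]:
    above = topics[-1]                                           # (K_above, V)
    weights = rng.dirichlet(np.ones(above.shape[0]), size=size)  # (size, K_above)
    means = weights @ above                                      # (size, V)
    layer = np.vstack([rng.dirichlet(gamma * m + 1e-12) for m in means])  # small floor for numerics
    topics.append(layer)

print([t.shape for t in topics])   # [(5, 1000), (20, 1000), (50, 1000)]
```

    In the full model these word distributions would plug into a topic model's usual document-generation step; the paper's inference procedure is not sketched here.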

    Sustainable Agriculture and Advances of Remote Sensing (Volume 2)

    Agriculture, as the main source of food and the most important economic activity globally, is being affected by the impacts of climate change. To maintain and increase global food production, reduce biodiversity loss, and preserve natural ecosystems, new practices and technologies are required. This book focuses on the latest advances in remote sensing technology and agricultural engineering leading to sustainable agriculture practices. Earth observation data and in situ and proxy remote sensing data are the main sources of information for monitoring and analyzing agricultural activities. Particular attention is given to Earth observation satellites and the Internet of Things for data collection, to multispectral and hyperspectral data analysis using machine learning and deep learning, and to WebGIS and the Internet of Things for sharing and publishing the results, among other topics.

    Advanced machine learning methods for oncological image analysis

    Cancer is a major public health problem, accounting for an estimated 10 million deaths worldwide in 2020 alone. Rapid advances in image acquisition and hardware development over the past three decades have resulted in modern medical imaging modalities that can capture high-resolution anatomical, physiological, functional, and metabolic quantitative information from cancerous organs. Therefore, the applications of medical imaging have become increasingly crucial in clinical oncology routines, providing screening, diagnosis, treatment monitoring, and non/minimally-invasive evaluation of disease prognosis. The essential need for medical images, however, has resulted in the acquisition of a tremendous number of imaging scans. Considering the growing role of medical imaging data on one side and the challenges of manually examining such an abundance of data on the other, the development of computerized tools to automatically or semi-automatically examine image data has attracted considerable interest. Hence, a variety of machine learning tools have been developed for oncological image analysis, aiming to assist clinicians with repetitive tasks in their workflow. This thesis aims to contribute to the field of oncological image analysis by proposing new ways of quantifying tumor characteristics from medical image data. Specifically, this thesis consists of six studies: the first two focus on introducing novel methods for tumor segmentation, and the last four aim to develop quantitative imaging biomarkers for cancer diagnosis and prognosis. The main objective of Study I is to develop a deep learning pipeline capable of capturing the appearance of lung pathologies, including lung tumors, and to integrate this pipeline into segmentation networks to improve segmentation accuracy. The proposed pipeline was tested on several comprehensive datasets, and the numerical quantifications show the superiority of the proposed prior-aware DL framework over the state of the art. Study II addresses a crucial challenge faced by supervised segmentation models: dependence on large-scale labeled datasets. In this study, an unsupervised segmentation approach based on the concept of image inpainting is proposed to segment lung and head-neck tumors in images from single and multiple modalities. The proposed auto-inpainting pipeline shows great potential in synthesizing high-quality tumor-free images and outperforms a family of well-established unsupervised models in terms of segmentation accuracy. Studies III and IV aim to automatically discriminate benign from malignant pulmonary nodules by analyzing low-dose computed tomography (LDCT) scans. In Study III, a dual-pathway deep classification framework is proposed to simultaneously take into account local intra-nodule heterogeneities and global contextual information. Study IV compares the discriminative power of a series of carefully selected conventional radiomics methods, end-to-end deep learning (DL) models, and deep-features-based radiomics analysis on the same dataset. The numerical analyses show the potential of fusing the learned deep features into radiomic features for boosting classification power. Study V focuses on the early assessment of lung tumor response to the applied treatments by proposing a novel feature set that can be interpreted physiologically. This feature set was employed to quantify changes in tumor characteristics from longitudinal PET-CT scans in order to predict the overall survival status of patients two years after the last treatment session. The discriminative power of the introduced imaging biomarkers was compared against conventional radiomics, and the quantitative evaluations verified the superiority of the proposed feature set. Whereas Study V focuses on a binary survival prediction task, Study VI addresses the prediction of survival rate in patients diagnosed with lung and head-neck cancer by investigating the potential of spherical convolutional neural networks and comparing their performance against other types of features, including radiomics. While comparable results were achieved in intra-dataset analyses, the proposed spherical features show more predictive power in inter-dataset analyses. In summary, the six studies incorporate different imaging modalities and a wide range of image processing and machine learning techniques in the methods developed for the quantitative assessment of tumor characteristics, and contribute to the essential procedures of cancer diagnosis and prognosis.
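
    To make Study III's dual-pathway idea concrete, the sketch below builds a toy two-branch classifier in PyTorch: one branch sees a small intra-nodule patch, the other a larger context region, and their pooled features are fused before the benign/malignant head. The 2D convolutions, layer widths, and input sizes are hypothetical placeholders, not the thesis architecture.

```python
import torch
import torch.nn as nn

class DualPathwayClassifier(nn.Module):
    """Hypothetical sketch: a local intra-nodule branch and a global context
    branch whose features are concatenated before the classification head."""

    def __init__(self, n_classes: int = 2):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
        self.local_branch = branch()    # e.g. 32x32 nodule patch
        self.global_branch = branch()   # e.g. 96x96 context region
        self.head = nn.Linear(32 + 32, n_classes)

    def forward(self, local_patch, global_patch):
        feats = torch.cat([self.local_branch(local_patch),
                           self.global_branch(global_patch)], dim=1)
        return self.head(feats)

model = DualPathwayClassifier()
logits = model(torch.randn(4, 1, 32, 32), torch.randn(4, 1, 96, 96))
print(logits.shape)  # torch.Size([4, 2])
```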

    OPTIMIZATION ALGORITHMS USING PRIORS IN COMPUTER VISION

    Over the years, many computer vision models, some inspired by human behavior, have been developed for various applications. However, only a handful of them are popular and widely used. Why? There are two major factors: 1) most of these models do not have an efficient numerical algorithm and hence are computationally very expensive; 2) many models, being too generic, cannot capitalize on problem-specific prior information and thus demand rigorous hyper-parameter tuning. In this dissertation, we design fast and efficient algorithms that leverage application-specific priors to solve unsupervised and weakly-supervised problems. Specifically, we focus on developing algorithms that impose structured priors, model priors, and label priors during the inference and/or learning of vision models. In many applications, it is known a priori that a signal is smooth and continuous in space. The first part of this work focuses on improving unsupervised learning mechanisms by explicitly imposing these structured priors in an optimization framework using different regularization schemes. This led to the development of fast algorithms for robust recovery of signals from compressed measurements, image denoising, and data clustering. Moreover, by employing a re-descending robust penalty on the structured regularization terms and applying duality, we reduce our clustering formulation to the optimization of a single continuous objective. This enables the integration of clustering processes into an end-to-end feature learning pipeline. In the second part of our work, we exploit inherent properties of established models to develop efficient solvers for SDPs, GANs, and semantic segmentation. We consider models for several different problem classes. a) Certain non-convex models in computer vision (e.g., BQP) are popularly solved using convex SDPs after lifting to a high-dimensional space. However, this computationally expensive approach limits these methods to small matrices. We develop a fast, approximate algorithm that directly solves the original non-convex formulation using biconvex relaxations and known rank information. b) Widely popular adversarial networks are difficult to train as they suffer from instability issues, because optimizing adversarial networks corresponds to finding a saddle point of a loss function. We propose a simple prediction method that enables faster training of various adversarial networks using larger learning rates without any instability problems. c) Semantic segmentation models must learn long-distance contextual information while retaining high spatial resolution at the output. Existing models achieve this at the cost of computationally expensive and memory-exhaustive training/inference. We design a stacked u-nets model which can repeatedly process top-down and bottom-up features. Our smallest model exceeds ResNet-101 performance on PASCAL VOC 2012 by 4.5% IoU with ∼7× fewer parameters. Next, we address the problem of learning heterogeneous concepts from internet videos using mined label tags. Given a large number of videos, each with multiple concepts and labels, the idea is to teach machines to automatically learn these concepts by leveraging weak labels. We formulate this as a co-clustering problem and develop a novel Bayesian non-parametric weakly supervised Indian buffet process model which additionally enforces the paired-label prior between concepts. In the final part of this work we consider an inverse approach: learning data priors from a given model. Specifically, we develop a numerically efficient algorithm for estimating the log-likelihood of data samples from GANs. The approximate log-likelihood function is used for outlier detection and for data augmentation when training classifiers.
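
    For item (b) above, our reading of the "prediction method" is a lookahead step: before updating one player of the saddle-point problem, the other player's variables are extrapolated along their most recent change. The pure-Python sketch below applies this to the bilinear saddle problem min_x max_y xy; the learning rate, step count, and the bilinear objective itself are illustrative assumptions only.

```python
# Toy saddle-point problem: min over x, max over y of f(x, y) = x * y
# (the unique saddle point is x = y = 0).
def run(lr=0.2, steps=500, predict=True):
    x, y = 1.0, 1.0
    for _ in range(steps):
        y_new = y + lr * x                            # maximizer ascends first
        y_bar = 2 * y_new - y if predict else y_new   # prediction: extrapolate y's last move
        x = x - lr * y_bar                            # minimizer descends against predicted y
        y = y_new
    return (x * x + y * y) ** 0.5                     # distance from the saddle point

print("with prediction:   ", run(predict=True))    # shrinks toward 0
print("without prediction:", run(predict=False))   # stays O(1): iterates orbit the saddle
```

    Under this reading, in a GAN the same extrapolation would be applied to the discriminator's parameter vector right before each generator update, which is what tolerates the larger learning rates mentioned above.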

    Machine learning and signal processing contributions to identify circulation states during out-of-hospital cardiac arrest

    212 p. (Basque); 216 p. (English). Sudden cardiac arrest (SCA) is defined as the unexpected cessation of cardiac activity [9], in which blood perfusion no longer reaches the brain or the other vital organs. SCA must be treated as quickly as possible with resuscitation therapies in order to avoid sudden cardiac death (SCD) [10, 11]. SCA most commonly occurs in out-of-hospital settings [12], and in most cases there are no witnesses [13]. For this reason, the early application of resuscitation therapies remains a medical and social challenge today.

    Design and Implementation of a Domain Specific Language for Deep Learning

    Deep Learning (DL) has recently found great success in well-diversified areas such as machine vision, speech recognition, big data analysis, and multimedia understanding. However, the existing state-of-the-art DL frameworks, e.g. Caffe2, Theano, TensorFlow, MxNet, Torch7, and CNTK, are programming libraries with fixed user interfaces, internal representations, and execution environments. Modifying the code of DL layers or data structures is very challenging without an in-depth understanding of the underlying implementation. The optimization of code and execution in these tools is often limited and relies on specific DL computation graph manipulation and scheduling that lack systematic and universal strategies. Furthermore, most of these tools demand many dependencies besides the tool itself and must be built for specific platforms for DL training or inference. This dissertation presents DeepDSL, a domain specific language (DSL) embedded in Scala, that compiles DL networks encoded with DeepDSL to efficient, compact, and portable Java source programs for DL training and inference. DeepDSL represents DL networks as abstract tensor functions, performs symbolic gradient derivations to generate the Intermediate Representation (IR), optimizes the IR expressions, and compiles the optimized IR expressions to cross-platform Java code that is easily modifiable and debuggable. Also, the code runs directly on GPU without additional dependencies except a small set of JNI (Java Native Interface) wrappers for invoking the underlying GPU libraries. Moreover, DeepDSL provides static analysis for memory consumption and error detection. DeepDSL (our previous results are reported in [zhao2017]; design and implementation details are summarized in [Zhao2018]) has been evaluated with many current state-of-the-art DL networks (e.g. Alexnet, GoogleNet, VGG, Overfeat, and Deep Residual Network). While the DSL code is highly compact, with fewer than 100 lines for each network, the Java source code generated by the DeepDSL compiler is highly efficient. Our experiments show that the generated Java source has very competitive runtime performance and memory efficiency compared to existing DL frameworks.
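
    The compilation pipeline described above (abstract tensor functions, symbolic gradient derivation, IR optimization, code generation) can be illustrated independently of DeepDSL's actual Scala API, which is not reproduced here. The Python sketch below shows only the general idea of symbolic differentiation plus a simplification pass over a toy scalar IR; every class name and rule is our own illustrative construction, not part of DeepDSL.

```python
from dataclasses import dataclass

# A toy expression IR (scalars stand in for tensors).
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Const:
    value: float

@dataclass(frozen=True)
class Add:
    a: object
    b: object

@dataclass(frozen=True)
class Mul:
    a: object
    b: object

def grad(e, wrt):
    """Symbolic derivative of expression e with respect to variable `wrt`."""
    if isinstance(e, Var):
        return Const(1.0) if e.name == wrt else Const(0.0)
    if isinstance(e, Const):
        return Const(0.0)
    if isinstance(e, Add):
        return Add(grad(e.a, wrt), grad(e.b, wrt))
    if isinstance(e, Mul):  # product rule
        return Add(Mul(grad(e.a, wrt), e.b), Mul(e.a, grad(e.b, wrt)))
    raise TypeError(e)

def simplify(e):
    """A tiny 'IR optimization' pass: fold x+0, x*0, x*1."""
    if isinstance(e, Add):
        a, b = simplify(e.a), simplify(e.b)
        if a == Const(0.0):
            return b
        if b == Const(0.0):
            return a
        return Add(a, b)
    if isinstance(e, Mul):
        a, b = simplify(e.a), simplify(e.b)
        if Const(0.0) in (a, b):
            return Const(0.0)
        if a == Const(1.0):
            return b
        if b == Const(1.0):
            return a
        return Mul(a, b)
    return e

# d/dx (w*x + x) simplifies to w + 1; a code generator would then emit this IR.
f = Add(Mul(Var("w"), Var("x")), Var("x"))
print(simplify(grad(f, "x")))   # Add(a=Var(name='w'), b=Const(value=1.0))
```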

    Gaussian processes for modeling of facial expressions

    Automated analysis of facial expressions has been gaining significant attention in recent years. This stems from the fact that it constitutes the first step toward developing some of the next-generation computer technologies that can make an impact in many domains, ranging from medical imaging and health assessment to marketing and education. No matter the target application, there is an urgent need to deploy systems under demanding, real-world conditions that generalize well across the population. Hence, numerous factors have to be carefully considered before designing such a system. The work presented in this thesis focuses on tackling two important problems in automated analysis of facial expressions: (i) view-invariant facial expression analysis; (ii) modeling of the structural patterns in the face, in terms of well-coordinated facial muscle movements. Driven by the necessity for efficient and accurate inference mechanisms, we explore machine learning techniques based on the probabilistic framework of Gaussian processes (GPs). Our ultimate goal is to design powerful models that can efficiently handle imagery with spontaneously displayed facial expressions and explain in detail the complex configurations of the human face in real-world situations. To effectively decouple head pose and expression in the presence of large out-of-plane head rotations, we introduce a manifold learning approach based on multi-view learning strategies. Contrary to the majority of existing methods, which typically treat the numerous poses as individual problems, in this model we first learn a discriminative manifold shared by multiple views of a facial expression. Subsequently, we perform facial expression classification in the expression manifold. Hence, the pose normalization problem is solved by aligning the facial expressions from different poses in a common latent space. We demonstrate that the recovered manifold can efficiently generalize to various poses and expressions even from a small amount of training data, while also being largely robust to image features corrupted by illumination variations. State-of-the-art performance is achieved in the task of facial expression classification of basic emotions. The methods that we propose for learning the structure in the configuration of the muscle movements represent some of the first attempts in the field of analysis and intensity estimation of facial expressions. In these models, we extend our multi-view approach to exploit relationships not only in the input features but also in the multi-output labels. The structure of the outputs is imposed on the recovered manifold either through heuristically defined hard constraints, or in an auto-encoded manner, where the structure is learned automatically from the input data. The resulting models are shown to be robust to data with imbalanced expression categories, due to our proposed Bayesian learning of the target manifold. We also propose a novel regression approach based on a product of GP experts, where we take into account people's individual expressiveness in order to adapt the learned models to each subject. We demonstrate the superior performance of our proposed models on the tasks of facial expression recognition and intensity estimation.
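
    The product-of-GP-experts regression mentioned at the end can be sketched generically: each expert is a GP fitted on one subset of the data (standing in for one subject), and predictions are fused by multiplying the experts' predictive Gaussians, i.e. precision-weighted averaging. A minimal sketch assuming scikit-learn follows; the toy data, kernel, and subject split are placeholders, and the thesis's expressiveness adaptation is not modeled.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(120, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=120)   # toy intensity-like target

# Train one GP expert per data subset (standing in for per-subject experts).
experts = []
for Xs, ys in zip(np.array_split(X, 3), np.array_split(y, 3)):
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    experts.append(gp.fit(Xs, ys))

# Product of experts: multiplying predictive Gaussians = precision-weighted fusion.
X_test = np.linspace(-3, 3, 50).reshape(-1, 1)
mus, sigmas = zip(*(gp.predict(X_test, return_std=True) for gp in experts))
prec = sum(1.0 / s**2 for s in sigmas)
mu_poe = sum(m / s**2 for m, s in zip(mus, sigmas)) / prec
std_poe = np.sqrt(1.0 / prec)
print(mu_poe[:5], std_poe[:5])
```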

    Bayesian and machine learning approaches in metagenomics

    In this doctoral thesis, we present a novel set of bioinformatics tools that address key problems in the field of metagenomics. This set includes a fully probabilistic framework for estimating the number of genomes present at the species level in a metagenomic sample, the use of variational autoencoders as an alternative method for dimensionality reduction of the coverage and tetramer composition of metagenomic samples, and a natural language processing method for compressing gene frequencies in metagenomes for better prediction of their phenotypic traits. The first tool tackles the problem of metagenomic binning. A Bayesian non-parametric method is used in conjunction with a Gaussian mixture model to estimate the number of present genomes more accurately and to correctly cluster the contigs into the appropriate bins. We call this method the DP (Dirichlet Processes) algorithm. An attempt was made to improve the accuracy of the algorithm by incorporating extra information from the edges of the assembly graph, but this addition was not included in the final model, as the signal in the data used was too weak. The method is validated on a 20-genome simulated mock community and is compared against state-of-the-art binners on a 100-genome simulated community in different scenarios using different numbers of samples. The results show that this method performs at least to the same standard as the state-of-the-art methods, while outperforming them in some scenarios. The method is also applied to a real 11-sample infant gut dataset. The second tool concerns the prediction of phenotypic traits in metagenomes. In this part, we build on the idea of using the frequencies of genes annotated against the Kyoto Encyclopedia of Genes and Genomes (KEGG) to predict the presence and absence of 83 functional and metabolic traits. We apply the doc2vec algorithm as a dimensionality reduction method on 9407 prokaryotic genomes, experimenting with different compression dimensions and training various machine learning algorithms for the trait prediction part. We conclude that the dimensionality reduction improves the performance of the classifiers, and that it achieves the best results when combined with L1 logistic regression on 100 dimensions. In addition, we train the classifiers using the uncompressed KO frequencies and identify the traits for which the compression offers no improvement, comparing the number of KOs present in each case. The third tool concerns the use of variational autoencoders for compressing the coverage and tetramer composition before binning metagenomic samples. We combine the variational autoencoder architecture used in the VAMB binner for dimensionality reduction with the Bayesian non-parametric binning approach presented above. We tested this novel combination on the same 20-genome simulated mock community used previously and concluded that it clusters the contigs correctly at the species level better than the DP algorithm. We also concluded that this combination does not perform well on real datasets, being unable to identify any 'good' bins, as assessed by the percentage of single-copy core genes present. The last part of this work is a case study of the oral microbiome. It is estimated that the oral cavity hosts over 700 species of bacteria. In this study, we analyze 131 oral metagenomic samples from 68 individuals. We follow an assembly-based approach and then split the analysis in two directions. In the first approach, the contigs are binned and the abundance of each bin in each sample is calculated. In the second approach, the contigs are not binned; open reading frames are called and mapped to KEGG genes, and the coverage of each gene in every sample is calculated. We associate these coverages with various metadata and attribute their variation to the presence of different species or KOs.
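
    A generic approximation of the first tool's binning idea (a Dirichlet-process prior over the number of bins combined with a Gaussian mixture over contig features) can be written with scikit-learn's variational Bayesian Gaussian mixture. In the sketch below the synthetic features stand in for per-sample coverage profiles and tetranucleotide frequencies; none of this reproduces the thesis's DP algorithm or its assembly-graph extension.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
n_contigs = 500

# Placeholder contig features: per-sample coverage profiles concatenated with
# k-mer frequencies (real pipelines would use log coverage over all samples
# plus 256 tetranucleotide frequencies per contig).
features = np.hstack([
    rng.normal(size=(n_contigs, 5)),              # stand-in for coverage profiles
    rng.dirichlet(np.ones(20), size=n_contigs),   # stand-in for k-mer frequencies
])

# Dirichlet-process mixture: the effective number of bins is inferred from the
# data, capped only by n_components.
dpgmm = BayesianGaussianMixture(
    n_components=50,
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="diag",
    max_iter=500,
    random_state=0,
)
bins = dpgmm.fit_predict(features)
print("bins used:", np.unique(bins).size)
```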