3D Shape Understanding and Generation
In recent years, Machine Learning techniques have revolutionized solutions to longstanding image-based problems, like image classification, generation, semantic segmentation, object detection and many others. However, if we want to be able to build agents that can successfully interact with the real world, those techniques need to be capable of reasoning about the world as it truly is: a tridimensional space. There are two main challenges while handling 3D information in machine learning models. First, it is not clear what is the best 3D representation. For images, convolutional neural networks (CNNs) operating on raster images yield the best results in virtually all image-based benchmarks. For 3D data, the best combination of model and representation is still an open question. Second, 3D data is not available on the same scale as images – taking pictures is a common procedure in our daily lives, whereas capturing 3D content is an activity usually restricted to specialized professionals. This thesis is focused on addressing both of these issues. Which model and representation should we use for generating and recognizing 3D data? What are efficient ways of learning 3D representations from a few examples? Is it possible to leverage image data to build models capable of reasoning about the world in 3D?
Our research findings show that it is possible to build models that efficiently generate 3D shapes as irregularly structured representations. Those models require significantly less memory while generating higher quality shapes than the ones based on voxels and multi-view representations. We start by developing techniques to generate shapes represented as point clouds. This class of models leads to high quality reconstructions and better unsupervised feature learning. However, since point clouds are not amenable to editing and human manipulation, we also present models capable of generating shapes as sets of shape handles -- simpler primitives that summarize complex 3D shapes and were specifically designed for high-level tasks and user interaction. Despite their effectiveness, those approaches require some form of 3D supervision, which is scarce. We present multiple alternatives to this problem. First, we investigate how approximate convex decomposition techniques can be used as self-supervision to improve recognition models when only a limited number of labels are available. Second, we study how neural network architectures induce shape priors that can be used in multiple reconstruction tasks -- using both volumetric and manifold representations. In this regime, reconstruction is performed from a single example -- either a sparse point cloud or multiple silhouettes. Finally, we demonstrate how to train generative models of 3D shapes without using any 3D supervision by combining differentiable rendering techniques and Generative Adversarial Networks
A Survey on Deep Learning in Medical Image Analysis
Deep learning algorithms, in particular convolutional networks, have rapidly
become a methodology of choice for analyzing medical images. This paper reviews
the major deep learning concepts pertinent to medical image analysis and
summarizes over 300 contributions to the field, most of which appeared in the
last year. We survey the use of deep learning for image classification, object
detection, segmentation, registration, and other tasks and provide concise
overviews of studies per application area. Open challenges and directions for
future research are discussed. Comment: Revised survey includes expanded discussion section and reworked
introductory section on common deep architectures. Added missed papers from
before Feb 1st 201
Deep transfer learning for human identification based on footprint: A comparative study
Identifying people by their footprints has not yet received enough attention from researchers. This paper therefore investigates human identification based on the footprint. Transfer learning is the main concept of this investigation. The aim of using transfer learning is to overcome the need for a large-scale dataset and to achieve high accuracy with a small dataset. Five well-known models are used, namely AlexNet, VGG16, VGG19, GoogLeNet, and Inception v3. Each of these models is fine-tuned for the footprint identification task. A dataset of 30 individuals was constructed to train the models. The right and left footprints of each individual were captured with an iPhone camera. The models were trained and evaluated under the same settings. The evaluation shows that Inception v3 achieved the highest accuracy of the five models.
Fine-Tuning Enhancer Models to Predict Transcriptional Targets across Multiple Genomes
Networks of regulatory relations between transcription factors (TF) and their target genes (TG), implemented through TF binding sites (TFBS), are key features of biology. An idealized approach to solving such networks consists of starting from a consensus TFBS or a position weight matrix (PWM) to generate a high-accuracy list of candidate TGs for biological validation. Developing and evaluating such approaches remains a formidable challenge in regulatory bioinformatics. We perform a benchmark study on 34 Drosophila TFs to assess existing TFBS and cis-regulatory module (CRM) detection methods, with a strong focus on the use of multiple genomes. In particular, for CRM modelling we investigate the addition of orthologous sites to a known PWM to construct phyloPWMs, and we assess the added value of phylogenetic footprinting to predict contextual motifs around known TFBSs. For CRM prediction, we compare motif conservation with network-level conservation approaches across multiple genomes. Choosing the optimal training and scoring strategies strongly enhances the performance of TG prediction for more than half of the tested TFs. Finally, we analyse a 35th TF, namely Eyeless, and find a significant overlap between predicted TGs and candidate TGs identified by microarray expression studies. In summary, we identify several ways to optimize TF-specific TG predictions, some of which can be applied to all TFs, and others that can be applied only to particular TFs. The ability to model known TF-TG relations, together with the use of multiple genomes, results in a significant step forward in solving the architecture of gene regulatory networks.
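As a rough illustration of the PWM-based starting point described in this abstract, the sketch below builds a log-odds PWM from aligned binding sites and scans a sequence for its best-scoring window. It is a generic textbook-style sketch, not the benchmark's method: the function names, the uniform background of 0.25, the pseudocount, and the toy sequences are all assumptions made for illustration. Passing known sites together with orthologous sites to `build_pwm` loosely mirrors the idea of extending a known PWM with orthologous sites.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equal-length aligned sites (hypothetical helper)."""
    width = len(sites[0])
    pwm = []
    for pos in range(width):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append(
            {base: math.log2((counts[base] / total) / background) for base in BASES}
        )
    return pwm


def scan(sequence, pwm):
    """Return (best_score, best_offset) over all windows of the sequence.

    Assumes the sequence contains only A, C, G and T.
    """
    width = len(pwm)
    best_score, best_offset = float("-inf"), -1
    for offset in range(len(sequence) - width + 1):
        score = sum(pwm[j][sequence[offset + j]] for j in range(width))
        if score > best_score:
            best_score, best_offset = score, offset
    return best_score, best_offset


# Toy data, invented for this example.
known_sites = ["TAATTG", "TAATCG", "TAATTA"]
orthologous_sites = ["TAATCG", "TAATCG"]

pwm = build_pwm(known_sites)
extended_pwm = build_pwm(known_sites + orthologous_sites)
print(scan("GGGTAATTGCCC", pwm))
print(scan("GGGTAATCGCCC", extended_pwm))
```

In practice the threshold on the window score, the background model, and the weighting of orthologous sites would all need to be chosen per TF, which is the kind of training and scoring choice the abstract reports as mattering.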
Novel Approaches to the Representation and Analysis of 3D Segmented Anatomical Districts
Nowadays, image processing and 3D shape analysis are an integral part of clinical
practice and have the potential to support clinicians with advanced analysis
and visualization techniques. Both approaches provide visual and quantitative information
to medical practitioners, even if from different points of view. Indeed,
shape analysis is aimed at studying the morphology of anatomical structures, while
image processing is focused more on the tissue or functional information provided
by the pixel/voxel intensity levels. Despite the progress obtained by research in
both fields, a junction between these two complementary worlds is missing. When
working with 3D models analyzing shape features, the information of the volume
surrounding the structure is lost, since a segmentation process is needed to obtain
the 3D shape model; however, the 3D nature of the anatomical structure is represented
explicitly. With volume images, instead, the tissue information related to the
imaged volume is the core of the analysis, while the shape and morphology of the
structure are just implicitly represented, thus not clear enough.
The aim of this Thesis work is the integration of these two approaches in order to increase
the amount of information available for physicians, allowing a more accurate
analysis of each patient. An augmented visualization tool able to provide information
on both the anatomical structure shape and the surrounding volume through a
hybrid representation, could reduce the gap between the two approaches and provide
a more complete anatomical rendering of the subject.
To this end, given a segmented anatomical district, we propose a novel mapping of
volumetric data onto the segmented surface. The grey-levels of the image voxels are
mapped through a volume-surface correspondence map, which defines a grey-level
texture on the segmented surface. The resulting texture mapping is coherent with the
local morphology of the segmented anatomical structure and provides an enhanced
visual representation of the anatomical district. The integration of volume-based and
surface-based information in a unique 3D representation also supports the identification
and characterization of morphological landmarks and pathology evaluations.
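One way to picture the mapping of grey levels onto the segmented surface is the minimal sketch below. It is not the correspondence map of the Thesis: it assumes that mesh vertices are already expressed in voxel coordinates, that the volume is a nested list indexed as volume[x][y][z], and it shows two hypothetical mapping criteria (nearest voxel, and the mean of samples taken along the inward normal). All names are illustrative.

```python
def nearest_voxel(volume, point):
    """Grey level of the voxel closest to a point given in voxel coordinates."""
    x, y, z = (int(round(c)) for c in point)
    # Clamp to the volume bounds so vertices on the border stay valid.
    x = min(max(x, 0), len(volume) - 1)
    y = min(max(y, 0), len(volume[0]) - 1)
    z = min(max(z, 0), len(volume[0][0]) - 1)
    return volume[x][y][z]


def map_grey_levels(volume, vertices, normals=None, depth=0):
    """Assign one grey level per vertex (a per-vertex grey-level texture).

    With depth == 0 or no normals, each vertex takes its nearest voxel value.
    Otherwise the value is the mean of samples taken at integer steps
    0..depth along the inward direction (minus the vertex normal).
    """
    texture = []
    for index, vertex in enumerate(vertices):
        if depth == 0 or normals is None:
            texture.append(nearest_voxel(volume, vertex))
            continue
        normal = normals[index]
        samples = [
            nearest_voxel(
                volume, tuple(vertex[k] - step * normal[k] for k in range(3))
            )
            for step in range(depth + 1)
        ]
        texture.append(sum(samples) / len(samples))
    return texture


# Toy 3x3x3 volume whose grey level encodes its own index (invented data).
volume = [[[100 * x + 10 * y + z for z in range(3)] for y in range(3)] for x in range(3)]
vertices = [(0.2, 1.1, 2.0), (2.4, 0.0, 0.9)]
print(map_grey_levels(volume, vertices))
```

A real pipeline would also need the transform from scanner coordinates to voxel indices and would likely interpolate rather than round; both are left out here to keep the sketch short.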
The main research contributions of the Ph.D. activities and Thesis are:
• the development of a novel integration algorithm that combines surface-based
(segmented 3D anatomical structure meshes) and volume-based (MRI volumes)
information. The integration supports different criteria for the grey-levels mapping
onto the segmented surface;
• the development of methodological approaches for using the grey-levels mapping
together with morphological analysis. The final goal is to solve problems
in real clinical tasks, such as the identification of (patient-specific) ligament
insertion sites on bones from segmented MR images, the characterization of
the local morphology of bones/tissues, and the early diagnosis, classification, and
monitoring of musculoskeletal pathologies;
• the analysis of segmentation procedures, with a focus on the tissue classification
process, in order to reduce operator dependency and to overcome the
absence of a real gold standard for the evaluation of automatic segmentations;
• the evaluation and comparison of (unsupervised) segmentation methods, aimed
at defining a novel segmentation method for low-field MR images, and for
the local correction/improvement of a given segmentation.
The proposed method is simple but effectively integrates information derived from
medical image analysis and 3D shape analysis. Moreover, the algorithm is general
enough to be applied to different anatomical districts independently of the segmentation
method, imaging techniques (such as CT), or image resolution. The volume
information can be integrated easily in different shape analysis applications, taking
into consideration not only the morphology of the input shape but also the real
context in which it is inserted, to solve clinical tasks. The results obtained by this
combined analysis have been evaluated through statistical analysis
Restricted Boltzmann machine vectors for speaker clustering and tracking tasks in TV broadcast shows
(This article belongs to the Special Issue IberSPEECH 2018: Speech and Language Technologies for Iberian Languages) Restricted Boltzmann Machines (RBMs) have shown success in both the front-end and back-end of speaker verification systems. In this paper, we propose applying RBMs to the front-end for the tasks of speaker clustering and speaker tracking in TV broadcast shows. RBMs are trained to transform utterances into a vector-based representation. Because of the lack of data for a test speaker, we propose RBM adaptation to a global model. First, the global model, which is referred to as the universal RBM, is trained with all the available background data. Then an adapted RBM model is trained with the data of each test speaker. The visible-to-hidden weight matrices of the adapted models are concatenated along with the bias vectors and are whitened to generate the vector representation of speakers. These vectors, referred to as RBM vectors, were shown to preserve speaker-specific information and are used in the tasks of speaker clustering and speaker tracking. The evaluation was performed on the audio recordings of Catalan TV broadcast shows. The experimental results show that our proposed speaker clustering system gained up to 12% relative improvement, in terms of Equal Impurity (EI), over the baseline system. In the task of speaker tracking, our system has a relative improvement of 11% and 7% compared to the baseline system using cosine and Probabilistic Linear Discriminant Analysis (PLDA) scoring, respectively. Peer Reviewed. Postprint (published version)
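The construction of an RBM vector and its cosine scoring can be pictured with the minimal sketch below. It covers only two of the steps named in the abstract: flattening the visible-to-hidden weight matrix together with the bias vectors, and comparing two such vectors by cosine similarity. The whitening step, RBM training, and adaptation are deliberately omitted. The function names, the assumption that both the visible and the hidden biases are appended, and the toy numbers are illustrative and not taken from the paper.

```python
import math


def rbm_vector(weights, visible_bias, hidden_bias):
    """Flatten the weight matrix row by row and append both bias vectors.

    Assumption: both visible and hidden biases are included; whitening,
    which the abstract applies afterwards, is not performed here.
    """
    flat = [w for row in weights for w in row]
    return flat + list(visible_bias) + list(hidden_bias)


def cosine_score(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy adapted-RBM parameters for two hypothetical speakers.
speaker_a = rbm_vector([[0.2, -0.1], [0.4, 0.3]], [0.0, 0.1], [0.5, -0.2])
speaker_b = rbm_vector([[0.1, -0.2], [0.5, 0.2]], [0.1, 0.0], [0.4, -0.1])
print(cosine_score(speaker_a, speaker_b))
```

In a clustering or tracking setting, scores like this would be compared against a threshold or fed to a clustering algorithm; those choices are outside what the abstract specifies.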