551 research outputs found
Analytical validation of innovative magneto-inertial outcomes: a controlled environment study.
peer reviewe
PanopticNeRF-360: Panoramic 3D-to-2D Label Transfer in Urban Scenes
Training perception systems for self-driving cars requires substantial
annotations. However, manual labeling in 2D images is highly labor-intensive.
While existing datasets provide rich annotations for pre-recorded sequences,
they fall short in labeling rarely encountered viewpoints, potentially
hampering the generalization ability for perception models. In this paper, we
present PanopticNeRF-360, a novel approach that combines coarse 3D annotations
with noisy 2D semantic cues to generate consistent panoptic labels and
high-quality images from any viewpoint. Our key insight lies in exploiting the
complementarity of 3D and 2D priors to mutually enhance geometry and semantics.
Specifically, we propose to leverage noisy semantic and instance labels in both
3D and 2D spaces to guide geometry optimization. Simultaneously, the improved
geometry assists in filtering noise present in the 3D and 2D annotations by
merging them in 3D space via a learned semantic field. To further enhance
appearance, we combine MLP and hash grids to yield hybrid scene features,
striking a balance between high-frequency appearance and predominantly
contiguous semantics. Our experiments demonstrate PanopticNeRF-360's
state-of-the-art performance over existing label transfer methods on the
challenging urban scenes of the KITTI-360 dataset. Moreover, PanopticNeRF-360
enables omnidirectional rendering of high-fidelity, multi-view and
spatiotemporally consistent appearance, semantic and instance labels. We make
our code and data available at https://github.com/fuxiao0719/PanopticNeRFComment: Project page: http://fuxiao0719.github.io/projects/panopticnerf360/.
arXiv admin note: text overlap with arXiv:2203.1522
Optimization of neural networks for deep learning and applications to CT image segmentation
[eng] During the last few years, AI development in deep learning has been going so fast that even important researchers, politicians, and entrepreneurs are signing petitions to try to slow it down. The newest methods for natural language processing and image generation are achieving results so unbelievable that people are seriously starting to think they can be dangerous for society. In reality, they are not dangerous (at the moment) even if we have to admit we reached a point where we have no more control over the flux of data inside the deep networks. It is impossible to open a modern
deep neural network and interpret how it processes the information and, in many cases, explain how or why it gives back that particular result. One of the goals of this doctoral work has been to study the behavior of weights in convolutional neural networks and in transformers. We hereby present a work that demonstrates how to invert 3x3 convolutions after training a neural network able to learn how to classify images, with the future aim of having precisely invertible convolutional neural networks. We demonstrate that a simple network can learn to classify images on an open-source dataset without loss in accuracy, with respect to a non-invertible one. All that with the ability to reconstruct the original image without detectable error
(on 8-bit images) in up to 20 convolutions stacked in a row. We present a thorough comparison between our method and the standard. We tested the
performances of the five most used transformers for image classification on an open- source dataset. Studying the embedded matrices, we have been
able to provide two criteria that can help transformers learn with a training time reduction of up to 30% and with no impact on classification accuracy.
The evolution of deep learning techniques is also touching the field of digital health. With tens of thousands of new start-ups and more than 1B $ of investments only in the last year, this field is growing rapidly and promising to revolutionize healthcare. In this thesis, we present several neural networks for the segmentation of lungs, lung nodules, and areas affected by pneumonia induced by COVID-19, in chest CT scans. The architecturesm we used are all residual convolutional neural networks inspired by UNet and Inception. We customized them with novel loss functions and layers
studied to achieve high performances on these particular applications. The errors on the surface of nodule segmentation masks are not over 1mm in more than 99% of the cases. Our algorithm for COVID-19 lesion detection has a specificity of 100% and overall accuracy of 97.1%. In general, it surpasses the state-of-the-art in all the considered statistics, using UNet as a benchmark. Combining these with other algorithms able to detect and predict lung cancer, the whole work was presented in a European innovation program and judged of high interest by worldwide experts.
With this work, we set the basis for the future development of better AI tools in healthcare and scientific investigation into the fundamentals of deep learning.[spa] Durante los últimos años, el desarrollo de la IA en el aprendizaje profundo ha ido tan rápido que Incluso importantes investigadores, polÃticos y empresarios están firmando peticiones para intentar para ralentizarlo. Los métodos más nuevos para el procesamiento y la generación de imágenes y lenguaje natural, están logrando resultados tan increÃbles que la gente está empezando a preocuparse seriamente. Pienso que pueden ser peligrosos para la sociedad. En realidad, no son peligrosos (al menos de momento) incluso si tenemos que admitir que llegamos a un punto en el que ya no tenemos control sobre el flujo de datos dentro de las redes profundas. Es imposible abrir una moderna red neuronal profunda e interpretar cómo procesa la información y, en muchos casos, explique cómo o por qué devuelve ese resultado en particular, uno de los objetivos de este doctorado.
El trabajo ha consistido en estudiar el comportamiento de los pesos en redes neuronales convolucionales y en transformadores. Por la presente presentamos un trabajo que demuestra cómo invertir 3x3 convoluciones después de entrenar una red neuronal capaz de aprender a clasificar imágenes, con el objetivo futuro de tener redes neuronales convolucionales precisamente invertibles. Nosotros queremos demostrar que una red simple puede aprender a clasificar imágenes en un código abierto conjunto de datos sin pérdida de precisión, con respecto a uno no invertible. Todo eso con la capacidad de reconstruir la imagen original sin errores detectables (en imágenes de 8 bits) en hasta 20 convoluciones apiladas en fila. Presentamos una exhaustiva comparación entre nuestro método y el estándar. Probamos las prestaciones de los cinco transformadores más utilizados para la clasificación de imágenes en abierto. conjunto de datos de origen. Al estudiar las matrices incrustadas, hemos sido capaz de proporcionar dos criterios que pueden ayudar a los transformadores a aprender con un tiempo de capacitación reducción de hasta el 30% y sin impacto en la precisión de la clasificación.
La evolución de las técnicas de aprendizaje profundo también está afectando al campo de la salud digital. Con decenas de miles de nuevas empresas y más de mil millones de dólares en inversiones sólo en el año pasado, este campo está creciendo rápidamente y promete revolucionar la atención médica. En esta tesis, presentamos varias redes neuronales para la segmentación de pulmones, nódulos pulmonares, y zonas afectadas por neumonÃa inducida por COVID-19, en tomografÃas computarizadas de tórax. La arquitectura que utilizamos son todas redes neuronales convolucionales residuales inspiradas en UNet. Las personalizamos con nuevas funciones y capas de pérdida, estudiado para lograr altos rendimientos en estas aplicaciones particulares. Los errores en la superficie de las máscaras de segmentación de los nódulos no supera 1 mm en más del 99% de los casos. Nuestro algoritmo para la detección de lesiones de COVID-19 tiene una especificidad del 100% y en general precisión del 97,1%. En general supera el estado del arte en todos los aspectos considerados, estadÃsticas, utilizando UNet como punto de referencia. Combinando estos con otros algoritmos capaces de detectar y predecir el cáncer de pulmón, todo el trabajo se presentó en una innovación europea programa y considerado de gran interés por expertos de todo el mundo.
Con este trabajo, sentamos las bases para el futuro desarrollo de mejores herramientas de IA en Investigación sanitaria y cientÃfica sobre los fundamentos del aprendizaje profundo
Deep learning methods applied to digital elevation models: state of the art
Deep Learning (DL) has a wide variety of applications in various
thematic domains, including spatial information. Although with
limitations, it is also starting to be considered in operations
related to Digital Elevation Models (DEMs). This study aims to
review the methods of DL applied in the field of altimetric spatial
information in general, and DEMs in particular. Void Filling (VF),
Super-Resolution (SR), landform classification and hydrography
extraction are just some of the operations where traditional methods
are being replaced by DL methods. Our review concludes
that although these methods have great potential, there are
aspects that need to be improved. More appropriate terrain information
or algorithm parameterisation are some of the challenges
that this methodology still needs to face.Functional Quality of Digital Elevation Models in Engineering’ of the State Agency Research of SpainPID2019-106195RB- I00/AEI/10.13039/50110001103
Designing a Direct Feedback Loop between Humans and Convolutional Neural Networks through Local Explanations
The local explanation provides heatmaps on images to explain how
Convolutional Neural Networks (CNNs) derive their output. Due to its visual
straightforwardness, the method has been one of the most popular explainable AI
(XAI) methods for diagnosing CNNs. Through our formative study (S1), however,
we captured ML engineers' ambivalent perspective about the local explanation as
a valuable and indispensable envision in building CNNs versus the process that
exhausts them due to the heuristic nature of detecting vulnerability. Moreover,
steering the CNNs based on the vulnerability learned from the diagnosis seemed
highly challenging. To mitigate the gap, we designed DeepFuse, the first
interactive design that realizes the direct feedback loop between a user and
CNNs in diagnosing and revising CNN's vulnerability using local explanations.
DeepFuse helps CNN engineers to systemically search "unreasonable" local
explanations and annotate the new boundaries for those identified as
unreasonable in a labor-efficient manner. Next, it steers the model based on
the given annotation such that the model doesn't introduce similar mistakes. We
conducted a two-day study (S2) with 12 experienced CNN engineers. Using
DeepFuse, participants made a more accurate and "reasonable" model than the
current state-of-the-art. Also, participants found the way DeepFuse guides
case-based reasoning can practically improve their current practice. We provide
implications for design that explain how future HCI-driven design can move our
practice forward to make XAI-driven insights more actionable.Comment: 32 pages, 6 figures, 5 tables. Accepted for publication in the
Proceedings of the ACM on Human-Computer Interaction (PACM HCI), CSCW 202
Recommended from our members
The Monogenic Architecture of Retinal and Neurological Diseases
Monogenic diseases, or single-gene disorders, are clinical manifestations that can be traced to genetic variation in a single gene that alters the biologically intended (wildtype) function of its protein (or mRNA) product. Although the causal gene and its function are well-understood in many monogenic diseases, this knowledge alone often does not fully encapsulate the extensive clinical spectrum of phenotypes seen in patients. This is due in part to the numerous types of pathogenic variants that can arise in a single gene, all of which can have distinct effects on disease expression. Understanding the relationship between the vast number of possible genotypes and corresponding disease phenotypes defines a gene’s monogenic disease architecture—an important but poorly understood concept that can yield informative mechanistic and clinical insight.
This doctoral dissertation integrates traditional sequencing approaches with in-depth characterization of patient phenotypes to elucidate the monogenic disease architecture of three etiologically distinct disorders: retinal degeneration caused by autosomal recessive variation in ABCA4 and neurodevelopmental disease entities caused by autosomal dominant variants in CERT1 and PUM1. Genetic modifiers are identified as a significant factor in the penetrance of the major disease-causing allele of ABCA4 and several other genetic inconsistencies are resolved to create a coherent genotype-phenotype model for the disease. Insight from this model is then applied to demonstrate the effect of allele differences in disease progression and evaluation of treatment efficacy in patients. A large cohort of affected individuals with CERT1 variation is assembled to (1) validate the causal role of CERT1 in disease, (2) delineate the precise mechanism of CERT protein dysfunction in sphingolipid metabolism and (3) demonstrate therapeutic efficacy of an inhibitor compound for a newly described syndrome.
Finally, the mutational spectrum of PUM1 is expanded to previously unattributed variant classes with unexpected pathophysiological consequences to patients. Not only do the findings in this dissertation advance the prospects of delivering personalized, precision medicine to patients, the overall impact underscores the importance of this integrated approach in reconciling knowledge gaps between observations at the molecular and organismal level
Machine Learning Approaches for Semantic Segmentation on Partly-Annotated Medical Images
Semantic segmentation of medical images plays a crucial role in assisting medical practitioners in providing accurate and swift diagnoses; nevertheless, deep neural networks require extensive labelled data to learn and generalise appropriately. This is a major issue in medical imagery because most of the datasets are not fully annotated. Training models with partly-annotated datasets generate plenty of predictions that belong to correct unannotated areas that are categorised as false positives; as a result, standard segmentation metrics and objective functions do not work correctly, affecting the overall performance of the models. In this thesis, the semantic segmentation of partly-annotated medical datasets is extensively and thoroughly studied. The general objective is to improve the segmentation results of medical images via innovative supervised and semi-supervised approaches. The main contributions of this work are the following. Firstly, a new metric, specifically designed for this kind of dataset, can provide a reliable score to partly-annotated datasets with positive expert feedback in their generated predictions by exploiting all the confusion matrix values except the false positives. Secondly, an innovative approach to generating better pseudo-labels when applying co-training with the disagreement selection strategy. This method expands the pixels in disagreement utilising the combined predictions as a guide. Thirdly, original attention mechanisms based on disagreement are designed for two cases: intra-model and inter-model. These attention modules leverage the disagreement between layers (from the same or different model instances) to enhance the overall learning process and generalisation of the models. Lastly, innovative deep supervision methods improve the segmentation results by training neural networks one subnetwork at a time following the order of the supervision branches. The methods are thoroughly evaluated on several histopathological datasets showing significant improvements
Computer Vision and Architectural History at Eye Level:Mixed Methods for Linking Research in the Humanities and in Information Technology
Information on the history of architecture is embedded in our daily surroundings, in vernacular and heritage buildings and in physical objects, photographs and plans. Historians study these tangible and intangible artefacts and the communities that built and used them. Thus valuableinsights are gained into the past and the present as they also provide a foundation for designing the future. Given that our understanding of the past is limited by the inadequate availability of data, the article demonstrates that advanced computer tools can help gain more and well-linked data from the past. Computer vision can make a decisive contribution to the identification of image content in historical photographs. This application is particularly interesting for architectural history, where visual sources play an essential role in understanding the built environment of the past, yet lack of reliable metadata often hinders the use of materials. The automated recognition contributes to making a variety of image sources usable forresearch.<br/
3D Representation Learning for Shape Reconstruction and Understanding
The real world we are living in is inherently composed of multiple 3D objects. However, most of the existing works in computer vision traditionally either focus on images or videos where the 3D information inevitably gets lost due to the camera projection. Traditional methods typically rely on hand-crafted algorithms and features with many constraints and geometric priors to understand the real world. However, following the trend of deep learning, there has been an exponential growth in the number of research works based on deep neural networks to learn 3D representations for complex shapes and scenes, which lead to many cutting-edged applications in augmented reality (AR), virtual reality (VR) and robotics as one of the most important directions for computer vision and computer graphics.
This thesis aims to build an intelligent system with dynamic 3D representations that can change over time to understand and recover the real world with semantic, instance and geometric information and eventually bridge the gap between the real world and the digital world. As the first step towards the challenges, this thesis explores both explicit representations and implicit representations by explicitly addressing the existing open problems in these areas. This thesis starts from neural implicit representation learning on 3D scene representation learning and understanding and moves to a parametric model based explicit 3D reconstruction method. Extensive experimentation over various benchmarks on various domains demonstrates the superiority of our method against previous state-of-the-art approaches, enabling many applications in the real world. Based on the proposed methods and current observations of open problems, this thesis finally presents a comprehensive conclusion with potential future research directions
- …