3 research outputs found

    Mixed convolutional and long short-term memory network for the detection of lethal ventricular arrhythmia

    Early defibrillation by an automated external defibrillator (AED) is key for the survival of out-of-hospital cardiac arrest (OHCA) patients. ECG feature extraction and machine learning have been successfully used to detect ventricular fibrillation (VF) in AED shock decision algorithms. Recently, deep learning architectures based on 1D Convolutional Neural Networks (CNN) have been proposed for this task. This study introduces a deep learning architecture based on 1D-CNN layers and a Long Short-Term Memory (LSTM) network for the detection of VF. Two datasets were used: one from public repositories of Holter recordings captured at the onset of the arrhythmia, and a second from OHCA patients obtained minutes after the onset of the arrest. Data were partitioned patient-wise into a training set (80%) to design the classifiers and a test set (20%) to report the results. The proposed architecture was compared to 1D-CNN-only deep learners and to a classical approach based on VF-detection features and a support vector machine (SVM) classifier. The algorithms were evaluated in terms of balanced accuracy (BAC), the unweighted mean of the sensitivity (Se) and specificity (Sp). The BAC, Se, and Sp of the architecture for 4-s ECG segments were 99.3%, 99.7%, and 98.9% for the public data, and 98.0%, 99.2%, and 96.7% for the OHCA data. The proposed architecture outperformed all other classifiers by at least 0.3 points of BAC in the public data, and by 2.2 points in the OHCA data. The architecture met the 95% Sp and 90% Se requirements of the American Heart Association in both datasets for segment lengths as short as 3 s.
This is, to the best of our knowledge, the most accurate VF detection algorithm to date, especially on OHCA data, and it would enable an accurate shock/no-shock diagnosis in a very short time. This study was supported by the Ministerio de Economía, Industria y Competitividad, Gobierno de España (ES) (TEC-2015-64678-R) to UI and EA, and by Euskal Herriko Unibertsitatea (ES) (GIU17/031) to UI and EA. The funders, Tecnalia Research and Innovation and Banco Bilbao Vizcaya Argentaria (BBVA), provided support in the form of salaries for authors AP, AA, FAA, CF, and EG, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the author contributions section.
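The evaluation metric above, balanced accuracy, is defined in the abstract as the unweighted mean of sensitivity and specificity. A minimal sketch of how it can be computed from shock/no-shock labels (function and variable names are illustrative, not taken from the study):

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of sensitivity (Se) and specificity (Sp).

    y_true, y_pred: sequences of 0/1 labels, where 1 = VF (shock advised).
    Returns (BAC, Se, Sp).
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    se = np.mean(y_pred[y_true])     # true-positive rate on VF segments
    sp = np.mean(~y_pred[~y_true])   # true-negative rate on non-VF segments
    return 0.5 * (se + sp), se, sp

# Toy example: 3 VF segments and 2 non-VF segments, one VF segment missed
bac, se, sp = balanced_accuracy([1, 1, 1, 0, 0], [1, 1, 0, 0, 0])
# Se = 2/3, Sp = 1.0, BAC = 5/6
```

Unlike plain accuracy, BAC is insensitive to class imbalance, which matters here because shockable rhythms are a minority class in OHCA recordings.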

    Self-supervised learning for image-to-image translation in the small data regime

    The mass irruption of Deep Convolutional Neural Networks (CNNs) in computer vision since 2012 led to a dominance of the image understanding paradigm consisting in an end-to-end fully supervised learning workflow over large-scale annotated datasets. This approach proved to be extremely useful at solving a myriad of classic and new computer vision tasks with unprecedented performance, at the expense of vast amounts of human-labeled data, extensive computational resources and the disposal of all of our prior knowledge on the task at hand. Even though simple transfer learning methods, such as fine-tuning, have achieved remarkable impact, their success is limited when the amount of labeled data in the target domain is small. Furthermore, the non-static nature of data generation sources will often result in data distribution shifts that degrade the performance of deployed models. As a consequence, there is a growing demand for methods that can exploit elements of prior knowledge and sources of information other than the manually generated ground truth annotations of the images during the network training process, so that they can adapt to new domains that constitute, if not a small data regime, at least a small labeled data regime. This thesis targets such few- or no-labeled-data scenarios in three distinct image-to-image mapping learning problems. 
It contributes with various approaches that leverage our previous knowledge of different elements of the image formation process: We first present a data-efficient framework for both defocus and motion blur detection, based on a model able to produce realistic synthetic local degradations. The framework comprises a self-supervised, a weakly-supervised and a semi-supervised instantiation, and outperforms fully-supervised counterparts. Our knowledge on color image formation is then used to gather input and target ground truth image pairs for the RGB to hyperspectral image reconstruction task. We make use of a CNN to tackle this problem, which, for the first time, allows us to exploit spatial context and achieve state-of-the-art results given a limited hyperspectral image set. In our last contribution to the subfield of data-efficient image-to-image transformation problems, we present the novel semi-supervised task of zero-pair cross-view semantic segmentation: we consider the case of relocation of the camera in an end-to-end trained and deployed monocular, fixed-view semantic segmentation system often found in industry. Under the assumption that we are allowed to obtain an additional set of synchronized but unlabeled image pairs of new scenes from both original and new camera poses, we present ZPCVNet, a model and training procedure that enables the production of dense semantic predictions in either source or target views at inference time. The lack of existing suitable public datasets to develop this approach led us to the creation of MVMO, a large-scale Multi-View, Multi-Object path-traced dataset with per-view semantic segmentation annotations. We expect MVMO to propel future research in the exciting under-developed fields of cross-view and multi-view semantic segmentation. 
Last, in a piece of applied research of direct application in the context of process monitoring of an Electric Arc Furnace (EAF) in a steelmaking plant, we also consider the problem of simultaneously estimating the temperature and spectral emissivity of distant hot emissive samples. To that end, we design our own capturing device, which integrates three point spectrometers and is capable of registering the radiance signal incoming from an 8 cm diameter spot located up to 20 m away. We then define a physically accurate radiative transfer model and solve this inverse problem without the need for annotated data using a probabilistic programming-based Bayesian approach, which yields full posterior distribution estimates of the involved variables that are consistent with laboratory-grade measurements.
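The joint temperature/emissivity estimation is an inverse problem: the same measured radiance can be explained by trading temperature against emissivity, so multiple wavelengths are needed to disambiguate. As a toy illustration only (the thesis uses a physically accurate radiative transfer model and full Bayesian inference, not this simplification), a gray-body version of the problem can be solved by a grid search over temperature, recovering the best constant emissivity in closed form at each candidate:

```python
import numpy as np

# Physical constants (SI units)
H = 6.62607015e-34   # Planck constant
C = 2.99792458e8     # speed of light in vacuum
K = 1.380649e-23     # Boltzmann constant

def planck(lam, T):
    """Blackbody spectral radiance at wavelength lam (m) and temperature T (K)."""
    return (2.0 * H * C**2 / lam**5) / np.expm1(H * C / (lam * K * T))

def fit_gray_body(lam, radiance, T_grid):
    """Estimate (T, emissivity) of a gray body from measured spectral radiance.

    For each candidate T, the least-squares constant emissivity is the ratio
    <radiance, planck> / <planck, planck>; keep the candidate with the
    smallest residual.
    """
    best = (None, None, np.inf)
    for T in T_grid:
        b = planck(lam, T)
        eps = np.dot(radiance, b) / np.dot(b, b)      # closed-form LS emissivity
        resid = np.sum((radiance - eps * b) ** 2)
        if resid < best[2]:
            best = (T, eps, resid)
    return best[0], best[1]

# Synthetic check: a gray body at 1800 K with emissivity 0.7,
# sampled at 50 wavelengths in the 1.0-2.5 um near-infrared band
lam = np.linspace(1.0e-6, 2.5e-6, 50)
meas = 0.7 * planck(lam, 1800.0)
T_hat, eps_hat = fit_gray_body(lam, meas, np.arange(1500.0, 2100.0, 1.0))
```

A Bayesian treatment, as in the thesis, would additionally place priors on temperature and (wavelength-dependent) emissivity and report full posteriors rather than this single point estimate.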

    On the duality between retinex and image dehazing

    Paper presented at: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, held in Salt Lake City, United States of America, June 18-23, 2018. Image dehazing deals with the removal of undesired loss of visibility in outdoor images due to the presence of fog. Retinex is a color vision model mimicking the ability of the Human Visual System to robustly discount varying illuminations when observing a scene under different spectral lighting conditions. Retinex has been widely explored in the computer vision literature for image enhancement and other related tasks. While these two problems are apparently unrelated, the goal of this work is to show that they can be connected by a simple linear relationship. Specifically, most Retinex-based algorithms have the characteristic feature of always increasing image brightness, which turns them into ideal candidates for effective image dehazing by directly applying Retinex to a hazy image whose intensities have been inverted. In this paper, we give theoretical proof that Retinex on inverted intensities is a solution to the image dehazing problem. Comprehensive qualitative and quantitative results indicate that several classical and modern implementations of Retinex can be transformed into competing image dehazing algorithms performing on par with more complex fog removal methods, and can overcome some of the main challenges associated with this problem. JVC was supported by the Spanish government grant ref. IJCI-2014-19516, and MB by the European Research Council, Starting Grant ref. 306337, by the Spanish government grant ref. TIN2015-71537-P, and by an ICREA Academia Award.
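The duality described above reduces to: invert the hazy image, apply Retinex, and invert the result back. A dependency-free sketch follows; the Retinex stand-in is a deliberately crude single-scale variant (log image minus log of a blurred surround, with a separable box blur standing in for the usual Gaussian), whereas the paper's experiments use established Retinex implementations:

```python
import numpy as np

def box_blur(img, radius=7):
    """Separable box blur, a crude surround estimate for single-scale Retinex."""
    k = np.ones(2 * radius + 1) / (2 * radius + 1)
    pad = np.pad(img, radius, mode="edge")
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, pad)
    out = np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, out)
    return out

def retinex(img, eps=1e-6):
    """Single-scale Retinex: log(image) - log(surround), rescaled to [0, 1]."""
    r = np.log(img + eps) - np.log(box_blur(img) + eps)
    return (r - r.min()) / (r.max() - r.min() + eps)

def dehaze(hazy):
    """Dehazing via the Retinex duality: invert, apply Retinex, invert back."""
    return 1.0 - retinex(1.0 - hazy)

# Synthetic grayscale "hazy" image: bright and low-contrast, as if dominated
# by airlight (scene contrast compressed into the [0.7, 1.0] range)
rng = np.random.default_rng(0)
scene = rng.random((64, 64))
hazy = 0.7 + 0.3 * scene
out = dehaze(hazy)
```

Because Retinex only ever brightens, applying it to the inverted (dark) image and inverting back can only darken and re-expand the washed-out intensities, which is exactly the behavior a dehazer needs.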