Computer vision beyond the visible : image understanding through language

Author: Salvador Aguilera Amaia
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2019

In the past decade, deep neural networks have revolutionized computer vision. High-performing deep neural architectures trained for visual recognition tasks have pushed the field towards methods relying on learned image representations instead of hand-crafted ones, in pursuit of end-to-end learning methods that solve challenging tasks, ranging from long-standing ones such as image classification to newly emerging ones such as image captioning. As this thesis is framed in the context of the rapid evolution of computer vision, we present contributions aligned with three major paradigm changes that the field has recently experienced, namely 1) the power of reusing deep features from pre-trained neural networks for different tasks, 2) the advantage of formulating problems with end-to-end solutions given enough training data, and 3) the growing interest in describing visual data with natural language rather than pre-defined categorical label spaces, which can in turn enable visual understanding beyond scene recognition. The first part of the thesis is dedicated to the problem of visual instance search, where we focus on obtaining meaningful and discriminative image representations that allow efficient and effective retrieval of similar images given a visual query. Contributions in this part involve the construction of sparse Bag-of-Words image representations from convolutional features of a pre-trained image classification neural network, and an analysis of the advantages of fine-tuning a pre-trained object detection network using query images as training data, with the aim of improving the discriminative power of the resulting representations. The second part of the thesis presents contributions to the problem of image-to-set prediction, understood as the task of predicting a variable-sized collection of unordered elements for an input image. We conduct a thorough analysis of current methods for multi-label image classification, which are able to solve the task in an end-to-end manner by simultaneously estimating both the label distribution and the set cardinality. Further, we extend the analysis of set prediction methods to semantic instance segmentation, and present an end-to-end recurrent model that is able to predict sets of objects (binary masks and categorical labels) in a sequential manner. Finally, the third part of the dissertation builds on the insights from the previous two parts to present deep learning solutions that connect images with natural language in the context of cooking recipes and food images. First, we propose a retrieval-based solution in which the written recipe and the image are encoded into compact representations that allow the retrieval of one given the other. Second, as an alternative to the retrieval approach, we propose a generative model to predict recipes directly from food images, which first predicts ingredients as sets and subsequently generates the rest of the recipe one word at a time by conditioning both on the image and the predicted ingredients.

Postprint (published version)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Temporal activity detection in untrimmed videos with recurrent neural networks

Author: Giró Nieto Xavier
Montes Alberto
Pascual Santiago
Salvador Aguilera Amaia
Publication date: 01/01/2016

This work proposes a simple pipeline to classify and temporally localize activities in untrimmed videos. Our system uses features from a 3D Convolutional Neural Network (C3D) as input to train a recurrent neural network (RNN) that learns to classify video clips of 16 frames. After clip prediction, we post-process the output of the RNN to assign a single activity label to each video and to determine the temporal boundaries of the activity within the video. We show how our system can achieve competitive results in both tasks with a simple architecture. We evaluate our method in the ActivityNet Challenge 2016, achieving a 0.5874 mAP and a 0.2237 mAP in the classification and detection tasks, respectively. Our code and models are publicly available at: https://imatge-upc.github.io/activitynet-2016-cvprw/

Peer Reviewed. Postprint (published version)
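The abstract does not say how the RNN output is post-processed, so the following is only a minimal sketch of one plausible scheme, not the authors' method. It assumes hard per-clip labels, a hypothetical `background` class for clips without activity, a majority vote for the video label, and contiguous runs of that label as temporal segments. The function name `postprocess` and its parameters are illustrative.

```python
from collections import Counter
from itertools import groupby

CLIP_LEN = 16  # frames per clip, as stated in the abstract


def postprocess(clip_labels, fps, background="background"):
    """Turn per-clip labels into one video label plus segments in seconds.

    Assumption: majority vote over non-background clips picks the video
    label; each contiguous run of that label becomes one segment.
    """
    activity = [label for label in clip_labels if label != background]
    if not activity:
        return None, []
    video_label = Counter(activity).most_common(1)[0][0]

    segments = []
    index = 0  # index of the first clip in the current run
    for label, run in groupby(clip_labels):
        length = len(list(run))
        if label == video_label:
            start = index * CLIP_LEN / fps
            end = (index + length) * CLIP_LEN / fps
            segments.append((start, end))
        index += length
    return video_label, segments
```

With clips of 16 frames at 16 frames per second, each clip spans one second, so a run covering clips 1 and 2 maps to the interval from 1.0 s to 3.0 s. A real system would likely smooth the per-clip probabilities before thresholding.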

UPCommons. Portal del coneixement obert de la UPC

RVOS: end-to-end recurrent network for video object segmentation

Author: Bellver Bueno Míriam
Girbau Xalabarder Andreu
Giró Nieto Xavier
Marqués Acosta Fernando
Salvador Aguilera Amaia
Ventura Royo Carles
Publication venue: Computer Vision Foundation
Publication date: 01/01/2019

Multiple object video object segmentation is a challenging task, especially in the zero-shot case, when no object mask is given at the initial frame and the model has to find the objects to be segmented along the sequence. In our work, we propose a Recurrent network for multiple object Video Object Segmentation (RVOS) that is fully end-to-end trainable. Our model incorporates recurrence on two different domains: (i) the spatial, which allows the model to discover the different object instances within a frame, and (ii) the temporal, which keeps the segmented objects coherent over time. We train RVOS for zero-shot video object segmentation and are the first to report quantitative results for the DAVIS-2017 and YouTube-VOS benchmarks. Further, we adapt RVOS for one-shot video object segmentation by using the masks obtained in previous time steps as inputs to be processed by the recurrent module. Our model reaches results comparable to state-of-the-art techniques on the YouTube-VOS benchmark and outperforms all previous video object segmentation methods that do not use online learning on the DAVIS-2017 benchmark. Moreover, our model achieves faster inference runtimes than previous methods, reaching 44 ms/frame on a P100 GPU.

Peer Reviewed. Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC

Crowdsourced object segmentation with a game

Author: Carlier Axel
Charvillat Vincent
Giró Nieto Xavier
Marques Oge
Salvador Aguilera Amaia
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2013

We introduce a new algorithm for image segmentation based on crowdsourcing through a game: Ask'nSeek. The game provides information about the objects in an image in the form of clicks that fall either on the object or on the background. These logs are then used to determine the best segmentation for an object among a set of candidates generated by the state-of-the-art CPMC algorithm. We also introduce a simulator that generates game logs and therefore gives insight into the number of games needed on an image to obtain an acceptable segmentation.

Peer Reviewed. Postprint (published version)
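The selection step described above can be illustrated with a small sketch. The abstract does not give the scoring rule, so this example assumes the simplest one: a candidate earns one point for each click it agrees with, and the candidate with the most points wins. Masks are represented as sets of object pixel coordinates. The names `score` and `best_candidate` are hypothetical and do not come from the paper.

```python
def score(mask, clicks):
    """Count the clicks a candidate mask agrees with.

    mask:   set of (x, y) pixels the candidate marks as object.
    clicks: list of ((x, y), on_object) pairs from the game logs.
    """
    return sum((point in mask) == on_object for point, on_object in clicks)


def best_candidate(candidates, clicks):
    """Return the index of the candidate mask with the highest agreement."""
    return max(range(len(candidates)), key=lambda i: score(candidates[i], clicks))
```

Ties resolve to the earliest candidate because `max` returns the first maximal element. A practical scorer would probably weight clicks by how reliable they are.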

Crossref

UPCommons. Portal del coneixement obert de la UPC

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

Budget-aware semi-supervised semantic and instance segmentation

Author: Bellver Bueno Míriam
Giró Nieto Xavier
Salvador Aguilera Amaia
Torres Viñals Jordi
Publication date: 01/01/2019

Methods that move towards less supervised scenarios are key for image segmentation, as dense labels demand significant human intervention. Generally, the annotation burden is mitigated by labeling datasets with weaker forms of supervision, e.g. image-level labels or bounding boxes. Another option is the semi-supervised setting, which commonly leverages a few strong annotations and a large amount of unlabeled or weakly-labeled data. In this paper, we revisit semi-supervised segmentation schemes and significantly narrow down the annotation budget (in terms of total labeling time of the training set) compared to previous approaches. With a very simple pipeline, we demonstrate that at low annotation budgets, semi-supervised methods outperform weakly-supervised ones by a wide margin for both semantic and instance segmentation. Our approach also outperforms previous semi-supervised works at a much reduced labeling cost. We present results for the Pascal VOC benchmark and unify weakly and semi-supervised approaches by considering the total annotation budget, thus allowing a fairer comparison between methods.

Peer Reviewed. Postprint (author's final draft)

UPCommons. Portal del coneixement obert de la UPC

RVOS: end-to-end recurrent network for video object segmentation

Author: Bellver Míriam
Girbau Xalabarder Andreu
Giró Nieto Xavier
Marqués Acosta Fernando
Salvador Aguilera Amaia
Ventura Royo Carles
Publication date: 15/06/2019

UPCommons. Portal del coneixement obert de la UPC

Recurrent semantic instance semantic segmentation

Author: Bellver Míriam
Campos Víctor
Giró Nieto Xavier
Marqués Acosta Fernando
Salvador Aguilera Amaia
Torres Viñals Jordi
Publication venue: Barcelona Supercomputing Center
Publication date: 24/04/2018

UPCommons. Portal del coneixement obert de la UPC

The evolution of the ventilatory ratio is a prognostic factor in mechanically ventilated COVID-19 ARDS patients

Author: Adell Serrano Berta
Agrifoglio Alexander
Aguilar Cabello María
Aguilera Luciano
Albaiceta Guillermo M.
Alcaraz Serrano Victoria
Aldecoa César
Alegre Cynthia
Almansa Mora Raquel
Amaya Villar Rosario
Andrea Rut
Arrieta Marta
Ayestarán J. Ignacio
Añón José M.
Badia Joan Ramon
Badía Mariona
Balsera Begoña
Barbena Laura
Barberà Carme
Barberán José
Barbeta Enric
Barbé Ferran
Bardi Tommaso
Barral Segade Patricia
Barroso Marta
Berezo García José Ángel
Bermejo Martín Jesús F.
Bigas Judit
Blancas Rafael
Blandino Ortiz Aaron
Blasco Cortés María Luisa
Boado María
Bodi Saera María
Bofill Neus
Bouza Vieiro María Teresa
Bueno Leticia
Bustamante Munguira Elena
Bustamante-Munguira Juan
Busto Martínez Cecilia del
Báez Pravia Orville
Caballero Jesús
Cachafeiro Lucia
Campi Hermoso David
Campos Fernández Sandra
Cano Iosune
Cantón-Bulnes Maria Luisa
Carbajales Cristina
Carbonell Nieves
Cardina Fernández Pablo
Carrión García Laura
Carvalho Sula
Casacuberta Barberà Núria
Castellví Andrea
Castellà Manuel
Castro Pedro
Catalán González Mercedes
Ceccato Adrián
CIBERESUCICOVID Project (COV20/00110 ISCIII)
Cicuendez Ávila Ramon
Cillóniz Catia
Clar Luisa
Climent Cristina
Codina Jordi
Conde Pamela
Contreras Sofía
Dot Irene
Díaz Santos Emili
Díaz Yolanda
Dólera Moreno Cristina
Enríquez Giraudo Pedro
Esmorís Arijón Inés
Estella Ángel
Farre Monjo Teresa
Fernández Barat Laia
Fernández Javier
Ferrando Carlos
Ferrer Roca Ricard
Figueras Albert
Forcadell Ferreres Eva
Forcelledo Espina Lorena
Franco Nieves
Furro Àngels
Gabarrús Albert
Galbán Cristóbal
Gallego Elena
García Garmendia José Luis
García Gasulla Dario
García Prieto Emilio
García Redruello Carlos
García Sagastume Amaia
García Beatriz
García Felipe
Garnacho Montero José
Gascón Castillo Maria Luisa
Gomà Gemma
Gonzalo Calvo David de
González Jessica
Gordo Federico
Gracia Maria Pilar
Gumucio Sanguino Víctor D.
Gómez Casal Vanesa
Gómez Gonzalez Carmen
Gómez José M.
Gómez Silvia
Herraiz Alba
Herrán Monge Rubén
Huerta Arturo
Ibarz Mercedes
Iglesias Silvia
Janer Maria Teresa
Jiménez Gabriel
Jorge García Ruth Noemí
Juan Díaz Mar
Kiarostami Karsa
Lazo Álvarez Juan I.
León Miguel
Lorente José Angel
Loza Vázquez Ana
López Gavín Alexandre
López Lago Ana
López Messa Juan
Macias Guerrero Desire
Mamolar Herrera Nuria
Mantellini Cecilia L.
Marco Naya Gregorio
Marcos Pilar
Marin Corral Judith
Mariño Ana Balan
Marmol Peis Enrique
Martin María Cruz
Martín Vicente Paula
Martínez de la Gándara Amalia
Martínez Fernández Carmen Eulalia
Martínez Juan Maria Dolores
Martínez Varela Ignacio
Martínez María
Masa Jimenez Juan Fernando
Masclans Joan Ramon
Maseda Emilio
Mañez Mendiluce Rafael
Mendoza Diego de
Menor Fernández Eva María
Menéndez Rosario
Miralbés Mar
Monclou Josman
Montejo González Juan Carlos
Montserrat Neus
Mora Aznar María
Moral Parras Pedro
Morales Dulce
Moreno Cano Sara Guadalupe
Mosquera Rodríguez David
Motos Anna
Muñoz Bermúdez Rosana
Nicolás José María
Nogue Bou Ramon
Nogueras Salinas Rafaela
Novo Mariana Andrea
Ocón Marta
Ortega Ana
Ossa Sergio
Pablo Sánchez Raul de
Pagliarani Pablo
Parera Pous Anna
Parrilla Francisco
Pestaña Laguna David
Peñasco Yhivian
Peñuelas Oscar
Piñol-Tena Àngels
Pozo Laderas Juan Carlos
Prados Javier
Pujol Andrés
Pérez Arnal Raquel
Pérez Bastida Leire
Pérez Planelles Gloria
Pérez Rubio Eva
Pérez Purificación
Ramon Coll Núria
Renedo Sanchez-Giron Gloria
Ricart Pilar
Riera Jordi
Rivas Vilas María Digna
Roche Campo Ferran
Rodriguez Oviedo Alejandro
Rodriguez Laura
Rodríguez de Castro Felipe
Rodríguez Ruiz Covadonga
Rodríguez Silvia
Rubio López Alberto
Rubio Jorge
Ruiz Miralles Miriam
Ryan Murúa Pablo
Saborido Paz Eva
Salazar Degracia Ana
Salvador Adell Inmaculada
Sanchez Miguel
Sancho Chinesta Susana
Santacoloma Bitor
Sariñena Maria Teresa
Segura Pensado Marta
Serra Fortuny Mireia
Serra Lidia
Serrano Lázaro Ainhoa
Servià Lluís
Socias Lorenzo
Soliva Laura
Solé Violan Jordi
Speziale Carla
Suares Sipmann Fernando
Sánchez Miralles Angel
Sánchez Ana
Tamayo Lomas Luis
Tognetti Daniel
Tormos Adrián
Torre Maria del Carmen de la
Torres Martí Antoni
Torres Mateu
Trefler Sandra
Trenado José
Trujillano Javier
Urrelo Cerrón Luis
Val Estela
Valdivia Ruiz Luis
Vallverdú Montserrat
Van Der Hofstadt Martin-Montalvo Maria
Vara Adrio Sabela
Vengoechea Javier
Vidal Cortes Pablo
Vilanova Judit
Villada Warrington Tatiana
Vilà Vilardel Clara
Vázquez Nil
Yang Hua
Yang Minlan
Zapatero Ana
Álvarez Ruiz Antonjo
Álvarez Sergio
Ángel José
Úbeda Alejandro
Publication venue: Springer Science and Business Media LLC
Publication date: 07/10/2021

Background: Mortality due to COVID-19 is high, especially in patients requiring mechanical ventilation. The purpose of the study is to investigate associations between mortality and variables measured during the first three days of mechanical ventilation in patients with COVID-19 intubated at ICU admission. Methods: Multicenter, observational cohort study including consecutive patients with COVID-19 admitted to 44 Spanish ICUs between February 25 and July 31, 2020, who required intubation at ICU admission and mechanical ventilation for more than three days. We collected demographic and clinical data prior to admission; information about clinical evolution at days 1 and 3 of mechanical ventilation; and outcomes. Results: Of the 2,095 patients with COVID-19 admitted to the ICU, 1,118 (53.3%) were intubated at day 1 and remained under mechanical ventilation at day 3. From day 1 to day 3, PaO2/FiO2 increased from 115.6 [80.0-171.2] to 180.0 [135.4-227.9] mmHg and the ventilatory ratio from 1.73 [1.33-2.25] to 1.96 [1.61-2.40]. In-hospital mortality was 38.7%. A higher increase between ICU admission and day 3 in the ventilatory ratio (OR 1.04 [CI 1.01-1.07], p = 0.030) and creatinine levels (OR 1.05 [CI 1.01-1.09], p = 0.005) and a lower increase in platelet counts (OR 0.96 [CI 0.93-1.00], p = 0.037) were independently associated with a higher risk of death. No association between mortality and the PaO2/FiO2 variation was observed (OR 0.99 [CI 0.95-1.02], p = 0.47). Conclusions: A higher ventilatory ratio and its increase at day 3 are associated with mortality in patients with COVID-19 receiving mechanical ventilation at ICU admission. No association was found with the PaO2/FiO2 variation.

Diposit Digital de la Universitat de Barcelona

Computer vision beyond the visible : image understanding through language

Author: Salvador Aguilera Amaia
Publication venue: Universitat Politècnica de Catalunya
Publication date: 27/06/2019

Temporal activity detection in untrimmed videos with recurrent neural networks

Author: Giró Nieto Xavier
Montes Alberto
Pascual Santiago
Salvador Aguilera Amaia

RECERCAT