
6 research outputs found

Text-image synergy for multimodal retrieval and annotation

Author: Nag Chowdhury Sreyasi
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/01/2021

Text and images are the two most common data modalities found on the Internet. Understanding the synergy between text and images, that is, seamlessly analyzing information from these modalities, may be trivial for humans but is challenging for software systems. In this dissertation we study problems where deciphering text-image synergy is crucial for finding solutions. We propose methods and ideas that establish semantic connections between text and images in multimodal content, and empirically show their effectiveness in four interconnected problems: Image Retrieval, Image Tag Refinement, Image-Text Alignment, and Image Captioning. Our promising results and observations open up interesting avenues for future research on text-image data understanding.

Text and images are the two most common types of content on the Internet. While humans find it easy to grasp information precisely from the interplay of text and image content, this combined presentation poses major challenges for software systems. This dissertation studies problems whose solution depends on understanding the interplay of text and image content. It presents and empirically evaluates methods and proposals that establish semantic connections between text and images in multimodal data. The dissertation addresses four interconnected text-and-image problems:

• Image retrieval. Whether images are found by text-based queries depends heavily on whether the text near the image matches the query. Images without textual context, or even with thematically fitting context but no direct match between the available keywords and the query, often cannot be found. As a remedy, we propose combining three kinds of information: visual information (in the form of automatically generated image descriptions), textual information (keywords from previous queries), and commonsense knowledge.
• Improved image descriptions. Object recognition in computer vision frequently produces false detections and incoherent results. Correctly identifying image content is, however, an important prerequisite for finding images via textual queries. To reduce the error-proneness of object recognition, we propose incorporating commonsense knowledge. Additional image annotations that common sense shows to be thematically fitting help avoid many erroneous and incoherent detections.
• Image-text placement. On web pages with text and image content (such as news sites, blog posts, and social media articles), images are usually placed at semantically meaningful positions in the text flow. We use this to propose a framework that selects relevant images and associates them with the matching sections of a text.
• Image captions. Images that serve as part of multimodal content to improve the readability of text typically have captions that fit the context of the surrounding text. We propose also taking this context into account when generating captions automatically; usually, only the images themselves are analyzed for this purpose. We introduce context-aware image caption generation.

Our promising observations and results open up interesting possibilities for further research on the computational understanding of the interplay between text and image content.

Universaar

Acronym

MPG.PuRe

Wildlife Protection and Habitat Management

Publication venue: 'MDPI AG'
Publication date: 16/09/2022

The management of wildlife populations and the management of their habitats are interdisciplinary fields that encompass many scientific disciplines and also affect people's lives. They are therefore truly applied sciences in which human dimensions play an important role. This book highlights the importance of conducting rigorous studies to design and implement the effective management and restoration of wild populations and their habitats. A new paradigm in conservation is developing that goes beyond the boundaries of protected areas to achieve the goal of sustainable development. The 16 papers in this book, including reviews and a project report, cover a broad range of topics, exploring a diversity of subjects that are representative of current practices and novel applications. We would like to thank both the MDPI publishers and editorial staff for their support and help during the process of editing this book, in addition to the authors for their contributions.

Directory of Open Access Books (DOAB)

Detección y categorización de objetos invariante y multivista en imágenes digitales mediante visión artificial bioinspirada.

Author: Rodríguez Vaamonde Sergio
Publication date: 22/01/2016

344 pp. This thesis is situated in the field of automatic image annotation within the research area of computer vision. The main goal of this field is to generate textual labels for an image that describe the objects present in it without human intervention. The thesis builds on the nearest-neighbor model to annotate an image automatically. Its novelty lies in a new implementation of the two main steps of that model. In the first step, the thesis proposes using MPEG-7 features to describe the similarity between images, and proposes a new texture feature model based on the primary cortex of a primate. The resulting algorithm has been shown to be more effective than the implementation proposed by the standard and also more accurate than other cortex models in the neuroscience literature. In the second step, the thesis presents a new algorithm for selecting the candidate labels of an image given its visually similar images. The main advantage of this algorithm is that it combines the textual information of the labels with the visual information of the images. In addition, the thesis proposes a new training algorithm that is fast and adapted to the particular annotation task, so it can be applied at annotation time.

Archivo Digital para la Docencia y la Investigación

BD 5 2022 Complete

Author: Bibliotheca Dantesca Managing Editors
Publication venue: ScholarlyCommons
Publication date: 13/12/2022

ScholarlyCommons@Penn

Design revolutions: IASDR 2019 Conference Proceedings. Volume 2: Living, Making, Value

Publication venue: Manchester Metropolitan University
Publication date: 06/08/2020

In September 2019 Manchester School of Art at Manchester Metropolitan University was honoured to host the bi-annual conference of the International Association of Societies of Design Research (IASDR) under the unifying theme of DESIGN REVOLUTIONS. This was the first time the conference had been held in the UK. Through key research themes across nine conference tracks – Change, Learning, Living, Making, People, Technology, Thinking, Value and Voices – the conference opened up compelling, meaningful and radical dialogue on the role of design in addressing societal and organisational challenges. This Volume 2 includes papers from the Living, Making and Value tracks of the conference.

E-space: Manchester Metropolitan University's Research Repository

Uncertainty in Artificial Intelligence: Proceedings of the Thirty-Fourth Conference

Publication venue: AUAI Press
Publication date: 01/09/2018

UCL Discovery