Visually-Aware Context Modeling for News Image Captioning
The goal of News Image Captioning is to generate an image caption according
to the content of both a news article and an image. To leverage the visual
information effectively, it is important to exploit the connection between the
context in the articles/captions and the images. Psychological studies indicate
that human faces in images attract higher attention priority. On top of that,
humans often play a central role in news stories, as evidenced by the
face-name co-occurrence pattern we discover in existing News Image Captioning
datasets. Therefore, we design a face-naming module for faces in images and
names in captions/articles to learn a better name embedding. Apart from names,
which can be directly linked to an image area (faces), news image captions
mostly contain context information that can only be found in the article.
Humans typically address this by searching for relevant information from the
article based on the image. To emulate this thought process, we design a
retrieval strategy using CLIP to retrieve sentences that are semantically close
to the image. We conduct extensive experiments to demonstrate the efficacy of
our framework. Without using additional paired data, we establish the new
state-of-the-art performance on two News Image Captioning datasets, exceeding
the previous state-of-the-art by 5 CIDEr points. We will release code upon
acceptance
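The retrieval step described above can be sketched in a few lines. This is a minimal illustration, assuming image and sentence embeddings have already been produced by a CLIP-style encoder; the function name and the toy vectors below are hypothetical stand-ins, not taken from the paper.

```python
import numpy as np

def retrieve_sentences(image_emb, sentence_embs, k=2):
    """Return indices of the k sentences whose (CLIP-style) embeddings
    are most similar to the image embedding, by cosine similarity."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    sentence_embs = sentence_embs / np.linalg.norm(sentence_embs, axis=1, keepdims=True)
    sims = sentence_embs @ image_emb          # cosine similarity per sentence
    return np.argsort(-sims)[:k].tolist()     # indices of the k best matches

# Toy 3-d embeddings standing in for real CLIP outputs.
img = np.array([1.0, 0.0, 0.0])
sents = np.array([
    [0.9, 0.1, 0.0],   # semantically close to the image
    [0.0, 1.0, 0.0],   # unrelated
    [0.8, 0.0, 0.2],   # also close
])
print(retrieve_sentences(img, sents, k=2))  # → [0, 2]
```

In practice the sentence embeddings would come from encoding each article sentence with CLIP's text encoder, and the retrieved sentences would be passed to the captioning model as context.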
Taking the bite out of automated naming of characters in TV video
We investigate the problem of automatically labelling appearances of characters in TV or film material
with their names. This is tremendously challenging due to the huge variation in imaged appearance of each character and the weakness and ambiguity of available annotation. However, we demonstrate that high precision can be achieved by combining multiple sources of information, both visual and textual. The principal novelties that we introduce are: (i) automatic generation of time stamped character annotation by aligning subtitles and transcripts; (ii) strengthening the supervisory information by identifying
when characters are speaking. In addition, we incorporate complementary cues of face matching and clothing matching to propose common annotations for face tracks, and consider choices of classifier which can potentially correct errors made in the automatic extraction of training data from the weak textual annotation. Results are presented on episodes of the TV series "Buffy the Vampire Slayer".
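The subtitle/transcript alignment idea can be conveyed with a minimal sketch: subtitles carry timestamps but no speaker names, transcripts carry names but no timestamps, and matching the spoken text links the two. Real systems use timing constraints and approximate string matching; the exact-match toy below, with hypothetical data, only illustrates the principle.

```python
def align_names(subtitles, transcript):
    """Attach speaker names from an untimed transcript to timed subtitles
    by matching the spoken text (a toy stand-in for robust alignment)."""
    speaker_of = {text: name for name, text in transcript}
    return [(start, end, speaker_of.get(text), text)
            for start, end, text in subtitles]

# Subtitles: (start, end, text).  Transcript: (speaker, text).
subs = [(12.0, 14.5, "We have to stop him."),
        (15.0, 16.2, "How?")]
script = [("BUFFY", "We have to stop him."),
          ("WILLOW", "How?")]
print(align_names(subs, script))
# → [(12.0, 14.5, 'BUFFY', 'We have to stop him.'),
#    (15.0, 16.2, 'WILLOW', 'How?')]
```

The result is the time-stamped character annotation the paper describes, which can then supervise face-track labelling.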
Captured by the camera's eye: Guantanamo and the shifting frame of the Global War on Terror
In January 2002, images of the detention of prisoners held at US Naval Station Guantanamo Bay as part of the Global War on Terrorism were released by the US Department of Defense, a public relations move that Secretary of Defense Donald Rumsfeld later referred to as "probably unfortunate". These images, widely reproduced in the media,
quickly came to symbolise the facility and the practices at work there. Nine years on, the images of orange-clad "detainees" – the "orange series" – remain a powerful symbol of US military practices and play a significant role in the resistance to the site. However, as the site has evolved, so too has its visual representation. Official images of these new facilities not only document this evolution but work to constitute, through a careful (re)framing (literal and figurative), a new (re)presentation of the site, and therefore the identities of those
involved. The new series of images not only (re)inscribes the identities of detainees as dangerous but, more importantly, works to constitute the US State as humane and modern. These images are part of a broader effort by the US administration to resituate its image, and remind us, as IR scholars, to look at the diverse set of practices (beyond simply spoken language) to understand the complexity of international politics.
Level Playing Field for Million Scale Face Recognition
Face recognition is perceived as a solved problem; however, when tested
at the million scale, it exhibits dramatic variation in accuracy across
different algorithms. Are the algorithms very different? Is access to good/big
training data their secret weapon? Where should face recognition improve? To
address those questions, we created a benchmark, MF2, that requires all
algorithms to be trained on the same data and tested at the million scale. MF2 is
a public large-scale set with 672K identities and 4.7M photos, created to
level the playing field for large-scale face recognition. We contrast our
results with findings from the other two large-scale benchmarks MegaFace
Challenge and MS-Celeb-1M, where groups were allowed to train on any
private/public/big/small set. Some key discoveries: 1) algorithms, trained on
MF2, were able to achieve state-of-the-art results comparable to algorithms
trained on massive private sets; 2) some outperformed themselves once trained
on MF2; 3) invariance to aging suffers from low accuracy, as in MegaFace,
identifying the need for larger age variation, possibly within identities, or
adjustment of algorithms in future testing.
Information extraction from multimedia web documents: an open-source platform and testbed
The LivingKnowledge project aimed to enhance the current state of the art in search, retrieval and knowledge management on the web by advancing the use of sentiment and opinion analysis within multimedia applications. To achieve this aim, a diverse set of novel and complementary analysis techniques has been integrated into a single but extensible software platform on which such applications can be built. The platform combines state-of-the-art techniques for extracting facts, opinions and sentiment from multimedia documents, and unlike earlier platforms, it exploits both visual and textual techniques to support multimedia information retrieval. Foreseeing the usefulness of this software in the wider community, the platform has been made generally available as an open-source project. This paper describes the platform design, gives an overview of the analysis algorithms integrated into the system and describes two applications that utilise the system for multimedia information retrieval.
- …