
    Visually-Aware Context Modeling for News Image Captioning

    The goal of News Image Captioning is to generate an image caption according to the content of both a news article and an image. To leverage the visual information effectively, it is important to exploit the connection between the context in the articles/captions and the images. Psychological studies indicate that human faces in images draw higher attention priority. On top of that, humans often play a central role in news stories, as also proven by the face-name co-occurrence pattern we discover in existing News Image Captioning datasets. Therefore, we design a face-naming module for faces in images and names in captions/articles to learn a better name embedding. Apart from names, which can be directly linked to an image area (faces), news image captions mostly contain context information that can only be found in the article. Humans typically address this by searching for relevant information from the article based on the image. To emulate this thought process, we design a retrieval strategy using CLIP to retrieve sentences that are semantically close to the image. We conduct extensive experiments to demonstrate the efficacy of our framework. Without using additional paired data, we establish the new state-of-the-art performance on two News Image Captioning datasets, exceeding the previous state-of-the-art by 5 CIDEr points. We will release code upon acceptance.
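    The retrieval strategy described above can be sketched as ranking article sentences by their similarity to the image in a shared embedding space. A minimal illustration, using toy vectors as stand-ins for the CLIP image and text embeddings (the function and data names here are hypothetical, not the authors' code):

    ```python
    import math

    def cosine(u, v):
        # Cosine similarity between two embedding vectors.
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)

    def retrieve_sentences(image_emb, sentence_embs, k=2):
        """Return indices of the k article sentences closest to the image."""
        ranked = sorted(range(len(sentence_embs)),
                        key=lambda i: cosine(image_emb, sentence_embs[i]),
                        reverse=True)
        return ranked[:k]

    # Toy embeddings standing in for CLIP encoder outputs.
    image_emb = [0.9, 0.1, 0.0]
    sentence_embs = [
        [0.8, 0.2, 0.1],   # visually relevant sentence
        [0.0, 0.1, 0.9],   # unrelated background context
        [0.7, 0.3, 0.0],   # visually relevant sentence
    ]
    print(retrieve_sentences(image_emb, sentence_embs, k=2))  # -> [0, 2]
    ```

    The retrieved sentences would then be fed to the captioning model as additional context alongside the learned name embeddings.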

    Taking the bite out of automated naming of characters in TV video

    We investigate the problem of automatically labelling appearances of characters in TV or film material with their names. This is tremendously challenging due to the huge variation in imaged appearance of each character and the weakness and ambiguity of available annotation. However, we demonstrate that high precision can be achieved by combining multiple sources of information, both visual and textual. The principal novelties that we introduce are: (i) automatic generation of time-stamped character annotation by aligning subtitles and transcripts; (ii) strengthening the supervisory information by identifying when characters are speaking. In addition, we incorporate complementary cues of face matching and clothing matching to propose common annotations for face tracks, and consider choices of classifier which can potentially correct errors made in the automatic extraction of training data from the weak textual annotation. Results are presented on episodes of the TV series "Buffy the Vampire Slayer".
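    The core of the subtitle/transcript alignment idea is that subtitles carry timestamps but no speaker names, while transcripts carry names but no timing; matching the text of each transcript line to a subtitle transfers the timestamp to the name. A greedy word-overlap sketch on toy data (the paper aligns full episodes, not line pairs, so this is only an illustration):

    ```python
    # Subtitles: timed but anonymous. Transcript: named but untimed.
    subtitles = [  # (start_sec, end_sec, text)
        (12.0, 14.5, "we have to stop him tonight"),
        (15.0, 17.0, "not without a plan we don't"),
    ]
    transcript = [  # (speaker, text)
        ("BUFFY", "We have to stop him tonight."),
        ("GILES", "Not without a plan we don't."),
    ]

    def norm(text):
        # Lowercase, strip punctuation, return the set of words.
        return set("".join(c for c in text.lower()
                           if c.isalnum() or c == " ").split())

    def align(subtitles, transcript):
        """Pair each transcript line with the best word-overlapping subtitle,
        yielding (speaker, start, end) annotations."""
        out = []
        for speaker, line in transcript:
            words = norm(line)
            best = max(subtitles, key=lambda s: len(words & norm(s[2])))
            out.append((speaker, best[0], best[1]))
        return out

    print(align(subtitles, transcript))
    # -> [('BUFFY', 12.0, 14.5), ('GILES', 15.0, 17.0)]
    ```

    The resulting time-stamped names are then intersected with detected face tracks and speaking segments to produce the weak supervision used for training.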

    Captured by the camera's eye: Guantanamo and the shifting frame of the Global War on Terror

    In January 2002, images of the detention of prisoners held at US Naval Station Guantanamo Bay as part of the Global War on Terrorism were released by the US Department of Defense, a public relations move that Secretary of Defense Donald Rumsfeld later referred to as 'probably unfortunate'. These images, widely reproduced in the media, quickly came to symbolise the facility and the practices at work there. Nine years on, the images of orange-clad 'detainees' – the 'orange series' – remain a powerful symbol of US military practices and play a significant role in the resistance to the site. However, as the site has evolved, so too has its visual representation. Official images of these new facilities not only document this evolution but work to constitute, through a careful (re)framing (literal and figurative), a new (re)presentation of the site, and therefore the identities of those involved. The new series of images not only (re)inscribes the identities of detainees as dangerous but, more importantly, works to constitute the US State as humane and modern. These images are part of a broader effort by the US administration to resituate its image, and remind us, as IR scholars, to look at the diverse set of practices (beyond simply spoken language) to understand the complexity of international politics.

    Regulatory Crisis at Lloyd's of London: Reform from Within


    Level Playing Field for Million Scale Face Recognition

    Face recognition has the perception of being a solved problem; however, when tested at the million scale, it exhibits dramatic variation in accuracy across different algorithms. Are the algorithms very different? Is access to good/big training data their secret weapon? Where should face recognition improve? To address those questions, we created a benchmark, MF2, that requires all algorithms to be trained on the same data and tested at the million scale. MF2 is a public large-scale set with 672K identities and 4.7M photos, created with the goal of leveling the playing field for large-scale face recognition. We contrast our results with findings from the other two large-scale benchmarks, MegaFace Challenge and MS-Celebs-1M, where groups were allowed to train on any private/public/big/small set. Some key discoveries: 1) algorithms trained on MF2 were able to achieve state-of-the-art results comparable to algorithms trained on massive private sets; 2) some outperformed themselves once trained on MF2; 3) invariance to aging suffers from low accuracy, as in MegaFace, identifying the need for larger age variation within identities or adjustment of algorithms in future testing.
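    Million-scale evaluation of this kind typically reports rank-1 identification accuracy: each probe face is matched against a gallery that mixes the true identity with many distractors, and a probe counts as correct only if its nearest gallery face is the right person. A minimal sketch with toy 2-D "embeddings" (names and vectors here are illustrative, not benchmark data):

    ```python
    import math

    def cosine(u, v):
        # Cosine similarity between two face embeddings.
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) *
                      math.sqrt(sum(b * b for b in v)))

    def rank1_accuracy(probes, gallery):
        """probes/gallery: lists of (identity, embedding) pairs.
        Fraction of probes whose nearest gallery face has the right identity."""
        hits = 0
        for pid, pemb in probes:
            best_id = max(gallery, key=lambda g: cosine(pemb, g[1]))[0]
            hits += (best_id == pid)
        return hits / len(probes)

    gallery = [("alice", [1.0, 0.0]), ("bob", [0.0, 1.0]),
               ("d1", [0.7, 0.7]), ("d2", [-1.0, 0.2])]  # d* = distractors
    probes = [("alice", [0.95, 0.1]), ("bob", [0.1, 0.9])]
    print(rank1_accuracy(probes, gallery))  # -> 1.0
    ```

    At benchmark scale the gallery holds on the order of a million distractor faces, which is what exposes the accuracy gaps between algorithms that look comparable on small test sets.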

    Information extraction from multimedia web documents: an open-source platform and testbed

    The LivingKnowledge project aimed to advance the state of the art in search, retrieval and knowledge management on the web by promoting the use of sentiment and opinion analysis within multimedia applications. To achieve this aim, a diverse set of novel and complementary analysis techniques was integrated into a single but extensible software platform on which such applications can be built. The platform combines state-of-the-art techniques for extracting facts, opinions and sentiment from multimedia documents and, unlike earlier platforms, exploits both visual and textual techniques to support multimedia information retrieval. Foreseeing the usefulness of this software in the wider community, the platform has been made generally available as an open-source project. This paper describes the platform design, gives an overview of the analysis algorithms integrated into the system and describes two applications that utilise the system for multimedia information retrieval.