1,036 research outputs found

    Know2Look: Commonsense Knowledge for Visual Search

    No full text
    With the rise in popularity of social media, images accompanied by contextual text form a huge section of the web. However, search and retrieval of documents are still largely dependent on solely textual cues. Although visual cues have started to gain focus, the imperfection in object/scene detection do not lead to significantly improved results. We hypothesize that the use of background commonsense knowledge on query terms can significantly aid in retrieval of documents with associated images. To this end we deploy three different modalities - text, visual cues, and commonsense knowledge pertaining to the query - as a recipe for efficient search and retrieval

    Text-image synergy for multimodal retrieval and annotation

    Get PDF
    Text and images are the two most common data modalities found on the Internet. Understanding the synergy between text and images, that is, seamlessly analyzing information from these modalities may be trivial for humans, but is challenging for software systems. In this dissertation we study problems where deciphering text-image synergy is crucial for finding solutions. We propose methods and ideas that establish semantic connections between text and images in multimodal contents, and empirically show their effectiveness in four interconnected problems: Image Retrieval, Image Tag Refinement, Image-Text Alignment, and Image Captioning. Our promising results and observations open up interesting scopes for future research involving text-image data understanding.Text and images are the two most common data modalities found on the Internet. Understanding the synergy between text and images, that is, seamlessly analyzing information from these modalities may be trivial for humans, but is challenging for software systems. In this dissertation we study problems where deciphering text-image synergy is crucial for finding solutions. We propose methods and ideas that establish semantic connections between text and images in multimodal contents, and empirically show their effectiveness in four interconnected problems: Image Retrieval, Image Tag Refinement, Image-Text Alignment, and Image Captioning. Our promising results and observations open up interesting scopes for future research involving text-image data understanding.Text und Bild sind die beiden häufigsten Arten von Inhalten im Internet. Während es für Menschen einfach ist, gerade aus dem Zusammenspiel von Text- und Bildinhalten Informationen zu erfassen, stellt diese kombinierte Darstellung von Inhalten Softwaresysteme vor große Herausforderungen. In dieser Dissertation werden Probleme studiert, für deren Lösung das Verständnis des Zusammenspiels von Text- und Bildinhalten wesentlich ist. Es werden Methoden und Vorschläge präsentiert und empirisch bewertet, die semantische Verbindungen zwischen Text und Bild in multimodalen Daten herstellen. Wir stellen in dieser Dissertation vier miteinander verbundene Text- und Bildprobleme vor: • Bildersuche. Ob Bilder anhand von textbasierten Suchanfragen gefunden werden, hängt stark davon ab, ob der Text in der Nähe des Bildes mit dem der Anfrage übereinstimmt. Bilder ohne textuellen Kontext, oder sogar mit thematisch passendem Kontext, aber ohne direkte Übereinstimmungen der vorhandenen Schlagworte zur Suchanfrage, können häufig nicht gefunden werden. Zur Abhilfe schlagen wir vor, drei Arten von Informationen in Kombination zu nutzen: visuelle Informationen (in Form von automatisch generierten Bildbeschreibungen), textuelle Informationen (Stichworte aus vorangegangenen Suchanfragen), und Alltagswissen. • Verbesserte Bildbeschreibungen. Bei der Objekterkennung durch Computer Vision kommt es des Öfteren zu Fehldetektionen und Inkohärenzen. Die korrekte Identifikation von Bildinhalten ist jedoch eine wichtige Voraussetzung für die Suche nach Bildern mittels textueller Suchanfragen. Um die Fehleranfälligkeit bei der Objekterkennung zu minimieren, schlagen wir vor Alltagswissen einzubeziehen. Durch zusätzliche Bild-Annotationen, welche sich durch den gesunden Menschenverstand als thematisch passend erweisen, können viele fehlerhafte und zusammenhanglose Erkennungen vermieden werden. • Bild-Text Platzierung. Auf Internetseiten mit Text- und Bildinhalten (wie Nachrichtenseiten, Blogbeiträge, Artikel in sozialen Medien) werden Bilder in der Regel an semantisch sinnvollen Positionen im Textfluss platziert. Wir nutzen dies um ein Framework vorzuschlagen, in dem relevante Bilder ausgesucht werden und mit den passenden Abschnitten eines Textes assoziiert werden. • Bildunterschriften. Bilder, die als Teil von multimodalen Inhalten zur Verbesserung der Lesbarkeit von Texten dienen, haben typischerweise Bildunterschriften, die zum Kontext des umgebenden Texts passen. Wir schlagen vor, den Kontext beim automatischen Generieren von Bildunterschriften ebenfalls einzubeziehen. Üblicherweise werden hierfür die Bilder allein analysiert. Wir stellen die kontextbezogene Bildunterschriftengenerierung vor. Unsere vielversprechenden Beobachtungen und Ergebnisse eröffnen interessante Möglichkeiten für weitergehende Forschung zur computergestützten Erfassung des Zusammenspiels von Text- und Bildinhalten

    Contextual Media Retrieval Using Natural Language Queries

    Full text link
    The widespread integration of cameras in hand-held and head-worn devices as well as the ability to share content online enables a large and diverse visual capture of the world that millions of users build up collectively every day. We envision these images as well as associated meta information, such as GPS coordinates and timestamps, to form a collective visual memory that can be queried while automatically taking the ever-changing context of mobile users into account. As a first step towards this vision, in this work we present Xplore-M-Ego: a novel media retrieval system that allows users to query a dynamic database of images and videos using spatio-temporal natural language queries. We evaluate our system using a new dataset of real user queries as well as through a usability study. One key finding is that there is a considerable amount of inter-user variability, for example in the resolution of spatial relations in natural language utterances. We show that our retrieval system can cope with this variability using personalisation through an online learning-based retrieval formulation.Comment: 8 pages, 9 figures, 1 tabl

    Design aspects of dual gate GaAs nanowire FET for room temperature charge qubit operation: A study on diameter and gate engineering

    Full text link
    The current work explores a geometrically engineered dual gate GaAs nanowire FET with state of the art miniaturized dimensions for high performance charge qubit operation at room temperature. Relevant gate voltages in such device can create two voltage tunable quantum dots (VTQDs) underneath the gates, as well as can manipulate their eigenstate detuning and the inter-dot coupling to generate superposition, whereas a small drain bias may cause its collapse leading to qubit read out. Such qubit operations, i.e., Initialization, Manipulation, and Measurement, are theoretically modeled in the present work by developing a second quantization filed operator based Schrodinger-Poisson self-consistent framework coupled to non-equilibrium Greens function formalism. The study shows that the Bloch sphere coverage can be discretized along polar and azimuthal directions by reducing the nanowire diameter and increasing the inter-dot separation respectively, that can be utilized for selective information encoding. The theoretically obtained stability diagrams suggest that downscaled nanowire diameter and increased gate separation sharpen the bonding and anti-bonding states with reduced anticrossing leading to a gradual transformation of the hyperbolic current mapping into a pair of straight lines. However, the dephasing time in the proposed GaAs VTQD-based qubit may be significantly improved by scaling down both the nanowire diameter and gate separation. Therefore, the present study suggests an optimization window for geometrical engineering of a dual gate nanowire FET qubit to achieve superior qubit performance. Most importantly, such device is compatible with the mainstream CMOS technology and can be utilized for large scale implementation by little modification of the state of the art fabrication processes

    Story-oriented Image Selection and Placement

    No full text
    Multimodal contents have become commonplace on the Internet today, manifested as news articles, social media posts, and personal or business blog posts. Among the various kinds of media (images, videos, graphics, icons, audio) used in such multimodal stories, images are the most popular. The selection of images from a collection - either author's personal photo album, or web repositories - and their meticulous placement within a text, builds a succinct multimodal commentary for digital consumption. In this paper we present a system that automates the process of selecting relevant images for a story and placing them at contextual paragraphs within the story for a multimodal narration. We leverage automatic object recognition, user-provided tags, and commonsense knowledge, and use an unsupervised combinatorial optimization to solve the selection and placement problems seamlessly as a single unit

    {SANDI}: {S}tory-and-Images Alignment

    Get PDF

    Degradation of Phenolic Compounds Through UV and Visible- Light-Driven Photocatalysis: Technical and Economic Aspects

    Get PDF
    Phenolic compounds are found in surface and groundwater as well as wastewater from several industries. It is necessary to eliminate phenols and phenolic compounds from contaminated water before releasing into water bodies due to their toxicity to human beings. Photocatalytic degradation seems to be a promising technology for the degradation of several phenolic compounds. Complete mineralization of phenol and phenolic compound has been achieved with TiO2-based photocatalysts under both UV and visible-light irradiation. This chapter will evaluate the conventional processes and advanced oxidation processes for the degradation of phenol and phenolic compounds. The process economics and efficiencies of different advanced oxidation processes will also be discussed. The main focus of the chapter is photocatalytic degradation processes under UV and visible light along with a detailed review of several factors affecting degradation of phenol and phenolic compounds. Photocatalytic degradation process is governed by reactions with hydroxyl radical or superoxide ion. The extent of degradation depends on light sources (UV, visible, and solar), the type of photocatalyst, and experimental conditions (pH, photocatalyst dosage, initial concentration of phenolic compounds, light intensity, electron donor concentration, etc.).Visible-light-active photocatalysts are applied by several researchers to exploit sunlight and to make the photocatalysis process sustainable. In the future, using sunlight in place of UV could make photocatalysis economically more efficient
    • …
    corecore