11 research outputs found

    Text–to–Video: Image Semantics and NLP

    Get PDF
    When aiming at automatically translating an arbitrary text into a visual story, the main challenge consists in finding a semantically close visual representation whereby the displayed meaning should remain the same as in the given text. Besides, the appearance of an image itself largely influences how its meaningful information is transported towards an observer. This thesis now demonstrates that investigating in both, image semantics as well as the semantic relatedness between visual and textual sources enables us to tackle the challenging semantic gap and to find a semantically close translation from natural language to a corresponding visual representation. Within the last years, social networking became of high interest leading to an enormous and still increasing amount of online available data. Photo sharing sites like Flickr allow users to associate textual information with their uploaded imagery. Thus, this thesis exploits this huge knowledge source of user generated data providing initial links between images and words, and other meaningful data. In order to approach visual semantics, this work presents various methods to analyze the visual structure as well as the appearance of images in terms of meaningful similarities, aesthetic appeal, and emotional effect towards an observer. In detail, our GPU-based approach efficiently finds visual similarities between images in large datasets across visual domains and identifies various meanings for ambiguous words exploring similarity in online search results. Further, we investigate in the highly subjective aesthetic appeal of images and make use of deep learning to directly learn aesthetic rankings from a broad diversity of user reactions in social online behavior. To gain even deeper insights into the influence of visual appearance towards an observer, we explore how simple image processing is capable of actually changing the emotional perception and derive a simple but effective image filter. To identify meaningful connections between written text and visual representations, we employ methods from Natural Language Processing (NLP). Extensive textual processing allows us to create semantically relevant illustrations for simple text elements as well as complete storylines. More precisely, we present an approach that resolves dependencies in textual descriptions to arrange 3D models correctly. Further, we develop a method that finds semantically relevant illustrations to texts of different types based on a novel hierarchical querying algorithm. Finally, we present an optimization based framework that is capable of not only generating semantically relevant but also visually coherent picture stories in different styles.Bei der automatischen Umwandlung eines beliebigen Textes in eine visuelle Geschichte, besteht die größte Herausforderung darin eine semantisch passende visuelle Darstellung zu finden. Dabei sollte die Bedeutung der Darstellung dem vorgegebenen Text entsprechen. Darüber hinaus hat die Erscheinung eines Bildes einen großen Einfluß darauf, wie seine bedeutungsvollen Inhalte auf einen Betrachter übertragen werden. Diese Dissertation zeigt, dass die Erforschung sowohl der Bildsemantik als auch der semantischen Verbindung zwischen visuellen und textuellen Quellen es ermöglicht, die anspruchsvolle semantische Lücke zu schließen und eine semantisch nahe Übersetzung von natürlicher Sprache in eine entsprechend sinngemäße visuelle Darstellung zu finden. Des Weiteren gewann die soziale Vernetzung in den letzten Jahren zunehmend an Bedeutung, was zu einer enormen und immer noch wachsenden Menge an online verfügbaren Daten geführt hat. Foto-Sharing-Websites wie Flickr ermöglichen es Benutzern, Textinformationen mit ihren hochgeladenen Bildern zu verknüpfen. Die vorliegende Arbeit nutzt die enorme Wissensquelle von benutzergenerierten Daten welche erste Verbindungen zwischen Bildern und Wörtern sowie anderen aussagekräftigen Daten zur Verfügung stellt. Zur Erforschung der visuellen Semantik stellt diese Arbeit unterschiedliche Methoden vor, um die visuelle Struktur sowie die Wirkung von Bildern in Bezug auf bedeutungsvolle Ähnlichkeiten, ästhetische Erscheinung und emotionalem Einfluss auf einen Beobachter zu analysieren. Genauer gesagt, findet unser GPU-basierter Ansatz effizient visuelle Ähnlichkeiten zwischen Bildern in großen Datenmengen quer über visuelle Domänen hinweg und identifiziert verschiedene Bedeutungen für mehrdeutige Wörter durch die Erforschung von Ähnlichkeiten in Online-Suchergebnissen. Des Weiteren wird die höchst subjektive ästhetische Anziehungskraft von Bildern untersucht und "deep learning" genutzt, um direkt ästhetische Einordnungen aus einer breiten Vielfalt von Benutzerreaktionen im sozialen Online-Verhalten zu lernen. Um noch tiefere Erkenntnisse über den Einfluss des visuellen Erscheinungsbildes auf einen Betrachter zu gewinnen, wird erforscht, wie alleinig einfache Bildverarbeitung in der Lage ist, tatsächlich die emotionale Wahrnehmung zu verändern und ein einfacher aber wirkungsvoller Bildfilter davon abgeleitet werden kann. Um bedeutungserhaltende Verbindungen zwischen geschriebenem Text und visueller Darstellung zu ermitteln, werden Methoden des "Natural Language Processing (NLP)" verwendet, die der Verarbeitung natürlicher Sprache dienen. Der Einsatz umfangreicher Textverarbeitung ermöglicht es, semantisch relevante Illustrationen für einfache Textteile sowie für komplette Handlungsstränge zu erzeugen. Im Detail wird ein Ansatz vorgestellt, der Abhängigkeiten in Textbeschreibungen auflöst, um 3D-Modelle korrekt anzuordnen. Des Weiteren wird eine Methode entwickelt die, basierend auf einem neuen hierarchischen Such-Anfrage Algorithmus, semantisch relevante Illustrationen zu Texten verschiedener Art findet. Schließlich wird ein optimierungsbasiertes Framework vorgestellt, das nicht nur semantisch relevante, sondern auch visuell kohärente Bildgeschichten in verschiedenen Bildstilen erzeugen kann

    Advances in Character Recognition

    Get PDF
    This book presents advances in character recognition, and it consists of 12 chapters that cover wide range of topics on different aspects of character recognition. Hopefully, this book will serve as a reference source for academic research, for professionals working in the character recognition field and for all interested in the subject

    Offline printed Arabic character recognition

    Get PDF
    Optical Character Recognition (OCR) shows great potential for rapid data entry, but has limited success when applied to the Arabic language. Normal OCR problems are compounded by the right-to-left nature of Arabic and because the script is largely connected. This research investigates current approaches to the Arabic character recognition problem and innovates a new approach. The main work involves a Haar-Cascade Classifier (HCC) approach modified for the first time for Arabic character recognition. This technique eliminates the problematic steps in the pre-processing and recognition phases in additional to the character segmentation stage. A classifier was produced for each of the 61 Arabic glyphs that exist after the removal of diacritical marks. These 61 classifiers were trained and tested on an average of about 2,000 images each. A Multi-Modal Arabic Corpus (MMAC) has also been developed to support this work. MMAC makes innovative use of the new concept of connected segments of Arabic words (PAWs) with and without diacritics marks. These new tokens have significance for linguistic as well as OCR research and applications and have been applied here in the post-processing phase. A complete Arabic OCR application has been developed to manipulate the scanned images and extract a list of detected words. It consists of the HCC to extract glyphs, systems for parsing and correcting these glyphs and the MMAC to apply linguistic constrains. The HCC produces a recognition rate for Arabic glyphs of 87%. MMAC is based on 6 million words, is published on the web and has been applied and validated both in research and commercial use

    Multimodal Interactive Transcription of Handwritten Text Images

    Full text link
    En esta tesis se presenta un nuevo marco interactivo y multimodal para la transcripción de Documentos manuscritos. Esta aproximación, lejos de proporcionar la transcripción completa pretende asistir al experto en la dura tarea de transcribir. Hasta la fecha, los sistemas de reconocimiento de texto manuscrito disponibles no proporcionan transcripciones aceptables por los usuarios y, generalmente, se requiere la intervención del humano para corregir las transcripciones obtenidas. Estos sistemas han demostrado ser realmente útiles en aplicaciones restringidas y con vocabularios limitados (como es el caso del reconocimiento de direcciones postales o de cantidades numéricas en cheques bancarios), consiguiendo en este tipo de tareas resultados aceptables. Sin embargo, cuando se trabaja con documentos manuscritos sin ningún tipo de restricción (como documentos manuscritos antiguos o texto espontáneo), la tecnología actual solo consigue resultados inaceptables. El escenario interactivo estudiado en esta tesis permite una solución más efectiva. En este escenario, el sistema de reconocimiento y el usuario cooperan para generar la transcripción final de la imagen de texto. El sistema utiliza la imagen de texto y una parte de la transcripción previamente validada (prefijo) para proponer una posible continuación. Despues, el usuario encuentra y corrige el siguente error producido por el sistema, generando así un nuevo prefijo mas largo. Este nuevo prefijo, es utilizado por el sistema para sugerir una nueva hipótesis. La tecnología utilizada se basa en modelos ocultos de Markov y n-gramas. Estos modelos son utilizados aquí de la misma manera que en el reconocimiento automático del habla. Algunas modificaciones en la definición convencional de los n-gramas han sido necesarias para tener en cuenta la retroalimentación del usuario en este sistema.Romero Gómez, V. (2010). Multimodal Interactive Transcription of Handwritten Text Images [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8541Palanci

    B!SON: A Tool for Open Access Journal Recommendation

    Get PDF
    Finding a suitable open access journal to publish scientific work is a complex task: Researchers have to navigate a constantly growing number of journals, institutional agreements with publishers, funders’ conditions and the risk of Predatory Publishers. To help with these challenges, we introduce a web-based journal recommendation system called B!SON. It is developed based on a systematic requirements analysis, built on open data, gives publisher-independent recommendations and works across domains. It suggests open access journals based on title, abstract and references provided by the user. The recommendation quality has been evaluated using a large test set of 10,000 articles. Development by two German scientific libraries ensures the longevity of the project

    Reports to the President

    Get PDF
    A compilation of annual reports for the 1990-1991 academic year, including a report from the President of the Massachusetts Institute of Technology, as well as reports from the academic and administrative units of the Institute. The reports outline the year's goals, accomplishments, honors and awards, and future plans

    Abstracts on Radio Direction Finding (1899 - 1995)

    Get PDF
    The files on this record represent the various databases that originally composed the CD-ROM issue of "Abstracts on Radio Direction Finding" database, which is now part of the Dudley Knox Library's Abstracts and Selected Full Text Documents on Radio Direction Finding (1899 - 1995) Collection. (See Calhoun record https://calhoun.nps.edu/handle/10945/57364 for further information on this collection and the bibliography). Due to issues of technological obsolescence preventing current and future audiences from accessing the bibliography, DKL exported and converted into the three files on this record the various databases contained in the CD-ROM. The contents of these files are: 1) RDFA_CompleteBibliography_xls.zip [RDFA_CompleteBibliography.xls: Metadata for the complete bibliography, in Excel 97-2003 Workbook format; RDFA_Glossary.xls: Glossary of terms, in Excel 97-2003 Workbookformat; RDFA_Biographies.xls: Biographies of leading figures, in Excel 97-2003 Workbook format]; 2) RDFA_CompleteBibliography_csv.zip [RDFA_CompleteBibliography.TXT: Metadata for the complete bibliography, in CSV format; RDFA_Glossary.TXT: Glossary of terms, in CSV format; RDFA_Biographies.TXT: Biographies of leading figures, in CSV format]; 3) RDFA_CompleteBibliography.pdf: A human readable display of the bibliographic data, as a means of double-checking any possible deviations due to conversion

    Development of a sophisticated tool for siting small-scale, embedded wind projects using a geographical information system

    Get PDF
    The aim of the research is to produce a methodology for the siting of small-scale, embedded wind generators, and implement this within a commercial software package designed to use existing digital data sets. There is a widespread opportunity to exploit smaller size developments, but potentially large numbers of suitable sites means that an automated screening process is essential. Much of the information required for such a siting study is spatial in nature and hence the site identification process can be facilitated using a geographical information system - GIS. The literature has revealed a number of GIS-based assessments, but these have concentrated on large wind farms, and have been undertaken at relatively coarse resolution. In contrast, this research has produced a much more sophisticated tool, allowing analysis at much finer resolution and encompassing a wider range of relevant factors. An attractive site for a wind turbine development requires more than just a suitable wind resource; factors such as environmental acceptability, public safety, physical constraints such as land use and impact on the electricity supply system will all determine the potential of a site. Constraints and parameters have been derived describing these factors and from these algorithms and inference rules have been developed. These have been coded up for use with a proprietary GIS package, producing a tool that can be widely applied. In particular, it has been demonstrated for a test region in Shropshire, UK. A particular emphasis of this study is the consideration of the impact on the electricity network. Relatively, few small installations have been connected to the national electricity grid in the UK; there is a range of reasons for this, a lack of suitable siting tool being one. Connection to the 11 kV network has been assumed given its relevance to smaller scale installations. This can result in a lower grid connection cost than for typical large-scale wind farm arrangements, for which connection usually represents a major element in the overall project costs. Often these low voltage lines are weak (i.e. susceptible to voltage variation), especially in remote rural areas. An appraisal of the impact of such embedded generators is important and is an intrinsic part of the methodology presented and implemented here.EThOS - Electronic Theses Online ServiceGBUnited Kingdo
    corecore