
    Ubic: Bridging the gap between digital cryptography and the physical world

    Advances in computing technology increasingly blur the boundary between the digital domain and the physical world. Although the research community has developed a large number of cryptographic primitives and has demonstrated their usability in all-digital communication, many of them have not yet made their way into the real world because of usability issues. We aim to take another step towards a tighter integration of digital cryptography into real-world interactions. We describe Ubic, a framework that allows users to bridge the gap between digital cryptography and the physical world. Ubic relies on head-mounted displays such as Google Glass, resource-friendly computer vision techniques, and mathematically sound cryptographic primitives to provide users with better security and privacy guarantees. The framework covers key cryptographic primitives, such as secure identification, document verification using a novel secure physical document format, and content hiding. To make a contribution of practical value, we focused on making Ubic as simple, easily deployable, and user-friendly as possible. Comment: In ESORICS 2014, volume 8712 of Lecture Notes in Computer Science, pp. 56-75, Wroclaw, Poland, September 7-11, 2014. Springer, Berlin, German

    Analysis and Modular Approach for Text Extraction from Scientific Figures on Limited Data

    Scientific figures are widely used as compact, comprehensible representations of important information. The re-usability of these figures is however limited, as one can rarely search for them directly: they are mostly indexed by their surrounding text (e.g., publication or website), which often does not contain the full message of the figure. In this thesis, the focus is on making the content of scientific figures accessible by extracting the text from these figures. A modular pipeline for unsupervised text extraction from scientific figures, based on a thorough analysis of the literature, was built to address the problem. This modular pipeline was used to build several unsupervised approaches in order to evaluate different methods from the literature as well as new methods and method combinations. Some supervised approaches were built as well for comparison. One challenge in evaluating the approaches was the lack of annotated data, which especially needed to be considered when building the supervised approach. Three existing datasets were used for evaluation, together with two manually created and annotated datasets totaling 241 scientific figures, so that five datasets were available for evaluation overall. Additionally, two existing datasets for text extraction from other types of images were used for pretraining the supervised approach. Several experiments showed the superiority of the unsupervised pipeline over common Optical Character Recognition engines and identified the best unsupervised approach. This unsupervised approach was compared with the best supervised approach, which, despite the limited amount of training data available, clearly outperformed the unsupervised approach.

    Cooperative Text and Line Art Extraction from a Topographic Map

    The black layer of a USGS topographic map is digitized at 1000 dpi. The connected components of this layer are analyzed and separated into line art, text, and icons in two passes. The paired street casings are converted to polylines by vectorization and associated with street labels from the character recognition phase. The accuracy of character recognition is shown to improve when the frequently occurring overlap of line art with street labels is taken into account. The experiments show that complete vectorization of the black line-layer bitmap is the major remaining problem.