51 research outputs found

    Synthetic Document Generator for Annotation-free Layout Recognition

    Analyzing the layout of a document to identify headers, sections, tables, figures, etc. is critical to understanding its content. Deep-learning-based approaches for detecting the layout structure of document images have been promising. However, these methods require a large number of annotated examples during training, which are both expensive and time-consuming to obtain. We describe here a synthetic document generator that automatically produces realistic documents with labels for the spatial positions, extents and categories of the layout elements. The proposed generative process treats every physical component of a document as a random variable and models their intrinsic dependencies using a Bayesian Network graph. Our hierarchical formulation using stochastic templates allows parameter sharing between documents to retain broad themes, yet the distributional characteristics produce visually unique samples, thereby capturing complex and diverse layouts. We empirically illustrate that a deep layout detection model trained purely on the synthetic documents can match the performance of a model that uses real documents.
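As a rough illustration of the hierarchical idea in this abstract, the following minimal sketch samples a shared "template" first and then samples labeled layout elements conditioned on it. It is not the authors' generator: the variable names, category list, probabilities, and size ranges are all assumptions made up for the example, and the dependency structure is far simpler than a full Bayesian network.

```python
import random

def sample_template(rng):
    """Sample parameters shared by every document drawn from one template (assumed values)."""
    return {
        "columns": rng.choice([1, 2]),
        "margin": rng.uniform(0.05, 0.12),  # fraction of page width
        "p_category": {"header": 0.2, "paragraph": 0.5, "table": 0.15, "figure": 0.15},
    }

def sample_document(template, rng, page_w=1000, page_h=1400):
    """Sample one layout, returning a label (category and bounding box) per element."""
    margin = template["margin"] * page_w
    col_w = (page_w - 2 * margin) / template["columns"]
    cats = list(template["p_category"])
    weights = [template["p_category"][c] for c in cats]
    labels = []
    for col in range(template["columns"]):
        y = margin
        while True:
            category = rng.choices(cats, weights=weights)[0]
            # Element height is a child variable of the category.
            lo, hi = (30, 60) if category == "header" else (80, 300)
            h = rng.uniform(lo, hi)
            if y + h > page_h - margin:
                break  # column is full
            labels.append({"category": category, "x": margin + col * col_w,
                           "y": y, "w": col_w, "h": h})
            y += h + rng.uniform(10, 30)  # vertical gap before the next element
    return labels

rng = random.Random(0)
template = sample_template(rng)
documents = [sample_document(template, rng) for _ in range(3)]
```

Documents drawn from the same template share the column count and margins, while element categories and sizes differ from sample to sample, which is the intuition behind "broad themes" with "visually unique samples".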

    Utilização da Norma JPEG2000 para codificar proteger e comercializar Produtos de Observação Terrestre (Using the JPEG2000 Standard to Encode, Protect and Commercialize Earth Observation Products)

    Applications such as change detection, global monitoring, and disaster detection and management have emerging requirements that depend on the availability of large amounts of data. This data is currently being captured by a multiplicity of instruments and EO (Earth Observation) sensors, producing large volumes of data that need to be stored, processed and accessed in order to be useful; for example, ENVISAT accumulates several hundred terabytes of data every year. This need to recover, store, process and access data brings some interesting challenges, such as storage space, processing power, bandwidth and security, to mention just a few. These challenges remain very important in today's technological world. If we look, for example, at the number of subscribers to ISP (Internet Service Provider) broadband services in the developed world today, we notice that broadband services are still far from common and dominant. In underdeveloped countries the picture is even bleaker, not only in terms of bandwidth but also in all other aspects of information and communication technologies (ICTs). All these challenges need to be taken into account if a service is to reach the broadest possible audience. Protecting and securing services and content is an additional asset that helps preserve potential business value, especially in a business as costly as the space industry.
This thesis presents and describes a system that not only encodes and decodes several EO products into the JPEG2000 format, but also supports some of the security requirements identified above, allowing ESA (European Space Agency) and related EO services to define and apply efficient EO data access security policies and even to exploit new ways of commercializing EO products over the Internet.

    MediaSync: Handbook on Multimedia Synchronization

    This book provides an approachable overview of the most recent advances in the field of media synchronization (mediasync), gathering contributions from the most representative and influential experts. Understanding the challenges of this field in the current multi-sensory, multi-device, and multi-protocol world is not an easy task. The book revisits the foundations of mediasync, including theoretical frameworks and models, highlights ongoing research efforts, such as hybrid broadband broadcast (HBB) delivery and users' perception modeling (i.e., Quality of Experience or QoE), and paves the way for the future (e.g., towards the deployment of multi-sensory and ultra-realistic experiences). Although many advances around mediasync have been devised and deployed, this area of research is receiving renewed attention in order to overcome the remaining challenges in the next-generation (heterogeneous and ubiquitous) media ecosystem. Given the significant advances in this research area, its current relevance and the multiple disciplines it involves, a reference book on mediasync has become necessary, and this book fills that gap. In particular, it addresses key aspects and reviews the most relevant contributions within the mediasync research space from different perspectives. MediaSync: Handbook on Multimedia Synchronization is a companion for scholars and practitioners who want to acquire strong knowledge of this research area and to approach the challenges of ensuring the best mediated experiences by providing adequate synchronization between the media elements that constitute these experiences.

    Internet of Things data contextualisation for scalable information processing, security, and privacy

    The Internet of Things (IoT) interconnects billions of sensors and other devices (i.e., things) via the internet, enabling novel services and products that are becoming increasingly important for industry, government, education and society in general. It is estimated that by 2025 the number of IoT devices will exceed 50 billion, which is seven times the estimated human population at that time. With such a tremendous increase in the number of IoT devices, the data they generate is also increasing exponentially and needs to be analysed and secured more efficiently. This gives rise to what appears to be the most significant challenge for the IoT: novel, scalable solutions are required to analyse and secure the extraordinary amount of data generated by tens of billions of IoT devices. Currently, no solutions exist in the literature that provide scalable and secure IoT-scale data processing. In this thesis, a novel scalable approach is proposed for processing and securing IoT-scale data, which we refer to as contextualisation. The contextualisation solution aims to exclude irrelevant IoT data from processing and to address data analysis and security considerations through the use of contextual information. More specifically, contextualisation can effectively reduce the volume, velocity and variety of data that needs to be processed and secured in IoT applications. This contextualisation-based data reduction can subsequently provide IoT applications with the scalability needed for IoT-scale knowledge extraction and information security. IoT-scale applications, such as smart parking or smart healthcare systems, can benefit from the proposed method, which improves the scalability of data processing as well as the security and privacy of data.
The main contributions of this thesis are: 1) an introduction to context and contextualisation for IoT applications; 2) a contextualisation methodology for IoT-based applications that is modelled around observation, orientation, decision and action loops; 3) a collection of contextualisation techniques and a corresponding software platform for IoT data processing (referred to as contextualisation-as-a-service or ConTaaS) that enables highly scalable data analysis, security and privacy solutions; and 4) an evaluation of ConTaaS in several IoT applications to demonstrate that our contextualisation techniques permit data analysis, security and privacy solutions to remain linear, even in situations where the number of IoT data points increases exponentially.
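The core idea of excluding irrelevant data before analysis can be illustrated with a very small sketch based on the smart-parking example mentioned in the abstract. This is not the ConTaaS platform or its API: the `Reading` type, its fields, and the `contextualise` function are hypothetical names invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Reading:
    sensor_id: str
    zone: str       # where the parking sensor is installed (assumed field)
    occupied: bool  # state of the parking bay (assumed field)

def contextualise(readings, zone):
    """Keep only readings relevant to the query context: free bays in one zone."""
    return [r for r in readings if r.zone == zone and not r.occupied]

readings = [
    Reading("s1", "north", False),
    Reading("s2", "north", True),
    Reading("s3", "south", False),
]
# A driver looking for parking in the north zone only needs the first reading.
relevant = contextualise(readings, "north")
```

Downstream analysis and security measures then operate on `relevant` rather than on every reading, which is the kind of data reduction the abstract attributes to contextualisation.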

    Remote management of applications: deployment of applications and configurations using a rule system

    Master's dissertation in Informatics Engineering (specialization in Distributed Systems). Users expect to access programs and business information anywhere, in the simplest way possible, using a device. With the diversification of devices, the standard is disappearing and we are moving towards a more heterogeneous world of mobile devices. As this divergence increases, it becomes more difficult to update, support and control applications across all these new platforms, so it is important to facilitate these tasks. The solution to these problems lies in Mobile Device Management (MDM) programs, which can control what devices install and configure, and provide remote tasks and access. This dissertation does not aim to compete with the products currently on the market, but to propose a different way of distributing content to the devices registered on the platform, using a rule system. This system prioritizes the newest rules according to the device and its location characteristics, thereby providing a different way of grouping devices and distributing content to them.
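One way to picture a rule system that prioritizes the newest rules by device and location is the sketch below. It is only an illustration under stated assumptions: the `Rule` and `Device` types, their fields, and the matching logic are hypothetical and are not taken from the dissertation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Rule:
    name: str
    created: int                       # e.g. a Unix timestamp; larger means newer
    content: str                       # application or configuration to deliver
    device_type: Optional[str] = None  # None matches any device type
    location: Optional[str] = None     # None matches any location

@dataclass(frozen=True)
class Device:
    device_type: str
    location: str

def matches(rule, device):
    """A rule applies when each of its constraints is unset or equal to the device's value."""
    return (rule.device_type in (None, device.device_type)
            and rule.location in (None, device.location))

def applicable_rules(rules, device):
    """Return the rules that match the device, newest first."""
    return sorted((r for r in rules if matches(r, device)),
                  key=lambda r: r.created, reverse=True)

rules = [
    Rule("old-global", 1, "app-v1"),
    Rule("new-tablet", 3, "app-v2", device_type="tablet"),
    Rule("phone-only", 5, "app-v3", device_type="phone"),
]
ordered = applicable_rules(rules, Device("tablet", "Braga"))
```

Here devices are grouped implicitly by the attributes a rule constrains, rather than by explicit membership lists, and the first element of `ordered` is the rule that takes priority.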

    System Architecture and Intelligent Data Curation of Virtual Museum for Ancient History

    Preserving the cultural and historical heritage of the world's nations, and presenting it thoroughly, is a long-term commitment of scholars and researchers working in many areas. For centuries, every generation has aimed to keep a record of its work so that it can be revisited and studied by later generations. New information and multimedia technologies developed in recent years have introduced new methods for preserving, maintaining and distributing the huge amounts of collected material. This article presents the virtual museum, an advanced system that manages diverse collections of digital objects organized in various ways through complex specialized functionality. The management of digital content requires a well-designed architecture that embeds services for content presentation, management, and administration. All elements of the system architecture are interrelated, so the accuracy of each element is of great importance. Such systems lack tools for intelligent data curation that can validate data from different sources and add value to it. This paper proposes a solution for intelligent data curation that can be implemented in a virtual museum to provide the opportunity to observe valuable historical specimens in a proper way. The solution focuses on validation and verification to prevent the duplication of records for digital objects, in order to guarantee data integrity and more accurate knowledge retrieval.
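Duplicate prevention of the kind this abstract describes is often approached by comparing records on a normalized key before insertion. The sketch below shows that general technique only; the record fields (`title`, `period`) and function names are assumptions for illustration and do not describe the paper's actual curation method.

```python
import unicodedata

def normalize(text):
    """Case-fold, strip accents and collapse whitespace so trivial variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.casefold().split())

def record_key(record):
    """Build the comparison key from the fields assumed to identify a digital object."""
    return (normalize(record["title"]), normalize(record["period"]))

def add_record(collection, record):
    """Insert the record unless an equivalent one exists; return True if inserted."""
    key = record_key(record)
    if key in collection:
        return False
    collection[key] = record
    return True

collection = {}
add_record(collection, {"title": "Bronze  Fibula", "period": "Iron Age"})
# Differs only in case and spacing, so it is rejected as a duplicate.
add_record(collection, {"title": "bronze fibula", "period": "IRON AGE"})
```

Exact matching on a normalized key catches only superficial variants; records from different sources that describe the same object in different words would need a fuzzier comparison.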