2,821 research outputs found

    On classifying digital accounting documents

    Advances in computing and multimedia technologies allow many accounting documents to be digitized at little cost for effective storage and access. Moreover, the number of accounting documents is increasing rapidly, which creates a need for mechanisms to effectively manage these (semi-structured) digital accounting documents in future accounting information systems (AIS). In general, accounting documents include invoices, purchase orders, checks, photographs, charts, diagrams, etc. As a result, a major function of future AIS is to automatically classify digital accounting documents into different categories in an effective manner. The aim of this paper is to examine flat (non-hierarchical) and hierarchical classification schemes for automatic classification of different types of digital accounting documents. The experimental results show that non-hierarchical classification of digital accounting documents performs better than hierarchical classification.
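The difference between the two schemes compared in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy training texts, the two-level `HIERARCHY`, and the simple bag-of-words scoring are hypothetical and are not the paper's data, taxonomy, or classifier. The flat scheme picks a leaf category in one step, while the hierarchical scheme first picks a parent group and then a leaf inside it.

```python
from collections import Counter

# Hypothetical toy training data (text, leaf category); not taken from the paper.
TRAIN = [
    ("invoice total amount due tax", "invoice"),
    ("invoice number billing amount", "invoice"),
    ("bank check pay signature", "check"),
    ("check payee bank amount", "check"),
    ("purchase order quantity supplier", "purchase_order"),
    ("purchase order item price", "purchase_order"),
    ("delivery note shipped quantity", "delivery_note"),
    ("delivery receipt shipped items", "delivery_note"),
]

# Hypothetical two-level hierarchy: parent group -> leaf categories.
HIERARCHY = {
    "payment": ["invoice", "check"],
    "procurement": ["purchase_order", "delivery_note"],
}
PARENT_OF = {leaf: parent for parent, leaves in HIERARCHY.items() for leaf in leaves}


def train(examples):
    """Build one bag-of-words profile (a Counter) per label."""
    profiles = {}
    for text, label in examples:
        profiles.setdefault(label, Counter()).update(text.split())
    return profiles


def predict(profiles, text):
    """Return the label whose profile has the highest normalised word overlap."""
    tokens = text.split()

    def score(label):
        profile = profiles[label]
        return sum(profile[t] for t in tokens) / sum(profile.values())

    # sorted() makes tie-breaking deterministic (alphabetical order).
    return max(sorted(profiles), key=score)


# Flat scheme: a single classifier over all leaf categories.
FLAT = train(TRAIN)

# Hierarchical scheme: one classifier for the parent groups,
# plus one classifier per group that separates its leaves.
TOP = train([(text, PARENT_OF[leaf]) for text, leaf in TRAIN])
PER_GROUP = {
    parent: train([(text, leaf) for text, leaf in TRAIN if PARENT_OF[leaf] == parent])
    for parent in HIERARCHY
}


def predict_flat(text):
    return predict(FLAT, text)


def predict_hierarchical(text):
    parent = predict(TOP, text)              # step 1: choose the parent group
    return predict(PER_GROUP[parent], text)  # step 2: choose a leaf within it


if __name__ == "__main__":
    for doc in ["invoice amount due", "shipped quantity delivery"]:
        print(doc, "->", predict_flat(doc), "/", predict_hierarchical(doc))
```

In the hierarchical variant, a wrong decision at the top level cannot be corrected at the leaf level. That is one commonly cited reason a flat classifier can come out ahead, although the abstract itself does not state why the flat scheme performed better.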

    A Framework for Multimedia Data Hiding (Security)

    With the proliferation of multimedia data such as images, audio, and video, robust digital watermarking and data hiding techniques are needed for copyright protection, copy control, annotation, and authentication. While many techniques have been proposed for digital color and grayscale images, not all of them can be directly applied to binary document images. The difficulty lies in the fact that changing pixel values in a binary document can introduce irregularities that are visually very noticeable. Only a limited number of papers have proposed new techniques and ideas for document image watermarking and data hiding. In this paper, we present an overview and summary of recent developments on this important topic, and discuss important issues such as the robustness and data hiding capacity of the different techniques.

    Image Annotation and Topic Extraction Using Super-Word Latent Dirichlet

    This research presents a multi-domain solution that uses text and images to iteratively improve automated information extraction. Stage I uses the local text surrounding an embedded image to provide clues that help rank-order possible image annotations. These annotations are forwarded to Stage II, where they are used as highly relevant super-words to improve the extraction of topics. The model probabilities from the super-words in Stage II are forwarded to Stage III, where they are used to refine the automated image annotation developed in Stage I. All stages demonstrate improvement over existing equivalent algorithms in the literature.

    Learning and mining from personal digital archives

    Given the explosion of new sensing technologies and significantly cheaper data storage, people increasingly rely on wearable devices to create personal digital archives. Lifelogging is the act of recording aspects of life in digital format for a variety of purposes, such as aiding human memory, analysing human lifestyle, and diet monitoring. In this dissertation we are concerned with Visual Lifelogging, a form of lifelogging based on the passive capture of photographs by a wearable camera. Cameras such as Microsoft's SenseCam can record up to 4,000 images per day and log data from several incorporated sensors. Considering the volume, complexity and heterogeneous nature of such data collections, it is a significant challenge to interpret and extract knowledge for the practical use of lifeloggers and others. In this dissertation, time series analysis methods have been used to identify and extract useful information from temporal lifelogging image data, without the benefit of prior knowledge. We focus, in particular, on three fundamental topics: noise reduction, structure and characterization of the raw data; the detection of multi-scale patterns; and the mining of important, previously unknown repeated patterns in the time series of lifelog image data. Firstly, we show that Detrended Fluctuation Analysis (DFA) highlights the feature of very high correlation in lifelogging image collections. Secondly, we show that study of the equal-time Cross-Correlation Matrix demonstrates atypical or non-stationary characteristics in these images. Next, noise reduction in the Cross-Correlation Matrix is addressed by Random Matrix Theory (RMT), before wavelet multiscaling is used to characterize the 'most important' or 'unusual' events through analysis of the associated dynamics of the eigenspectrum. A motif discovery technique is explored for the detection of recurring and recognizable episodes in an individual's image data. Finally, we apply these motif discovery techniques to two known lifelog data collections, All I Have Seen (AIHS) and NTCIR-12 Lifelog, in order to examine multivariate recurrent patterns of multiple lifelogging users.
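As a rough illustration of the Detrended Fluctuation Analysis step mentioned in this abstract, the sketch below implements textbook first-order DFA on a one-dimensional series using only the Python standard library. It is an assumption-laden example, not the dissertation's implementation: the window sizes, the synthetic test signals, and the function names `linear_fit` and `dfa_exponent` are all hypothetical. A scaling exponent near 0.5 indicates uncorrelated noise, while larger values indicate stronger long-range correlation.

```python
import math
import random


def linear_fit(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def dfa_exponent(series, scales=(16, 32, 64, 128)):
    """Estimate the first-order DFA scaling exponent of a 1-D series.

    The window sizes in `scales` are an illustrative choice.
    """
    # Step 1: integrate the mean-centred series to obtain the profile.
    mean = sum(series) / len(series)
    profile, running = [], 0.0
    for value in series:
        running += value - mean
        profile.append(running)

    log_n, log_f = [], []
    for n in scales:
        t = list(range(n))
        squared_residuals, count = 0.0, 0
        # Step 2: split the profile into non-overlapping windows of length n
        # and remove a linear trend from each window.
        for start in range(0, len(profile) - n + 1, n):
            window = profile[start:start + n]
            slope, intercept = linear_fit(t, window)
            squared_residuals += sum(
                (y - (slope * x + intercept)) ** 2 for x, y in zip(t, window)
            )
            count += n
        # Step 3: the fluctuation F(n) is the RMS of the detrended residuals.
        log_n.append(math.log(n))
        log_f.append(math.log(math.sqrt(squared_residuals / count)))

    # Step 4: the exponent is the slope of log F(n) against log n.
    alpha, _ = linear_fit(log_n, log_f)
    return alpha


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]  # uncorrelated signal
    walk, position = [], 0.0
    for step in noise:                                   # strongly correlated signal
        position += step
        walk.append(position)
    print("white noise alpha:", round(dfa_exponent(noise), 2))
    print("random walk alpha:", round(dfa_exponent(walk), 2))
```

For real lifelog data, the input series would have to be some scalar feature extracted per image over time. Which feature the dissertation uses is not stated in the abstract, so that choice is left open here.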