Search CORE

196,024 research outputs found

Autonomous Cleaning of Corrupted Scanned Documents - A Generative Modeling Approach

Author: Dai Zhenwen
Lücke Jörg
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 02/07/2012
Field of study

We study the task of cleaning scanned text documents that are strongly corrupted by dirt such as manual line strokes, spilled ink etc. We aim at autonomously removing dirt from a single letter-size page based only on the information the page contains. Our approach, therefore, has to learn character representations without supervision and requires a mechanism to distinguish learned representations from irregular patterns. To learn character representations, we use a probabilistic generative model parameterizing pattern features, feature variances, the features' planar arrangements, and pattern frequencies. The latent variables of the model describe pattern class, pattern position, and the presence or absence of individual pattern features. The model parameters are optimized using a novel variational EM approximation. After learning, the parameters represent, independently of their absolute position, planar feature arrangements and their variances. A quality measure defined based on the learned representation then allows for an autonomous discrimination between regular character patterns and the irregular patterns making up the dirt. The irregular patterns can thus be removed to clean the document. For a full Latin alphabet we found that a single page does not contain sufficiently many character examples. However, even if heavily corrupted by dirt, we show that a page containing a lower number of character types can efficiently and autonomously be cleaned solely based on the structural regularity of the characters it contains. In different examples using characters from different alphabets, we demonstrate generality of the approach and discuss its implications for future developments.Comment: oral presentation and Google Student Travel Award; IEEE conference on Computer Vision and Pattern Recognition 201

arXiv.org e-Print Archive

Crossref

Modeling views in the layered view model for XML using UML

Author: Chang E.
Dillon T.S.
Feng L.
Rajugan R.
Publication venue: Troubador Publishing Ltd.
Publication date: 01/01/2006
Field of study

In data engineering, view formalisms are used to provide flexibility to users and user applications by allowing them to extract and elaborate data from the stored data sources. Conversely, since the introduction of Extensible Markup Language (XML), it is fast emerging as the dominant standard for storing, describing, and interchanging data among various web and heterogeneous data sources. In combination with XML Schema, XML provides rich facilities for defining and constraining user-defined data semantics and properties, a feature that is unique to XML. In this context, it is interesting to investigate traditional database features, such as view models and view design techniques for XML. However, traditional view formalisms are strongly coupled to the data language and its syntax, thus it proves to be a difficult task to support views in the case of semi-structured data models. Therefore, in this paper we propose a Layered View Model (LVM) for XML with conceptual and schemata extensions. Here our work is three-fold; first we propose an approach to separate the implementation and conceptual aspects of the views that provides a clear separation of concerns, thus, allowing analysis and design of views to be separated from their implementation. Secondly, we define representations to express and construct these views at the conceptual level. Thirdly, we define a view transformation methodology for XML views in the LVM, which carries out automated transformation to a view schema and a view query expression in an appropriate query language. Also, to validate and apply the LVM concepts, methods and transformations developed, we propose a view-driven application development framework with the flexibility to develop web and database applications for XML, at varying levels of abstraction

CiteSeerX

OPUS - University of Technology Sydney

University of Twente Research Information

espace@Curtin

Comprehensive Review of Opinion Summarization

Author: Ganesan Kavita
Kim Hyun Duk
Sondhi Parikshit
Zhai ChengXiang
Publication venue
Publication date: 01/01/2011
Field of study

The abundance of opinions on the web has kindled the study of opinion summarization over the last few years. People have introduced various techniques and paradigms to solving this special task. This survey attempts to systematically investigate the different techniques and approaches used in opinion summarization. We provide a multi-perspective classification of the approaches used and highlight some of the key weaknesses of these approaches. This survey also covers evaluation techniques and data sets used in studying the opinion summarization problem. Finally, we provide insights into some of the challenges that are left to be addressed as this will help set the trend for future research in this area.unpublishednot peer reviewe

CiteSeerX

Illinois Digital Environment for Access to Learning and Scholarship Repository

Modeling functional requirements using tacit knowledge: a design science research methodology informed approach

Author: Benfell Adrian
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/03/2021
Field of study

The research in this paper adds to the discussion linked to the challenge of capturing and modeling tacit knowledge throughout software development projects. The issue emerged when modeling functional requirements during a project for a client. However, using the design science research methodology at a particular point in the project helped to create an artifact, a functional requirements modeling technique, that resolved the issue with tacit knowledge. Accordingly, this paper includes research based upon the stages of the design science research methodology to design and test the artifact in an observable situation, empirically grounding the research undertaken. An integral component of the design science research methodology, the knowledge base, assimilated structuration and semiotic theories so that other researchers can test the validity of the artifact created. First, structuration theory helped to identify how tacit knowledge is communicated and can be understood when modeling functional requirements for new software. Second, structuration theory prescribed the application of semiotics which facilitated the development of the artifact. Additionally, following the stages of the design science research methodology and associated tasks allows the research to be reproduced in other software development contexts. As a positive outcome, using the functional requirements modeling technique created, specifically for obtaining tacit knowledge on the software development project, indicates that using such knowledge increases the likelihood of deploying software successfully

University of East Anglia digital repository