
    ICDAR2003 Page Segmentation Competition

    There is a significant need to objectively evaluate layout analysis (page segmentation and region classification) methods. This paper describes the Page Segmentation Competition (modus operandi, dataset and evaluation criteria) held in the context of ICDAR2003 and presents the results of the evaluation of the candidate methods. The main objective of the competition was to evaluate such methods using scanned documents from commonly occurring publications. The results indicate that although methods seem to be maturing, there is still a considerable need to develop robust methods that deal with everyday documents.

    Ground Truth for Layout Analysis Performance Evaluation

    Over the past two decades a significant number of layout analysis (page segmentation and region classification) approaches have been proposed in the literature. Each approach has been devised for and/or evaluated using (usually small) application-specific datasets. While the need for objective performance evaluation of layout analysis algorithms is evident, no suitable dataset exists with ground truth that reflects the realities of everyday documents (widely varying layouts, complex entities, colour, noise etc.). The most significant impediment is the creation of accurate ground truth with a flexible representation, a task that is costly and must be carefully designed. This paper discusses the issues related to the design, representation and creation of ground truth in the context of a realistic dataset developed by the authors. The effectiveness of the ground truth discussed in this paper has been demonstrated through its use in two international page segmentation competitions (ICDAR2003 and ICDAR2005).

    A Model for Personalized Keyword Extraction from Web Pages using Segmentation

    The World Wide Web caters to the needs of billions of users in heterogeneous groups. Each user accessing the World Wide Web may have his or her own specific interests and expects the web to respond to those specific requirements. Making the web respond in a customized manner is achieved through personalization. This paper proposes a novel model for extracting keywords from a web page with personalization incorporated into it. The keyword extraction problem is approached through web page segmentation, which simplifies the problem and allows it to be solved effectively. The proposed model is implemented as a prototype, and the experiments conducted on it empirically validate the model's efficiency. Comment: 6 Pages, 2 Figures

    Web Page Segmentation for Non Visual Skimming

    Web page segmentation aims to break a page into smaller blocks in which contents with coherent semantics are kept together. Examples of tasks targeted by such a technique are advertisement detection and main content extraction. In this paper, we study different segmentation strategies for the task of non-visual skimming. For that purpose, we consider web page segmentation as a clustering problem over visual elements, where (1) all visual elements must be clustered, (2) a fixed number of clusters must be discovered, and (3) the elements of a cluster should be visually connected. We therefore study three different algorithms that comply with these constraints: K-means, F-K-means, and Guided Expansion. Evaluation shows that Guided Expansion yields statistically relevant results in terms of compactness and separateness, and satisfies more logical constraints than the other strategies.
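    The clustering formulation above can be illustrated with plain K-means. This is a minimal sketch under stated assumptions, not the authors' implementation: each visual element is assumed to be a hypothetical (x, y, width, height) bounding box, elements are clustered by their center points, and the deterministic farthest-first initialization is a choice made here for reproducibility. The sketch satisfies constraints (1) and (2) but does not enforce visual connectedness (3), and it implements neither F-K-means nor Guided Expansion.

```python
import math

def center(box):
    """Center point of a hypothetical (x, y, width, height) bounding box."""
    x, y, w, h = box
    return (x + w / 2, y + h / 2)

def kmeans_segment(boxes, k, max_iters=100):
    """Cluster visual elements into k groups with plain K-means.

    Every element receives a label (constraint 1) and k clusters are
    requested (constraint 2); visual connectedness (constraint 3) is NOT
    enforced by this sketch.
    """
    if not 1 <= k <= len(boxes):
        raise ValueError("k must be between 1 and the number of elements")
    points = [center(b) for b in boxes]

    # Farthest-first initialization (assumption, for reproducibility):
    # start from the first element, then repeatedly add the element
    # farthest from its nearest already-chosen centroid.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )

    labels = None
    for _ in range(max_iters):
        # Assignment step: each element joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no element changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels

# Two blocks near the top of a page and two near the bottom.
boxes = [(0, 0, 10, 10), (12, 0, 10, 10), (0, 500, 10, 10), (12, 500, 10, 10)]
print(kmeans_segment(boxes, 2))  # → [0, 0, 1, 1]
```

    Plain K-means groups elements purely by geometric proximity of their centers, which is why it can produce clusters that are not visually connected; that gap is what constraint (3) and the alternative strategies in the abstract address.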

    Morpes: A Model for Personalized Rendering of Web Content on Mobile Devices

    With the tremendous growth of the information and communication sector, mobile phones have become the prime devices for information communication. The convergence of traditional telephony with modern web-enabled communication on mobile devices has made communication much more effective and simpler. As mobile phones become a crucial means of accessing the World Wide Web, which was originally designed for personal computers, a new challenge has emerged: accommodating web content on smaller mobile devices. This paper proposes an approach to building a model for rendering web pages on mobile devices. The proposed model is based on a multi-dimensional web page segment evaluation model. Incorporating personalization into the model makes the rendering user-centric. The proposed model is validated with a prototype implementation. Comment: 10 Pages, 2 Figures