    Social media usage in academic research

    Researchers have recently used the “conversation prism” and the “social media prism” to organize social media services by their use. Although both identify 25 types, with an average of five examples each, neither identifies the contribution of each type to academic research. Moreover, some of the services they mention have since been suspended or changed. In this paper, we attempt to access each social media service listed in the conversation prism in order to, first, identify which services are still operational, which have been suspended, and which have changed over time. Second, we compare the number of publications associated with each service to identify which has contributed most to academic research. Third, we look for a correlation between the number of publications and the development tools provided by each application. Fourth, we rank the services by the number of times other services share content with them. We found that, of 168 social applications, 10% had changed their service objective and 13% had been suspended. Among all social applications, Amazon had the most citations on Google Scholar (147,000), and the top 30 services accounted for 90.7% of all citations. For developers, 22 of the top 30 services provided developer options in the form of an application programming interface (API) or a software development kit (SDK), and Facebook was the most cross-referenced service based on content sharing. Finally, we present conclusions and future work.

    Web Page Segmentation for Non Visual Skimming

    Web page segmentation aims to break a page into smaller blocks in which semantically coherent content is kept together. Tasks that use this technique include advertisement detection and main content extraction. In this paper, we study different segmentation strategies for the task of non-visual skimming. For that purpose, we treat web page segmentation as a clustering problem over visual elements, where (1) all visual elements must be clustered, (2) a fixed number of clusters must be found, and (3) the elements of a cluster should be visually connected. We therefore study three algorithms that comply with these constraints: K-means, F-K-means, and Guided Expansion. The evaluation shows that Guided Expansion yields statistically relevant results in terms of compactness and separateness, and satisfies more logical constraints than the other strategies.
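
    As a rough illustration of the clustering formulation, the sketch below runs plain Lloyd's K-means over the centers of element bounding boxes. Every element receives a label and a fixed number k of clusters is requested, in line with constraints (1) and (2). It is a sketch under assumptions, not the paper's implementation: representing each element by its box center is our choice, the helper names are hypothetical, F-K-means and Guided Expansion are not reproduced, and nothing in the code explicitly checks constraint (3).

```python
import math
import random


def center(box):
    """Center point of an (x, y, width, height) bounding box (assumed element representation)."""
    x, y, w, h = box
    return (x + w / 2, y + h / 2)


def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd's K-means on 2-D points; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k of the elements serve as initial centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (ties go to the lower index).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no element changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels


def cluster_elements(boxes, k):
    """Hypothetical entry point: group visual elements into k clusters by center position."""
    return kmeans([center(b) for b in boxes], k)
```

    Because K-means groups purely by distance to centroids, two elements can share a label without touching each other, which is why connectivity (3) needs the extra machinery the paper studies.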

    Page Segmentation in a Web Browser

    This thesis deals with web page segmentation in a web browser. An implementation of the Box Clustering Segmentation (BCS) method was created in JavaScript using an automated browser. The implementation consists of two main steps: extracting boxes (leaf DOM nodes) from the browser context, and then clustering them based on the similarity model defined by BCS. The main result of this thesis is a functional implementation of the BCS method that can be used for web page segmentation. The functionality and accuracy of the implementation were evaluated by comparing it with a reference implementation written in Java.
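
    The second step, clustering the extracted boxes by pairwise similarity, can be sketched as threshold-based single-linkage grouping with a union-find structure. This is a minimal sketch under loud assumptions: it is written in Python rather than the thesis's JavaScript, it assumes the leaf boxes have already been extracted from the browser, and its `similarity` function (the average of spatial closeness and color closeness) is a hypothetical stand-in, not the similarity model defined by BCS. `Box`, `gap`, `similarity`, `cluster_boxes`, the 50-pixel `max_gap`, and the 0.8 threshold are all illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    """A leaf DOM box: bounding rectangle plus one representative RGB color (assumed fields)."""
    x: float
    y: float
    w: float
    h: float
    color: tuple[int, int, int]


MAX_COLOR_DIST = math.dist((0, 0, 0), (255, 255, 255))


def gap(a, b):
    """Shortest distance between two axis-aligned rectangles (0 if they touch or overlap)."""
    dx = max(0.0, max(a.x, b.x) - min(a.x + a.w, b.x + b.w))
    dy = max(0.0, max(a.y, b.y) - min(a.y + a.h, b.y + b.h))
    return math.hypot(dx, dy)


def similarity(a, b, max_gap=50.0):
    """HYPOTHETICAL score in [0, 1]: mean of spatial closeness and color closeness.
    This is NOT the similarity model defined by BCS."""
    spatial = max(0.0, 1.0 - gap(a, b) / max_gap)
    color = 1.0 - math.dist(a.color, b.color) / MAX_COLOR_DIST
    return (spatial + color) / 2


def cluster_boxes(boxes, threshold=0.8):
    """Merge every pair scoring >= threshold (single linkage via union-find); returns index groups."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if similarity(boxes[i], boxes[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

    The pairwise loop is quadratic in the number of boxes, which is acceptable for a sketch; a real page with thousands of leaf nodes would benefit from a spatial index to limit comparisons to nearby boxes.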

    Web page segmentation evaluation

    In this paper, we present a framework for evaluating Web page segmentation algorithms. Web page segmentation consists of dividing a Web page into coherent fragments, called blocks, each of which represents one distinct information element on the page. We define an evaluation model that includes several metrics for assessing the quality of a segmentation produced by a given algorithm. These metrics compute the distance between the obtained segmentation and a manually built segmentation that serves as the ground truth. We apply our framework to four state-of-the-art segmentation algorithms (BOM, Block Fusion, VIPS, and JVIPS) on several categories (types) of Web pages. The results show that the tested algorithms usually perform rather well for text extraction but may have serious problems extracting geometry. They also show that the relative quality of a segmentation algorithm depends on the category of the segmented page.
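
    The abstract does not list the framework's metrics, so the following is only a hypothetical example of a geometry-based comparison against a ground truth: a ground-truth block counts as found if some predicted block overlaps it with an intersection-over-union (IoU) of at least 0.5. Rectangles are assumed to be `(x, y, width, height)` tuples, and `iou` and `geometric_match_rate` are illustrative names, not the paper's.

```python
def iou(a, b):
    """Intersection over union of two (x, y, width, height) rectangles."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def geometric_match_rate(predicted, ground_truth, min_iou=0.5):
    """HYPOTHETICAL recall-like score: share of ground-truth blocks matched by
    at least one predicted block with IoU >= min_iou."""
    if not ground_truth:
        raise ValueError("ground truth must contain at least one block")
    matched = sum(
        1 for g in ground_truth if any(iou(g, p) >= min_iou for p in predicted)
    )
    return matched / len(ground_truth)
```

    Swapping the two arguments gives the corresponding precision-like score, the share of predicted blocks that match some ground-truth block.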

    Web page segmentation, evaluation and applications

    Web pages are becoming more complex than ever, as they are generated by Content Management Systems (CMS). Analyzing them, i.e. automatically identifying and classifying the different elements of a Web page, such as the main content and menus, therefore becomes difficult. Web page segmentation, the process of dividing a Web page into visually and semantically coherent segments called blocks, offers a solution to this problem. The quality of a Web page segmenter is measured by its correctness and its genericity, i.e. the variety of Web page types it can segment. Our research focuses on improving this quality and measuring it fairly and accurately. We first propose a conceptual model for segmentation, as well as Block-o-Matic (BoM), our Web page segmenter. We then propose an evaluation model that takes both the content and the geometry of blocks into account to measure the correctness of a segmentation algorithm against a predefined ground truth. Our evaluation framework can test any segmenter, i.e. measure its quality, and we use it to experimentally test four state-of-the-art algorithms on four types of pages. The results show that BoM performs best among the four segmentation algorithms tested, and that the performance of a segmenter depends on the type of page being segmented. We present two applications of BoM. Pagelyzer uses BoM to compare two versions of a Web page and decide whether they are similar. It is our team's main contribution to the European project Scape (FP7-IP). We also developed a tool for migrating Web pages from HTML4 to HTML5 in the context of Web archives.
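
    The abstract does not explain how Pagelyzer reaches its similar/not-similar decision. As a hypothetical stand-in only, the sketch below assumes each version has already been segmented into blocks, reduces each block to its whitespace-normalized, lowercased text, and calls the two versions similar when the Jaccard similarity of their block sets reaches a threshold (0.8 here, chosen arbitrarily). The names `normalize`, `jaccard`, and `versions_similar` are illustrative. Unlike the evaluation model described above, it ignores block geometry.

```python
def normalize(text):
    """Collapse whitespace and lowercase so trivial formatting changes are ignored."""
    return " ".join(text.split()).lower()


def jaccard(a, b):
    """Jaccard similarity of two collections treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty pages are trivially identical
    return len(a & b) / len(a | b)


def versions_similar(blocks_old, blocks_new, threshold=0.8):
    """HYPOTHETICAL decision rule (not Pagelyzer's): similar if the
    normalized block texts of the two versions overlap enough."""
    score = jaccard(map(normalize, blocks_old), map(normalize, blocks_new))
    return score >= threshold
```

    Treating blocks as a set discards their order and duplicates; a stricter rule could compare block sequences instead.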