509 research outputs found

    Closed sequential pattern mining for sitemap generation

    A sitemap is an explicit specification of the design concept and knowledge organization of a website and is therefore considered the website's basic ontology. It not only presents the main usage flows for users, but also hierarchically organizes the website's concepts. Typically, sitemaps are defined by webmasters in the very early stages of website design. However, over their lifetime websites significantly change their structure, their content and their possible navigation paths. Even when this is not the case, webmasters may define sitemaps that fail to reflect the actual website content or, conversely, organize pages and links in ways that do not reflect the intended organization encoded in the sitemap. In this paper we propose an approach that automatically generates sitemaps. Contrary to other approaches proposed in the literature, which mainly generate sitemaps from the textual content of the pages, in this work sitemaps are generated by analyzing the Web graph of a website. This allows us to: i) automatically generate a sitemap on the basis of possible navigation paths, ii) compare the generated sitemap with either the sitemap provided by the Web designer or the intended sitemap of the website and, consequently, iii) plan possible website reorganization. The solution we propose is based on closed frequent sequence extraction and concentrates only on hyperlinks organized in "Web lists", i.e., logical lists embedded in the pages. Web lists are typically used to support users in website navigation and include menus, navbars and tables of contents. Experiments performed on three real datasets show that the extracted sitemaps are much more similar to those defined by website curators than those obtained by competing algorithms.
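    The paper's own miner is not reproduced in this abstract; purely as a rough illustration of the underlying technique, the following brute-force sketch extracts closed frequent sequences from a handful of hypothetical navigation paths (the `paths` data and the exhaustive enumeration are assumptions for illustration; production miners use CloSpan- or BIDE-style pruning instead).

```python
from itertools import combinations

def is_subsequence(pat, seq):
    """Check whether pat occurs in seq as an (ordered) subsequence."""
    it = iter(seq)
    return all(item in it for item in pat)

def closed_frequent_sequences(sequences, min_support):
    """Brute-force closed sequential pattern mining.

    Returns frequent patterns that have no super-pattern with the
    same support.  Exponential; for illustration only.
    """
    # 1. Enumerate candidate patterns as subsequences of the input.
    candidates = set()
    for seq in sequences:
        for r in range(1, len(seq) + 1):
            for idx in combinations(range(len(seq)), r):
                candidates.add(tuple(seq[i] for i in idx))

    # 2. Keep candidates meeting the minimum support threshold.
    support = {
        pat: sum(is_subsequence(pat, s) for s in sequences)
        for pat in candidates
    }
    frequent = {p: c for p, c in support.items() if c >= min_support}

    # 3. Closure check: drop any pattern subsumed by an equally
    #    frequent super-pattern.
    return {
        p: frequent[p] for p in frequent
        if not any(
            q != p and frequent[q] == frequent[p] and is_subsequence(p, q)
            for q in frequent
        )
    }

# Hypothetical navigation paths harvested from "Web lists" (menus, navbars).
paths = [
    ("home", "products", "laptops"),
    ("home", "products", "laptops", "accessories"),
    ("home", "about"),
]
print(closed_frequent_sequences(paths, min_support=2))
# {('home',): 3, ('home', 'products', 'laptops'): 2}
```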

    Graph Sketches: Sparsification, Spanners, and Subgraphs

    When processing massive data sets, a core task is to construct synopses of the data. To be useful, a synopsis data structure should be easy to construct while also yielding good approximations of the relevant properties of the data set. A particularly useful class of synopses are sketches, i.e., those based on linear projections of the data. These are applicable in many models, including various parallel, stream, and compressed-sensing settings. A rich body of analytic and empirical work exists for sketching numerical data such as the frequencies of a set of entities. Our work investigates graph sketching, where the graphs of interest encode the relationships between these entities. The main challenge is to capture this richer structure and build the necessary synopses with only linear measurements. In this paper we consider properties of graphs including the size of cuts, the distances between nodes, and the prevalence of dense subgraphs. Our main result is a sketch-based sparsifier construction: we show that Õ(nε^-2) random linear projections of a graph on n nodes suffice to (1 + ε)-approximate all cut values. Similarly, we show that O(ε^-2) linear projections suffice for (additively) approximating the fraction of induced subgraphs that match a given pattern, such as a small clique. Finally, for distance estimation we present sketch-based spanner constructions. In this last result the sketches are adaptive, i.e., the linear projections are performed in a small number of batches, where each projection may be chosen dependent on the outcome of earlier sketches. All of the above results immediately give rise to data-stream algorithms that also apply to dynamic graph streams where edges are both inserted and deleted. The non-adaptive sketches, such as those for sparsification and subgraphs, give us single-pass algorithms for distributed data streams with insertions and deletions. The adaptive sketches can be used to analyze MapReduce algorithms that use a small number of rounds.
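    As a toy illustration of why linearity matters for dynamic graph streams (and explicitly not the paper's sparsifier construction), the following stores k random ±1 projections of a graph's edge-indicator vector, so that a deletion is simply the negation of the corresponding insertion; the class name and lazy sign table are assumptions for the sketch.

```python
import random

class LinearGraphSketch:
    """Toy linear sketch of a graph's edge-indicator vector.

    Each of the k sketch coordinates is a random +/-1 projection over
    the n*(n-1)/2 possible edge slots.  Because the sketch is linear,
    an edge deletion is just the negation of its insertion, which is
    what lets such sketches handle dynamic graph streams.
    """

    def __init__(self, n, k, seed=0):
        self.n, self.k = n, k
        self.rng = random.Random(seed)
        self.signs = {}            # (edge, row) -> +1 or -1, drawn lazily
        self.sketch = [0.0] * k    # the k linear measurements

    def _sign(self, edge, row):
        key = (edge, row)
        if key not in self.signs:
            self.signs[key] = self.rng.choice((-1, 1))
        return self.signs[key]

    def update(self, u, v, delta):
        """Apply delta (+1 insert, -1 delete) to edge {u, v}."""
        edge = (min(u, v), max(u, v))
        for row in range(self.k):
            self.sketch[row] += self._sign(edge, row) * delta

sk = LinearGraphSketch(n=100, k=32)
sk.update(1, 2, +1)   # insert edge {1,2}
sk.update(1, 2, -1)   # delete it again: the sketch returns to all zeros
print(all(x == 0 for x in sk.sketch))  # True
```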

    BlogForever D2.6: Data Extraction Methodology

    This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction, and extends the inquiry through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting the semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report concludes by proposing a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.
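    As a small, self-contained illustration of the kind of embedded semantics the report discusses, the following parser pulls microdata itemprop values out of blog HTML using only the Python standard library; the sample markup and property names are assumptions, not BlogForever code.

```python
from html.parser import HTMLParser

class MicrodataParser(HTMLParser):
    """Collect (itemprop, text) pairs from HTML microdata.

    A deliberately minimal illustration; a real extractor would also
    track itemscope/itemtype nesting to group properties per item.
    """
    def __init__(self):
        super().__init__()
        self._prop = None
        self.items = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemprop" in attrs:
            self._prop = attrs["itemprop"]

    def handle_data(self, data):
        if self._prop and data.strip():
            self.items.append((self._prop, data.strip()))
            self._prop = None

# Hypothetical blog-post markup using schema.org microdata.
html = """
<article itemscope itemtype="https://schema.org/BlogPosting">
  <h1 itemprop="headline">Preserving blogs at scale</h1>
  <span itemprop="author">A. Blogger</span>
  <time itemprop="datePublished">2012-05-01</time>
</article>
"""

p = MicrodataParser()
p.feed(html)
print(p.items)
# [('headline', 'Preserving blogs at scale'), ('author', 'A. Blogger'),
#  ('datePublished', '2012-05-01')]
```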

    Applying patterns to hypermedia instructional design (APHID)

    This research addresses the problem of automatically generating instructional hypermedia documents (in the form of websites). Our hypothesis is that, for certain types of hypermedia, an automated approach can produce satisfactory hypermedia applications more efficiently than humans can create them. We propose a method, APHID, that guides a hypermedia creator through the design process and partially automates the creation of hypermedia applications. Our method uses concept maps and instructional design patterns, as well as the more common domain and presentation models, to support the partial automation of instructional hypermedia creation. Most hypermedia application developers follow basic graphical design principles, but few commonly accepted principles exist for structuring hypermedia applications. The design of instructional hypermedia imposes the additional requirement that the designer be an expert in both hypermedia design and instructional design. APHID supports designers through the use of patterns to describe and clarify design concepts for both instructional design and interface design. This thesis describes the design and development of the APHID approach and a prototype software tool that supports the development of instructional hypermedia using it. The thesis also presents a study in which websites created with APHID are compared, by an independent evaluator, to websites created by instructional technologists. The study shows that good instructional websites can be generated semi-automatically with less expenditure of time on the part of the instructional designer.
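    APHID's actual models are not reproduced in this abstract; purely as a hypothetical sketch of pattern-driven structuring in its spirit, the following applies a "guided tour" instructional pattern to a toy concept map, ordering pages so prerequisite concepts come first and linking each page to the next (the concept map, pattern, and names are all illustrative assumptions).

```python
concept_map = {                      # concept -> prerequisite concepts
    "variables": [],
    "loops": ["variables"],
    "functions": ["variables"],
    "recursion": ["functions"],
}

def guided_tour(cmap):
    """Order concepts so prerequisites come first (a depth-first
    topological sort), then chain each page to the next: the
    'guided tour' pattern."""
    order, seen = [], set()

    def visit(c):
        if c in seen:
            return
        seen.add(c)
        for pre in cmap[c]:
            visit(pre)
        order.append(c)

    for c in cmap:
        visit(c)
    return [
        {"page": c, "next": order[i + 1] if i + 1 < len(order) else None}
        for i, c in enumerate(order)
    ]

for page in guided_tour(concept_map):
    print(page)
# {'page': 'variables', 'next': 'loops'} ... {'page': 'recursion', 'next': None}
```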

    Evolving Networks and Social Network Analysis Methods and Techniques

    Evolving networks are, by definition, networks that change as a function of time. They are a natural extension of network science, since almost all real-world networks evolve over time by adding or removing nodes or links: elementary actor-level measures such as network centrality change over time, the popularity and influence of individuals grow or fade depending on ongoing processes, and events occur in networks during particular time intervals. Other problems, such as computing network-level statistics, link prediction, community detection, and visualization, gain additional research importance when applied to dynamic online social networks (OSNs). Due to their temporal dimension, the rapid growth of users, the velocity of change, and the amount of data these OSNs generate, methods and techniques that are effective and efficient on small static networks must now scale and handle the temporal dimension of streaming settings. This chapter reviews the state of the art in selected aspects of evolving social networks and presents open research challenges related to OSNs. These challenges suggest that significant further research is required: existing methods, techniques, and algorithms must be rethought and redesigned as incremental and dynamic versions that allow the efficient analysis of evolving networks.
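    As a minimal illustration of the incremental reformulation the chapter calls for, the following maintains degree centrality under a stream of edge insertions and deletions in O(1) per update instead of recomputing from scratch; degree is the simplest such measure, and the class and variable names are assumptions for the sketch.

```python
from collections import defaultdict

class StreamingDegreeCentrality:
    """Maintain degree centrality over a stream of edge events.

    Each insert/delete costs O(1), versus recomputing centrality over
    the whole network after every change.  Degree is the easy case;
    measures like betweenness need far more sophisticated dynamic
    algorithms.  Nodes are counted once seen, even if their degree
    later drops to zero (a simplification).
    """
    def __init__(self):
        self.degree = defaultdict(int)

    def add_edge(self, u, v):
        self.degree[u] += 1
        self.degree[v] += 1

    def remove_edge(self, u, v):
        self.degree[u] -= 1
        self.degree[v] -= 1

    def centrality(self, node):
        """Degree normalized by the maximum possible (n - 1 nodes)."""
        n = len(self.degree)
        return self.degree[node] / (n - 1) if n > 1 else 0.0

g = StreamingDegreeCentrality()
g.add_edge("alice", "bob")
g.add_edge("alice", "carol")
print(g.centrality("alice"))   # 1.0: connected to both other nodes
g.remove_edge("alice", "bob")
print(g.centrality("alice"))   # 0.5 after the deletion
```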

    Seventh Biennial Report: June 2003 - March 2005

