6,948 research outputs found

Recovering Grammar Relationships for the Java Language Specification

Author: A. Dubey
C. A. R. Hoare
D. A. Thomas
D. Barnard
E. Bouwers
H. H. Do
M. Di Penta
R. Lämmel
Ralf Lämmel
T. Dean
Vadim Zaytsev
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 24/08/2010

Grammar convergence is a method that helps to discover relationships between different grammars of the same language or different language versions. The key element of the method is the operational, transformation-based representation of those relationships. Given input grammars for convergence, they are transformed until they are structurally equal. The transformations are composed from primitive operators; properties of these operators and the composed chains provide quantitative and qualitative insight into the relationships between the grammars at hand. We describe a refined method for grammar convergence, and we use it in a major study, where we recover the relationships between all the grammars that occur in the different versions of the Java Language Specification (JLS). The relationships are represented as grammar transformation chains that capture all accidental or intended differences between the JLS grammars. This method is mechanized and driven by nominal and structural differences between pairs of grammars that are subject to asymmetric, binary convergence steps. We present the underlying operator suite for grammar transformation in detail, and we illustrate the suite with many examples of transformations on the JLS grammars. We also describe the extraction effort, which was needed to make the JLS grammars amenable to automated processing. We include substantial metadata about the convergence process for the JLS so that the effort becomes reproducible and transparent.

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server

Automatic Wrapper Adaptation by Tree Edit Distance Matching

Author: Baumgartner Robert
Ferrara Emilio
Publication date: 01/01/2010

Information distributed through the Web keeps growing faster day by day, and for this reason several techniques for extracting Web data have been suggested in recent years. Often, extraction tasks are performed through so-called wrappers, procedures that extract information from Web pages, e.g. by implementing logic-based techniques. Many fields of application today require a strong degree of robustness of wrappers, in order not to compromise assets of information or the reliability of the extracted data. Unfortunately, wrappers may fail to extract data from a Web page if its structure changes, sometimes even slightly, so new techniques are required to adapt the wrapper automatically to the new structure of the page in case of failure. In this work we present a novel approach to automatic wrapper adaptation based on measuring the similarity of trees through improved tree edit distance matching techniques.
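To make the notion of tree similarity concrete, here is a minimal sketch that compares two versions of a page's element tree. The abstract does not spell out the algorithm, so this uses the classic simple tree matching procedure as a stand-in; the `Node` class, the normalized `similarity` score, and the example trees are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal DOM-like node: a tag and ordered children."""
    tag: str
    children: list["Node"] = field(default_factory=list)


def simple_tree_matching(a: Node, b: Node) -> int:
    """Size of a maximum top-down, order-preserving matching of a and b."""
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    # dp[i][j]: best matching between the first i children of a
    # and the first j children of b.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = simple_tree_matching(a.children[i - 1], b.children[j - 1])
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1] + pair)
    return dp[m][n] + 1  # +1 for the matched roots


def size(t: Node) -> int:
    """Number of nodes in the tree rooted at t."""
    return 1 + sum(size(c) for c in t.children)


def similarity(a: Node, b: Node) -> float:
    """Matched nodes normalized by the average tree size, in [0, 1]."""
    return 2 * simple_tree_matching(a, b) / (size(a) + size(b))


# Old page version, and a new version with one extra element inside <div>.
old_page = Node("body", [Node("div", [Node("span")]), Node("p")])
new_page = Node("body", [Node("div", [Node("span"), Node("b")]), Node("p")])

score = similarity(old_page, new_page)  # 4 matched nodes: 2 * 4 / (4 + 5)
```

A wrapper adaptation step could, under these assumptions, accept the candidate subtree in the new page with the highest score above some threshold; the choice of threshold is outside the scope of this sketch.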

CogPrints Cognitive Sciences Eprint Archive

Design of Automatically Adaptable Web Wrappers

Author: Baumgartner Robert
Ferrara Emilio
Publication venue
Publication date: 01/01/2011

Nowadays, the huge amount of information distributed through the Web motivates the study of techniques for extracting relevant data in an efficient and reliable way. Both academia and enterprises have developed several approaches to Web data extraction, for example using techniques from artificial intelligence or machine learning. Some commonly adopted procedures, namely wrappers, ensure a high degree of precision of the information extracted from Web pages and, at the same time, have to prove robust so as not to compromise the quality and reliability of the data themselves. In this paper we focus on some experimental aspects related to the robustness of the data extraction process and the possibility of automatically adapting wrappers. We discuss the implementation of algorithms for finding similarities between two different versions of a Web page, in order to handle modifications, avoid the failure of data extraction tasks and ensure the reliability of the extracted information. Our purpose is to evaluate the performance, advantages and drawbacks of our novel system of automatic wrapper adaptation.

arXiv.org e-Print Archive

CiteSeerX

CogPrints Cognitive Sciences Eprint Archive

Steering data quality with visual analytics: The complexity challenge

Author: Andrienko G.
Cao N.
Hong S.
Jiang L.
Liu S.
Shi C.
Wang Y. S.
Wu Y.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2018

Data quality management, especially data cleansing, has been extensively studied for many years in the areas of data management and visual analytics. In this paper, we first review and explore the relevant work from the research areas of data management, visual analytics and human-computer interaction. Then, for different types of data such as multimedia data, textual data, trajectory data, and graph data, we summarize the common methods for improving data quality by leveraging data cleansing techniques at different analysis stages. Based on a thorough analysis, we propose a general visual analytics framework for interactively cleansing data. Finally, the challenges and opportunities are analyzed and discussed in the context of data and humans.

City Research Online

Directory of Open Access Journals