803 research outputs found
Inferring syntactic rules for word alignment through Inductive Logic Programming
International audienceThis paper presents and evaluates an original approach to automatically align bitexts at the word level. It relies on a syntactic dependency analysis of the source and target texts and is based on a machine-learning technique, namely inductive logic programming (ILP). We show that ILP is particularly well suited for this task in which the data can only be expressed by (translational and syntactic) relations. It allows us to infer easily rules called syntactic alignment rules. These rules make the most of the syntactic information to align words. A simple bootstrapping technique provides the examples needed by ILP, making this machine learning approach entirely automatic. Moreover, through different experiments, we show that this approach requires a very small amount of training data, and its performance rivals some of the best existing alignment systems. Furthermore, cases of syntactic isomorphisms or non-isomorphisms between the source language and the target language are easily identified through the inferred rules
Web Data Extraction, Applications and Techniques: A Survey
Web Data Extraction is an important problem that has been studied by means of
different scientific tools and in a broad range of applications. Many
approaches to extracting data from the Web have been designed to solve specific
problems and operate in ad-hoc domains. Other approaches, instead, heavily
reuse techniques and algorithms developed in the field of Information
Extraction.
This survey aims at providing a structured and comprehensive overview of the
literature in the field of Web Data Extraction. We provided a simple
classification framework in which existing Web Data Extraction applications are
grouped into two main classes, namely applications at the Enterprise level and
at the Social Web level. At the Enterprise level, Web Data Extraction
techniques emerge as a key tool to perform data analysis in Business and
Competitive Intelligence systems as well as for business process
re-engineering. At the Social Web level, Web Data Extraction techniques allow
to gather a large amount of structured data continuously generated and
disseminated by Web 2.0, Social Media and Online Social Network users and this
offers unprecedented opportunities to analyze human behavior at a very large
scale. We discuss also the potential of cross-fertilization, i.e., on the
possibility of re-using Web Data Extraction techniques originally designed to
work in a given domain, in other domains.Comment: Knowledge-based System
- …