Web Data Extraction, Applications and Techniques: A Survey
Web Data Extraction is an important problem that has been studied by means of
different scientific tools and in a broad range of applications. Many
approaches to extracting data from the Web have been designed to solve specific
problems and operate in ad-hoc domains. Other approaches, instead, heavily
reuse techniques and algorithms developed in the field of Information
Extraction.
This survey aims at providing a structured and comprehensive overview of the
literature in the field of Web Data Extraction. We provide a simple
classification framework in which existing Web Data Extraction applications are
grouped into two main classes, namely applications at the Enterprise level and
at the Social Web level. At the Enterprise level, Web Data Extraction
techniques emerge as a key tool to perform data analysis in Business and
Competitive Intelligence systems as well as for business process
re-engineering. At the Social Web level, Web Data Extraction techniques make it
possible to gather large amounts of structured data continuously generated and
disseminated by Web 2.0, Social Media, and Online Social Network users, offering
unprecedented opportunities to analyze human behavior at very large scale. We
also discuss the potential for cross-fertilization, i.e., the possibility of
re-using Web Data Extraction techniques originally designed for one domain in
other domains.
Comment: Knowledge-based System
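Many of the surveyed Enterprise-level approaches are wrapper-based: a small, hand-written procedure turns semi-structured HTML into structured records. As a minimal, hedged sketch (the HTML snippet, CSS class names, and fields are invented for illustration, not taken from the survey):

```python
# Minimal sketch of wrapper-style Web Data Extraction: a hand-written wrapper
# maps semi-structured HTML to structured (name, price) records.
# The page layout and field names are illustrative assumptions.
from html.parser import HTMLParser

class ProductWrapper(HTMLParser):
    """Collects (name, price) pairs from <span class="name">/<span class="price">."""
    def __init__(self):
        super().__init__()
        self.records = []
        self._field = None      # field the parser is currently inside, if any
        self._current = {}      # partially built record

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and attrs.get("class") in ("name", "price"):
            self._field = attrs["class"]

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()
            self._field = None
            if {"name", "price"} <= self._current.keys():
                self.records.append(self._current)   # record complete
                self._current = {}

page = """
<div><span class="name">Widget</span><span class="price">9.99</span></div>
<div><span class="name">Gadget</span><span class="price">4.50</span></div>
"""
wrapper = ProductWrapper()
wrapper.feed(page)
print(wrapper.records)
```

A real extractor must also cope with layout drift and missing fields, which is precisely why the survey distinguishes ad-hoc wrappers from more general Information Extraction techniques.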
State-of-the-art on evolution and reactivity
This report starts, in Chapter 1, by outlining aspects of querying and updating resources on
the Web and on the Semantic Web, including the development of query and update languages
to be carried out within the Rewerse project.
From this outline, it becomes clear that several existing research areas and topics are of
interest for this work in Rewerse. In the remainder of this report we further present state of
the art surveys in a selection of such areas and topics. More precisely: in Chapter 2 we give
an overview of logics for reasoning about state change and updates; Chapter 3 is devoted to briefly describing existing update languages for the Web, and also for updating logic programs;
in Chapter 4 event-condition-action rules, both in the context of active database systems and
in the context of semistructured data, are surveyed; in Chapter 5 we give an overview of some relevant rule-based agent frameworks.
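The event-condition-action rules surveyed in Chapter 4 follow the pattern "ON event IF condition DO action". A hedged sketch of that pattern (the event names, state dictionary, and rules are invented for illustration, not drawn from the report):

```python
# Sketch of an event-condition-action (ECA) rule engine:
# ON event IF condition(state) DO action(state).
from dataclasses import dataclass
from typing import Callable

@dataclass
class EcaRule:
    event: str                          # event type the rule listens for
    condition: Callable[[dict], bool]   # predicate over the current state
    action: Callable[[dict], None]      # state update to perform

state = {"stock": 3}
rules = [
    EcaRule(
        event="item_sold",
        condition=lambda s: s["stock"] > 0,
        action=lambda s: s.update(stock=s["stock"] - 1),
    ),
    EcaRule(
        event="item_sold",
        condition=lambda s: s["stock"] <= 1,
        action=lambda s: s.update(reorder=True),
    ),
]

def dispatch(event: str) -> None:
    """Fire every rule whose event matches and whose condition holds."""
    for rule in rules:
        if rule.event == event and rule.condition(state):
            rule.action(state)

for _ in range(3):
    dispatch("item_sold")
print(state)  # stock decremented; reorder flag raised once stock ran low
```

Active database systems evaluate such rules against database updates rather than an in-memory dictionary, but the ON/IF/DO decomposition is the same.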
A Logic-based Approach for Recognizing Textual Entailment Supported by Ontological Background Knowledge
We present the architecture and the evaluation of a new system for
recognizing textual entailment (RTE). In RTE, the goal is to automatically
identify the type of logical relation between two input texts. In particular, we are
interested in proving the existence of an entailment between them. We conceive
our system as a modular environment allowing for a high-coverage syntactic and
semantic text analysis combined with logical inference. For the syntactic and
semantic analysis we combine a deep semantic analysis with a shallow one
supported by statistical models in order to increase the quality and the
accuracy of results. For RTE we use first-order logical inference employing
model-theoretic techniques and automated reasoning tools. The inference is
supported with problem-relevant background knowledge extracted automatically
and on demand from external sources such as WordNet, YAGO, and OpenCyc, or from
other, more experimental sources, e.g., manually defined presupposition
resolutions or axiomatized general and common-sense knowledge. The results show
that fine-grained and consistent knowledge coming from diverse sources is a
necessary condition for the correctness and traceability of results.
Comment: 25 pages, 10 figure
A Survey on IT-Techniques for a Dynamic Emergency Management in Large Infrastructures
This deliverable is a survey of the IT techniques that are relevant to the three use cases of the project EMILI. It describes the state of the art in four complementary IT areas: data cleansing, supervisory control and data acquisition, wireless sensor networks, and complex event processing. Even though the deliverable's authors have tried to avoid overly technical language and have tried to explain every concept referred to, the deliverable might still seem rather technical to readers not yet familiar with the techniques it describes.
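Of the four areas surveyed, complex event processing lends itself to a short sketch: a composite event is detected from a stream of timestamped primitive events. The event names, window length, and fire-detection scenario below are illustrative assumptions, not taken from the deliverable:

```python
# Hedged sketch of complex event processing (CEP): detect the composite
# pattern "smoke followed by heat within `window` seconds" in an event stream.
def detect_fire(events, window=10.0):
    """events: list of (timestamp, name) pairs. Returns the timestamps of
    'heat' events that complete the smoke -> heat composite pattern."""
    alerts = []
    smoke_times = []
    for ts, name in sorted(events):
        if name == "smoke":
            smoke_times.append(ts)
        elif name == "heat":
            # composite event fires if some recent smoke preceded this heat
            if any(0 <= ts - s <= window for s in smoke_times):
                alerts.append(ts)
    return alerts

stream = [(1.0, "smoke"), (4.0, "heat"), (30.0, "heat"), (41.0, "smoke")]
print(detect_fire(stream))  # only the heat at t=4.0 follows smoke closely enough
```

Production CEP engines express such patterns declaratively and evict expired events from the window; the sketch keeps the whole history for brevity.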
Approximate model composition for explanation generation
This thesis presents a framework for the formulation of knowledge models to support
the generation of explanations for engineering systems that are represented by the
resulting models. Such models are automatically assembled from instantiated generic
component descriptions, known as model fragments. The model fragments are of sufficient
detail to generally satisfy the information-content requirements identified
by the user asking for explanations.
Through a combination of fuzzy logic based evidence preparation, which exploits the
history of prior user preferences, and an approximate reasoning inference engine, with
a Bayesian evidence propagation mechanism, different uncertainty sources can be handled.
Model fragments, each representing structural or behavioural aspects of a component
of the domain system of interest, are organised in a library. Those fragments
that represent the same domain system component, albeit with different representation
detail, form parts of the same assumption class in the library. Selected fragments are
assembled to form an overall system model, prior to extraction of any textual
information upon which to base the explanations. The thesis proposes and examines the
techniques that support the fragment selection mechanism and the assembly of these
fragments into models.
In particular, a Bayesian network-based model fragment selection mechanism is
described that forms the core of the work. The network structure is manually determined
prior to any inference, based on schematic information regarding the connectivity of
the components present in the domain system under consideration. The elicitation
of network probabilities, on the other hand, is completely automated using probability
elicitation heuristics. These heuristics aim to provide the information required to select
fragments which are maximally compatible with the given evidence of the fragments
preferred by the user. Given such initial evidence, an existing evidence propagation
algorithm is employed. The preparation of the evidence for the selection of certain
fragments, based on user preference, is performed by a fuzzy reasoning evidence
fabrication engine. This engine uses a set of fuzzy rules and standard fuzzy reasoning
mechanisms, attempting to guess the information needs of the user and suggesting the selection of fragments of sufficient detail to satisfy such needs. Once the evidence
is propagated, a single fragment is selected for each of the domain system
components and hence, the final model of the entire system is constructed. Finally, a highly
configurable XML-based mechanism is employed to extract explanation content from
the newly formulated model and to structure the explanatory sentences for the final
explanation that will be communicated to the user.
The framework is illustratively applied to a number of domain systems and is compared
qualitatively to existing compositional modelling methodologies. A further empirical
assessment of the performance of the evidence propagation algorithm is carried out to
determine its performance limits. Performance is measured against the number of
fragments that represent each of the components of a large domain system, and the amount
of connectivity permitted in the Bayesian network between the nodes that stand for
the selection or rejection of these fragments. Based on this assessment, recommendations
are made as to how the framework may be optimised to cope with real-world
applications.
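The core selection step can be sketched in a deliberately reduced form: for each component, choose the fragment with the highest posterior given the user-preference evidence, via Bayes' rule P(fragment | evidence) ∝ P(evidence | fragment) · P(fragment). The fragment names, priors, and likelihoods below are invented for illustration; the thesis itself uses a full Bayesian network with heuristically elicited probabilities and an evidence propagation algorithm, not this two-number shortcut:

```python
# Hedged sketch of model fragment selection as posterior maximisation:
# P(frag | ev) is proportional to P(ev | frag) * P(frag), normalised over the
# assumption class (the fragments representing the same component).
def select_fragment(fragments):
    """fragments: {name: (prior, evidence_likelihood)} -> (best name, posteriors)."""
    posteriors = {name: prior * lik for name, (prior, lik) in fragments.items()}
    total = sum(posteriors.values())
    posteriors = {name: p / total for name, p in posteriors.items()}  # normalise
    return max(posteriors, key=posteriors.get), posteriors

# Two candidate fragments for one hypothetical "pump" component, at
# different levels of representation detail.
pump_fragments = {
    "pump_coarse": (0.6, 0.2),    # (prior, P(preference evidence | fragment))
    "pump_detailed": (0.4, 0.9),
}
best, post = select_fragment(pump_fragments)
print(best, round(post[best], 3))  # strong evidence outweighs the coarse prior
```

Selecting one fragment per component in this way, then composing the winners, mirrors the assembly step the abstract describes, though the thesis's network also captures dependencies between components that this per-component sketch ignores.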