13,754 research outputs found
XML Schema Clustering with Semantic and Hierarchical Similarity Measures
With the growing popularity of XML as a data representation language, collections of XML data have exploded in number. Methods are required to manage these collections and to discover useful information in them for improved document handling. We present a schema clustering process that organises heterogeneous XML schemas into groups. The methodology considers not only the linguistic and contextual similarity of elements but also their hierarchical structural similarity. We support our findings with experiments and analysis.
Inductive queries for a drug designing robot scientist
It is increasingly clear that machine learning algorithms need to be integrated into an iterative scientific discovery loop, in which data is queried repeatedly by means of inductive queries and where the computer provides guidance to the experiments that are being performed. In this chapter, we summarise several key challenges in achieving this integration of machine learning and data mining algorithms in methods for the discovery of Quantitative Structure Activity Relationships (QSARs). We introduce the concept of a robot scientist, in which all steps of the discovery process are automated; we discuss the representation of molecular data such that knowledge discovery tools can analyse it, and we discuss the adaptation of machine learning and data mining algorithms to guide QSAR experiments.
Using edit distance to analyse errors in a natural language to logic translation corpus
We have assembled a large corpus of student submissions to an automatic grading system, where the subject matter involves the translation of natural language sentences into propositional logic. Of the 2.3 million translation instances in the corpus, 286,000 (approximately 12%) are categorized as being in error. We want to understand the nature of the errors that students make, so that we can develop tools and supporting infrastructure that help students with the problems that these errors represent.
With this aim in mind, this paper describes an analysis of a significant proportion of the data, using edit distance between incorrect answers and their corresponding correct solutions, and the associated edit sequences, as a means of organising the data and detecting categories of errors. We demonstrate that a large proportion of errors can be accounted for by means of a small number of relatively simple error types, and that the method draws attention to interesting phenomena in the data set.
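The abstract does not specify the tokenisation or the cost model, so the following Python sketch only illustrates the general idea under stated assumptions: formulas are written with hypothetical ASCII connectives (`&`, `|`, `~`, `->`, `<->`), each token edit costs 1, and the non-trivial steps of one optimal edit sequence serve as a crude "error signature" for grouping submissions. The names `tokenize`, `edit_script` and `error_signature` are illustrative, not taken from the paper.

```python
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical notation: ASCII connectives and identifiers as sentence letters.
TOKEN_RE = re.compile(r"<->|->|[A-Za-z]+|\S")

Step = Tuple[str, str, str]  # (op, token in wrong answer, token in solution)


def tokenize(formula: str) -> List[str]:
    return TOKEN_RE.findall(formula)


def edit_script(wrong: List[str], right: List[str]) -> Tuple[int, List[Step]]:
    """Levenshtein distance over tokens plus one optimal edit sequence.

    op is one of "keep", "sub", "del", "ins"; every non-keep step costs 1.
    """
    n, m = len(wrong), len(right)
    # dist[i][j] = cost of turning wrong[:i] into right[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if wrong[i - 1] == right[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # delete
                             dist[i][j - 1] + 1,         # insert
                             dist[i - 1][j - 1] + cost)  # keep / substitute
    # Trace back from the bottom-right cell to recover one optimal sequence.
    ops: List[Step] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (wrong[i - 1] != right[j - 1]):
            op = "keep" if wrong[i - 1] == right[j - 1] else "sub"
            ops.append((op, wrong[i - 1], right[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("del", wrong[i - 1], ""))
            i -= 1
        else:
            ops.append(("ins", "", right[j - 1]))
            j -= 1
    ops.reverse()
    return dist[n][m], ops


def error_signature(wrong: str, right: str) -> Tuple[Step, ...]:
    # Keep only the steps that change something; used as a grouping key.
    _, ops = edit_script(tokenize(wrong), tokenize(right))
    return tuple(step for step in ops if step[0] != "keep")
```

For example, `Counter(error_signature(w, r) for w, r in [("A & B", "A | B"), ("C & D", "C | D")])` maps both submissions to the single signature `(("sub", "&", "|"),)`, i.e. a conjunction written where a disjunction was expected, independently of the sentence letters involved.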
Exploring Communities in Large Profiled Graphs
Given a graph and a query vertex, the community search (CS) problem aims to efficiently find a subgraph whose vertices are closely related to the query vertex. Communities are prevalent in social and biological networks, and can be used in product advertisement and social event recommendation. In this paper, we study profiled community search (PCS), where CS is performed on a profiled graph, i.e., a graph in which each vertex has labels arranged in a hierarchical manner. Extensive experiments show that PCS can identify communities with themes that are common to their vertices, and is more effective than existing CS approaches. As a naive solution for PCS is highly expensive, we have also developed a tree index, which facilitates efficient and online solutions for PCS.
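The abstract does not say how "closely related" is measured. A common choice in the community search literature is the k-core: every vertex in the community has at least k neighbours inside it. The Python sketch below assumes that measure and treats a vertex profile as a flat set of labels (ignoring the label hierarchy). It is the kind of naive filtering-plus-peeling approach the abstract calls expensive, not the paper's indexed PCS algorithm; `k_core_community`, `labels` and `required` are hypothetical names.

```python
from collections import deque
from typing import AbstractSet, Dict, Hashable, Optional, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected, symmetric adjacency sets


def k_core_community(graph: Graph, q: Hashable, k: int,
                     labels: Optional[Dict[Hashable, Set[str]]] = None,
                     required: AbstractSet[str] = frozenset()) -> Set[Hashable]:
    """Return the connected k-core containing q, or an empty set if none exists.

    If `labels` is given, only vertices whose label set contains `required`
    are considered (a naive stand-in for profile-aware search).
    """
    # 1. Candidate vertices, optionally filtered by their labels.
    alive = {v for v in graph
             if labels is None or required <= labels.get(v, set())}
    if q not in alive:
        return set()
    # 2. Repeatedly peel vertices whose degree inside `alive` is below k.
    degree = {v: len(graph[v] & alive) for v in alive}
    queue = deque(v for v in alive if degree[v] < k)
    while queue:
        v = queue.popleft()
        if v not in alive:
            continue  # already peeled via an earlier queue entry
        alive.remove(v)
        for u in graph[v]:
            if u in alive:
                degree[u] -= 1
                if degree[u] < k:
                    queue.append(u)
    if q not in alive:
        return set()
    # 3. Keep only the connected component of q (breadth-first search).
    community, frontier = {q}, deque([q])
    while frontier:
        v = frontier.popleft()
        for u in graph[v] & alive:
            if u not in community:
                community.add(u)
                frontier.append(u)
    return community
```

On a triangle `a`, `b`, `c` with a pendant vertex `d` attached to `a`, `k_core_community(g, "a", 2)` peels `d` and returns `{"a", "b", "c"}`. Each query here rescans the whole graph, which is the cost an index is meant to avoid.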
Integrating Economic Knowledge in Data Mining Algorithms
The assessment of knowledge derived from databases depends on many factors. Decision makers often need to convince others of the correctness and effectiveness of knowledge induced from data. Current data mining techniques contribute little to this process of persuasion. Part of this limitation can be removed by integrating knowledge from experts in the field, encoded in some accessible way, with knowledge derived from patterns stored in the database. In this paper we discuss in particular methods for implementing monotonicity constraints in economic decision problems. This prior knowledge is combined with data mining algorithms based on decision trees and neural networks. The method is illustrated with a hedonic price model.
Keywords: knowledge; neural network; data mining; decision trees
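A monotonicity constraint states that if x ≤ y in every attribute, then the prediction for x must not exceed the prediction for y. In a hedonic price model, for instance, a house that is no worse in size or number of rooms should not be priced lower. The abstract does not show how the constraint is enforced inside the trees or networks, so the Python sketch below only checks a fitted model for violations on a set of points; `dominates`, `monotonicity_violations` and the toy data are hypothetical.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def dominates(x: Vector, y: Vector) -> bool:
    """True if x <= y in every attribute (componentwise partial order)."""
    return all(a <= b for a, b in zip(x, y))


def monotonicity_violations(points: List[Vector],
                            predict: Callable[[Vector], float]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) where points[i] <= points[j] componentwise
    but predict(points[i]) > predict(points[j])."""
    preds = [predict(p) for p in points]
    bad: List[Tuple[int, int]] = []
    for i, j in combinations(range(len(points)), 2):
        if dominates(points[i], points[j]) and preds[i] > preds[j]:
            bad.append((i, j))
        elif dominates(points[j], points[i]) and preds[j] > preds[i]:
            bad.append((j, i))
    return bad
```

With houses described as `(rooms, area)`, a price function such as `lambda x: 1000 * x[0] + 10 * x[1]` yields no violations, whereas one that decreases with area flags every dominated pair. Incomparable pairs (more rooms but less area) are never flagged, because the constraint only applies along the partial order.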
On the use of hierarchical subtrace mining for efficient local process model mining
Mining local patterns of process behavior is a vital tool for the analysis of event data that originates from flexible processes, for which it is generally not possible to describe the behavior of the process in a single process model without overgeneralizing the behavior allowed by the process. Several techniques for mining such local patterns have been developed throughout the years, including Local Process Model (LPM) mining and the hierarchical mining of frequent subtraces (i.e., subprocesses). These two techniques can be considered to be orthogonal, i.e., they provide different types of insights on the behavior observed in an event log. As a consequence, it is often useful to apply both techniques to the data. However, both techniques can be computationally intensive, hindering data analysis. In this work, we explore how the output of a subtrace mining approach can be used to mine LPMs more efficiently. We show on a collection of real-life event logs that exploiting the ordering constraints extracted from subtraces lowers the computation time needed for LPM mining compared to state-of-the-art techniques, while at the same time mining higher quality LPMs. Additionally, by mining LPMs from subtraces, we can obtain a more structured and meaningful representation of subprocesses allowing for classic process-flow constructs such as parallel ordering, choices, and loops, besides the precedence relations shown by subtraces.
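The abstract does not detail either mining algorithm. As a rough illustration of the inputs involved, the Python sketch below assumes an event log given as lists of activity names, counts the contiguous subtraces of one fixed length that occur in at least `min_support` traces, and reads off the directly-follows pairs they imply as simple ordering constraints. The fixed length and the absence of any hierarchy are deliberate simplifications; `frequent_subtraces` and `ordering_constraints` are hypothetical names, not the paper's method.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Trace = List[str]  # one case: the sequence of executed activity names


def frequent_subtraces(log: List[Trace], length: int,
                       min_support: int) -> Dict[Tuple[str, ...], int]:
    """Map each contiguous subtrace of the given length to the number of
    traces containing it, keeping only those with support >= min_support."""
    support: Counter = Counter()
    for trace in log:
        # A set, so each subtrace is counted at most once per trace.
        windows = {tuple(trace[i:i + length])
                   for i in range(len(trace) - length + 1)}
        support.update(windows)
    return {s: c for s, c in support.items() if c >= min_support}


def ordering_constraints(subtraces: Iterable[Tuple[str, ...]]) -> Set[Tuple[str, str]]:
    """Directly-follows pairs (a, b) implied by the given subtraces."""
    return {(s[i], s[i + 1]) for s in subtraces for i in range(len(s) - 1)}
```

A downstream LPM miner could, for example, use such pairs to prioritise candidate models whose orderings agree with frequently observed behaviour; how the paper actually exploits them is not stated in the abstract.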
An introduction to Graph Data Management
A graph database is a database where the data structures for the schema and/or instances are modeled as a (possibly labeled and/or directed) graph or generalizations of it, and where querying is expressed by graph-oriented operations and type constructors. In this article we present the basic notions of graph databases, give a historical overview of their main development, and study the main current systems that implement them.
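To make the data model concrete, the following Python sketch assumes a plain in-memory store rather than any real graph database system. It keeps a labeled, directed graph as adjacency sets keyed by edge label and answers one graph-oriented query: all nodes reachable from a start node along one or more edges carrying a given label (a path pattern often written `label+`). The class and method names are hypothetical.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Set


class LabeledGraph:
    """Minimal labeled, directed graph: string nodes, one label per edge."""

    def __init__(self) -> None:
        # out_edges[source][label] = set of target nodes
        self.out_edges: DefaultDict[str, DefaultDict[str, Set[str]]] = \
            defaultdict(lambda: defaultdict(set))

    def add_edge(self, source: str, label: str, target: str) -> None:
        self.out_edges[source][label].add(target)

    def neighbours(self, node: str, label: str) -> Set[str]:
        # .get avoids creating empty entries in the defaultdicts on lookup.
        return self.out_edges.get(node, {}).get(label, set())

    def reachable(self, start: str, label: str) -> Set[str]:
        """Nodes reachable from `start` via one or more `label` edges,
        evaluated by breadth-first search."""
        seen: Set[str] = set()
        frontier = deque([start])
        while frontier:
            node = frontier.popleft()
            for nxt in self.neighbours(node, label):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
        return seen
```

With edges `alice -knows-> bob`, `bob -knows-> carol` and `carol -worksFor-> acme`, `reachable("alice", "knows")` returns `{"bob", "carol"}`: the `worksFor` edge is ignored because only `knows` edges are followed.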
Influence of Wind Turbines on Farmlands’ Value: Exploring the Behaviour of a Rural Community through the Decision Tree
The relationship between wind energy and rural areas has led to a controversial debate on the effects reported by rural communities once wind farms or single turbines become operational. The literature on this topic lacks dedicated studies analysing how the behaviour of rural communities towards wind turbines can affect the market value of farmlands. This research aims to examine the extent to which the easement of wind turbines can influence the market value of farmlands in terms of willingness to pay (WTP) by a small rural community, and to identify the main factors affecting the WTP. Starting from data collected via face-to-face interviews, a decision tree is applied to investigate the WTP for seven types of farmland in a rural town of the Puglia Region (Southern Italy) hosting a wind farm. Results of the interviews show a broad acceptance of the wind farm, while the decision tree classification shows a significant reduction of WTP for all farmlands. The main factors influencing the WTP are the education level, the possibility of increasing income, and concerns about impacts on human health and about maintenance workmen. National and local policy measures have to be put in place to inform rural communities about the ‘magnitude’ of the effects they identified as crucial, so that policy-makers and private bodies can contribute to making the farmland market more equitable.