XML Schema Clustering with Semantic and Hierarchical Similarity Measures

Author: Iryadi Wina
Nayak Richi
Publication venue: 'Elsevier BV'
Publication date: 01/01/2007

With the growing popularity of XML as a data representation language, collections of XML data have exploded in number. Methods are required to manage these collections and to discover useful information in them for improved document handling. We present a schema clustering process that organises heterogeneous XML schemas into groups. The methodology considers not only the linguistics and the context of the elements but also their hierarchical structural similarity. We support our findings with experiments and analysis.

Crossref

Queensland University of Technology ePrints Archive

Inductive queries for a drug designing robot scientist

Author: A. Lingas
C. Hansch
C.A. Lipinski
D.R. Jones
H. Blockeel
J. Matousek
L. Raedt De
R.D. King
T. Gärtner
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

It is increasingly clear that machine learning algorithms need to be integrated in an iterative scientific discovery loop, in which data is queried repeatedly by means of inductive queries and where the computer provides guidance to the experiments that are being performed. In this chapter, we summarise several key challenges in achieving this integration of machine learning and data mining algorithms in methods for the discovery of Quantitative Structure Activity Relationships (QSARs). We introduce the concept of a robot scientist, in which all steps of the discovery process are automated; we discuss the representation of molecular data such that knowledge discovery tools can analyse it, and we discuss the adaptation of machine learning and data mining algorithms to guide QSAR experiments

Lirias

Crossref

Bournemouth University Research Online

The University of Manchester - Institutional Repository

DIAL UCLouvain

Using edit distance to analyse errors in a natural language to logic translation corpus

Author: Barker-Plummer Dave
Cox Richard
Dale Robert
Publication date: 01/01/2012

We have assembled a large corpus of student submissions to an automatic grading system, where the subject matter involves the translation of natural language sentences into propositional logic. Of the 2.3 million translation instances in the corpus, 286,000 (approximately 12%) are categorized as being in error. We want to understand the nature of the errors that students make, so that we can develop tools and supporting infrastructure that help students with the problems that these errors represent. With this aim in mind, this paper describes an analysis of a significant proportion of the data, using edit distance between incorrect answers and their corresponding correct solutions, and the associated edit sequences, as a means of organising the data and detecting categories of errors. We demonstrate that a large proportion of errors can be accounted for by means of a small number of relatively simple error types, and that the method draws attention to interesting phenomena in the data set
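The abstract's core idea, comparing an incorrect answer with its correct solution by edit distance and reading off the edit sequence, can be sketched as follows. This is a minimal illustration using unit-cost Levenshtein distance over tokens; the function name `edit_script`, the token representation, and the sample formulas are assumptions made for the example and are not taken from the paper.

```python
def edit_script(source, target):
    """Return (distance, ops) for turning `source` into `target`.

    Both arguments are sequences (strings or token lists). Every insert,
    delete, and substitute costs 1 (an assumption; the paper may weight
    edits differently).
    """
    m, n = len(source), len(target)
    # dist[i][j] = edit distance between source[:i] and target[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete source[i-1]
                dist[i][j - 1] + 1,         # insert target[j-1]
                dist[i - 1][j - 1] + cost,  # match or substitute
            )

    # Walk back from the bottom-right corner to recover one optimal edit sequence.
    ops = []
    i, j = m, n
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and source[i - 1] == target[j - 1]
                and dist[i][j] == dist[i - 1][j - 1]):
            i, j = i - 1, j - 1  # tokens match, no edit recorded
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            ops.append(("substitute", source[i - 1], target[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("delete", source[i - 1], None))
            i -= 1
        else:
            ops.append(("insert", None, target[j - 1]))
            j -= 1
    ops.reverse()
    return dist[m][n], ops


if __name__ == "__main__":
    # Hypothetical student answer vs. correct solution, already tokenised.
    student = ["A", "&", "B"]
    correct = ["A", "|", "B"]
    print(edit_script(student, correct))
```

Grouping submissions by their edit sequence (for example, counting how often the same tuple of operations occurs) is one plausible way to surface recurring error types, in the spirit of what the abstract describes.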

CiteSeerX

Macquarie University ResearchOnline

Sussex Research Online

Exploring Communities in Large Profiled Graphs

Author: Chen Xiaojun
Chen Yankai
Cheng Reynold
Fang Yixiang
Li Yun
Zhang Jie
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018

Given a graph G and a vertex q ∈ G, the community search (CS) problem aims to efficiently find a subgraph of G whose vertices are closely related to q. Communities are prevalent in social and biological networks, and can be used in product advertisement and social event recommendation. In this paper, we study profiled community search (PCS), where CS is performed on a profiled graph. This is a graph in which each vertex has labels arranged in a hierarchical manner. Extensive experiments show that PCS can identify communities with themes that are common to their vertices, and is more effective than existing CS approaches. As a naive solution for PCS is highly expensive, we have also developed a tree index, which facilitates efficient and online solutions for PCS.
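To make the community search problem concrete, the sketch below finds the connected k-core containing a query vertex q, that is, the largest connected subgraph around q in which every vertex has at least k neighbours. Minimum degree is only one common cohesiveness criterion, chosen here as an assumption; this is not the profiled (PCS) method or the tree index from the paper, and the function name `kcore_community` and the adjacency-dict input are hypothetical.

```python
from collections import deque


def kcore_community(adj, q, k):
    """Return the vertex set of the connected k-core containing q.

    `adj` maps each vertex to the set of its neighbours (undirected graph).
    Returns an empty set if q is not part of any k-core.
    """
    alive = set(adj)
    degree = {v: len(adj[v]) for v in adj}

    # Repeatedly peel off vertices whose remaining degree is below k.
    queue = deque(v for v in adj if degree[v] < k)
    while queue:
        v = queue.popleft()
        if v not in alive:
            continue  # already removed earlier
        alive.remove(v)
        for u in adj[v]:
            if u in alive:
                degree[u] -= 1
                if degree[u] < k:
                    queue.append(u)

    if q not in alive:
        return set()

    # Breadth-first search from q restricted to the surviving vertices.
    community = {q}
    frontier = deque([q])
    while frontier:
        v = frontier.popleft()
        for u in adj[v]:
            if u in alive and u not in community:
                community.add(u)
                frontier.append(u)
    return community


if __name__ == "__main__":
    # Toy graph: a triangle a-b-c with a pendant vertex d attached to c.
    graph = {
        "a": {"b", "c"},
        "b": {"a", "c"},
        "c": {"a", "b", "d"},
        "d": {"c"},
    }
    print(sorted(kcore_community(graph, "a", 2)))
```

A profiled variant would additionally require the returned vertices to share labels from the hierarchy, which this structural sketch does not attempt.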

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

Integrating Economic Knowledge in Data Mining Algorithms

Author: Daniëls H.A.M.
Feelders A.J.

The assessment of knowledge derived from databases depends on many factors. Decision makers often need to convince others about the correctness and effectiveness of knowledge induced from data. The current data mining techniques do not contribute much to this process of persuasion. Part of this limitation can be removed by integrating knowledge from experts in the field, encoded in some accessible way, with knowledge derived from patterns stored in the database. In this paper we will in particular discuss methods for implementing monotonicity constraints in economic decision problems. This prior knowledge is combined with data mining algorithms based on decision trees and neural networks. The method is illustrated in a hedonic price model.

Keywords: knowledge; neural network; data mining; decision trees

Research Papers in Economics

On the use of hierarchical subtrace mining for efficient local process model mining

Author: Genga L.
Tax N.
Zannone N.
Publication venue: CEUR-WS.org
Publication date: 06/12/2017

Mining local patterns of process behavior is a vital tool for the analysis of event data that originates from flexible processes, for which it is generally not possible to describe the behavior of the process in a single process model without overgeneralizing the behavior allowed by the process. Several techniques for mining such local patterns have been developed throughout the years, including Local Process Model (LPM) mining and the hierarchical mining of frequent subtraces (i.e., subprocesses). These two techniques can be considered to be orthogonal, i.e., they provide different types of insights on the behavior observed in an event log. As a consequence, it is often useful to apply both techniques to the data. However, both techniques can be computationally intensive, hindering data analysis. In this work, we explore how the output of a subtrace mining approach can be used to mine LPMs more efficiently. We show on a collection of real-life event logs that exploiting the ordering constraints extracted from subtraces lowers the computation time needed for LPM mining compared to state-of-the-art techniques, while at the same time mining higher quality LPMs. Additionally, by mining LPMs from subtraces, we can obtain a more structured and meaningful representation of subprocesses allowing for classic process-flow constructs such as parallel ordering, choices, and loops, besides the precedence relations shown by subtraces.

Pure OAI Repository

An introduction to Graph Data Management

Author: A Dries
A Gutiérrez
A Iosup
A Morari
A Poulovassilis
AD Zhu
AO Mendelzon
B Amann
B Elser
C Berge
C Vicknair
C Watters
C Weiss
CS Chang
D Conte
D Dominguez-Sal
D Theodoratos
DC Faye
DW Shipman
EF Codd
FW Tompa
G Malewicz
GM Kuper
H He
HS Kunii
IF Cruz
J Hidders
J Paredaens
J Peckham
J. Hidders
Jonathan Hayes
K Zeng
L Kowalik
L Zou
M Atre
M Ciglan
M Consens
M Gemis
M Gyssens
M Han
M Levene
M Mainguenaud
M Schmidt
M Yannakakis
MA Bornea
MA Rodriguez
Marc Andries
MP Consens
N Kiesel
N Roussopoulos
O Erling
P Barceló Baeza
P Buneman
P Yuan
Philippe Cudré-Mauroux
PPS Chen
PT Wood
R Agrawal
R Angles
R Brijder
R Ronen
RH Güting
RS Xin
S Abiteboul
T Neumann
W Fan
W Kim
Y Guo
Y Low
Y Papakonstantinou
Y Tian
Y Zhao
YA Liu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 29/12/2017

A graph database is a database where the data structures for the schema and/or instances are modeled as a (labeled) (directed) graph or generalizations of it, and where querying is expressed by graph-oriented operations and type constructors. In this article we present the basic notions of graph databases, give a historical overview of their main development, and study the main current systems that implement them.

arXiv.org e-Print Archive

Crossref

Influence of Wind Turbines on Farmlands’ Value: Exploring the Behaviour of a Rural Community through the Decision Tree

Author: Acciani Claudio
De Boni Annalisa
Ottomano Palmisano Giovanni
Roma Rocco
Publication venue: 'MDPI AG'
Publication date: 01/01/2021

The relationship between wind energy and rural areas leads to the controversial debate on the effects declared by rural communities after wind farms or single turbines are operative. The literature on this topic lacks dedicated studies analysing how the behaviour of rural communities towards wind turbines can affect the market value of farmlands. This research aims to examine the extent to which the easement of wind turbines can influence the market value of farmlands in terms of willingness to pay (WTP) by a small rural community, and to identify the main factors affecting the WTP. Starting from data collected via face-to-face interviews, a decision tree is then applied to investigate the WTP for seven types of farmland in a rural town of Puglia Region (Southern Italy) hosting a wind farm. Results of the interviews show a broad acceptance of the wind farm, while the decision tree classification shows a significant reduction of WTP for all farmlands. The main factors influencing the WTP are the education level, the possibility to increase the income, the concerns for impacts on human health and for maintenance workmen. National and local policy measures have to be put in place to inform rural communities about the ‘magnitude’ of the effects they identified as crucial, so that policy-makers and private bodies will contribute to make the farmland market more equitable.

Directory of Open Access Journals

Archivio istituzionale della ricerca - Università di Bari

Archivio Istituzionale della Ricerca- Università degli Studi di Foggia

Open Access Repository