Conjoint data mining of structured and semi-structured data

Author: Dillon Tharam S.
Hadzic Fedja
Pan Qi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008

As knowledge management requirements grow, enterprises are becoming increasingly aware of the significance of interlinking business information across structured and semi-structured data sources. This problem has become more important with the growing amount of semi-structured data found in XML repositories, web logs, biological databases, etc. Effectively creating links between semi-structured and structured data is a challenging and unresolved problem. Once an optimized method has been formulated, the process of data mining can be implemented in a conjoint manner. This paper investigates a way in which this challenging problem can be tackled. The proposed method is experimentally evaluated using a real-world database, and its effectiveness and potential in discovering collective information are demonstrated.

espace@Curtin

Structured low complexity data mining

Author: Jo Jason
Publication date: 02/10/2015

Due to the rapidly increasing dimensionality of modern datasets, many classical approximation algorithms have run into severe computational bottlenecks. This has often been referred to as the “curse of dimensionality.” To combat this, low complexity priors have been used, as they enable us to design efficient approximation algorithms which are capable of scaling up to these modern datasets. Typically the reduction in computational complexity comes at the expense of accuracy. However, the tradeoffs have been relatively advantageous to the computational scientist. This is typically referred to as the “blessings of dimensionality.” Solving large underdetermined systems of linear equations has benefited greatly from the sparsity low complexity prior. A priori, solving a large underdetermined system of linear equations is severely ill-posed. However, using a relatively generic class of sampling matrices, assuming a sparsity prior can yield a well-posed linear system of equations. In particular, various greedy iterative approximation algorithms have been developed which can recover and accurately approximate the k most significant atoms in our signal. For many engineering applications, the distribution of the top k atoms is not arbitrary and itself has some further structure. In the first half of the thesis we will be concerned with incorporating some a priori designed weights to allow for structured sparse approximation. We provide performance guarantees and numerically demonstrate how the appropriate use of weights can yield a simultaneous reduction in sample complexity and an improvement in approximation accuracy. In the second half of the thesis we will consider the collaborative filtering problem, specifically the task of matrix completion. The matrix completion problem is likewise severely ill-posed, but with a low rank prior it admits, with high probability, a unique and robust solution via a cadre of convex optimization solvers.
The drawback here is that the solvers enjoy strong theoretical guarantees only in the uniform sampling regime. Building upon recent work on non-uniform matrix completion, we propose a completely expert-free empirical procedure to design optimization parameters in the form of positive weights which allow for the recovery of arbitrarily sampled low rank matrices. We provide theoretical guarantees for these empirically learned weights and present numerical simulations which again show that encoding prior knowledge in the form of weights for optimization problems can yield a simultaneous reduction in sample complexity and an improvement in approximation accuracy.
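
The greedy recovery of the k most significant atoms mentioned in the abstract can be illustrated with plain matching pursuit. This is only a minimal sketch, not the thesis's algorithm: the function name `matching_pursuit`, the unit-norm dictionary `atoms`, and the optional `weights` argument (which simply rescales the selection score to favour atoms believed a priori to be active) are all assumptions made for illustration.

```python
def matching_pursuit(atoms, signal, k, weights=None):
    """Greedy k-step approximation of `signal` over a dictionary of atoms.

    atoms   : list of equal-length vectors, each assumed to have unit norm.
    weights : optional positive prior weights, one per atom (hypothetical;
              they bias which atom is picked, not its coefficient).
    Returns (coeffs, residual), where coeffs maps atom index -> coefficient.
    """
    if weights is None:
        weights = [1.0] * len(atoms)
    residual = list(signal)
    coeffs = {}
    for _ in range(k):
        # Correlate every atom with the current residual.
        scores = [sum(a * r for a, r in zip(atom, residual)) for atom in atoms]
        # Pick the atom with the largest weighted correlation.
        best = max(range(len(atoms)), key=lambda j: weights[j] * abs(scores[j]))
        if abs(scores[best]) < 1e-12:
            break  # nothing left to explain
        coeffs[best] = coeffs.get(best, 0.0) + scores[best]
        # Remove the selected atom's contribution from the residual.
        residual = [r - scores[best] * a for r, a in zip(residual, atoms[best])]
    return coeffs, residual


# Usage: a 2-sparse signal over the standard basis of R^4.
basis = [[1.0 if i == j else 0.0 for i in range(4)] for j in range(4)]
coeffs, residual = matching_pursuit(basis, [0, 3, 0, -2], k=2)
```

With an orthonormal dictionary, as in the usage lines, two steps recover both nonzero entries exactly. For general dictionaries plain matching pursuit may revisit atoms, which is why orthogonalized variants are often preferred in practice.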

Texas ScholarWorks

Ontology of core data mining entities

Author: A Bernstein
A Golbraikh
A Karalic
B Smith
C Silla
C Vens
D Demšar
D Kocev
D Qi
D Young
DJ Hand
F Serban
G Madjarov
G Tsoumakas
GH Bakir
H Mannila
HP Kriegel
I Slavkov
J Vanschoren
K Button
Larisa Soldatova
LN Soldatova
M Courtot
M Ford
M Žáková
MA Avery
MF López
O Spjuth
P Robinson
Panče Panov
Q Yang
R Caruana
R Guha
RD King
RR Brinkman
Sašo Džeroski
T Dietterich
V Podpečan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 05/07/2014

In this article, we present OntoDM-core, an ontology of core data mining entities. OntoDM-core defines the most essential data mining entities in a three-layered ontological structure comprising a specification, an implementation and an application layer. It provides a representational framework for the description of mining structured data, and in addition provides taxonomies of datasets, data mining tasks, generalizations, data mining algorithms and constraints, based on the type of data. OntoDM-core is designed to support a wide range of applications/use cases, such as semantic annotation of data mining algorithms, datasets and results; annotation of QSAR studies in the context of drug discovery investigations; and disambiguation of terms in text mining. The ontology has been thoroughly assessed following the practices in ontology engineering, is fully interoperable with many domain resources and is easy to extend.

Crossref

Brunel University Research Archive

Mining Projects from Structured and Unstructured Data

Author: Bala Saimir
Publication venue: Jens Gulden, Selmin Nurcan, Iris Reinhartz-Berger, Widet Guédria, Palash Bera, Sérgio Guerreiro, Michael Fellman, Matthias Weidlich
Publication date: 01/01/2017

Companies working on safety-critical projects must adhere to strict rules imposed by the domain, especially when human safety is involved. These projects need to be compliant with standard norms and regulations. Thus, all the process steps must be clearly documented so that an auditor can verify compliance at a later stage. Nevertheless, documentation often comes in the form of manually written textual documents in different formats. Moreover, the project members use diverse proprietary tools. This makes it difficult for auditors to understand how the actual project was conducted. My research addresses the project mining problem by exploiting logs from project-generated artifacts, which come from software repositories used by the project team.

Elektronische Publikationen der Wirtschaftsuniversität Wien

Efficient Mining of Heterogeneous Star-Structured Data

Author: Rege Manjeet
Yu Qi
Publication venue: RIT Scholar Works
Publication date: 01/01/2008

Many of the real-world clustering problems arising in data mining applications are heterogeneous in nature. Heterogeneous co-clustering involves simultaneous clustering of objects of two or more data types. While pairwise co-clustering of two data types has been well studied in the literature, research on high-order heterogeneous co-clustering is still limited. In this paper, we propose a graph theoretical framework for addressing star-structured co-clustering problems in which a central data type is connected to all the other data types. Partitioning this graph leads to co-clustering of all the data types under the constraints of the star structure. Although a graph partitioning approach has been adopted before to address star-structured heterogeneous complex problems, the main contribution of this work lies in an efficient algorithm that we propose for partitioning the star-structured graph. Computationally, our algorithm is very quick, as it requires a simple solution to a sparse system of overdetermined linear equations. Theoretical analysis and extensive experiments performed on toy and real datasets demonstrate the quality, efficiency and stability of the proposed algorithm.

CiteSeerX

RIT Scholar Works