Search CORE

1,816 research outputs found

Mining complex structured data: Enhanced methods and applications

Author: Bui Dang Bach
Publication venue: Curtin University
Publication date: 01/01/2015
Field of study

Conventional approaches to analysing complex business data typically rely on process models, which are difficult to construct and use. This thesis addresses this issue by converting semi-structured event logs to a simpler flat representation without any loss of information, which then enables direct applications of classical data mining methods. The thesis also proposes an effective and scalable classification method which can identify distinct characteristics of a business process for further improvements

espace@Curtin

Quality and interestingness of association rules derived from data mining of relational and semi-structured data

Author: Mohd Shaharanee Izwan Nizal
Publication venue: Curtin University
Publication date: 01/01/2012
Field of study

Deriving useful and interesting rules from a data mining system are essential and important tasks. Problems such as the discovery of random and coincidental patterns or patterns with no significant values, and the generation of a large volume of rules from a database commonly occur. Works on sustaining the interestingness of rules generated by data mining algorithms are actively and constantly being examined and developed. As the data mining techniques are data-driven, it is beneficial to affirm the rules using a statistical approach. It is important to establish the ways in which the existing statistical measures and constraint parameters can be effectively utilized and the sequence of their usage.In this thesis, a systematic way to evaluate the association rules discovered from frequent, closed and maximal itemset mining algorithms; and frequent subtree mining algorithm including the rules based on induced, embedded and disconnected subtrees is presented. With reference to the frequent subtree mining, in addition a new direction is explored based on utilizing the DSM approach capable of preserving all information from tree-structured database in a flat data format, consequently enabling the direct application of a wider range of data mining analysis/techniques to tree-structured data. Implications of this approach were investigated and it was found that basing rules on disconnected subtrees, can be useful in terms of increasing the accuracy and the coverage rate of the rule set.A strategy that combines data mining and statistical measurement techniques such as sampling, redundancy and contradictive checks, correlation and regression analysis to evaluate the rules is developed. This framework is then applied to real-world datasets that represent diverse characteristics of data/items. Empirical results show that with a proper combination of data mining and statistical analysis, the proposed framework is capable of eliminating a large number of non-significant, redundant and contradictive rules while preserving relatively valuable high accuracy rules. Moreover, the results reveal the important characteristics and differences between mining frequent, closed or maximal itemsets; and mining frequent subtree including the rules based on induced, embedded and disconnected subtrees; as well as the impact of confidence measure for the prediction and classification task

espace@Curtin

Recommended from our members

Understanding Semantic Implicit Learning through distributional linguistic patterns: A computational perspective

Author: Alikaniotis Dimitrios
Publication venue: University of Cambridge
Publication date: 27/06/2019
Field of study

The research presented in this PhD dissertation provides a computational perspective on Semantic Implicit Learning (SIL). It puts forward the idea that SIL does not depend on semantic knowledge as classically conceived but upon semantic-like knowledge gained through distributional analysis of massive linguistic input. Using methods borrowed from the machine learning and artificial intelligence literature, we construct computational models, which can simulate the performance observed during behavioural tasks of semantic implicit learning in a human-like way. We link this methodology to the current literature on implicit learning, arguing that this behaviour is a necessary by-product of efficient language processing. Chapter 1 introduces the computational problem posed by implicit learning in general, and semantic implicit learning, in particular, as well as the computational framework, used to tackle them. Chapter 2 introduces distributional semantics models as a way to learn semantic-like representations from exposure to linguistic input. Chapter 3 reports two studies on large datasets of semantic priming which seek to identify the computational model of semantic knowledge that best fits the data under conditions that resemble SIL tasks. We find that a model which acquires semantic-like knowledge gained through distributional analysis of massive linguistic input provides the best fit to the data. Chapter 4 generalises the results of the previous two studies by looking at the performance of the same models in languages other than English. Chapter 5 applies the results of the two previous Chapters on eight datasets of semantic implicit learning. Crucially, these datasets use various semantic manipulations and speakers of different L1s enabling us to test the predictions of different models of semantics. Chapter 6 examines more closely two assumptions which we have taken for granted throughout this thesis. Firstly, we test whether a simpler model based on phonological information can explain the generalisation patterns observed in the tasks. Secondly, we examine whether our definition of the computational problem in Chapter 5 is reasonable. Chapter 7 summarises and discusses the implications for implicit language learning and computational models of cognition. Furthermore, we offer one more study that seeks to bridge the literature on distributional models of semantics to `deeper' models of semantics by learning semantic relations. There are two main contributions of this dissertation to the general field of implicit learning research. Firstly, we highlight the superiority of distributional models of semantics in modelling unconscious semantic knowledge. Secondly, we question whether `deep' semantic knowledge is needed to achieve above chance performance in SIIL tasks. We show how a simple model that learns through distributional analysis of the patterns found in the linguistic input can match the behavioural results in different languages. Furthermore, we link these models to more general problems faced in psycholinguistics such as language processing and learning of semantic relations.Alexandros Onassis Foundatio

Apollo (Cambridge)

Small nets and short paths optimising neural computation

Author: Frean Marcus Roland
Publication venue: The University of Edinburgh
Publication date: 01/01/1990
Field of study

Edinburgh Research Archive

Integrating Structure and Meaning: Using Holographic Reduced Representations to Improve Automatic Text Classification

Author: Fishbein Jonathan Michael
Publication venue: 'University of Waterloo'
Publication date: 01/01/2008
Field of study

Current representation schemes for automatic text classification treat documents as syntactically unstructured collections of words (Bag-of-Words) or `concepts' (Bag-of-Concepts). Past attempts to encode syntactic structure have treated part-of-speech information as another word-like feature, but have been shown to be less effective than non-structural approaches. We propose a new representation scheme using Holographic Reduced Representations (HRRs) as a technique to encode both semantic and syntactic structure, though in very different ways. This method is unique in the literature in that it encodes the structure across all features of the document vector while preserving text semantics. Our method does not increase the dimensionality of the document vectors, allowing for efficient computation and storage. We present the results of various Support Vector Machine classification experiments that demonstrate the superiority of this method over Bag-of-Concepts representations and improvement over Bag-of-Words in certain classification contexts

University of Waterloo's Institutional Repository

Machine Learning

Author
Publication venue: 'IntechOpen'
Publication date: 20/04/2021
Field of study

Machine Learning can be defined in various ways related to a scientific domain concerned with the design and development of theoretical and implementation tools that allow building systems with some Human Like intelligent behavior. Machine learning addresses more specifically the ability to improve automatically through experience

Directory of Open Access Books (DOAB)

An incremental clustering and associative learning architecture for intelligent robotics

Author: Keysermann Matthias Ulrich
Publication venue: Mathematical and Computer Sciences
Publication date: 01/08/2015
Field of study

The ability to learn from the environment and memorise the acquired knowledge is essential for robots to become autonomous and versatile artificial companions. This thesis proposes a novel learning and memory architecture for robots, which performs associative learning and recall of sensory and actuator patterns. The approach avoids the inclusion of task-specific expert knowledge and can deal with any kind of multi-dimensional real-valued data, apart from being tolerant to noise and supporting incremental learning. The proposed architecture integrates two machine learning methods: a topology learning algorithm that performs incremental clustering, and an associative memory model that learns relationship information based on the co-occurrence of inputs. The evaluations of both the topology learning algorithm and the associative memory model involved the memorisation of high-dimensional visual data as well as the association of symbolic data, presented simultaneously and sequentially. Moreover, the document analyses the results of two experiments in which the entire architecture was evaluated regarding its associative and incremental learning capabilities. One experiment comprised an incremental learning task with visual patterns and text labels, which was performed both in a simulated scenario and with a real robot. In a second experiment a robot learned to recognise visual patterns in the form of road signs and associated them with di erent con gurations of its arm joints. The thesis also discusses several learning-related aspects of the architecture and highlights strengths and weaknesses of the proposed approach. The developed architecture and corresponding ndings contribute to the domains of machine learning and intelligent robotics

ROS: The Research Output Service. Heriot-Watt University Edinburgh

The role of phonology in visual word recognition: evidence from Chinese

Author: Ip JKM
Lau DKY
Leung MT
Weekes BS
Publication venue: 'United States Sports Academy'
Publication date: 01/01/2010
Field of study

Posters - Letter/Word Processing V: abstract no. 5024The hypothesis of bidirectional coupling of orthography and phonology predicts that phonology plays a role in visual word recognition, as observed in the effects of feedforward and feedback spelling to sound consistency on lexical decision. However, because orthography and phonology are closely related in alphabetic languages (homophones in alphabetic languages are usually orthographically similar), it is difficult to exclude an influence of orthography on phonological effects in visual word recognition. Chinese languages contain many written homophones that are orthographically dissimilar, allowing a test of the claim that phonological effects can be independent of orthographic similarity. We report a study of visual word recognition in Chinese based on a mega-analysis of lexical decision performance with 500 characters. The results from multiple regression analyses, after controlling for orthographic frequency, stroke number, and radical frequency, showed main effects of feedforward and feedback consistency, as well as interactions between these variables and phonological frequency and number of homophones. Implications of these results for resonance models of visual word recognition are discussed.postprin

HKU Scholars Hub

Acquiring symbolic design optimization problem reformulation knowledge: On computable relationships between design syntax and semantics

Author: Sarkar Somwrita
Publication venue: Faculty of Architecture, Design and Planning
Publication date: 01/01/2009
Field of study

This thesis presents a computational method for the inductive inference of explicit and implicit semantic design knowledge from the symbolic-mathematical syntax of design formulations using an unsupervised pattern recognition and extraction approach. Existing research shows that AI / machine learning based design computation approaches either require high levels of knowledge engineering or large training databases to acquire problem reformulation knowledge. The method presented in this thesis addresses these methodological limitations. The thesis develops, tests, and evaluates ways in which the method may be employed for design problem reformulation. The method is based on the linear algebra based factorization method Singular Value Decomposition (SVD), dimensionality reduction and similarity measurement through unsupervised clustering. The method calculates linear approximations of the associative patterns of symbol cooccurrences in a design problem representation to infer induced coupling strengths between variables, constraints and system components. Unsupervised clustering of these approximations is used to identify useful reformulations. These two components of the method automate a range of reformulation tasks that have traditionally required different solution algorithms. Example reformulation tasks that it performs include selection of linked design variables, parameters and constraints, design decomposition, modularity and integrative systems analysis, heuristically aiding design “case” identification, topology modeling and layout planning. The relationship between the syntax of design representation and the encoded semantic meaning is an open design theory research question. Based on the results of the method, the thesis presents a set of theoretical postulates on computable relationships between design syntax and semantics. The postulates relate the performance of the method with empirical findings and theoretical insights provided by cognitive neuroscience and cognitive science on how the human mind engages in symbol processing and the resulting capacities inherent in symbolic representational systems to encode “meaning”. The performance of the method suggests that semantic “meaning” is a higher order, global phenomenon that lies distributed in the design representation in explicit and implicit ways. A one-to-one local mapping between a design symbol and its meaning, a largely prevalent approach adopted by many AI and learning algorithms, may not be sufficient to capture and represent this meaning. By changing the theoretical standpoint on how a “symbol” is defined in design representations, it was possible to use a simple set of mathematical ideas to perform unsupervised inductive inference of knowledge in a knowledge-lean and training-lean manner, for a knowledge domain that traditionally relies on “giving” the system complex design domain and task knowledge for performing the same set of tasks

Sydney eScholarship

Interactive effects of orthography and semantics in Chinese picture naming

Author: Law SP
Su IF
Weekes BS
Publication venue: 'United States Sports Academy'
Publication date: 01/01/2010
Field of study

Posters - Language Production/Writing: abstract no. 4035Picture-naming performance in English and Dutch is enhanced by presentation of a word that is similar in form to the picture name. However, it is unclear whether facilitation has an orthographic or a phonological locus. We investigated the loci of the facilitation effect in Cantonese Chinese speakers by manipulating—at three SOAs (2100, 0, and 1100 msec)—semantic, orthographic, and phonological similarity. We identified an effect of orthographic facilitation that was independent of and larger than phonological facilitation across all SOAs. Semantic interference was also found at SOAs of 2100 and 0 msec. Critically, an interaction of semantics and orthography was observed at an SOA of 1100 msec. This interaction suggests that independent effects of orthographic facilitation on picture naming are located either at the level of semantic processing or at the lemma level and are not due to the activation of picture name segments at the level of phonological retrieval.postprin

HKU Scholars Hub