162 research outputs found
A Metahierarchical Rule Decision System to Design Robust Fuzzy Classifiers Based on Data Complexity
There is a wide variety of studies that propose different classifiers to solve a large amount of problems in distinct classification scenarios. The no free lunch theorem states that if we use a big enough set of varied problems, all classifiers would be equivalent in performance. From another point of view, the performance of the classifiers is dependant of the scope and properties of the datasets. In this sense, new proposals on the topic often focus on a given context, aiming at improving the related state-of-the-art approaches. Data complexity metrics have been traditionally used to determine the inner characteristics of datasets. This way, researchers are able to categorize the problems in different scenarios. Then, this taxonomy can be applied to determine inner characteristics of the datasets in order to determine intervals of good and bad behavior for a given classifier. In this paper, we will take advantage of the data complexity metrics in order to design a fuzzy metaclassifier. The final goal is to create decision rules based on the inner characteristics of the data to apply a different version of the fuzzy classifier for a given problem. To do so, we will make use of the FARC-HD classifier, an evolutionary fuzzy system that has led to different extensions in the specialized literature. Experimental results show the goodness of this novel approach as it is able to outperform all versions of FARC-HD on a wide set of problems, and obtain competitive results (in terms of performance and interpretability) versus two selected state-of-the-art rule-based classification system, C4.5 and FURIA
Data complexity in supervised learning: A far-reaching implication
Aquesta tesi estudia la complexitat de les dades i el seu rol en la definició del comportament de les tècniques d'aprenentatge supervisat, i alhora explora la generació artificial de conjunts de dades mitjançant estimadors de complexitat. El treball s'ha construït sobre quatre principis que s'han succeït de manera natural. (1) La crítica de la metodologia actual utilitzada per la comunitat científica per avaluar el rendiment de nous sistemes d'aprenentatge ha desencadenat (2) l'interès per estimadors alternatius basats en l'anàlisi de la complexitat de les dades i el seu estudi. Ara bé, tant l'estat primerenc de les mesures de complexitat com la disponibilitat limitada de problemes del món real per fer el seu test han inspirat (3) la generació sintètica de problemes, la qual ha esdevingut l'eix central de la tesi, i (4) la proposta de fer servir estàndards artificials amb semblança als problemes reals.
L'objectiu que es persegueix a llarg termini amb aquesta recerca és proporcionar als usuaris (1) unes directrius per escollir el sistema d'aprenentatge idoni per resoldre el seu problema i (2) una col•lecció de problemes per, o bé avaluar el rendiment dels sistemes d'aprenentatge, o bé provar les seves limitacions.Esta tesis profundiza en el estudio de la complejidad de los datos y su papel en la definición del comportamiento de las técnicas de aprendizaje supervisado, a la vez que explora la generación artificial de conjuntos de datos mediante estimadores de complejidad. El trabajo se ha construido sobre cuatro pilares que se han sucedido de manera natural. (1) La crítica de la metodología actual utilizada por la comunidad científica para evaluar el rendimiento de nuevos sistemas de aprendizaje ha desatado (2) el interés por estimadores alternativos basados en el análisis de la complejidad de los datos y su estudio. Sin embargo, tanto el estado primerizo de las medidas de complejidad como la limitada disponibilidad de problemas del mundo real para su testeo han inspirado (3) la generación sintética de problemas, considerada el eje central de la tesis, y (4) la propuesta del uso de estándares artificiales con parecido a los problemas reales.
El objetivo que se persigue a largo plazo con esta investigación es el de proporcionar a los usuarios (1) unas pautas pare escoger el sistema de aprendizaje más idóneo para resolver su problema y (2) una colección de problemas para evaluar el rendimiento de los sistemas de aprendizaje o probar sus limitaciones.This thesis takes a close view of data complexity and its role shaping the behaviour of machine learning techniques in supervised learning and explores the generation of synthetic data sets through complexity estimates. The work has been built upon four principles which have naturally followed one another. (1) A critique about the current methodologies used by the machine learning community to evaluate the performance of new learners unleashes (2) the interest for alternative estimates based on the analysis of data complexity and its study.
However, both the early stage of the complexity measures and the limited availability of real-world problems for testing inspire (3) the generation of synthetic problems, which becomes the backbone of this thesis, and (4) the proposal of artificial benchmarks resembling real-world problems.
The ultimate goal of this research flow is, in the long run, to provide practitioners (1) with some guidelines to choose the most suitable learner given a problem and (2) with a collection of benchmarks to either assess the performance of the learners or test their limitations
Making music through real-time voice timbre analysis: machine learning and timbral control
PhDPeople can achieve rich musical expression through vocal sound { see for example
human beatboxing, which achieves a wide timbral variety through a range of
extended techniques. Yet the vocal modality is under-exploited as a controller
for music systems. If we can analyse a vocal performance suitably in real time,
then this information could be used to create voice-based interfaces with the
potential for intuitive and ful lling levels of expressive control.
Conversely, many modern techniques for music synthesis do not imply any
particular interface. Should a given parameter be controlled via a MIDI keyboard,
or a slider/fader, or a rotary dial? Automatic vocal analysis could provide
a fruitful basis for expressive interfaces to such electronic musical instruments.
The principal questions in applying vocal-based control are how to extract
musically meaningful information from the voice signal in real time, and how
to convert that information suitably into control data. In this thesis we address
these questions, with a focus on timbral control, and in particular we
develop approaches that can be used with a wide variety of musical instruments
by applying machine learning techniques to automatically derive the mappings
between expressive audio input and control output. The vocal audio signal is
construed to include a broad range of expression, in particular encompassing
the extended techniques used in human beatboxing.
The central contribution of this work is the application of supervised and
unsupervised machine learning techniques to automatically map vocal timbre
to synthesiser timbre and controls. Component contributions include a delayed
decision-making strategy for low-latency sound classi cation, a regression-tree
method to learn associations between regions of two unlabelled datasets, a fast
estimator of multidimensional di erential entropy and a qualitative method for
evaluating musical interfaces based on discourse analysis
Women in Artificial intelligence (AI)
This Special Issue, entitled "Women in Artificial Intelligence" includes 17 papers from leading women scientists. The papers cover a broad scope of research areas within Artificial Intelligence, including machine learning, perception, reasoning or planning, among others. The papers have applications to relevant fields, such as human health, finance, or education. It is worth noting that the Issue includes three papers that deal with different aspects of gender bias in Artificial Intelligence. All the papers have a woman as the first author. We can proudly say that these women are from countries worldwide, such as France, Czech Republic, United Kingdom, Australia, Bangladesh, Yemen, Romania, India, Cuba, Bangladesh and Spain. In conclusion, apart from its intrinsic scientific value as a Special Issue, combining interesting research works, this Special Issue intends to increase the invisibility of women in AI, showing where they are, what they do, and how they contribute to developments in Artificial Intelligence from their different places, positions, research branches and application fields. We planned to issue this book on the on Ada Lovelace Day (11/10/2022), a date internationally dedicated to the first computer programmer, a woman who had to fight the gender difficulties of her times, in the XIX century. We also thank the publisher for making this possible, thus allowing for this book to become a part of the international activities dedicated to celebrating the value of women in ICT all over the world. With this book, we want to pay homage to all the women that contributed over the years to the field of AI
Recommended from our members
Considerations in designing a cybernetic simple 'learning' model; and an overview of the problem of modelling learning
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.Learning is viewed as a central feature of living systems and must be manifested in any artifact that claims to exhibit general intelligence. The central aims of the thesis are twofold: (1) - To review and critically assess the empirical and theoretical aspects of learning as have been addressed in a multitude of disciplines, with the aim of extracting fundamental features and elements. (2) - To develop a more systematic approach to the cybernetic modelling of learning than has been achieved hitherto. In pursuit of aim (1) above the following discussions are included: Historical and Philosophical backgrounds; Natural learning, both physiological and psychological aspects; Hierarchies of learning identified in the evolutionary, functional and developmental senses; An extensive section on the general problem of modelling of learning and the formal tools, is included as a link between aims (1) and (2). Following this a systematic and historically oriented study of cybernetic and other related approaches to the problem of modelling of learning is presented. This then leads to the development of a state-of-the-art general purpose experimental cybernetic learning model. The programming and use of this model is also fully described, including an elaborate scheme for the manifestation of simple learning
The impact of domain knowledge-driven variable derivation on classifier performance for corporate data mining
The technological progress in terms of increasing computational power and growing virtual space to collect data offers great potential for businesses to benefit from data mining applications. Data mining can create a competitive advantage for corporations by discovering business relevant information, such as patterns, relationships, and rules. The role of the human user within the data mining process is crucial, which is why the research area of domain knowledge becomes increasingly important. This thesis investigates the impact of domain knowledge-driven variable derivation on classifier performance for corporate data mining. Domain knowledge is defined as methodological, data and business know-how. The thesis investigates the topic from a new perspective by shifting the focus from a one-sided approach, namely a purely analytic or purely theoretical approach towards a target group-oriented (researcher and practitioner) approach which puts the methodological aspect by means of a scientific guideline in the centre of the research. In order to ensure feasibility and practical relevance of the guideline, it is adapted and applied to the requirements of a practical business case. Thus, the thesis examines the topic from both perspectives, a theoretical and practical perspective. Therewith, it overcomes the limitation of a one-sided approach which mostly lacks practical relevance or generalisability of the results. The primary objective of this thesis is to provide a scientific guideline which should enable both practitioners and researchers to move forward the domain knowledge-driven research for variable derivation on a corporate basis. In the theoretical part, a broad overview of the main aspects which are necessary to undertake the research are given, such as the concept of domain knowledge, the data mining task of classification, variable derivation as a subtask of data preparation, and evaluation techniques. This part of the thesis refers to the methodological aspect of domain knowledge. In the practical part, a research design is developed for testing six hypotheses related to domain knowledge-driven variable derivation. The major contribution of the empirical study is concerned with testing the impact of domain knowledge on a real business data set compared to the impact of a standard and randomly derived data set. The business application of the research is a binary classification problem in the domain of an insurance business, which deals with the prediction of damages in legal expenses insurances. Domain knowledge is expressed through deriving the corporate variables by means of the business and data-driven constructive induction strategy. Six variable derivation steps are investigated: normalisation, instance relation, discretisation, categorical encoding, ratio, and multivariate mathematical function. The impact of the domain knowledge is examined by pairwise (with and without derived variables) performance comparisons for five classification techniques (decision trees, naive Bayes, logistic regression, artificial neural networks, k-nearest neighbours). The impact is measured by two classifier performance criteria: sensitivity and area under the ROC-curve (AUC). The McNemar significance test is used to verify the results. Based on the results, two hypotheses are clearly verified and accepted, three hypotheses are partly verified, and one hypothesis had to be rejected on the basis of the case study results. The thesis reveals a significant positive impact of domain knowledge-driven variable derivation on classifier performance for options of all six tested steps. Furthermore, the findings indicate that the classification technique influences the impact of the variable derivation steps, and the bundling of steps has a significant higher performance impact if the variables are derived by using domain knowledge (compared to a non-knowledge application). Finally, the research turns out that an empirical examination of the domain knowledge impact is very complex due to a high level of interaction between the selected research parameters (variable derivation step, classification technique, and performance criteria)
OFSET_mine:an integrated framework for cardiovascular diseases risk prediction based on retinal vascular function
As cardiovascular disease (CVD) represents a spectrum of disorders that often manifestfor the first time through an acute life-threatening event, early identification of seemingly healthy subjects with various degrees of risk is a priority.More recently, traditional scores used for early identification of CVD risk are slowly being replaced by more sensitive biomarkers that assess individual, rather than population risks for CVD. Among these, retinal vascular function, as assessed by the retinal vessel analysis method (RVA), has been proven as an accurate reflection of subclinical CVD in groups of participants without overt disease but with certain inherited or acquired risk factors. Furthermore, in order to correctly detect individual risk at an early stage, specialized machine learning methods and featureselection techniques that can cope with the characteristics of the data need to bedevised.The main contribution of this thesis is an integrated framework, OFSET_mine, that combinesnovel machine learning methods to produce a bespoke solution for Cardiovascular Risk Prediction based on RVA data that is also applicable to other medical datasets with similar characteristics. The three identified essential characteristics are 1) imbalanced dataset,2) high dimensionality and 3) overlapping feature ranges with the possibility of acquiring new samples. The thesis proposes FiltADASYN as an oversampling method that deals with imbalance, DD_Rank as a feature selection method that handles high dimensionality, and GCO_mine as a method for individual-based classification, all three integrated within the OFSET_mine framework.The new oversampling method FiltADASYN extends Adaptive Synthetic Oversampling(ADASYN) with an additional step to filter the generated samples and improve the reliability of the resultant sample set. The feature selection method DD_Rank is based on Restricted Boltzmann Machine (RBM) and ranks features according to their stability and discrimination power. GCO_mine is a lazy learning method based on Graph Cut Optimization (GCO), which considers both the local arrangements and the global structure of the data.OFSET_mine compares favourably to well established composite techniques. Itex hibits high classification performance when applied to a wide range of benchmark medical datasets with variable sample size, dimensionality and imbalance ratios.When applying OFSET _mine on our RVA data, an accuracy of 99.52% is achieved. In addition, using OFSET, the hybrid solution of FiltADASYN and DD_Rank, with Random Forest on our RVA data produces risk group classifications with accuracy 99.68%. This not only reflects the success of the framework but also establishes RVAas a valuable cardiovascular risk predicto
Fuzzy-based machine learning for predicting narcissistic traits among Twitter users.
Doctoral Degree. University of KwaZulu-Natal, Pietermaritzburg.Social media has provided a platform for people to share views and opinions they identify with or
which are significant to them. Similarly, social media enables individuals to express themselves
authentically and divulge their personal experiences in a variety of ways. This behaviour, in turn,
reflects the user’s personality. Social media has in recent times been used to perpetuate various
forms of crimes, and a narcissistic personality trait has been linked to violent criminal
activities. This negative side effect of social media calls for multiple ways to respond and
prevent damage instigated. Eysenck's theory on personality and crime postulated that various forms
of crime are caused by a mixture of environmental and neurological causes. This theory suggests
certain people are more likely to commit a crime, and personality is the principal factor in
criminal behaviour. Twitter is a widely used social media platform for sharing news, opinions,
feelings, and emotions
by users.
Given that narcissists have an inflated self-view and engage in a variety of strategies aimed at
bringing attention to themselves, features unique to Twitter are more appealing to narcissists than
those on sites such as Facebook. This study adopted design science research methodology to develop
a fuzzy-based machine learning predictive model to identify traces of narcissism from Twitter using
data obtained from the activities of a user. Performance evaluation of various classifiers was
conducted and an optimal classifier with 95% accuracy was obtained. The research found that the
size of the dataset and input variables have an influence on classifier accuracy. In addition, the
research developed an updated process model and recommended a research model
for narcissism classification
- …