10 research outputs found

    PRZEGLĄD METOD SELEKCJI CECH UŻYWANYCH W DIAGNOSTYCE CZERNIAKA (A Review of Feature Selection Methods Used in Melanoma Diagnosis)

    Currently, a large number of feature selection methods are in use, and they attract growing interest among researchers; some methods are, of course, used more frequently than others. The article describes the basics of selection-based algorithms. Feature selection (FS) methods fall into three categories: filter methods, wrapper methods, and embedded methods. Particular attention was paid to finding examples of applications of the described methods in the diagnosis of skin melanoma.
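    To make the three categories concrete, here is a minimal scikit-learn sketch on synthetic data (the dataset, feature counts, and estimator choices are placeholders, not taken from the article):

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.feature_selection import SelectKBest, mutual_info_classif, RFE, SelectFromModel
        from sklearn.linear_model import LogisticRegression

        # Synthetic stand-in for a dermoscopy feature matrix (e.g. shape/colour descriptors).
        X, y = make_classification(n_samples=200, n_features=30, n_informative=8, random_state=0)

        # Filter: rank features by a model-agnostic score (here mutual information).
        filt = SelectKBest(score_func=mutual_info_classif, k=8).fit(X, y)

        # Wrapper: search feature subsets by repeatedly refitting a classifier (RFE).
        wrap = RFE(LogisticRegression(max_iter=1000), n_features_to_select=8).fit(X, y)

        # Embedded: selection happens inside training, via L1-regularised weights.
        emb = SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear")).fit(X, y)

        for name, sel in [("filter", filt), ("wrapper", wrap), ("embedded", emb)]:
            print(name, np.flatnonzero(sel.get_support()))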

    Semantic variation operators for multidimensional genetic programming

    Multidimensional genetic programming represents candidate solutions as sets of programs, and thereby provides an interesting framework for exploiting building block identification. Towards this goal, we investigate the use of machine learning as a way to bias which components of programs are promoted, and propose two semantic operators to choose where useful building blocks are placed during crossover. A forward stagewise crossover operator we propose leads to significant improvements on a set of regression problems, and produces state-of-the-art results in a large benchmark study. We discuss this architecture and others in terms of their propensity for allowing heuristic search to utilize information during the evolutionary process. Finally, we look at the collinearity and complexity of the data representations that result from these architectures, with a view towards disentangling factors of variation in application. Comment: 9 pages, 8 figures, GECCO 2019
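    The forward stagewise idea can be pictured as a greedy pass over program outputs ("semantics") that promotes the programs most correlated with the residual error. The toy sketch below illustrates that idea only; it is not the paper's operator, and all names and constants are invented:

        import numpy as np

        def forward_stagewise_pick(semantics, y, k, eps=0.1, iters=1000):
            """Toy forward-stagewise pass over program outputs.

            semantics: (n_samples, n_programs) matrix, one column per candidate program.
            Returns indices of the k programs whose coefficients grew largest, i.e. the
            building blocks a semantic crossover operator could promote into the child.
            """
            Z = (semantics - semantics.mean(0)) / (semantics.std(0) + 1e-12)
            resid, coef = y - y.mean(), np.zeros(Z.shape[1])
            for _ in range(iters):
                corr = Z.T @ resid                   # correlation of each program with residual
                j = np.argmax(np.abs(corr))
                step = eps * np.sign(corr[j])
                coef[j] += step
                resid -= step * Z[:, j]              # update residual after the small step
            return np.argsort(-np.abs(coef))[:k]

        rng = np.random.default_rng(0)
        sem = rng.normal(size=(100, 12))             # outputs of 12 programs from two parents
        y = 2 * sem[:, 3] - sem[:, 7] + rng.normal(scale=0.1, size=100)
        print(forward_stagewise_pick(sem, y, k=2))   # should favour programs 3 and 7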

    Consistent Feature Construction with Constrained Genetic Programming for Experimental Physics

    A good feature representation is a determinant factor to achieve high performance for many machine learning algorithms in terms of classification. This is especially true for techniques that do not build complex internal representations of data (e.g. decision trees, in contrast to deep neural networks). To transform the feature space, feature construction techniques build new high-level features from the original ones. Among these techniques, Genetic Programming is a good candidate to provide interpretable features required for data analysis in high energy physics. Classically, original features or higher-level features based on physics first principles are used as inputs for training. However, physicists would benefit from an automatic and interpretable feature construction for the classification of particle collision events. Our main contribution consists in combining different aspects of Genetic Programming and applying them to feature construction for experimental physics. In particular, to be applicable to physics, dimensional consistency is enforced using grammars. Results of experiments on three physics datasets show that the constructed features can bring a significant gain to the classification accuracy. To the best of our knowledge, it is the first time a method is proposed for interpretable feature construction with units of measurement, and that experts in high-energy physics validate the overall approach as well as the interpretability of the built features. Comment: Accepted in this version to CEC 2019
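    Dimensional consistency of constructed features can be pictured as bookkeeping on unit exponents: a grammar-constrained search only derives expressions that a checker like the following accepts. This is an illustrative sketch, not the authors' grammar; the dimension vectors are placeholders:

        import numpy as np

        # Units as exponent vectors over SI base dimensions (M, L, T); e.g. energy ~ M L^2 T^-2.
        ENERGY, MASS, VELOCITY = np.array([1, 2, -2]), np.array([1, 0, 0]), np.array([0, 1, -1])

        def mul(u, v): return u + v                  # multiplication adds exponents
        def div(u, v): return u - v                  # division subtracts them
        def add(u, v):                               # addition requires identical units
            if not np.array_equal(u, v):
                raise ValueError("dimensionally inconsistent sum")
            return u

        # m * v^2 has the dimensions of energy, so 'E + m*v**2' is a legal feature...
        assert np.array_equal(add(ENERGY, mul(MASS, mul(VELOCITY, VELOCITY))), ENERGY)
        # ...while 'E + v' would be rejected during derivation:
        try:
            add(ENERGY, VELOCITY)
        except ValueError as e:
            print("rejected:", e)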

    Interpretable Dimensionally-Consistent Feature Extraction from Electrical Network Sensors

    Electrical power networks are heavily monitored systems, requiring operators to perform intricate information synthesis before understanding the underlying network state. Our study aims at helping this synthesis step by automatically creating features from the sensor data. We propose a supervised feature extraction approach using grammar-guided evolution, which outputs interpretable and dimensionally consistent features. Restrictions on operations over physical dimensions are introduced into the learning process through context-free grammars. They ensure coherence with physical laws and dimensional consistency, and also introduce technical expertise into the created features. We compare our approach to other state-of-the-art feature extraction methods on a real dataset taken from the French electrical network sensors.
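    A hypothetical miniature of such a context-free grammar, with a random derivation routine standing in for the evolutionary search (the production rules and signal names are invented for illustration):

        import random

        # A toy grammar restricting which sensor operations may combine.
        GRAMMAR = {
            "<expr>": [["<expr>", "+", "<expr>"], ["mean(", "<signal>", ")"],
                       ["max(", "<signal>", ")", "-", "min(", "<signal>", ")"]],
            "<signal>": [["voltage"], ["current"], ["diff(", "<signal>", ")"]],
        }

        def derive(symbol, rng, depth=0):
            """Expand a nonterminal into a feature expression by random rule choice."""
            if symbol not in GRAMMAR:
                return symbol                        # terminal: emit as-is
            rules = GRAMMAR[symbol]
            if depth > 4:                            # force the least-recursive rule to terminate
                rule = min(rules, key=lambda r: sum(s in GRAMMAR for s in r))
            else:
                rule = rng.choice(rules)
            return "".join(derive(s, rng, depth + 1) for s in rule)

        rng = random.Random(1)
        for _ in range(3):
            print(derive("<expr>", rng))             # e.g. mean(voltage)+max(diff(current))-min(current)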

    Feature clustering for PSO-based feature construction on high-dimensional data

    Feature construction (FC) refers to a process that uses the original features to construct new features with better discrimination ability. Particle Swarm Optimisation (PSO) is an effective search technique that has been successfully utilised in FC. However, applying PSO to feature construction on high-dimensional data has been a challenge due to the large search space and high computational cost. Moreover, unnecessary features that are irrelevant, redundant, or noisy are constructed when PSO is applied to the whole feature set. Therefore, the main purpose of this paper is to select the most informative features and construct new features from the selected features for better classification performance. Feature clustering methods aggregate similar features into clusters, and the dimensionality of the data is lowered by choosing representative features from every cluster to form the final feature subset. Feature clustering has proven accurate in feature selection (FS); however, only one study has investigated its application in FC for classification, and it identified limitations such as supporting only binary classification and decreasing accuracy on some data. This paper proposes a cluster-based PSO feature construction approach called ClusPSOFC. The Redundancy-Based Feature Clustering (RFC) algorithm is applied to choose the most informative features from the original data, while PSO is used to construct new features from those selected by RFC. Experimental results on six UCI datasets and six high-dimensional datasets demonstrate the efficiency of the proposed method when compared to the original full feature set, other PSO-based FC methods, and standard genetic programming based feature construction (GPFC). Hence, the ClusPSOFC method is effective for feature construction in the classification of high-dimensional data.
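    A minimal sketch of the cluster-then-represent step, using plain hierarchical clustering on feature correlations as a stand-in for RFC (the threshold, data, and relevance criterion are illustrative assumptions):

        import numpy as np
        from scipy.cluster.hierarchy import fcluster, linkage
        from scipy.spatial.distance import squareform

        rng = np.random.default_rng(0)
        X = rng.normal(size=(50, 40))
        X[:, 20:] = X[:, :20] + 0.1 * rng.normal(size=(50, 20))   # 20 redundant copies
        y = (X[:, 0] + X[:, 5] > 0).astype(int)

        # 1) Cluster features by absolute correlation (stand-in for the RFC step).
        dist = 1 - np.abs(np.corrcoef(X.T))
        np.fill_diagonal(dist, 0.0)
        labels = fcluster(linkage(squareform(dist, checks=False), method="average"),
                          t=0.3, criterion="distance")

        # 2) Keep one representative per cluster: the feature most correlated with y.
        rel = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
        reps = [max(np.flatnonzero(labels == c), key=lambda j: rel[j])
                for c in np.unique(labels)]
        print(len(reps), "representatives from", X.shape[1], "features:", sorted(reps))
        # A PSO run would then construct new features from X[:, reps] only.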

    Comprehensible and Robust Knowledge Discovery from Small Datasets

    Knowledge Discovery in Databases (KDD) aims to extract useful knowledge from data. Data may represent a set of measurements from a real-world process, or a set of input-output values of a simulation model. Two frequently conflicting requirements on the acquired knowledge are that it (1) summarizes the data as accurately as possible and (2) comes in a well-understandable form. Decision trees and subgroup discovery methods deliver knowledge summaries in the form of hyperrectangles, which are considered easy to understand. To demonstrate the importance of a comprehensible data summary, we study Decentral Smart Grid Control ("Dezentrale intelligente Netzsteuerung"), a new system that implements demand response in power grids without substantial changes to the infrastructure. The conventional analysis of this system carried out so far was limited to identical participants and therefore did not reflect reality sufficiently well. We run many simulations with different input values and apply decision trees to the resulting data. The resulting comprehensible data summaries yielded new insights into the behavior of Decentral Smart Grid Control. Decision trees make it possible to describe the system behavior for all input combinations. Sometimes, however, one is not interested in partitioning the entire input space, but in finding regions that lead to particular outputs (so-called subgroups). Existing subgroup discovery algorithms usually require large amounts of data to produce stable and accurate output, yet the data collection process is often costly. Our main contribution is improving subgroup discovery from datasets with few observations. Subgroup discovery in simulated data is called scenario discovery. A frequently used algorithm for scenario discovery is PRIM (Patient Rule Induction Method). We propose REDS (Rule Extraction for Discovering Scenarios), a new procedure for scenario discovery. In REDS, we first train an intermediate statistical model and use it to generate a large amount of new data for PRIM. We also describe the underlying statistical intuition. Experiments show that REDS works much better than PRIM on its own: it reduces the number of required simulation runs by 75% on average. With simulated data one has perfect knowledge of the input distribution, which is a prerequisite of REDS. To make REDS applicable to real measurement data, we combined it with sampling from an estimated multivariate distribution of the data. We evaluated the resulting method experimentally in combination with different data-generation methods, for PRIM and for BestInterval, another representative subgroup discovery method. In most cases, our methodology increased the quality of the discovered subgroups.
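    A compressed sketch of the REDS pipeline described above, under stated assumptions: the "simulation" is a cheap placeholder, the input distribution is uniform, and the peeling routine is a greatly simplified stand-in for PRIM (real PRIM also pastes and lets the analyst pick a box from the peeling trajectory):

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        def expensive_simulation(X):                 # placeholder for the costly model
            return (X[:, 0] > 0.7) & (X[:, 1] < 0.3)

        rng = np.random.default_rng(0)
        X_small = rng.uniform(size=(80, 3))          # few affordable simulation runs
        y_small = expensive_simulation(X_small)

        # REDS idea: fit an intermediate model, then label cheap synthetic inputs
        # drawn from the (known) input distribution and hand those to PRIM.
        model = RandomForestRegressor(random_state=0).fit(X_small, y_small)
        X_big = rng.uniform(size=(20000, 3))
        y_big = model.predict(X_big)

        def peel(X, y, alpha=0.05, min_support=0.05):
            """Greatly simplified PRIM-style peeling: shrink a box to raise mean(y)."""
            box = np.column_stack([X.min(0), X.max(0)])
            inside = np.ones(len(X), bool)
            while inside.mean() > min_support:
                best = None
                for d in range(X.shape[1]):
                    for lo in (True, False):
                        q = np.quantile(X[inside, d], alpha if lo else 1 - alpha)
                        trial = inside & (X[:, d] >= q if lo else X[:, d] <= q)
                        if trial.any() and (best is None or y[trial].mean() > best[0]):
                            best = (y[trial].mean(), d, lo, q, trial)
                if best is None or best[0] <= y[inside].mean():
                    break                            # no peel improves the box any more
                _, d, lo, q, inside = best
                box[d, 0 if lo else 1] = q
            return box

        print(peel(X_big, y_big))                    # box should approach x0>0.7, x1<0.3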

    Integrating Machine Learning Paradigms for Predictive Maintenance in the Fourth Industrial Revolution era

    In the last decade, manufacturing companies have faced two significant challenges. First, digitalization requires adopting Industry 4.0 technologies and enables the creation of smart, connected, self-aware, and self-predictive factories. Second, the focus on sustainability requires evaluating and reducing the impact of the implemented solutions from economic and social points of view. In manufacturing companies, the maintenance of physical assets plays a critical role. Increasing the reliability and availability of production systems minimizes system downtime; in addition, proper system functioning avoids production waste and potentially catastrophic accidents. Digitalization and new ICT technologies have taken on a relevant role in maintenance strategies. They make it possible to assess the health condition of machinery at any point in time and to predict its future behavior, so that maintenance interventions can be planned and the useful life of components exploited until just before their failure. This dissertation provides insights into the goals and tools of Predictive Maintenance in Industry 4.0 and proposes a novel data acquisition, processing, sharing, and storage framework that addresses typical issues machine producers and users encounter. The research elaborates on two research questions that narrow down the potential approaches to data acquisition, processing, and analysis for fault diagnostics in evolving environments. The research activity is developed according to a research framework in which the research questions are addressed by research levers that are explored according to research topics. Each topic requires a specific set of methods and approaches; however, the overarching methodological approach presented in this dissertation includes three fundamental aspects: maximizing the quality of the input data, using Machine Learning methods for data analysis, and using case studies from both controlled environments (laboratory) and real-world instances.

    A corpus-based study of academic-collocation use and patterns in postgraduate Computer Science students’ writing

    Collocation has been considered a problematic area for L2 learners. Various studies have been conducted to investigate native speakers' (NS) and non-native speakers' (NNS) use of different types of collocations (e.g., Durrant and Schmitt, 2009; Laufer and Waldman, 2011). These studies have indicated that, unlike NS, NNS rely on a limited set of collocations and tend to overuse them. This raises the question: if NNS tend to overuse a limited set of collocations in their academic writing, would their use of academic collocations in a specific discipline (Computer Science in this study) vary from that of NS and expert writers? This study has three main aims. First, it investigates the use of lexical academic collocations in NNS and NS Computer Science students' MSc dissertations and compares their uses with those by expert writers in published research articles. Second, it explores the factors behind the over/underuse of the 24 shared lexical collocations among corpora. Third, it develops awareness-raising activities that could be used to help non-expert NNS students with collocation over/underuse problems. For this purpose, a corpus of 600,000 words was compiled from 55 dissertations (26 written by NS and 29 by NNS). For comparison purposes, a reference corpus of 600,269 words was compiled from 63 research articles from prestigious high impact factor Computer Science academic journals. The Academic Word List (AWL) (Coxhead, 2000) was used to develop lists of the most frequent academic words in the student corpora, whose collocations were examined. Quantitative analysis was then carried out by comparing the 100 most frequent noun and verb collocations from each of the student corpora with the reference corpus. The results reveal that both NNS (52%) and NS (78%) students overuse noun collocations compared to the expert writers in the reference corpus. They underuse only a small number of noun collocations (8%). Surprisingly, neither NNS nor NS students significantly over/underused verb collocations compared to the reference corpus. In order to achieve the second aim, a mixed-methods approach was adopted. First, the variant patterns of the 24 shared noun collocations between the NNS and NS corpora were identified to determine whether over/underuse of these collocations could be explained by differences in the number of patterns used. Approximately half of the 24 collocations used more patterns, including Noun + preposition + Noun and Noun + adjective + Noun, which were rarely found in the writing of experts. Second, a categorisation judgement task and semi-structured interviews were carried out with three Computer Scientists to elicit their views on the various factors likely influencing noun collocation choices by the writers across the corpora. Results demonstrate that three main factors could explain the variation: sub-discipline, topic, and genre. To achieve the third, pedagogical aim, a sample of awareness-raising activities was designed for the problematic over/underuse of some noun collocations. Using the corpus-based Data Driven Learning (DDL) approach (Johns, 1991), three types of awareness-raising activities were developed: noticing collocation, noticing and identifying different patterns of the same collocation, and comparing and contrasting patterns between the NNS students' corpora and the reference corpus.
    Results of this study suggest that academic collocation use in an ESP context (Computer Science) is related to factors other than students' lack of knowledge of collocations. Expertness, genre variation, topic, and discipline-specific collocations proved to be important factors to consider in ESP. Thus, ESP teachers have to alert their students to the effect of these factors on academic collocation use in subject-specific disciplines. This has tangible implications for Applied Linguistics and for teaching practices.
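    As a rough illustration of the frequency-comparison step, the sketch below counts windowed co-occurrences of node words in two toy corpora; it omits POS tagging, lemmatisation, and significance testing, and all words and texts are invented:

        from collections import Counter
        import re

        def collocations(text, node_words, window=2):
            """Count words co-occurring within +/-window of each node (AWL) word."""
            tokens = re.findall(r"[a-z]+", text.lower())
            counts = Counter()
            for i, tok in enumerate(tokens):
                if tok in node_words:
                    for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                        if j != i:
                            counts[(tok, tokens[j])] += 1
            return counts

        student = "the proposed method achieves high accuracy and the method analysis shows gains"
        expert = "our analysis of the method indicates that the approach achieves accuracy"
        node_words = {"method", "analysis"}       # e.g. frequent AWL items

        s, e = collocations(student, node_words), collocations(expert, node_words)
        for colloc in sorted(set(s) | set(e)):
            print(colloc, "student:", s[colloc], "expert:", e[colloc])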