
    Bounds on Depth of Decision Trees Derived from Decision Rule Systems

    Systems of decision rules and decision trees are widely used for knowledge representation, as classifiers, and as algorithms. They are among the most interpretable models for classification and knowledge representation. Studying the relationships between these two models is an important task in computer science. It is easy to transform a decision tree into a decision rule system; the inverse transformation is considerably harder. In this paper, we study unimprovable upper and lower bounds on the minimum depth of decision trees derived from decision rule systems, depending on various parameters of these systems.
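
    Since the abstract notes that the tree-to-rules direction is easy, a minimal sketch of that direction may help; the nested-dict tree encoding and all names below are illustrative assumptions, not the paper's formalism. Each root-to-leaf path becomes one rule, which also makes visible why every rule's length is bounded by the tree's depth.

```python
# A minimal sketch (assumed encoding): a decision tree as nested dicts,
# transformed into a decision rule system by enumerating root-to-leaf paths.

def tree_to_rules(node, conditions=()):
    """Yield one rule (conditions, decision) per root-to-leaf path."""
    if "decision" in node:          # leaf: emit the accumulated path as a rule
        yield list(conditions), node["decision"]
        return
    for value, child in node["children"].items():
        # extend the path with the test "attribute = value"
        yield from tree_to_rules(child, conditions + ((node["attribute"], value),))

# Example: a depth-2 tree over binary attributes x1, x2.
tree = {
    "attribute": "x1",
    "children": {
        0: {"decision": "class_A"},
        1: {"attribute": "x2",
            "children": {0: {"decision": "class_B"},
                         1: {"decision": "class_A"}}},
    },
}

for conds, decision in tree_to_rules(tree):
    body = " and ".join(f"{a}={v}" for a, v in conds) or "true"
    print(f"if {body} then {decision}")
```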

    Deduction in many-valued logics: a survey


    GASVeM: A New Machine Learning Methodology for Multi-SNP Analysis of GWAS Data Based on Genetic Algorithms and Support Vector Machines

    Genome-wide association studies (GWAS) are observational studies of a large set of genetic variants in a sample of individuals, carried out to determine whether any of these variants are linked to a particular trait. Over the last two decades, GWAS have contributed to several new discoveries in genetics. This research presents a novel methodology for GWAS based mainly on two machine learning techniques, genetic algorithms and support vector machines. The database employed in the study contained information on 370,750 single-nucleotide polymorphisms belonging to 1076 colorectal cancer cases and 973 controls. Ten pathways with different degrees of relationship to the trait under study were tested. The results show that the proposed methodology is able to detect pathways relevant to a given trait, in this case colorectal cancer.
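
    As a rough illustration of the kind of pipeline the abstract describes (not the GASVeM implementation), the sketch below couples a hand-rolled genetic algorithm over binary SNP masks with a cross-validated SVM as the fitness function. The data is synthetic and every parameter (population size, mutation rate, kernel) is an invented placeholder.

```python
# Sketch under assumptions: GA searches binary masks over SNP columns,
# an SVM scores each mask by 3-fold cross-validated accuracy.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples, n_snps = 200, 50               # toy stand-ins for 2049 x 370,750
X = rng.integers(0, 3, size=(n_samples, n_snps)).astype(float)  # 0/1/2 genotypes
y = (X[:, 3] + X[:, 17] > 2).astype(int)  # trait driven by two causal SNPs

def fitness(mask):
    if not mask.any():
        return 0.0
    clf = SVC(kernel="linear", C=1.0)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

pop = rng.random((30, n_snps)) < 0.1      # sparse initial masks
for gen in range(20):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[::-1][:10]]        # truncation selection
    children = []
    while len(children) < len(pop):
        a, b = parents[rng.integers(0, 10, 2)]
        child = np.where(rng.random(n_snps) < 0.5, a, b)  # uniform crossover
        child ^= rng.random(n_snps) < 0.01                # bit-flip mutation
        children.append(child)
    pop = np.array(children)

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected SNP indices:", np.flatnonzero(best))
```

    In a realistic setting the mask would range over pathway-restricted SNP sets rather than all columns, and the fitness would need permutation testing to guard against overfitting; both are omitted here for brevity.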

    New local search in the space of infeasible solutions framework for the routing of vehicles

    Combinatorial optimisation problems (COPs) have been at the origin of the design of many optimal and heuristic solution frameworks, such as branch-and-bound algorithms, branch-and-cut algorithms, classical local search methods, metaheuristics, and hyperheuristics. This thesis proposes a refined generic and parameterised infeasible local search (GPILS) algorithm for solving COPs and, for illustration, customises it to the travelling salesman problem (TSP). In addition, a rule-based heuristic, the parameterised infeasible heuristic (PIH), is proposed to initialise the infeasible local search; it gives the analyst some control over the features of the infeasible solution from which the search starts. A recursive infeasible neighbourhood search (RINS) and a generic patching procedure for searching the infeasible space are also proposed. These procedures are designed generically, so they can be adapted to any choice of parameters of GPILS, where, for simplicity, "parameters" covers parameters, components, criteria, and rules. Furthermore, a hyperheuristic framework, HH-GPILS, is proposed for optimising the parameters of GPILS. Experiments have been run with both sequential hyperheuristics (simulated annealing, variable neighbourhood search, and tabu search) and parallel hyperheuristics (genetic algorithms, GAs) to assess empirically the performance of HH-GPILS on TSP instances from the TSPLIB. Empirical results suggest that HH-GPILS delivers an outstanding performance. Finally, an offline learning mechanism is proposed as a seeding technique to improve the performance and speed of the parallel HH-GPILS. It uses a knowledge base to keep track of the best-performing chromosomes and their scores. Empirical results suggest that this learning mechanism is a promising technique for initialising the GA's population.
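
    The thesis's components are not reproduced here, but the idea of a patching procedure over the infeasible space can be illustrated on the TSP: below, an infeasible solution consisting of several disjoint subtours is repaired by repeatedly splicing together the two subtours whose merge is cheapest. The coordinates, the number of subtours, and the splice rule are assumptions for illustration only, not GPILS itself.

```python
# Sketch under assumptions: start infeasible (three subtours), patch
# greedily until a single Hamiltonian tour remains.
import itertools, math, random

random.seed(1)
pts = [(random.random(), random.random()) for _ in range(12)]
dist = lambda i, j: math.dist(pts[i], pts[j])

def splice_cost(t1, t2, i, j):
    """Extra length from removing edge (t1[i], t1[i+1]) and edge
    (t2[j], t2[j+1]) and adding the two cross edges joining the subtours."""
    a, b = t1[i], t1[(i + 1) % len(t1)]
    c, e = t2[j], t2[(j + 1) % len(t2)]
    return dist(a, c) + dist(b, e) - dist(a, b) - dist(c, e)

tours = [list(range(0, 4)), list(range(4, 8)), list(range(8, 12))]
while len(tours) > 1:                  # patch until feasibility is restored
    (p, q), (i, j) = min(
        ((pair, ij)
         for pair in itertools.combinations(range(len(tours)), 2)
         for ij in itertools.product(range(len(tours[pair[0]])),
                                     range(len(tours[pair[1]])))),
        key=lambda x: splice_cost(tours[x[0][0]], tours[x[0][1]], *x[1]))
    t1, t2 = tours[p], tours[q]
    # splice: ...a -> c, walk t2 backwards round to e, then b onwards
    merged = t1[:i + 1] + t2[j::-1] + t2[:j:-1] + t1[i + 1:]
    tours = [t for k, t in enumerate(tours) if k not in (p, q)] + [merged]

print("tour:", tours[0])
```

    A search framework in the spirit of GPILS would additionally allow moves that break feasibility again, steered by a penalised objective whose weights a hyperheuristic could tune; the greedy patching above is only the repair half of that loop.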

    Formal Foundations for Information-Preserving Model Synchronization Processes Based on Triple Graph Grammars

    Restoring consistency between different information-sharing artifacts after one of them has been changed is an important problem that arises in several areas of computer science.
In this thesis, we provide a solution to the basic model synchronization problem: given a pair of such artifacts (models), one of which has been changed, consistency shall be restored. Triple graph grammars (TGGs) are an established and suitable formalism for addressing this and related problems. Being based on the algebraic theory of graph transformation and (double-)pushout rewriting, they are especially suited to developing solutions whose properties can be formally proven. Yet, despite being established, many TGG-based solutions do not deal satisfactorily with the problem of information loss. When one model is changed, such solutions may, in the course of restoring consistency, lose information that is present only in the second model, because the synchronization process resorts to re-translating (large parts of) the updated model. We introduce a TGG-based approach that supports advanced features of TGGs (attributes and negative constraints), is comprehensively formalized and implemented, and is incremental in the sense that it drastically reduces the amount of information loss compared to former approaches. Until now, no TGG-based approach with these characteristics has been available. The central contribution of this thesis is to develop that approach formally and to prove its essential properties, namely correctness, completeness, and termination. The crucial new idea in our approach is the use of repair rules: special rules that allow changes to be propagated directly from one model to the other instead of resorting to re-translation. To be able to construct and apply these repair rules, we also contribute more fundamentally to the theory of algebraic graph transformation. First, we develop a new kind of sequential rule composition: whereas the conventional composition of rules leads to rules that delete and re-create elements, we can compute rules that preserve such elements instead. Second, the synchronization process we develop technically takes place in the category of partial triple graphs rather than that of ordinary triple graphs, so we have to ensure that the elaborate theory of double-pushout rewriting still applies. We therefore develop a (category-theoretic) construction of new categories from given ones and show that (i) this construction preserves the axioms that are necessary to develop the theory of double-pushout rewriting and (ii) partial triple graphs can be constructed as such a category. Together, these two fundamental contributions enable us to develop our solution to the basic model synchronization problem in a fully formal manner and to prove its central properties.
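
    To make the repair-rule idea concrete without the formal TGG machinery, here is a toy sketch: a source and a target model are linked by a correspondence, and a repair rule propagates a rename directly, whereas batch re-translation discards information (a manually added column) present only in the target. All model and function names are invented for illustration.

```python
# Toy illustration (assumed encoding), not the thesis's formalism:
# why repair rules avoid the information loss of re-translation.

source = {"c1": "Person"}                          # class id -> class name
target = {"t1": {"name": "person",                 # table id -> table data
                 "columns": ["id", "nickname"]}}   # "nickname" added by hand
corr = {"c1": "t1"}                                # correspondence links

def retranslate(source):
    """Batch translation: restores consistency but drops manual columns."""
    return {f"t_{cid}": {"name": name.lower(), "columns": ["id"]}
            for cid, name in source.items()}

def repair_rename(source, target, corr, cid):
    """Repair rule: propagate a class rename to its corresponding table,
    preserving everything else already present in the target model."""
    target[corr[cid]]["name"] = source[cid].lower()
    return target

source["c1"] = "Employee"                          # the model update
print(retranslate(source))                         # "nickname" is lost
print(repair_rename(source, target, corr, "c1"))   # "nickname" survives
```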

    Substructural Analysis Using Evolutionary Computing Techniques

    Substructural analysis (SSA) was one of the very first machine learning techniques applied to chemoinformatics in the area of virtual screening. Given a set of compounds, typically defined by their fragment occurrence data (such as 2D fingerprints), SSA computes a weight for each fragment that reflects its contribution to the activity (or inactivity) of the compounds containing it. The overall probability of activity for a compound is then computed by summing or otherwise combining the weights of the fragments present in it. A variety of weighting schemes, based on specific relationship-bound equations, are available for this purpose. This thesis identifies improvements to the effectiveness of SSA using two evolutionary computation methods, the genetic algorithm (GA) and genetic programming (GP). Building on previous studies, ten published SSA weighting schemes were analysed and compared in a simulated virtual screening experiment. The analysis showed the most effective weighting scheme to be the R4 equation, one of the document-based weighting schemes. A second experiment investigated a GA-based weighting scheme for SSA in comparison with the R4 weighting scheme. The GA is simple in concept, focusing purely on suitable weight generation, and effective in operation. The findings show that the GA-based SSA is superior to the R4-based SSA, both in active compound retrieval rate and in predictive performance. A third experiment investigated a GP-based SSA. Rigorous experiments showed the GP to be superior to the existing SSA weighting schemes; in general, however, the GP-based SSA was less effective than the GA-based SSA. A final experiment explored the feasibility of data fusion for both the GA and the GP: a method that produces a final ranking list from multiple ranking lists according to several fusion rules. The results indicate that data fusion is a good way to boost GA- and GP-based SSA searching, with the RKP rule proving the most effective fusion rule.
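
    To illustrate the additive scoring at the heart of SSA (deliberately not the R4 equation or any of the ten published schemes), the sketch below assigns each fragment a Laplace-smoothed log-odds weight from its occurrence in actives versus inactives and scores a compound by summing the weights of its fragments. The fingerprints are synthetic and the weighting formula is an assumed, generic choice.

```python
# Sketch under assumptions: fingerprints as sets of fragment (bit) indices,
# log-odds fragment weights, additive compound scoring.
import math
from collections import Counter

actives = [{0, 2, 5}, {0, 2}, {2, 5, 7}]
inactives = [{1, 3}, {3, 7}, {1, 5, 3}]

act = Counter(f for fp in actives for f in fp)
inact = Counter(f for fp in inactives for f in fp)

def weight(f):
    # Laplace-smoothed log-odds of fragment f appearing in an active
    p_act = (act[f] + 1) / (len(actives) + 2)
    p_inact = (inact[f] + 1) / (len(inactives) + 2)
    return math.log(p_act / p_inact)

def score(fp):
    return sum(weight(f) for f in fp)   # additive SSA-style score

for fp in [{0, 2}, {1, 3}, {2, 3}]:
    print(sorted(fp), round(score(fp), 3))
```

    In the GA-based variant the thesis describes, the per-fragment weights would be evolved against a retrieval objective rather than computed from a fixed equation as above.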