
11 research outputs found

Effective Classification using a small Training Set based on Discretization and Statistical Analysis

Author: BRUNI Renato
G. Bianchi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015

This work addresses the problem of producing a fast and accurate data classification, learned from a possibly small set of records that are already classified. The proposed approach is based on the framework of Logical Analysis of Data (LAD), enriched with information obtained from statistical analysis of the data. Several discrete optimization problems are solved in the different steps of the procedure, but their computational demand can be controlled. The accuracy of the proposed approach is compared with that of the standard LAD algorithm, Support Vector Machines and the Label Propagation algorithm on publicly available datasets from the UCI repository. Encouraging results are obtained and discussed.
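
The abstract does not detail the discretization step. For reference only, the sketch below shows a common LAD-style binarization, assumed here rather than taken from the paper: candidate cutpoints are placed midway between consecutive distinct values of a numerical attribute wherever the class can change, and each cutpoint yields one binary feature `x >= c`. The names `candidate_cutpoints` and `binarize` are hypothetical, and the statistical enrichment described in the paper is not modeled.

```python
from typing import Dict, List, Sequence, Set


def candidate_cutpoints(values: Sequence[float], labels: Sequence[int]) -> List[float]:
    """Midpoints between consecutive distinct values where the class may change."""
    # Collect the set of class labels observed at each distinct attribute value.
    classes_at: Dict[float, Set[int]] = {}
    for v, y in zip(values, labels):
        classes_at.setdefault(v, set()).add(y)
    ordered = sorted(classes_at)
    cuts: List[float] = []
    for lo, hi in zip(ordered, ordered[1:]):
        # Skip the gap only when both neighbours carry the same single class.
        same_pure_class = classes_at[lo] == classes_at[hi] and len(classes_at[lo]) == 1
        if not same_pure_class:
            cuts.append((lo + hi) / 2)
    return cuts


def binarize(values: Sequence[float], cuts: Sequence[float]) -> List[List[int]]:
    """One binary feature per cutpoint: 1 if the value is >= the cutpoint, else 0."""
    return [[int(v >= c) for c in cuts] for v in values]


cuts = candidate_cutpoints([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1])  # -> [2.5]
print(binarize([1.0, 4.0], cuts))  # -> [[0], [1]]
```

Keeping only cutpoints where the class can change keeps the number of binary features small, which in turn limits the size of the later discrete optimization problems.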

Archivio della ricerca- Università di Roma La Sapienza

Logical analysis of data as a tool for the analysis of probabilistic discrete choice behavior

Author: Bianchi Gianpiero
Bruni Renato
Dolente Cosimo
Leporelli Claudio
Publication venue: 'Elsevier BV'
Publication date: 01/01/2019

Probabilistic Discrete Choice Models (PDCM) have been extensively used to interpret the behavior of heterogeneous decision makers facing discrete alternatives. The classification approach of Logical Analysis of Data (LAD) uses discrete optimization to generate patterns, which are logic formulas characterizing the different classes. Patterns can be seen as rules explaining the phenomenon under analysis. In this work we discuss how LAD can be used as the first phase of the specification of a PDCM. Since the number of patterns generated in this task may be extremely large, and many of them may be nearly equivalent, additional processing is necessary to obtain practically meaningful information. Hence, we propose computationally viable techniques to obtain small sets of patterns that constitute meaningful representations of the phenomenon and allow the discovery of significant associations between subsets of explanatory variables and the output. We consider the complex socio-economic problem of analyzing Internet use in Italy, using real data gathered by the Italian National Institute of Statistics.
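
To make the notion of a pattern concrete, here is a minimal sketch, assuming binarized data and a two-class setting: a pattern is a conjunction of literals, it covers an observation when every literal holds, and a positive pattern covers at least one positive observation and no negative one. The `Pattern` type and the helper names are hypothetical illustrations, not code from the paper.

```python
from typing import Dict, Sequence

# A pattern is a conjunction of literals: feature index -> required binary value.
Pattern = Dict[int, int]


def covers(pattern: Pattern, x: Sequence[int]) -> bool:
    """True if observation x satisfies every literal of the pattern."""
    return all(x[j] == v for j, v in pattern.items())


def is_positive_pattern(pattern: Pattern,
                        positives: Sequence[Sequence[int]],
                        negatives: Sequence[Sequence[int]]) -> bool:
    """Covers at least one positive observation and no negative observation."""
    return (any(covers(pattern, p) for p in positives)
            and not any(covers(pattern, n) for n in negatives))


positives = [[1, 0, 1], [1, 1, 1]]
negatives = [[0, 0, 1], [1, 0, 0]]
print(is_positive_pattern({0: 1, 2: 1}, positives, negatives))  # True
print(is_positive_pattern({2: 1}, positives, negatives))        # False: covers [0, 0, 1]
```

Read as a rule, `{0: 1, 2: 1}` says "if feature 0 and feature 2 are both true, the observation is positive", which is the kind of interpretable statement the abstract relates to explanatory variables.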

Archivio della ricerca- Università di Roma La Sapienza

Reformulation of the support set selection problem in the logical analysis of data

Author: A. Schrijver
C.E. Brodley
D.J. Hand
E. Boros
E. Boros
G.L. Nemhauser
H. Almuallim
ILOG Cplex 8.0
J.R. Quinlan
M.L. Fisher
P.E. Utgoff
Renato Bruni
T. Hastie
T.M. Mitchell
Y. Crama
Y.J. Lee
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Near-optimal supervised feature selection among frequent subgraphs

Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'

Crossref

Optimization techniques for data mining and information reconstruction

Author: Bianchi Gianpiero
Publication date: 18/11/2013

Archivio della ricerca- Università di Roma La Sapienza

Discriminative frequent subgraph mining with optimality guarantees

Author: Agrawal
Bairoch
Borgelt
Borgelt
Boros
Bringmann
Dash
Deshpande
Dobson
Ein-Dor
Huan
Inokuchi
Jin
Kubinyi
Kuramochi
Nemhauser
Nijssen
Przulj
Radivojac
Saigo
Saigo
van 't Veer
Vanetik
Wernicke
Yan
Yang
Zimmermann
Publication venue: 'Wiley'

Crossref

Determining Geographical Causal Relationships through the Development of Spatial Cluster Detection and Feature Selection Techniques

Author: Jarvis Paul
Publication date: 01/01/2006

University of South Wales Research Explorer

Developed Algorithms for Maximum Pattern Generation in Logical Analysis of Data

Author: Tagarian Sara
Publication date: 01/12/2016

Data is at the heart of every industry and organization. Many companies hold large amounts of data but fail to gain valuable insight from it, often because they do not use it productively. Making essential, timely decisions requires adapted tools that extract applicable and accurate information from large amounts of data. As the amount and variety of data grow, traditional methods have been abandoned, and efficient, fruitful methods for analyzing data are becoming increasingly important. Data classification is one way to meet this need. Logical Analysis of Data (LAD) is a data analysis methodology that combines optimization, combinatorics and Boolean logic and is applicable to classification problems. Its aim is to discover hidden logical patterns that distinguish the observations of one class from all other observations. Patterns are the key building blocks of LAD, whose essential goal is to choose a set of patterns capable of classifying observations correctly; accuracy measures how well this goal is met.

In this research study, one specific kind of pattern, the maximum α-pattern, is considered. This type of pattern helps build highly accurate LAD classification models. Although various methods have been proposed to generate maximum α-patterns, no meta-heuristic algorithm has yet been developed for this problem. This study therefore proposes a meta-heuristic algorithm based on Simulated Annealing that generates maximum α-patterns efficiently in terms of both computational time and accuracy, in a way that differs from the best linear approximation (BLA) model proposed in the literature. The performance of the new algorithm is then evaluated. The results of the Friedman statistical test show that the algorithm developed here has the best performance in terms of computational time. Moreover, its accuracy is competitive with that of other methods, with high levels of statistical confidence.
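
The abstract does not give the algorithm itself, so the following is only a generic simulated-annealing sketch, not the thesis's method. It assumes the usual LAD reading of a maximum α-pattern: for a positive binary observation α, find a subset S of α's literals such that the resulting pattern covers no negative observation and covers as many positive observations as possible. Covered negatives are handled with a soft penalty; the function names and the parameters `iters`, `t0`, `cooling` and `penalty` are hypothetical.

```python
import math
import random
from typing import Optional, Sequence, Set


def coverage(S: Set[int], alpha: Sequence[int], X: Sequence[Sequence[int]]) -> int:
    """Number of observations in X that agree with alpha on every feature in S."""
    return sum(all(x[j] == alpha[j] for j in S) for x in X)


def sa_max_alpha_pattern(alpha, positives, negatives, iters=2000, t0=1.0,
                         cooling=0.995, penalty=10.0, seed=0) -> Optional[Set[int]]:
    # Hypothetical parameters; the thesis's actual settings are not stated in the abstract.
    rng = random.Random(seed)
    n = len(alpha)

    def score(S: Set[int]) -> float:
        # Reward covered positives, penalize covered negatives (soft constraint).
        return coverage(S, alpha, positives) - penalty * coverage(S, alpha, negatives)

    # Start from all literals of alpha: this covers only observations identical to alpha.
    current = set(range(n))
    best: Optional[Set[int]] = None
    best_cov = -1
    if coverage(current, alpha, negatives) == 0:
        best, best_cov = set(current), coverage(current, alpha, positives)
    cur_score = score(current)
    t = t0
    for _ in range(iters):
        cand = current ^ {rng.randrange(n)}  # toggle one literal in or out
        cand_score = score(cand)
        delta = cand_score - cur_score
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, cur_score = cand, cand_score
            # Only feasible candidates (no covered negatives) can become the best pattern.
            if coverage(current, alpha, negatives) == 0:
                cov = coverage(current, alpha, positives)
                if cov > best_cov:
                    best, best_cov = set(current), cov
        t *= cooling  # geometric cooling schedule
    return best


alpha = [1, 0, 1]
positives = [[1, 0, 1], [1, 1, 1], [1, 1, 0]]
negatives = [[0, 0, 1], [0, 1, 1]]
S = sa_max_alpha_pattern(alpha, positives, negatives)
print(S, coverage(S, alpha, positives))
```

Starting from the full set of α's literals guarantees a feasible starting point whenever no negative observation is identical to α; the penalty lets the search pass through infeasible states while only feasible ones are recorded.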

PolyPublie

Similarity search applications in medical images

Author: Petri Marisa
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 17/01/2012

Digitale Hochschulschriften der LMU