Search CORE

39,454 research outputs found

Beyond the storage capacity: data driven satisfiability transition

Author: Gherardi Marco
Pastore Mauro
Rotondo Pietro
Publication venue: 'American Physical Society (APS)'
Publication date: 01/01/2020
Field of study

Data structure has a dramatic impact on the properties of neural networks, yet its significance in the established theoretical frameworks is poorly understood. Here we compute the Vapnik-Chervonenkis entropy of a kernel machine operating on data grouped into equally labelled subsets. At variance with the unstructured scenario, entropy is non-monotonic in the size of the training set, and displays an additional critical point besides the storage capacity. Remarkably, the same behavior occurs in margin classifiers even with randomly labelled data, as is elucidated by identifying the synaptic volume encoding the transition. These findings reveal aspects of expressivity lying beyond the condensed description provided by the storage capacity, and they indicate the path towards more realistic bounds for the generalization error of neural networks.Comment: 5 pages, 2 figure

arXiv.org e-Print Archive

Archivio istituzionale della Ricerca - Università degli Studi di Parma

AIR Universita degli studi di Milano

Counting the learnable functions of structured data

Author: Gherardi Marco
Lagomarsino Marco Cosentino
Rotondo Pietro
Publication venue: 'American Physical Society (APS)'
Publication date: 28/03/2019
Field of study

Cover's function counting theorem is a milestone in the theory of artificial neural networks. It provides an answer to the fundamental question of determining how many binary assignments (dichotomies) of

p

points in

n

dimensions can be linearly realized. Regrettably, it has proved hard to extend the same approach to more advanced problems than the classification of points. In particular, an emerging necessity is to find methods to deal with structured data, and specifically with non-pointlike patterns. A prominent case is that of invariant recognition, whereby identification of a stimulus is insensitive to irrelevant transformations on the inputs (such as rotations or changes in perspective in an image). An object is therefore represented by an extended perceptual manifold, consisting of inputs that are classified similarly. Here, we develop a function counting theory for structured data of this kind, by extending Cover's combinatorial technique, and we derive analytical expressions for the average number of dichotomies of generically correlated sets of patterns. As an application, we obtain a closed formula for the capacity of a binary classifier trained to distinguish general polytopes of any dimension. These results may help extend our theoretical understanding of generalization, feature extraction, and invariant object recognition by neural networks

arXiv.org e-Print Archive

Archivio istituzionale della Ricerca - Università degli Studi di Parma

AIR Universita degli studi di Milano

A study of hierarchical and flat classification of proteins

Author: Buchwald Fabian
Frank Eibe
Kramer Stefan
Zimek Arthur
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008
Field of study

Automatic classification of proteins using machine learning is an important problem that has received significant attention in the literature. One feature of this problem is that expert-defined hierarchies of protein classes exist and can potentially be exploited to improve classification performance. In this article we investigate empirically whether this is the case for two such hierarchies. We compare multi-class classification techniques that exploit the information in those class hierarchies and those that do not, using logistic regression, decision trees, bagged decision trees, and support vector machines as the underlying base learners. In particular, we compare hierarchical and flat variants of ensembles of nested dichotomies. The latter have been shown to deliver strong classification performance in multi-class settings. We present experimental results for synthetic, fold recognition, enzyme classification, and remote homology detection data. Our results show that exploiting the class hierarchy improves performance on the synthetic data, but not in the case of the protein classification problems. Based on this we recommend that strong flat multi-class methods be used as a baseline to establish the benefit of exploiting class hierarchies in this area

Research Commons@Waikato

Overcoming Language Dichotomies: Toward Effective Program Comprehension for Mobile App Development

Author: Bavota G.
Gu T.
Gui J.
Jabbarvand R.
Lelli V.
Linares-Vásquez M.
Liu Y.
Moran K.
Palomba F.
Robillard M. P.
Wan M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 18/07/2018
Field of study

Mobile devices and platforms have become an established target for modern software developers due to performant hardware and a large and growing user base numbering in the billions. Despite their popularity, the software development process for mobile apps comes with a set of unique, domain-specific challenges rooted in program comprehension. Many of these challenges stem from developer difficulties in reasoning about different representations of a program, a phenomenon we define as a "language dichotomy". In this paper, we reflect upon the various language dichotomies that contribute to open problems in program comprehension and development for mobile apps. Furthermore, to help guide the research community towards effective solutions for these problems, we provide a roadmap of directions for future work.Comment: Invited Keynote Paper for the 26th IEEE/ACM International Conference on Program Comprehension (ICPC'18

arXiv.org e-Print Archive

Crossref

Getting On With Field Research Using Participant Deconstruction

Author: Brazil Victoria
Hibbert Paul
Middleton Stuart
Wright April L.
Publication venue: 'SAGE Publications'
Publication date: 01/04/2020
Field of study

Bond University Research Portal