13,502 research outputs found

Subjectively Interesting Subgroup Discovery on Real-valued Targets

Author: De Bie Tijl
Duivesteijn Wouter
Kang Bo
Lijffijt Jefrey
Oikarinen Emilia
Puolamäki Kai
Publication date: 01/01/2017

Deriving insights from high-dimensional data is one of the core problems in data mining. The difficulty mainly stems from the fact that there are exponentially many variable combinations to potentially consider, and infinitely many if we consider weighted combinations, even when restricted to linear ones. Hence, an obvious question is whether we can automate the search for interesting patterns and visualizations. In this paper, we consider the setting where a user wants to learn as efficiently as possible about real-valued attributes. For example, the user may want to understand the distribution of crime rates in different geographic areas in terms of other (numerical, ordinal and/or categorical) variables that describe the areas. We introduce a method to find subgroups in the data that are maximally informative (in the formal information-theoretic sense) with respect to a single real-valued target attribute or a set of them. The subgroup descriptions are in terms of a succinct set of arbitrarily-typed other attributes. The approach is based on the Subjective Interestingness framework FORSIED, which enables the use of prior knowledge when finding the most informative non-redundant patterns, and hence the method also supports iterative data mining. Comment: 12 pages, 10 figures, 2 tables, conference submission

arXiv.org e-Print Archive

Repository TU/e

Crossref

Pure OAI Repository

Ghent University Academic Bibliography

Aaltodoc Publication Archive

Codimension-two bifurcations in animal aggregation models with symmetry

Author: Hillen T.
Pietro-Luciano Buono
Raluca Eftimie
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2014

Crossref

University of Dundee Online Publications

Finding Statistically Significant Interactions between Continuous Features

Author: Borgwardt Karsten
Sugiyama Mahito
Publication date: 10/05/2019

The search for higher-order feature interactions that are statistically significantly associated with a class variable is of high relevance in fields such as Genetics or Healthcare, but the combinatorial explosion of the candidate space makes this problem extremely challenging in terms of computational efficiency and proper correction for multiple testing. While recent progress has been made regarding this challenge for binary features, we here present the first solution for continuous features. We propose an algorithm which overcomes the combinatorial explosion of the search space of higher-order interactions by deriving a lower bound on the p-value for each interaction, which enables us to massively prune interactions that can never reach significance and to thereby gain more statistical power. In our experiments, our approach efficiently detects all significant interactions in a variety of synthetic and real-world datasets. Comment: 13 pages, 5 figures, 2 tables, accepted to the 28th International Joint Conference on Artificial Intelligence (IJCAI 2019)

arXiv.org e-Print Archive

Crossref

Anytime Subgroup Discovery in Numerical Domains with Guarantees

Author: Belfodil Adnene
Belfodil Aimene
Kaytoue Mehdi
Publication venue: HAL CCSD
Publication date: 10/09/2018

Subgroup discovery is the task of discovering patterns that accurately discriminate a class label from the others. Existing approaches can uncover such patterns either through an exhaustive or an approximate exploration of the pattern search space. However, an exhaustive exploration is generally infeasible, whereas approximate approaches provide no guarantees bounding the error of the best pattern quality or the exploration progression ("How far are we from an exhaustive search?"). We design here an algorithm for mining numerical data with three key properties w.r.t. the state of the art: (i) it progressively yields interval patterns whose quality improves over time; (ii) it can be interrupted at any time and always gives a guarantee bounding the error on the top pattern quality; and (iii) it always bounds a distance to the exhaustive exploration. After reporting experiments showing the effectiveness of our method, we discuss its generalization to other kinds of patterns.

Report on the Second Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE2)

Author: Adams
Boettiger
Bourque
Chamberlain
Chue Hong
Chue Hong
Clune
Crusoe
Downs
Dubey
Dubey
Gil
Habermann
Hanwell
Hook
Hook
Howison
Howison
Katz
Kelley
Lenhardt
Marker
Patra
Piccolo
Pierce
Rosado de Souza
Slaughter
Venters
Publication venue: 'Ubiquity Press, Ltd.'
Publication date: 08/07/2015

This technical report records and discusses the Second Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE2). The report includes a description of the alternative, experimental submission and review process, two workshop keynote presentations, a series of lightning talks, a discussion on sustainability, and five discussions from the topic areas of exploring sustainability; software development experiences; credit & incentives; reproducibility & reuse & sharing; and code testing & code review. For each topic, the report includes a list of tangible actions that were proposed and that would lead to potential change. The workshop recognized that reliance on scientific software is pervasive in all areas of world-leading research today. The workshop participants then proceeded to explore different perspectives on the concept of sustainability. Key enablers and barriers of sustainable scientific software were identified from their experiences. In addition, recommendations with new requirements such as software credit files and software prize frameworks were outlined for improving practices in sustainable software engineering. There was also broad consensus that formal training in software development or engineering was rare among the practitioners. Significant strides need to be made in building a sense of community via training in software and technical practices, on increasing their size and scope, and on better integrating them directly into graduate education programs. Finally, journals can define and publish policies to improve reproducibility, whereas reviewers can insist that authors provide sufficient information and access to data and software to allow them to reproduce the results in the paper. Hence a list of criteria is compiled for journals to provide to reviewers so as to make it easier to review software submitted for publication as a “Software Paper.”

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

University of Huddersfield Repository

Large sets of consecutive Maass forms and fluctuations in the Weyl remainder

Author: Then Holger
Publication date: 13/12/2012

We explore an algorithm which systematically finds all discrete eigenvalues of an analytic eigenvalue problem. The algorithm is simpler and more elementary than could be expected before. It consists of Hejhal's identity, linearisation, and Turing bounds. Using the algorithm, we compute more than one hundred sixty thousand consecutive eigenvalues of the Laplacian on the modular surface, and investigate the asymptotic and statistical properties of the fluctuations in the Weyl remainder. We summarize the findings in two conjectures. One is on the maximum size of the Weyl remainder, and the other is on the distribution of a suitably scaled version of the Weyl remainder. Comment: A version with higher resolution figures can be downloaded from http://www.maths.bris.ac.uk/~mahlt/research/T2012a.pd

arXiv.org e-Print Archive

CiteSeerX

Explore Bristol Research