Search CORE

393 research outputs found

The Minimum Description Length Principle for Pattern Mining: A Survey

Author: Galbrun Esther
Publication venue
Publication date: 28/07/2021
Field of study

This is about the Minimum Description Length (MDL) principle applied to pattern mining. The length of this description is kept to the minimum. Mining patterns is a core task in data analysis and, beyond issues of efficient enumeration, the selection of patterns constitutes a major challenge. The MDL principle, a model selection method grounded in information theory, has been applied to pattern mining with the aim to obtain compact high-quality sets of patterns. After giving an outline of relevant concepts from information theory and coding, as well as of work on the theory behind the MDL and similar principles, we review MDL-based methods for mining various types of data and patterns. Finally, we open a discussion on some issues regarding these methods, and highlight currently active related data analysis problems

arXiv.org e-Print Archive

Recommended from our members

A Tool for Producing Verified, Explainable Proofs

Author: Ayers Edward
Publication venue: University of Cambridge
Publication date: 25/02/2022
Field of study

Mathematicians are reluctant to use interactive theorem provers. In this thesis I argue that this is because proof assistants don't emphasise explanations of proofs; and that in order to produce good explanations, the system must create proofs in a manner that mimics how humans would create proofs. My research goals are to determine what constitutes a human-like proof and to represent human-like reasoning within an interactive theorem prover to create formalised, understandable proofs. Another goal is to produce a framework to visualise the goal states of this system. To demonstrate this, I present HumanProof: a piece of software built for the Lean 3 theorem prover. It is used for interactively creating proofs that resemble how human mathematicians reason. The system provides a visual, hierarchical representation of the goal and a system for suggesting available inference rules. The system produces output in the form of both natural language and formal proof terms which are checked by Lean's kernel. This is made possible with the use of a structured goal state system which interfaces with Lean's tactic system which is detailed in Chapter 3. In Chapter 4, I present the subtasks automation planning subsystem, which is used to produce equality proofs in a human-like fashion. The basic strategy of the subtasks system is break a given equality problem in to a hierarchy of tasks and then maintain a stack of these tasks in order to determine the order in which to apply equational rewriting moves. This process produces equality chains for simple problems without having to resort to brute force or specialised procedures such as normalisation. This makes proofs more human-like by breaking the problem into a hierarchical set of tasks in the same way that a human would. To produce the interface for this software, I also created the ProofWidgets system for Lean 3. This system is detailed in Chapter 5. The ProofWidgets system uses Lean's metaprogramming framework to allow users to write their own interactive, web-based user interfaces to display within the VSCode editor and in an online web-editor. The entire tactic state is available to the rendering engine, and hence expression structure and types of subexpressions can be explored interactively. The ProofWidgets system also allows the user interface to interactively edit the proof document, enabling a truly interactive modality for creating proofs; human-like or not. In Chapter 6, the system is evaluated by asking real mathematicians about the output of the system, and what it means for a proof to be understandable to them. The user group study asks participants to rank and comment on proofs created by HumanProof alongside natural language and pure Lean proofs. The study finds that participants generally prefer the HumanProof format over the Lean format. The verbal responses collected during the study indicate that providing intuition and signposting are the most important properties of a proof that aid understanding.EPSR

Apollo (Cambridge)

29th International Symposium on Theoretical Aspects of Computer Science: STACS '12, February 29th to March 3rd, 2012, Paris, France

Author: Schloss Dagstuhl Leibniz-Zentrum für Informatik
Publication venue: Schloss Dagstuhl - Leibniz-Center for Informatics
Publication date: 01/02/2012
Field of study

Digitale Bibliothek Thüringen

Front Matter - Soft Computing for Data Mining Applications

Author: Patnaik L.M.
Srinivasa K.G.
Venugopal K.R.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009
Field of study

Efficient tools and algorithms for knowledge discovery in large data sets have been devised during the recent years. These methods exploit the capability of computers to search huge amounts of data in a fast and effective manner. However, the data to be analyzed is imprecise and afflicted with uncertainty. In the case of heterogeneous data sources such as text, audio and video, the data might moreover be ambiguous and partly conflicting. Besides, patterns and relationships of interest are usually vague and approximate. Thus, in order to make the information mining process more robust or say, human-like methods for searching and learning it requires tolerance towards imprecision, uncertainty and exceptions. Thus, they have approximate reasoning capabilities and are capable of handling partial truth. Properties of the aforementioned kind are typical soft computing. Soft computing techniques like Genetic

ePrints@Bangalore University

31th International Symposium on Theoretical Aspects of Computer Science: STACS '14, March 5th to March 8th, 2014, Lyon, France

Author: STACS <31 2014, Lyon>
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Publication date: 01/03/2014
Field of study

Digitale Bibliothek Thüringen

On the performance and programming of reversible molecular computers

Author: Earley Amelie Hannah
Publication venue: University of Cambridge
Publication date: 18/12/2021
Field of study

If the 20th century was known for the computational revolution, what will the 21st be known for? Perhaps the recent strides in the nascent fields of molecular programming and biological computation will help bring about the ‘Coming Era of Nanotechnology’ promised in Drexler’s ‘Engines of Creation’. Though there is still far to go, there is much reason for optimism. This thesis examines the underlying principles needed to realise the computational aspects of such ‘engines’ in a performant way. Its main body focusses on the ways in which thermodynamics constrains the operation and design of such systems, and it ends with the proposal of a model of computation appropriate for exploiting these constraints. These thermodynamic constraints are approached from three different directions. The first considers the maximum possible aggregate performance of a system of computers of given volume, V, with a given supply of free energy. From this perspective, reversible computing is imperative in order to circumvent the Landauer limit. A result of Frank is refined and strengthened, showing that the adiabatic regime reversible computer performance is the best possible for any computer—quantum or classical. This therefore shows a universal scaling law governing the performance of compact computers of ~V^(5/6), compared to ~V^(2/3) for conventional computers. For the case of molecular computers, it is shown how to attain this bound. The second direction extends this performance analysis to the case where individual computational particles or sub-units can interact with one another. The third extends it to interactions with shared, non-computational parts of the system. It is found that accommodating these interactions in molecular computers imposes a performance penalty that undermines the earlier scaling result. Nonetheless, scaling superior to that of irreversible computers can be preserved, and appropriate mitigations and considerations are discussed. These analyses are framed in a context of molecular computation, but where possible more general computational systems are considered. The proposed model, the א-calculus, is appropriate for programming reversible molecular computers taking into account these constraints. A variety of examples and mathematical analyses accompany it. Moreover, abstract sketches of potential molecular implementations are provided. Developing these into viable schemes suitable for experimental validation will be a focus of future work

arXiv.org e-Print Archive

Apollo (Cambridge)

Application of Intermediate Multi-Agent Systems to Integrated Algorithmic Composition and Expressive Performance of Music

Author: Kirke Alexis
Publication venue: 'University of Plymouth'
Publication date: 01/01/2011
Field of study

We investigate the properties of a new Multi-Agent Systems (MAS) for computer-aided composition called IPCS (pronounced “ipp-siss”) the Intermediate Performance Composition System which generates expressive performance as part of its compositional process, and produces emergent melodic structures by a novel multi-agent process. IPCS consists of a small-medium size (2 to 16) collection of agents in which each agent can perform monophonic tunes and learn monophonic tunes from other agents. Each agent has an affective state (an “artificial emotional state”) which affects how it performs the music to other agents; e.g. a “happy” agent will perform “happier” music. The agent performance not only involves compositional changes to the music, but also adds smaller changes based on expressive music performance algorithms for humanization. Every agent is initialized with a tune containing the same single note, and over the interaction period longer tunes are built through agent interaction. Agents will only learn tunes performed to them by other agents if the affective content of the tune is similar to their current affective state; learned tunes are concatenated to the end of their current tune. Each agent in the society learns its own growing tune during the interaction process. Agents develop “opinions” of other agents that perform to them, depending on how much the performing agent can help their tunes grow. These opinions affect who they interact with in the future. IPCS is not a mapping from multi-agent interaction onto musical features, but actually utilizes music for the agents to communicate emotions. In spite of the lack of explicit melodic intelligence in IPCS, the system is shown to generate non-trivial melody pitch sequences as a result of emotional communication between agents. The melodies also have a hierarchical structure based on the emergent social structure of the multi-agent system and the hierarchical structure is a result of the emerging agent social interaction structure. The interactive humanizations produce micro-timing and loudness deviations in the melody which are shown to express its hierarchical generative structure without the need for structural analysis software frequently used in computer music humanization

Plymouth Electronic Archive and Research Library