393 research outputs found
The Minimum Description Length Principle for Pattern Mining: A Survey
This is about the Minimum Description Length (MDL) principle applied to
pattern mining. The length of this description is kept to the minimum.
Mining patterns is a core task in data analysis and, beyond issues of
efficient enumeration, the selection of patterns constitutes a major challenge.
The MDL principle, a model selection method grounded in information theory, has
been applied to pattern mining with the aim to obtain compact high-quality sets
of patterns. After giving an outline of relevant concepts from information
theory and coding, as well as of work on the theory behind the MDL and similar
principles, we review MDL-based methods for mining various types of data and
patterns. Finally, we open a discussion on some issues regarding these methods,
and highlight currently active related data analysis problems
Recommended from our members
A Tool for Producing Verified, Explainable Proofs
Mathematicians are reluctant to use interactive theorem provers. In this thesis I argue that this is because proof assistants don't emphasise explanations of proofs; and that in order to produce good explanations, the system must create proofs in a manner that mimics how humans would create proofs. My research goals are to determine what constitutes a human-like proof and to represent human-like reasoning within an interactive theorem prover to create formalised, understandable proofs. Another goal is to produce a framework to visualise the goal states of this system.
To demonstrate this, I present HumanProof: a piece of software built for the Lean 3 theorem prover. It is used for interactively creating proofs that resemble how human mathematicians reason. The system provides a visual, hierarchical representation of the goal and a system for suggesting available inference rules. The system produces output in the form of both natural language and formal proof terms which are checked by Lean's kernel. This is made possible with the use of a structured goal state system which interfaces with Lean's tactic system which is detailed in Chapter 3.
In Chapter 4, I present the subtasks automation planning subsystem, which is used to produce equality proofs in a human-like fashion. The basic strategy of the subtasks system is break a given equality problem in to a hierarchy of tasks and then maintain a stack of these tasks in order to determine the order in which to apply equational rewriting moves. This process produces equality chains for simple problems without having to resort to brute force or specialised procedures such as normalisation. This makes proofs more human-like by breaking the problem into a hierarchical set of tasks in the same way that a human would.
To produce the interface for this software, I also created the ProofWidgets system for Lean 3. This system is detailed in Chapter 5. The ProofWidgets system uses Lean's metaprogramming framework to allow users to write their own interactive, web-based user interfaces to display within the VSCode editor and in an online web-editor. The entire tactic state is available to the rendering engine, and hence expression structure and types of subexpressions can be explored interactively. The ProofWidgets system also allows the user interface to interactively edit the proof document, enabling a truly interactive modality for creating proofs; human-like or not.
In Chapter 6, the system is evaluated by asking real mathematicians about the output of the system, and what it means for a proof to be understandable to them. The user group study asks participants to rank and comment on proofs created by HumanProof alongside natural language and pure Lean proofs. The study finds that participants generally prefer the HumanProof format over the Lean format. The verbal responses collected during the study indicate that providing intuition and signposting are the most important properties of a proof that aid understanding.EPSR
Front Matter - Soft Computing for Data Mining Applications
Efficient tools and algorithms for knowledge discovery in large data sets have been devised during the recent years. These methods exploit the capability of computers to search huge amounts of data in a fast and effective manner. However, the data to be analyzed is imprecise and afflicted with uncertainty. In the case of heterogeneous data sources such as text, audio and video, the data might moreover be ambiguous and partly conflicting. Besides, patterns and relationships of interest are usually vague and approximate. Thus, in order to make the information mining process more robust or say, human-like methods for searching and learning it requires tolerance towards imprecision, uncertainty and exceptions. Thus, they have approximate reasoning capabilities and are capable of handling partial truth. Properties of the aforementioned kind are typical soft computing. Soft computing techniques like Genetic
On the performance and programming of reversible molecular computers
If the 20th century was known for the computational revolution, what will the 21st be known for? Perhaps the recent strides in the nascent fields of molecular programming and biological computation will help bring about the âComing Era of Nanotechnologyâ promised in Drexlerâs âEngines of Creationâ. Though there is still far to go, there is much reason for optimism. This thesis examines the underlying principles needed to realise the computational aspects of such âenginesâ in a performant way. Its main body focusses on the ways in which thermodynamics constrains the operation and design of such systems, and it ends with the proposal of a model of computation appropriate for exploiting these constraints.
These thermodynamic constraints are approached from three different directions. The first considers the maximum possible aggregate performance of a system of computers of given volume, V, with a given supply of free energy. From this perspective, reversible computing is imperative in order to circumvent the Landauer limit. A result of Frank is refined and strengthened, showing that the adiabatic regime reversible computer performance is the best possible for any computerâquantum or classical. This therefore shows a universal scaling law governing the performance of compact computers of ~V^(5/6), compared to ~V^(2/3) for conventional computers. For the case of molecular computers, it is shown how to attain this bound. The second direction extends this performance analysis to the case where individual computational particles or sub-units can interact with one another. The third extends it to interactions with shared, non-computational parts of the system. It is found that accommodating these interactions in molecular computers imposes a performance penalty that undermines the earlier scaling result. Nonetheless, scaling superior to that of irreversible computers can be preserved, and appropriate mitigations and considerations are discussed. These analyses are framed in a context of molecular computation, but where possible more general computational systems are considered.
The proposed model, the ×-calculus, is appropriate for programming reversible molecular computers taking into account these constraints. A variety of examples and mathematical analyses accompany it. Moreover, abstract sketches of potential molecular implementations are provided. Developing these into viable schemes suitable for experimental validation will be a focus of future work
Application of Intermediate Multi-Agent Systems to Integrated Algorithmic Composition and Expressive Performance of Music
We investigate the properties of a new Multi-Agent Systems (MAS) for computer-aided composition called IPCS (pronounced âipp-sissâ) the Intermediate Performance Composition System which generates expressive performance as part of its compositional process, and produces emergent melodic structures by a novel multi-agent process. IPCS consists of a small-medium size (2 to 16) collection of agents in which each agent can perform monophonic tunes and learn monophonic tunes from other agents. Each agent has an affective state (an âartificial emotional stateâ) which affects how it performs the music to other agents; e.g. a âhappyâ agent will perform âhappierâ music. The agent performance not only involves compositional changes to the music, but also adds smaller changes based on expressive music performance algorithms for humanization. Every agent is initialized with a tune containing the same single note, and over the interaction period longer tunes are built through agent interaction. Agents will only learn tunes performed to them by other agents if the affective content of the tune is similar to their current affective state; learned tunes are concatenated to the end of their current tune. Each agent in the society learns its own growing tune during the interaction process. Agents develop âopinionsâ of other agents that perform to them, depending on how much the performing agent can help their tunes grow. These opinions affect who they interact with in the future. IPCS is not a mapping from multi-agent interaction onto musical features, but actually utilizes music for the agents to communicate emotions. In spite of the lack of explicit melodic intelligence in IPCS, the system is shown to generate non-trivial melody pitch sequences as a result of emotional communication between agents. The melodies also have a hierarchical structure based on the emergent social structure of the multi-agent system and the hierarchical structure is a result of the emerging agent social interaction structure. The interactive humanizations produce micro-timing and loudness deviations in the melody which are shown to express its hierarchical generative structure without the need for structural analysis software frequently used in computer music humanization
- âŚ