12 research outputs found

    Hierarchical information clustering by means of topologically embedded graphs

    We introduce a graph-theoretic approach to extract clusters and hierarchies in complex data-sets in an unsupervised and deterministic manner, without the use of any prior information. This is achieved by building topologically embedded networks containing the subset of most significant links and analyzing the network structure. For a planar embedding, this method provides both the intra-cluster hierarchy, which describes the way clusters are composed, and the inter-cluster hierarchy, which describes how clusters gather together. We discuss performance, robustness and reliability of this method by first investigating several artificial data-sets, finding that it can significantly outperform other established approaches. Then we show that our method can successfully differentiate meaningful clusters and hierarchies in a variety of real data-sets. In particular, we find that the application to gene expression patterns of lymphoma samples uncovers biologically significant groups of genes which play key roles in diagnosis, prognosis and treatment of some of the most relevant human lymphoid malignancies. Comment: 33 pages, 18 figures, 5 tables.

    Searching for Macro-operators with Automatically Generated Heuristics

    Abstract. Macro search is used to derive solutions quickly for large search spaces at the expense of optimality. We present a novel way of building macro tables. Our contribution is twofold: (1) for the first time, we use automatically generated heuristics to find optimal macros, (2) due to the speed-up achieved by (1), we merge consecutive subgoals to reduce the solution lengths. We use the Rubik’s Cube to demonstrate our techniques. For this puzzle, a 44% improvement of the average solution length was achieved over macro tables built with previous techniques.

    PSVN: A Vector Representation for Production Systems

    In this paper we present a production system which acts on fixed length vectors of labels. Our goal is to automatically generate heuristics to search the state space for shortest paths between states efficiently. The heuristic values which guide search in the state space are obtained by searching for the shortest path in an abstract space derived from the definition of the original space. In PSVN, a state is a fixed length vector of labels and abstractions are generated by simply mapping the set of labels to another smaller set of labels (domain abstraction). A domain abstraction on labels induces an abstract space that preserves important properties of the original space while usually being significantly smaller in size. It is guaranteed that the shortest path between two states in the original space is at least as long as the shortest path between their images in the abstract space. Hence, such abstractions provide admissible heuristics for search algorithms such as A* and IDA*. The mapping of states and operators can be efficiently obtained by applying the domain map on the labels. We explore important properties of state spaces defined in PSVN and abstractions generated by domain maps. Despite its simplicity, PSVN is capable of defining all finitely generated permutation groups and such benchmark problems as Rubik's Cube, the sliding-tile puzzles and the Blocks World.
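
    The admissibility argument in the abstract above can be illustrated with a minimal sketch. It is not PSVN itself: the toy operators (adjacent swaps), the label names and the particular domain map are all assumptions chosen for illustration.

```python
from collections import deque
from itertools import permutations

def neighbors(state):
    """Toy operators (an assumption): swap any two adjacent positions."""
    for i in range(len(state) - 1):
        s = list(state)
        s[i], s[i + 1] = s[i + 1], s[i]
        yield tuple(s)

def abstract(state, domain_map):
    """Apply a domain abstraction by renaming every label in the vector."""
    return tuple(domain_map[x] for x in state)

def bfs_distances(goal):
    """Shortest distance from every reachable state to goal.
    The swap operators are symmetric, so searching backward from goal is valid."""
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        s = queue.popleft()
        for n in neighbors(s):
            if n not in dist:
                dist[n] = dist[s] + 1
                queue.append(n)
    return dist

goal = ("a", "b", "c", "d")
# Hypothetical domain map: labels "c" and "d" become indistinguishable.
domain_map = {"a": "a", "b": "b", "c": "x", "d": "x"}

true_dist = bfs_distances(goal)                           # 24 original states
abstract_dist = bfs_distances(abstract(goal, domain_map)) # 12 abstract states

def h(state):
    """Heuristic value: distance between the images in the abstract space."""
    return abstract_dist[abstract(state, domain_map)]

# The heuristic never overestimates the true distance.
admissible = all(h(s) <= true_dist[s] for s in permutations(goal))
```

    In this toy space the abstraction halves the number of states, and every original path maps to an abstract path of the same length, which is why the abstract distance is a lower bound.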

    Privacy in Data Mining Using Formal Methods

    Abstract. There is growing public concern about personal data collected by both private and public sectors. People have very little control over what kinds of data are stored and how such data is used. Moreover, the ability to infer new knowledge from existing data is increasing rapidly with advances in database and data mining technologies. We describe a solution which allows people to take control by specifying constraints on the ways in which their data can be used. User constraints are represented in formal logic, and organizations that want to use this data provide formal proofs that the software they use to process data meets these constraints. Checking the proof by an independent verifier demonstrates that user constraints are (or are not) respected by this software. Our notion of “privacy correctness” differs from general software correctness in two ways. First, properties of interest are simpler and thus their proofs should be easier to automate. Second, this kind of correctness is stricter; in addition to showing a certain relation between input and output is realized, we must also show that only operations that respect privacy constraints are applied during execution. We therefore have an intensional notion of correctness, rather than the usual extensional one. We discuss how our mechanism can be put into practice, and we present the technical aspects via an example. Our example shows how users can exercise control when their data is to be used as input to a decision tree learning algorithm. We have formalized the example and the proof of preservation of privacy constraints in Coq.

    Experiments with automatically created memory-based heuristics

    A memory-based heuristic is a function, h(s), stored in the form of a lookup table: h(s) is computed by mapping s to an index and then retrieving the corresponding entry in the table. In this paper we present a notation for describing state spaces, PSVN, and a method for automatically creating memory-based heuristics for a state space by abstracting its PSVN description. Two investigations of these automatically generated heuristics are presented. First, thousands of automatically generated heuristics are used to experimentally investigate the conjecture by Korf [4] that m · t is a constant, where m is the size of a heuristic's lookup table and t is the number of nodes expanded when the heuristic is used to guide search. Second, a similar large-scale experiment is used to verify that Korf and Reid's complexity analysis [5] can be used to rapidly and reliably choose the best among a given set of heuristics.
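
    The lookup step described above (map s to an index, then read the table entry) can be sketched as follows. This is an illustrative toy, not the paper's implementation: the vector length, the adjacent-swap operators, the kept labels and the indexing scheme are all assumptions.

```python
from collections import deque

N = 4              # vector length (assumed toy size)
PATTERN = (0, 1)   # labels the abstraction keeps; all others become "*"
GOAL = (0, 1, 2, 3)

def index(state):
    """Map a state to a table index using the positions of the kept labels."""
    idx = 0
    for label in PATTERN:
        idx = idx * N + state.index(label)
    return idx

def abstract(state):
    """Replace every label outside PATTERN by the placeholder "*"."""
    return tuple(x if x in PATTERN else "*" for x in state)

def neighbors(state):
    """Toy operators (an assumption): swap any two adjacent positions."""
    for i in range(len(state) - 1):
        s = list(state)
        s[i], s[i + 1] = s[i + 1], s[i]
        yield tuple(s)

def build_table(goal):
    """Fill the lookup table by breadth-first search in the abstract space."""
    table = [None] * (N ** len(PATTERN))   # m = 16 slots, some never used
    start = abstract(goal)
    table[index(start)] = 0
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for n in neighbors(s):
            if table[index(n)] is None:
                table[index(n)] = table[index(s)] + 1
                queue.append(n)
    return table

TABLE = build_table(GOAL)

def h(state):
    """Memory-based heuristic: index the state, then read the stored entry."""
    return TABLE[index(state)]
```

    Because `index` depends only on where the kept labels sit, a concrete state and its abstract image share the same index, so `h` needs no search at lookup time.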

    Hierarchical heuristic search revisited

    Abstract. Pattern databases enable difficult search problems to be solved very quickly, but are large and time-consuming to build. They are therefore best suited to situations where many problem instances are to be solved, and less than ideal when only a few instances are to be solved. This paper examines a technique, hierarchical heuristic search, especially designed for the latter situation. The key idea is to compute, on demand, only those pattern database entries needed to solve a given problem instance. Our experiments show that Hierarchical IDA* can solve individual problems very quickly, up to two orders of magnitude faster than the time required to build an entire high-performance pattern database.
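
    The on-demand idea can be illustrated with a much simplified sketch. It is not Hierarchical IDA*: it uses a plain breadth-first search in a single abstract level and a dictionary cache, and the toy space, operators and kept labels are assumptions.

```python
from collections import deque

GOAL = (0, 1, 2, 3)
KEEP = {0, 1}      # labels the abstraction keeps (hypothetical choice)

def abstract(state):
    """Replace every label outside KEEP by the placeholder "*"."""
    return tuple(x if x in KEEP else "*" for x in state)

def neighbors(state):
    """Toy operators (an assumption): swap any two adjacent positions."""
    for i in range(len(state) - 1):
        s = list(state)
        s[i], s[i + 1] = s[i + 1], s[i]
        yield tuple(s)

def abstract_search(start, goal):
    """Breadth-first search for the shortest abstract path length."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        s, depth = queue.popleft()
        if s == goal:
            return depth
        for n in neighbors(s):
            if n not in seen:
                seen.add(n)
                queue.append((n, depth + 1))
    raise ValueError("goal not reachable in the abstract space")

cache = {}         # holds only the entries that have actually been requested

def h(state):
    """Compute a heuristic entry on first request, then reuse it."""
    a = abstract(state)
    if a not in cache:
        cache[a] = abstract_search(a, abstract(GOAL))
    return cache[a]
```

    Here the cache grows only with the abstract states actually queried, instead of covering the whole abstract space up front.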

    Shape Semantics from Shape Context

    Abstract. 3D models play an important role in many industrial applications. Therefore semantic processing for the purposes of comparing, cataloging and archiving shapes is a major concern. Most previous work considers comparisons based on the object’s overall geometry or in a reference frame which is computed from the object’s geometry alone, disregarding its context. There are also approaches which propose to match feature points to perform context alignment to better analyze a single element. In complex assemblies created in a CAD system, however, the parts (components and layers) are often explicitly marked and named and therefore the geometric context is evident. In this paper we show how this can be exploited with the aid of Knowledge Management tools to establish accurate frames of reference where the individual shapes can be better analyzed.

    Integrated Modeling of Shape Semantics for Industrial Design

    Abstract. Modeling with 3D shapes plays a significant role in Industrial Design. Therefore semantic processing and interpretation of shapes and surfaces in CAD environments is an important problem. In this paper, we outline our architecture which integrates Knowledge Management and shape geometry for the purposes of cataloging, archiving and querying shapes. Unlike previous work, in our framework the textual annotation and geometry are closely integrated. We use domain knowledge encoded in a knowledge base to infer the geometric context of individual design elements and use this information to better analyze the corresponding shape. Our proposed architecture could be adapted in CAD applications where there is a well understood conceptual layering of the product assembly and the individual components are of similar geometry.