6,101 research outputs found
Substructure Discovery Using Minimum Description Length and Background Knowledge
The ability to identify interesting and repetitive substructures is an
essential component to discovering knowledge in structural data. We describe a
new version of our SUBDUE substructure discovery system based on the minimum
description length principle. The SUBDUE system discovers substructures that
compress the original data and represent structural concepts in the data. By
replacing previously-discovered substructures in the data, multiple passes of
SUBDUE produce a hierarchical description of the structural regularities in the
data. SUBDUE uses a computationally-bounded inexact graph match that identifies
similar, but not identical, instances of a substructure and finds an
approximate measure of closeness of two substructures when under computational
constraints. In addition to the minimum description length principle, other
background knowledge can be used by SUBDUE to guide the search towards more
appropriate substructures. Experiments in a variety of domains demonstrate
SUBDUE's ability to find substructures capable of compressing the original data
and to discover structural concepts important to the domain. Description of
Online Appendix: This is a compressed tar file containing the SUBDUE discovery
system, written in C. The program accepts as input databases represented in
graph form, and will output discovered substructures with their corresponding
value.Comment: See http://www.jair.org/ for an online appendix and other files
accompanying this articl
Graph ambiguity
In this paper, we propose a rigorous way to define the concept of ambiguity in the domain of graphs. In past studies, the classical definition of ambiguity has been derived starting from fuzzy set and fuzzy information theories. Our aim is to show that also in the domain of the graphs it is possible to derive a formulation able to capture the same semantic and mathematical concept. To strengthen the theoretical results, we discuss the application of the graph ambiguity concept to the graph classification setting, conceiving a new kind of inexact graph matching procedure. The results prove that the graph ambiguity concept is a characterizing and discriminative property of graphs. (C) 2013 Elsevier B.V. All rights reserved
Recommended from our members
Discovering Services during Service-Based System Design Using UML
Recently, there has been a proliferation of service-based systems, i.e., software systems that are composed of autonomous services but can also use software code. In order to support the development of these systems, it is necessary to have new methods, processes, and tools. In this paper, we describe a UML-based framework to assist with the development of service-based systems. The framework adopts an iterative process in which software services that can provide functional and nonfunctional characteristics of a system being developed are discovered, and the identified services are used to reformulate the design models of the system. The framework uses a query language to represent structural, behavioral, and quality characteristics of services to be identified, and a query processor to match the queries against service registries. The matching process is based on distance measurements between the queries and service specifications. A prototype tool has been implemented. The work has been evaluated in terms of recall, precision, and performance measurements
Galton's Error and the Under-Representation of Systematic Risk
Our methodology of 'complete identification,' using simple algebraic geometry, throws new light on the continued commitment of Galton's Error in finance and the resulting misinformation of investors. Mutual funds conventionally advertise their relative systematic market risk, or 'betas,' to potential investors based on incomplete measurement by unidirectional bivariate projections: they commit Galton's Error by under-representing their systematic risk. Consequently, far too many mutual funds are marketed as 'defensive' and too few as 'aggressive.' Using our new methodology we found that, out of a total of 3,217 mutual funds, 2,047 funds (63.7%) claimed to be defensive based on the current industry standard methodology, but only 608 (18.9%) actually are. This under-representation of systematic risk leads to inefficiencies in the capital allocation process, since biased betas lead to mis-pricing of mutual funds. Our complete bivariate projection produces a correct representation of the epistemic uncertainty inherent in the bivariate measurement of relative market risk. Our conclusions have also serious consequences for the proper 'bench-marking' and recent regulatory proposals for the mutual funds industry.
Finding Near-Optimal Independent Sets at Scale
The independent set problem is NP-hard and particularly difficult to solve in
large sparse graphs. In this work, we develop an advanced evolutionary
algorithm, which incorporates kernelization techniques to compute large
independent sets in huge sparse networks. A recent exact algorithm has shown
that large networks can be solved exactly by employing a branch-and-reduce
technique that recursively kernelizes the graph and performs branching.
However, one major drawback of their algorithm is that, for huge graphs,
branching still can take exponential time. To avoid this problem, we
recursively choose vertices that are likely to be in a large independent set
(using an evolutionary approach), then further kernelize the graph. We show
that identifying and removing vertices likely to be in large independent sets
opens up the reduction space---which not only speeds up the computation of
large independent sets drastically, but also enables us to compute high-quality
independent sets on much larger instances than previously reported in the
literature.Comment: 17 pages, 1 figure, 8 tables. arXiv admin note: text overlap with
arXiv:1502.0168
Ontology-based composition and matching for dynamic cloud service coordination
Recent cross-organisational software service offerings, such as cloud computing, create higher integration needs.
In particular, services are combined through brokers and mediators, solutions to allow individual services to collaborate and their interaction to be coordinated are required. The need to address dynamic management - caused by cloud and on-demand environments - can be addressed through service coordination based on ontology-based composition and matching techniques. Our solution to composition and matching utilises a service coordination space that acts as a passive infrastructure for collaboration where users submit requests that are then selected and taken on by providers. We discuss the information models and the coordination principles of such a collaboration environment in terms of an ontology and its underlying description logics. We provide ontology-based solutions for structural composition of descriptions and matching between requested and provided services
Technology Mapping for Circuit Optimization Using Content-Addressable Memory
The growing complexity of Field Programmable Gate Arrays (FPGA's) is leading to architectures with high input cardinality look-up tables (LUT's). This thesis describes a methodology for area-minimizing technology mapping for combinational logic, specifically designed for such FPGA architectures. This methodology, called LURU, leverages the parallel search capabilities of Content-Addressable Memories (CAM's) to outperform traditional mapping algorithms in both execution time and quality of results. The LURU algorithm is fundamentally different from other techniques for technology mapping in that LURU uses textual string representations of circuit topology in order to efficiently store and search for circuit patterns in a CAM. A circuit is mapped to the target LUT technology using both exact and inexact string matching techniques. Common subcircuit expressions (CSE's) are also identified and used for architectural optimization---a small set of CSE's is shown to effectively cover an average of 96% of the test circuits. LURU was tested with the ISCAS'85 suite of combinational benchmark circuits and compared with the mapping algorithms FlowMap and CutMap. The area reduction shown by LURU is, on average, 20% better compared to FlowMap and CutMap. The asymptotic runtime complexity of LURU is shown to be better than that of both FlowMap and CutMap
- …