    Organising the knowledge space for software components

    Software development has become a distributed, collaborative process based on the assembly of off-the-shelf and purpose-built components. The selection of software components from component repositories, and the development of components for these repositories, requires an accessible information infrastructure that allows these components to be described and compared. General knowledge about software development is as important in this context as knowledge about the application domain of the software; together they form the two pillars on which the structural and behavioural properties of software components can be addressed. Form, effect, and intention are the essential aspects of process-based knowledge representation, with behaviour as a primary property. We investigate how this information space for software components can be organised to provide the required taxonomy, thesaurus, conceptual model, and logical framework functions. The focal point is an axiomatised ontology that, in addition to the usual static view on knowledge, intrinsically addresses the dynamics, i.e. the behaviour, of software. Modal logics are central here, providing a bridge between classical (static) knowledge representation approaches and the description and classification of behaviour and processes. We relate our discussion to the Web context, looking at Web services as components and the Semantic Web as the knowledge representation framework.
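    To make the form/effect/intention view concrete, the following sketch shows how a component description with a behavioural contract might be encoded. All class and field names are hypothetical illustrations, not part of the framework described above, and the matching check merely stands in for the logical entailment an axiomatised, modal-logic-based ontology would provide.

    # Minimal sketch: a component description with a behavioural contract.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class BehaviouralContract:
        precondition: str    # "form": constraints on the input state
        postcondition: str   # "effect": guaranteed state after execution
        intention: str       # informal purpose, tied to the domain ontology

    @dataclass(frozen=True)
    class ComponentDescription:
        name: str
        domain_concepts: tuple  # links into the application-domain ontology
        contract: BehaviouralContract

        def matches(self, required):
            # Naive syntactic check; a real framework would use modal-logic
            # entailment (does this contract refine the required one?).
            return (self.contract.precondition == required.precondition and
                    self.contract.postcondition == required.postcondition)

    # Example: a hypothetical currency-conversion Web service as a component.
    convert = ComponentDescription(
        name="CurrencyConverter",
        domain_concepts=("Money", "ExchangeRate"),
        contract=BehaviouralContract(
            precondition="amount >= 0 and source currency supported",
            postcondition="result = amount * rate(source, target)",
            intention="convert a monetary amount between currencies",
        ),
    )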

    Mapping Big Data into Knowledge Space with Cognitive Cyber-Infrastructure

    Big data research has attracted great attention in science, technology, industry and society. It is developing alongside the evolving scientific paradigm, the fourth industrial revolution, and the transformational innovation of technologies. However, its nature and fundamental challenges have not been fully recognized, and it has not yet formed a methodology of its own. This paper explores and answers the following questions: What is big data? What are the basic methods for representing, managing and analyzing big data? What is the relationship between big data and knowledge? Can we find a mapping from big data into knowledge space? What kind of infrastructure is required to support not only big data management and analysis but also knowledge discovery, sharing and management? What is the relationship between big data and the scientific paradigm? What is the nature and fundamental challenge of big data computing? A multi-dimensional perspective is presented toward a methodology of big data computing.

    Real-time analytics for complex structure data

    University of Technology Sydney, Faculty of Engineering and Information Technology.

    The advancement of data acquisition and analysis technology has resulted in many real-world data being dynamic and containing rich content and structured information. More specifically, with the fast development of information technology, many current real-world data are characterized by dynamic changes, such as new instances, new nodes and edges, and modifications to node content. Unlike traditional data, which are represented as feature vectors, data with complex relationships are often represented as graphs that capture both the content of the data entries and their structural relationships, where instances (nodes) are not only characterized by their content but are also subject to dependency relationships. Moreover, real-time availability is one of the defining features of today's data. Real-time analytics is dynamic analysis and reporting based on data entered into a system before the actual time of use; it emphasizes deriving immediate knowledge from dynamic data sources, such as data streams, so knowledge discovery and pattern mining must cope with complex, dynamic sources. However, how to combine structure information with node content information for accurate, real-time data mining remains a major challenge. Accordingly, this thesis focuses on real-time analytics for complex structure data. We explore instance correlation in complex structure data and utilize it to make mining tasks more accurate and applicable. Specifically, our objective is to combine node correlation with node content for three tasks: (1) graph stream classification, (2) super-graph classification and clustering, and (3) streaming network node classification.

    Understanding the role of structured patterns for graph classification: the thesis first reviews existing work on data mining from a complex-structure perspective. We then propose a graph factorization-based fine-grained representation model whose main objective is to use linear combinations of a set of discriminative cliques to represent graphs for learning. The optimization-oriented factorization approach ensures minimum information loss for graph representation and avoids the expensive sub-graph isomorphism validation process. Based on this idea, we propose a novel framework for fast graph stream classification.

    A new structure data classification algorithm: the second method introduces a new super-graph classification and clustering problem. Due to the inherent complexity of the representation, existing graph classification methods cannot be applied to super-graphs. We therefore propose a weighted random walk kernel that calculates the similarity between two super-graphs by assessing (a) the similarity between super-nodes of the super-graphs and (b) the common walks of the super-graphs. Our key contributions are: (1) a new super-node and super-graph structure that enriches existing graph representations for real-world applications; (2) a weighted random walk kernel considering node and structure similarities between graphs; (3) a mixed similarity considering the structured content inside super-nodes and the structural dependency between super-nodes; and (4) an effective kernel-based super-graph classification method with a sound theoretical basis. Empirical studies show that the proposed methods significantly outperform the state-of-the-art methods.
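    As a rough illustration of the weighted random walk kernel idea, the sketch below counts common walks on the direct product of two graphs, with each product node weighted by a node-similarity score. The function name, the geometric decay, and the truncation at a fixed walk length are assumptions for this sketch, not the thesis's actual implementation.

    # Minimal sketch: random walk kernel weighted by node similarity.
    import numpy as np

    def weighted_random_walk_kernel(A1, A2, node_sim, lam=0.1, max_len=5):
        """A1, A2: adjacency matrices of the two graphs.
        node_sim[i, j]: similarity between node i of graph 1 and node j of
        graph 2 (e.g. derived from the content inside super-nodes).
        lam: decay factor per walk step; max_len: truncation length."""
        n1, n2 = A1.shape[0], A2.shape[0]
        s = node_sim.reshape(-1)              # similarity of product nodes
        W = np.kron(A1, A2) * np.outer(s, s)  # weighted product-graph adjacency
        k, power = 0.0, np.eye(n1 * n2)
        for step in range(1, max_len + 1):    # common walks of length 1..max_len
            power = power @ W
            k += (lam ** step) * power.sum()
        return k

    # Toy usage: two identical triangles with uniform node similarity.
    A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
    print(weighted_random_walk_kernel(A, A, np.ones((3, 3))))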
    Real-time analytics framework for dynamic complex structure data: for streaming networks, the essential challenge is to properly capture the dynamic evolution of the node content and node interactions in order to support node classification. While streaming networks are dynamically evolving, over a short temporal period a subset of salient features is essentially tied to the network content and structures, and can therefore be used to characterize the network for classification. To achieve this goal, we propose to carry out streaming network feature selection (SNF) on the network and use the selected features as a gauge to classify unlabeled nodes. A Laplacian-based quality criterion is proposed to guide the node classification, where the Laplacian matrix is generated from node labels and network topology structures. Node classification is achieved by finding the class label that results in the minimal gauging value with respect to the selected features. By frequently updating the features selected from the network, node classification can quickly adapt to changes in the network for maximal performance gain. Experiments and comparisons on real-world networks demonstrate that the proposed streaming network node classification method (SNOC) captures dynamics in network structures and node content, and outperforms baseline approaches with significant performance gain.
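    The gauging step can be pictured with a small sketch: given a graph Laplacian built from the topology, each candidate label for an unlabeled node is scored by the smoothness of the resulting labeling, and the label with the minimal score wins. The one-hot encoding and the combinatorial Laplacian below are assumptions for illustration; the thesis's criterion is additionally tied to the selected streaming features.

    # Minimal sketch: pick the label minimising a Laplacian smoothness gauge.
    import numpy as np

    def classify_node(adj, labels, node, classes):
        """adj: symmetric adjacency matrix; labels: class ids with -1 for
        unlabeled nodes; node: index to classify; classes: candidate ids."""
        L = np.diag(adj.sum(axis=1)) - adj        # combinatorial Laplacian
        best_label, best_gauge = None, np.inf
        for c in classes:
            trial = labels.copy()
            trial[node] = c
            X = np.zeros((len(trial), len(classes)))
            for i, l in enumerate(trial):         # one-hot encode the labeling
                if l >= 0:
                    X[i, classes.index(l)] = 1.0
            gauge = np.trace(X.T @ L @ X)         # penalises label disagreement
            if gauge < best_gauge:                #   across edges
                best_label, best_gauge = c, gauge
        return best_label

    # Toy usage: on the path 0-1-2-3, node 3 inherits its neighbour's label.
    A = np.array([[0, 1, 0, 0], [1, 0, 1, 0],
                  [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
    print(classify_node(A, np.array([0, 0, 1, -1]), 3, [0, 1]))  # -> 1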

    Ontology of core data mining entities

    In this article, we present OntoDM-core, an ontology of core data mining entities. OntoDM-core defines the most essential data mining entities in a three-layered ontological structure comprising a specification, an implementation and an application layer. It provides a representational framework for the description of mining structured data, and in addition provides taxonomies of datasets, data mining tasks, generalizations, data mining algorithms and constraints, based on the type of data. OntoDM-core is designed to support a wide range of applications/use cases, such as semantic annotation of data mining algorithms, datasets and results; annotation of QSAR studies in the context of drug discovery investigations; and disambiguation of terms in text mining. The ontology has been thoroughly assessed following best practices in ontology engineering, is fully interoperable with many domain resources, and is easy to extend.
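    A minimal sketch of the three-layered structure described above, with all names hypothetical: each data mining entity is characterised at a specification layer (what it is), an implementation layer (an executable realisation), and an application layer (its use on concrete data).

    # Minimal sketch: the three ontological layers of a data mining entity.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Specification:       # abstract characterisation
        task: str              # e.g. a data mining task from the taxonomy
        data_type: str         # the type of structured data it applies to

    @dataclass(frozen=True)
    class Implementation:      # concrete, executable realisation
        algorithm: str
        parameters: dict

    @dataclass(frozen=True)
    class Application:         # a run of the implementation on real data
        dataset: str
        result: str

    @dataclass(frozen=True)
    class DataMiningEntity:    # one entity described across all three layers
        specification: Specification
        implementation: Implementation
        application: Application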