6,895 research outputs found

    Ontology of core data mining entities

    Get PDF
    In this article, we present OntoDM-core, an ontology of core data mining entities. OntoDM-core defines themost essential datamining entities in a three-layered ontological structure comprising of a specification, an implementation and an application layer. It provides a representational framework for the description of mining structured data, and in addition provides taxonomies of datasets, data mining tasks, generalizations, data mining algorithms and constraints, based on the type of data. OntoDM-core is designed to support a wide range of applications/use cases, such as semantic annotation of data mining algorithms, datasets and results; annotation of QSAR studies in the context of drug discovery investigations; and disambiguation of terms in text mining. The ontology has been thoroughly assessed following the practices in ontology engineering, is fully interoperable with many domain resources and is easy to extend

    Algorithmic and Statistical Perspectives on Large-Scale Data Analysis

    Full text link
    In recent years, ideas from statistics and scientific computing have begun to interact in increasingly sophisticated and fruitful ways with ideas from computer science and the theory of algorithms to aid in the development of improved worst-case algorithms that are useful for large-scale scientific and Internet data analysis problems. In this chapter, I will describe two recent examples---one having to do with selecting good columns or features from a (DNA Single Nucleotide Polymorphism) data matrix, and the other having to do with selecting good clusters or communities from a data graph (representing a social or information network)---that drew on ideas from both areas and that may serve as a model for exploiting complementary algorithmic and statistical perspectives in order to solve applied large-scale data analysis problems.Comment: 33 pages. To appear in Uwe Naumann and Olaf Schenk, editors, "Combinatorial Scientific Computing," Chapman and Hall/CRC Press, 201

    Intelligent Management and Efficient Operation of Big Data

    Get PDF
    This chapter details how Big Data can be used and implemented in networking and computing infrastructures. Specifically, it addresses three main aspects: the timely extraction of relevant knowledge from heterogeneous, and very often unstructured large data sources, the enhancement on the performance of processing and networking (cloud) infrastructures that are the most important foundational pillars of Big Data applications or services, and novel ways to efficiently manage network infrastructures with high-level composed policies for supporting the transmission of large amounts of data with distinct requisites (video vs. non-video). A case study involving an intelligent management solution to route data traffic with diverse requirements in a wide area Internet Exchange Point is presented, discussed in the context of Big Data, and evaluated.Comment: In book Handbook of Research on Trends and Future Directions in Big Data and Web Intelligence, IGI Global, 201

    Approximation contexts in addressing graph data structures

    Get PDF
    While the application of machine learning algorithms to practical problems has been expanded from fixed sized input data to sequences, trees or graphs input data, the composition of learning system has developed from a single model to integrated ones. Recent advances in graph based learning algorithms include: the SOMSD (Self Organizing Map for Structured Data), PMGraphSOM (Probability Measure Graph Self Organizing Map,GNN (Graph Neural Network) and GLSVM (Graph Laplacian Support Vector Machine). A main motivation of this thesis is to investigate if such algorithms, whether by themselves individually or modified, or in various combinations, would provide better performance over the more traditional artificial neural networks or kernel machine methods on some practical challenging problems. More succinctly, this thesis seeks to answer the main research question: when or under what conditions/contexts could graph based models be adjusted and tailored to be most efficacious in terms of predictive or classification performance on some challenging practical problems? There emerges a range of sub-questions including: how do we craft an effective neural learning system which can be an integration of several graph and non-graph based models? Integration of various graph based and non graph based kernel machine algorithms; enhancing the capability of the integrated model in working with challenging problems; tackling the problem of long term dependency issues which aggravate the performance of layer-wise graph based neural systems. This thesis will answer these questions. Recent research on multiple staged learning models has demonstrated the efficacy of multiple layers of alternating unsupervised and supervised learning approaches. This underlies the very successful front-end feature extraction techniques in deep neural networks. However much exploration is still possible with the investigation of the number of layers required, and the types of unsupervised or supervised learning models which should be used. Such issues have not been considered so far, when the underlying input data structure is in the form of a graph. We will explore empirically the capabilities of models of increasing complexities, the combination of the unsupervised learning algorithms, SOM, or PMGraphSOM, with or without a cascade connection with a multilayer perceptron, and with or without being followed by multiple layers of GNN. Such studies explore the effects of including or ignoring context. A parallel study involving kernel machines with or without graph inputs has also been conducted empirically
    corecore