70,265 research outputs found

    Multi-purpose exploratory mining of complex data

    Get PDF
    Due to the increasing power of data acquisition and data storage technologies, a large amount of data sets with complex structure are collected in the era of data explosion. Instead of simple representations by low-dimensional numerical features, such data sources range from high-dimensional feature spaces to graph data describing relationships among objects. Many techniques exist in the literature for mining simple numerical data but only a few approaches touch the increasing challenge of mining complex data, such as high-dimensional vectors of non-numerical data type, time series data, graphs, and multi-instance data where each object is represented by a finite set of feature vectors. Besides, there are many important data mining tasks for high-dimensional data, such as clustering, outlier detection, dimensionality reduction, similarity search, classification, prediction and result interpretation. Many algorithms have been proposed to solve these tasks separately, although in some cases they are closely related. Detecting and exploiting the relationships among them is another important challenge. This thesis aims to solve these challenges in order to gain new knowledge from complex high-dimensional data. We propose several new algorithms combining different data mining tasks to acquire novel knowledge from complex high-dimensional data: ROCAT (Relevant Overlapping Subspace Clusters on Categorical Data) automatically detects the most relevant overlapping subspace clusters on categorical data. It integrates clustering, feature selection and pattern mining without any input parameters in an information theoretic way. The next algorithm MSS (Multiple Subspace Selection) finds multiple low-dimensional subspaces for moderately high-dimensional data, each exhibiting an interesting cluster structure. For better interpretation of the results, MSS visualizes the clusters in multiple low-dimensional subspaces in a hierarchical way. SCMiner (Summarization-Compression Miner) focuses on bipartite graph data, which integrates co-clustering, graph summarization, link prediction, and the discovery of the hidden structure of a bipartite graph data on the basis of data compression. Finally, we propose a novel similarity measure for multi-instance data. The Probabilistic Integral Metric (PIM) is based on a probabilistic generative model requiring few assumptions. Experiments demonstrate the effectiveness and efficiency of PIM for similarity search (multi-instance data indexing with M-tree), explorative data analysis and data mining (multi-instance classification). To sum up, we propose algorithms combining different data mining tasks for complex data with various data types and data structures to discover the novel knowledge hidden behind the complex data

    On the Application of Data Mining to Official Data

    Get PDF
    Retrieving valuable knowledge and statistical patterns from official data has a great potential in supporting strategic policy making. Data Mining (DM) techniques are well-known for providing flexible and efficient analytical tools for data processing. In this paper, we provide an introduction to applications of DM to official statistics and flag the important issues and challenges. Considering recent advancements in software projects for DM, we propose intelligent data control system design and specifications as an example of DM application in official data processing.Data mining, Official data, Intelligent data control system

    Establishment of a integrative multi-omics expression database CKDdb in the context of chronic kidney disease (CKD)

    Get PDF
    Complex human traits such as chronic kidney disease (CKD) are a major health and financial burden in modern societies. Currently, the description of the CKD onset and progression at the molecular level is still not fully understood. Meanwhile, the prolific use of high-throughput omic technologies in disease biomarker discovery studies yielded a vast amount of disjointed data that cannot be easily collated. Therefore, we aimed to develop a molecule-centric database featuring CKD-related experiments from available literature publications. We established the Chronic Kidney Disease database CKDdb, an integrated and clustered information resource that covers multi-omic studies (microRNAs, genomics, peptidomics, proteomics and metabolomics) of CKD and related disorders by performing literature data mining and manual curation. The CKDdb database contains differential expression data from 49395 molecule entries (redundant), of which 16885 are unique molecules (non-redundant) from 377 manually curated studies of 230 publications. This database was intentionally built to allow disease pathway analysis through a systems approach in order to yield biological meaning by integrating all existing information and therefore has the potential to unravel and gain an in-depth understanding of the key molecular events that modulate CKD pathogenesis

    Astrophysics in S.Co.P.E

    Get PDF
    S.Co.P.E. is one of the four projects funded by the Italian Government in order to provide Southern Italy with a distributed computing infrastructure for fundamental science. Beside being aimed at building the infrastructure, S.Co.P.E. is also actively pursuing research in several areas among which astrophysics and observational cosmology. We shortly summarize the most significant results obtained in the first two years of the project and related to the development of middleware and Data Mining tools for the Virtual Observatory

    ANTIDS: Self-Organized Ant-based Clustering Model for Intrusion Detection System

    Full text link
    Security of computers and the networks that connect them is increasingly becoming of great significance. Computer security is defined as the protection of computing systems against threats to confidentiality, integrity, and availability. There are two types of intruders: the external intruders who are unauthorized users of the machines they attack, and internal intruders, who have permission to access the system with some restrictions. Due to the fact that it is more and more improbable to a system administrator to recognize and manually intervene to stop an attack, there is an increasing recognition that ID systems should have a lot to earn on following its basic principles on the behavior of complex natural systems, namely in what refers to self-organization, allowing for a real distributed and collective perception of this phenomena. With that aim in mind, the present work presents a self-organized ant colony based intrusion detection system (ANTIDS) to detect intrusions in a network infrastructure. The performance is compared among conventional soft computing paradigms like Decision Trees, Support Vector Machines and Linear Genetic Programming to model fast, online and efficient intrusion detection systems.Comment: 13 pages, 3 figures, Swarm Intelligence and Patterns (SIP)- special track at WSTST 2005, Muroran, JAPA
    • …
    corecore