
    Fuzzy-Granular Based Data Mining for Effective Decision Support in Biomedical Applications

    Due to the complexity of biomedical problems, adaptive and intelligent knowledge discovery and data mining systems are highly needed to help humans understand the underlying mechanisms of diseases. For biomedical classification problems, it is typically impossible to build a perfect classifier with 100% prediction accuracy, so a more realistic target is an effective Decision Support System (DSS). In this dissertation, a novel adaptive Fuzzy Association Rules (FARs) mining algorithm, named FARM-DS, is proposed to build such a DSS for binary classification problems in the biomedical domain. Empirical studies show that FARM-DS is competitive with state-of-the-art classifiers in terms of prediction accuracy. More importantly, FARs can provide strong decision support for disease diagnosis because they are easy to interpret. This dissertation also proposes a fuzzy-granular method to select informative and discriminative genes from huge microarray gene expression datasets. Fuzzy granulation reduces information loss during gene selection; as a result, more informative genes for cancer classification are selected and more accurate classifiers can be built. Empirical studies show that the proposed method is more accurate than traditional algorithms for cancer classification. Hence, we expect that the selected genes will be more helpful for further biological studies.
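    The abstract does not spell out how a fuzzy association rule is scored, so the following is only a minimal sketch of the general idea, not FARM-DS itself. It assumes a triangular membership function for a fuzzy granule such as "expression is HIGH" and the min t-norm for combining it with a crisp class label; the names triangular and fuzzy_rule_stats are hypothetical. The sketch computes the fuzzy support and confidence of one rule of the form "value is HIGH -> class = target".

```python
from typing import Callable, Sequence, Tuple


def triangular(a: float, b: float, c: float) -> Callable[[float], float]:
    """Hypothetical fuzzy granule: triangular membership rising on [a, b], falling on [b, c]."""
    def mu(x: float) -> float:
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu


def fuzzy_rule_stats(values: Sequence[float], labels: Sequence[int],
                     mu: Callable[[float], float], target: int) -> Tuple[float, float]:
    """Fuzzy support and confidence of the rule 'value is A -> label == target'.

    With the min t-norm, a sample's joint degree is mu(x) if its label matches
    the target and 0 otherwise (the crisp class indicator is either 1 or 0).
    """
    memberships = [mu(v) for v in values]
    joint = sum(m for m, y in zip(memberships, labels) if y == target)
    total = sum(memberships)
    support = joint / len(values)
    confidence = joint / total if total > 0 else 0.0
    return support, confidence


if __name__ == "__main__":
    high = triangular(0.4, 1.0, 1.6)  # assumed granule "expression is HIGH"
    expr = [0.2, 0.5, 0.8, 0.9]       # toy expression values for one gene
    label = [0, 0, 1, 1]              # toy binary diagnosis labels
    print(fuzzy_rule_stats(expr, label, high, target=1))  # approx (0.375, 0.9)
```

    A rule with high fuzzy confidence and non-trivial support is the kind of human-readable statement the abstract credits with strong decision support; a real system would also search over many granules and genes rather than scoring one rule by hand.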

    Data mining in soft computing framework: a survey

    The present article provides a survey of the available literature on data mining using soft computing. A categorization is provided based on the different soft computing tools and their hybridizations used, the data mining function implemented, and the preference criterion selected by the model. The utility of the different soft computing methodologies is highlighted. Generally, fuzzy sets are suitable for handling issues related to the understandability of patterns, incomplete/noisy data, mixed media information, and human interaction, and they can provide approximate solutions faster. Neural networks are nonparametric, robust, and exhibit good learning and generalization capabilities in data-rich environments. Genetic algorithms provide efficient search methods to select a model from mixed media data, based on some preference criterion/objective function. Rough sets are suitable for handling different types of uncertainty in data. Some challenges to data mining and to the application of soft computing methodologies are indicated. An extensive bibliography is also included.

    New Fundamental Technologies in Data Mining

    The progress of data mining technology and its wide public popularity establish the need for a comprehensive text on the subject. The series of books entitled "Data Mining" addresses this need by presenting in-depth descriptions of novel mining algorithms and many useful applications. In addition to helping readers understand each section in depth, the two books present useful hints and strategies for solving problems in the following chapters. The contributing authors highlight many future research directions that will foster multidisciplinary collaboration and hence lead to significant developments in the field of data mining.

    Clustering Web Concepts Using Algebraic Topology

    In the Internet era, data is growing rapidly in both size and dimension. Much of it consists of web pages that represent human thoughts. These thoughts involve concepts and associations that we can capture, and using mathematics we can perform meaningful clustering of these pages. This project aims to introduce algebraic topology as a new problem-solving paradigm in data science. Professor Vasant Dhar, Editor-in-Chief of Big Data and professor at NYU, defines data science as the generalizable extraction of knowledge from data. The core concept of the semantic search engine project developed by my team is to extract high-frequency finite sequences of keywords by association mining. Each frequent finite keyword sequence represents a human concept in a document set, and the collection of such concepts represents a piece of human knowledge. In this sense, this MS project is a data science project. By regarding each keyword as an abstract vertex, a finite sequence of keywords becomes a simplex, and the collection becomes a simplicial complex. Based on this geometric view, a new type of clustering can be performed. If two concepts are connected by an n-simplex, we say that these two simplices are connected. The resulting connected components are captured by the homology theory of simplicial complexes. The input data for this project are ten thousand files about data mining downloaded from the IEEE Xplore library. Search engines today deal with large amounts of high-dimensional data, so applying these mathematical concepts and measuring connectivity across ten thousand files is a real challenge. Because algebraic topology is a completely new approach here, extensive testing has to be performed to verify the homology groups obtained.
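    The clustering idea can be illustrated with a minimal sketch, assuming each concept has already been mined as a set of keywords (the vertices of one simplex). It is not the project's code: the function q_connected_clusters and its q parameter, borrowed from the notion of q-connectivity in Q-analysis, are hypothetical. Two concepts are linked when they share at least q + 1 keywords, i.e. a common face of dimension at least q, and a union-find structure takes the transitive closure of those links. For q = 0 the number of clusters equals the number of connected components of the complex, which is the rank of the homology group H_0 (the 0th Betti number); the sketch counts components directly rather than computing homology by boundary matrices.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence


def _find(parent: List[int], i: int) -> int:
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def q_connected_clusters(concepts: Sequence[FrozenSet[str]], q: int = 0) -> List[List[int]]:
    """Group concept indices whose simplices are chained by shared faces of dimension >= q."""
    parent = list(range(len(concepts)))
    for i, j in combinations(range(len(concepts)), 2):
        # k shared keywords span a common (k - 1)-dimensional face.
        if len(concepts[i] & concepts[j]) >= q + 1:
            ri, rj = _find(parent, i), _find(parent, j)
            if ri != rj:
                parent[rj] = ri
    groups: Dict[int, List[int]] = {}
    for i in range(len(concepts)):
        groups.setdefault(_find(parent, i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Toy concepts standing in for frequent keyword sequences.
    concepts = [
        frozenset({"data", "mining", "association"}),
        frozenset({"association", "rule"}),
        frozenset({"neural", "network"}),
        frozenset({"network", "topology", "homology"}),
        frozenset({"fuzzy", "set"}),
        frozenset({"data", "mining", "clustering"}),
    ]
    print(q_connected_clusters(concepts, q=0))  # [[0, 1, 5], [2, 3], [4]]
    print(q_connected_clusters(concepts, q=1))  # [[0, 5], [1], [2], [3], [4]]
```

    The pairwise loop is quadratic in the number of concepts; for ten thousand files, an inverted index from keywords to concepts would avoid comparing concepts that share nothing.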

    Developing domain ontologies for course content

    Ontologies have the potential to play an important role in instructional design and the development of course content. They can be used to represent knowledge about content, supporting instructors in creating content and learners in accessing content in a knowledge-guided way. While ontologies exist for many subject domains, their quality and suitability for the educational context may be unclear, and for numerous subjects no ontologies exist at all. We present a method that lets domain experts, rather than ontology engineers, develop ontologies for use in the delivery of courseware content. We focus in particular on relationship types that allow us to model rich domains adequately.