35 research outputs found

    The impact of noise at different data attributes

    Get PDF
    Academic works by faculty members of Suranaree University of Technology

    Multiple principal component analyses and projective clustering

    Get PDF

    Density-biased clustering based on reservoir sampling

    Get PDF

    Weighted K-means for density-biased clustering

    Get PDF

    The discovery of top-k DNA frequent patterns with approximate method

    No full text
    Top-k frequent pattern discovery is an association-analysis task concerned with automatically extracting the k most correlated and interesting patterns from large databases. Current studies in association mining concentrate on how to efficiently find all objects that frequently co-occur. Given a set of objects with m features, there are almost 2^m frequent patterns to consider. Because DNA data are normally very high-dimensional, frequent pattern discovery from genetic data is a computationally expensive problem. We therefore devise an approximate approach to tackle it. The proposed method uses a sliding-window scheme to estimate data density and capture data characteristics from a small set of samples, then draws a set of representatives with the reservoir sampling technique. These representatives are subsequently used in the main process of frequent pattern mining. The algorithm has been implemented in Erlang, a functional programming language with inherent support for pattern matching. The experimental results confirm the efficiency and reliability of our approximate method.
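    As an illustration of the sampling step named in this abstract, the following Python sketch draws a fixed-size uniform sample from a transaction stream with standard reservoir sampling (Algorithm R) and would hand that sample to a downstream pattern miner. It is a minimal, generic sketch under assumed names (reservoir_sample, the toy A/C/G/T item alphabet); the paper's actual implementation is in Erlang and also includes the sliding-window density estimation, which is not reproduced here.

        import random

        def reservoir_sample(stream, k, rng=None):
            # Algorithm R: keep a uniform random sample of k items from a
            # stream whose total length is not known in advance.
            rng = rng or random.Random()
            reservoir = []
            for i, item in enumerate(stream):
                if i < k:
                    reservoir.append(item)
                else:
                    # Item i+1 replaces a reservoir slot with probability k/(i+1).
                    j = rng.randint(0, i)
                    if j < k:
                        reservoir[j] = item
            return reservoir

        # Hypothetical transaction stream over a toy DNA-like alphabet.
        rng = random.Random(42)
        stream = (frozenset(rng.sample("ACGT", rng.randint(1, 4))) for _ in range(100_000))
        representatives = reservoir_sample(stream, k=1_000, rng=rng)
        # 'representatives' would then feed the frequent-pattern mining step.
        print(len(representatives), representatives[:3])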

    Density Estimation technique for data stream classification

    No full text
    One Lecturer, One Research Work project, year 254

    The Application of Inductive Logic Programming to Support Semantic Query Optimization

    No full text
    Inductive logic programming (ILP) is a recently emerging subfield of machine learning that aims to overcome the limitations of most attribute-value learning algorithms by adopting the more powerful language of first-order logic. Employing the successful learning techniques of ILP to learn interesting characteristics among database relations is of particular interest to the knowledge discovery in databases research community. However, most existing ILP systems are general-purpose learners, which means users have to know how to tune several factors of the learner to best suit the task at hand. One such factor with great impact on the efficiency of ILP learning is how to specify the language bias: a restriction on the format (or syntax) of clauses allowed in the hypothesis space. If the language is too weak, the search space is very large and learning efficiency suffers; conversely, if the language is too strong, the search space is so small that many interesting rules may be excluded from consideration. The purpose of this dissertation is to develop an algorithm that generates a potentially useful language bias better suited to inducing semantic constraints from database relations. These constraints serve as a major source of semantic knowledge for semantic query optimization in database query processing. The efficiency of the proposed algorithm was verified experimentally: the language bias specification produced by the algorithm was tested with the ILP system CLAUDIEN and compared against a number of alternative bias specifications. The learning results were compared in terms of the number of rules discovered, rule quality, total learning time, and the size of the search space. The experiments showed that the proposed algorithm is helpful for the induction of semantic rules.
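    To make the bias tradeoff described above concrete, here is a small Python sketch (not CLAUDIEN's actual bias language, which the abstract does not show) that enumerates candidate clause bodies over a hypothetical set of literals under a simple length-based language bias; counting the candidates shows how a weaker bias inflates the search space while a stronger one shrinks it and may leave multi-literal rules out. The literal names are illustrative assumptions only.

        from itertools import combinations

        # Hypothetical literals that could appear in a clause body for a
        # payroll-style database; names are illustrative only.
        LITERALS = ["dept(X, D)", "salary(X, S)", "greater(S, 50000)",
                    "manager(X)", "project(X, P)", "location(D, L)"]

        def hypothesis_space(literals, max_body_len):
            # A toy length-based language bias: admit every clause body made of
            # at most max_body_len distinct literals.
            for n in range(1, max_body_len + 1):
                yield from combinations(literals, n)

        # A weaker bias (longer bodies allowed) admits far more candidate clauses,
        # while a very strong bias may rule out rules that need several literals.
        for max_len in (1, 2, 3, len(LITERALS)):
            size = sum(1 for _ in hypothesis_space(LITERALS, max_len))
            print(f"max body length {max_len}: {size} candidate clause bodies")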