thesis

Deep Data Analysis on the Web

Abstract

Search engines are familiar to people all over the world, and most users prefer to search by keywords to open websites or retrieve information rather than type URLs directly. Collecting finite sequences of keywords that represent important concepts within a set of documents is therefore important; in other words, we need knowledge mining. We use a simplicial complex method to speed up concept mining; a previous CS 298 project under Dr. Lin studied this approach. The method is very fast: on a database with 1,257 columns and 65K rows, FP-growth takes 876 seconds to mine the concepts, while the simplicial complex method takes only 5 seconds. The collection of such concepts can be interpreted geometrically as a simplicial complex, which can be construed as the knowledge base of the set of documents. Furthermore, we use homology theory to analyze this knowledge base (deep data analysis). For example, when mining market basket data over the items {a, b, c, d}, suppose we find the frequent itemsets {abc, abd, acd, bcd}. These itemsets form the hollow boundary of a tetrahedron, whose homology group is H2 = Z (the integers as an Abelian group). A nonzero H2 indicates that although every three of the four items are frequently bought together, very few customers buy all four items {abcd} together; we may then analyze the possible causes.
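As a minimal sketch of the homology computation in the market basket example, the Python code below builds the simplicial complex generated by the frequent itemsets {abc, abd, acd, bcd} and computes its Betti numbers from the ranks of the boundary matrices over the rationals. The function names are hypothetical and this is not presented as the thesis implementation. Rational ranks give Betti numbers but not torsion; that is sufficient here, because for a 2-dimensional complex H2 is the kernel of the boundary map on 2-chains and is therefore free, so b2 = 1 corresponds to H2 = Z.

```python
from fractions import Fraction
from itertools import combinations


def closure(maximal_sets):
    """Hypothetical helper: all nonempty faces of the itemsets, keyed by dimension."""
    faces = {}
    for s in maximal_sets:
        items = tuple(sorted(s))
        for k in range(1, len(items) + 1):
            for face in combinations(items, k):
                faces.setdefault(k - 1, set()).add(face)
    return {d: sorted(fs) for d, fs in faces.items()}


def boundary_matrix(faces, d):
    """Matrix of the boundary map from d-simplices to (d-1)-simplices."""
    rows, cols = faces.get(d - 1, []), faces.get(d, [])
    index = {f: i for i, f in enumerate(rows)}
    matrix = [[0] * len(cols) for _ in rows]
    for j, simplex in enumerate(cols):
        for i in range(len(simplex)):
            # Removing the i-th vertex contributes with sign (-1)^i.
            matrix[index[simplex[:i] + simplex[i + 1:]]][j] = (-1) ** i
    return matrix


def rank(matrix):
    """Rank over the rationals via Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r, ncols = 0, (len(m[0]) if m else 0)
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def betti_numbers(maximal_sets):
    """Hypothetical helper: Betti numbers b_0..b_top of the generated complex."""
    faces = closure(maximal_sets)
    top = max(faces)
    ranks = {d: rank(boundary_matrix(faces, d)) for d in range(1, top + 1)}
    # b_d = dim C_d - rank(boundary out of C_d) - rank(boundary into C_d)
    return [len(faces[d]) - ranks.get(d, 0) - ranks.get(d + 1, 0)
            for d in range(top + 1)]


print(betti_numbers(["abc", "abd", "acd", "bcd"]))  # [1, 0, 1]
print(betti_numbers(["abcd"]))                      # [1, 0, 0, 0]
```

The first call reports one connected component and one 2-dimensional hole (b2 = 1). If {abcd} were also frequent, the solid tetrahedron would fill that hole and b2 would drop to 0, which is why a nonzero H2 signals that the four items are rarely bought together.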

Similar works