Intelligent Support for Information Retrieval of Web Documents
The main goal of this research was to investigate means of intelligent support for the retrieval of web documents. We have proposed the architecture of a web tool system, Trillian, which discovers users' interests without requiring their interaction and uses them to search autonomously for related web content. Discovered pages are suggested to the user. The discovery of user interests is based on analysis of the documents the user has previously visited. We have created a module that tracks the user's movement on the web completely transparently, logging both the visited URLs and the contents of the web pages. The post-analysis step is based on a variant of the suffix tree clustering algorithm. We focus primarily on the overall design of the Trillian architecture and on the process of discovering topics of interest. We have implemented an experimental prototype of Trillian and evaluated the quality, speed and usefulness of the proposed system. We have shown that clustering is a feasible technique for extracting interests from web documents. We consider the proposed architecture promising and suitable for future extensions.
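The clustering step can be illustrated with a simplified sketch of suffix tree clustering (STC) in Python. Several details are assumptions rather than parts of the Trillian design. Documents are plain whitespace-tokenised strings. A dictionary of all shared word sequences stands in for a real generalized suffix tree. Base clusters are scored as (number of documents) × (phrase length capped at 6). Two base clusters are merged when their document sets overlap by more than 50% in both directions. The function name `stc_clusters` is hypothetical.

```python
from collections import defaultdict


def stc_clusters(docs, top_k=10, overlap=0.5):
    """Simplified suffix tree clustering over whitespace-tokenised documents."""
    # 1. Phrase index: every contiguous word sequence -> documents containing it.
    #    (Stand-in for the internal nodes of a generalized suffix tree.)
    phrase_docs = defaultdict(set)
    for doc_id, text in enumerate(docs):
        words = text.lower().split()
        for i in range(len(words)):
            for j in range(i + 1, len(words) + 1):
                phrase_docs[tuple(words[i:j])].add(doc_id)

    # 2. Base clusters: phrases shared by at least two documents, scored by
    #    (number of documents) * (phrase length, capped at 6); keep the top_k.
    base = [(len(ids) * min(len(p), 6), p, frozenset(ids))
            for p, ids in phrase_docs.items() if len(ids) >= 2]
    base.sort(key=lambda t: (-t[0], t[1]))
    base = base[:top_k]

    # 3. Merge base clusters whose document sets overlap by more than
    #    `overlap` of each set; connected components become final clusters.
    parent = list(range(len(base)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a in range(len(base)):
        for b in range(a + 1, len(base)):
            da, db = base[a][2], base[b][2]
            shared = len(da & db)
            if shared / len(da) > overlap and shared / len(db) > overlap:
                parent[find(a)] = find(b)

    clusters = defaultdict(lambda: {"docs": set(), "phrases": []})
    for idx, (_, phrase, ids) in enumerate(base):
        cluster = clusters[find(idx)]
        cluster["docs"] |= ids
        cluster["phrases"].append(" ".join(phrase))
    return list(clusters.values())


pages = [
    "suffix tree clustering of web pages",
    "web pages about suffix tree clustering",
    "football match results today",
    "latest football match results",
]
for c in stc_clusters(pages):
    print(sorted(c["docs"]), c["phrases"][:2])
```

Each resulting cluster is labelled by its shared phrases, and these phrases could serve as candidate "topics of interest". A real STC implementation builds the suffix tree in linear time and removes stop words before indexing.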
Mining Traversal Patterns from Weighted Traversals and Graph
Many real-world problems can be modeled as a graph together with transactions that traverse it. For example, the link structure of web pages can be represented as a graph, and users' paths through the pages can be modeled as transactions traversing that graph. Finding important and valuable patterns in such graph-traversing transactions is therefore a meaningful task. Previous research on finding these patterns has proposed algorithms that simply find frequent patterns without considering weights on the traversals or on the graph. The limitation of these algorithms is that they have difficulty discovering more reliable and accurate patterns.
This thesis proposes two methods for discovering patterns that take into account the weights assigned to traversals or to the vertices of the graph. The first method discovers frequent traversal patterns when the traversal information carries weights. Weights that can be assigned to graph traversals include the travel time between two cities, or the time taken to move from one page to another when visiting a web site. To mine more accurate traversal patterns, this thesis uses statistical confidence intervals. A confidence interval is computed from the weights assigned to each edge over all traversals, and only traversals that fall within the confidence interval are accepted as valid. Applying this method makes it possible to mine more reliable traversal patterns. The thesis also presents a method for determining priorities among patterns using the mined patterns and the graph information, as well as an algorithm for improving performance.
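The filtering step of the first method might look like the sketch below. Several choices are assumptions made for illustration. The interval takes the form mean ± z·s of the weights observed on each edge. A traversal is rejected if any of its edge weights falls outside that edge's interval. Patterns are contiguous sub-paths. The function names are hypothetical. The thesis may define the interval and the validity test differently.

```python
import statistics
from collections import Counter, defaultdict


def edge_intervals(traversals, z=1.96):
    """Interval mean +/- z * sample stdev of the weights observed on each edge."""
    samples = defaultdict(list)
    for trav in traversals:
        for u, v, w in trav:
            samples[(u, v)].append(w)
    intervals = {}
    for edge, ws in samples.items():
        mean = statistics.fmean(ws)
        sd = statistics.stdev(ws) if len(ws) > 1 else 0.0
        intervals[edge] = (mean - z * sd, mean + z * sd)
    return intervals


def mine_frequent_paths(traversals, min_sup, z=1.96):
    """Count contiguous sub-paths, using only traversals whose weights are all in range."""
    intervals = edge_intervals(traversals, z)
    support = Counter()
    for trav in traversals:
        if not trav:
            continue
        if not all(intervals[(u, v)][0] <= w <= intervals[(u, v)][1] for u, v, w in trav):
            continue  # one out-of-range edge weight invalidates the whole traversal
        path = [trav[0][0]] + [v for _, v, _ in trav]
        # each distinct sub-path of >= 2 vertices counts once per traversal
        support.update({tuple(path[i:j])
                        for i in range(len(path))
                        for j in range(i + 2, len(path) + 1)})
    return {p: s for p, s in support.items() if s >= min_sup}


# Each traversal is a list of (from, to, weight) edges, e.g. seconds spent per link.
traversals = [
    [("A", "B", 10), ("B", "C", 5)],
    [("A", "B", 11), ("B", "C", 5)],
    [("A", "B", 9), ("B", "C", 5)],
    [("A", "B", 10), ("B", "C", 5)],
    [("A", "B", 60), ("B", "C", 5)],  # unusually slow A -> B transition
]
print(mine_frequent_paths(traversals, min_sup=4, z=1.0))
```

With z = 1.0, the interval for A → B is roughly [-2.4, 42.4]. The 60-second transition falls outside it, so its traversal is discarded and A → B → C has support 4 rather than 5. With the default z = 1.96, the same outlier would be admitted, because a single extreme value also inflates the sample standard deviation. A robust spread estimate is one possible refinement.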
The second method discovers weighted frequent traversal patterns when weights are assigned to the vertices of the graph. Vertex weights can represent, for example, the amount of information in each document of a web site or its importance. In this problem, determining frequent traversal patterns requires considering both the occurrence frequency of a pattern and the weights of the visited vertices. To this end, this thesis proposes an algorithm that uses the vertex weights to retain, rather than discard at each mining step, candidate patterns that may later become frequent. It also proposes an algorithm that reduces the number of candidate patterns to improve performance.
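For the second method, the following sketch shows how vertex weights can keep potentially frequent candidates alive. It assumes a weighted support of sup(P) × (mean vertex weight of P) and treats patterns as contiguous paths grown at their last vertex. Extending P can only lower its support. The mean weight of any extension cannot exceed the largest weight among P's own vertices and the vertices reachable from P's last vertex. So sup(P) times that maximum is an upper bound, and a candidate is pruned only when this bound falls below the threshold. The `"all"` option uses the largest weight in the whole graph instead, which gives a looser bound. These two bounds are one interpretation of the table-of-contents entries on estimating support bounds, not the thesis's exact formulas. All names are hypothetical.

```python
def reachable(graph, start):
    """Vertices reachable from `start` by following one or more edges."""
    seen, stack = set(), list(graph[start])
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(graph[v])
    return seen


def mine_weighted(graph, weights, traversals, min_wsup, bound="reachable"):
    """Level-wise mining of weighted frequent contiguous paths (min_wsup > 0)."""
    w_all = max(weights.values())

    def support(p):
        k = len(p)
        return sum(any(tuple(t[i:i + k]) == p for i in range(len(t) - k + 1))
                   for t in traversals)

    def upper(p):
        # Largest mean vertex weight that p or any extension of p could have.
        if bound == "all":
            return w_all
        return max(weights[v] for v in set(p) | reachable(graph, p[-1]))

    result, kept = {}, 0
    level = [(v,) for v in graph]
    while level:
        next_level = []
        for p in level:
            s = support(p)
            if s * upper(p) < min_wsup:
                continue  # neither p nor any extension can reach min_wsup
            kept += 1
            wsup = s * sum(weights[v] for v in p) / len(p)
            if wsup >= min_wsup:
                result[p] = wsup
            # Candidate generation: extend p along the graph's outgoing edges.
            next_level.extend(p + (u,) for u in graph[p[-1]])
        level = next_level
    return result, kept


graph = {"A": ["B"], "B": ["C"], "C": [], "X": ["Y"], "Y": []}
weights = {"A": 1.0, "B": 1.0, "C": 1.0, "X": 5.0, "Y": 1.0}
traversals = [["A", "B", "C"], ["A", "B", "C"], ["A", "B"], ["X", "Y"]]
for mode in ("all", "reachable"):
    patterns, kept = mine_weighted(graph, weights, traversals, 2.5, mode)
    print(mode, kept, sorted(patterns))
```

On this toy input both bounds return the same five patterns. However, the reachable-vertex bound keeps 5 candidates instead of 9, because the heavy vertex X cannot be reached from A, B or C.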
Through a variety of experiments, the two proposed methods were compared and analyzed in terms of execution time, the number of patterns generated, and other measures.
This thesis has proposed new methods for discovering frequent traversal patterns, both when the traversals carry weights and when the vertices of the graph carry weights. Applying these methods to fields such as web mining is expected to enable efficient restructuring of web sites, faster access to web documents, and the construction of personalized web documents for individual users.
Abstract
Chapter 1 Introduction
1.1 Overview
1.2 Motivations
1.3 Approach
1.4 Organization of Thesis
Chapter 2 Related Works
2.1 Itemset Mining
2.2 Weighted Itemset Mining
2.3 Traversal Mining
2.4 Graph Traversal Mining
Chapter 3 Mining Patterns from Weighted Traversals on Unweighted Graph
3.1 Definitions and Problem Statements
3.2 Mining Frequent Patterns
3.2.1 Augmentation of Base Graph
3.2.2 In-Mining Algorithm
3.2.3 Pre-Mining Algorithm
3.2.4 Priority of Patterns
3.3 Experimental Results
Chapter 4 Mining Patterns from Unweighted Traversals on Weighted Graph
4.1 Definitions and Problem Statements
4.2 Mining Weighted Frequent Patterns
4.2.1 Pruning by Support Bounds
4.2.2 Candidate Generation
4.2.3 Mining Algorithm
4.3 Estimation of Support Bounds
4.3.1 Estimation by All Vertices
4.3.2 Estimation by Reachable Vertices
4.4 Experimental Results
Chapter 5 Conclusions and Further Works
Reference
Mining of uncertain Web log sequences with access history probabilities
An uncertain data sequence is a sequence of data items, each of which exists with some level of doubt or probability. Each item in the sequence is represented by a label and a probability value between 0 and 1, referred to as its existential probability.
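Under the common assumption that items exist independently of one another, the probability that a pattern occurs in an uncertain sequence can be computed exactly with a small dynamic program. The expected support is then the sum of these probabilities over all sequences. This is a generic sketch of expected-support semantics for uncertain sequences, not the U-PLWAP algorithm itself, and the function names are hypothetical. The program relies on the fact that greedy left-to-right matching decides whether a pattern is a subsequence. It therefore only needs to track how many pattern items have been matched so far.

```python
def occurrence_probability(pattern, uncertain_seq):
    """P(pattern is a subsequence of the sequence that actually occurred).

    uncertain_seq is a list of (label, existential_probability) pairs and items
    are assumed to exist independently of one another.
    """
    m = len(pattern)
    # dist[j] = probability that greedy left-to-right matching has matched
    # exactly the first j pattern items so far.
    dist = [1.0] + [0.0] * m
    for label, prob in uncertain_seq:
        # Walk states backwards so one item advances each state at most once.
        for j in range(m - 1, -1, -1):
            if pattern[j] == label:
                moved = dist[j] * prob  # item exists and matches the next needed label
                dist[j] -= moved
                dist[j + 1] += moved
    return dist[m]


def expected_support(pattern, database):
    """Expected number of uncertain sequences that contain the pattern."""
    return sum(occurrence_probability(pattern, seq) for seq in database)


db = [
    [("a", 0.5), ("b", 0.8)],
    [("a", 0.5), ("a", 0.5), ("b", 1.0)],
]
print(expected_support(("a", "b"), db))  # 0.5*0.8 + (1 - 0.5*0.5)*1.0
```

When each pattern label occurs only once in a sequence, the result is simply the product of the matching probabilities. The dynamic program also covers sequences in which a label repeats, as in the second sequence above.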
Existing algorithms are either unsuitable or inefficient for discovering frequent sequences in uncertain data. This thesis presents a method for mining uncertain Web sequences that combines access history probabilities from several Web log sessions with features of the PLWAP Web sequential miner. The method, the Uncertain Position Coded Pre-order Linked Web Access Pattern (U-PLWAP) algorithm, mines frequent sequential patterns in uncertain Web logs. While PLWAP considers only a single session of Web logs, U-PLWAP takes multiple sessions and generates existential probabilities from them. Experiments show that U-PLWAP is at least 100% faster than U-Apriori and 33% faster than UF-growth. In addition, UF-growth does not take the order of items into account, so the results of U-PLWAP carry richer information.
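The "position coded" part of PLWAP refers to binary codes attached to the nodes of a WAP-tree, a prefix tree of access sequences. The leftmost child of a node gets its parent's code with "1" appended, and every other child gets its nearest left sibling's code with "0" appended. Node a is then an ancestor of node b exactly when a's code followed by "1" is a prefix of b's code. The "pre-order linked" part refers to chaining the nodes that share a label in pre-order. The sketch below builds such a tree, its codes and its per-label pre-order lists. It omits the mining step and the existential probabilities that U-PLWAP adds. The class and function names are hypothetical.

```python
class WapNode:
    def __init__(self, label):
        self.label = label
        self.count = 0
        self.children = {}  # dicts keep insertion order: first child = leftmost
        self.code = ""      # binary position code; the root keeps ""


def build_wap_tree(sequences):
    """Prefix tree of access sequences with PLWAP-style position codes."""
    root = WapNode(None)
    for seq in sequences:
        node = root
        for event in seq:
            if event not in node.children:
                node.children[event] = WapNode(event)
            node = node.children[event]
            node.count += 1
    _assign_codes(root)
    return root


def _assign_codes(node):
    left = None
    for child in node.children.values():
        # Leftmost child: parent's code + "1"; others: left sibling's code + "0".
        child.code = node.code + "1" if left is None else left.code + "0"
        _assign_codes(child)
        left = child


def is_ancestor(a, b):
    """a is an ancestor of b iff a.code + "1" is a prefix of b.code."""
    return b.code.startswith(a.code + "1")


def event_queues(root):
    """Pre-order lists of the nodes carrying each label (the header links)."""
    queues, stack = {}, list(reversed(list(root.children.values())))
    while stack:
        node = stack.pop()
        queues.setdefault(node.label, []).append(node)
        stack.extend(reversed(list(node.children.values())))
    return queues


root = build_wap_tree(["ab", "ac", "b"])
a = root.children["a"]
print(a.code, a.children["b"].code, a.children["c"].code, root.children["b"].code)
```

The codes turn the ancestor test into a string-prefix check. This lets a miner decide which occurrences of an event lie below a given node without walking the tree.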
Finding Generalized Path Patterns for Web Log Data Mining
Conducting data mining on web server logs involves determining frequently occurring access sequences. We examine the problem of finding traversal patterns in web logs, taking into account that irrelevant accesses to web documents may be interleaved within access patterns for navigational purposes. We define a general type of pattern that accounts for this fact, and we present a level-wise algorithm, based on the underlying structure of the web site, for determining these patterns. The performance of the algorithm and its sensitivity to several parameters are examined experimentally with synthetic data.
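A level-wise search of this kind can be sketched as follows. Several elements are assumptions for illustration rather than the paper's exact definitions. A session contains a pattern when the pattern's pages appear in it in order, with any other pages allowed in between. Candidates are extended only towards pages reachable from the pattern's last page in the site graph. All names are hypothetical.

```python
from collections import deque


def reachable_from(site, start):
    """Pages reachable from `start` by following one or more links."""
    seen, queue = set(), deque(site.get(start, ()))
    while queue:
        v = queue.popleft()
        if v not in seen:
            seen.add(v)
            queue.extend(site.get(v, ()))
    return seen


def contains(session, pattern):
    """True if the pattern's pages appear in order in the session, gaps allowed."""
    it = iter(session)
    return all(page in it for page in pattern)  # `in` consumes the iterator


def generalized_paths(sessions, site, min_sup):
    """Level-wise search for frequent generalized path patterns (min_sup >= 1)."""
    reach = {v: reachable_from(site, v) for v in site}
    level = [(v,) for v in site]
    frequent = {}
    while level:
        counted = {p: sum(contains(s, p) for s in sessions) for p in level}
        survivors = [p for p, c in counted.items() if c >= min_sup]
        frequent.update((p, counted[p]) for p in survivors)
        # Apriori-style: only frequent k-patterns are extended, and only
        # towards pages reachable from their last page in the site graph.
        level = [p + (v,) for p in survivors for v in sorted(reach[p[-1]])]
    return frequent


site = {"home": ["news", "sports"], "news": ["article"], "sports": ["article"], "article": []}
sessions = [
    ["home", "news", "article"],
    ["home", "sports", "article"],
    ["home", "news", "sports", "article"],
]
print(generalized_paths(sessions, site, min_sup=3))
```

The pattern ("home", "article") is reported even though there is no direct link between the two pages and no session visits them consecutively. The interleaved navigation pages are skipped, which a strict contiguous-path definition would not allow.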