Search CORE

32 research outputs found

A Survey on Framework for Improved Web Data Clustering Using Language Processing Technique

Author: Ms. Aashwini T Thakare, Prof. M. S. Chaudhari
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 31/12/2014

Nowadays the World Wide Web has become a very popular and interactive medium for transferring information. It is a massive repository of web pages and links that provides internet users with information on a vast range of topics. The web is huge, diverse and dynamic, which raises issues of scalability, multimedia data and temporal data. The growth of the web has resulted in a huge amount of information that is now freely available to users. Because of this heavy usage, log files are growing rapidly and becoming very large. Preprocessing plays a vital role in an efficient mining process, as log data is normally noisy and indistinct. Sessions and paths are reconstructed during preprocessing by appending missing pages. Additionally, the transactions that illustrate the behavior of users are constructed accurately in preprocessing by calculating the Reference Length of user access by means of byte rate, the clustering task the ability to capture the uncertainty among web user’s navigation performance

International Journal on Recent and Innovation Trends in Computing and Communication

Rough Sets Clustering and Markov model for Web Access Prediction

Author: Chimphlee Siriporn
Chimphlee Witcha
Ngadiman Mohd. Salihin
Salim Naomie
Srinoy Surat
Publication date: 01/05/2006

Discovering user access patterns from web access logs is increasingly important for building adaptive web servers that respond to the individual user’s behavior. The variety of user behaviors in accessing information also grows, which has a great impact on network utilization. In this paper, we present a rough set clustering approach to cluster web transactions from web access logs and use a Markov model for next access prediction. Using this approach, users can effectively mine web log records to discover and predict access patterns. We perform experiments using real web trace logs collected from www.dusit.ac.th servers. In order to improve its prediction ratio, the model includes a rough sets scheme in which a similarity measure based on upper approximation is used to compute the similarity between two sequences
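The next-access prediction step mentioned in the abstract above can be illustrated with a minimal first-order Markov sketch. This is not the authors' implementation: the rough set clustering stage is omitted entirely, and the function names (`fit_transitions`, `predict_next`) and the sample sessions are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def fit_transitions(sessions):
    """Count page-to-page transitions over all sessions (first-order Markov)."""
    counts = defaultdict(Counter)
    for session in sessions:
        # Pair each page with the page visited immediately after it.
        for current_page, next_page in zip(session, session[1:]):
            counts[current_page][next_page] += 1
    return counts


def predict_next(counts, page):
    """Return (most likely next page, estimated probability), or None if unseen."""
    followers = counts.get(page)
    if not followers:
        return None
    next_page, hits = followers.most_common(1)[0]
    return next_page, hits / sum(followers.values())


# Hypothetical sessions; a real run would parse them from a web access log.
sessions = [
    ["/home", "/courses", "/apply"],
    ["/home", "/courses", "/contact"],
    ["/home", "/news"],
    ["/courses", "/apply"],
]
counts = fit_transitions(sessions)
prediction = predict_next(counts, "/courses")
print(prediction)
```

With these sample sessions, `/courses` is followed by `/apply` in two of three observed transitions, so `/apply` is the predicted next page with an estimated probability of 2/3.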

Universiti Teknologi Malaysia Institutional Repository

Using Element Clustering to Increase the Efficiency of XML Schema Matching

Author: Jonker Willem
Keulen Maurice van
Smiljanic Marko
Publication date: 01/01/2006

Schema matching attempts to discover semantic mappings between elements of two schemas. Elements are cross compared using various heuristics (e.g., name, data-type, and structure similarity). Seen from a broader perspective, the schema matching problem is a combinatorial problem with an exponential complexity. This makes the naive matching algorithms for large schemas prohibitively inefficient. In this paper we propose a clustering based technique for improving the efficiency of large scale schema matching. The technique inserts clustering as an intermediate step into existing schema matching algorithms. Clustering partitions schemas and reduces the overall matching load, and creates a possibility to trade between the efficiency and effectiveness. The technique can be used in addition to other optimization techniques. In the paper we describe the technique, validate the performance of one implementation of the technique, and open directions for future research

Crossref

University of Twente Research Information

Search Result Clustering via Randomized Partitioning of Query-Induced Subgraphs

Author: Bradic Aleksandar
Publication date: 25/11/2008

In this paper, we present an approach to search result clustering, using partitioning of the underlying link graph. We define the notion of a "query-induced subgraph" and formulate the problem of search result clustering as a problem of efficient partitioning of the given subgraph into topic-related clusters. Also, we propose a novel algorithm for approximative partitioning of such a graph, which results in cluster quality comparable to that obtained by deterministic algorithms, while operating in more efficient computation time, suitable for practical implementations. Finally, we present a practical clustering search engine developed as a part of this research and use it to get results about the real-world performance of the proposed concepts. Comment: 16th Telecommunications Forum TELFOR 200

arXiv.org e-Print Archive

Directory of Open Access Journals

Interval set clustering of web users using modified Kohonen self-organizing maps based on the properties of rough sets

Author: Hogo Mofreh
Lingras Pawan
Snorek Miroslav
Publication venue: 'IOS Press'
Publication date: 01/09/2004

Web usage mining involves application of data mining techniques to discover usage patterns from web data. Clustering is one of the important functions in web usage mining. The likelihood of bad or incomplete web usage data is higher than in conventional applications. The clusters and associations in web usage mining do not necessarily have crisp boundaries. Researchers have studied the possibility of using fuzzy sets in web mining clustering applications. Recent attempts have adapted the K-means clustering algorithm as well as genetic algorithms based on rough sets to find interval sets of clusters. The genetic algorithm based clustering may not be able to handle large amounts of data. The K-means algorithm does not lend itself well to adaptive clustering. This paper proposes an adaptation of Kohonen self-organizing maps based on the properties of rough sets, to find the interval sets of clusters. Experiments are used to create interval set representations of clusters of web visitors on three educational web sites. The proposed approach has wider applications in other areas of web mining as well as data mining

Saint Mary's University, Halifax: Institutional Repository

Augmented Session Similarity Based Framework for Measuring Web User Concern from Web Server Logs

Author: Sisodia Dilip Singh
Publication venue: 'Insight Society'
Publication date: 07/06/2017

In this paper, an augmented session similarity based framework is proposed to measure web user concern from web server logs. The proposed framework considers the best usage similarity between two web sessions based on accessed page relevance and the URL based syntactic structure of the website within the session. The proposed framework is implemented using K-medoids clustering algorithms with independent and combined similarity measures. Cluster quality is evaluated by measuring average intra-cluster and inter-cluster distances. The experimental results show that the combined augmented session dissimilarity metric outperformed the independent augmented session dissimilarity measures in terms of cluster validity measures
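The cluster validity check described in the abstract above (average intra-cluster and inter-cluster distances) can be sketched as follows. This is an illustrative sketch only: it assumes a precomputed pairwise dissimilarity matrix and pairwise averaging, which may differ from the paper's exact definitions, and the function name `cluster_distances` and the sample values are hypothetical.

```python
from itertools import combinations


def cluster_distances(dist, labels):
    """Return (average intra-cluster distance, average inter-cluster distance).

    dist   -- symmetric matrix of pairwise session dissimilarities
    labels -- cluster label of each session, in the same order as dist
    """
    intra, inter = [], []
    for i, j in combinations(range(len(labels)), 2):
        # Same label: the pair contributes to intra-cluster distance.
        if labels[i] == labels[j]:
            intra.append(dist[i][j])
        else:
            inter.append(dist[i][j])

    def average(values):
        return sum(values) / len(values) if values else float("nan")

    return average(intra), average(inter)


# Hypothetical dissimilarities between four sessions, already clustered in two groups.
dist = [
    [0.0, 0.2, 0.9, 0.8],
    [0.2, 0.0, 0.7, 0.9],
    [0.9, 0.7, 0.0, 0.1],
    [0.8, 0.9, 0.1, 0.0],
]
labels = [0, 0, 1, 1]
avg_intra, avg_inter = cluster_distances(dist, labels)
print(avg_intra, avg_inter)
```

A lower intra-cluster average together with a higher inter-cluster average indicates more compact, better separated clusters, which is the sense in which one dissimilarity measure can be said to outperform another.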

International Journal on Advanced Science, Engineering and Information Technology

Overview of the Relational Analysis approach in Data-Mining and Multi-criteria Decision Making

Author: Jean-Francois Marcotorchino
Julien Ah-Pine
Publication venue: 'IntechOpen'
Publication date: 01/01/2010

In this chapter we introduce a general framework called the Relational Analysis approach and its related contributions and applications in the fields of data analysis, data mining and multi-criteria decision making. This approach was initiated by J.F. Marcotorchino and P. Michaud at the end of the 1970s and has generated many research activities. However, the aspects of this framework that we would like to focus on are of a theoretical kind. Indeed, we aim to recall the background and the basics of this framework, the unifying results and the modeling contributions that it has made possible. Besides, the main tasks that we are interested in are the ranking aggregation problem, the clustering problem and the block seriation problem. Those problems are combinatorial ones, and the computational considerations of such tasks in the context of the RA methodology will not be covered. However, among the list of references that we give throughout this chapter, there are numerous articles that the interested reader could consult to this end

IntechOpen

Crossref

Shopping hard or hardly shopping: Revealing consumer segments using clickstream data

Author: De Smedt Johannes
Lacka Ewelina
Zavali Melina
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 04/05/2021

Edinburgh Research Explorer

Finding Conceptual Document Clusters Based on Top-N Formal Concept Search: Pruning Mechanism and Empirical Effectiveness

Author: Makoto Haraguchi
Yoshiaki Okubo
Publication venue: 'IntechOpen'
Publication date: 26/04/2011

IntechOpen