66,332 research outputs found
pSPADE: Mining sequential pattern using personalized support threshold value
As the web log data is considered as complex and temporal, applying Sequential Pattern Mining technique becomes a challenging task.The min sup threshold issue is highlighted - as a pattern is considered as frequent if it meets the specified min sup.If the min sup is high, few patterns are discovered else the mining process will be longer if too many patterns generated using low min sup. The format of web log data that creates consecutive occurring pages has made it difficult to generate frequent sequences. Also, as each userā behaviour is unique; using one min sup value for all users may affect the pattern generation. This research introduced a personalized minimum support threshold for each web users using their Median item access (support) value to curb this problem.The pSPADE performance was the highest on the discovery of userās origin and also interesting pattern discovery attribute
Learning lost temporal fuzzy association rules
Fuzzy association rule mining discovers patterns in transactions, such as shopping baskets in a supermarket, or Web page accesses by a visitor to a Web site. Temporal patterns can be present in fuzzy association rules because the underlying process generating the data can be dynamic. However, existing solutions may not discover all interesting patterns because of a previously unrecognised problem that is revealed in this thesis. The contextual meaning of fuzzy association rules changes because of the dynamic feature of data. The static fuzzy representation and traditional
search method are inadequate.
The Genetic Iterative Temporal Fuzzy Association Rule Mining (GITFARM) framework solves the problem by utilising flexible fuzzy representations from a fuzzy rule-based system (FRBS). The combination of temporal, fuzzy and itemset space was simultaneously searched with a genetic algorithm (GA) to overcome the problem. The framework transforms the dataset to a graph for efficiently searching the dataset. A choice of model in fuzzy representation provides a trade-off in usage between an approximate and descriptive model. A method for verifying the solution to the hypothesised problem was presented. The proposed GA-based solution was compared with a traditional approach that uses an exhaustive search method. It was shown how the GA-based solution discovered rules that the traditional approach did not. This shows that simultaneously searching for rules and membership functions with a GA is a suitable solution for mining temporal fuzzy association rules. So, in practice, more knowledge can be discovered for making well-informed decisions that would otherwise be lost with a traditional approach.EPSRC DT
Language in Our Time: An Empirical Analysis of Hashtags
Hashtags in online social networks have gained tremendous popularity during
the past five years. The resulting large quantity of data has provided a new
lens into modern society. Previously, researchers mainly rely on data collected
from Twitter to study either a certain type of hashtags or a certain property
of hashtags. In this paper, we perform the first large-scale empirical analysis
of hashtags shared on Instagram, the major platform for hashtag-sharing. We
study hashtags from three different dimensions including the temporal-spatial
dimension, the semantic dimension, and the social dimension. Extensive
experiments performed on three large-scale datasets with more than 7 million
hashtags in total provide a series of interesting observations. First, we show
that the temporal patterns of hashtags can be categorized into four different
clusters, and people tend to share fewer hashtags at certain places and more
hashtags at others. Second, we observe that a non-negligible proportion of
hashtags exhibit large semantic displacement. We demonstrate hashtags that are
more uniformly shared among users, as quantified by the proposed hashtag
entropy, are less prone to semantic displacement. In the end, we propose a
bipartite graph embedding model to summarize users' hashtag profiles, and rely
on these profiles to perform friendship prediction. Evaluation results show
that our approach achieves an effective prediction with AUC (area under the ROC
curve) above 0.8 which demonstrates the strong social signals possessed in
hashtags.Comment: WWW 201
Analysis of Users' Behavior in Structured e-Commerce Websites
Online shopping is becoming more and more common in our daily lives. Understanding users'' interests and behavior is essential to adapt e-commerce websites to customers'' requirements. The information about users'' behavior is stored in the Web server logs. The analysis of such information has focused on applying data mining techniques, where a rather static characterization is used to model users'' behavior, and the sequence of the actions performed by them is not usually considered. Therefore, incorporating a view of the process followed by users during a session can be of great interest to identify more complex behavioral patterns. To address this issue, this paper proposes a linear-temporal logic model checking approach for the analysis of structured e-commerce Web logs. By defining a common way of mapping log records according to the e-commerce structure, Web logs can be easily converted into event logs where the behavior of users is captured. Then, different predefined queries can be performed to identify different behavioral patterns that consider the different actions performed by a user during a session. Finally, the usefulness of the proposed approach has been studied by applying it to a real case study of a Spanish e-commerce website. The results have identified interesting findings that have made possible to propose some improvements in the website design with the aim of increasing its efficiency
Web Usage Mining with Evolutionary Extraction of Temporal Fuzzy Association Rules
In Web usage mining, fuzzy association rules that have a temporal property can provide useful knowledge about when associations occur. However, there is a problem with traditional temporal fuzzy association rule mining algorithms. Some rules occur at the intersection of fuzzy sets' boundaries where there is less support (lower membership), so the rules are lost. A genetic algorithm (GA)-based solution is described that uses the flexible nature of the 2-tuple linguistic representation to discover rules that occur at the intersection of fuzzy set boundaries. The GA-based approach is enhanced from previous work by including a graph representation and an improved fitness function. A comparison of the GA-based approach with a traditional approach on real-world Web log data discovered rules that were lost with the traditional approach. The GA-based approach is recommended as complementary to existing algorithms, because it discovers extra rules. (C) 2013 Elsevier B.V. All rights reserved
- ā¦