971 research outputs found
Preference Mining Using Neighborhood Rough Set Model on Two Universes
Preference mining plays an important role in e-commerce and video websites for enhancing user satisfaction and loyalty. Some classical methods are not available for the cold-start problem when the user or the item is new. In this paper, we propose a new model, called parametric neighborhood rough set on two universes (NRSTU), to describe the user and item data structures. Furthermore, the neighborhood lower approximation operator is used for defining the preference rules. Then, we provide the means for recommending items to users by using these rules. Finally, we give an experimental example to show the details of NRSTU-based preference mining for cold-start problem. The parameters of the model are also discussed. The experimental results show that the proposed method presents an effective solution for preference mining. In particular, NRSTU improves the recommendation accuracy by about 19% compared to the traditional method
Generated rules for AIDS and e-learning classifier using rough set approach
The emergence and growth of internet usage has accumulated an extensive amount of data. These data contain a wealth of undiscovered valuable information and problems of incomplete data set may lead to observation error. This research explored a technique to analyze data that transforms meaningless data to meaningful information. The work focused on Rough Set (RS) to deal with incomplete data and rules derivation. Rules with high and low left-hand-side (LHS) support value generated by RS were used as query statements to form a cluster of data. The model was tested on AIDS blog data set consisting of 146 bloggers and E-Learning@UTM (EL) log data set comprising 23105 URLs. 5-fold and 10-fold cross validation were used to split the data. Naïve algorithm and Boolean algorithm as discretization techniques and Johnson’s algorithm (Johnson) and Genetic algorithm (GA) as reduction techniques were employed to compare the results. 5-fold cross validation tended to suit AIDS data well while 10-fold cross validation was the best for EL data set. Johnson and GA yielded the same number of rules for both data sets. These findings are significant as evidence in terms of accuracy that was achieved using the proposed mode
Generated rules for AIDS and e-learning classifier using rough set approach
The emergence and growth of internet usage has accumulated an extensive amount of data. These data contain a wealth of undiscovered valuable information and problems of incomplete data set may lead to observation error. This research explored a technique to analyze data that transforms meaningless data to meaningful information. The work focused on Rough Set (RS) to deal with incomplete data and rules derivation. Rules with high and low left-hand-side (LHS) support value generated by RS were used as query statements to form a cluster of data. The model was tested on AIDS blog data set consisting of 146 bloggers and E-Learning@UTM (EL) log data set comprising 23105 URLs. 5-fold and 10-fold cross validation were used to split the data. Naïve algorithm and Boolean algorithm as discretization techniques and Johnson’s algorithm (Johnson) and Genetic algorithm (GA) as reduction techniques were employed to compare the results. 5-fold cross validation tended to suit AIDS data well while 10-fold cross validation was the best for EL data set. Johnson and GA yielded the same number of rules for both data sets. These findings are significant as evidence in terms of accuracy that was achieved using the proposed mode
On Approaches to Discretisation of Stylometric Data and Conflict Resolution in Decision Making
23rd International Conference on Knowledge-Based and Intelligent Information & Engineering
Systems - early articleThe paper presents research on unsupervised and supervised discretisation of input data used in execution of stylometric tasks of authorship attribution. Basing on numeric characterisation of writing styles, recognition of authorship is performed by decision rules, as their transparent structure enhances understanding of discovered knowledge. The performance of rule classifiers, constructed in rough set approach, is studied in the context of a strategy employed for resolving conflicts. It is also contrasted with that of other selected inducers
A Comprehensive Survey on Rare Event Prediction
Rare event prediction involves identifying and forecasting events with a low
probability using machine learning and data analysis. Due to the imbalanced
data distributions, where the frequency of common events vastly outweighs that
of rare events, it requires using specialized methods within each step of the
machine learning pipeline, i.e., from data processing to algorithms to
evaluation protocols. Predicting the occurrences of rare events is important
for real-world applications, such as Industry 4.0, and is an active research
area in statistical and machine learning. This paper comprehensively reviews
the current approaches for rare event prediction along four dimensions: rare
event data, data processing, algorithmic approaches, and evaluation approaches.
Specifically, we consider 73 datasets from different modalities (i.e.,
numerical, image, text, and audio), four major categories of data processing,
five major algorithmic groupings, and two broader evaluation approaches. This
paper aims to identify gaps in the current literature and highlight the
challenges of predicting rare events. It also suggests potential research
directions, which can help guide practitioners and researchers.Comment: 44 page
Discretisation of conditions in decision rules induced for continuous
Typically discretisation procedures are implemented as a part of initial pre-processing of data, before knowledge mining is employed. It means that conclusions and observations are based on reduced data, as usually by discretisation some information is discarded. The paper presents a different approach, with taking advantage of discretisation executed after data mining. In the described study firstly decision rules were induced from real-valued features. Secondly, data sets were discretised. Using categories found for attributes, in the
third step conditions included in inferred rules were translated into discrete domain. The properties and performance of rule classifiers were tested in the domain of stylometric analysis of texts, where writing styles were defined through quantitative attributes of continuous nature. The performed experiments show that the proposed processing leads to sets of rules with significantly reduced sizes while maintaining quality of predictions, and allows to test many data discretisation methods at the acceptable computational costs
Fuzzy-Granular Based Data Mining for Effective Decision Support in Biomedical Applications
Due to complexity of biomedical problems, adaptive and intelligent knowledge discovery and data mining systems are highly needed to help humans to understand the inherent mechanism of diseases. For biomedical classification problems, typically it is impossible to build a perfect classifier with 100% prediction accuracy. Hence a more realistic target is to build an effective Decision Support System (DSS). In this dissertation, a novel adaptive Fuzzy Association Rules (FARs) mining algorithm, named FARM-DS, is proposed to build such a DSS for binary classification problems in the biomedical domain. Empirical studies show that FARM-DS is competitive to state-of-the-art classifiers in terms of prediction accuracy. More importantly, FARs can provide strong decision support on disease diagnoses due to their easy interpretability. This dissertation also proposes a fuzzy-granular method to select informative and discriminative genes from huge microarray gene expression data. With fuzzy granulation, information loss in the process of gene selection is decreased. As a result, more informative genes for cancer classification are selected and more accurate classifiers can be modeled. Empirical studies show that the proposed method is more accurate than traditional algorithms for cancer classification. And hence we expect that genes being selected can be more helpful for further biological studies
Parametric Rough Sets with Application to Granular Association Rule Mining
Granular association rules reveal patterns hidden in many-to-many relationships which are common in relational databases. In recommender systems, these rules are appropriate for cold-start recommendation, where a customer or a product has just entered the system. An example of such rules might be “40% men like at least 30% kinds of alcohol; 45% customers are men and 6% products are alcohol.” Mining such rules is a challenging problem due to pattern explosion. In this paper, we build a new type of parametric rough sets on two universes and propose an efficient rule mining algorithm based on the new model. Specifically, the model is deliberately defined such that the parameter corresponds to one threshold of rules. The algorithm benefits from the lower approximation operator in the new model. Experiments on two real-world data sets show that the new algorithm is significantly faster than an existing algorithm, and the performance of recommender systems is stable
- …