149 research outputs found
CAS-MINE: Providing personalized services in context-aware applications by means of generalized rules
Context-aware systems acquire and exploit information on the user context to tailor services to a particular user, place, time, and/or event. Hence, they allow service providers to adapt their services to actual user needs, by offering personalized services depending on the current user context. Service providers are usually interested in profiling users both to increase client satisfaction and to broaden the set of offered services. Novel and efficient techniques are needed to tailor service supply to the user (or the user category) and to the situation in which he/she is involved. This paper presents the CAS-Mine framework to efficiently discover relevant relationships between user context data and currently requested services for both user and service profiling. CAS-Mine efficiently extracts generalized association rules, which provide a high-level abstraction of both user habits and service characteristics depending on the context. A lazy (analyst-provided) taxonomy evaluation performed on different attributes (e.g., a geographic hierarchy on spatial coordinates, a classification of provided services) drives the rule generalization process. Extracted rules are classified into groups according to their semantic meaning and ranked by means of quality indices, thus allowing a domain expert to focus on the most relevant patterns. Experiments performed on three context-aware datasets, obtained by logging user requests and context information for three real applications, show the effectiveness and the efficiency of the CAS-Mine framework in mining different valuable types of correlations between user habits, context information, and provided services.
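The idea of a generalized association rule can be illustrated with a small sketch. It is not the CAS-Mine algorithm: the taxonomy, the log records, and the function names below are all hypothetical, and the taxonomy is applied eagerly to every record rather than lazily as the abstract describes. The sketch only shows how replacing leaf values with their ancestors lets a rule reach a higher support than any of its specific variants.

```python
# Hypothetical taxonomy (assumption): maps a value to its parent category.
TAXONOMY = {
    "Turin": "Italy",
    "Milan": "Italy",
    "weather_forecast": "info_service",
    "news_feed": "info_service",
}

def generalize(record):
    """Return the record's items plus every ancestor found in the taxonomy."""
    items = set(record)
    for item in record:
        parent = TAXONOMY.get(item)
        while parent is not None:
            items.add(parent)
            parent = TAXONOMY.get(parent)
    return items

def support(itemset, records):
    """Fraction of records that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= r for r in records) / len(records)

def confidence(antecedent, consequent, records):
    """support(antecedent and consequent) / support(antecedent)."""
    sup_a = support(antecedent, records)
    if sup_a == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), records) / sup_a

# Hypothetical request log: (user location, requested service).
LOG = [
    ("Turin", "weather_forecast"),
    ("Milan", "news_feed"),
    ("Turin", "news_feed"),
    ("Paris", "weather_forecast"),
]
EXTENDED = [generalize(r) for r in LOG]

specific = support({"Turin", "weather_forecast"}, EXTENDED)   # 0.25
general = support({"Italy", "info_service"}, EXTENDED)        # 0.75
rule_conf = confidence({"Italy"}, {"info_service"}, EXTENDED)  # 1.0
```

In this toy log, the specific rule covers a single request, while its generalized form "Italy implies info_service" covers three of the four.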
Visual Analytics of Event Data using Multiple Mining Methods
Most researchers use a single method of mining to analyze event data. This paper uses case studies from two very different domains (electronic health records and cybersecurity) to investigate how researchers can gain breakthrough insights by combining multiple event mining methods in a visual analytics workflow. The aim of the health case study was to identify patterns of missing values, which was daunting because the 615 million missing values occurred in 43,219 combinations of fields. However, a workflow that involved exclusive set intersections (ESI), frequent itemset mining (FIM) and then two more ESI steps allowed us to identify that 82% of the missing values were from just 244 combinations. The cybersecurity case study’s aim was to understand users’ behavior from logs that contained 300 types of action, gathered from 15,000 sessions and 1,400 users. Sequential frequent pattern mining (SFPM) and ESI highlighted some patterns in common, and others that were not. For the latter, SFPM stood out for its ability to find action sequences that were buried within otherwise different sessions, and ESI detected subtle signals that were missed by SFPM. In summary, this paper demonstrates the importance of using multiple perspectives, complementary set mining methods and a diverse workflow when using visual analytics to analyze complex event data.
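A minimal sketch of the exclusive set intersection step for missing values follows. It is not the paper's workflow: the field names, the records, and both helper functions are hypothetical. Each record is assigned to the one combination of fields that is missing in it (hence "exclusive"), and the second helper measures how much of the total missingness the largest combinations explain.

```python
from collections import Counter

def missing_combinations(records, fields):
    """Count records by the exact set of fields whose value is None."""
    counts = Counter()
    for rec in records:
        combo = frozenset(f for f in fields if rec.get(f) is None)
        if combo:  # skip complete records
            counts[combo] += 1
    return counts

def coverage_of_top(counts, n):
    """Share of all missing values explained by the n heaviest combinations."""
    # A combination of m fields seen in k records accounts for m * k values.
    weights = sorted((len(c) * k for c, k in counts.items()), reverse=True)
    total = sum(weights)
    return sum(weights[:n]) / total if total else 0.0

# Hypothetical health records (assumption, not data from the paper).
FIELDS = ("age", "weight", "height")
RECORDS = [
    {"age": 40, "weight": None, "height": None},
    {"age": None, "weight": None, "height": None},
    {"age": 35, "weight": None, "height": None},
    {"age": 50, "weight": 70, "height": 180},
    {"age": None, "weight": 80, "height": 175},
]

combos = missing_combinations(RECORDS, FIELDS)
top1 = coverage_of_top(combos, 1)  # 0.5
top2 = coverage_of_top(combos, 2)  # 0.875
```

Here the single combination {weight, height} accounts for half of the eight missing values, the same kind of concentration the abstract reports at much larger scale.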
Study of various data mining techniques
The advent of computing technology has significantly influenced our lives, and two major areas of impact are business data processing and scientific computing. During the initial years of the development of computer techniques for business, computer professionals were concerned with designing files to store the data so that information could be efficiently retrieved. There were restrictions on storage size and on the speed of accessing the data. Needless to say, the activity was restricted to a very few, highly qualified professionals. Then came the era when the task was simplified by a DBMS [1]. The responsibilities for intricate tasks, such as declarative aspects of the program, were passed on to the database administrator, and users could pose their queries in simpler query languages.
Pattern mining under different conditions
New requirements and demands on pattern mining arise in modern applications that cannot be met using conventional methods. For example, in scientific research, scientists are more interested in unknown knowledge, which usually hides in significant but not frequent patterns. However, existing itemset mining algorithms are designed for very frequent patterns. Furthermore, scientists need to repeat an experiment many times to ensure reproducibility. A series of datasets is generated at once, waiting for clustering, and each can contain an unknown number of clusters with various densities and shapes. Using existing clustering algorithms is time-consuming because parameter tuning is necessary for each dataset. Many scientific datasets are extremely noisy: they contain considerably more noise points than in-cluster data points. Most existing clustering algorithms can only handle noise up to a moderate level. Temporal pattern mining is also important in scientific research. Existing temporal pattern mining algorithms only consider point-based events. However, most activities in the real world are interval-based, with a starting and an ending timestamp. This thesis develops novel pattern mining algorithms for various data mining tasks under different conditions.
The first part of this thesis investigates the problem of mining less frequent itemsets in transactional datasets. In contrast to existing frequent itemset mining algorithms, this part focuses on itemsets that occur infrequently. The algorithms NIIMiner, RaCloMiner, and LSCMiner are proposed to identify such itemsets efficiently. NIIMiner utilizes the negative itemset tree to extract all patterns whose support is below a given threshold in a top-down, depth-first manner. RaCloMiner combines existing bottom-up frequent itemset mining algorithms with a top-down itemset mining algorithm to achieve better performance in mining less frequent patterns. LSCMiner investigates the problem of mining less frequent closed patterns.
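To make the task concrete, the following brute-force sketch lists every itemset that occurs at least once but whose support is below a threshold. It is not NIIMiner, RaCloMiner, or LSCMiner, and it uses no negative itemset tree; the function name, the `max_size` cap, and the transactions are hypothetical. Exhaustive enumeration like this grows exponentially with transaction length, which is the cost the proposed algorithms aim to avoid.

```python
from collections import Counter
from itertools import combinations

def infrequent_itemsets(transactions, max_support, max_size=3):
    """Return {itemset: support} for observed itemsets with support < max_support."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        # Enumerate every subset of the transaction up to max_size items.
        for size in range(1, min(max_size, len(items)) + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    n = len(transactions)
    return {c: k / n for c, k in counts.items() if k / n < max_support}

# Hypothetical transactions (assumption).
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"a", "d"},
]

rare = infrequent_itemsets(TRANSACTIONS, max_support=0.5)
```

With a threshold of 0.5, the result contains the itemsets `("d",)`, `("a", "d")`, `("b", "c")`, and `("a", "b", "c")`, each with support 0.25.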
The second part of this thesis studies the problem of interval-based temporal pattern mining in the stream environment. Interval-based temporal patterns are sequential patterns in which each event carries both a starting and an ending timestamp. Existing approaches lack the ability to handle interval-based events and stream data. A novel interval-based temporal pattern mining algorithm for stream data is described in this part.
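The difference between point-based and interval-based events can be sketched as follows. This is not the thesis algorithm and does not handle streams; the `IntervalEvent` type, the coarse relation names (a simplification in the spirit of Allen's interval relations), and the example events are assumptions made for illustration. Two intervals can be related in several ways, whereas two points can only be ordered or simultaneous.

```python
from typing import NamedTuple

class IntervalEvent(NamedTuple):
    label: str
    start: float
    end: float

def relation(x, y):
    """Return (first_label, relation, second_label) for two interval events.

    The events are ordered by start time (longer interval first on ties),
    then classified into one of five coarse relations.
    """
    a, b = sorted((x, y), key=lambda e: (e.start, -e.end))
    if (a.start, a.end) == (b.start, b.end):
        name = "equals"
    elif a.end < b.start:
        name = "before"      # a finishes, gap, then b starts
    elif a.end == b.start:
        name = "meets"       # b starts exactly when a finishes
    elif a.end >= b.end:
        name = "contains"    # b lies entirely within a
    else:
        name = "overlaps"    # b starts inside a and finishes after it
    return (a.label, name, b.label)

# Hypothetical events (assumption).
fever = IntervalEvent("fever", 0, 10)
cough = IntervalEvent("cough", 5, 15)
rash = IntervalEvent("rash", 2, 4)
rest = IntervalEvent("rest", 10, 12)

r1 = relation(fever, cough)  # ("fever", "overlaps", "cough")
r2 = relation(rash, fever)   # ("fever", "contains", "rash")
r3 = relation(fever, rest)   # ("fever", "meets", "rest")
```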
The last part of this thesis studies new problems in clustering on numeric datasets. The first problem tackled in this part is shape alternation adaptivity in clustering. In applications such as scientific data analysis, scientists need to deal with a series of datasets generated from one experiment. Cluster sizes and shapes differ across those datasets. A kNN density-based clustering algorithm, kadaClus, is proposed to provide shape alternation adaptivity so that users do not need to tune parameters for each dataset. The second problem studied in this part is clustering in an extremely noisy dataset. Many real-world datasets contain considerably more noise points than in-cluster data points. A novel clustering algorithm, kenClus, is proposed to identify clusters of arbitrary shape in extremely noisy datasets. Both clustering algorithms are kNN-based and require only one parameter, k.
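As a rough illustration of why a single parameter k can suffice for a kNN-based density estimate, the sketch below computes the distance from each point to its k-th nearest neighbor. It is neither kadaClus nor kenClus, whose details the abstract does not give; the function name and the points are hypothetical. A small k-distance suggests a dense region, and a large one suggests an isolated point that may be noise.

```python
import math

def k_distance(points, k):
    """Distance from each point to its k-th nearest neighbor (brute force)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(
            math.dist(p, q) for j, q in enumerate(points) if j != i
        )
        result.append(dists[k - 1])
    return result

# Hypothetical data (assumption): a tight square of four points and one outlier.
POINTS = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]

kd = k_distance(POINTS, k=2)
# The four clustered points have a k-distance of 1.0;
# the outlier's k-distance is more than ten times larger.
```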
In each part, the efficiency and effectiveness of the presented techniques are thoroughly analyzed. Extensive experiments on synthetic and real-world datasets are conducted to show the benefits of the proposed algorithms over conventional approaches.