Concept discovery innovations in law enforcement: a perspective.
In the past decades, the amount of information available to law enforcement agencies has increased significantly. Most of this information is in textual form; however, analyses have mainly focused on the structured data. In this paper, we give an overview of the concept discovery projects at the Amsterdam-Amstelland police, where Formal Concept Analysis (FCA) is being used as a text mining instrument. FCA is combined with statistical techniques such as Hidden Markov Models (HMM) and Emergent Self Organizing Maps (ESOM). The combination of this concept discovery and refinement technique with statistical techniques for analyzing high-dimensional data not only resulted in new insights but often in actual improvements of the investigation procedures.
Keywords: Formal concept analysis; Intelligence led policing; Knowledge discovery
Integration of Legacy Appliances into Home Energy Management Systems
The progressive installation of renewable energy sources requires the
coordination of energy consuming devices. At consumer level, this coordination
can be done by a home energy management system (HEMS). Interoperability issues
need to be solved among smart appliances as well as between smart and
non-smart, i.e., legacy devices. We expect current standardization efforts to
soon provide technologies to design smart appliances in order to cope with the
current interoperability issues. Nevertheless, common electrical devices affect
energy consumption significantly and therefore deserve consideration within
energy management applications. This paper discusses the integration of smart
and legacy devices into a generic system architecture and, subsequently,
elaborates the requirements and components which are necessary to realize such
an architecture including an application of load detection for the
identification of running loads and their integration into existing HEM
systems. We assess the feasibility of such an approach with a case study based
on a measurement campaign on real households. We show how the information of
detected appliances can be extracted in order to create device profiles
allowing for their integration and management within a HEMS.
Clickstream Data Analysis: A Clustering Approach Based on Mixture Hidden Markov Models
Nowadays, the availability of devices such as laptops and cell phones enables one to
browse the web at any time and place. As a consequence, a company needs to have a
website so as to maintain or increase customer loyalty and reach potential new customers.
Besides, acting as a virtual point-of-sale, the company portal allows it to obtain insights on
potential customers through clickstream data, web-generated data that track users' accesses
and activities in websites. However, these data are not easy to handle as they are complex,
unstructured and limited by lack of clear information about user intentions and goals.
Clickstream data analysis is a suitable tool for managing the complexity of these datasets,
obtaining a cleaned and processed sequential dataframe ready to identify and analyse
patterns.
Analysing clickstream data is important for companies as it enables them to understand differences in web user behaviour while they explore websites, how they move
from one page to another and what they select in order to define business strategies targeting specific types of potential customers. To obtain this level of insight it is pivotal to
understand how to exploit hidden information related to clickstream data.
This work presents the cleaning and pre-processing procedures for clickstream data
which are needed to get a structured sequential dataset and analyses these sequences by
the application of Mixture of discrete time Hidden Markov Models (MHMMs), a statistical tool suitable for clickstream data analysis and profile identification that has not been
widely used in this context. Specifically, the hidden Markov process accounts for a time-varying latent variable to handle uncertainty and groups together observed states based
on unknown similarity; fitting the model entails identifying both the number of mixture components relating to the subpopulations as well as the number of latent states for each latent Markov
chain.
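
The pre-processing step described above, turning raw click records into a structured sequential dataset, can be illustrated with a minimal sketch. It assumes each record is a `(user_id, timestamp_in_seconds, page)` tuple; the 30-minute session gap, the reload filter, and the function name `build_sequences` are hypothetical choices for illustration, not the procedure used in the thesis.

```python
from collections import defaultdict

# Hypothetical session timeout: a longer gap between clicks starts a new sequence.
SESSION_GAP_SECONDS = 30 * 60


def build_sequences(clicks, gap=SESSION_GAP_SECONDS):
    """Group (user_id, timestamp, page) records into per-session page sequences."""
    by_user = defaultdict(list)
    for user_id, timestamp, page in clicks:
        by_user[user_id].append((timestamp, page))

    sequences = []
    for user_id in sorted(by_user):
        current = []
        last_time = None
        for timestamp, page in sorted(by_user[user_id]):
            # Close the running session when the pause exceeds the gap.
            if last_time is not None and timestamp - last_time > gap:
                sequences.append((user_id, current))
                current = []
            # Drop immediate repeats of the same page (for example, reloads).
            if not current or current[-1] != page:
                current.append(page)
            last_time = timestamp
        if current:
            sequences.append((user_id, current))
    return sequences


clicks = [
    ("u1", 0, "home"),
    ("u1", 40, "rooms"),
    ("u1", 45, "rooms"),
    ("u1", 5000, "home"),
    ("u2", 10, "home"),
    ("u2", 70, "booking"),
]
sequences = build_sequences(clicks)
```

Each resulting sequence is a list of discrete page labels, which is the kind of input a discrete-time hidden Markov model expects as observed states.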
However, the application of MHMMs requires the identification of both the number
of components and states. Information Criteria (IC) are generally used for model selection in mixture hidden Markov models and, although their performance has been widely
studied for mixture models and hidden Markov models, they have received little attention
in the MHMM context. The most widely used criterion is BIC, although its performance for
these models depends on factors such as the number of components and sequence length.
Another class of model selection criteria is the Classification Criteria (CC). They were
defined specifically for clustering purposes and rely on an entropy measure to account for
separability between groups. These criteria are clearly the best option for our purpose, but
their application as model selection tools for MHMMs requires the definition of a suitable
entropy measure.
In the light of these considerations, this work proposes a classification criterion based
on an integrated classification likelihood approach for MHMMs that accounts for the two
latent classes in the model: the subpopulations and the hidden states. This criterion is
a modified ICL BIC, a classification criterion that was originally defined in the mixture
model context and used in hidden Markov models. ICL BIC is a suitable score to identify
the number of classes (components or states) and, thus, to extend it to MHMMs we defined a joint entropy accounting for both a component-related entropy and a state-related
conditional entropy.
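
To make the idea of an entropy-penalised criterion concrete, the following is a minimal sketch of the generic ICL BIC form (BIC plus twice a classification entropy), with the entropy split into a component part and a state part. It is an illustration under stated assumptions, not the thesis's definition: the abstract does not say how the state-related conditional entropy is computed, so it is passed in as a number, and the choice of `n_obs` (for example, number of sequences) is likewise left to the caller. All function names are hypothetical.

```python
import math


def classification_entropy(posteriors):
    """Entropy -sum_i sum_k p_ik * log(p_ik) over rows of posterior probabilities."""
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:  # 0 * log(0) is taken as 0
                total -= p * math.log(p)
    return total


def bic(log_lik, n_params, n_obs):
    """BIC on the 'smaller is better' scale."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


def icl_bic(log_lik, n_params, n_obs, component_entropy, state_entropy=0.0):
    """BIC plus twice a joint entropy (component part + state part)."""
    joint_entropy = component_entropy + state_entropy
    return bic(log_lik, n_params, n_obs) + 2.0 * joint_entropy


# Two toy posterior membership matrices: well separated vs. fully ambiguous.
crisp = [[1.0, 0.0], [0.0, 1.0]]
fuzzy = [[0.5, 0.5], [0.5, 0.5]]
h_crisp = classification_entropy(crisp)
h_fuzzy = classification_entropy(fuzzy)

score_crisp = icl_bic(-100.0, 5, 50, h_crisp)
score_fuzzy = icl_bic(-100.0, 5, 50, h_fuzzy)
```

With equal likelihood and parameter count, the ambiguous classification receives the larger (worse) score, which is how an entropy term rewards separability between groups.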
The thesis presents a Monte Carlo simulation study to compare selection criteria performance, the results of which point out the limitations of the most commonly used information criteria and demonstrate that the proposed criterion outperforms them in identifying components and states, especially in short sequences, which are quite common
in website accesses. The proposed selection criterion was applied to real clickstream data
collected from the website of a Sicilian company operating in the hospitality sector. The data
were modelled by an MHMM identifying clusters related to the browsing behaviour of
web users, which provided essential indications for developing new business strategies.
This thesis is structured as follows: after an introduction on the main topics in Chapter
1, we present the clickstream data and their cleaning and pre-processing steps in Chapter
2; Chapter 3 illustrates the structure and estimation algorithms of mixture hidden Markov
models; Chapter 4 presents a review of model selection criteria and the definition of the
proposed ICL BIC for MHMMs; the real clickstream data analysis follows in Chapter 5.
DATA MINING: A SEGMENTATION ANALYSIS OF U.S. GROCERY SHOPPERS
Consumers make choices about where to shop based on their preferences for a shopping environment and experience as well as the selection of products at a particular store. This study illustrates how retail firms and marketing analysts can utilize data mining techniques to better understand customer profiles and behavior. Among the key areas where data mining can produce new knowledge is the segmentation of customer databases according to demographics, buying patterns, geographics, attitudes, and other variables. This paper builds profiles of grocery shoppers based on their preferences for 33 retail grocery store characteristics. The data are from a representative, nationwide sample of 900 supermarket shoppers collected in 1999. Six customer profiles are found to exist, including (1) "Time Pressed Meat Eaters", (2) "Back to Nature Shoppers", (3) "Discriminating Leisure Shoppers", (4) "No Nonsense Shoppers", (5) "The One Stop Socialites", and (6) "Middle of the Road Shoppers". Each of the customer profiles is described with respect to the underlying demographics and income. Consumer shopping segments cut across most demographic groups but are somewhat correlated with income. Hierarchical lists of preferences reveal that low price is not among the top five most important store characteristics. Experience with and preferences for internet shopping show that of the 44% who have access to the internet, only 3% had used it to order food.
Keywords: Consumer/Household Economics, Food Consumption/Nutrition/Food Safety
REVIEW PAPER ON WEB PAGE PREDICTION USING DATA MINING
The continuous growth of the World Wide Web calls for new methods for designing and developing online information services, and for web usage mining methods that preprocess web data to determine how pages are accessed. The need to predict users' needs in order to improve the usability and user retention of a website is now more evident than ever. Without proper guidance, a visitor often wanders aimlessly without visiting important pages, loses interest, and leaves the site sooner than expected. The proposed system focuses on investigating efficient and effective sequential access pattern mining techniques for web usage data. The mined patterns are then used for matching and generating web links for online recommendations. A web page of interest application will be developed for evaluating the quality and effectiveness of the discovered knowledge.
Keywords: Webpage Prediction, Web Mining, MRF, ANN, KNN, GA
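
The idea of mining access patterns and matching them to recommend links can be sketched in a few lines. This is a deliberately simplified stand-in: it counts first-order page-to-page transitions with a minimum support threshold rather than performing full sequential access pattern mining, and the names `mine_transitions`, `recommend`, and `min_support` are hypothetical, not taken from the paper.

```python
from collections import Counter, defaultdict


def mine_transitions(sessions, min_support=2):
    """Count page -> next-page transitions seen at least min_support times."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current_page, next_page in zip(session, session[1:]):
            counts[current_page][next_page] += 1
    # Keep only transitions that are frequent enough to be treated as patterns.
    return {
        page: Counter({nxt: c for nxt, c in followers.items() if c >= min_support})
        for page, followers in counts.items()
    }


def recommend(transitions, current_page, top_n=2):
    """Return up to top_n most frequent follow-up pages for current_page."""
    followers = transitions.get(current_page, Counter())
    return [page for page, _ in followers.most_common(top_n)]


sessions = [
    ["home", "products", "cart"],
    ["home", "products", "checkout"],
    ["home", "about"],
    ["home", "products", "cart"],
]
transitions = mine_transitions(sessions)
links_from_home = recommend(transitions, "home")
links_from_products = recommend(transitions, "products")
```

A fuller system would mine longer patterns and match them against the visitor's recent path, but the same two stages (mine offline, match online) apply.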