DiffNodesets: An Efficient Structure for Fast Mining Frequent Itemsets
Mining frequent itemsets is an essential problem in data mining and plays an
important role in many data mining applications. In recent years, some itemset
representations based on node sets have been proposed, which have shown to be
very efficient for mining frequent itemsets. In this paper, we propose
DiffNodeset, a novel and more efficient itemset representation, for mining
frequent itemsets. Based on the DiffNodeset structure, we present an efficient
algorithm, named dFIN, for mining frequent itemsets. To achieve high efficiency,
dFIN finds frequent itemsets using a set-enumeration tree with a hybrid search
strategy and directly enumerates frequent itemsets without candidate generation
in some cases. To evaluate the performance of dFIN, we have conducted extensive
experiments comparing it with existing leading algorithms on a variety of real
and synthetic datasets. The experimental results show that dFIN is
significantly faster than these leading algorithms. Comment: 22 pages, 13 figures
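For readers unfamiliar with the underlying task, the following is a minimal sketch of frequent itemset mining in the naive level-wise (Apriori-style) manner. It illustrates the problem dFIN solves, not the DiffNodeset structure or the hybrid set-enumeration-tree search the paper proposes; the transactions are invented for illustration.

```python
def frequent_itemsets(transactions, min_support):
    """Naive level-wise (Apriori-style) frequent itemset mining."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}

    def support(itemset):
        # Number of transactions containing the itemset.
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    current = {frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support}
    result = {}
    k = 1
    while current:
        for s in current:
            result[s] = support(s)
        # Join frequent k-itemsets into (k+1)-candidates, then prune.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k + 1}
        current = {c for c in candidates if support(c) >= min_support}
        k += 1
    return result

txns = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
freq = frequent_itemsets(txns, min_support=3)
```

Structures like DiffNodesets exist precisely because this naive approach recomputes supports by scanning all transactions at every level, which is what the paper's node-set encodings avoid.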
MINING AND VERIFICATION OF TEMPORAL EVENTS WITH APPLICATIONS IN COMPUTER MICRO-ARCHITECTURE RESEARCH
Computer simulation programs are essential tools for scientists and engineers to understand a particular system of interest. As expected, the complexity of the software increases with the depth of the model used. In addition to the exigent demands of software engineering, verification of simulation programs is especially challenging because the models represented are complex and ridden with unknowns that will be discovered by developers in an iterative process. To manage such complexity, advanced verification techniques for continually matching the intended model to the implemented model are necessary. Therefore, the main goal of this research work is to design a useful verification and validation framework that is able to identify model representation errors and is applicable to generic simulators.
The framework that was developed and implemented consists of two parts. The first part is First-Order Logic Constraint Specification Language (FOLCSL) that enables users to specify the invariants of a model under consideration. From the first-order logic specification, the FOLCSL translator automatically synthesizes a verification program that reads the event trace generated by a simulator and signals whether all invariants are respected. The second part consists of mining the temporal flow of events using a newly developed representation called State Flow Temporal Analysis Graph (SFTAG). While the first part seeks an assurance of implementation correctness by checking that the model invariants hold, the second part derives an extended model of the implementation and hence enables a deeper understanding of what was implemented. The main application studied in this work is the validation of the timing behavior of micro-architecture simulators. The study includes SFTAGs generated for a wide set of benchmark programs and their analysis using several artificial intelligence algorithms. This work improves the computer architecture research and verification processes, as shown by the case studies and experiments that have been conducted.
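As a rough illustration of the first part of this framework, the sketch below replays an event trace against a set of invariant predicates and reports violations. It is a toy stand-in only: FOLCSL synthesizes such checkers automatically from first-order logic specifications, and the event fields and the single invariant shown here are hypothetical.

```python
def check_invariants(trace, invariants):
    """Replay an event trace and report which invariants are violated.

    trace: iterable of event dicts, e.g. {"type": "fetch", "cycle": 3}
    invariants: dict mapping name -> predicate over (prev_event, event)
    """
    violations = []
    prev = None
    for event in trace:
        for name, pred in invariants.items():
            if not pred(prev, event):
                violations.append((name, event))
        prev = event
    return violations

# Hypothetical micro-architecture invariant: cycle stamps never decrease.
invariants = {
    "monotonic_cycles":
        lambda prev, ev: prev is None or ev["cycle"] >= prev["cycle"],
}
trace = [{"type": "fetch", "cycle": 1},
         {"type": "decode", "cycle": 2},
         {"type": "fetch", "cycle": 1}]   # out-of-order timestamp
bad = check_invariants(trace, invariants)
```

The appeal of the FOLCSL approach is that the user writes only the declarative invariant, not the replay loop.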
Corporate Smart Content Evaluation
Nowadays, a wide range of information sources are available due to the
evolution of the web and the growing collection of data. Much of this
information is consumable and usable by humans but not understandable or
processable by machines. Some data may be directly accessible in web pages or
via data feeds, but most of the meaningful existing data is hidden within deep
web databases and enterprise information systems. Besides the inability to
access a wide range of data, manual processing by humans is laborious,
error-prone and no longer adequate. Semantic web technologies deliver
capabilities for
machine-readable, exchangeable content and metadata for automatic processing
of content. The enrichment of heterogeneous data with background knowledge
described in ontologies promotes reusability and supports automatic processing
of data. The establishment of “Corporate Smart Content” (CSC) - semantically
enriched data with high information content and sufficient benefits in
economic areas - is the main focus of this study. We describe three current
research areas in the field of CSC concerning scenarios and datasets
applicable for corporate applications, algorithms and research. Aspect-oriented
Ontology Development advances modular ontology development and
partial reuse of existing ontological knowledge. Complex Entity Recognition
enhances traditional entity recognition techniques to recognize clusters of
related textual information about entities. Semantic Pattern Mining combines
semantic web technologies with pattern learning to mine for complex models by
attaching background knowledge. This study introduces the afore-mentioned
topics by analyzing applicable scenarios with economic and industrial focus,
as well as research emphasis. Furthermore, a collection of existing datasets
for the given areas of interest is presented and evaluated. The target
audience includes researchers and developers of CSC technologies - people
interested in semantic web features, ontology development, automation,
extracting and mining valuable information in corporate environments. The aim
of this study is to provide a comprehensive overview of the three topics,
assist decision making in relevant scenarios, and guide the choice of
practical datasets for evaluating custom problem statements. Detailed
descriptions of the datasets' attributes and metadata should serve as a
starting point for individual ideas and approaches.
Content-Based Multimedia Recommendation Systems: Definition and Application Domains
The goal of this work is to formally provide a general definition of a multimedia recommendation system (MMRS), in particular a content-based MMRS (CB-MMRS), and to shed light on different applications of multimedia content for solving a variety of tasks related to recommendation. We would like to clarify that multimedia recommendation is not only about recommending a particular media type (e.g., music, video); rather, there exists a variety of other applications in which the analysis of multimedia input can be usefully exploited to provide recommendations of various kinds of information.
Anomaly Detection and Explanation Discovery on Event Streams
As enterprise information systems are collecting event streams from various sources, the ability of a system to automatically detect anomalous events and further provide human-readable explanations is of paramount importance. In this position paper, we argue for the need for a new type of data stream analytics that can address anomaly detection and explanation discovery in a single, integrated system, which not only offers increased business intelligence, but also opens up opportunities for improved solutions. In particular, we propose a two-pass approach to building such a system, highlight the challenges, and offer initial directions for solutions.
Automatic Video Classification
Within the past few years video usage has grown in a multi-fold fashion. One of the major reasons for this explosive video growth is the rising Internet bandwidth speeds. As of today, a significant human effort is needed to categorize these video data files. A successful automatic video classification method can substantially help to reduce the growing amount of cluttered video data on the Internet. This research project is based on finding a successful model for video classification. We have utilized various schemes of visual and audio data analysis methods to build a successful classification model. As far as the classification classes are concerned, we have handpicked News, Animation and Music video classes to carry out the experiments. A total of 445 video files from all three classes were analyzed to build classification models based on Naïve Bayes and Support Vector Machine classifiers. In order to gather the final results, we developed a “weighted voting” meta-classifier model. Our approach attained an average success rate of 90% across all three classification classes.
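The abstract does not detail the weighted-voting meta-classifier, but the general idea of weighted voting over base classifiers can be sketched as follows; the classifier names, weights, and labels below are invented for illustration and are not taken from the study.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Combine per-classifier class predictions by weighted voting.

    predictions: dict mapping classifier name -> predicted class label
    weights: dict mapping classifier name -> voting weight
             (e.g. each base classifier's validation accuracy)
    """
    scores = defaultdict(float)
    for clf, label in predictions.items():
        scores[label] += weights.get(clf, 1.0)
    # Return the class with the highest accumulated weight.
    return max(scores, key=scores.get)

# Hypothetical base classifiers over visual and audio features:
weights = {"visual_nb": 0.82, "audio_nb": 0.75, "visual_svm": 0.90}
preds = {"visual_nb": "News", "audio_nb": "Music", "visual_svm": "Music"}
label = weighted_vote(preds, weights)
```

Weighting votes by each base classifier's measured accuracy lets a strong classifier outvote a weak majority, which is the usual motivation for this kind of meta-classifier.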
Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last ten years, due to the
increased availability of documents in digital form and the ensuing need to
organize them. In the research community the dominant approach to this problem
is based on machine learning techniques: a general inductive process
automatically builds a classifier by learning, from a set of preclassified
documents, the characteristics of the categories. The advantages of this
approach over the knowledge engineering approach (consisting in the manual
definition of a classifier by domain experts) are a very good effectiveness,
considerable savings in terms of expert manpower, and straightforward
portability to different domains. This survey discusses the main approaches to
text categorization that fall within the machine learning paradigm. We will
discuss in detail issues pertaining to three different problems, namely
document representation, classifier construction, and classifier evaluation. Comment: Accepted for publication in ACM Computing Surveys
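The "general inductive process" the survey describes can be made concrete with a minimal multinomial Naive Bayes text classifier, one of the many learners such surveys cover; the toy documents and categories below are invented.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Learn a classifier from preclassified documents:
    docs is a list of (tokens, category) pairs."""
    cat_docs = Counter()               # documents per category
    cat_words = defaultdict(Counter)   # word counts per category
    vocab = set()
    for tokens, cat in docs:
        cat_docs[cat] += 1
        cat_words[cat].update(tokens)
        vocab.update(tokens)
    n_docs = sum(cat_docs.values())
    return cat_docs, cat_words, vocab, n_docs

def classify_nb(model, tokens):
    cat_docs, cat_words, vocab, n_docs = model
    best, best_lp = None, float("-inf")
    for cat in cat_docs:
        # Log prior + Laplace-smoothed log likelihoods.
        lp = math.log(cat_docs[cat] / n_docs)
        total = sum(cat_words[cat].values())
        for w in tokens:
            lp += math.log((cat_words[cat][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = cat, lp
    return best

docs = [(["cheap", "loan", "offer"], "spam"),
        (["meeting", "agenda", "notes"], "ham"),
        (["loan", "offer", "now"], "spam"),
        (["project", "meeting", "today"], "ham")]
model = train_nb(docs)
pred = classify_nb(model, ["loan", "offer"])
```

This is the bag-of-words document representation and probabilistic classifier construction the survey discusses, stripped to its simplest form.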
Intelligent Learning Automata-based Strategies Applied to Personalized Service Provisioning in Pervasive Environments
Doctoral dissertation in information and communication technology, University of Agder, Grimstad, 201
A hierarchical framework for recognising activities of daily life
In today’s working world, elderly people who are dependent can sometimes be
neglected by society. Statistically, after toddlers it is the elderly who are observed
to have higher accident rates while performing everyday activities. Alzheimer’s
disease is one of the major impairments that elderly people suffer from, and leads
to the elderly person not being able to live an independent life due to forgetfulness.
One way to support elderly people who aspire to live an independent life and
remain safe in their home is to find out what activities the elderly person is
carrying out at a given time and provide appropriate assistance or institute
safeguards.
The aim of this research is to create improved methods to identify tasks related to
activities of daily life and determine a person’s current intentions and so reason
about that person’s future intentions. A novel hierarchical framework has been
developed, which recognises sensor events and maps them to significant activities
and intentions. As privacy is becoming a growing concern, the monitoring of an
individual’s behaviour can be seen as intrusive. Hence, the monitoring is based
on simple non-intrusive sensors and tags on everyday objects that are used to
perform daily activities around the home. Specifically, there is no use of
any cameras or visual surveillance equipment, though the techniques developed
are still relevant in such a situation.
Models for task recognition and plan recognition have been developed and tested
on scenarios where the plans can be interwoven. Potential targets are people in
the first stages of Alzheimer’s disease, and typical routines used to sustain
meaningful activity have informed the structuring of the library of kernel plan
sequences. Evaluations have been carried out using volunteers conducting
activities of
daily life in an experimental home environment. The results generated from the
sensors have been interpreted, and the developed algorithms have been analysed.
The outcomes and findings of these experiments demonstrate that the developed
hierarchical framework is capable of carrying out activity recognition as well
as intention analysis, e.g. predicting what activity the person is most likely
to carry out next.
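As a much-simplified illustration of mapping tagged-object sensor events to activities of daily life, the sketch below matches an observed window of object tags against per-activity object signatures. It is a flat matcher, not the hierarchical framework the thesis develops, and the object tags and activity signatures are hypothetical.

```python
def recognise_activity(sensor_events, activity_signatures):
    """Pick the activity whose object signature best overlaps
    the set of object tags observed in the current window.

    sensor_events: set of object tags seen, e.g. {"kettle", "cup"}
    activity_signatures: dict activity -> set of objects typically used
    """
    def score(signature):
        # Jaccard overlap between observed objects and the signature.
        return len(sensor_events & signature) / len(sensor_events | signature)

    return max(activity_signatures, key=lambda a: score(activity_signatures[a]))

signatures = {
    "make_tea": {"kettle", "cup", "teabag"},
    "wash_up": {"sink", "sponge", "cup"},
}
activity = recognise_activity({"kettle", "cup"}, signatures)
```

A hierarchical system layers something like this: low-level sensor events are first grouped into tasks, and recognised tasks are in turn matched against plan libraries to infer intentions.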