
    DiffNodesets: An Efficient Structure for Fast Mining Frequent Itemsets

    Mining frequent itemsets is an essential problem in data mining and plays an important role in many data mining applications. In recent years, several itemset representations based on node sets have been proposed and shown to be very efficient for mining frequent itemsets. In this paper, we propose DiffNodeset, a novel and more efficient itemset representation for mining frequent itemsets. Based on the DiffNodeset structure, we present an efficient algorithm, named dFIN, for mining frequent itemsets. To achieve high efficiency, dFIN finds frequent itemsets using a set-enumeration tree with a hybrid search strategy and, in some cases, directly enumerates frequent itemsets without candidate generation. To evaluate the performance of dFIN, we have conducted extensive experiments comparing it against existing leading algorithms on a variety of real and synthetic datasets. The experimental results show that dFIN is significantly faster than these leading algorithms.
    Comment: 22 pages, 13 figures
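    The abstract does not give enough detail to reproduce the DiffNodeset encoding or dFIN itself, but the underlying problem it accelerates can be sketched with a naive level-wise (Apriori-style) miner; all transaction data below is invented for illustration:

    ```python
    from itertools import combinations

    def frequent_itemsets(transactions, min_support):
        """Naive level-wise frequent itemset mining: a baseline sketch of the
        problem dFIN solves, not the DiffNodeset-based algorithm itself."""
        items = sorted({i for t in transactions for i in t})
        result = {}
        k = 1
        candidates = [frozenset([i]) for i in items]
        while candidates:
            # Count the support of each candidate itemset.
            frequent = {}
            for c in candidates:
                support = sum(1 for t in transactions if c <= set(t))
                if support >= min_support:
                    frequent[c] = support
            result.update(frequent)
            # Join frequent k-itemsets to generate (k+1)-candidates.
            freq_list = list(frequent)
            candidates = list({a | b for a, b in combinations(freq_list, 2)
                               if len(a | b) == k + 1})
            k += 1
        return result

    # Hypothetical toy transactions.
    tx = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    fi = frequent_itemsets(tx, min_support=3)
    ```

    The exponential candidate space this naive approach enumerates is exactly what compact representations such as node sets and DiffNodesets are designed to traverse cheaply.
    
    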

    MINING AND VERIFICATION OF TEMPORAL EVENTS WITH APPLICATIONS IN COMPUTER MICRO-ARCHITECTURE RESEARCH

    Computer simulation programs are essential tools for scientists and engineers to understand a particular system of interest. As expected, the complexity of the software increases with the depth of the model used. In addition to the exigent demands of software engineering, verification of simulation programs is especially challenging because the models represented are complex and ridden with unknowns that will be discovered by developers in an iterative process. To manage such complexity, advanced verification techniques for continually matching the intended model to the implemented model are necessary. Therefore, the main goal of this research work is to design a useful verification and validation framework that is able to identify model representation errors and is applicable to generic simulators. The framework that was developed and implemented consists of two parts. The first part is First-Order Logic Constraint Specification Language (FOLCSL) that enables users to specify the invariants of a model under consideration. From the first-order logic specification, the FOLCSL translator automatically synthesizes a verification program that reads the event trace generated by a simulator and signals whether all invariants are respected. The second part consists of mining the temporal flow of events using a newly developed representation called State Flow Temporal Analysis Graph (SFTAG). While the first part seeks an assurance of implementation correctness by checking that the model invariants hold, the second part derives an extended model of the implementation and hence enables a deeper understanding of what was implemented. The main application studied in this work is the validation of the timing behavior of micro-architecture simulators. The study includes SFTAGs generated for a wide set of benchmark programs and their analysis using several artificial intelligence algorithms. 
    This work improves the computer architecture research and verification processes, as demonstrated by the case studies and experiments that have been conducted.
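    The FOLCSL tooling itself is not shown in the abstract, but the core idea of its first part, checking that model invariants hold over a simulator's event trace, can be sketched as predicates evaluated against each trace event. The event schema and the invariant below are illustrative assumptions, not actual FOLCSL syntax:

    ```python
    def check_invariants(trace, invariants):
        """Return a list of (invariant_name, event) pairs for every event
        that violates an invariant."""
        violations = []
        for name, predicate in invariants.items():
            for event in trace:
                if not predicate(event, trace):
                    violations.append((name, event))
        return violations

    # Hypothetical micro-architecture events: cycle, kind, instruction id.
    trace = [
        {"cycle": 1, "kind": "fetch",  "id": 1},
        {"cycle": 3, "kind": "commit", "id": 1},
        {"cycle": 4, "kind": "commit", "id": 2},  # committed but never fetched
    ]

    invariants = {
        # Every committed instruction must have been fetched at an earlier cycle.
        "fetch_before_commit": lambda e, tr: e["kind"] != "commit" or any(
            f["kind"] == "fetch" and f["id"] == e["id"] and f["cycle"] < e["cycle"]
            for f in tr
        ),
    }

    bad = check_invariants(trace, invariants)
    ```

    A generated verification program in the described framework would play the role of `check_invariants`, synthesized automatically from a first-order logic specification rather than hand-written.
    
    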

    Corporate Smart Content Evaluation

    Nowadays, a wide range of information sources is available due to the evolution of the web and large-scale data collection. Much of this information is consumable and usable by humans but not understandable and processable by machines. Some data may be directly accessible in web pages or via data feeds, but most of the meaningful existing data is hidden within deep-web databases and enterprise information systems. Besides the inability to access a wide range of data, manual processing by humans is effortful, error-prone and no longer adequate. Semantic web technologies deliver capabilities for machine-readable, exchangeable content and metadata for automatic processing of content. The enrichment of heterogeneous data with background knowledge described in ontologies induces re-usability and supports automatic processing of data. The establishment of “Corporate Smart Content” (CSC) - semantically enriched data with high information content and sufficient benefits in economic areas - is the main focus of this study. We describe three current research areas in the field of CSC concerning scenarios and datasets applicable for corporate applications, algorithms and research. Aspect-oriented Ontology Development advances modular ontology development and partial reuse of existing ontological knowledge. Complex Entity Recognition enhances traditional entity recognition techniques to recognize clusters of related textual information about entities. Semantic Pattern Mining combines semantic web technologies with pattern learning to mine for complex models by attaching background knowledge. This study introduces the aforementioned topics by analyzing applicable scenarios with an economic and industrial focus, as well as a research emphasis. Furthermore, a collection of existing datasets for the given areas of interest is presented and evaluated.
    The target audience includes researchers and developers of CSC technologies - people interested in semantic web features, ontology development, automation, and extracting and mining valuable information in corporate environments. The aim of this study is to provide a comprehensive overview of the three topics, assist decision making in relevant scenarios, and help in choosing practical datasets for evaluating custom problem statements. Detailed descriptions of the attributes and metadata of the datasets should serve as a starting point for individual ideas and approaches.

    Content-Based Multimedia Recommendation Systems: Definition and Application Domains

    The goal of this work is to provide a formal, general definition of a multimedia recommendation system (MMRS), in particular a content-based MMRS (CB-MMRS), and to shed light on different applications of multimedia content for solving a variety of recommendation-related tasks. We wish to clarify that multimedia recommendation is not only about recommending a particular media type (e.g., music, video); rather, there exists a variety of other applications in which the analysis of multimedia input can be usefully exploited to provide recommendations of various kinds of information.
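    The defining trait of a CB-MMRS is that recommendations come from comparing content features of items with a user profile, rather than from other users' ratings. A minimal sketch of that mechanism, using cosine similarity over invented feature vectors (the feature names and catalog are hypothetical, not from the paper):

    ```python
    import math

    def cosine(u, v):
        """Cosine similarity between two equal-length feature vectors."""
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv) if nu and nv else 0.0

    def recommend(profile, catalog, k=2):
        """Rank catalog items by similarity of their content features
        to the user profile vector; return the top-k item names."""
        ranked = sorted(catalog.items(),
                        key=lambda kv: cosine(profile, kv[1]),
                        reverse=True)
        return [name for name, _ in ranked[:k]]

    # Hypothetical content features, e.g. (tempo, speech ratio, brightness).
    profile = [1.0, 0.0, 0.5]
    catalog = {
        "clip_a": [0.9, 0.1, 0.4],
        "clip_b": [0.0, 1.0, 0.0],
        "clip_c": [0.5, 0.5, 0.5],
    }
    top = recommend(profile, catalog)
    ```

    Real CB-MMRSs extract such vectors automatically from the audio, video, or image signal; the ranking step, however, reduces to this kind of profile-item similarity.
    
    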

    Anomaly Detection and Explanation Discovery on Event Streams

    As enterprise information systems are collecting event streams from various sources, the ability of a system to automatically detect anomalous events and further provide human-readable explanations is of paramount importance. In this position paper, we argue for the need for a new type of data stream analytics that can address anomaly detection and explanation discovery in a single, integrated system, which not only offers increased business intelligence, but also opens up opportunities for improved solutions. In particular, we propose a two-pass approach to building such a system, highlight the challenges, and offer initial directions for solutions.
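    The position paper proposes an integrated two-pass system; as a point of reference for its first concern, detection, a deliberately simple sliding-window z-score detector over a numeric event stream can be sketched as follows (window size and threshold are illustrative choices, and the explanation-discovery half of the proposal is not attempted here):

    ```python
    from collections import deque
    import statistics

    class StreamAnomalyDetector:
        """Flag stream values that deviate strongly from a sliding window
        of recent history. A minimal detection-only sketch."""

        def __init__(self, window=20, threshold=3.0):
            self.window = deque(maxlen=window)
            self.threshold = threshold

        def observe(self, value):
            """Return True if `value` looks anomalous, then add it to history."""
            anomalous = False
            if len(self.window) >= 5:  # wait for a little history first
                mean = statistics.fmean(self.window)
                stdev = statistics.pstdev(self.window)
                if stdev > 0 and abs(value - mean) / stdev > self.threshold:
                    anomalous = True
            self.window.append(value)
            return anomalous
    ```

    A single-pass detector like this can only say *that* an event is unusual; the paper's argument is precisely that a second pass over context is needed to also say *why*.
    
    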

    Automatic Video Classification

    Within the past few years, video usage has grown many-fold. One of the major reasons for this explosive growth is rising Internet bandwidth. As of today, significant human effort is needed to categorize these video files. A successful automatic video classification method can substantially help to reduce the growing amount of cluttered video data on the Internet. This research project aims to find a successful model for video classification. We have utilized various visual and audio data analysis methods to build a classification model. As for the classification classes, we handpicked the News, Animation and Music video classes for the experiments. A total of 445 video files from all three classes were analyzed to build classification models based on Naïve Bayes and Support Vector Machine classifiers. To gather the final results, we developed a “weighted voting” meta-classifier model. Our approach attained an average 90% success rate across all three classification classes.
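    The combination step described, a weighted-voting meta-classifier over the base classifiers' outputs, can be sketched in a few lines. The classifier names and weights below are invented for illustration; the project's actual weighting scheme is not specified in the abstract:

    ```python
    def weighted_vote(predictions, weights):
        """Combine per-classifier predictions by weighted voting.

        `predictions` maps classifier name -> predicted class label;
        `weights` maps classifier name -> vote weight (e.g. that
        classifier's validation accuracy). Returns the winning label.
        """
        scores = {}
        for clf, label in predictions.items():
            scores[label] = scores.get(label, 0.0) + weights.get(clf, 1.0)
        return max(scores, key=scores.get)

    # Hypothetical base classifiers disagreeing on one video.
    preds = {"nb_visual": "news", "svm_visual": "music", "nb_audio": "music"}
    w = {"nb_visual": 0.9, "svm_visual": 0.6, "nb_audio": 0.5}
    winner = weighted_vote(preds, w)
    ```

    Weighting votes by each base classifier's measured reliability lets a strong minority be outvoted only when the majority's combined credibility is higher.
    
    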

    Machine Learning in Automated Text Categorization

    The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation.
    Comment: Accepted for publication in ACM Computing Surveys
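    The inductive process the survey describes, learning category characteristics from preclassified documents, is classically instantiated by multinomial Naive Bayes over bag-of-words features. A minimal sketch with Laplace smoothing (the training documents below are invented):

    ```python
    import math
    from collections import Counter

    class NaiveBayesText:
        """Multinomial Naive Bayes over bag-of-words features: one classic
        inductive learner for text categorization, in miniature."""

        def fit(self, docs, labels):
            self.classes = set(labels)
            self.prior = {c: labels.count(c) / len(labels) for c in self.classes}
            self.counts = {c: Counter() for c in self.classes}
            for doc, c in zip(docs, labels):
                self.counts[c].update(doc.lower().split())
            self.vocab = {w for cnt in self.counts.values() for w in cnt}
            return self

        def predict(self, doc):
            def log_score(c):
                total = sum(self.counts[c].values())
                s = math.log(self.prior[c])
                for w in doc.lower().split():
                    # Laplace smoothing over the shared vocabulary.
                    s += math.log((self.counts[c][w] + 1) /
                                  (total + len(self.vocab)))
                return s
            return max(self.classes, key=log_score)
    ```

    The document representation here (a raw word-count bag) is the simplest option the survey's first problem area covers; weighting schemes such as tf-idf slot into the same structure.
    
    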

    Intelligent Learning Automata-based Strategies Applied to Personalized Service Provisioning in Pervasive Environments

    Doctoral dissertation in information and communication technology, University of Agder, Grimstad, 201

    A hierarchal framework for recognising activities of daily life

    In today’s world, elderly people who are dependent can sometimes be neglected by society. Statistically, after toddlers, it is the elderly who have the highest accident rates while performing everyday activities. Alzheimer’s disease is one of the major impairments elderly people suffer from, and the forgetfulness it causes can prevent an elderly person from living an independent life. One way to support elderly people who aspire to live independently and remain safe in their home is to determine what activity the person is carrying out at a given time and provide appropriate assistance or institute safeguards. The aim of this research is to create improved methods to identify tasks related to activities of daily life, determine a person’s current intentions, and so reason about that person’s future intentions. A novel hierarchal framework has been developed which recognises sensor events and maps them to significant activities and intentions. As privacy is a growing concern, the monitoring of an individual’s behaviour can be seen as intrusive. Hence, the monitoring is based on simple, non-intrusive sensors and tags on everyday objects that are used to perform daily activities around the home. Specifically, no cameras or visual surveillance equipment are used, though the techniques developed remain relevant in such a situation. Models for task recognition and plan recognition have been developed and tested on scenarios where the plans can be interwoven. The intended beneficiaries are people in the first stages of Alzheimer’s disease, and in structuring the library of kernel plan sequences, typical routines used to sustain meaningful activity have been used. Evaluations have been carried out with volunteers conducting activities of daily life in an experimental home environment. The results generated from the sensors have been interpreted, and the developed algorithms analysed.
    The outcomes and findings of these experiments demonstrate that the developed hierarchal framework is capable of carrying out activity recognition as well as intention analysis, e.g. predicting what activity a person is most likely to carry out next.
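    The core hierarchical idea, mapping low-level object-tag events onto a library of kernel plan sequences to recognise the ongoing activity, can be sketched with an in-order matching score. The event names and plans below are invented for illustration; the thesis's actual models are considerably richer:

    ```python
    # Hypothetical kernel plan library: activity -> ordered task sequence.
    PLAN_LIBRARY = {
        "make_tea": ["kettle", "cup", "teabag"],
        "take_medication": ["pillbox", "cup"],
    }

    def recognise(events, library=PLAN_LIBRARY):
        """Return the (plan, score) whose task sequence best matches the
        observed events, scored by the fraction of the plan matched
        in order (extra, interleaved events are simply skipped)."""
        def matched(plan):
            i = 0
            for e in events:
                if i < len(plan) and e == plan[i]:
                    i += 1
            return i / len(plan)
        best = max(library, key=lambda p: matched(library[p]))
        return best, matched(library[best])

    # Tag events from invented home sensors; "fridge" is an interleaved extra.
    activity, score = recognise(["kettle", "fridge", "cup", "teabag"])
    ```

    Tolerating interleaved events in the match is what allows interwoven plans, the scenario the thesis tests, to be followed concurrently.
    
    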