    CASP-DM: Context Aware Standard Process for Data Mining

    We propose an extension of the Cross Industry Standard Process for Data Mining (CRISP-DM) that addresses specific challenges of machine learning and data mining in handling context and model reuse. This new general context-aware process model is mapped to the CRISP-DM reference model, and proposes some new or enhanced outputs

    Mining geo-referenced databases: a way to improve decision-making

    Knowledge discovery in databases is a process that aims at the discovery of associations within data sets. The analysis of geo-referenced data demands a particular approach in this process. This chapter presents a new approach to the process of knowledge discovery, in which qualitative geographic identifiers give the positional aspects of geographic data. Those identifiers are manipulated using qualitative reasoning principles, which allows for the inference of new spatial relations required for the data mining step of the knowledge discovery process. The efficacy and usefulness of the implemented system — PADRÃO — has been tested with a bank dataset. The results obtained support that traditional knowledge discovery systems, developed for relational databases and not having semantic knowledge linked to spatial data, can be used in the process of knowledge discovery in geo-referenced databases, since some of this semantic knowledge and the principles of qualitative spatial reasoning are available as spatial domain knowledge

    Combining data mining and text mining for detection of early stage dementia: the SAMS framework

    In this paper, we describe the open-source SAMS framework whose novelty lies in bringing together both data collection (keystrokes, mouse movements, application pathways) and text collection (email, documents, diaries) and analysis methodologies. The aim of SAMS is to provide a non-invasive method for large scale collection, secure storage, retrieval and analysis of an individual’s computer usage for the detection of cognitive decline, and to infer whether this decline is consistent with the early stages of dementia. The framework will allow evaluation and study by medical professionals in which data and textual features can be linked to deficits in cognitive domains that are characteristic of dementia. Having described requirements gathering and ethical concerns in previous papers, here we focus on the implementation of the data and text collection components

    Finding and tracking multi-density clusters in an online dynamic data stream

    The file attached to this record is the author's final peer-reviewed version. Change is one of the biggest challenges in dynamic stream mining. From a data-mining perspective, adapting to and tracking change is desirable in order to understand how and why change has occurred. Clustering, a form of unsupervised learning, can be used to identify the underlying patterns in a stream. Density-based clustering identifies clusters as areas of high density separated by areas of low density. This paper proposes a Multi-Density Stream Clustering (MDSC) algorithm to address two problems: the multi-density problem and the problem of discovering and tracking changes in a dynamic stream. MDSC consists of two on-line components: discovered, labelled clusters and an outlier buffer. Incoming points are assigned to a live cluster or passed to the outlier buffer. New clusters are discovered in the buffer using an ant-inspired swarm intelligence approach. Each newly discovered cluster is uniquely labelled and added to the set of live clusters. Processed data is subject to an ageing function and will disappear when it is no longer relevant. MDSC is shown to compare favourably with state-of-the-art peer stream-clustering algorithms on a range of real and synthetic data-streams. Experimental results suggest that MDSC can discover qualitatively useful patterns while being scalable and robust to noise

    Multi-Behavior Recommendation with Cascading Graph Convolution Networks

    Multi-behavior recommendation, which exploits auxiliary behaviors (e.g., click and cart) to help predict users' potential interactions on the target behavior (e.g., buy), is regarded as an effective way to alleviate the data sparsity or cold-start issues in recommendation. In real-world applications, behaviors often occur in a certain order (e.g., click>cart>buy). In a behavior chain, a later behavior usually exhibits a stronger signal of user preference than an earlier one. Most existing multi-behavior models fail to capture such dependencies in a behavior chain for embedding learning. In this work, we propose a novel multi-behavior recommendation model with cascading graph convolution networks (named MB-CGCN). In MB-CGCN, the embeddings learned from one behavior are used as the input features for the next behavior's embedding learning after a feature transformation operation. In this way, our model explicitly utilizes the behavior dependencies in embedding learning. Experiments on two benchmark datasets demonstrate the effectiveness of our model in exploiting multi-behavior data. It outperforms the best baseline by 33.7% and 35.9% on average over the two datasets in terms of Recall@10 and NDCG@10, respectively. Comment: Accepted by WWW 202
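
    The cascading idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the propagation rule, so the parameter-free, layer-averaged propagation used here is an assumption, as are the function names (`normalized_adjacency`, `propagate`, `cascade`), the toy interaction matrices, and the random transformation matrices, which a real model would learn.

```python
import numpy as np

def normalized_adjacency(interactions):
    """Build a symmetrically normalized user-item bipartite adjacency matrix."""
    n_users, n_items = interactions.shape
    n = n_users + n_items
    adj = np.zeros((n, n))
    adj[:n_users, n_users:] = interactions
    adj[n_users:, :n_users] = interactions.T
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5  # isolated nodes keep a zero weight
    return adj * inv_sqrt[:, None] * inv_sqrt[None, :]

def propagate(adj, emb, n_layers=2):
    """Assumed propagation rule: average the input and each propagated layer."""
    layers = [emb]
    for _ in range(n_layers):
        layers.append(adj @ layers[-1])
    return np.mean(layers, axis=0)

def cascade(behaviors, init_emb, transforms):
    """Run one graph convolution block per behavior, in chain order.

    The output of each block is passed through a feature transformation
    and used as the input features of the next block.
    """
    emb = init_emb
    outputs = []
    for i, interactions in enumerate(behaviors):
        emb = propagate(normalized_adjacency(interactions), emb)
        outputs.append(emb)
        if i < len(transforms):
            emb = emb @ transforms[i]  # hypothetical (here random) transformation
    return outputs

# Toy data: 3 users, 4 items, behavior chain click > cart > buy.
rng = np.random.default_rng(0)
click = np.array([[1, 1, 0, 1], [0, 1, 1, 0], [1, 0, 1, 1]], dtype=float)
cart = np.array([[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 1]], dtype=float)
buy = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]], dtype=float)

dim = 8
init_emb = rng.normal(size=(3 + 4, dim))
transforms = [rng.normal(size=(dim, dim)) for _ in range(2)]
per_behavior = cascade([click, cart, buy], init_emb, transforms)
```

    Each entry of `per_behavior` holds the user and item embeddings after one behavior's block. How these are combined for the final prediction is not stated in the abstract and is left out of the sketch.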

    The User Rights Database: Measuring the Impact of Copyright Balance

    International and domestic copyright law reform around the world is increasingly focused on how copyright user rights should be expanded to promote maximum creativity and access to knowledge in the digital age. These efforts are guided by a relatively rich theoretical literature. However, few empirical studies explore the social and economic impact of expanding user rights in the digital era. One reason for this gap has been the absence of a tool measuring the key independent variable – changes in copyright user rights over time and between countries. We developed such a tool, which we call the “User Rights Database.” This paper describes the methodology used to create the Database and the results of empirical tests using it. We find that all of the countries in our study are trending toward more open copyright user rights over time, but the wealthy countries in our sample are about thirty years ahead of developing countries on this measure. We find evidence of benefits that more open copyright user rights generate, including the development of high technology industries and scholarly publication. We do not find evidence that opening user rights causes harm to revenue of copyright intensive industries like publishing and entertainment

    The Impact of Copyright Exceptions for Researchers on Scholarly Output

    High prices restrict access to academic journals and books that scholars rely upon to author new research. One possible solution is the expansion of copyright exceptions allowing unauthorized access to copyrighted works for researchers. I test the link between copyright exceptions for health and science researchers and their publishing output at the country-subject level. I find that scientists residing in countries that implement more robust research exceptions publish more papers and books in subsequent years. This relationship between copyright exceptions and publishing is stronger in lower-income countries, and stronger where there is stricter copyright protection of existing works