200,684 research outputs found

    Mining System Specific Rules from Change Patterns

    A significant percentage of the warnings reported by tools that detect coding standard violations are false positives. Several works therefore aim to provide better rules by mining them from source code history, analyzing bug-fixes or changes between system releases. However, software evolves over time, and during development not only are bugs fixed, but features are also added and code is refactored. In such cases, changes must be applied consistently in the source code to avoid maintenance problems. In this paper, we propose to extract system specific rules by mining systematic changes over source code history, i.e., not just from bug-fixes or system releases, to ensure that changes are consistently applied over source code. We focus on structural changes done to support API modification or evolution, with the goal of providing better rules to developers. The rules are mined from predefined rule patterns that ensure their quality. To assess the precision of such specific rules in detecting real violations, we compare them with generic rules provided by tools that detect coding standard violations, on four real world systems covering two programming languages. The results show that specific rules are more precise than generic ones in identifying real violations in source code, and thus can complement them.

    Data mining technology for the evaluation of learning content interaction

    Interactivity is central to the success of learning. In e-learning and other educational multimedia environments, the evaluation of interaction and behaviour is particularly crucial. Data mining, a non-intrusive, objective analysis technology, is proposed as the central evaluation technology for analysing the usage of computer-based educational environments, and in particular the interaction with educational content. Basic mining techniques are reviewed and their application in a Web-based third-level course environment is illustrated. Analytic models capturing interaction aspects from the application domain (learning) and the software infrastructure (interactive multimedia) are required for the meaningful interpretation of mining results.

    Integrating E-Commerce and Data Mining: Architecture and Challenges

    We show that the e-commerce domain can provide all the right ingredients for successful data mining and claim that it is a killer domain for data mining. We describe an integrated architecture, based on our experience at Blue Martini Software, for supporting this integration. The architecture can dramatically reduce the pre-processing, cleaning, and data understanding effort often documented to take 80% of the time in knowledge discovery projects. We emphasize the need for data collection at the application server layer (not the web server) in order to support logging of data and metadata that is essential to the discovery process. We describe the data transformation bridges required from the transaction processing systems and customer event streams (e.g., clickstreams) to the data warehouse. We detail the mining workbench, which needs to provide multiple views of the data through reporting, data mining algorithms, visualization, and OLAP. We conclude with a set of challenges. Comment: KDD workshop: WebKDD 200

    A log mining approach for process monitoring in SCADA

    SCADA (Supervisory Control and Data Acquisition) systems are used for controlling and monitoring industrial processes. We propose a methodology to systematically identify potential process-related threats in SCADA. Process-related threats take place when an attacker gains user access rights and performs actions that look legitimate but are intended to disrupt the SCADA process. To detect such threats, we propose a semi-automated approach to log processing. We conduct experiments on a real-life water treatment facility. A preliminary case study suggests that our approach is effective in detecting anomalous events that might alter the regular process workflow.

    Finding Temporal Patterns in Noisy Longitudinal Data: A Study in Diabetic Retinopathy

    This paper describes an approach to temporal pattern mining that uses the concept of user-defined temporal prototypes to define the nature of the trends of interest. The temporal patterns are defined in terms of sequences of support values associated with identified frequent patterns. The prototypes are defined mathematically so that they can be mapped onto the temporal patterns. The focus for the advocated temporal pattern mining process is a large longitudinal patient database collected as part of a diabetic retinopathy screening programme. The data set is, in itself, also of interest, as it is very noisy (in common with other similar medical datasets) and does not feature a clear association between specific time stamps and subsets of the data. The diabetic retinopathy application, the data warehousing and cleaning process, and the frequent pattern mining procedure (together with the application of the prototype concept) are all described in the paper. An evaluation of the frequent pattern mining process is also presented.
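
The idea of a temporal pattern as a sequence of support values can be illustrated with a small sketch. Everything below is an assumption made for illustration: the item names, the toy time windows, and the simple "increasing" prototype are hypothetical and are not the definitions used in the paper.

```python
from typing import FrozenSet, List, Sequence

Record = FrozenSet[str]


def support_sequence(pattern: Record,
                     windows: Sequence[Sequence[Record]]) -> List[float]:
    """Return the support of `pattern` in each time window.

    Support is the fraction of records in a window that contain
    every item of the pattern.
    """
    seq = []
    for records in windows:
        hits = sum(1 for r in records if pattern <= r)
        seq.append(hits / len(records) if records else 0.0)
    return seq


def matches_increasing(seq: Sequence[float], tol: float = 0.0) -> bool:
    """Hypothetical 'increasing' prototype: every step rises by more than tol."""
    return all(b - a > tol for a, b in zip(seq, seq[1:]))


# Hypothetical data: three time windows of four records each.
windows = [
    [frozenset("ab"), frozenset("a"), frozenset("c"), frozenset("b")],
    [frozenset("ab"), frozenset("ab"), frozenset("c"), frozenset("a")],
    [frozenset("ab"), frozenset("ab"), frozenset("abc"), frozenset("c")],
]

trend = support_sequence(frozenset("ab"), windows)
is_increasing = matches_increasing(trend)
```

Other prototypes (for example "decreasing" or "constant") could be written as further predicates over the same support sequence.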

    Expert System for Crop Disease based on Graph Pattern Matching: A proposal

    For agroindustry, crop diseases constitute one of the most common problems, generating large economic losses and low production quality. Meanwhile, computer science has produced several tools intended to improve the prevention and treatment of these diseases. In this sense, recent research proposes the development of expert systems to solve this problem, making use of data mining and artificial intelligence techniques such as rule-based inference, decision trees, and Bayesian networks, among others. Furthermore, graphs can be used to store the different types of variables that are present in a crop environment, allowing the application of graph data mining techniques such as graph pattern matching. In this paper we present an overview of the above issues and a proposal for an expert system for crop disease based on graph pattern matching.

    Studying patterns of use of transport modes through data mining - Application to U.S. national household travel survey data set

    Data collection activities related to travel require large amounts of financial and human resources to be conducted successfully. When available resources are scarce, the information hidden in these data sets needs to be exploited, both to increase their added value and to gain support among decision makers not to discontinue such efforts. This study assessed the use of a data mining technique, association analysis, to better understand the patterns of mode use in the 2009 U.S. National Household Travel Survey. Only variables related to self-reported levels of use of the different transportation means are considered, along with those useful for the socioeconomic characterization of the respondents. Association rules potentially showed a substitution effect, in economic terms, between cars and public transportation, but such an effect was not observed between public transportation and nonmotorized modes (e.g., bicycling and walking). This effect was a policy-relevant finding, because transit marketing should be targeted to car drivers rather than to bikers or walkers for real improvement in the environmental performance of any transportation system. Given the competitive advantage of private modes extensively discussed in the literature, modal diversion from car to transit is seldom observed in practice. However, after such a factor was controlled, the results suggest that modal diversion should mainly occur from cars to transit rather than from nonmotorized modes to transit
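
As a rough illustration of association analysis, the following sketch computes the support and confidence of a single rule over a handful of made-up respondent records. The records, item labels, and the rule itself are hypothetical and are not taken from the survey data or from the study's results.

```python
from typing import FrozenSet, Sequence

Record = FrozenSet[str]

# Hypothetical toy records: one set of categorical items per respondent.
records = [
    frozenset({"car=often", "transit=rarely", "walk=often"}),
    frozenset({"car=often", "transit=rarely", "walk=rarely"}),
    frozenset({"car=rarely", "transit=often", "walk=often"}),
    frozenset({"car=often", "transit=often", "walk=rarely"}),
]


def support(itemset: Record, data: Sequence[Record]) -> float:
    """Fraction of records that contain every item in `itemset`."""
    return sum(1 for r in data if itemset <= r) / len(data)


def confidence(antecedent: Record, consequent: Record,
               data: Sequence[Record]) -> float:
    """Estimate P(consequent | antecedent) from the records."""
    base = support(antecedent, data)
    if base == 0.0:
        return 0.0  # the rule never fires, so report zero confidence
    return support(antecedent | consequent, data) / base


# Rule under test (hypothetical): car=often -> transit=rarely
rule_support = support(frozenset({"car=often", "transit=rarely"}), records)
rule_confidence = confidence(frozenset({"car=often"}),
                             frozenset({"transit=rarely"}), records)
```

In this toy data the rule holds for two of the three frequent car users; a real analysis would also apply minimum support and confidence thresholds before interpreting any rule.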

    A rule dynamics approach to event detection in Twitter with its application to sports and politics

    The increasing popularity of Twitter as a social network tool for opinion expression as well as information retrieval has resulted in the need to derive computational means to detect and track relevant topics/events in the network. The application of topic detection and tracking methods to tweets enables users to extract newsworthy content from the vast and somewhat chaotic Twitter stream. In this paper, we apply our technique, named Transaction-based Rule Change Mining, to extract newsworthy hashtag keywords present in tweets from two different domains, namely sports (The English FA Cup 2012) and politics (US Presidential Elections 2012 and Super Tuesday 2012). Noting the peculiar nature of event dynamics in these two domains, we apply different time-windows and update rates to each of the datasets in order to study their impact on performance. The effectiveness results reveal that our approach is able to accurately detect and track newsworthy content. In addition, the results show that adapting the time-window yields better performance, especially on the sports dataset, which can be attributed to the usually shorter duration of football events.

    Knowledge-Intensive Processes: Characteristics, Requirements and Analysis of Contemporary Approaches

    Engineering of knowledge-intensive processes (KiPs) is far from being mastered, since they are genuinely knowledge- and data-centric, and require substantial flexibility at both design time and run time. In this work, starting from an analysis of the scientific literature on KiPs and from three real-world domains and application scenarios, we provide a precise characterization of KiPs. Furthermore, we devise some general requirements related to KiP management and execution. Such requirements contribute to the definition of an evaluation framework to assess current system support for KiPs. To this end, we present a critical analysis of a number of existing process-oriented approaches by discussing their efficacy against the requirements.