    Privacy Violation and Detection Using Pattern Mining Techniques

    Privacy, its violations, and techniques to bypass privacy violation have taken centre stage in both academia and industry in recent months. Corporations worldwide have become conscious of the implications of privacy violation and its impact on them and on other stakeholders. Moreover, nations across the world are introducing privacy-protecting legislation to prevent data privacy violations. Such legislation, however, exposes organizations to the risk of intentional or unintentional violation of private data. A violation by either malicious external hackers or internal employees can expose organizations to costly litigation. In this paper, we propose PRIVDAM, a data-mining-based intelligent architecture for a Privacy Violation Detection and Monitoring system whose purpose is to detect possible privacy violations and to prevent them in the future. Experimental evaluations show that our approach is scalable and robust and that it can detect privacy violations or chances of violations quite accurately. Please contact the author for full text at [email protected]

    Runtime Optimizations for Prediction with Tree-Based Models

    Tree-based models have proven to be an effective solution for web ranking as well as other problems in diverse domains. This paper focuses on optimizing the runtime performance of applying such models to make predictions, given an already-trained model. Although exceedingly simple conceptually, most implementations of tree-based models do not efficiently utilize modern superscalar processor architectures. By laying out data structures in memory in a more cache-conscious fashion, removing branches from the execution flow using a technique called predication, and micro-batching predictions using a technique called vectorization, we are able to better exploit modern processor architectures and significantly improve the speed of tree-based models over hard-coded if-else blocks. Our work contributes to the exploration of architecture-conscious runtime implementations of machine learning algorithms.
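
    The three ideas named in this abstract (a flat memory layout, predication, and micro-batched traversal) can be illustrated with a minimal sketch. It is not the paper's implementation: the tree, the array names, and the thresholds are hypothetical, and plain Python only shows the shape of the control flow, since the interpreter itself still branches internally.

```python
# Hypothetical complete binary tree of depth 2, stored in flat arrays.
# Internal node i has its children at 2*i + 1 (left) and 2*i + 2 (right).
FEATURE = [0, 1, 1]                     # feature tested at internal nodes 0..2
THRESHOLD = [0.5, 0.3, 0.7]             # split threshold at internal nodes 0..2
LEAF_VALUE = [10.0, 20.0, 30.0, 40.0]   # outputs of the leaves (nodes 3..6)
DEPTH = 2
NUM_INTERNAL = 3


def predict_branchy(x):
    """Baseline traversal with an explicit if-else at every node."""
    i = 0
    while i < NUM_INTERNAL:
        if x[FEATURE[i]] > THRESHOLD[i]:
            i = 2 * i + 2
        else:
            i = 2 * i + 1
    return LEAF_VALUE[i - NUM_INTERNAL]


def predict_predicated(x):
    """Predication: the comparison result (0 or 1) is added to the index."""
    i = 0
    for _ in range(DEPTH):
        i = 2 * i + 1 + int(x[FEATURE[i]] > THRESHOLD[i])
    return LEAF_VALUE[i - NUM_INTERNAL]


def predict_batch(batch):
    """Micro-batching: advance every instance one tree level at a time."""
    idx = [0] * len(batch)
    for _ in range(DEPTH):
        for k, x in enumerate(batch):
            i = idx[k]
            idx[k] = 2 * i + 1 + int(x[FEATURE[i]] > THRESHOLD[i])
    return [LEAF_VALUE[i - NUM_INTERNAL] for i in idx]
```

    In a compiled language the predicated update becomes straight-line arithmetic, and the inner loop of the batched version is the part a compiler or hand-written SIMD code could vectorize.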

    Customer profiles: extracting usage models from log files

    The project "Customer Profiles" was executed under the supervision of the Embedded Systems Innovation by TNO (TNO-ESI) at ASML. The project was a full-time, nine-month graduation assignment in the context of a post-master program in Software Technology offered by the Eindhoven University of Technology. The project goal was to obtain insight into the actual usage of systems by analyzing log files. The project resulted in a prototype, a portable architecture, a domain analysis, and suggestions on how to improve the process of extracting customer profiles. The most important project artifact is the prototype, which shows the feasibility of applying process mining and resource tracing techniques to obtain insight into the actual usage of a system by analyzing log files. The prototype supports a set of activities (data collection, data preprocessing, information extraction, and information aggregation) that work together to produce a customer profile model. This model, which is the prototype output, expresses the typical and atypical behavior of the participants in the production environment as captured in the log files. The validation phase showed that the prototype output exceeds the stakeholders' expectations. ASML profited from the prototype output, and TNO-ESI will reuse the approach for different customers. The success of the prototype output led to a new requirement: a portable system architecture. Therefore, as part of the project, a portable system architecture that supports extracting customer profiles was designed. The architecture is based on the Pipes and Filters architectural pattern. The system architecture and design are the result of a broad architectural and system analysis, which balances the stakeholder requirements against the most common practices in software architecture and software development.
As part of the architecture, components that support different functionalities, such as Data Source, Event Parser, Event Enricher, and Event Combiner, were designed. A lot of domain knowledge was gained during the project and transformed into a comprehensive domain analysis. The domain analysis covers the most common aspects of applying process mining for extracting customer profiles, such as mapping issues, missing information, and the minimal log data requirements. As part of the domain analysis, an evaluation of the process mining algorithms was performed. The evaluation showed that the heuristics miner and the genetic miner are the most appropriate process mining algorithms for extracting customer profiles. To improve the process of extracting customer profiles, a list of suggestions was created. The suggestions focus on the most common problems in logging infrastructures and in process mining techniques. One suggestion is a conscious manufacturer decision on the log file content: the manufacturer should define the ratio, the context (based on the minimal log data requirements), and the scope of the logging infrastructure. Another important suggestion is to have unique identifiers across the entire logging domain. The next suggestion advocates logging at the use case (end user activity) level. The last, but not least, suggestion is a consistent, accurate, and standardized timestamp in the logging infrastructure. The project experiments showed that the process mining tools are not yet mature enough for industrial usage.
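
    The Pipes and Filters architecture named in this abstract can be sketched with Python generators, one filter per component. The component names come from the abstract; the log line format (`timestamp;case;activity`), the `machine_types` lookup, and every function name are assumptions made for illustration and do not describe the project's actual design.

```python
def data_source(lines):
    """Data Source: emit raw log lines one at a time."""
    for line in lines:
        yield line


def event_parser(lines):
    """Event Parser: split an assumed 'timestamp;case;activity' line."""
    for line in lines:
        timestamp, case_id, activity = line.strip().split(";")
        yield {"timestamp": timestamp, "case": case_id, "activity": activity}


def event_enricher(events, machine_types):
    """Event Enricher: attach context from a hypothetical lookup table."""
    for event in events:
        machine_type = machine_types.get(event["case"], "unknown")
        yield {**event, "machine_type": machine_type}


def event_combiner(events):
    """Event Combiner: group events per case into time-ordered traces."""
    traces = {}
    # ISO 8601 timestamps in one time zone sort correctly as strings.
    for event in sorted(events, key=lambda e: e["timestamp"]):
        trace = traces.setdefault(
            event["case"],
            {"machine_type": event["machine_type"], "activities": []},
        )
        trace["activities"].append(event["activity"])
    return traces


def build_traces(lines, machine_types):
    """Connect the filters: each stage consumes the previous stage's output."""
    parsed = event_parser(data_source(lines))
    return event_combiner(event_enricher(parsed, machine_types))
```

    Because each filter only depends on the stream it receives, a stage can be replaced (for example, a parser for a different log format) without touching the others, which is the portability argument behind the pattern. The per-case grouping also shows why the abstract asks for unique identifiers and consistent timestamps in the logs.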

    An efficient parallel method for mining frequent closed sequential patterns

    Mining frequent closed sequential patterns (FCSPs) has attracted a great deal of research attention because it is an important task in sequence mining. Recently, many studies have focused on mining frequent closed sequential patterns because such patterns have proved to be more efficient and compact than frequent sequential patterns. Information can be fully extracted from frequent closed sequential patterns. In this paper, we propose an efficient parallel approach called parallel dynamic bit vector frequent closed sequential patterns (pDBV-FCSP), which uses a multi-core processor architecture for mining FCSPs from large databases. pDBV-FCSP divides the search space to reduce the required storage space and performs closure checking of prefix sequences early to reduce the execution time for mining frequent closed sequential patterns. This approach overcomes problems of parallel mining such as communication overhead, synchronization, and data replication. It also addresses load balancing between processors with a dynamic mechanism that redistributes the work when some processes run out of work, minimizing idle CPU time.
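
    The idea of dividing the search space by prefix and letting idle workers pick up remaining work can be sketched with a shared task queue. This is a deliberately simplified stand-in, not pDBV-FCSP: it uses no bit vectors, performs no closure checking, and only counts frequent patterns of length two. All function names are hypothetical, and Python threads illustrate the scheduling only, not a real multi-core speedup.

```python
import queue
import threading


def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in the same order."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern, database):
    """Number of sequences in the database that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def mine_prefix(prefix, items, database, min_support):
    """Frequent length-2 patterns that start with the given prefix item."""
    found = {}
    for item in items:
        candidate = (prefix, item)
        count = support(candidate, database)
        if count >= min_support:
            found[candidate] = count
    return found


def parallel_mine(database, min_support, num_workers=2):
    """Split the search space by prefix; workers pull tasks until none remain."""
    items = sorted({item for seq in database for item in seq})
    tasks = queue.Queue()
    for prefix in items:
        tasks.put(prefix)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                prefix = tasks.get_nowait()
            except queue.Empty:
                return  # no work left for this worker
            found = mine_prefix(prefix, items, database, min_support)
            with lock:
                results.update(found)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

    A worker that finishes a cheap prefix immediately takes the next one from the queue, so no worker sits idle while tasks remain. This is the simplest form of dynamic load balancing; the abstract's mechanism redistributes work between processes and may differ substantially.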