3,720 research outputs found

    Data mining based cyber-attack detection

    A case study on model driven data integration for data centric software development

    Model Driven Data Integration (MDDI) is a data integration approach that proactively incorporates and utilizes metadata across the data integration process. By decoupling data and metadata, MDDI drastically reduces the complexity of data integration, while also providing an integrated, standard development method associated with Model Driven Architecture (MDA). This paper introduces a case study that adopts MDA technology as an MDDI framework for data-centric software development, including data merging and data customization for data mining. A data merging model is also proposed to define relationships between different models at a conceptual level, which is then transformed into a physical model. In this case study we collect and integrate historical data from various universities into a data warehouse in order to develop student intervention services through data mining.
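    The abstract stays at the conceptual level, but the transformation it mentions (a conceptual merge model turned into a physical model) can be illustrated with a minimal, hedged Python sketch. Everything here (the ConceptualEntity structure, the Student entity, the source-table mappings) is a hypothetical illustration, not the paper's actual models:

```python
# Hedged sketch of a conceptual-to-physical transformation in the spirit of
# MDDI. All names below are illustrative assumptions, not the paper's models.

from dataclasses import dataclass

@dataclass
class Attribute:
    name: str
    sql_type: str

@dataclass
class ConceptualEntity:
    """Platform-independent model: a merged entity plus its source mappings."""
    name: str
    attributes: list
    source_tables: dict  # source system -> table name

def to_physical(entity: ConceptualEntity) -> str:
    """Transform the conceptual merge model into physical warehouse DDL."""
    cols = ",\n  ".join(f"{a.name} {a.sql_type}" for a in entity.attributes)
    ddl = f"CREATE TABLE dw_{entity.name.lower()} (\n  {cols}\n);"
    mappings = "\n".join(
        f"-- map {src}.{tbl} -> dw_{entity.name.lower()}"
        for src, tbl in entity.source_tables.items()
    )
    return ddl + "\n" + mappings

# Toy merged entity drawn from two hypothetical university source systems.
student = ConceptualEntity(
    name="Student",
    attributes=[Attribute("student_id", "VARCHAR(20)"),
                Attribute("gpa", "DECIMAL(3,2)")],
    source_tables={"uni_a": "students", "uni_b": "enrolment"},
)
print(to_physical(student))
```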

    Quality Prediction in Interlinked Manufacturing Processes based on Supervised & Unsupervised Machine Learning

    In the context of a rolling mill case study, this paper presents a methodical framework based on data mining for predicting the physical quality of intermediate products in interlinked manufacturing processes. The first part introduces the data preprocessing and feature extraction components implemented in the Inline Quality Prediction System. The second part shows how a combination of supervised and unsupervised data mining methods can be applied to identify the most striking operational patterns, promising quality-related features, and production parameters. The results indicate how sustainable and energy-efficient interlinked manufacturing processes can be achieved through the application of data mining.
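    As a rough illustration of the supervised/unsupervised combination the abstract describes, the following sketch clusters process data to expose operational patterns and then feeds the cluster assignments to a quality classifier. The synthetic sensor data and the model choices (KMeans, random forest) are assumptions for illustration, not the paper's actual pipeline:

```python
# Hedged sketch: unsupervised clustering to surface operational patterns,
# then supervised classification of intermediate-product quality.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))            # stand-in for rolling-mill sensor features
y = (X[:, 0] + X[:, 3] > 0).astype(int)  # stand-in for a pass/fail quality label

# Unsupervised step: operational patterns captured as cluster assignments.
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
X_aug = np.column_stack([X, clusters])

# Supervised step: predict quality from the cluster-augmented features.
X_tr, X_te, y_tr, y_te = train_test_split(X_aug, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print("hold-out accuracy:", clf.score(X_te, y_te))
```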

    Towards Augmented MDM: Overview of Design and Function Areas – A Literature Review

    Nowadays, the handling of data is of great importance for companies due to the increasing amount of data generated by digitalization. Time-consuming tasks in master data management (MDM) must be automated to provide data-driven business models with adequate data quality in real time and thus achieve higher data value. To increase the level of automation in companies, technologies such as artificial intelligence are applied in information systems, including systems for MDM. The corresponding tasks can be summarized under the term augmented MDM. However, it is not entirely clear which of these processes fall under the scope of augmented MDM. This paper presents a systematic literature review of 20 research articles published in four literature and conference databases to determine the design areas and functions of augmented MDM. The findings comprise one design element, "systems", with eleven functions, as well as a proposed definition of terms related to augmented MDM.

    A Dynamic Knowledge Management Framework for the High Value Manufacturing Industry

    Dynamic Knowledge Management (KM) is a combination of cultural and technological factors: the cultural factors of people and their motivations, the technological factors of content and infrastructure, and, where these come together, interface factors. In this paper, a dynamic KM framework is described in the context of employees being motivated to create profit for their company through product development in high value manufacturing. It is reported how the framework was discussed during a meeting of project stakeholders at the collaborating company, BAE Systems. Participants agreed the framework would have the most benefit at the start of the product lifecycle, before key decisions are made. The framework has been designed to support organisational learning and to reward employees who improve the position of the company in the marketplace.

    Towards information profiling: data lake content metadata management

    There is currently a surge of Big Data (BD) processed and stored in huge raw data repositories, commonly called Data Lakes (DL). These data require new techniques of data integration and schema alignment in order to make them usable by their consumers and to discover the relationships linking their content. This can be provided by metadata services which discover and describe that content. However, there is currently no systematic approach for this kind of metadata discovery and management. Thus, we propose a framework for profiling the informational content stored in the DL, which we call information profiling. The profiles are stored as metadata to support data analysis. We formally define a metadata management process which identifies the key activities required to handle this effectively. We demonstrate alternative techniques and the performance of our process using a prototype implementation handling a real-life case study from the OpenML DL, which showcases the value and feasibility of our approach.
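    A minimal sketch of the information-profiling idea, under the assumption that a profile is a set of per-column content statistics stored as metadata; the profile fields and the in-memory metadata store below are illustrative, not the paper's actual framework:

```python
# Hedged sketch: derive a content profile from a raw dataset and keep it as
# metadata for later discovery. Field names are illustrative assumptions.
import json
import pandas as pd

def profile_dataset(name: str, df: pd.DataFrame) -> dict:
    """Compute a per-column content profile for one data-lake dataset."""
    profile = {"dataset": name, "rows": len(df), "columns": {}}
    for col in df.columns:
        s = df[col]
        profile["columns"][col] = {
            "dtype": str(s.dtype),
            "distinct": int(s.nunique()),
            "nulls": int(s.isna().sum()),
            "sample": s.dropna().astype(str).head(3).tolist(),
        }
    return profile

df = pd.DataFrame({"id": [1, 2, 3], "grade": [3.4, None, 2.9]})
metadata_store = [profile_dataset("openml_sample", df)]  # toy in-memory store
print(json.dumps(metadata_store[0], indent=2))
```

    In a real DL setting, such profiles would presumably be computed on ingestion and indexed so that consumers can discover related datasets without scanning the raw files.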

    PRESISTANT: Learning based assistant for data pre-processing

    Data pre-processing is one of the most time-consuming and relevant steps in a data analysis process (e.g., a classification task). A given data pre-processing operator (e.g., a transformation) can have a positive, negative or zero impact on the final result of the analysis. Expert users have the knowledge required to find the right pre-processing operators. Non-experts, however, are overwhelmed by the number of pre-processing operators, and it is challenging for them to find operators that would positively impact their analysis (e.g., increase the predictive accuracy of a classifier). Existing solutions either assume that users have expert knowledge, or they recommend pre-processing operators that are only "syntactically" applicable to a dataset, without taking into account their impact on the final analysis. In this work, we aim to assist non-expert users by recommending data pre-processing operators that are ranked according to their impact on the final analysis. We developed a tool, PRESISTANT, that uses Random Forests to learn the impact of pre-processing operators on the performance (e.g., predictive accuracy) of five different classification algorithms: J48, Naive Bayes, PART, Logistic Regression, and Nearest Neighbor. Extensive evaluations of the recommendations provided by our tool show that PRESISTANT can effectively help non-experts achieve improved results in their analytical tasks.
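    The meta-learning idea in the abstract can be sketched as follows: a random forest is trained to predict the accuracy change an operator produces given dataset meta-features, and candidate operators are then ranked by predicted gain. The meta-features, training data, and operator names below are synthetic placeholders, not PRESISTANT's real ones:

```python
# Hedged sketch of ranking pre-processing operators by learned impact.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
# Each row: dataset meta-features (e.g., size, dimensionality, class skew)
# plus an operator encoding; target: observed accuracy delta after applying it.
meta_X = rng.normal(size=(300, 4))
accuracy_delta = 0.05 * meta_X[:, 0] - 0.02 * meta_X[:, 3] + rng.normal(0, 0.01, 300)

ranker = RandomForestRegressor(random_state=0).fit(meta_X, accuracy_delta)

# Rank hypothetical candidate operators for a new dataset by predicted impact.
candidates = {"discretize":    [0.2, -1.0, 0.3, 0.1],
              "normalize":     [1.1,  0.4, -0.2, 0.0],
              "log_transform": [-0.5, 0.9, 0.7, 0.6]}
ranked = sorted(candidates,
                key=lambda op: ranker.predict([candidates[op]])[0],
                reverse=True)
print("recommended order:", ranked)
```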