171,335 research outputs found

    Distributed data mining in grid computing environments

    The computing-intensive data mining for inherently Internet-wide distributed data, referred to as Distributed Data Mining (DDM), calls for the support of a powerful Grid with an effective scheduling framework. DDM often shares the computing paradigm of local processing and global synthesizing. It involves every phase of the Data Mining (DM) process, which makes the DDM workflow very complex: it can be modelled only by a Directed Acyclic Graph (DAG) with multiple data entries. Motivated by the need for a practical solution to the Grid scheduling problem for the DDM workflow, this paper proposes a novel two-phase scheduling framework, comprising External Scheduling and Internal Scheduling, on a two-level Grid architecture (InterGrid, IntraGrid). Currently a DM IntraGrid, named DMGCE (Data Mining Grid Computing Environment), has been developed with a dynamic scheduling framework for competitive DAGs in a heterogeneous computing environment. This system is implemented in an established Multi-Agent System (MAS) environment, in which the reuse of existing DM algorithms is achieved by encapsulating them into agents. Practical classification problems from oil well logging analysis are used to measure the system performance. The detailed experiment procedure and result analysis are also discussed in this paper.

    Data quality assurance and performance measurement of data mining for preventive maintenance of power grid

    Ensuring reliability as the electrical grid morphs into the "smart grid" will require innovations in how we assess the state of the grid, for the purpose of proactive rather than reactive maintenance; in the future, we will not only react to failures, but also try to anticipate and avoid them using predictive modeling (machine learning and data mining) techniques. To help in meeting this challenge, we present the Neutral Online Visualization-aided Autonomic evaluation framework (NOVA) for evaluating machine learning and data mining algorithms for preventive maintenance on the electrical grid. NOVA has three stages provided through a unified user interface: evaluation of input data quality, evaluation of machine learning and data mining results, and evaluation of the reliability improvement of the power grid. A prototype version of NOVA has been deployed for the power grid in New York City, and it is able to evaluate machine learning and data mining systems effectively and efficiently.

    Heterogeneous data source integration for smart grid ecosystems based on metadata mining

    The arrival of new technologies related to smart grids and the resulting ecosystem of applications and management systems pose many new problems. The databases of the traditional grid and the various initiatives related to new technologies have given rise to many different management systems with several formats and different architectures. A heterogeneous data source integration system is necessary to update these systems for the new smart grid reality. Additionally, it is necessary to take advantage of the information smart grids provide. In this paper, the authors propose a heterogeneous data source integration based on IEC standards and metadata mining. Additionally, an automatic data mining framework is applied to model the integrated information. Ministerio de Economía y Competitividad TEC2013-40767-

    A catallactic market for data mining services.

    We describe a Grid market for exchanging data mining services based on the Catallactic market mechanism proposed by von Hayek. This market mechanism allows selection between multiple instances of services based on operations required in a data mining task (such as data migration, data pre-processing and subsequently data analysis). Catallaxy is a decentralized approach, based on a “free market” mechanism, and is particularly useful when the number of market participants is large or when conditions within the market often change. It is therefore particularly suitable in Grid and peer-to-peer systems. The approach assumes that the service provider and user are not co-located, and that they require multiple message exchanges to carry out a data mining task. A market of J48-based decision tree algorithm instances, each implemented as a Web service, is used to demonstrate our approach. We have validated the feasibility of building catallactic data mining grid applications, and implemented a proof-of-concept application (Cat-COVITE) mapped to a Catallactic Grid Middleware. Peer Reviewed

    A neural network for mining large volumes of time series data

    Efficiently mining large volumes of time series data is amongst the most challenging problems that are fundamental in many fields, such as industrial process monitoring, medical data analysis and business forecasting. This paper discusses a high-performance neural network for mining large time series data sets and some practical issues in time series data mining. Examples are presented of how this technology is used to search the engine data within a major UK eScience Grid project (DAME) for supporting the maintenance of Rolls-Royce aero-engines.

    The Use of Grid Storage Protocols for Healthcare Applications

    Grid computing has attracted worldwide attention in a variety of domains. While Healthcare projects focus on data mining and standardization techniques, the issue of data accessibility and transparency over the storage systems on the Grid has seldom been tackled. In this position paper, we identify the key issues and requirements imposed by Healthcare applications and point out how Grid Storage Technology can be used to satisfy those requirements. The main contribution of this work is the identification of the characteristics and protocols that make Grid Storage technology attractive for building a Healthcare data storage infrastructure.