117 research outputs found

    A Survey of Parallel Data Mining

    Get PDF
    With the fast, continuous increase in the number and size of databases, parallel data mining is a natural and cost-effective approach to tackle the problem of scalability in data mining. Recently there has been a considerable research on parallel data mining. However, most projects focus on the parallelization of a single kind of data mining algorithm/paradigm. This paper surveys parallel data mining with a broader perspective. More precisely, we discuss the parallelization of data mining algorithms of four knowledge discovery paradigms, namely rule induction, instance-based learning, genetic algorithms and neural networks. Using the lessons learned from this discussion, we also derive a set of heuristic principles for designing efficient parallel data mining algorithms

    Cluster Computing in the Classroom: Topics, Guidelines, and Experiences

    Get PDF
    With the progress of research on cluster computing, more and more universities have begun to offer various courses covering cluster computing. A wide variety of content can be taught in these courses. Because of this, a difficulty that arises is the selection of appropriate course material. The selection is complicated by the fact that some content in cluster computing is also covered by other courses such as operating systems, networking, or computer architecture. In addition, the background of students enrolled in cluster computing courses varies. These aspects of cluster computing make the development of good course material difficult. Combining our experiences in teaching cluster computing in several universities in the USA and Australia and conducting tutorials at many international conferences all over the world, we present prospective topics in cluster computing along with a wide variety of information sources (books, software, and materials on the web) from which instructors can choose. The course material described includes system architecture, parallel programming, algorithms, and applications. Instructors are advised to choose selected units in each of the topical areas and develop their own syllabus to meet course objectives. For example, a full course can be taught on system architecture for core computer science students. Or, a course on parallel programming could contain a brief coverage of system architecture and then devote the majority of time to programming methods. Other combinations are also possible. We share our experiences in teaching cluster computing and the topics we have chosen depending on course objectives

    Concurrent Probabilistic Simulation of High Temperature Composite Structural Response

    Get PDF
    A computational structural/material analysis and design tool which would meet industry's future demand for expedience and reduced cost is presented. This unique software 'GENOA' is dedicated to parallel and high speed analysis to perform probabilistic evaluation of high temperature composite response of aerospace systems. The development is based on detailed integration and modification of diverse fields of specialized analysis techniques and mathematical models to combine their latest innovative capabilities into a commercially viable software package. The technique is specifically designed to exploit the availability of processors to perform computationally intense probabilistic analysis assessing uncertainties in structural reliability analysis and composite micromechanics. The primary objectives which were achieved in performing the development were: (1) Utilization of the power of parallel processing and static/dynamic load balancing optimization to make the complex simulation of structure, material and processing of high temperature composite affordable; (2) Computational integration and synchronization of probabilistic mathematics, structural/material mechanics and parallel computing; (3) Implementation of an innovative multi-level domain decomposition technique to identify the inherent parallelism, and increasing convergence rates through high- and low-level processor assignment; (4) Creating the framework for Portable Paralleled architecture for the machine independent Multi Instruction Multi Data, (MIMD), Single Instruction Multi Data (SIMD), hybrid and distributed workstation type of computers; and (5) Market evaluation. The results of Phase-2 effort provides a good basis for continuation and warrants Phase-3 government, and industry partnership

    Beyond XSPEC: Towards Highly Configurable Analysis

    Full text link
    We present a quantitative comparison between software features of the defacto standard X-ray spectral analysis tool, XSPEC, and ISIS, the Interactive Spectral Interpretation System. Our emphasis is on customized analysis, with ISIS offered as a strong example of configurable software. While noting that XSPEC has been of immense value to astronomers, and that its scientific core is moderately extensible--most commonly via the inclusion of user contributed "local models"--we identify a series of limitations with its use beyond conventional spectral modeling. We argue that from the viewpoint of the astronomical user, the XSPEC internal structure presents a Black Box Problem, with many of its important features hidden from the top-level interface, thus discouraging user customization. Drawing from examples in custom modeling, numerical analysis, parallel computation, visualization, data management, and automated code generation, we show how a numerically scriptable, modular, and extensible analysis platform such as ISIS facilitates many forms of advanced astrophysical inquiry.Comment: Accepted by PASP, for July 2008 (15 pages

    An Object-Oriented Programming Environment for Parallel Genetic Algorithms

    Get PDF
    This thesis investigates an object-oriented programming environment for building parallel applications based on genetic algorithms (GAs). It describes the design of the Genetic Algorithms Manipulation Environment (GAME), which focuses on three major software development requirements: flexibility, expandability and portability. Flexibility is provided by GAME through a set of libraries containing pre-defined and parameterised components such as genetic operators and algorithms. Expandability is offered by GAME'S object-oriented design. It allows applications, algorithms and genetic operators to be easily modified and adapted to satisfy diverse problem's requirements. Lastly, portability is achieved through the use of the standard C++ language, and by isolating machine and operating system dependencies into low-level modules, which are hidden from the application developer by GAME'S application programming interfaces. The development of GAME is central to the Programming Environment for Applications of PArallel GENetic Algorithms project (PAPAGENA). This is the principal European Community (ESPRIT III) funded parallel genetic algorithms project. It has two main goals: to provide a general-purpose tool kit, supporting the development and analysis of large-scale parallel genetic algorithms (PGAs) applications, and to demonstrate the potential of applying evolutionary computing in diverse problem domains. The research reported in this thesis is divided in two parts: i) the analysis of GA models and the study of existing GA programming environments from an application developer perspective; ii) the description of a general-purpose programming environment designed to help with the development of GA and PGA-based computer programs. The studies carried out in the first part provide the necessary understanding of GAs' structure and operation to outline the requirements for the development of complex computer programs. The second part presents GAME as the result of combining development requirements, relevant features of existing environments and innovative ideas, into a powerful programming environment. The system is described in terms of its abstract data structures and sub-systems that allow the representation of problems independently of any particular GA model. GAME's programming model is also presented as general-purpose object-oriented framework for programming coarse-grained parallel applications. GAME has a modular architecture comprising five modules: the Virtual Machine, the Parallel Execution Module, the Genetic Libraries, the Monitoring Control Module, and the Graphic User Interface. GAME's genetic-oriented abstract data structures, and the Virtual Machine, isolates genetic operators and algorithms from low-level operations such as memory management, exception handling, etc. The Parallel Execution Module supports GAME's object- oriented parallel programming model. It defines an application programming interface and a runtime library that allow the same parallel application, created within the environment, to run on different hardware and operating system platforms. The Genetic Libraries outline a hierarchy of components implemented as parameterised versions of standard and custom genetic operators, algorithms and applications. The Monitoring Control Module supports dynamic control and monitoring of simulations, whereas the Graphic User Interface defines a basic framework and graphic 'widgets' for displaying and entering data. This thesis describes the design philosophy and rationale behind these modules, covering in more detail the Virtual Machine, the Parallel Execution Module and the Genetic Libraries. The assessment discusses the system's ability to satisfy the main requirements of GA and PGA software development, as well as the features that distinguish GAME from other programming environments
    corecore