    Distributed computing methodology for training neural networks in an image-guided diagnostic application

    Distributed computing is a process through which a set of computers connected by a network is used collectively to solve a single problem. In this paper, we propose a distributed computing methodology for training neural networks for the detection of lesions in colonoscopy. Our approach is based on partitioning the training set across multiple processors using a parallel virtual machine. In this way, interconnected computers of varied architectures can be used for the distributed evaluation of the error function and gradient values, and thus for training neural networks with various learning methods. The proposed methodology has large granularity and low synchronization, and has been implemented and tested. Our results indicate that the parallel virtual machine implementation of the training algorithms leads to considerable speedup, especially when large network architectures and training sets are used.
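    As a rough illustration of the abstract's idea, the sketch below partitions a training set across worker processes, each of which evaluates a partial error and gradient that the master then sums. Python's multiprocessing stands in for PVM, a linear model stands in for the neural network, and all names are illustrative rather than taken from the paper.

```python
# Data-parallel error/gradient evaluation: each worker handles one
# partition of the training set; the master sums the partial results.
import numpy as np
from multiprocessing import Pool

def partial_error_and_grad(args):
    """Squared error and gradient of a linear model on one data partition."""
    w, X, y = args
    residual = X @ w - y
    return 0.5 * np.sum(residual**2), X.T @ residual

def distributed_step(w, X, y, n_workers=4, lr=1e-3):
    # Large granularity, low synchronization: one message out and one
    # message back per worker per training step.
    X_parts = np.array_split(X, n_workers)
    y_parts = np.array_split(y, n_workers)
    with Pool(n_workers) as pool:
        results = pool.map(partial_error_and_grad,
                           [(w, Xp, yp) for Xp, yp in zip(X_parts, y_parts)])
    error = sum(e for e, _ in results)
    grad = sum(g for _, g in results)
    return w - lr * grad, error

if __name__ == "__main__":
    X = np.random.rand(1000, 8)   # placeholder training data
    y = np.random.rand(1000)
    w = np.zeros(8)
    for _ in range(10):
        w, err = distributed_step(w, X, y)
    print(err)
```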

    A Survey of Parallel Data Mining

    With the fast, continuous increase in the number and size of databases, parallel data mining is a natural and cost-effective approach to tackling the problem of scalability in data mining. Recently there has been considerable research on parallel data mining. However, most projects focus on the parallelization of a single kind of data mining algorithm or paradigm. This paper surveys parallel data mining from a broader perspective. More precisely, we discuss the parallelization of data mining algorithms from four knowledge discovery paradigms, namely rule induction, instance-based learning, genetic algorithms, and neural networks. Using the lessons learned from this discussion, we also derive a set of heuristic principles for designing efficient parallel data mining algorithms.
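    To make one of the four paradigms concrete, the hypothetical sketch below parallelizes instance-based learning (k-nearest neighbours) by partitioning the training set across workers, each of which returns local candidates that the master merges; none of the identifiers come from the survey itself.

```python
# Data-parallel k-NN: each worker scans its partition for local nearest
# neighbours; the master merges them and keeps the global top k.
import numpy as np
from multiprocessing import Pool

def local_neighbours(args):
    query, X_part, y_part, k = args
    d = np.linalg.norm(X_part - query, axis=1)
    idx = np.argsort(d)[:k]                    # k best within this partition
    return list(zip(d[idx], y_part[idx]))

def parallel_knn(query, X, y, k=3, n_workers=4):
    X_parts = np.array_split(X, n_workers)
    y_parts = np.array_split(y, n_workers)
    with Pool(n_workers) as pool:
        candidates = pool.map(local_neighbours,
                              [(query, Xp, yp, k)
                               for Xp, yp in zip(X_parts, y_parts)])
    merged = sorted(c for part in candidates for c in part)[:k]
    labels = [label for _, label in merged]
    return max(set(labels), key=labels.count)  # majority vote

if __name__ == "__main__":
    X = np.random.rand(1000, 4)                # placeholder instances
    y = np.random.randint(0, 3, size=1000)     # placeholder class labels
    print(parallel_knn(X[0], X, y))
```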

    A high-speed linear algebra library with automatic parallelism

    Parallel or distributed processing is key to getting the highest performance from workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy the above requirements while providing the reliability and ease of use required of commercial software intended for a production environment. As a result, the application of parallel processing technology to commercial software has been extremely limited, even though there are numerous computationally demanding programs that would benefit significantly from parallel processing. This paper describes DSSLIB, a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel entries in the market.
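    Since DSSLIB's actual interface is not given in the abstract, the following is only a hypothetical illustration of its key design point: a routine that looks serial to the caller while parallelizing internally, with no parallel side effects escaping the call.

```python
# A serial-looking library call that parallelizes internally.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def matmul(A, B, n_workers=4):
    """Ordinary function call to the caller; row blocks run in parallel."""
    row_blocks = np.array_split(np.arange(A.shape[0]), n_workers)
    C = np.empty((A.shape[0], B.shape[1]))

    def work(rows):
        C[rows] = A[rows] @ B      # each thread fills a disjoint block of C

    with ThreadPoolExecutor(n_workers) as ex:
        list(ex.map(work, row_blocks))
    return C                       # no parallel side effects visible outside

C = matmul(np.random.rand(512, 256), np.random.rand(256, 128))
```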

    Aspect ratio distribution and chord length distribution driven modeling of crystallization of two-dimensional crystals for real-time model-based applications

    Two-dimensional (2D) crystals, for which the shape is described by two linear sizes, are common in the fine chemical and pharmaceutical industries. Since the crystal size and shape are directly related to the performance of active pharmaceutical ingredients, simultaneous size and shape distribution control is of paramount importance in pharmaceutical crystallization engineering. Efficiently achieving simultaneous size and shape control often requires model-based control strategies; however, the increased computational cost of the process simulation and the substantial differences between the simulated and measurable quantities make the implementation of model-based control approaches challenging. This paper addresses the important problem of the real-time simulation of the most likely measurable chord length distribution (CLD) and aspect ratio distribution (ARD), as well as the concentration variations, during the crystallization of 2D needle-shaped crystals. This enables the application of focused beam reflectance measurement (FBRM) and particle vision and microscopy (PVM), two routinely applied probes, as quantitative direct feedback control tools. Artificial neural network (ANN)-based FBRM and PVM soft-sensors are developed, which enable the direct and fast transformation of the 2D crystal size distribution (CSD) to CLD and ARD on arbitrary 2D grids. The training data for the ANN are generated by a first-principles, geometrical model-based simulation of FBRM and PVM for high-aspect-ratio crystals, although the ANN approach is applicable to any simulated or experimental training data set. It is also demonstrated that the in situ imaging-based shape measurement underestimates the real aspect ratio (AR) of crystals, for which a simple correction is proposed. From the model-equation solution perspective, the soft-sensors require a full population balance solution. The 2D high-resolution finite volume method is applied to simulate the full 2D CSD, which is an accurate and stable, but computationally expensive, technique. Real-time applicability is achieved through various implementation improvements, including grid optimization and data-type-optimized hybrid central processing unit (CPU)-graphics processing unit (GPU) calculations.
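    A minimal sketch of the soft-sensor idea follows, assuming a feed-forward network that maps a flattened 2D CSD to a CLD on a fixed grid; the random arrays are placeholders for the training data that the paper generates with its geometric FBRM/PVM simulation, and the grid sizes are assumptions.

```python
# ANN soft-sensor sketch: offline training on simulated (CSD, CLD) pairs,
# then one fast forward pass per time step during online control.
import numpy as np
from sklearn.neural_network import MLPRegressor

n_csd_bins, n_cld_bins = 40 * 40, 100      # assumed grid sizes
X = np.random.rand(2000, n_csd_bins)       # placeholder simulated CSDs
Y = np.random.rand(2000, n_cld_bins)       # placeholder matching CLDs

soft_sensor = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=300)
soft_sensor.fit(X, Y)                      # offline training step

# Online use: transform the current simulated CSD to a measurable CLD.
cld_estimate = soft_sensor.predict(X[:1])
```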

    Parallel recognition and classification of objects

    The development of parallel algorithms for the automatic recognition and classification of objects on an industrial line (either production or packaging) is presented. This kind of problem imposes a time constraint on image processing, and therefore requires a parallel solution. We have chosen simple objects (fruits, eggs, etc.), which are classified according to characteristics such as shape, color, size, and defects (stains, loss of color). By means of this classification, objects can be sent, for example, to different sectors of the line. The parallelization of the algorithms on a heterogeneous computer network with PVM (Parallel Virtual Machine) support is studied in this paper. Finally, some quantitative results obtained from applying the algorithm to a representative sample of real images are presented.
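    A toy master-worker version of this pipeline might look as follows, with a Python process pool standing in for the PVM network; the features, thresholds, and sector names are all hypothetical.

```python
# Master-worker classification: frames from the line are farmed out to
# workers, each extracting simple features and assigning a destination.
import numpy as np
from multiprocessing import Pool

def classify(image):
    """Toy classifier: route by size and colour of the segmented object."""
    mask = image.mean(axis=2) > 0.1          # crude background segmentation
    size = int(mask.sum())                   # object area in pixels
    mean_rgb = image[mask].mean(axis=0)      # average colour of the object
    if size < 500:
        return "reject: too small"           # hypothetical defect rule
    return "sector A" if mean_rgb[0] > mean_rgb[1] else "sector B"

if __name__ == "__main__":
    images = [np.random.rand(64, 64, 3) for _ in range(16)]  # stand-in frames
    with Pool(4) as pool:                    # one worker per node in the VM
        print(pool.map(classify, images))
```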

    Software framework for geophysical data processing, visualization and code development

    IGeoS is an integrated open-source software framework for geophysical data processing under development by the UofS seismology group. Unlike other systems, this processing monitor supports structured multicomponent seismic data streams and multidimensional data traces, and employs a unique backpropagation execution logic. This results in unusual flexibility of processing, allowing the system to handle nearly any geophysical data. In this project, a modern and feature-rich Graphical User Interface (GUI) was developed for the system, allowing editing and submission of processing flows and interaction with running jobs. Multiple jobs can be executed on distributed multi-processor networks and controlled from the same GUI. Jobs, in turn, can also be parallelized to take advantage of parallel processing environments such as local area networks and Beowulf clusters. A 3D/2D interactive display server was created and integrated with the IGeoS geophysical data processing framework. With the introduction of this major component, the IGeoS system becomes conceptually complete and potentially bridges the gap between traditional processing and interpretation software. Finally, in a specialized application, network acquisition and relay components were written, allowing IGeoS to be used for real-time applications. The completion of this functionality makes the processing and display capabilities of IGeoS available to multiple streams of seismic data from potentially remote sites. Seismic data can be acquired, transferred to the central server, processed, and archived, and events can be picked and placed in a database, completely automatically.
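    The "backpropagation execution logic" can be pictured as a demand-driven flow in which the last tool pulls data from its predecessor rather than having data pushed downstream; the class and method names below are illustrative and are not the IGeoS API.

```python
# Demand-driven (pull-based) processing flow: requests propagate backwards
# through the chain, results flow forwards.
class Tool:
    def __init__(self, upstream=None):
        self.upstream = upstream

    def pull(self):                     # demand propagates backwards...
        data = self.upstream.pull() if self.upstream else None
        return self.process(data)       # ...results flow forwards

    def process(self, data):
        return data

class Source(Tool):
    def process(self, _):
        return [0.0, 1.0, 0.5]          # stand-in seismic trace

class Gain(Tool):
    def process(self, trace):
        return [2.0 * s for s in trace]

flow = Gain(upstream=Source())
print(flow.pull())                      # the output stage drives the flow
```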

    Integration of a big data emerging on large sparse simulation and its application on green computing platform

    Analyzing and verifying a big data set is a challenge to understanding the fundamental concepts behind it. Many big data analysis techniques suffer from poor scalability, variation inequality, instability, slow convergence, and weak accuracy in large-scale numerical algorithms. These limitations open a wide opportunity for numerical analysts to develop efficient and novel parallel algorithms. Big data analytics plays an important role in science and engineering for extracting patterns, trends, and actionable information from large sets of data and for improving decision-making strategies. A large data set may consist of large-scale data collected via sensor networks, transformations from signals to digital images, high-resolution sensing systems, industry forecasts, and existing customer records used to predict trends and prepare for new demand. This paper proposes three types of big data analytics, in accordance with the analytics requirements, involving large-scale numerical simulation and mathematical modeling for solving complex problems. The first is big data analytics for the theory and fundamentals of nanotechnology numerical simulation. The second is big data analytics for enhancing digital images in 3D visualization and for the performance analysis of embedded systems, based on the large sparse data sets generated by the device. The third is the extraction of patterns from electroencephalogram (EEG) data sets for detecting horizontal and vertical eye movements. The process of examining big data analytics is thus to investigate the behavior of hidden patterns and unknown correlations, identify anomalies, discover structure inside unstructured data, and extract the essence, with trend prediction, multi-dimensional visualization, and real-time observation using the mathematical model. Parallel algorithms, mesh generation, domain-function decomposition approaches, inter-node communication design, mapping of subdomains, numerical analysis, and parallel performance evaluation (PPE) form the implementation process of the big data analytics. The superiority of parallel numerical methods such as AGE, Brian, and IADE was demonstrated for solving a large sparse model on a green computing platform that utilizes obsolete computers, old-generation servers, and outdated hardware as distributed virtual memory and multi-processors. The integration of low-cost message-passing communication software with the green computing platform is capable of increasing the PPE by up to 60% compared to the limited memory of a single processor. In conclusion, large-scale numerical algorithms with good scalability, equality, stability, convergence, and accuracy are important features in analyzing big data simulations.
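    As a minimal sketch of the domain-decomposition idea under stated assumptions, the example below splits a 1D grid into two subdomains that exchange one halo value per iteration over a pipe standing in for the message-passing layer; a plain Jacobi update replaces the AGE/IADE schemes for brevity.

```python
# Two subdomains relax a 1D field in parallel, exchanging halo points
# through cheap message passing each iteration.
import numpy as np
from multiprocessing import Process, Pipe

def subdomain(u, left, right, steps=100):
    for _ in range(steps):
        if left is not None:
            left.send(u[0])
            lo = left.recv()       # halo value from the left neighbour
        else:
            lo = u[0]              # physical boundary: reuse own endpoint
        if right is not None:
            right.send(u[-1])
            hi = right.recv()      # halo value from the right neighbour
        else:
            hi = u[-1]
        padded = np.concatenate(([lo], u, [hi]))
        u = 0.5 * (padded[:-2] + padded[2:])   # Jacobi-style relaxation
    print(u.round(3))

if __name__ == "__main__":
    a, b = Pipe()                  # link between the two subdomains
    u = np.linspace(0.0, 1.0, 20)
    p1 = Process(target=subdomain, args=(u[:10], None, a))
    p2 = Process(target=subdomain, args=(u[10:], b, None))
    p1.start(); p2.start(); p1.join(); p2.join()
```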