    BIG DATA CLASSIFICATION USING DECISION TREES ON THE CLOUD

    This writing project addresses the use of machine learning on very large data sets on cloud servers. The project consists of two phases. The first is to develop a machine learning system that learns from the data provided by IBM for the “IBM Watson Great minds Challenge SJSU Pilot” competition and produces the best possible results on the evaluation data set, also provided by the IBM Watson team. This serves as the basis for the second phase, in which the objective is to move the machine learning system onto a cloud server so that future students can use it as a service. The innovation in this project is to use machine learning based data classification techniques on the cloud to solve a real-world classification problem. The first challenge is deploying and testing, on the cloud, the classification algorithm that was developed in CS297. The project involves not just studying the different techniques of machine learning and their applications, but also identifying the algorithm and the environment most suitable for this particular classification problem.

    INGEN's advanced IT facilities: The least you need to know

    The facilities described in this document were made possible in part through funding from Indiana University, the Indiana University Office of the Vice President for Information Technology, the State of Indiana, Shared University Research Grants from IBM, Inc., and from the Lilly Endowment through their support of the Indiana Genomics Initiative. The Indiana Genomics Initiative (INGEN) of Indiana University is supported in part by Lilly Endowment Inc.

    Development of an oceanographic application in HPC

    High Performance Computing (HPC) is used for running advanced application programs efficiently, reliably, and quickly. In earlier decades, the performance of HPC applications was evaluated in terms of speed, thread scalability, and use of the memory hierarchy. Now it is also essential to consider the energy or power consumed by the system while executing an application. In fact, high power consumption is one of the biggest problems for the HPC community and one of the major obstacles to exascale system design. New generations of HPC systems aim at exaflop performance and will demand even more energy for processing and cooling. The growth of HPC systems is now limited by energy issues. Recently, many research centers have turned their attention to automatic tuning of HPC applications, which requires a broad study of those applications in terms of power efficiency. In this context, this paper studies an oceanographic application named OceanVar, which implements the Domain Decomposition based 4D Variational model (DD-4DVar), one of the most commonly used HPC applications, evaluating not only the classic aspects of performance but also aspects related to power efficiency in different case studies. This work was carried out at BSC (Barcelona Supercomputing Center), Spain, within the Mont-Blanc project, with tests performed first on the HCA server with Intel technology and then on the Thunder mini-cluster with ARM technology. The thesis first explains the concept of data assimilation and the context in which it was developed, and gives a brief description of the 4DVAR mathematical model. After a close examination of this problem, the Matlab description of the data assimilation problem was ported to a sequential version in C.
Secondly, after identifying the most time-consuming computational kernels, a parallel version of the application was developed in a multiprocessor programming style using MPI (Message Passing Interface). The performance experiments show that, on the HCA server (an Intel architecture), the efficiency of the two most expensive functions remains approximately 80% as the number of processes grows. On the ARM architecture, specifically the Thunder mini-cluster, the trend obtained is instead a "SuperLinear Speedup", which in our case can be explained by a more efficient use of resources (cache memory access) compared with the sequential case. The second part of this paper analyses some issues of this application that affect energy efficiency. After a brief discussion of the energy consumption characteristics of the Thunder chip within the technological landscape, the energy consumption of the Thunder mini-cluster was measured with a power consumption detector, the Yokogawa Power Meter, to obtain an overview of the power-to-solution of this application, to be used as a baseline for subsequent analyses with other parallel styles. Finally, a comprehensive performance evaluation, aimed at assessing the quality of the MPI parallelization, is conducted using a performance tool named Paraver, developed by BSC. Paraver is a performance analysis and visualisation tool that can be used to analyse MPI, threaded, or mixed-mode programmes, and it is key to parallel profiling and to optimising code for High Performance Computing. A set of graphical representations of these statistics makes it easy for a developer to identify performance problems.
Some of the problems that can be easily identified are load-imbalanced decompositions, excessive communication overheads, and poor average floating-point operations per second. Paraver can also report statistics based on hardware counters, which are provided by the underlying hardware. This project aimed to use Paraver configuration files to analyse selected metrics for this application. To help explain the performance trend observed on the Thunder mini-cluster, traces were extracted from various case studies, and the results are as expected: a drastic drop in cache misses from the case ppn (processes per node) = 1 to the case ppn = 16. This partly explains the more efficient use of cluster resources as the number of processes increases.
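
The efficiency and "SuperLinear Speedup" figures quoted in this abstract rest on the standard definitions of parallel speedup and efficiency. The following is a minimal sketch of those definitions only; the function names and the timings in the usage note are hypothetical and are not taken from the thesis.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S(p) = T(1) / T(p): how many times faster the parallel run is."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, n_procs: int) -> float:
    """Parallel efficiency E(p) = S(p) / p; 1.0 corresponds to ideal linear scaling."""
    return speedup(t_serial, t_parallel) / n_procs


def is_superlinear(t_serial: float, t_parallel: float, n_procs: int) -> bool:
    """Speedup is superlinear when S(p) > p, i.e. efficiency exceeds 1.0."""
    return speedup(t_serial, t_parallel) > n_procs
```

With made-up timings, a run that takes 100 s sequentially and 31.25 s on 4 processes has an efficiency of 0.8, while a run that drops from 100 s to 5 s on 16 processes has a speedup of 20 and would count as superlinear. Cache effects such as those described above are one common reason the second situation can occur.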

    University Information Technology Services' Advanced IT Facilities: The least every researcher needs to know

    This is an archived document containing instructions for using IU's advanced IT facilities ca. 2003. A version of this document updated in 2011 is available from http://hdl.handle.net/2022/13620. Further versions are forthcoming. This document is designed to be read as a printed document, and designed to permit anyone at all familiar with computers and the Internet to start at the beginning, get a general overview of UITS' advanced IT facilities and what they offer, and then read the detailed portions of the document that are of interest. In many cases, examples are provided, as well as directions on how to download sample files. And in some cases there is information that one is best off really not learning – for example the process of logging into IU's IBM supercomputer the first time involves setup steps that should be followed, keystroke by keystroke, from the directions presented herein, and then promptly forgotten. This document is intended to be a starting point, not a comprehensive guide. As such it should get any reader off to a good start, but then point the reader in the direction of consulting staff and online resources that will permit the reader to get additional help and information as needed. Most of all, this document is provided for the convenience of researchers, who may peruse this information at their leisure. Our hope and expectation is that consultants in UITS will provide extensive help and programming assistance to IU researchers who wish to make use of these excellent IT facilities. The facilities described in this document were made possible in part through funding from Indiana University, the Indiana University Office of the Vice President for Information Technology, the State of Indiana, Shared University Research Grants from IBM, Inc., the National Science Foundation under Grant No. 0116050 and Grant CDA-9601632, and from the Lilly Endowment through their support of the Indiana Genomics Initiative.
The Indiana Genomics Initiative (INGEN) of Indiana University is supported in part by Lilly Endowment Inc.

    Parallel Computer Needs at Dartmouth College

    To determine the need for a parallel computer on campus, a committee of the Graduate Program in Computer Science surveyed selected Dartmouth College faculty and students in December, 1991, and January, 1992. We hope that the information in this report can be used by many groups on campus, including the Computer Science graduate program and DAGS summer institute, Kiewit's NH Supercomputer Initiative, and by numerous researchers hoping to collaborate with people in other disciplines. We found significant interest in parallel supercomputing on campus. An on-campus parallel supercomputing facility would not only support numerous courses and research projects, but would provide a locus for intellectual activity in parallel computing, encouraging interdisciplinary collaboration. We believe that this report is a first step in that direction.

    Indiana University's Advanced Cyberinfrastructure

    This is an archived document. The most current version may be found at http://pti.iu.edu/ci. The purpose of this document is to introduce researchers to Indiana University’s cyberinfrastructure – to clarify what these facilities make possible, to discuss how to use them and the professional staff available to work with you. The resources described here are complex and varied, among the most advanced in the world. The intended audience is anyone unfamiliar with IU’s cyberinfrastructure.

    REU Site: Supercomputing Undergraduate Program in Maine (SuperMe)

    This award, for a new Research Experience for Undergraduates (REU) site, builds a Supercomputing Undergraduate Program in Maine (SuperMe). This new site provides ten-week summer research experiences at the University of Maine (UMaine) for ten undergraduates each year for three years. With the integrated expertise of ten faculty researchers from both computer systems and domain applications, SuperMe allows each undergraduate to conduct meaningful research, such as developing supercomputing techniques and tools, and solving cutting-edge research problems through parallel computing and scientific visualization. Besides being actively involved in research groups, students attend weekly seminars given by faculty mentors, formally report and present their research experiences and results, take field trips, and interact with ITEST, RET and GK-12 participants. SuperMe provides scientific exploration ranging from engineering to the sciences with a coherent intellectual focus on supercomputing. It consists of four computer systems projects that aim to improve techniques in grid computing, parallel I/O data accesses, high-resolution scientific visualization and information security, and five computer modeling projects that utilize world-class supercomputing and visualization facilities housed at UMaine to perform large, complex simulation experiments and data analysis in different science domains. SuperMe provides a diversity of cutting-edge research opportunities to students from under-represented groups or from universities in rural areas with limited research opportunities. By interacting directly with the participants of existing programs at UMaine, including ITEST, RET and GK-12, REU students disseminate their research results and experiences to middle and high school students and teachers. This site is co-funded by the Department of Defense in partnership with the NSF REU Site program.

    Simulation of networks of spiking neurons: A review of tools and strategies

    We review different aspects of the simulation of spiking neural networks. We start by reviewing the different types of simulation strategies and algorithms that are currently implemented. We next review the precision of those simulation strategies, in particular in cases where plasticity depends on the exact timing of the spikes. We give an overview of the different simulators and simulation environments presently available (restricted to those that are freely available, open source and documented). For each simulation tool, its advantages and pitfalls are reviewed, with the aim of allowing the reader to identify which simulator is appropriate for a given task. Finally, we provide a series of benchmark simulations of different types of networks of spiking neurons, including Hodgkin-Huxley type and integrate-and-fire models, interacting with current-based or conductance-based synapses, using clock-driven or event-driven integration strategies. The same set of models is implemented on the different simulators, and the code is made available. The ultimate goal of this review is to provide a resource to facilitate identifying the appropriate integration strategy and simulation tool to use for a given modeling problem related to spiking neural networks. Comment: 49 pages, 24 figures, 1 table; review article, Journal of Computational Neuroscience, in press (2007)
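
To make the term "clock-driven" concrete, here is a minimal sketch of a single leaky integrate-and-fire neuron advanced on a fixed time grid with forward Euler. It is an illustration only, not code from the review or from any of the simulators it covers; the function name, the constant input `i_ext`, and all parameter values are assumptions chosen for the example.

```python
def simulate_lif(i_ext, t_end=0.1, dt=1e-4, tau=0.02,
                 v_rest=0.0, v_thresh=1.0, v_reset=0.0, resistance=1.0):
    """Clock-driven (fixed-step) leaky integrate-and-fire neuron.

    Integrates tau * dv/dt = -(v - v_rest) + resistance * i_ext with
    forward Euler and returns the list of spike times in seconds.
    All parameter values are illustrative assumptions.
    """
    v = v_rest
    spike_times = []
    n_steps = int(round(t_end / dt))
    for step in range(n_steps):
        # State is updated at every tick of the clock, spike or no spike.
        dv = (-(v - v_rest) + resistance * i_ext) / tau
        v += dt * dv
        if v >= v_thresh:
            # Spike times are only resolved to the grid, one source of
            # the precision issues the abstract mentions.
            spike_times.append((step + 1) * dt)
            v = v_reset
    return spike_times
```

In this sketch a constant input of 0.5 never reaches threshold, while an input of 2.0 produces a regular spike train. An event-driven strategy would instead compute the next threshold crossing analytically and jump directly to it, avoiding the grid-imposed rounding of spike times.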