1,161 research outputs found
BIG DATA CLASSIFICATION USING DECISION TREES ON THE CLOUD
This writing project addresses the topic of attempting to use machine learning on very large data sets on cloud servers. The project consists of two phases. The first being developing a machine learning system which will learn on the data provided by IBM for the “IBM Watson Great minds Challenge SJSU Pilot” competition and providing the best possible results on the evaluation data set, also provided by the IBM Watson team. This will serve as a basis for the second phase of the project, in which the objective is to move the machine learning system on to a cloud server, so that it may be used as a service by future students. The innovation in this project would be to use machine learning based data classification techniques on the cloud and solve a real world classification problem. The challenges involved would be first deploying and testing the classification algorithm that was developed in CS297 on the cloud. The project consists of not just the study of the different techniques of machine learning and its applications, but also involves identifying the algorithm and the environment which will be most suitable for this particular classification problem
INGEN's advanced IT facilities: The least you need to know
The facilities described in this document were made possible in part through funding from Indiana University, the Indiana University Office of the Vice President for
Information Technology, the State of Indiana, Shared University Research Grants from IBM, Inc., and from the Lilly Endowment through their support o f the Indiana Genomics Initiative. The Indiana Genomics Initiative (INGEN) of Indiana University is supported in part by Lilly Endowment Inc
Development of an oceanographic application in HPC
High Performance Computing (HPC) is used for running advanced application programs
efficiently, reliably, and quickly.
In earlier decades, performance analysis of HPC applications was evaluated based on
speed, scalability of threads, memory hierarchy. Now, it is essential to consider the
energy or the power consumed by the system while executing an application.
In fact, the High Power Consumption (HPC) is one of biggest problems for the High
Performance Computing (HPC) community and one of the major obstacles for exascale
systems design.
The new generations of HPC systems intend to achieve exaflop performances and will
demand even more energy to processing and cooling. Nowadays, the growth of HPC
systems is limited by energy issues
Recently, many research centers have focused the attention on doing an automatic tuning
of HPC applications which require a wide study of HPC applications in terms of power
efficiency.
In this context, this paper aims to propose the study of an oceanographic application,
named OceanVar, that implements Domain Decomposition based 4D Variational model
(DD-4DVar), one of the most commonly used HPC applications, going to evaluate not
only the classic aspects of performance but also aspects related to power efficiency in
different case of studies.
These work were realized at Bsc (Barcelona Supercomputing Center), Spain within the
Mont-Blanc project, performing the test first on HCA server with Intel technology and then on a mini-cluster Thunder with ARM technology.
In this work of thesis it was initially explained the concept of assimilation date, the
context in which it is developed, and a brief description of the mathematical model
4DVAR.
After this problem’s close examination, it was performed a porting from Matlab
description of the problem of data-assimilation to its sequential version in C language.
Secondly, after identifying the most onerous computational kernels in order of time, it
has been developed a parallel version of the application with a parallel multiprocessor
programming style, using the MPI (Message Passing Interface) protocol.
The experiments results, in terms of performance, have shown that, in the case of
running on HCA server, an Intel architecture, values of efficiency of the two most
onerous functions obtained, growing the number of process, are approximately equal to
80%.
In the case of running on ARM architecture, specifically on Thunder mini-cluster,
instead, the trend obtained is labeled as "SuperLinear Speedup" and, in our case, it can
be explained by a more efficient use of resources (cache memory access) compared with
the sequential case.
In the second part of this paper was presented an analysis of the some issues of this
application that has impact in the energy efficiency.
After a brief discussion about the energy consumption characteristics of the Thunder
chip in technological landscape, through the use of a power consumption detector, the
Yokogawa Power Meter, values of energy consumption of mini-cluster Thunder were
evaluated in order to determine an overview on the power-to-solution of this application
to use as the basic standard for successive analysis with other parallel styles.
Finally, a comprehensive performance evaluation, targeted to estimate the goodness of
MPI parallelization, is conducted using a suitable performance tool named Paraver,
developed by BSC.
Paraver is such a performance analysis and visualisation tool which can be used to
analyse MPI, threaded or mixed mode programmes and represents the key to perform a parallel profiling and to optimise the code for High Performance Computing.
A set of graphical representation of these statistics make it easy for a developer to
identify performance problems. Some of the problems that can be easily identified are
load imbalanced decompositions, excessive communication overheads and poor average
floating operations per second achieved.
Paraver can also report statistics based on hardware counters, which are provided by the
underlying hardware.
This project aimed to use Paraver configuration files to allow certain metrics to be
analysed for this application.
To explain in some way the performance trend obtained in the case of analysis on the
mini-cluster Thunder, the tracks were extracted from various case of studies and the
results achieved is what expected, that is a drastic drop of cache misses by the case ppn
(process per node) = 1 to case ppn = 16.
This in some way explains a more efficient use of cluster resources with an increase of
the number of processes
University Information Technology Services' Advanced IT Facilities: The least every researcher needs to know
This is an archived document containing instructions for using IU's advanced IT facilities ca. 2003. A version of this document updated in 2011 is available from http://hdl.handle.net/2022/13620. Further versions are forthcoming.This document is designed to be read as a printed document, and designed to permit anyone at all familiar with computers and the Internet to start at the beginning, get a general overview of UITS' advanced IT facilities and what they offer, and then read the detailed portions of the document that are of interest. In many cases, examples are provided, as well as directions on how to download sample files. And in some cases there is information that one is best off really not learning – for example the process of logging into IU's IBM supercomputer the first time involves setup steps that should be followed, keystroke by keystroke, from the directions presented herein, and then promptly forgotten.
This document is intended to be a starting point, not a comprehensive guide. As such it should get any reader off to a good start, but then point the reader in the direction of consulting staff and online resources that will permit the reader to get additional help and information as needed.
Most of all, this document is provided for the convenience of researchers, who may peruse this information at their leisure. Our hope and expectation is that consultants in UITS will provide extensive help and programming assistance to IU researchers who wish to make use of these excellent IT facilities.The facilities described in this document were made possible in part through funding from Indiana University, the Indiana University Office of the Vice President for Information Technology, the State of Indiana, Shared University Research Grants from IBM, Inc., the National Science Foundation under Grant No. 0116050 and Grant CDA- 9601632, and from the Lilly Endowment through their support of the Indiana Genomics Initiative. The Indiana Genomics Initiative (INGEN) of Indiana University is supported in part by Lilly Endowment Inc
Parallel Computer Needs at Dartmouth College
To determine the need for a parallel computer on campus, a committee of the Graduate Program in Computer Science surveyed selected Dartmouth College faculty and students in December, 1991, and January, 1992. We hope that the information in this report can be used by many groups on campus, including the Computer Science graduate program and DAGS summer institute, Kiewit\u27s NH Supercomputer Initiative, and by numerous researchers hoping to collaborate with people in other disciplines.
We found significant interest in parallel supercomputing on campus. An on-campus parallel supercomputing facility would not only support numerous courses and research projects, but would provide a locus for intellectual activity in parallel computing, encouraging interdisciplinary collaboration. We believe that this report is a first step in that direction
Indiana University's Advanced Cyberinfrastructure
This is an archived document. The most current version may be found at http://pti.iu.edu/ciThe purpose of this document is to introduce researchers to Indiana University’s cyberinfrastructure – to clarify what these facilities make possible, to discuss how to use them and the professional staff available to work with you. The resources described here are complex and varied, among the most advanced in the world. The intended audience is anyone unfamiliar with IU’s cyberinfrastructure
REU Site: Supercomputing Undergraduate Program in Maine (SuperMe)
This award, for a new Research Experience for Undergraduates (REU) site, builds a Supercomputing Undergraduate Program in Maine (SuperMe). This new site provides ten-week summer research experiences at the University of Maine (UMaine) for ten undergraduates each year for three years. With integrated expertise of ten faculty researchers from both computer systems and domain applications, SuperMe allows each undergraduate to conduct meaningful research, such as developing supercomputing techniques and tools, and solving cutting-edge research problems through parallel computing and scientific visualization. Besides being actively involved in research groups, students attend weekly seminars given by faculty mentors, formally report and present their research experiences and results, conduct field trips, and interact with ITEST, RET and GK-12 participants. SuperMe provides scientific exploration ranging from engineering to sciences with a coherent intellectual focus on supercomputing. It consists of four computer systems projects that aim to improve techniques in grid computing, parallel I/O data accesses, high-resolution scientific visualization and information security, and five computer modeling projects that utilize world-class supercomputing and visualization facilities housed at UMaine to perform large, complex simulation experiments and data analysis in different science domains. SuperMe provides a diversity of cutting-edge research opportunities to students from under-represented groups or from universities in rural areas with limited research opportunities. Through interacting directly with the participant of existing programs at UMaine, including ITEST, RET and GK-12, REU students disseminates their research results and experiences to middle and high school students and teachers. This site is co-funded by the Department of Defense in partnership with the NSF REU Site program
Simulation of networks of spiking neurons: A review of tools and strategies
We review different aspects of the simulation of spiking neural networks. We
start by reviewing the different types of simulation strategies and algorithms
that are currently implemented. We next review the precision of those
simulation strategies, in particular in cases where plasticity depends on the
exact timing of the spikes. We overview different simulators and simulation
environments presently available (restricted to those freely available, open
source and documented). For each simulation tool, its advantages and pitfalls
are reviewed, with an aim to allow the reader to identify which simulator is
appropriate for a given task. Finally, we provide a series of benchmark
simulations of different types of networks of spiking neurons, including
Hodgkin-Huxley type, integrate-and-fire models, interacting with current-based
or conductance-based synapses, using clock-driven or event-driven integration
strategies. The same set of models are implemented on the different simulators,
and the codes are made available. The ultimate goal of this review is to
provide a resource to facilitate identifying the appropriate integration
strategy and simulation tool to use for a given modeling problem related to
spiking neural networks.Comment: 49 pages, 24 figures, 1 table; review article, Journal of
Computational Neuroscience, in press (2007
- …