High-Performance Computing on GPU
This book is the result of research work on the characteristics of the GPU and its adoption as a massively parallel architecture for general-purpose applications. Its purpose is to become a useful tool for guiding the first steps of those starting out in high-performance computing on GPU. It aims to summarise the state of the art, considering the proposed bibliography.
The objective is not only to describe the many-core architecture of the GPU and the CUDA programming tool, but also to lead the reader towards developing programs with good performance.
The book is structured as follows:
Chapter 1: details the basic and general concepts of high-performance computing that appear throughout the rest of the text.
Chapter 2: describes the characteristics of the GPU architecture and its historical evolution, in both cases drawing a comparison with the CPU. Finally, it details the evolution of the GPU as a co-processor for the development of general-purpose applications.
Chapter 3: contains the basic guidelines of the programming model associated with CUDA. CUDA provides an interface for CPU-GPU communication and for thread management. The characteristics of the associated SIMT execution model are also described.
Chapter 4: analyses the general and basic properties of the GPU memory hierarchy, describing the properties of each memory, how it is used, and its advantages and disadvantages.
Chapter 5: comprises an analysis of the different aspects to take into account to solve applications with good performance. Programming a GPU with CUDA is not a mere transcription of sequential code into parallel code; different aspects must be considered in order to use the architecture efficiently and program it well.
Finally, three appendices are included. The first describes the basic CUDA qualifiers, types and functions; the second details some simple tools from the cutil.h library for controlling CUDA programming. The last appendix describes the CUDA compute capabilities of the different existing GPUs, listing the actual models that have them.
XV Escuela Internacional de Informática, held during the XVII Congreso Argentino de Ciencia de la Computación (CACIC 2011).
Red de Universidades con Carreras en Informática (RedUNCI)
High Performance Computing on GPU
Through this textbook (written in Spanish), the author introduces the GPU as a parallel computer able to solve general-purpose problems. The book has a didactic approach: it includes the first steps in CUDA programming, some tips to take into account in order to develop applications with good performance, and many examples and exercises.
Review of: http://sedici.unlp.edu.ar/handle/10915/18404
Facultad de Informática
Parallel backpropagation neural networks for task allocation by means of PVM
Features such as fast response, storage efficiency, fault tolerance and graceful degradation in the face of scarce or spurious inputs make neural networks appropriate tools for Intelligent Computer Systems. A neural network is, by itself, an inherently parallel system where many extremely simple processing units work simultaneously on the same problem, building up a computational device which possesses adaptation (learning) and generalisation (recognition) abilities. Implementation of neural networks roughly involves at least three stages: design, training and testing. The second, being CPU intensive, is the one requiring most of the processing resources, and depending on size and structural complexity the learning process can be extremely long. Thus, great effort has been devoted to developing parallel implementations intended to reduce learning time.
Pattern partitioning is an approach to parallelising neural networks in which the whole net is replicated on different processors and the weight changes owing to diverse training patterns are parallelised. This approach is the most suitable for a distributed architecture such as the one considered here. Incoming task allocation, as a previous step, is a fundamental service aimed at improving distributed system performance and facilitating further dynamic load balancing. A Neural Network Device inserted into the kernel of a distributed system as an intelligent tool makes it possible to achieve automatic allocation of execution requests under predefined performance criteria based on resource availability and incoming process requirements.
This paper is a twofold proposal: first, it shows some design and implementation insights for building a system where decision support for load distribution is based on a neural network device; secondly, it presents a distributed implementation that provides parallel learning of neural networks using a pattern partitioning approach.
In the latter case, some performance results of the parallelised approach for learning backpropagation neural networks are shown. These include a comparison of recall and generalisation abilities, and of speed-up when using a socket interface or PVM.
Sistemas Inteligentes
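The pattern-partitioning idea above can be sketched in a few lines. This is a minimal single-process illustration, not the paper's implementation: a toy linear unit stands in for a full backpropagation net, the "workers" run sequentially for clarity, and all names are invented for the example.

```python
# Pattern partitioning sketch: the net (here a single linear unit y = w*x,
# standing in for a full backpropagation network) is conceptually replicated
# per worker; each worker computes weight changes on its own subset of
# training patterns, and the changes are combined once per epoch.

def gradient(w, patterns):
    """Summed gradient of the squared error for y = w*x over a pattern subset."""
    g = 0.0
    for x, y in patterns:
        g += 2.0 * (w * x - y) * x
    return g

def train_partitioned(patterns, n_workers=3, lr=0.01, epochs=200):
    w = 0.0
    # Static partition of the training set across workers.
    shards = [patterns[i::n_workers] for i in range(n_workers)]
    for _ in range(epochs):
        # Each worker would compute its local weight change in parallel
        # (sequential here for clarity); the changes are then accumulated.
        total = sum(gradient(w, shard) for shard in shards)
        w -= lr * total / len(patterns)
    return w

data = [(x, 2.0 * x) for x in range(1, 9)]  # toy target relation y = 2x
w = train_partitioned(data)
```

In a real deployment each shard's gradient would be computed on a different node and exchanged over sockets or PVM, which is exactly where the communication cost compared in the paper arises.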
Generic communication in parallel computation
The design of parallel programs requires solutions that are not present in sequential programming. Thus, a designer of parallel applications is concerned with the problem of ensuring the correct behaviour of all the processes that the program comprises. There are different solutions to each problem, but the challenge is to find one that is general. One possibility is to allow the use of asynchronous groups of processors. We present a general methodology to derive efficient parallel divide and conquer algorithms. Algorithms belonging to this class allow the arbitrary division of the processor subsets, easing the opportunities for the underlying software to divide the network into independent sub-networks and minimising the impact of traffic in the rest of the network on the predicted cost. This methodology is defined by the OTMP model, and its expressiveness is exemplified through three divide and conquer programs.
Eje: IV - Workshop de procesamiento distribuido y paralelo
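The scheme the abstract describes, splitting the processor subset arbitrarily between sub-problems so each half can run on an independent sub-network, can be sketched as follows. This is an illustrative stand-in (a mergesort carried by the skeleton), not OTMP itself; the function and parameter names are assumptions.

```python
# Divide and conquer with explicit processor-subset division: at each split,
# the available processors are partitioned between the two sub-problems,
# modelling the independent sub-networks the underlying software would use.

def dc(problem, procs):
    """Sort `problem` by divide and conquer using `procs` processors."""
    if len(problem) <= 1 or procs <= 1:
        return sorted(problem)  # trivial or sequential base case
    mid = len(problem) // 2
    # Arbitrary division of the processor subset between the two halves.
    left_procs = max(1, procs // 2)
    right_procs = max(1, procs - left_procs)
    left = dc(problem[:mid], left_procs)    # would run on one sub-network
    right = dc(problem[mid:], right_procs)  # would run on the other
    # Combine phase: merge the two sorted halves.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

result = dc([5, 3, 8, 1, 9, 2], procs=4)
```

Because each recursive call receives a disjoint processor subset, the two halves never contend for the same links, which is what keeps the predicted cost insensitive to traffic elsewhere in the network.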
Skeletal parallel programming
Recently, the high-performance programming community has been working to find new templates, or skeletons, for several parallel programming paradigms. This form of programming allows the programmer to reduce development time, since it saves time in the design, testing and coding phases. We are concerned with some issues of skeletons that are fundamental to the definition of any skeletal parallel programming system. This paper presents commentary on these issues in the context of three types of D&C skeletons.
Eje: Procesamiento Concurrente, Paralelo y Distribuido
Data pre-processing strategies for network traffic analysis as a Big Data problem
Detecting possible attacks on a computer network requires methods or strategies working together to classify the traffic. The area constitutes a basic problem of wide interest, above all within emerging concepts such as Big Data, with its new technologies for storing, processing and extracting information from large amounts of data.
Recognising malicious traffic on a network depends, in the first instance, on the efficiency of data collection and its correct pre-processing, so that the data is as representative as possible when the chosen data-analysis model is applied. This is the topic addressed in this work, which forms part of an integral project for detecting attacks on computer networks by applying High-Performance Computing on GPU, Artificial Intelligence and Image Processing.
Workshop: WARSO - Arquitectura, Redes y Sistemas Operativos
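One typical ingredient of the pre-processing stage described above is scaling raw traffic features to a common range before a classifier sees them. The sketch below is illustrative only: the flow fields and the choice of min-max scaling are assumptions, not the project's actual pipeline.

```python
# Illustrative pre-processing step: raw flow records are reduced to numeric
# feature vectors and scaled column-wise to [0, 1], so features with very
# different magnitudes (bytes vs. seconds) become comparable.

def min_max_scale(rows):
    """Column-wise min-max normalisation of a list of feature vectors."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)]
            for row in rows]

# Toy flow records: (duration_s, bytes, packets) -- hypothetical fields.
flows = [[0.5, 1200, 10], [3.0, 90000, 700], [0.1, 300, 2]]
scaled = min_max_scale(flows)
```

On real captures this step would run over millions of flows, which is precisely where the Big Data technologies and GPU acceleration mentioned above come into play.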
Improvement of a Parallel System for Image Processing
Digital images are digital signals captured through different means. Sometimes these captured images contain variations combined with the original signal; these variations are called noise. Echo is a particular kind of noise with characteristics that turn it into a very interesting problem to solve. The echo detection process and its subsequent elimination from a digital image involve extensive mathematical calculations.
Different parallel approaches taking advantage of new architectures can be implemented to solve this problem. Nevertheless, these approaches have time-dependent characteristics, so processing time is still the critical point. An improved version of a parallel approach may be implemented by using a different algorithm and other techniques that reduce processing time much further. In this work, the authors discuss their earlier work, the present approach and the future directions of this experimental application. Finally, the resulting values are sketched.
Sistemas Distribuidos - Redes - Concurrencia
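To make the echo problem concrete, one common model (an assumption here, not necessarily the paper's) treats an echo as a delayed, attenuated copy of the signal, so the delay can be recovered as the non-zero lag that maximises the autocorrelation. A minimal 1-D sketch:

```python
# Echo detection sketch on a 1-D signal: model the echo as signal + a*signal
# delayed by k samples, then recover k as the lag with the strongest
# autocorrelation. Illustrative only; an image version would apply the same
# idea along rows/columns, and this O(N * max_lag) loop is what parallel
# implementations would distribute.
import random

def autocorr(sig, lag):
    """Unnormalised autocorrelation of `sig` at a given lag."""
    return sum(sig[i] * sig[i - lag] for i in range(lag, len(sig)))

def detect_echo_lag(sig, max_lag):
    """Return the lag in [1, max_lag] with the strongest autocorrelation."""
    return max(range(1, max_lag + 1), key=lambda k: autocorr(sig, k))

# Build a toy signal with an echo at lag 5, attenuation 0.5.
random.seed(0)
clean = [random.uniform(-1, 1) for _ in range(200)]
echoed = [clean[i] + (0.5 * clean[i - 5] if i >= 5 else 0.0)
          for i in range(len(clean))]
lag = detect_echo_lag(echoed, max_lag=20)
```

Once the lag and attenuation are known, the echo can be subtracted out; each lag's correlation is independent of the others, which is what makes the search embarrassingly parallel.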
Towards a predictive load balancing method based on multiple resources
Processor load imbalance in distributed systems is one of the main problems, because it involves system performance degradation. Load balancing algorithms try to improve global system performance through the migration of processes, but they also present an additional problem, known as instability: it happens when processes spend an excessive amount of time migrating among different system nodes. In order to diminish this cost without affecting the mean system response time, load balancing algorithms based on different strategies have been proposed. The Multiple Resources Predictive Load Balance Strategy (MRPLBS) is a new predictive, dynamic and non-preemptive strategy for balancing multiple resources. The predictive approach is based on estimations computed as weighted exponential averages of the load of each node in the system.
This paper presents the MRPLBS system architecture, its performance, and a comparison against Random Load Balancing in different scenarios. The number of requirements, the mean response time, the number of failed migrations and the percentage of acceptance are shown.
I Workshop de Procesamiento Distribuido y Paralelo (WPDP)
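The predictive ingredient above, a weighted exponential average of each node's observed load, can be sketched in a few lines. The smoothing factor `alpha` and the node-selection rule are assumptions for illustration; the abstract does not fix them.

```python
# Predictive load estimation sketch: each node's load is estimated as a
# weighted exponential average of its samples, so recent observations
# dominate; the allocator then targets the node with the lowest estimate.

def exp_avg(samples, alpha=0.3):
    """Weighted exponential average: est = alpha*sample + (1-alpha)*est."""
    est = samples[0]
    for s in samples[1:]:
        est = alpha * s + (1 - alpha) * est
    return est

def pick_node(history, alpha=0.3):
    """Choose the node whose predicted load is lowest."""
    return min(history, key=lambda n: exp_avg(history[n], alpha))

# Hypothetical per-node load samples (fraction of capacity, oldest first).
loads = {"node_a": [0.9, 0.8, 0.85], "node_b": [0.2, 0.3, 0.25]}
target = pick_node(loads)
```

Predicting from a smoothed average rather than the latest sample is what damps the instability the abstract mentions: a momentary spike on one node does not immediately redirect every migration decision.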
A parallel approach for backpropagation learning of neural networks
Learning algorithms for neural networks involve CPU-intensive processing, and consequently great effort has been devoted to developing parallel implementations intended to reduce learning time.
This work briefly describes parallel schemes for a backpropagation algorithm and proposes a distributed system architecture for developing parallel training with a pattern partitioning scheme. Under this approach, weight changes are computed concurrently, exchanged between system components and adjusted accordingly until the whole parallel learning process is completed. Some comparative results are also shown.
Eje: Procesamiento distribuido y paralelo. Tratamiento de señales
Computer network security: strategies and challenges in the Big Data era
As computer networks have become essential tools, their security has become a crucial problem for computer systems. Detecting unusual values in the large volumes of information produced by network traffic has acquired huge interest in the network security area. Anomaly detection is a starting point for preventing attacks; it is therefore important for every computer system in a network to have a system for detecting anomalous events close to the time of their occurrence. Detecting these events can lead network administrators to identify system failures, take preventive actions and avoid massive damage.
This work presents, first, how to identify network traffic anomalies by applying parallel computing techniques and Graphics Processing Units in two algorithms: a supervised classification algorithm and one based on network traffic image processing. Finally, it proposes as a challenge solving anomaly detection with an unsupervised algorithm such as Deep Learning.
Facultad de Informática