32 research outputs found
Extension of a task-based model to functional programming
Recently, efforts have been made to bring together the areas of high-performance computing (HPC) and massive data processing (Big Data). Traditional HPC frameworks, like COMPSs, are mostly task-based, while popular big-data environments, like Spark, are based on functional programming principles. The former are known for their good performance on regular, matrix-based computations; the latter have often been considered more successful for fine-grained, data-parallel workloads. In this paper we present our experience with the integration of some dataflow techniques into COMPSs, a task-based framework, in an effort to bring together the best aspects of both worlds. We present our API, called DDF, which provides a new data abstraction that addresses the challenges of integrating Big Data application scenarios into COMPSs. DDF has a functional-based interface, similar to many Data Science tools, that allows us to use dynamic evaluation to adapt the task execution at runtime. Besides the performance optimization it provides, the API facilitates the development of applications by experts in the application domain. In this paper we evaluate DDF's effectiveness by comparing the resulting programs to their original versions in COMPSs and Spark. The results show that DDF can improve COMPSs execution time and even outperform Spark in many use cases. This work was partially supported by CAPES, CNPq, Fapemig and NIC.BR, and by projects Atmosphere (H2020-EU.2.1.1 777154) and INCT-Cyber. Peer reviewed. Postprint (author's final draft)
Detecção de Spams Utilizando Conteúdo Web Associado a Mensagens (Spam Detection Using Web Content Associated with Messages)
In this work we propose a spam detection strategy that exploits the content of the Web pages linked from messages. We describe a methodology for collecting these pages, characterize the relationship between the pages and the spam messages, and then use a machine learning algorithm to extract the information relevant to spam detection. We show that using information from the linked pages significantly improves the classification of spam and ham, yielding a low false-positive rate. Our study reveals that the pages linked from spam are still a battleground unexplored by filters, where spammers do not bother to hide their identity
Operating system and network support for high-performance computing
High-performance computing applications were once limited to isolated supercomputers. In the past few years, however, there has been an increasing need to share data between different machines. This, combined with new network technologies that provide higher bandwidths, has led high-performance computing systems to adapt so that they can move data over the local network. There are some problems in doing this. Current high-performance systems often use centralized protocol servers, thereby creating bottlenecks to network connections. In addition, the lack of a more appropriate protocol leads to the use of TCP by applications using parallel connections. TCP is not perfectly tuned to such applications. This dissertation presents a detailed analysis of the problems caused by centralized protocol servers and the use of TCP in high-performance computing environments. It shows why the network servers currently available in some supercomputers do not provide good performance. It also presents simulation results that illustrate how TCP connection performance can degrade rapidly when multiple cooperative connections are used. The main contributions of this work are the development of distributed protocol stacks and cooperative rate-based traffic shaping. Distributed stacks use a user-level protocol implementation to replicate the TCP/IP protocol stack in all the nodes of a multicomputer, removing the protocol server from the data path and avoiding the associated bottleneck. Cooperative rate shaping uses bandwidth estimates to pace data packets, avoiding most of the problems that cause performance degradation in parallel cooperative connections. It also provides a way for cooperating connections to share their bandwidth estimates, improving performance by making good use of their combined knowledge
Reliable Ordered Broadcast With Micro-Protocols
The implementation of a complex high-level protocol is described. The protocol, a reliable ordered broadcast for local networks, is implemented using the micro-protocol framework inside the x-kernel. The internal structure of the protocol is described, with its micro-protocols and events. Based on our experiences with the development of this protocol, some general observations about the framework are provided.
1 Introduction
Previous work has shown that writing high-level protocols with the x-kernel may be a hard task, due to difficulties in expressing complex interactions among fine-grained modules. A good example of this can be found in the development of Consul, an atomic ordered reliable broadcast protocol [MPS93]. A new framework has been developed and added to the x-kernel to try to solve this kind of problem. The micro-protocol framework [mic] provides an event-oriented environment in which separate modules (called micro-protocols) may be combined in order to describe the intr..
Limiting the Power Consumption of Main Memory
The peak power consumption of hardware components affects their power supply, packaging, and cooling requirements. When the peak power consumption is high, the hardware components or the systems that use them can become expensive and bulky. Given that components and systems rarely (if ever) actually require peak power, it is highly desirable to limit power consumption to a less-than-peak power budget, based on which power supply, packaging, and cooling infrastructures can be more intelligently provisioned. In this paper, we study dynamic approaches for limiting the power consumption of main memories. Specifically, we propose four techniques that limit consumption by adjusting the power states of the memory devices, as a function of the load on the memory subsystem. Our simulations of applications from three benchmarks demonstrate that our techniques can consistently limit power to a pre-established budget. Two of the techniques can limit power with very low performance degradation. Our results also show that, when using these superior techniques, limiting power is at least as effective an energy-conservation approach as state-of-the-art techniques explicitly designed for performance-aware energy conservation. These latter results represent a departure from current energy management research and practice
Estratégias de Sondagem para Remapeamento Eficiente de Eventos de Roteamento na Internet (Probing Strategies for Efficient Remapping of Internet Routing Events)
Path changes caused by events such as traffic engineering, changes in peering agreements, or link failures affect many paths in the Internet. Topology monitoring platforms perform periodic traceroute measurements to a large number of destinations. This approach, however, is inadequate for precisely identifying the extent of the impact of routing events. For example, a link failure may be repaired before all routes are measured. In this work we present measurement strategies that minimize the probing cost of identifying the paths affected by a routing event. Our results show that the set of paths affected by an event can be identified efficiently. Our results further indicate that, when integrated into a state-of-the-art system for tracking path changes, our strategies more than double the number of changes detected