15 research outputs found

    QUERY PERFORMANCE EVALUATION OVER HEALTH DATA

    In recent years, the number and variety of application scenarios studied under e-health has increased significantly. Each application generates an immense amount of data that grows constantly. In this context, storing and analyzing the data efficiently and economically with conventional database management tools becomes an important challenge. Traditional relational database systems may not meet the requirements of the increased variety, volume, velocity, and dynamic structure of the new datasets. Effective healthcare data management and its transformation into information and knowledge are therefore challenging issues, so organizations that deal with immense data, especially hospitals and medical centers, either have to purchase new systems or re-tool what they already have. The new so-called NoSQL data models, together with the Hadoop Distributed File System that underpins their management tools, are replacing RDBMSs, especially in real-time healthcare data analytics. Performing complex reporting in these applications becomes a real challenge as the size of the data grows exponentially, while customers demand complex analysis and reporting on that data. Compared to traditional databases, the Hadoop framework is designed to process large volumes of data. In this study, we examine the query performance of traditional databases and Big Data platforms on healthcare data, and we explore whether it is really necessary to invest in a Big Data environment to run queries on high-volume data, or whether this can also be done with current relational database management systems and their supporting hardware infrastructure. We present our experience and a comprehensive performance evaluation of data management systems in the context of application performance.
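
    As a concrete illustration of such a comparison, a minimal sketch of a timing harness that runs the same reporting query against a relational database and against Hive on Hadoop is shown below. The table name, query, hosts, and credentials are hypothetical placeholders, not the paper's actual benchmark setup; it assumes the psycopg2 and PyHive client libraries are available and the servers are reachable.

        # Minimal sketch: time the same reporting query on an RDBMS and on Hive.
        # Table name, hosts, and query are illustrative assumptions, not the
        # benchmark actually used in the paper.
        import time

        import psycopg2               # PostgreSQL driver
        from pyhive import hive       # HiveServer2 driver (PyHive)

        QUERY = """
            SELECT diagnosis_code, COUNT(*) AS admissions
            FROM admissions
            GROUP BY diagnosis_code
            ORDER BY admissions DESC
        """

        def timed_query(cursor, query):
            """Run one query and return (elapsed_seconds, row_count)."""
            start = time.perf_counter()
            cursor.execute(query)
            rows = cursor.fetchall()
            return time.perf_counter() - start, len(rows)

        # Traditional relational database (hypothetical connection details).
        pg = psycopg2.connect(host="rdbms-host", dbname="health", user="bench")
        print("postgres:", timed_query(pg.cursor(), QUERY))

        # Hive over HDFS (hypothetical connection details).
        hv = hive.Connection(host="hive-host", port=10000, database="health")
        print("hive:    ", timed_query(hv.cursor(), QUERY))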

    Implementation of SDN-Based Transfer Rate Management for HDFS Processes

    In a Hadoop cluster, data transfers occur frequently, because stored data is spread across the DataNodes, particularly during writes to HDFS. This data traffic affects the performance of the Hadoop cluster as a whole. Limited bandwidth availability and congestion caused by other traffic can degrade the process of writing data to HDFS. SDN provides the means to manage transfer rates, making it possible to categorize traffic and to assign transfer rates using a queue mechanism. Exploiting an SDN network architecture, transfer rate management is applied to the Hadoop cluster to optimize data movement when writing to HDFS. Transfer rate management is implemented using the queue feature of OpenFlow switches: each queue is used to categorize a class of traffic on the Hadoop cluster network, and HDFS traffic is separated out and assigned a higher transfer rate. Experimental results show that, with transfer rate management in place, the time needed to write data to HDFS is unaffected even when other traffic causes congestion during the write.
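
    As an illustration of the queue mechanism described above, the sketch below shows how an OpenFlow controller might pin HDFS DataNode traffic to a dedicated high-rate queue. It uses the Ryu framework; the queue ID and the DataNode port (50010, a common Hadoop default) are assumptions, the paper's actual controller and queue configuration may differ, and the queues themselves must already be configured on the switch (e.g. via OVS QoS settings).

        # Minimal Ryu sketch: steer HDFS DataNode traffic into a dedicated
        # high-rate queue (queue 1), leaving other traffic on the default queue.
        # Queue IDs and the DataNode port are illustrative assumptions.
        from ryu.base import app_manager
        from ryu.controller import ofp_event
        from ryu.controller.handler import CONFIG_DISPATCHER, set_ev_cls
        from ryu.ofproto import ofproto_v1_3

        HDFS_DATA_PORT = 50010  # assumed DataNode data-transfer port
        HDFS_QUEUE_ID = 1       # assumed high-rate queue, pre-configured on the switch

        class HdfsQueueApp(app_manager.RyuApp):
            OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

            @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
            def on_switch_connect(self, ev):
                dp = ev.msg.datapath
                parser = dp.ofproto_parser
                ofp = dp.ofproto

                # Match TCP traffic destined for the HDFS DataNode port.
                match = parser.OFPMatch(eth_type=0x0800, ip_proto=6,
                                        tcp_dst=HDFS_DATA_PORT)
                # Enqueue on the high-rate queue, then forward normally.
                actions = [parser.OFPActionSetQueue(HDFS_QUEUE_ID),
                           parser.OFPActionOutput(ofp.OFPP_NORMAL)]
                inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS,
                                                     actions)]
                dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=100,
                                              match=match, instructions=inst))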

    Network Optimizations for Distributed Storage Networks

    Distributed file systems enable the reliable storage of exabytes of information on thousands of servers distributed throughout a network. These systems achieve reliability and performance by storing three or more copies of data in different locations across the network. The management of these copies is commonly handled by intermediate servers that track and coordinate the placement of data in the network. This introduces potential network bottlenecks, as multiple transfers to fast storage nodes can saturate the network links connecting the intermediate servers to the storage. The advent of open Network Operating Systems presents an opportunity to alleviate this bottleneck, as it is now possible to treat network elements as intermediate nodes in the distributed file system and have them perform the task of replicating data across storage nodes. In this thesis, we propose a new design paradigm for distributed file systems, driven by a new fundamental component of the system that runs on network elements such as switches or routers. We describe the component's architecture and how it can be integrated into existing distributed file systems to increase their performance. To measure this performance increase over current approaches, we emulate a distributed file system by creating a block-level storage array distributed across multiple iSCSI targets presented on a network. Furthermore, we emulate more complicated redundancy schemes likely to be used in distributed file systems in the future, to determine what effect this approach may have on those systems and what benefits it offers. We find that the new component offers a decrease in request latency proportional to the number of storage nodes involved in the request. We also find that the benefits of this approach are limited by the ability of switch hardware to process the incoming data of the request, but that these limitations can be surmounted through the proposed design paradigm.
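
    The central idea, a network element fanning a single incoming write out to several storage nodes so the client transmits the data only once, can be illustrated with a toy relay. The sketch below shows only that replication pattern, not the thesis's actual switch-resident component, and the backend addresses are hypothetical.

        # Toy sketch of in-network replication: accept one write stream and fan
        # it out to several storage backends, so the client sends the data once
        # instead of once per replica. Illustration only; the thesis's component
        # runs on switches/routers, and the addresses below are hypothetical.
        import socket

        BACKENDS = [("storage1", 9000), ("storage2", 9000), ("storage3", 9000)]

        def serve(listen_port=9999):
            srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            srv.bind(("", listen_port))
            srv.listen(1)
            while True:
                client, _ = srv.accept()
                replicas = [socket.create_connection(addr) for addr in BACKENDS]
                try:
                    while chunk := client.recv(65536):
                        for r in replicas:   # one inbound stream, N outbound copies
                            r.sendall(chunk)
                finally:
                    client.close()
                    for r in replicas:
                        r.close()

        if __name__ == "__main__":
            serve()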

    OpenFlow-based Distributed and Fault-Tolerant Software Switch Architecture

    We live in an era in which we are all virtually connected across the globe, sharing information electronically over the Internet every second of the day. Many networking devices are involved in carrying that information: routers, gateways, switches, PCs, laptops, handheld devices, and so on. Switches are crucial elements in delivering packets to their intended recipients. The networking field is now moving toward Software Defined Networking, and network elements are slowly being replaced by software applications driven by the OpenFlow protocol; for example, the switching functionality in local area networks can be provided by software switches such as Open vSwitch (OVS) or LINC-Switch. Nowadays organizations depend on datacenters to run their services. Application servers run in virtual machines on the hosts to better utilize computing resources and make the system more scalable, and they need to be continuously available to run the business for which they are deployed. Software switches are used to connect virtual machines as an alternative to Top-of-Rack switches; if such a software switch fails, the application servers cannot reach their clients, which may severely impact the business served by the application servers deployed on those virtual machines. For reliable data connectivity, the switching elements need to be continuously functional, so today's networking infrastructure needs reliable and robust switches. In this study, the software switch LINC-Switch is implemented as a distributed application on multiple nodes to make it resilient to failure. Fault tolerance is achieved by using the distribution properties of the Erlang programming language: by implementing the switch on three redundant nodes and starting the application as a distributed application, the switch keeps serving its purpose promptly, restarting on another node when it fails on the current one via Erlang's failover/takeover mechanisms. The switch's tolerance to failure is verified with a ping-based experiment on the GENI testbed and on the Xen cluster in our lab.
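
    As a rough illustration of the ping-based verification, a small sketch of a liveness probe that pings the redundant nodes and records outage windows is shown below. The node hostnames are hypothetical, and the failover/takeover itself is performed by Erlang's distributed-application mechanism, not by this script.

        # Rough sketch of a ping-based liveness probe like the one used to verify
        # failover: repeatedly ping the redundant nodes and record any outage
        # window. Hostnames are hypothetical; the failover itself is done by
        # Erlang's distributed-application takeover, not by this script.
        import subprocess
        import time

        NODES = ["linc-node1", "linc-node2", "linc-node3"]  # assumed hostnames

        def is_up(host):
            """One ICMP echo; returns True if the host answered."""
            return subprocess.run(["ping", "-c", "1", "-W", "1", host],
                                  stdout=subprocess.DEVNULL).returncode == 0

        def watch(interval=1.0):
            down_since = None
            while True:
                if any(is_up(n) for n in NODES):   # some replica is serving
                    if down_since is not None:
                        print(f"recovered after {time.time() - down_since:.1f}s")
                        down_since = None
                elif down_since is None:
                    down_since = time.time()       # outage window begins
                time.sleep(interval)

        if __name__ == "__main__":
            watch()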

    Mining Massive Data Using Hadoop MapReduce and Bio-Inspired Algorithms: A Systematic Review

    Data mining has been used in many application areas and aims to extract knowledge through data analysis. In recent decades, databases have tended toward large volume, high growth velocity, and great variety. This phenomenon is known as Big Data, and it poses new challenges for classical technologies such as relational database management systems, which have not offered satisfactory performance and scalability for Big Data applications. Unlike those technologies, Hadoop MapReduce is a framework that, in addition to providing parallel processing, also offers fault tolerance and easy scalability on top of a distributed storage system well suited to Big Data scenarios. One class of techniques being applied in the Big Data context is bio-inspired algorithms, which are good solution candidates for complex multidimensional, multi-objective, and large-scale problems. The combination of Hadoop MapReduce-based systems and bio-inspired algorithms has proven advantageous in Big Data applications. This article presents a systematic review of work in this context, analyzing criteria such as which data mining tasks are addressed, which bio-inspired algorithms are used, whether the datasets used are available, and which Big Data characteristics the works handle. As a result, the article discusses the analyzed criteria, identifies some parallelization models, and suggests a direction for future work.
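
    One parallelization model commonly identified in this line of work evaluates a bio-inspired population's fitness in the map phase and reduces to the global best solution. The sketch below illustrates that model with the mrjob library and a toy sphere-function fitness; both the library choice and the fitness function are illustrative assumptions, not findings of the review.

        # Sketch of a common parallelization model for bio-inspired algorithms on
        # Hadoop MapReduce: mappers evaluate candidate solutions' fitness in
        # parallel, a single reducer keeps the global best. The mrjob library and
        # the sphere function are illustrative assumptions.
        from mrjob.job import MRJob

        class BestCandidate(MRJob):
            def mapper(self, _, line):
                # Each input line is one candidate: comma-separated floats.
                candidate = [float(x) for x in line.split(",")]
                fitness = sum(x * x for x in candidate)  # sphere function (minimize)
                yield "best", (fitness, candidate)

            def reducer(self, key, values):
                # Keep the candidate with the lowest fitness across all mappers.
                yield key, min(values)

        if __name__ == "__main__":
            BestCandidate.run()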