15 research outputs found

    QUERY PERFORMANCE EVALUATION OVER HEALTH DATA

    In recent years, the number and variety of application scenarios studied under e-health has increased significantly. Each application generates an immense amount of data that grows constantly. In this context, storing and analyzing the data efficiently and economically with conventional database management tools becomes an important challenge. Traditional relational database systems may not meet the requirements of the increased variety, volume, velocity, and dynamic structure of the new datasets. Effective healthcare data management and its transformation into information and knowledge are therefore challenging issues, so organizations that deal with immense data, especially hospitals and medical centers, either have to purchase new systems or re-tool what they already have. The new so-called NoSQL data models, together with the Hadoop Distributed File System that underpins their management tools, are replacing RDBMSs, especially in real-time healthcare data analytics. Performing complex reporting in these applications becomes a real challenge as the size of the data grows exponentially, while customers demand complex analysis and reporting on that data. Compared to traditional databases, the Hadoop framework is designed to process large volumes of data. In this study, we examine the query performance of traditional databases and Big Data platforms on healthcare data, and we explore whether it is really necessary to invest in a Big Data environment to run queries on high-volume data, or whether this can also be done with current relational database management systems and their supporting hardware infrastructure. We present our experience and a comprehensive performance evaluation of data management systems in the context of application performance.
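
    As a concrete illustration of such a comparison, a minimal sketch of a timing harness that runs the same reporting query against a relational database and against Hive on Hadoop is shown below. The table name, query, hosts, and credentials are hypothetical placeholders, not the paper's actual benchmark setup; it assumes the psycopg2 and PyHive client libraries are available and the servers are reachable.

        # Minimal sketch: time the same reporting query on an RDBMS and on Hive.
        # Table name, hosts, and query are illustrative assumptions, not the
        # benchmark actually used in the paper.
        import time

        import psycopg2               # PostgreSQL driver
        from pyhive import hive       # HiveServer2 driver (PyHive)

        QUERY = """
            SELECT diagnosis_code, COUNT(*) AS admissions
            FROM admissions
            GROUP BY diagnosis_code
            ORDER BY admissions DESC
        """

        def timed_query(cursor, query):
            """Run one query and return (elapsed_seconds, row_count)."""
            start = time.perf_counter()
            cursor.execute(query)
            rows = cursor.fetchall()
            return time.perf_counter() - start, len(rows)

        # Traditional relational database (hypothetical connection details).
        pg = psycopg2.connect(host="rdbms-host", dbname="health", user="bench")
        print("postgres:", timed_query(pg.cursor(), QUERY))

        # Hive over HDFS (hypothetical connection details).
        hv = hive.Connection(host="hive-host", port=10000, database="health")
        print("hive:    ", timed_query(hv.cursor(), QUERY))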

    Implementation of SDN-Based Transfer Rate Management for HDFS Processes

    In a Hadoop cluster, data transfers occur frequently, because stored data is spread across the DataNodes, particularly during writes to HDFS. This data traffic affects the performance of the Hadoop cluster as a whole. Limited bandwidth availability and congestion caused by other traffic can degrade the process of writing data to HDFS. SDN provides the means to manage transfer rates, making it possible to categorize traffic and to assign transfer rates using a queue mechanism. Exploiting an SDN network architecture, transfer rate management is applied to the Hadoop cluster to optimize data movement when writing to HDFS. Transfer rate management is implemented using the queue feature of OpenFlow switches: each queue is used to categorize a class of traffic on the Hadoop cluster network, and HDFS traffic is separated out and assigned a higher transfer rate. Experimental results show that, with transfer rate management in place, the time needed to write data to HDFS is unaffected even when other traffic causes congestion during the write.
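
    As an illustration of the queue mechanism described above, the sketch below shows how an OpenFlow controller might pin HDFS DataNode traffic to a dedicated high-rate queue. It uses the Ryu framework; the queue ID and the DataNode port (50010, a common Hadoop default) are assumptions, the paper's actual controller and queue configuration may differ, and the queues themselves must already be configured on the switch (e.g. via OVS QoS settings).

        # Minimal Ryu sketch: steer HDFS DataNode traffic into a dedicated
        # high-rate queue (queue 1), leaving other traffic on the default queue.
        # Queue IDs and the DataNode port are illustrative assumptions.
        from ryu.base import app_manager
        from ryu.controller import ofp_event
        from ryu.controller.handler import CONFIG_DISPATCHER, set_ev_cls
        from ryu.ofproto import ofproto_v1_3

        HDFS_DATA_PORT = 50010  # assumed DataNode data-transfer port
        HDFS_QUEUE_ID = 1       # assumed high-rate queue, pre-configured on the switch

        class HdfsQueueApp(app_manager.RyuApp):
            OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

            @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
            def on_switch_connect(self, ev):
                dp = ev.msg.datapath
                parser = dp.ofproto_parser
                ofp = dp.ofproto

                # Match TCP traffic destined for the HDFS DataNode port.
                match = parser.OFPMatch(eth_type=0x0800, ip_proto=6,
                                        tcp_dst=HDFS_DATA_PORT)
                # Enqueue on the high-rate queue, then forward normally.
                actions = [parser.OFPActionSetQueue(HDFS_QUEUE_ID),
                           parser.OFPActionOutput(ofp.OFPP_NORMAL)]
                inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS,
                                                     actions)]
                dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=100,
                                              match=match, instructions=inst))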

    Network Optimizations for Distributed Storage Networks

    Distributed file systems enable the reliable storage of exabytes of information on thousands of servers distributed throughout a network. These systems achieve reliability and performance by storing three or more copies of data in different locations across the network. The management of these copies is commonly handled by intermediate servers that track and coordinate the placement of data in the network. This introduces potential network bottlenecks, as multiple transfers to fast storage nodes can saturate the network links connecting the intermediate servers to the storage. The advent of open Network Operating Systems presents an opportunity to alleviate this bottleneck, as it is now possible to treat network elements as intermediate nodes in the distributed file system and have them perform the task of replicating data across storage nodes. In this thesis, we propose a new design paradigm for distributed file systems, driven by a new fundamental component of the system that runs on network elements such as switches or routers. We describe the component's architecture and how it can be integrated into existing distributed file systems to increase their performance. To measure this performance increase over current approaches, we emulate a distributed file system by creating a block-level storage array distributed across multiple iSCSI targets presented on a network. Furthermore, we emulate more complicated redundancy schemes likely to be used in distributed file systems in the future, to determine what effect this approach may have on those systems and what benefits it offers. We find that the new component offers a decrease in request latency proportional to the number of storage nodes involved in the request. We also find that the benefits of this approach are limited by the ability of switch hardware to process the incoming data of the request, but that these limitations can be surmounted through the proposed design paradigm.
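
    The central idea, a network element fanning a single incoming write out to several storage nodes so the client transmits the data only once, can be illustrated with a toy relay. The sketch below shows only that replication pattern, not the thesis's actual switch-resident component, and the backend addresses are hypothetical.

        # Toy sketch of in-network replication: accept one write stream and fan
        # it out to several storage backends, so the client sends the data once
        # instead of once per replica. Illustration only; the thesis's component
        # runs on switches/routers, and the addresses below are hypothetical.
        import socket

        BACKENDS = [("storage1", 9000), ("storage2", 9000), ("storage3", 9000)]

        def serve(listen_port=9999):
            srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            srv.bind(("", listen_port))
            srv.listen(1)
            while True:
                client, _ = srv.accept()
                replicas = [socket.create_connection(addr) for addr in BACKENDS]
                try:
                    while chunk := client.recv(65536):
                        for r in replicas:   # one inbound stream, N outbound copies
                            r.sendall(chunk)
                finally:
                    client.close()
                    for r in replicas:
                        r.close()

        if __name__ == "__main__":
            serve()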

    OpenFlow-based Distributed and Fault-Tolerant Software Switch Architecture

    We live in an era in which we are all virtually connected across the globe, sharing information electronically over the Internet every second of the day. Many networking devices are involved in carrying that information: routers, gateways, switches, PCs, laptops, handheld devices, and so on. Switches are crucial elements in delivering packets to their intended recipients. The networking field is now moving toward Software Defined Networking, and network elements are slowly being replaced by software applications driven by the OpenFlow protocol; for example, the switching functionality in local area networks can be provided by software switches such as Open vSwitch (OVS) or LINC-Switch. Nowadays organizations depend on datacenters to run their services. Application servers run in virtual machines on the hosts to better utilize computing resources and make the system more scalable, and they need to be continuously available to run the business for which they are deployed. Software switches are used to connect virtual machines as an alternative to Top-of-Rack switches; if such a software switch fails, the application servers cannot reach their clients, which may severely impact the business served by the application servers deployed on those virtual machines. For reliable data connectivity, the switching elements need to be continuously functional, so today's networking infrastructure needs reliable and robust switches. In this study, the software switch LINC-Switch is implemented as a distributed application on multiple nodes to make it resilient to failure. Fault tolerance is achieved by using the distribution properties of the Erlang programming language: by implementing the switch on three redundant nodes and starting the application as a distributed application, the switch keeps serving its purpose promptly, restarting on another node when it fails on the current one via Erlang's failover/takeover mechanisms. The switch's tolerance to failure is verified with a ping-based experiment on the GENI testbed and on the Xen cluster in our lab.
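
    As a rough illustration of the ping-based verification, a small sketch of a liveness probe that pings the redundant nodes and records outage windows is shown below. The node hostnames are hypothetical, and the failover/takeover itself is performed by Erlang's distributed-application mechanism, not by this script.

        # Rough sketch of a ping-based liveness probe like the one used to verify
        # failover: repeatedly ping the redundant nodes and record any outage
        # window. Hostnames are hypothetical; the failover itself is done by
        # Erlang's distributed-application takeover, not by this script.
        import subprocess
        import time

        NODES = ["linc-node1", "linc-node2", "linc-node3"]  # assumed hostnames

        def is_up(host):
            """One ICMP echo; returns True if the host answered."""
            return subprocess.run(["ping", "-c", "1", "-W", "1", host],
                                  stdout=subprocess.DEVNULL).returncode == 0

        def watch(interval=1.0):
            down_since = None
            while True:
                if any(is_up(n) for n in NODES):   # some replica is serving
                    if down_since is not None:
                        print(f"recovered after {time.time() - down_since:.1f}s")
                        down_since = None
                elif down_since is None:
                    down_since = time.time()       # outage window begins
                time.sleep(interval)

        if __name__ == "__main__":
            watch()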

    Mining Massive Data Using Hadoop MapReduce and Bio-Inspired Algorithms: A Systematic Review

    Data mining has been used in many application areas and aims to extract knowledge through data analysis. In recent decades, databases have tended toward large volume, high growth velocity, and great variety. This phenomenon is known as Big Data, and it poses new challenges for classical technologies such as relational database management systems, which have not offered satisfactory performance and scalability for Big Data applications. Unlike those technologies, Hadoop MapReduce is a framework that, in addition to providing parallel processing, also offers fault tolerance and easy scalability on top of a distributed storage system well suited to Big Data scenarios. One class of techniques being applied in the Big Data context is bio-inspired algorithms, which are good solution candidates for complex multidimensional, multi-objective, and large-scale problems. The combination of Hadoop MapReduce-based systems and bio-inspired algorithms has proven advantageous in Big Data applications. This article presents a systematic review of work in this context, analyzing criteria such as which data mining tasks are addressed, which bio-inspired algorithms are used, whether the datasets used are available, and which Big Data characteristics the works handle. As a result, the article discusses the analyzed criteria, identifies some parallelization models, and suggests a direction for future work.
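
    One parallelization model commonly identified in this line of work evaluates a bio-inspired population's fitness in the map phase and reduces to the global best solution. The sketch below illustrates that model with the mrjob library and a toy sphere-function fitness; both the library choice and the fitness function are illustrative assumptions, not findings of the review.

        # Sketch of a common parallelization model for bio-inspired algorithms on
        # Hadoop MapReduce: mappers evaluate candidate solutions' fitness in
        # parallel, a single reducer keeps the global best. The mrjob library and
        # the sphere function are illustrative assumptions.
        from mrjob.job import MRJob

        class BestCandidate(MRJob):
            def mapper(self, _, line):
                # Each input line is one candidate: comma-separated floats.
                candidate = [float(x) for x in line.split(",")]
                fitness = sum(x * x for x in candidate)  # sphere function (minimize)
                yield "best", (fitness, candidate)

            def reducer(self, key, values):
                # Keep the candidate with the lowest fitness across all mappers.
                yield key, min(values)

        if __name__ == "__main__":
            BestCandidate.run()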