193 research outputs found

    Storage Solutions for Big Data Systems: A Qualitative Study and Comparison

    Big data systems development is full of challenges in view of the variety of application areas and domains that this technology promises to serve. Typically, fundamental design decisions in big data systems design include choosing appropriate storage and computing infrastructures. In this age of heterogeneous systems that integrate different technologies into an optimized solution for a specific real-world problem, big data systems are no exception. As far as the storage aspect of a big data system is concerned, the primary facet is the storage infrastructure, and NoSQL seems to be the right technology to fulfill its requirements. However, every big data application has variable data characteristics, and thus the corresponding data fits a different data model. This paper presents a feature and use case analysis and comparison of the four main data models, namely document-oriented, key-value, graph and wide-column. Moreover, a feature analysis of 80 NoSQL solutions is provided, elaborating on the criteria and points that a developer must consider when making a choice. Typically, big data storage needs to communicate with the execution engine and other processing and visualization technologies to create a comprehensive solution. This brings the second facet of big data storage, big data file formats, into the picture. The second half of the paper compares the advantages, shortcomings and possible use cases of the available big data file formats for Hadoop, which is the foundation for most big data computing technologies. Decentralized storage and blockchain are seen as the next generation of big data storage, and their challenges and future prospects are also discussed.

    Database management system performance comparisons: A systematic literature review

    Efficiency has been a pivotal aspect of the software industry since its inception, as a system that serves the end user quickly and the service provider cost-efficiently benefits all parties. A database management system (DBMS) is an integral part of effectively all software systems, and it is therefore logical that different studies have compared the performance of different DBMSs in hopes of finding the most efficient one. This study systematically synthesizes the results and approaches of studies that compare DBMS performance and provides recommendations for industry and research. The results show that performance is usually tested in a way that does not reflect real-world use cases, and that tests are typically reported in insufficient detail for replication or for drawing conclusions from the stated results. Comment: 36 pages

    Forensic attribution challenges during forensic examinations of databases

    An aspect of database forensics that has not yet received much attention in the academic research community is the attribution of actions performed in a database. When forensic attribution is performed for actions executed in computer systems, it is necessary to avoid incorrectly attributing actions to processes or actors, because the outcome of forensic attribution may be used to determine civil or criminal liability. Correctness is therefore extremely important when attributing actions in computer systems, including when performing forensic attribution in databases. Any circumstances that can compromise the correctness of the attribution results need to be identified and addressed. This dissertation explores possible challenges when performing forensic attribution in databases: what can prevent the correct attribution of actions performed in a database? The first identified challenge is the database trigger, which has not yet been studied in the context of forensic examinations. Therefore, the dissertation investigates the impact of database triggers on forensic examinations by examining two sub-questions. Firstly, could triggers, due to their nature and combined with the way databases are forensically acquired and analysed, lead to the contamination of the data that is being analysed? Secondly, can the current attribution process correctly identify which party is responsible for which changes in a database where triggers are used to create and maintain data? The second identified challenge is the lack of access and audit information in NoSQL databases. The dissertation thus investigates how the availability of access control and logging features in databases impacts forensic attribution. Database triggers, as defined in the SQL standard, are studied together with a number of database trigger implementations. This is done in order to establish which aspects of a database trigger may have an impact on digital forensic acquisition, analysis and interpretation. 
Forensic examinations of relational and NoSQL databases are evaluated to determine what challenges the presence of database triggers poses. A number of NoSQL databases are then studied to determine the availability of access control and logging features. This is done because these features leave valuable traces for the forensic attribution process. An algorithm is devised which provides a simple test to determine whether database triggers played any part in the generation or manipulation of data in a specific database object. If the test result is positive, the actions performed by the implicated triggers will have to be considered in a forensic examination. This dissertation identified a group of database triggers, classified as non-data triggers, which have the potential to contaminate the data in popular relational databases through inconspicuous operations such as connection or shutdown. It also established that database triggers can influence the normal flow of data operations. This means that what the original operation intended to do and what actually happened are not necessarily the same. Therefore, the attribution of these operations becomes problematic and incorrect deductions can be made. Accordingly, forensic processes need to be extended to include the handling and analysis of all database triggers. This enables safer acquisition and analysis of databases and more accurate attribution of actions performed in databases. This dissertation also established that popular NoSQL databases either lack sufficient access control and logging capabilities, or do not enable them by default, to support attribution to the same level as in relational databases. Dissertation (MSc), University of Pretoria, 2018. Computer Science. MSc. Unrestricted.
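
The abstract above does not give the devised algorithm, so the following is only a rough sketch of what a "did triggers touch this object?" test could look like. It assumes SQLite (via Python's standard `sqlite3` module) rather than any of the databases studied in the dissertation, and the helper names `triggers_touching` and `build_demo_db`, as well as the demo schema, are hypothetical. A trigger is flagged if it is defined on the object or if its body mentions the object's name; the name match is a crude textual heuristic, not a parser.

```python
import re
import sqlite3


def triggers_touching(conn, object_name):
    """Return (trigger_name, sql) pairs for triggers that may affect object_name.

    Assumption: SQLite's schema table (sqlite_master) lists every trigger with
    the table it is defined on (tbl_name) and its CREATE TRIGGER text (sql).
    """
    pattern = re.compile(rf"\b{re.escape(object_name)}\b", re.IGNORECASE)
    rows = conn.execute(
        "SELECT name, tbl_name, sql FROM sqlite_master "
        "WHERE type = 'trigger' ORDER BY name"
    ).fetchall()
    hits = []
    for name, tbl_name, sql in rows:
        defined_on_object = tbl_name.lower() == object_name.lower()
        # Heuristic: the trigger body writes to (or at least names) the object.
        mentions_object = bool(pattern.search(sql or ""))
        if defined_on_object or mentions_object:
            hits.append((name, sql))
    return hits


def build_demo_db():
    """Create a hypothetical in-memory schema with one trigger."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL);
        CREATE TABLE audit (order_id INTEGER, note TEXT);
        CREATE TRIGGER orders_ai AFTER INSERT ON orders
        BEGIN
            INSERT INTO audit (order_id, note) VALUES (NEW.id, 'inserted');
        END;
        """
    )
    return conn


if __name__ == "__main__":
    db = build_demo_db()
    for table in ("orders", "audit"):
        names = [name for name, _ in triggers_touching(db, table)]
        print(table, "->", names)
```

In this demo the `audit` table is flagged even though no trigger is defined on it, because the trigger on `orders` writes into it; that is the kind of indirect manipulation an examiner would otherwise miss.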

    A software to manage rehabilitation sessions with a robotic walker

    Integrated master's dissertation in Informatics Engineering. Cerebellar ataxia arises from damage or dysfunction that affects the cerebellum and its pathways. As a result, the motor abilities of individuals with this condition become weakened. Robotics-assisted therapy is still an emerging area, but it has several advantages that could boost the rehabilitation of these individuals. To address this problem, the WALKit Smart Walker is being developed. Its main purpose is to improve the treatment of ataxic patients through intelligent and multidisciplinary rehabilitation sessions. It is therefore equipped with several sensors that provide monitoring capabilities through a continuous evaluation of the end user's gait and posture. A vast amount of data is acquired by the walker's sensors during each session. For health professionals to analyse this data and receive feedback on the patient's status throughout therapy, tools are needed to control, manage, and monitor sessions in a clear, practical and intuitive way. Therefore, the main goal of this dissertation is to implement an effective way to store the acquired data, along with the development of software that satisfies these requirements. To address these goals, a polyglot persistence database system, composed of a relational and a non-relational database, was implemented to store the required data while maintaining efficiency. Furthermore, a web application was developed to provide the management of the rehabilitation sessions with the walker, not only to health professionals but also to patients themselves. The application provides an individual and temporal analysis of the sessions through interactive graphics adapted to each patient. Additionally, it allows the management of the patients who are or were in treatment and the addition of clinical rating scales, which are useful for assessing their motor condition and adapting therapies as needed. 
In this way, professionals can have a better perception of the patient's condition and can show patients their evolution, possibly helping to increase their motivation in therapy. Moreover, in the context of this dissertation, the embedded software of WALKit SmartW, which allows the therapy configuration, was optimized. This software had no security mechanisms, so the main focus was on implementing techniques capable of making it secure. Additionally, other functionalities, such as feedback alerts and algorithm configurations, were added to the existing application. Throughout the development of this project, it was possible to obtain continuous feedback from health professionals of the Hospital of Braga. Usability tests and questionnaires were also applied, and the results were very promising, reinforcing the need for a system with these characteristics. Professionals stated that the system may help in analysing the patient's clinical status in an intuitive way while keeping the patient motivated during treatments.
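
The polyglot persistence idea in this abstract (a relational database alongside a non-relational one) can be illustrated with a minimal sketch. The abstract does not name the databases or the schema, so everything here is an assumption: SQLite stands in for the relational side, a small in-memory `DocumentStore` class stands in for the non-relational side, and the `SessionRepository` class, table and field names are hypothetical. Structured session metadata goes to the relational store, while the bulky, loosely structured sensor samples go to the document store under a key derived from the session id.

```python
import json
import sqlite3


class DocumentStore:
    """In-memory stand-in for a non-relational (document) store. Hypothetical."""

    def __init__(self):
        self._docs = {}

    def put(self, key, doc):
        # Serialize to JSON to mimic schema-free document storage.
        self._docs[key] = json.dumps(doc)

    def get(self, key):
        return json.loads(self._docs[key])


class SessionRepository:
    """Routes each kind of data to the store that suits it."""

    def __init__(self):
        self.sql = sqlite3.connect(":memory:")
        self.sql.execute(
            "CREATE TABLE session ("
            "id INTEGER PRIMARY KEY, "
            "patient TEXT NOT NULL, "
            "started_at TEXT NOT NULL)"
        )
        self.docs = DocumentStore()

    def save_session(self, patient, started_at, samples):
        # Relational side: small, structured, queryable metadata.
        cur = self.sql.execute(
            "INSERT INTO session (patient, started_at) VALUES (?, ?)",
            (patient, started_at),
        )
        self.sql.commit()
        session_id = cur.lastrowid
        # Non-relational side: the high-volume sensor samples.
        self.docs.put(f"session:{session_id}", {"samples": samples})
        return session_id

    def load_session(self, session_id):
        row = self.sql.execute(
            "SELECT patient, started_at FROM session WHERE id = ?",
            (session_id,),
        ).fetchone()
        doc = self.docs.get(f"session:{session_id}")
        return {"patient": row[0], "started_at": row[1], "samples": doc["samples"]}


if __name__ == "__main__":
    repo = SessionRepository()
    sid = repo.save_session(
        "patient-1", "2021-01-01T10:00:00", [{"t": 0.0, "gait_speed": 0.8}]
    )
    print(repo.load_session(sid))
```

The shared session id is the only link between the two stores, which keeps each side simple at the cost of having no cross-store transaction; a real system would need to decide how to handle a failure between the two writes.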

    DEVELOPMENT OF DATABASE BASED FIELD TEST APPLICATION FOR INDUSTRY

    The fast development of automation has changed the dynamics of many different applications. According to worldwide statistics, the number of smartphone users has grown exponentially, which has also persuaded software developers and engineers to make numerous mobile applications. Generally, field testing is only carried out when there is a need to collect important data from individuals in order to find errors, and database technology allows all of that information to be stored. In this work, a Microsoft PowerApps application was created for a company. The requirements for the application were given by the customer, and any improvements that became necessary while making the application were also specified by the client. I have had a chance to learn the complete PowerApps UI with the resources provided by the company I work for. Firstly, the concepts of a database are introduced to give a general idea of how Microsoft SQL Server is used for the entire project. Secondly, there is a discussion of how PowerApps is used and how unified the system is when connecting to the database server. After that, I discuss step by step how the PowerApps application sends data to the database server when a user is using the mobile application. In short, the PowerApps application provides an easier field testing approach for the workers.

    Alternatives to relational databases in precision medicine: comparison of NOSQL approaches for big data storage using supercomputers

    Improvements in medical and genomic technologies have dramatically increased the production of electronic data over the last decade. As a result, data management is rapidly becoming a major determinant of, and an urgent challenge for, the development of Precision Medicine. Although successful data management is achievable using Relational Database Management Systems (RDBMS), exponential data growth is a significant contributor to failure scenarios. Growing amounts of data can also be observed in other sectors, such as economics and business, which, together with the previous facts, suggests that alternative database approaches (NoSQL) may soon be required for efficient storage and management of big databases. However, this hypothesis has been difficult to test in the Precision Medicine field, since alternative database architectures are complex to assess and means to integrate heterogeneous electronic health records (EHR) with dynamic genomic data are not easily available. In this dissertation, we present a novel set of experiments for identifying NoSQL database approaches that enable effective data storage and management in Precision Medicine, using patients' clinical and genomic information from The Cancer Genome Atlas (TCGA). The first experiment measures performance and scalability for biologically meaningful queries of differing complexity across database sizes. The second experiment measures performance and scalability in database updates without schema changes. The third experiment assesses performance and scalability in database updates with schema modifications due to dynamic data. We have identified two NoSQL approaches, based on Cassandra and Redis, which seem to be the ideal database management systems for our precision medicine queries in terms of performance and scalability. We present NoSQL approaches and show how they can be used to manage clinical and genomic big data. 
Our research is relevant to public health, since we focus on one of the main challenges to the development of Precision Medicine and, consequently, investigate a potential solution to the progressively increasing demands on health care.

    Hybrid Data Storage Framework for the Biometrics Domain

    Biometric-based authentication is one of the most popular techniques adopted in large-scale identity matching systems due to its robustness in access control. In recent years, the number of enrolments has increased significantly, posing serious issues for the performance and scalability of these systems. In addition, the use of multiple modalities (such as face, iris and fingerprint) is further increasing the issues related to scalability. This research work focuses on the development of a new Hybrid Data Storage Framework (HDSF) that would improve the scalability and performance of biometric authentication systems (BAS). In this framework, the scalability issue is addressed by integrating a relational database and a NoSQL data store, which combines the strengths of both. The proposed framework improves the performance of BAS in three areas: (i) by proposing a new biographic match score based key filtering process to identify any duplicate records in the storage (de-duplication search); (ii) by proposing a multi-modal biometric index based key filtering process for identification and de-duplication search operations; (iii) by adopting a parallel biometric matching approach for identification, enrolment and verification processes. The efficacy of the proposed framework is compared with that of a traditional BAS across several values of False Rejection Rate (FRR). Using our dataset and algorithms, it is observed that, compared to a traditional BAS, the HDSF shows an overall efficiency improvement of more than 54% for zero FRR and above 60% for FRR values between 1-3.5% during identification search operations.