
    Big Data Testing Techniques: Taxonomy, Challenges and Future Trends

    Big Data is transforming many industrial domains by providing decision support through the analysis of large data volumes. Big Data testing aims to ensure that Big Data systems run smoothly and error-free while maintaining data performance and quality. However, because of the diversity and complexity of the data, testing Big Data is challenging. Although numerous research efforts deal with Big Data testing, a comprehensive review addressing its testing techniques and challenges is not yet available. We have therefore systematically reviewed the evidence on Big Data testing techniques published in the period 2010-2021. This paper discusses the testing of data processing by highlighting the techniques used in each processing phase, and then discusses the challenges and future directions. Our findings show that diverse functional, non-functional and combined (functional and non-functional) testing techniques have been used to solve specific problems related to Big Data, and that most testing challenges arise during the MapReduce validation phase. In addition, combinatorial testing is one of the most frequently applied techniques, used in combination with other techniques (i.e., random testing, mutation testing, input space partitioning and equivalence testing) to find various functional faults through Big Data testing.
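
    As an illustration of the kind of functional testing the review surveys, the sketch below combines equivalence-class inputs with pairwise combinatorial execution against a toy MapReduce-style word count. The pipeline, input classes, and oracle are invented for this example and are not taken from any of the reviewed studies.

        # Hypothetical sketch: equivalence classes of inputs exercised pairwise
        # against a toy MapReduce-style word count (not from the reviewed studies).
        import itertools
        from collections import Counter

        def map_phase(lines):
            """Map step: emit (word, 1) pairs for every word in every input line."""
            for line in lines:
                for word in line.split():
                    yield word.lower(), 1

        def reduce_phase(pairs):
            """Reduce step: sum the counts per word."""
            counts = Counter()
            for word, n in pairs:
                counts[word] += n
            return dict(counts)

        # Equivalence classes of inputs: empty, single word, duplicates, mixed case.
        input_classes = {
            "empty": [],
            "single": ["spark"],
            "duplicates": ["spark spark"],
            "mixed_case": ["Spark spark SPARK"],
        }

        # Combinatorial (pairwise) testing: exercise every pair of input classes together.
        for (name_a, a), (name_b, b) in itertools.combinations(input_classes.items(), 2):
            result = reduce_phase(map_phase(a + b))
            # Oracle: the total count must equal the number of words fed in.
            expected_total = sum(len(line.split()) for line in a + b)
            assert sum(result.values()) == expected_total, (name_a, name_b, result)

        print("all combinatorial word-count checks passed")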

    Testing Big Data Applications

    Today, big data has become a central topic of discussion for organizations. The main task associated with a big data stream is coping with its various challenges and performing appropriate testing so that the data can be analysed optimally and put to productive use, especially from a business perspective. The term big data refers to massive volumes of data (often measured in petabytes or exabytes) that exceed the processing and analytical capacity of conventional systems, thereby raising the need to analyse and test the data before applications can be put into use. Testing such huge volumes of data, which come from numerous sources such as the internet, smartphones, audio, video, and other media, is a challenge in itself. The most favourable solution for testing big data is an automated/programmed approach. This paper outlines the characteristics of big data and the challenges associated with it, followed by an approach, strategy, and proposed framework for testing big data applications.
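
    To make the automated-testing idea concrete, here is a minimal sketch of a data-quality gate that a big data test pipeline might run on each incoming batch; the schema, sample feed, and thresholds are assumptions made up for this example, not part of the paper's proposed framework.

        # Illustrative sketch of an automated data-quality gate for one micro-batch
        # of an incoming feed; schema, sample data, and threshold are invented here.
        import csv
        import io

        EXPECTED_COLUMNS = ["user_id", "event_time", "amount"]  # assumed schema
        MAX_NULL_FRACTION = 0.5                                 # assumed quality threshold

        # Stand-in for one micro-batch of an incoming data feed.
        sample_feed = io.StringIO(
            "user_id,event_time,amount\n"
            "1,2021-01-01T00:00:00,9.99\n"
            "2,2021-01-01T00:05:00,\n"
            "3,2021-01-01T00:07:00,4.50\n"
        )

        reader = csv.DictReader(sample_feed)
        rows = list(reader)

        # Structural check: the batch must match the expected schema exactly.
        assert reader.fieldnames == EXPECTED_COLUMNS, f"schema drift: {reader.fieldnames}"

        # Completeness check: every column's null fraction must stay under the threshold.
        for column in EXPECTED_COLUMNS:
            nulls = sum(1 for row in rows if not row[column])
            fraction = nulls / len(rows)
            print(f"{column}: {fraction:.2%} missing")
            assert fraction <= MAX_NULL_FRACTION, f"{column} exceeds null threshold"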

    Systems thinking, big data, and data protection law: Using Ackoff’s Interactive Planning to respond to emergent policy challenges.

    This document is the Accepted Manuscript of the following article: Henry Pearce, ‘Systems Thinking, Big Data, and Data Protection Law: Using Ackoff’s Interactive Planning to Respond to Emergent Policy Challenges’, European Journal of Law Reform, Issue 4, 2016, available online at: https://www.elevenjournals.com/tijdschrift/ejlr/2016/4/EJLR_1387-2370_2016_018_004_004. This article examines the emergence of big data and how it poses a number of significant novel challenges to the smooth operation of some of the European data protection framework’s fundamental tenets. Building on previous research in the area, the article argues that recent proposals for reform, as well as proposals based on conventional approaches to policy making and regulatory design more generally, will likely be ill-equipped to deal with some of big data’s most severe emergent difficulties. Instead, it argues that novel, and possibly unorthodox, approaches to regulation and policy design premised on systems thinking methodologies may represent attractive alternative ways forward. As a means of testing this general hypothesis, the article considers Interactive Planning, a systems thinking methodology popularised by the organisational theorist Russell Ackoff, as an embryonic example of one such methodological approach and, using the challenges posed by big data to the principle of purpose limitation as a case study, explores whether its use may be beneficial in the development of data protection law and policy in the big data environment.

    Hybrid statistical and mechanistic mathematical model guides mobile health intervention for chronic pain

    Nearly a quarter of visits to the Emergency Department are for conditions that could have been managed via outpatient treatment; improvements that allow patients to quickly recognize and receive appropriate treatment are crucial. The growing popularity of mobile technology creates new opportunities for real-time adaptive medical intervention, and the simultaneous growth of big data sources allows for preparation of personalized recommendations. Here we focus on the reduction of chronic suffering in the sickle cell disease community. Sickle cell disease is a chronic blood disorder in which pain is the most frequent complication. There is currently no standard algorithm or analytical method for real-time adaptive treatment recommendations for pain. Furthermore, current state-of-the-art methods have difficulty in handling continuous-time decision optimization using big data. Facing these challenges, in this study we aim to develop new mathematical tools for incorporating mobile technology into personalized treatment plans for pain. We present a new hybrid model for the dynamics of subjective pain that consists of a dynamical systems approach using differential equations to predict future pain levels, as well as a statistical approach tying system parameters to patient data (both personal characteristics and medication response history). Pilot testing of our approach suggests that it has significant potential to predict pain dynamics given patients' reported pain levels and medication usage. With more abundant data, our hybrid approach should allow physicians to make personalized, data-driven recommendations for treating chronic pain.
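
    A minimal sketch of how such a hybrid model might look in code is shown below: a differential equation is integrated forward to predict the pain trajectory, with parameters that would, in the full approach, be fitted statistically to each patient's data. The specific equation, parameter names, and values are illustrative assumptions, not the paper's actual model.

        # Assumed toy form: dp/dt = -recovery_rate*(p - baseline) - drug_effect*c(t),
        # where c(t) is the summed, exponentially decaying concentration of past doses.
        import math

        def simulate_pain(p0, baseline, recovery_rate, drug_effect, drug_halflife,
                          dose_times, t_end=48.0, dt=0.1):
            """Euler-integrate the assumed pain ODE over t_end hours with step dt."""
            decay = math.log(2) / drug_halflife
            t, p, trajectory = 0.0, p0, []
            while t <= t_end:
                # Drug concentration contributed by all unit doses taken so far.
                c = sum(math.exp(-decay * (t - td)) for td in dose_times if td <= t)
                dp = -recovery_rate * (p - baseline) - drug_effect * c
                p = max(0.0, p + dp * dt)          # a pain score cannot drop below 0
                trajectory.append((round(t, 1), round(p, 2)))
                t += dt
            return trajectory

        # Hypothetical patient: starting pain 8/10, baseline 3/10, doses at hours 0 and 6.
        trajectory = simulate_pain(p0=8.0, baseline=3.0, recovery_rate=0.05,
                                   drug_effect=0.4, drug_halflife=4.0,
                                   dose_times=[0.0, 6.0])
        print(trajectory[::60])                    # coarse sample of the predicted trajectory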

    ENIGMA and global neuroscience: A decade of large-scale studies of the brain in health and disease across more than 40 countries

    This review summarizes the last decade of work by the ENIGMA (Enhancing NeuroImaging Genetics through Meta Analysis) Consortium, a global alliance of over 1400 scientists across 43 countries, studying the human brain in health and disease. Building on large-scale genetic studies that discovered the first robustly replicated genetic loci associated with brain metrics, ENIGMA has diversified into over 50 working groups (WGs), pooling worldwide data and expertise to answer fundamental questions in neuroscience, psychiatry, neurology, and genetics. Most ENIGMA WGs focus on specific psychiatric and neurological conditions; other WGs study normal variation due to sex and gender differences, or development and aging; still other WGs develop methodological pipelines and tools to facilitate harmonized analyses of "big data" (i.e., genetic and epigenetic data, multimodal MRI, and electroencephalography data). These international efforts have yielded the largest neuroimaging studies to date in schizophrenia, bipolar disorder, major depressive disorder, post-traumatic stress disorder, substance use disorders, obsessive-compulsive disorder, attention-deficit/hyperactivity disorder, autism spectrum disorders, epilepsy, and 22q11.2 deletion syndrome. More recent ENIGMA WGs have formed to study anxiety disorders, suicidal thoughts and behavior, sleep and insomnia, eating disorders, irritability, brain injury, antisocial personality and conduct disorder, and dissociative identity disorder. Here, we summarize the first decade of ENIGMA's activities and ongoing projects, and describe the successes and challenges encountered along the way. We highlight the advantages of collaborative, large-scale, coordinated data analyses for testing the reproducibility and robustness of findings, offering the opportunity to identify brain systems involved in clinical syndromes across diverse samples, together with associated genetic, environmental, demographic, cognitive, and psychosocial factors.
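
    As a toy illustration of the coordinated-analysis idea, the sketch below pools hypothetical site-level effect sizes with inverse-variance weights; it is a generic fixed-effect pooling, not ENIGMA's actual harmonization or meta-analysis pipeline, and the site names and numbers are fabricated.

        # Hedged illustration: inverse-variance-weighted pooling of site-level results.
        import math

        # (site, effect size d, standard error) from hypothetical harmonized analyses.
        site_results = [
            ("site_A", 0.21, 0.08),
            ("site_B", 0.15, 0.11),
            ("site_C", 0.30, 0.09),
        ]

        weights = [1.0 / se ** 2 for _, _, se in site_results]
        pooled = sum(w * d for w, (_, d, _) in zip(weights, site_results)) / sum(weights)
        pooled_se = math.sqrt(1.0 / sum(weights))

        print(f"pooled effect = {pooled:.3f} +/- {1.96 * pooled_se:.3f} (95% CI half-width)")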

    Toward efficient and secure public auditing for dynamic big data storage on cloud

    University of Technology Sydney, Faculty of Engineering and Information Technology. Cloud and Big Data are two of the most attractive ICT research topics that have emerged in recent years. Requirements for big data processing are now everywhere, and the pay-as-you-go model of cloud systems is especially cost-efficient for processing big data applications. However, there are still concerns that hinder the proliferation of cloud, and data security/privacy is a top concern for data owners wishing to migrate their applications into the cloud environment. Compared to users of conventional systems, cloud users need to surrender local control of their data to cloud servers. Another challenge for big data is data dynamism, which exists in most big data applications. Due to frequent updates, efficiency becomes a major issue in data management. As security always brings compromises in efficiency, it is difficult but nonetheless important to investigate how to efficiently address security challenges over dynamic cloud data. Data integrity is an essential aspect of data security. In addition to server-side integrity protection mechanisms, verification by a third-party auditor is of equal importance, because it enables users to verify the integrity of their data through the auditor at any user-chosen time. This type of verification is also known as 'public auditing' of data. Existing public auditing schemes allow the integrity of a dataset stored in the cloud to be externally verified without retrieval of the whole original dataset. However, in practice, many challenges hinder the application of such schemes. To name a few: first, the server still has to aggregate a proof, via the cloud controller, from data blocks that are stored and processed in a distributed fashion across cloud instances, which means that encryption and transfer of these data within the cloud become time-consuming. Second, security flaws exist in current designs: the verification processes are insecure against various attacks, which raises concerns about deploying these schemes in practice. Third, when the dataset is large, auditing of dynamic data becomes costly in terms of communication and storage; this is especially the case for a large number of small data updates and for data updates on multi-replica cloud data storage. In this thesis, the research problem of dynamic public data auditing in the cloud is systematically investigated. After analysing these problems, we systematically address secure and efficient public auditing of dynamic big data in the cloud by developing, testing and publishing a series of security schemes and algorithms. Specifically, our work focuses on the following aspects: cloud-internal authenticated key exchange, authorisation of the third-party auditor, fine-grained update support, index verification, and efficient multi-replica public auditing of dynamic data. To the best of our knowledge, this thesis presents the first series of work to systematically analyse and address this research problem. Experimental results and analyses show that the solutions presented in this thesis are suitable for auditing dynamic big data storage on cloud. Furthermore, our solutions represent significant improvements in cloud efficiency and security.
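
    The core idea behind public auditing, namely verifying that a stored block is intact without retrieving the whole dataset, can be illustrated with a plain Merkle hash tree, as in the sketch below. This is a generic, simplified illustration and not the authenticated data structures, authorisation mechanisms, or multi-replica schemes developed in the thesis.

        # Simplified Merkle-tree sketch: prove one block is intact without shipping
        # the whole dataset. Generic illustration only, not the thesis's scheme.
        import hashlib

        def h(data: bytes) -> bytes:
            return hashlib.sha256(data).digest()

        def build_tree(blocks):
            """Return all levels of a Merkle tree, leaves first (len(blocks) a power of 2)."""
            level = [h(b) for b in blocks]
            levels = [level]
            while len(level) > 1:
                level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
                levels.append(level)
            return levels

        def prove(levels, index):
            """Collect the sibling hashes on the path from leaf `index` to the root."""
            path = []
            for level in levels[:-1]:
                sibling = index ^ 1
                path.append((level[sibling], sibling < index))
                index //= 2
            return path

        def verify(root, block, path):
            """Recompute the root from one block plus its sibling path."""
            node = h(block)
            for sibling, sibling_is_left in path:
                node = h(sibling + node) if sibling_is_left else h(node + sibling)
            return node == root

        blocks = [b"block-0", b"block-1", b"block-2", b"block-3"]
        levels = build_tree(blocks)          # kept by the storage server
        root = levels[-1][0]                 # published to the auditor once
        proof = prove(levels, index=2)       # server answers a challenge on block 2
        print(verify(root, b"block-2", proof))   # True: block is intact
        print(verify(root, b"tampered", proof))  # False: modification detected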

    Addressing Challenges of Ultra Large Scale System on Requirements Engineering

    With the growing evolution of complex systems and their integrations, the Internet of Things, communication, massive information flows and big data, a new type of system has emerged for software engineers, known as Ultra Large Scale (ULS) systems. Because of their unique characteristics, ULS systems require dramatic changes in all aspects of software engineering practices and their artifacts. Gathering all members of a software development effort in a regular, face-to-face way is impossible, especially for stakeholders from different national and organizational cultures. In addition, the huge amount of stored data, the number of integrations among software components, and the number of hardware elements all constrain the design, development, testing, evolution, assessment and implementation phases of current software development methods. In this respect, a ULS system, considered as a system of systems, has significant implications for system development activities: its scale is incomparable to that of traditional systems, since thousands of different stakeholders are involved in developing the software, each with different interests and complex, changing needs, while new services are continuously being integrated into the already running ULS system. The scale of ULS systems creates many challenges for requirements engineering (RE), and RE experts are therefore working on automated tools to support RE activities and overcome these challenges. This paper points to the limitations of current RE practices in the face of the challenges imposed by the nature of ULS systems, and focuses on the contributions of several approaches to overcoming these difficulties and on the areas these solutions leave unsolved. We find that current approaches for ULS systems miss some essential RE practices related to finding vital dependent requirements, are not capable of measuring the impact of changes on ULS systems or other integrated legacy systems, and leave requirements validation somewhat dependent on user ratings without solid approval from the stakeholders.

    Software Engineering for Big Data Systems

    Software engineering is the application of a systematic approach to designing, operating and maintaining software systems, and the study of all the activities involved in achieving the same. The software engineering discipline and research into software systems flourished with the advent of computers and the technological revolution ushered in by the World Wide Web and the Internet. Software systems have grown dramatically to the point of becoming ubiquitous. They have a significant impact on the global economy and on how we interact and communicate with each other and with computers using software in our daily lives. However, there have been major changes in the type of software systems developed over the years. In the past decade, owing to breakthrough advancements in cloud and mobile computing technologies, unprecedented volumes of hitherto inaccessible data, referred to as big data, have become available to technology companies and business organizations farsighted and discerning enough to use them to create new products and services generating astounding profits. The advent of big data and of software systems utilizing big data has presented a new sphere of growth for the software engineering discipline. Researchers, entrepreneurs and major corporations are all looking into big data systems to extract the maximum value from the data available to them. Software engineering for big data systems is an emergent field that is starting to witness a lot of important research activity. This thesis investigates the application of software engineering knowledge areas and standard practices, established over the years by the software engineering research community, to developing big data systems by:
    - surveying the existing software engineering literature on applying software engineering principles to developing and supporting big data systems;
    - identifying the fields of application for big data systems;
    - investigating the software engineering knowledge areas that have seen research related to big data systems;
    - revealing the gaps in the knowledge areas that require more focus for big data systems development; and
    - determining the open research challenges in each software engineering knowledge area that need to be met.
    The analysis and results obtained from this thesis reveal that recent advances made in distributed computing, non-relational databases, and machine learning applications have lured the software engineering research and business communities into focusing primarily on the system design and architecture of big data systems. Despite the instrumental role played by big data systems in the success of several business organizations and technology companies by transforming them into market leaders, developing and maintaining stable, robust, and scalable big data systems is still a distant milestone. This can be attributed to the paucity of much-deserved research attention to more fundamental and equally important software engineering activities such as requirements engineering, testing, and creating good quality assurance practices for big data systems.