
    A reference architecture for big data systems

    For decades, applying new IT technologies in organizations has been a major concern for business, and big data is a new concept generating considerable business interest. Accessing more data and analyzing it at scale requires new big data platforms; however, few reference architectures for big data systems exist. In this paper, building on existing reference architectures for big data systems, we propose a new high-level abstract reference architecture and related reference architecture notation that better express the overall architecture. The new reference architecture is verified against one existing case and an additional new use case.

    Global-Scale Resource Survey and Performance Monitoring of Public OGC Web Map Services

    One of the most widely implemented service standards provided by the Open Geospatial Consortium (OGC) to the user community is the Web Map Service (WMS). WMS is widely employed globally, but little is known about the global distribution, adoption status, or service quality of these online WMS resources. To fill this void, we surveyed global WMS resources and performed distributed performance monitoring of these services. This paper describes a crawling method used to discover these WMSs and a distributed monitoring framework used to monitor 46,296 of them continuously for over one year. We analyzed server locations, provider types, themes, the spatiotemporal coverage of map layers, and the service versions for 41,703 valid WMSs. Furthermore, we appraised the stability and performance of the basic operations (i.e., GetCapabilities and GetMap) for 1,210 selected WMSs. We discuss the major reasons for request errors and performance issues, as well as the relationship between service response times and the spatiotemporal distribution of client monitoring sites. This paper will help service providers, end users, and standards developers grasp the status of global WMS resources and understand the adoption status of OGC standards. The conclusions drawn here can benefit geospatial resource discovery and service performance evaluation, and can guide service performance improvements.
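
    The paper itself does not include code; as a rough illustration of the kind of per-operation probe such a monitoring framework relies on, the following Python sketch times one GetCapabilities and one GetMap request. The endpoint URL, layer name, and bounding box are placeholders; the request parameters follow the OGC WMS 1.1.1 specification.

        import time
        import requests

        def time_wms_request(base_url, params, timeout=30.0):
            """Issue one WMS request and return (status_code, elapsed_seconds)."""
            start = time.monotonic()
            try:
                resp = requests.get(base_url, params=params, timeout=timeout)
                return resp.status_code, time.monotonic() - start
            except requests.RequestException:
                # Request error (timeout, DNS failure, refused connection, ...)
                return None, time.monotonic() - start

        # GetCapabilities asks the server to describe its layers and versions.
        caps = {"SERVICE": "WMS", "REQUEST": "GetCapabilities"}

        # GetMap renders one map image; LAYERS and BBOX here are placeholders.
        gmap = {"SERVICE": "WMS", "VERSION": "1.1.1", "REQUEST": "GetMap",
                "LAYERS": "example_layer", "STYLES": "", "SRS": "EPSG:4326",
                "BBOX": "-180,-90,180,90", "WIDTH": "256", "HEIGHT": "256",
                "FORMAT": "image/png"}

        for name, params in [("GetCapabilities", caps), ("GetMap", gmap)]:
            status, elapsed = time_wms_request("https://example.org/wms", params)
            print(f"{name}: status={status}, {elapsed:.2f}s")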

    Big Data Reference Architecture for e-Learning Analytical Systems

    Recent advances in technology have produced big data, making it necessary for researchers to analyze the data in order to make it meaningful. Massive amounts of data are collected across social media sites, mobile communications, business environments, and institutions. The concept of big data was introduced to support the efficient analysis of this large quantity of raw data, and big data analytics is needed to provide the techniques for analyzing it. This new concept is expected to help education in the near future by changing the way we approach the e-Learning process, encouraging interaction between learners and teachers, and allowing the fulfilment of learners' individual requirements and goals. The learning environment generates massive knowledge through the various services provided in massive open online courses; such knowledge is produced via learning actor interactions. Data analytics can also be a valuable tool for helping e-Learning organizations deliver better services to the public: it can provide important insights into consumer behavior and better predict demand for goods and services, thereby allowing better resource management. These observations motivate solutions for applying big data to the educational field. This research article presents a big data reference architecture (BiDRA) for e-Learning analytical systems that enables a unified analysis of the massive data generated by learning actors. The reference architecture organizes the processing of the massive data produced in a big data e-Learning system. Finally, BiDRA for e-Learning analytical systems was evaluated with respect to maintainability, modularity, reusability, performance, and scalability.

    Fourteenth Biennial Status Report: March 2017 - February 2019

    Interactive Feature Selection and Visualization for Large Observational Data

    Data can create enormous value in both scientific and industrial fields, especially as a source of new knowledge and an inspiration for innovation. With massive increases in computing power, data storage capacity, and the capability to generate and collect data, scientific research communities are confronting a transformation in how large-scale, complex, high-resolution data sets are exploited for situation awareness and decision making. Comprehensively analyzing big data problems requires analyses of various aspects, including the effective selection of static and time-varying feature patterns that match the interests of domain users. To fully exploit the ever-growing size of data and computing power in real applications, we propose a general feature analysis pipeline and an integrated system that is general, scalable, and reliable for interactive feature selection and visualization of large observational data for situation awareness. The central challenge tackled in this dissertation is how to effectively identify and select meaningful features in a complex feature space. Our research efforts address three aspects: 1. enabling domain users to better define their analysis interests; 2. accelerating the feature selection process; 3. comprehensively presenting the intermediate and final analysis results in a visualized way. For static feature selection, we developed a series of quantitative metrics that relate user interest to the spatio-temporal characteristics of features. For time-varying feature selection, we proposed the concept of a generalized feature set and used a generalized time-varying feature to describe the selection interest. Additionally, we provide a scalable system framework that manages both data processing and interactive visualization, and effectively exploits computation and analysis resources. Together, the methods and the system design realize interactive feature selection on two representative large observational data sets, one with large spatial resolution and one with large temporal resolution. The results support applications of big data analysis that combine statistical methods with high-performance computing techniques to visualize real events interactively.
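
    The dissertation's actual metrics are not reproduced here; the following hypothetical Python sketch only illustrates the general pattern of relating a user-declared interest to spatio-temporal feature characteristics and ranking features by the resulting score. The Feature fields, the interest ranges, and the scoring rule are all illustrative assumptions.

        from dataclasses import dataclass

        @dataclass
        class Feature:
            name: str
            spatial_extent: float  # e.g., area covered, arbitrary units
            duration: float        # e.g., lifetime in time steps

        def interest_score(f, extent_range, duration_range):
            """Score 1.0 when a characteristic lies inside the user's range,
            decaying smoothly outside it."""
            def band(value, lo, hi):
                if lo <= value <= hi:
                    return 1.0
                gap = min(abs(value - lo), abs(value - hi))
                return 1.0 / (1.0 + gap)  # soft penalty outside the range
            return 0.5 * band(f.spatial_extent, *extent_range) \
                 + 0.5 * band(f.duration, *duration_range)

        features = [Feature("vortex_a", 12.0, 40.0),
                    Feature("vortex_b", 90.0, 5.0),
                    Feature("plume_c", 15.0, 35.0)]
        ranked = sorted(features,
                        key=lambda f: interest_score(f, (10, 20), (30, 50)),
                        reverse=True)
        print([f.name for f in ranked])  # best matches of the interest first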

    A study of EU data protection regulation and appropriate security for digital services and platforms

    A law often has more than one purpose, more than one intention, and more than one interpretation. Even a meticulously formulated and context-agnostic law text will, when faced with a field propelled by intense innovation, eventually become obsolete. The European Data Protection Directive is a good example of such legislation. It may be argued that the technological modifications brought on by the EU General Data Protection Regulation (GDPR) are nominal in comparison to the previous Directive, but from a business perspective the changes are significant and important. The Directive’s lack of a direct economic incentive for companies to protect personal data has changed with the Regulation, as companies may now have to pay severe fines for violating the legislation. The objective of the thesis is to establish the notion of trust as a key design goal for information systems handling personal data. This includes interpreting the EU legislation on data protection and using the interpretation as a foundation for further investigation. This interpretation is connected to the areas of analytics, security, and privacy concerns for intelligent service development. Finally, the centralised platform business model and its challenges are examined, and three main resolution themes for regulating platform privacy are proposed. The aims of the proposed resolutions are to create a more trustful relationship between providers and data subjects, while also improving the conditions for competition and thus providing data subjects with service alternatives. The thesis contributes new insights into the evolving privacy practices in the digital society at an important time of transition from service-driven business models to platform business models. Firstly, privacy-related regulation and state-of-the-art analytics development are examined to understand their implications for intelligent services that are based on automated processing and profiling. The ability to choose between providers of intelligent services is identified as the core challenge. Secondly, the thesis examines what is meant by appropriate security for systems that handle personal data, something the GDPR requires organisations to use without, however, specifying what can be considered appropriate. We propose a method for active network security in web software, developed through the use of analytics for detection and by inserting data generators into a software installation. The active network security method is proposed as a framework for achieving compliance with the GDPR requirement that services and platforms use appropriate security. Thirdly, the platform business model is considered from the privacy point of view, along with the implications of “processing silos” for intelligent services. The centralised platform model is considered problematic from both the data subject and the competition standpoints. A resolution is offered for enabling user-initiated open data flow to counter the centralised “processing silos”, and thereby to facilitate the introduction of decentralised platforms. The thesis provides an interdisciplinary analysis, considering both the law as it stands (lex lata) and a proposed resolution (lex ferenda), defined through argumentativist legal dogmatics, of how the legal framework ought to be adapted (de lege ferenda) to fit the described environment. User-friendly Legal Science is applied as a theory framework to provide a holistic approach to answering the research questions. The User-friendly Legal Science theory has its roots in design science and offers a way toward achieving interdisciplinary research in the fields of information systems and legal science.
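
    The thesis's concrete method is not reproduced here; the following speculative Python sketch reads the "data generators" idea as a honeytoken-style mechanism: plant synthetic records that no legitimate workflow should touch, and flag any access to them for the detection analytics. The record format, the generation scheme, and the detection hook are illustrative assumptions.

        import secrets

        def generate_canary_records(n):
            """Create synthetic personal-data-shaped records with unique markers."""
            return [{"user_id": f"canary-{secrets.token_hex(8)}",
                     "email": f"{secrets.token_hex(6)}@canary.invalid"}
                    for _ in range(n)]

        CANARY_IDS = {r["user_id"] for r in generate_canary_records(10)}

        def on_record_access(record_id):
            """Hook called by the data layer; a canary hit signals probable misuse."""
            if record_id in CANARY_IDS:
                # In a real system this would feed the detection analytics.
                print(f"ALERT: canary record {record_id} accessed")

        on_record_access(next(iter(CANARY_IDS)))  # demo: triggers the alert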

    Software Engineering for Big Data Systems

    Software engineering is the application of a systematic approach to designing, operating, and maintaining software systems, and the study of all the activities involved in doing so. The software engineering discipline and research into software systems flourished with the advent of computers and the technological revolution ushered in by the World Wide Web and the Internet. Software systems have grown dramatically to the point of becoming ubiquitous; they have a significant impact on the global economy and on how we interact and communicate with each other and with computers in our daily lives. However, the types of software systems developed have changed markedly over the years. In the past decade, owing to breakthrough advances in cloud and mobile computing technologies, unprecedented volumes of hitherto inaccessible data, referred to as big data, have become available to technology companies and business organizations farsighted and discerning enough to use them to create new products and services, generating astounding profits. The advent of big data, and of software systems utilizing big data, has presented a new sphere of growth for the software engineering discipline. Researchers, entrepreneurs, and major corporations are all looking into big data systems to extract the maximum value from the data available to them. Software engineering for big data systems is an emergent field that is starting to witness a great deal of important research activity. This thesis investigates the application of software engineering knowledge areas and standard practices, established over the years by the software engineering research community, to developing big data systems by:
    - surveying the existing software engineering literature on applying software engineering principles to developing and supporting big data systems;
    - identifying the fields of application for big data systems;
    - investigating the software engineering knowledge areas that have seen research related to big data systems;
    - revealing the gaps in the knowledge areas that require more focus for big data systems development; and
    - determining the open research challenges in each software engineering knowledge area that need to be met.
    The analysis and results reveal that recent advances in distributed computing, non-relational databases, and machine learning applications have led the software engineering research and business communities to focus primarily on the system design and architecture of big data systems. Despite the instrumental role big data systems have played in transforming several business organizations and technology companies into market leaders, developing and maintaining stable, robust, and scalable big data systems remains a distant milestone. This can be attributed to the paucity of research attention given to more fundamental and equally important software engineering activities such as requirements engineering, testing, and establishing good quality assurance practices for big data systems.

    Frequent Itemset Mining for Big Data

    Traditional data mining tools, developed to extract actionable knowledge from data, have proven inadequate for processing the huge amounts of data produced nowadays. Even the most popular algorithms for Frequent Itemset Mining, an exploratory data analysis technique used to discover frequent co-occurrences of items in a transactional dataset, are inefficient on larger and more complex data. As a consequence, many parallel algorithms have been developed, based on modern frameworks able to leverage distributed computation in commodity clusters of machines (e.g., Apache Hadoop, Apache Spark). However, parallelizing frequent itemset mining is far from trivial: the search-space exploration on which all the techniques are based is not easily partitionable. Hence, distributed frequent itemset mining is a challenging problem and an interesting research topic. In this context, our main contributions are (i) an exhaustive theoretical and experimental analysis of the best-in-class approaches, whose outcomes and open issues motivated (ii) the development of a distributed high-dimensional frequent itemset miner, and (iii) a data mining framework that relies heavily on distributed frequent itemset mining to extract a specific type of itemset. The theoretical analysis highlights the challenges of distributing and pre-partitioning the frequent itemset mining problem (i.e., the search-space exploration) and describes the most widely adopted distribution strategies. The extensive experimental campaign, in turn, compares the expectations raised by the algorithmic choices against the actual performance of the algorithms. We ran more than 300 experiments to evaluate and discuss the performance of the algorithms with respect to different real-life use cases and data distributions. The outcome of the review is that no algorithm is universally superior and that performance is heavily affected by the data distribution. Moreover, we identified a concrete gap in frequent pattern extraction for high-dimensional use cases. For this reason, we developed our own distributed high-dimensional frequent itemset miner based on Apache Hadoop. The algorithm splits the search-space exploration into independent sub-tasks; however, since the exploration benefits strongly from full knowledge of the problem, we introduced an interleaved synchronization phase. The result is a trade-off between the benefits of a centralized state and those of the additional computational power afforded by parallelism. Experimental benchmarks, performed on real-life high-dimensional use cases, show the efficiency of the proposed approach in terms of execution time, load balancing, and robustness to memory issues. Finally, the dissertation introduces a data mining framework in which distributed itemset mining is a fundamental component of the processing pipeline; the aim of the framework is the extraction of a new type of itemset, called misleading generalized itemsets.
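
    The dissertation's own Hadoop-based miner is not reproduced here; as a point of reference, the following sketch uses Spark's built-in FP-Growth implementation to show what off-the-shelf distributed frequent itemset mining looks like. The toy transactions and the support and confidence thresholds are illustrative.

        from pyspark.sql import SparkSession
        from pyspark.ml.fpm import FPGrowth

        spark = SparkSession.builder.appName("fim-demo").getOrCreate()

        # Toy transactional dataset: each row is one transaction's item list.
        transactions = spark.createDataFrame(
            [(0, ["bread", "milk"]),
             (1, ["bread", "butter", "milk"]),
             (2, ["butter", "jam"]),
             (3, ["bread", "milk", "jam"])],
            ["id", "items"])

        # minSupport: the fraction of transactions an itemset must appear in.
        fp = FPGrowth(itemsCol="items", minSupport=0.5, minConfidence=0.6)
        model = fp.fit(transactions)

        model.freqItemsets.show()      # frequent itemsets with their counts
        model.associationRules.show()  # rules derived from those itemsets

        spark.stop()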