
    A survey of large-scale reasoning on the Web of data

    As more and more data is generated by sensor networks, social media and organizations, the Web interlinking this wealth of information becomes more complex. This is particularly true for the so-called Web of Data, in which data is semantically enriched and interlinked using ontologies. In this large and uncoordinated environment, reasoning can be used to check the consistency of the data and of associated ontologies, or to infer logical consequences which, in turn, can be used to obtain new insights from the data. However, reasoning approaches need to be scalable in order to enable reasoning over the entire Web of Data. To address this problem, several high-performance reasoning systems, which mainly implement distributed or parallel algorithms, have been proposed in recent years. These systems differ significantly, for instance in terms of reasoning expressivity, computational properties such as completeness, or reasoning objectives. In order to provide a first complete overview of the field, this paper reports a systematic review of such scalable reasoning approaches over various ontological languages, detailing both the methods and the experiments conducted. We highlight the shortcomings of these approaches and discuss some of the open problems related to performing scalable reasoning.
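
    To make the kind of inference at stake concrete, the sketch below is a minimal, purely illustrative forward-chaining reasoner, assuming RDFS subClassOf rules as the example rule set; it is not code from any surveyed system. It repeatedly applies the rules to a set of triples until no new fact can be derived, the fixpoint computation that the surveyed systems distribute and parallelize.

    # Illustrative sketch of RDFS-style forward chaining (materialization);
    # not taken from any surveyed system. Triples are (subject, predicate,
    # object) strings, and inference runs to a fixpoint.
    def rdfs_closure(triples):
        inferred = set(triples)
        while True:
            new = set()
            for s, p, o in inferred:
                if p == "rdfs:subClassOf":
                    for s2, p2, o2 in inferred:
                        # subClassOf is transitive.
                        if p2 == "rdfs:subClassOf" and s2 == o:
                            new.add((s, "rdfs:subClassOf", o2))
                        # Instances inherit types along subClassOf.
                        if p2 == "rdf:type" and o2 == s:
                            new.add((s2, "rdf:type", o))
            if new <= inferred:          # fixpoint reached
                return inferred
            inferred |= new

    base = {
        ("ex:Sensor", "rdfs:subClassOf", "ex:Device"),
        ("ex:Device", "rdfs:subClassOf", "ex:Resource"),
        ("ex:s42", "rdf:type", "ex:Sensor"),
    }
    for fact in sorted(rdfs_closure(base) - base):
        print(fact)   # e.g. ('ex:s42', 'rdf:type', 'ex:Device')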

    Big data in Finnish companies: What has been done and the results achieved

    This study considers the topical subject of big data in the context of the Finnish business environment. A need for clarifying the concept of big data is recognized. The first objective of the research is to untangle, introduce and analyse the concept of big data. A research gap in the area of big data utilization in Finnish companies is also identified. Another objective is to empirically assess the state of big data in Finnish companies and the results that the companies have achieved by utilizing big data. The research questions draw from theoretical models of big data maturity and from success factors of business intelligence implementations. They include the following: 1. How mature are Finnish companies in their utilization of big data? 2. Is the big data maturity model applicable in the Finnish business environment, and does it succeed in differentiating companies with different levels of maturity? 3. What kinds of external factors and internal competences do the companies recognize as defining their big data potential? 4. What factors do Finnish companies identify as contributing towards success in big data efforts? 5. Are the identified success factors aligned with the model of Yeoh and Koronios (2010)? A multiple case study method is applied in order to answer the research questions. The data consists of ten interviews with experts from large Finnish companies utilizing big data. In the analysis of the interview data, the maturity model is successfully applied in the Finnish context. The interviewed companies are mostly at the early stages of big data maturity. However, they are using big data in versatile ways and in many different business functions. A SWOT analysis is used as a tool to recognize the external and internal factors defining the big data potential of the companies. A large share of the companies report access to large amounts of data and strong know-how in analytics as their strengths. Identified weaknesses include, for example, a lack of organizational agility. The increasing availability of data and improving technological solutions are seen as the most predominant opportunities. The fast pace of change and fierce competition in the area are things that the companies find challenging. Other mentioned threats include privacy crises and changing privacy legislation. Evidence of all success factors of Yeoh and Koronios' (2010) model is found in the interview data, except for the success factors in the technology dimension. This might indicate that the companies do not consider the technological factors as critical as factors related to process and organization. The data also suggests that openness to look for partners and solutions outside the company's own industry should be included among the organizational success factors.

    pmSys: Implementation of a digital Player Monitoring System

    A football match can be decided by the smallest factors, such as mood, but other factors, such as injuries, can determine whether you place first or second. The teams with the fewest injured players have a better chance of reaching the top each season. Since the beginning of monitoring in football, everything was registered by hand using paper and pen. During the 21st century, technology has become one of the most accurate helping hands that any area of monitoring can get. Being able to process large amounts of data in split seconds has proven to be worth the investment of going digital when it comes to monitoring. On this basis, pmSys was created to enhance the power of processing personal data in real time. In this master thesis we wanted to develop a system that lets both football players and trainers register and follow up on the submitted data in real time. By giving a team these tools, we hope to supply the small factor that can push a football team to its limits without going over the edge and into an injury nightmare.
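
    The abstract leaves pmSys's internals open; as a purely hypothetical illustration of the kind of real-time processing such a monitoring system performs on submitted reports, the sketch below computes a session training load with the widely used session-RPE method (load = perceived exertion * session minutes) and aggregates it per player. All class and field names are assumptions, not pmSys's actual data model.

    # Hypothetical sketch of player-load monitoring; not pmSys's actual code.
    # Session load follows the common session-RPE method:
    # load = rating of perceived exertion (1-10) * session minutes.
    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class SessionReport:
        player: str
        day: date
        rpe: int       # rating of perceived exertion, 1-10
        minutes: int   # session duration

        @property
        def load(self) -> int:
            return self.rpe * self.minutes

    def weekly_load(reports: list[SessionReport]) -> dict[str, int]:
        """Aggregate load per player so a trainer can spot spikes early."""
        totals: dict[str, int] = {}
        for r in reports:
            totals[r.player] = totals.get(r.player, 0) + r.load
        return totals

    reports = [
        SessionReport("player_a", date(2015, 5, 4), rpe=7, minutes=90),
        SessionReport("player_a", date(2015, 5, 6), rpe=8, minutes=60),
    ]
    print(weekly_load(reports))   # {'player_a': 1110}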

    IoT for measurements and measurements for IoT

    The thesis is framed within the broad strand of the Internet of Things and follows two parallel paths. On one hand, it identifies operational scenarios in which the IoT paradigm could be innovative and preferable to pre-existing solutions, discussing a couple of applications in detail. On the other hand, it presents methodologies to assess the performance of technologies and related enabling protocols for IoT systems, focusing mainly on metrics and parameters related to the functioning of the physical layer of such systems.
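
    As one hedged example of the physical-layer metrics such assessment methodologies typically report, the sketch below computes packet delivery ratio and mean RSSI over a hypothetical measurement log; the thesis's actual metrics, technologies and data formats may differ.

    # Hypothetical sketch of two common physical-layer metrics used when
    # benchmarking IoT link technologies; the sample values are invented.

    def packet_delivery_ratio(sent: int, received: int) -> float:
        """Fraction of transmitted frames that arrived, in [0, 1]."""
        return received / sent if sent else 0.0

    def mean_rssi(rssi_dbm: list[float]) -> float:
        """Arithmetic mean of per-frame RSSI readings in dBm (the usual
        reported summary; a true mean power would average in milliwatts)."""
        return sum(rssi_dbm) / len(rssi_dbm)

    # 1000 frames transmitted, 942 received, with three RSSI samples shown.
    print(packet_delivery_ratio(1000, 942))   # 0.942
    print(mean_rssi([-87.0, -90.5, -85.2]))   # about -87.6 dBm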

    Proliferating Cloud Density through Big Data Ecosystem, Novel XCLOUDX Classification and Emergence of as-a-Service Era

    Big Data is permeating ever larger aspects of human life for scientific and commercial purposes, especially for massive-scale data analytics beyond the exabyte magnitude. As the footprint of Big Data applications continuously expands, reliance on cloud environments also increases, in order to obtain appropriate, robust and affordable services to deal with Big Data challenges. Cloud computing avoids any need to locally maintain overly scaled computing infrastructure, which includes not only dedicated space but also expensive hardware and software. Several data models to process Big Data have already been developed, and a number of such models are still emerging, potentially relying on heterogeneous underlying storage technologies, including cloud computing. In this paper, we investigate the growing role of cloud computing in the Big Data ecosystem. We also propose a novel XCLOUDX {XCloudX, X…X} classification to gauge the intuitiveness of the naming of cloud-assisted NoSQL Big Data models and to analyze whether XCloudX always uses cloud computing underneath, or vice versa. XCloudX symbolizes those NoSQL Big Data models that embody the term “cloud” in their name, where X is any alphanumeric variable. The discussion is strengthened by a set of important case studies. Furthermore, we study the emergence of the as-a-Service era, motivated by the cloud computing drive, and explore new members beyond the traditional cloud computing stack developed over the last few years.
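
    The XCloudX pattern itself can be read as a simple name test. The sketch below is one illustrative reading of it, not the paper's code: a data-model name matches if it embeds the term "cloud" with an optional alphanumeric prefix and suffix standing in for X.

    import re

    # Illustrative reading of the XCLOUDX {XCloudX, X…X} pattern, where X
    # is any alphanumeric variable; this regex is an assumption, not the
    # paper's own formalization.
    XCLOUDX = re.compile(r"^[A-Za-z0-9]*cloud[A-Za-z0-9]*$", re.IGNORECASE)

    def is_xcloudx(name: str) -> bool:
        return bool(XCLOUDX.match(name))

    for name in ["CloudDB", "Cloudant", "MongoDB", "pCloud"]:
        print(name, is_xcloudx(name))   # True, True, False, True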

    A Data-driven Methodology Towards Mobility- and Traffic-related Big Spatiotemporal Data Frameworks

    Human population is increasing at unprecedented rates, particularly in urban areas. This increase, along with the rise of a more economically empowered middle class, brings new and complex challenges to the mobility of people within urban areas. To tackle such challenges, transportation and mobility authorities and operators are trying to adopt innovative Big Data-driven mobility- and traffic-related solutions. Such solutions will help decision-making processes that aim to ease the load on an already overloaded transport infrastructure. The information collected from day-to-day mobility and traffic can help to mitigate some of these mobility challenges in urban areas. Road infrastructure and traffic management operators (RITMOs) face several limitations in effectively extracting value from the exponentially growing volumes of Mobility- and Traffic-related Big Spatiotemporal Data (MobiTrafficBD) that are being acquired and gathered. Research on the topics of Big Data, Spatiotemporal Data and especially MobiTrafficBD is scattered, and the existing literature does not offer a concrete, common methodological approach to set up, configure, deploy and use a complete Big Data-based framework to manage the lifecycle of mobility-related spatiotemporal data, mainly focused on geo-referenced time series (GRTS) and spatiotemporal events (ST Events), extract value from it, and support the decision-making processes of RITMOs. This doctoral thesis proposes a data-driven, prescriptive methodological approach towards the design, development and deployment of MobiTrafficBD Frameworks focused on GRTS and ST Events. Besides a thorough literature review on Spatiotemporal Data, Big Data and the merging of these two fields through MobiTrafficBD, the methodological approach comprises a set of general characteristics, technical requirements, logical components, data flows and technological infrastructure models, as well as guidelines and best practices that aim to guide researchers, practitioners and stakeholders, such as RITMOs, throughout the design, development and deployment phases of any MobiTrafficBD Framework. This work is intended to be a supporting methodological guide, based on widely used Reference Architectures and guidelines for Big Data, but enriched with the inherent characteristics and concerns brought about by Big Spatiotemporal Data, such as GRTS and ST Events. The proposed methodology was evaluated and demonstrated in various real-world use cases that deployed MobiTrafficBD-based Data Management, Processing, Analytics and Visualisation methods, tools and technologies, under the umbrella of several research projects funded by the European Commission and the Portuguese Government.
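
    To make the two core data types concrete, the sketch below gives one possible representation of a GRTS sample and an ST Event; it is an illustrative assumption, not the thesis's actual data model.

    # Hypothetical schema for the methodology's two core data types:
    # geo-referenced time series (GRTS) and spatiotemporal events (ST Events).
    from dataclasses import dataclass, field
    from datetime import datetime

    @dataclass
    class GRTSPoint:
        """One sample of a geo-referenced time series, e.g. a road sensor."""
        series_id: str        # fixed sensor / road-segment identifier
        lat: float
        lon: float
        timestamp: datetime
        value: float          # e.g. vehicle count or average speed

    @dataclass
    class STEvent:
        """A discrete spatiotemporal event, e.g. an accident or road closure."""
        event_type: str
        lat: float
        lon: float
        start: datetime
        end: datetime | None = None          # None while still ongoing
        attributes: dict = field(default_factory=dict)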

    Investigating system intrusions with data provenance analytics

    To aid threat detection and investigation, enterprises are increasingly relying on commercially available security solutions, such as Intrusion Detection Systems (IDS) and Endpoint Detection and Response (EDR) tools. These security solutions first collect and analyze audit logs throughout the enterprise and then generate threat alerts when suspicious activities occur. Security analysts then investigate those threat alerts to separate false alarms from true attacks by extracting contextual history from the audit logs, i.e., the trail of events that caused the threat alert. Unfortunately, investigating threats in enterprises is a notoriously difficult task, even for expert analysts, due to two main challenges. First, existing enterprise security solutions are optimized to miss as few threats as possible; as a result, they generate an overwhelming volume of false alerts, creating a backlog of investigation tasks. Second, modern computing systems are so operationally complex that they produce an enormous volume of audit logs per day, making it difficult to correlate events for threats that span multiple processes, applications, and hosts. In this dissertation, I propose leveraging data provenance analytics to address the challenges mentioned above. I present five provenance-based techniques that enable system defenders to effectively and efficiently investigate malicious behaviors in enterprise settings. First, I present NoDoze, an alert triage system that automatically prioritizes generated alerts based on their anomalous contextual history. Following that, RapSheet brings the benefits of data provenance to commercial EDR tools and provides system defenders with a compact visualization of multi-stage attacks. Swift then realizes a provenance graph database that generates contextual history around generated alerts in real time, even when analyzing audit logs containing tens of millions of events. Finally, OmegaLog and Zeek Agent introduce the vision of universal provenance analysis, which unifies all forensically relevant provenance information on the system regardless of its layer of origin, improving investigation capabilities.
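
    The common substrate of these systems is the provenance graph: audit events become causal edges between system entities, and the "contextual history" of an alert is its ancestry in that graph. The sketch below is a simplified illustration of that idea, not code from NoDoze, RapSheet, or Swift; the audit triples are invented.

    from collections import defaultdict, deque

    # Simplified illustration of provenance-based alert investigation:
    # each audit event is an edge "subject --action--> object", and the
    # contextual history of an alert is everything reachable backward.

    def build_parent_index(audit_events):
        """Map each entity to the (subject, action) pairs that affected it."""
        parents = defaultdict(list)
        for subject, action, obj in audit_events:
            parents[obj].append((subject, action))
        return parents

    def backward_trace(parents, alert_entity):
        """Collect the causal ancestry of the entity that raised the alert."""
        history, queue, seen = [], deque([alert_entity]), {alert_entity}
        while queue:
            node = queue.popleft()
            for subject, action in parents.get(node, []):
                history.append((subject, action, node))
                if subject not in seen:
                    seen.add(subject)
                    queue.append(subject)
        return history

    audit_events = [
        ("firefox.exe",  "wrote",       "payload.bin"),
        ("payload.bin",  "executed as", "malware_proc"),
        ("malware_proc", "read",        "secrets.db"),
    ]
    for step in backward_trace(build_parent_index(audit_events), "secrets.db"):
        print(step)   # walks the chain back to firefox.exe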

    Towards Practical Access Control and Usage Control on the Cloud using Trusted Hardware

    Cloud-based platforms have become the principal way to store, share, and synchronize files online. For individuals and organizations alike, cloud storage not only provides resource scalability and on-demand access at a low cost, but also eliminates the need to provision and maintain complex hardware installations. Unfortunately, because cloud-based platforms are frequent victims of data breaches and unauthorized disclosures, data protection requires both access control and usage control to manage user authorization and regulate future data use. Encryption can ensure data security against unauthorized parties, but it complicates file sharing, which now requires distributing keys to authorized users and a mechanism that prevents revoked users from accessing or modifying sensitive content. Further, as user data is stored and processed on remote machines, usage control in a distributed setting requires incorporating the local environmental context at policy evaluation, as well as tamper-proof and non-bypassable enforcement. Existing cryptographic solutions either require server-side coordination, offer limited flexibility in data sharing, or incur significant re-encryption overheads on user revocation. These shortcomings make such solutions ill-suited to large-scale distributed environments with large numbers of users, dynamic changes in user membership and access privileges, and resources shared across organizational domains. Thus, developing a robust security and privacy solution for the cloud requires: fine-grained access control that can associate large sets of users and resources at variable granularity, scalable administration costs when managing policies and access rights, and cross-domain policy enforcement. To address the above challenges, this dissertation proposes a practical security solution that relies solely on commodity trusted hardware to ensure confidentiality and integrity throughout the data lifecycle. The aim is to maintain complete user ownership against external hackers and malicious service providers, without losing the scalability or availability benefits of cloud storage. Furthermore, we develop a principled approach that is: (i) portable across storage platforms without requiring any server-side support or modifications, (ii) flexible in allowing users to selectively share their data using fine-grained access control, and (iii) performant, imposing only modest overheads on standard user workloads. Essentially, our system must be client-side and provide end-to-end data protection and secure sharing, without significant degradation in performance or user experience. We introduce NeXUS, a privacy-preserving filesystem that enables cryptographic protection and secure file sharing on existing network-based storage services. NeXUS protects the confidentiality and integrity of file content, as well as file and directory names, while mitigating rollback attacks on the filesystem hierarchy. We also introduce Joplin, a secure access control and usage control system that provides practical attribute-based sharing with decentralized policy administration, including efficient revocation, multi-domain policies, secure user delegation, and mandatory audit logging.
Both systems leverage trusted hardware to prevent the leakage of sensitive material such as encryption keys and access control policies; they are completely client-side, easy to install and use, and can be readily deployed across remote storage platforms without requiring any server-side changes or a trusted intermediary. We developed prototypes for NeXUS and Joplin and evaluated their respective overheads in isolation and within a real-world environment. Results show that both prototypes introduce modest overheads on interactive workloads and achieve portability across storage platforms, including Dropbox and AFS. Together, NeXUS and Joplin demonstrate that a client-side solution employing trusted hardware such as Intel SGX can effectively protect remotely stored data on existing file sharing services.
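
    As a much-simplified, hypothetical illustration of the client-side, end-to-end principle behind NeXUS and Joplin, the sketch below seals file content with authenticated encryption before it ever reaches the storage provider, binding the filename as associated data. In the real systems the key material is protected by an Intel SGX enclave and sharing is policy-mediated; here a raw in-memory key stands in, using AES-GCM from the `cryptography` package.

    # Simplified sketch of client-side, end-to-end file protection; not the
    # NeXUS/Joplin implementation. The storage provider only sees ciphertext.
    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key = AESGCM.generate_key(bit_length=256)   # enclave-protected in practice
    aead = AESGCM(key)

    def seal(plaintext: bytes, filename: str) -> bytes:
        """Encrypt before upload; the filename is bound as associated data."""
        nonce = os.urandom(12)
        return nonce + aead.encrypt(nonce, plaintext, filename.encode())

    def unseal(blob: bytes, filename: str) -> bytes:
        """Raises InvalidTag if content or its filename binding was altered."""
        nonce, ciphertext = blob[:12], blob[12:]
        return aead.decrypt(nonce, ciphertext, filename.encode())

    blob = seal(b"quarterly figures", "report.xlsx")
    assert unseal(blob, "report.xlsx") == b"quarterly figures"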

    Technologies and Applications for Big Data Value

    This open access book explores cutting-edge solutions and best practices for big data and data-driven AI applications in the data-driven economy. It provides the reader with a basis for understanding how technical issues can be overcome to offer real-world solutions in major industrial areas. The book starts with an introductory chapter that provides an overview by positioning the following chapters in terms of their contributions to the technology frameworks that are key elements of the Big Data Value Public-Private Partnership and the upcoming Partnership on AI, Data and Robotics. The remainder of the book is arranged in two parts. The first part, “Technologies and Methods”, contains horizontal contributions of technologies and methods that enable data value chains to be applied in any sector. The second part, “Processes and Applications”, details experience reports and lessons from using big data and data-driven approaches in processes and applications. Its chapters are co-authored with industry experts and cover domains including health, law, finance, retail, manufacturing, mobility, and smart cities. Contributions emanate from the Big Data Value Public-Private Partnership and the Big Data Value Association, which have acted as the nucleus of the European data community, bringing businesses together with leading researchers to harness the value of data for the benefit of society, business, science, and industry. The book is of interest to two primary audiences: first, undergraduate and postgraduate students and researchers in various fields, including big data, data science, data engineering, and machine learning and AI; second, practitioners and industry experts engaged in data-driven systems, software design and deployment projects who are interested in employing these advanced methods to address real-world problems.