4,225 research outputs found

    Towards a big data reference architecture

    Social media analytics: a survey of techniques, tools and platforms

    This paper is written for (social science) researchers seeking to analyze the wealth of social media now available. It presents a comprehensive review of software tools for social networking media, wikis, really simple syndication feeds, blogs, newsgroups, chat and news feeds. For completeness, it also includes introductions to social media scraping, storage, data cleaning and sentiment analysis. Although principally a review, the paper also provides a methodology and a critique of social media tools. Analyzing social media, in particular Twitter feeds for sentiment analysis, has become a major research and business activity due to the availability of web-based application programming interfaces (APIs) provided by Twitter, Facebook and news services. This has led to an ‘explosion’ of data services, software tools for scraping and analysis, and social media analytics platforms. It is also a research area undergoing rapid change and evolution due to commercial pressures and the potential for using social media data for computational (social science) research. Using a simple taxonomy, this paper provides a review of leading software tools and how to use them to scrape, cleanse and analyze the spectrum of social media. In addition, it discusses the requirement for an experimental computational environment for social media research and presents, as an illustration, the system architecture of a social media (analytics) platform built by University College London. The principal contribution of this paper is to provide an overview (including code fragments) for scientists seeking to utilize social media scraping and analytics in either their research or business. The data retrieval techniques presented in this paper are valid at the time of writing (June 2014), but they are subject to change since social media data scraping APIs are rapidly changing.

    On the security of NoSQL cloud database services

    Processing a vast volume of data generated by web, mobile and Internet-enabled devices necessitates a scalable and flexible data management system. Database-as-a-Service (DBaaS) is a new cloud computing paradigm, promising cost-effective, scalable, fully managed database functionality that meets the requirements of online data processing. Although DBaaS offers many benefits, it also introduces new threats and vulnerabilities. While many traditional data processing threats remain, DBaaS introduces new challenges, such as confidentiality violation and information leakage in the presence of privileged malicious insiders, and adds a new dimension to data security. We address the problem of building a secure DBaaS for a public cloud infrastructure where the Cloud Service Provider (CSP) is not completely trusted by the data owner. We present a high-level description of several architectures combining modern cryptographic primitives for achieving this goal. A novel searchable security scheme is proposed to leverage secure query processing in the presence of a malicious cloud insider without disclosing sensitive information. A holistic database security scheme comprising data confidentiality and information leakage prevention is proposed in this dissertation. The main contributions of our work are: (i) a searchable security scheme for non-relational databases of the cloud DBaaS; (ii) leakage minimization in the untrusted cloud. The analysis of experiments that employ a set of established cryptographic techniques to protect databases and minimize information leakage shows that the performance of the proposed solution is bounded by communication cost rather than by the cryptographic computational effort.

    Big Data Now, 2015 Edition

    Now in its fifth year, O’Reilly’s annual Big Data Now report recaps the trends, tools, applications, and forecasts we’ve talked about over the past year. For 2015, we’ve included a collection of blog posts, authored by leading thinkers and experts in the field, that reflect a unique set of themes we’ve identified as gaining significant attention and traction. Our list of 2015 topics includes: data-driven cultures; data science; data pipelines; big data architecture and infrastructure; the Internet of Things and real time; applications of big data; and security, ethics, and governance. Is your organization on the right track? Get a hold of this free report now and stay in tune with the latest significant developments in big data.

    IIoT Data Ness: From Streaming to Added Value

    In the emerging Industry 4.0 paradigm, the internet of things has been an innovation driver, allowing for environment visibility and control through sensor data analysis. However, the data is of such volume and velocity that data quality cannot be assured by conventional architectures. It has been argued that the quality and observability of data are key to a project’s success, allowing users to interact with data more effectively and rapidly. For a project to succeed in this context, it is imperative to incorporate data quality mechanisms in order to extract the most value out of data. If this goal is achieved, one can expect enormous advantages that could lead to financial and innovation gains for the industry. To cope with this reality, this work presents a data mesh oriented methodology, based on existing state-of-the-art data management tools, to design a solution which leverages data quality in the Industrial Internet of Things (IIoT) space through data contextualization. In order to achieve this goal, practices such as FAIR data principles and data observability concepts were incorporated into the solution. The result of this work allowed for the creation of an architecture that focuses on data and metadata management to elevate data context, ownership and quality. The concept of the Internet of Things (IoT) is one of the main success factors for the new Industry 4.0. By analyzing the values that sensors collect from their environment, it is possible to build a platform capable of identifying success conditions and potential problems before they occur, resulting in significant monetary gains for companies. However, this use case is not easy to implement, owing to the high volume and velocity of data coming from an IIoT (Industrial Internet of Things) environment.

    A comparison of statistical machine learning methods in heartbeat detection and classification

    In health care, patients with heart problems require quick responsiveness in a clinical setting or in the operating theatre. Towards that end, automated classification of heartbeats is vital, as some heartbeat irregularities are time-consuming to detect. Therefore, analysis of electrocardiogram (ECG) signals is an active area of research. The methods proposed in the literature depend on the structure of a heartbeat cycle. In this paper, we use interval- and amplitude-based features together with a few samples from the ECG signal as a feature vector. We studied a variety of classification algorithms, focusing especially on a type of arrhythmia known as the ventricular ectopic beat (VEB). We compare the performance of the classifiers against algorithms proposed in the literature and make recommendations regarding features, sampling rate, and choice of classifier to apply in a real-time clinical setting. The extensive study is based on the MIT-BIH arrhythmia database. Our main contributions are the evaluation of existing classifiers over a range of sampling rates, the recommendation of a detection methodology to employ in a practical setting, and the extension of the notion of a mixture of experts to a larger class of algorithms.

    Healthy Transportation Choices with IoT and Smart Nudging

    Modern technology has made life easier but has also given rise to problems of an equally modern nature. For instance, high reliance on private transportation has resulted in unintended consequences such as high levels of air pollution and congestion in cities. Another major disadvantage that is often overlooked is the rise of several noncommunicable diseases caused by excessive dependence on cars and lack of physical activity. This thesis is dedicated to countering the serious hazards of physical inactivity that result from unhealthy transportation choices. Interaction between people and computers has become ubiquitous over the years. People interact in digital environments for a number of reasons. From checking weather conditions to running multinational trading businesses, computer-driven digital automation has taken over what was once manual work. Cognizant of the potency and authority of computer-driven services, we propose applying nudge theory to encourage users to choose healthy options for any type of mobility. The first step involves researching how to collect, store and analyze data from different sources, and then suggesting techniques to manipulate it in order to perform an effective nudge.