207 research outputs found

    Metocean Big Data Processing Using Hadoop

    Get PDF
    This report will discuss about MapReduce and how it handles big data. In this report, Metocean (Meteorology and Oceanography) Data will be used as it consist of large data. As the number and type of data acquisition devices grows annually, the sheer size and rate of data being collected is rapidly expanding. These big data sets can contain gigabytes or terabytes of data, and can grow on the order of megabytes or gigabytes per day. While the collection of this information presents opportunities for insight, it also presents many challenges. Most algorithms are not designed to process big data sets in a reasonable amount of time or with a reasonable amount of memory. MapReduce allows us to meet many of these challenges to gain important insights from large data sets. The objective of this project is to use MapReduce to handle big data. MapReduce is a programming technique for analysing data sets that do not fit in memory. The problem statement chapter in this project will discuss on how MapReduce comes as an advantage to deal with large data. The literature review part will explain the definition of NoSQL and RDBMS, Hadoop Mapreduce and big data, things to do when selecting database, NoSQL database deployments, scenarios for using Hadoop and Hadoop real world example. The methodology part will explain the waterfall method used in this project development. The result and discussion will explain in details the result and discussion from my project. The last chapter in this project report is conclusion and recommendatio

    Fuzzy Quantified Queries to Fuzzy RDF Databases

    Get PDF
    International audienceIn a relational database context, fuzzy quantified queries have been long recognized for their ability to express different types of imprecise and flexible information needs. In this paper, we introduce the notion of fuzzy quantified statements in a (fuzzy) RDF database context. We show how these statements can be defined and implemented in FURQL, which is a fuzzy extension of the SPARQL query language that we previously proposed. Then, we present some experimental results that show the feasibility of this approach

    Methodological challenges and analytic opportunities for modeling and interpreting Big Healthcare Data

    Full text link
    Abstract Managing, processing and understanding big healthcare data is challenging, costly and demanding. Without a robust fundamental theory for representation, analysis and inference, a roadmap for uniform handling and analyzing of such complex data remains elusive. In this article, we outline various big data challenges, opportunities, modeling methods and software techniques for blending complex healthcare data, advanced analytic tools, and distributed scientific computing. Using imaging, genetic and healthcare data we provide examples of processing heterogeneous datasets using distributed cloud services, automated and semi-automated classification techniques, and open-science protocols. Despite substantial advances, new innovative technologies need to be developed that enhance, scale and optimize the management and processing of large, complex and heterogeneous data. Stakeholder investments in data acquisition, research and development, computational infrastructure and education will be critical to realize the huge potential of big data, to reap the expected information benefits and to build lasting knowledge assets. Multi-faceted proprietary, open-source, and community developments will be essential to enable broad, reliable, sustainable and efficient data-driven discovery and analytics. Big data will affect every sector of the economy and their hallmark will be ‘team science’.http://deepblue.lib.umich.edu/bitstream/2027.42/134522/1/13742_2016_Article_117.pd

    A comparison of statistical machine learning methods in heartbeat detection and classification

    Get PDF
    In health care, patients with heart problems require quick responsiveness in a clinical setting or in the operating theatre. Towards that end, automated classification of heartbeats is vital as some heartbeat irregularities are time consuming to detect. Therefore, analysis of electro-cardiogram (ECG) signals is an active area of research. The methods proposed in the literature depend on the structure of a heartbeat cycle. In this paper, we use interval and amplitude based features together with a few samples from the ECG signal as a feature vector. We studied a variety of classification algorithms focused especially on a type of arrhythmia known as the ventricular ectopic fibrillation (VEB). We compare the performance of the classifiers against algorithms proposed in the literature and make recommendations regarding features, sampling rate, and choice of the classifier to apply in a real-time clinical setting. The extensive study is based on the MIT-BIH arrhythmia database. Our main contribution is the evaluation of existing classifiers over a range sampling rates, recommendation of a detection methodology to employ in a practical setting, and extend the notion of a mixture of experts to a larger class of algorithms
    corecore