207 research outputs found
Metocean Big Data Processing Using Hadoop
This report discusses MapReduce and how it handles big data. Metocean (meteorology and oceanography) data is used as the example because it consists of large data sets. As the number and type of data acquisition devices grow each year, the size and rate of the data being collected are expanding rapidly. These big data sets can contain gigabytes or terabytes of data and can grow on the order of megabytes or gigabytes per day. While the collection of this information presents opportunities for insight, it also presents many challenges. Most algorithms are not designed to process big data sets in a reasonable amount of time or with a reasonable amount of memory. MapReduce, a programming technique for analysing data sets that do not fit in memory, makes it possible to meet many of these challenges and gain important insights from large data sets. The objective of this project is to use MapReduce to handle big data. The problem statement chapter discusses how MapReduce is an advantage when dealing with large data. The literature review explains the definitions of NoSQL and RDBMS, Hadoop MapReduce and big data, what to consider when selecting a database, NoSQL database deployments, scenarios for using Hadoop, and a real-world Hadoop example. The methodology chapter explains the waterfall method used in developing the project. The results and discussion chapter presents the project's results in detail. The last chapter of the report contains the conclusion and recommendations.
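The map and reduce idea described in this abstract can be illustrated with a minimal single-machine sketch. It is not the report's Hadoop implementation: the record layout (station name, wave height in metres), the function names, and the sample values are all hypothetical, chosen only to show how data can be processed one chunk at a time so that the full data set never has to sit in memory.

```python
from itertools import islice


def read_chunks(records, chunk_size):
    """Yield successive lists of at most chunk_size records."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def map_chunk(chunk):
    """Map step: emit (station, wave_height) pairs, skipping missing readings."""
    return [(station, height) for station, height in chunk if height is not None]


def reduce_pairs(partials, pairs):
    """Reduce step: fold new pairs into the running maximum per station."""
    for station, height in pairs:
        if station not in partials or height > partials[station]:
            partials[station] = height
    return partials


def max_wave_height(records, chunk_size=2):
    """Compute the maximum wave height per station, one chunk at a time."""
    result = {}
    for chunk in read_chunks(records, chunk_size):
        result = reduce_pairs(result, map_chunk(chunk))
    return result


# Hypothetical sample data: (station, significant wave height in metres).
SAMPLE = [("A", 1.2), ("B", 0.8), ("A", 2.5), ("B", None), ("B", 1.9), ("A", 0.4)]
```

Only one chunk and the small dictionary of per-station maxima are held at any time. In a Hadoop deployment the chunks would instead be distributed across worker nodes, with the framework grouping the mapped pairs by key before the reduce step.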
Fuzzy Quantified Queries to Fuzzy RDF Databases
In a relational database context, fuzzy quantified queries have long been recognized for their ability to express different types of imprecise and flexible information needs. In this paper, we introduce the notion of fuzzy quantified statements in a (fuzzy) RDF database context. We show how these statements can be defined and implemented in FURQL, a fuzzy extension of the SPARQL query language that we previously proposed. We then present experimental results that show the feasibility of this approach.
Methodological challenges and analytic opportunities for modeling and interpreting Big Healthcare Data
Managing, processing and understanding big healthcare data is challenging, costly and demanding. Without a robust fundamental theory for representation, analysis and inference, a roadmap for uniform handling and analysis of such complex data remains elusive. In this article, we outline various big data challenges, opportunities, modeling methods and software techniques for blending complex healthcare data, advanced analytic tools, and distributed scientific computing. Using imaging, genetic and healthcare data, we provide examples of processing heterogeneous datasets using distributed cloud services, automated and semi-automated classification techniques, and open-science protocols. Despite substantial advances, innovative technologies need to be developed that enhance, scale and optimize the management and processing of large, complex and heterogeneous data. Stakeholder investments in data acquisition, research and development, computational infrastructure and education will be critical to realize the huge potential of big data, to reap the expected information benefits and to build lasting knowledge assets. Multi-faceted proprietary, open-source, and community developments will be essential to enable broad, reliable, sustainable and efficient data-driven discovery and analytics. Big data will affect every sector of the economy and their hallmark will be ‘team science’. http://deepblue.lib.umich.edu/bitstream/2027.42/134522/1/13742_2016_Article_117.pd
A comparison of statistical machine learning methods in heartbeat detection and classification
In health care, patients with heart problems require a quick response in a clinical setting or in the operating theatre. To that end, automated classification of heartbeats is vital, as some heartbeat irregularities are time-consuming to detect. Analysis of electrocardiogram (ECG) signals is therefore an active area of research. The methods proposed in the literature depend on the structure of a heartbeat cycle. In this paper, we use interval- and amplitude-based features together with a few samples from the ECG signal as a feature vector. We studied a variety of classification algorithms, focusing especially on a type of arrhythmia known as the ventricular ectopic fibrillation (VEB). We compare the performance of the classifiers against algorithms proposed in the literature and make recommendations regarding features, sampling rate, and choice of classifier to apply in a real-time clinical setting. The extensive study is based on the MIT-BIH arrhythmia database. Our main contributions are the evaluation of existing classifiers over a range of sampling rates, the recommendation of a detection methodology to employ in a practical setting, and the extension of the notion of a mixture of experts to a larger class of algorithms.
An automated methodology for rapid information extraction from large drilling datasets
Extracting information and knowledge from large datasets often requires a significant amount of time for collecting, cleaning and processing the data. This process, from data curation to data interpretation, can last from a couple of weeks to several months. Therefore, a structured methodology is developed using concepts such as spider bots and storyboarding to rapidly extract meaningful information from drilling datasets. Three categories of spider bots are identified: cleansing bots, processing bots and indexing bots. These bots efficiently (1) cleanse raw data that may be structured, semi-structured or unstructured, (2) process the cleansed data, and then (3) create index tables so that information can be efficiently retrieved. Next, the storyboarding concept is used to construct a series of visualizations from the information categorized and indexed in the database. Lastly, depending on the question that needs to be answered from the data in the database, a visual report, which contains a summary table and a set of graphs, is generated and presented to the end user. A process that used to take weeks or even months when done manually now takes only seconds to generate and present an answer. The method and its effectiveness in rapidly retrieving information from large datasets are demonstrated on a field dataset consisting of five wells on a drilling pad. Civil, Architectural, and Environmental Engineering
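The three stages named in this abstract (cleanse, process, index) followed by a summary report can be sketched as below. This is only an illustration under stated assumptions, not the authors' bots: the field names (`well`, `depth_m`, `rop_m_per_hr`), the function names, and the sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def cleanse(raw_rows):
    """Cleansing stage: keep only rows whose fields parse cleanly."""
    cleaned = []
    for row in raw_rows:
        try:
            cleaned.append({
                "well": str(row["well"]).strip(),
                "depth_m": float(row["depth_m"]),
                "rop_m_per_hr": float(row["rop_m_per_hr"]),
            })
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or malformed values
    return cleaned


def process(rows):
    """Processing stage: order rows by well and then by depth."""
    return sorted(rows, key=lambda r: (r["well"], r["depth_m"]))


def build_index(rows):
    """Indexing stage: map each well name to the positions of its rows."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row["well"]].append(position)
    return dict(index)


def summarize(rows, index, well):
    """Report stage: build one summary-table entry for a well via the index."""
    selected = [rows[i] for i in index.get(well, [])]
    if not selected:
        return None
    return {
        "well": well,
        "samples": len(selected),
        "max_depth_m": max(r["depth_m"] for r in selected),
        "mean_rop_m_per_hr": mean(r["rop_m_per_hr"] for r in selected),
    }


# Hypothetical raw rows, including two that the cleansing stage should drop.
RAW = [
    {"well": " W1 ", "depth_m": "100", "rop_m_per_hr": "20"},
    {"well": "W1", "depth_m": "150", "rop_m_per_hr": "10"},
    {"well": "W2", "depth_m": "bad", "rop_m_per_hr": "5"},
    {"well": "W2", "depth_m": 80, "rop_m_per_hr": 12},
    {"well": "W3", "depth_m": None, "rop_m_per_hr": 7},
]

rows = process(cleanse(RAW))
index = build_index(rows)
```

Calling `summarize(rows, index, "W1")` then looks up only the rows for that well through the index rather than scanning the whole table, which is the kind of retrieval shortcut that index tables are meant to provide.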