14,286 research outputs found

    Benchmarking Apache Spark and Hadoop MapReduce on Big Data Classification

    Most of the popular Big Data analytics tools evolved to adapt their working environment to extract valuable information from a vast amount of unstructured data. The ability of data mining techniques to filter this helpful information from Big Data led to the term Big Data Mining. Shifting the scope of data from small, structured, and stable data to huge-volume, unstructured, and quickly changing data brings many data management challenges. Different tools cope with these challenges in their own way due to their architectural limitations. There are numerous parameters to take into consideration when choosing the right data management framework for the task at hand. In this paper, we present a comprehensive benchmark of two widely used Big Data analytics tools, namely Apache Spark and Hadoop MapReduce, on a common data mining task, i.e., classification. We employ several evaluation metrics to compare the performance of the benchmarked frameworks, such as execution time, accuracy, and scalability. These metrics are specialized to measure performance on the classification task. To the best of our knowledge, there is no previous study in the literature that employs all these metrics while taking task-specific concerns into consideration. We show that Spark is 5 times faster than MapReduce at training the model. Nevertheless, the performance of Spark degrades as the input workload grows. Scaling the environment with additional clusters significantly improves the performance of Spark. However, a similar improvement is not observed in Hadoop. The machine learning utility of MapReduce tends to achieve better accuracy scores than that of Spark, by around 3%, even on small data sets. Comment: 2021 5th International Conference on Cloud and Big Data Computing (ICCBDC 2021)

    Optimization Technique for Efficient Dynamic Query Forms with Keyword Search and NoSQL

    Modern web databases and scientific databases maintain large volumes of heterogeneous, unstructured data. Traditional data mining technologies cannot mine this data properly. These real-world databases may contain hundreds or even thousands of relations and attributes. Recent trends such as Big Data and cloud computing have led to the adoption of NoSQL, which simply means Not Only SQL. Most web applications are now hosted in the cloud and made available through the Internet, which has caused an explosion in the number of concurrent users. We therefore propose a technique for handling unstructured data, named dynamic query forms with NoSQL. The system presents a dynamic query form interface for exploring an organization's database. It uses a document-oriented NoSQL database, MongoDB, which supports dynamic queries that do not require predefined map-reduce functions. Generating a query form is an iterative process guided by the user: at each step, the system automatically generates a ranked list of form components, and the user adds the desired components to the query form and submits queries to view the results. Two traditional measures of query result quality, precision and recall, are presented. An overall performance measure, the F-score, can be derived from them. DOI: 10.17762/ijritcc2321-8169.15070
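
    For reference, the quality measures named in this abstract are usually defined as follows. This is a sketch of the standard information-retrieval definitions, not necessarily the exact formulation used in the paper; the set names "relevant" and "retrieved" and the weight \beta are notation assumed here for illustration.

    ```latex
    \[
    P = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{retrieved}|}, \qquad
    R = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{relevant}|}, \qquad
    F_\beta = \frac{(1+\beta^2)\,P\,R}{\beta^2 P + R}
    \]
    % With \beta = 1 this reduces to the harmonic mean F_1 = 2PR / (P + R).
    % Hypothetical example: a query returns 10 rows, 8 of them relevant,
    % out of 16 relevant rows overall: P = 0.8, R = 0.5, F_1 \approx 0.615.
    ```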

    Sketch of Big Data Real-Time Analytics Model

    Big Data has drawn huge attention from researchers in information sciences and from decision makers in governments and enterprises. There is great potential and highly useful value hidden in the huge volume of data. Data is the new oil, but unlike oil, data can be refined further to create even more value. Therefore, a new scientific paradigm has been born: data-intensive scientific discovery, also known as Big Data. The growing volume of real-time data requires new techniques and technologies to extract valuable insights. In this paper, we introduce the Big Data real-time analytics model as a new technique. We discuss and compare several Big Data technologies for real-time processing, along with various challenges and issues in adopting Big Data. Real-time Big Data analysis based on a cloud computing approach is our future research direction.

    Intelligent Management and Efficient Operation of Big Data

    This chapter details how Big Data can be used and implemented in networking and computing infrastructures. Specifically, it addresses three main aspects: the timely extraction of relevant knowledge from heterogeneous and very often unstructured large data sources; the enhancement of the performance of processing and networking (cloud) infrastructures, which are the most important foundational pillars of Big Data applications or services; and novel ways to efficiently manage network infrastructures with high-level composed policies that support the transmission of large amounts of data with distinct requisites (video vs. non-video). A case study involving an intelligent management solution to route data traffic with diverse requirements in a wide area Internet Exchange Point is presented, discussed in the context of Big Data, and evaluated. Comment: In book Handbook of Research on Trends and Future Directions in Big Data and Web Intelligence, IGI Global, 201

    Challenges of Internet of Things and Big Data Integration

    The Internet of Things anticipates the connection of physical gadgets to the Internet and their access to wireless sensor data, which makes it expedient to restrain the physical world. Big Data convergence has put many new opportunities ahead of business ventures to enter a new market or enhance their operations in the current market. Considering the existing techniques and technologies, it is probably safe to say that the best solution is to use Big Data tools to provide an analytical solution for the Internet of Things. Based on current technology deployment and adoption trends, it is envisioned that the Internet of Things is the technology of the future, while today's real-world devices can provide real and valuable analytics, and people in the real world use many IoT devices. Despite all the advertisements that companies offer in connection with the Internet of Things, you, as a liable consumer, have the right to be suspicious about IoT advertisements. The primary question is: what is the promise of the Internet of Things concerning reality, and what are the prospects for the future? Comment: Proceedings of the International Conference on Emerging Technologies in Computing 2018 (iCETiC '18), 23rd-24th August, 2018, at London Metropolitan University, London, UK, Published by Springer-Verla

    Medical data processing and analysis for remote health and activities monitoring

    Recent developments in sensor technology, wearable computing, the Internet of Things (IoT), and wireless communication have given rise to research in ubiquitous healthcare and remote monitoring of human health and activities. Health monitoring systems involve processing and analysis of data retrieved from smartphones, smart watches, smart bracelets, as well as various sensors and wearable devices. Such systems enable continuous monitoring of patients' psychological and health conditions by sensing and transmitting measurements such as heart rate, electrocardiogram, body temperature, respiratory rate, chest sounds, or blood pressure. Pervasive healthcare, as a relevant application domain in this context, aims at revolutionizing the delivery of medical services through a medical assistive environment and facilitates the independent living of patients. In this chapter, we discuss (1) data collection, fusion, ownership and privacy issues; (2) models, technologies and solutions for medical data processing and analysis; (3) big medical data analytics for remote health monitoring; (4) research challenges and opportunities in medical data analytics; (5) examples of case studies and practical solutions.