Benchmarking Apache Spark and Hadoop MapReduce on Big Data Classification
Most of the popular Big Data analytics tools evolved to adapt their working
environment to extract valuable information from a vast amount of unstructured
data. The ability of data mining techniques to filter this helpful information
from Big Data led to the term Big Data Mining. Shifting the scope of data from
small-size, structured, and stable data to huge volume, unstructured, and
quickly changing data brings many data management challenges. Different tools
cope with these challenges in their own way due to their architectural
limitations. There are numerous parameters to take into consideration when
choosing the right data management framework based on the task at hand. In this
paper, we present a comprehensive benchmark for two widely used Big Data
analytics tools, namely Apache Spark and Hadoop MapReduce, on a common data
mining task, i.e., classification. We employ several evaluation metrics to
compare the performance of the benchmarked frameworks, such as execution time,
accuracy, and scalability. These metrics are specialized to measure
performance on the classification task. To the best of our knowledge, there is no
previous study in the literature that employs all these metrics while taking
into consideration task-specific concerns. We show that Spark is 5 times faster
than MapReduce on training the model. Nevertheless, the performance of Spark
degrades when the input workload gets larger. Scaling the environment by
additional clusters significantly improves the performance of Spark. However,
similar enhancement is not observed in Hadoop. The machine learning utility of
MapReduce tends to achieve better accuracy scores than that of Spark, by around
3%, even on small data sets.
Comment: 2021 5th International Conference on Cloud and Big Data Computing
(ICCBDC 2021
Optimization Technique for Efficient Dynamic Query Forms with Keyword Search and NoSQL
Modern web databases and scientific databases maintain tremendous amounts of heterogeneous, unstructured data. Traditional data mining technologies cannot mine this data properly. These real-world databases may contain hundreds or even thousands of relations and attributes. Recent trends such as Big Data and cloud computing have led to the adoption of NoSQL, which simply means Not Only SQL. Currently, most web applications are hosted in the cloud and made available through the Internet, which has caused an explosion in the number of concurrent users. Our work therefore proposes a technique for handling unstructured data, named dynamic query forms with NoSQL. The system presents a dynamic query form interface for exploring an organization's database. It uses a document-oriented NoSQL database, MongoDB, which supports dynamic queries that do not require predefined map-reduce functions. Generating a query form is an iterative process guided by the user. At each step, the system automatically generates a ranked list of form components; the user adds the desired components to the query form and submits queries to view the results. Two traditional measures for evaluating the quality of query results, precision and recall, are presented. Quality can also be summarized with an overall performance measure, the F-score.
DOI: 10.17762/ijritcc2321-8169.15070
Sketch of Big Data Real-Time Analytics Model
Big Data has drawn huge attention from researchers in information sciences and from decision makers in governments and enterprises. There is a lot of potential and highly useful value hidden in the huge volume of data. Data is the new oil, but unlike oil, data can be refined further to create even more value. Therefore, a new scientific paradigm has been born: data-intensive scientific discovery, also known as Big Data. The growing volume of real-time data requires new techniques and technologies to discover valuable insights. In this paper, we introduce the Big Data real-time analytics model as a new technique. We discuss and compare several Big Data technologies for real-time processing, along with various challenges and issues in adapting Big Data. Real-time Big Data analysis based on a cloud computing approach is our future research direction.
Intelligent Management and Efficient Operation of Big Data
This chapter details how Big Data can be used and implemented in networking
and computing infrastructures. Specifically, it addresses three main aspects:
the timely extraction of relevant knowledge from heterogeneous and very often
unstructured large data sources, the enhancement of the performance of
processing and networking (cloud) infrastructures that are the most important
foundational pillars of Big Data applications or services, and novel ways to
efficiently manage network infrastructures with high-level composed policies
for supporting the transmission of large amounts of data with distinct
requisites (video vs. non-video). A case study involving an intelligent
management solution to route data traffic with diverse requirements in a wide
area Internet Exchange Point is presented, discussed in the context of Big
Data, and evaluated.
Comment: In book Handbook of Research on Trends and Future Directions in Big
Data and Web Intelligence, IGI Global, 201
Challenges of Internet of Things and Big Data Integration
The Internet of Things anticipates the conjunction of physical gadgets to the
Internet and their access to wireless sensor data, which makes it expedient to
restrain the physical world. Big Data convergence has put multifarious new
opportunities ahead of business ventures to get into a new market or enhance
their operations in the current market. Considering the existing techniques and
technologies, it is probably safe to say that the best solution is to use big
data tools to provide an analytical solution to the Internet of Things. Based
on the current technology deployment and adoption trends, it is envisioned that
the Internet of Things is the technology of the future, while today's
real-world devices can provide real and valuable analytics, and people in the
real world use many IoT devices. Despite all the advertisements that companies
offer in connection with the Internet of Things, you, as a liable consumer, have
the right to be suspicious about IoT advertisements. The primary question is:
What is the promise of the Internet of Things concerning reality, and what are
the prospects for the future?
Comment: Proceedings of the International Conference on Emerging Technologies
in Computing 2018 (iCETiC '18), 23rd-24th August, 2018, at London
Metropolitan University, London, UK, Published
by Springer-Verla
Medical data processing and analysis for remote health and activities monitoring
Recent developments in sensor technology, wearable computing, Internet of Things (IoT), and wireless communication have given rise to research in ubiquitous healthcare and remote monitoring of human health and activities. Health monitoring systems involve processing and analysis of data retrieved from smartphones, smart watches, smart bracelets, as well as various sensors and wearable devices. Such systems enable continuous monitoring of patients' psychological and health conditions by sensing and transmitting measurements such as heart rate, electrocardiogram, body temperature, respiratory rate, chest sounds, or blood pressure. Pervasive healthcare, as a relevant application domain in this context, aims at revolutionizing the delivery of medical services through a medical assistive environment and facilitates the independent living of patients. In this chapter, we discuss (1) data collection, fusion, ownership and privacy issues; (2) models, technologies and solutions for medical data processing and analysis; (3) big medical data analytics for remote health monitoring; (4) research challenges and opportunities in medical data analytics; (5) examples of case studies and practical solutions.