Search CORE

5 research outputs found

Clustering big urban data sets

Authors: Al Shami Ahmad, Guo Weisi, Pogrebna Ganna
Publication date: 01/01/2015

Cities produce and collect massive amounts of data from various sources, such as transportation networks, the energy sector, smart homes, tax records, surveys, LIDAR data, and mobile phone sensors. All of this data, when connected via the Internet, falls under the Internet of Things (IoT) category. To use such a large volume of data for potential scientific computing benefits, it is important to store and analyze it using efficient computing resources and algorithms. However, this can be problematic due to many challenges. This article explores some of these challenges and tests the performance of two partitional algorithms for clustering big urban datasets: K-Means and Fuzzy C-Means (FCM). Clustering big urban data into a compact format represents the information of the whole dataset, which helps researchers work with the reorganized data much more efficiently. Our experiments conclude that FCM outperformed K-Means on this type of dataset; however, the latter is lighter on hardware utilisation.
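The difference between the two partitional algorithms named in this abstract can be illustrated with a minimal sketch. It is not the authors' experimental code: the function names, the fixed cluster centers, and the fuzzifier value `m=2.0` are assumptions chosen for illustration. K-Means assigns each point to exactly one nearest center, while FCM gives each point a degree of membership in every cluster.

```python
import math

def kmeans_assign(points, centers):
    """Hard assignment (K-Means style): index of the nearest center per point."""
    return [
        min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        for p in points
    ]

def fcm_memberships(points, centers, m=2.0):
    """Soft assignment (FCM style): one membership row per point, summing to 1.

    Uses the standard FCM membership update
    u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)), with fuzzifier m > 1.
    """
    exponent = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # The point lies exactly on a center: give it full membership there
            # (shared equally if several centers coincide with the point).
            on_center = [d == 0.0 for d in dists]
            share = 1.0 / sum(on_center)
            rows.append([share if hit else 0.0 for hit in on_center])
            continue
        rows.append(
            [1.0 / sum((d_j / d_k) ** exponent for d_k in dists) for d_j in dists]
        )
    return rows
```

With centers at `(0, 0)` and `(4, 0)`, the point `(1, 0)` is assigned wholly to the first cluster by `kmeans_assign`, whereas `fcm_memberships` splits it between both clusters. Computing a full membership row per point is one plausible reason FCM is heavier on hardware than K-Means.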

Warwick Research Archives Portal Repository

A Survey on Data Mining and Analysis in Hadoop and MongoDb

Authors: C.Zala Manmitsinh, Dhobi Jitendra S.
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 29/06/2015

Data mining is a process that generates patterns and rules from various types of data marts and data warehouses. The process involves several steps, including data cleaning and data anomaly detection, after which the clean data is mined with various approaches. In this research we discuss data mining on large datasets (Big Data), where the major issues are scalability and security. Hadoop is the tool used to mine the data, and MongoDB, which follows a key-value paradigm for parsing the data, provides the input for it. Other approaches and their capability for data storage are also discussed in this report. MapReduce is a method that can be used to reduce the dataset in order to cut query processing time and improve system throughput. In the proposed system we mine big data with Hadoop and MongoDB, try mining the data with sorted or double-sorted key-value pairs, and analyze the outcome of the system. Keywords: Data Mining, Hadoop, MapReduce, HDFS, MongoDB
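The MapReduce idea mentioned in this abstract, with key-value pairs sorted by key before reduction, can be sketched in plain Python. This is an illustrative toy under stated assumptions, not the Hadoop or MongoDB API and not the authors' proposed system: the function names and the word-count task are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(records):
    """Map: emit one (key, 1) pair per whitespace-separated token."""
    for record in records:
        for token in record.lower().split():
            yield token, 1

def shuffle_sorted(pairs):
    """Shuffle: sort pairs by key, then group the values that share a key."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]

def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped}

def run_job(records):
    """Run map, sorted shuffle, and reduce in sequence on in-memory records."""
    return reduce_phase(shuffle_sorted(map_phase(records)))
```

Calling `run_job(["hadoop mongodb", "Hadoop mapreduce"])` reduces four emitted pairs to three keys, which shows in miniature how the reduce step shrinks the data that later queries have to scan.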

International Institute for Science, Technology and Education (IISTE): E-Journals

CLASSIFICATION ALGORITHMS FOR BIG DATA ANALYSIS, A MAP REDUCE APPROACH

Publication venue: Copernicus GmbH

Crossref

INEFFICIENCY OF DATA MINING ALGORITHMS AND ITS ARCHITECTURE: WITH EMPHASIS TO THE SHORTCOMING OF DATA MINING ALGORITHMS ON THE OUTPUT OF THE RESEARCHES

Author: TESEMA Workineh
Publication venue: Lublin University of Technology
Publication date: 01/01/2019

This review paper presents the shortcomings associated with data mining algorithms (classification, clustering, association and regression), which are widely used as tools in different research communities. Data mining research has successfully handled large datasets to solve problems. However, the increase in data sizes has created a bottleneck for algorithms that retrieve hidden knowledge from large volumes of data, and data mining algorithms have been unable to keep pace with this rate of growth. Data mining algorithms must be efficient and supported by a visual architecture in order to effectively extract information from huge amounts of data in many data repositories or in dynamic data streams. The increasing use of information visualization tools (architecture) and data mining algorithms stems from two separate lines of research. Data visualization researchers believe in the importance of giving users an overview of and insight into the data distributions. Many powerful visual graphical interfaces are built on top of statistical analysis and data mining algorithms to let users leverage their power without a deep understanding of the underlying technology. Combining them with a graphical interface lets users navigate the complexity of statistical and data mining techniques to create powerful models. Therefore, there is an increasing need to understand the bottlenecks associated with data mining algorithms in modern architectures and in the research community. This review paper aims to guide and help researchers identify the shortcomings of data mining techniques in the domain area of the problems they explore. It also shows research areas, particularly multimedia (where data can be sequential, audio signal, video signal, spatio-temporal, temporal, time series, etc.), in which data mining algorithms are not yet used.

Biblioteka Nauki - repozytorium artykułów

Lublin University of Technology Journals