7 research outputs found

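    Peningkatan Akurasi K-Nearest Neighbor Pada Data Index Standar Pencemaran Udara Kota Pekanbaru (Improving K-Nearest Neighbor Accuracy on Air Pollution Standard Index Data for the City of Pekanbaru)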

    kNN is a popular method because it is easy to exploit, generalizes well, is easy to understand, adapts to complex feature spaces, and is intuitive, attractive, effective, flexible, easy to apply, simple, and reasonably accurate. However, kNN has several weaknesses. One is that it gives every attribute the same weight, so irrelevant attributes affect the similarity between data points as much as relevant ones. Another is that the class is chosen by majority vote among the nearest neighbors, which ignores how similar each neighbor is, can produce tied majorities, and can select outliers as nearest neighbors. These problems can cause misclassification and therefore lower accuracy. This study improves the accuracy of kNN in classifying Air Pollution Standard Index data for Pekanbaru by using attribute weighting and local mean. The results show that the proposed method improves accuracy by 2.42%, with an average accuracy of 97.09%.
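The combination of attribute weighting and local mean described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, the attribute weights are assumed to be supplied by the caller (the abstract does not say how they are computed), and the local mean rule shown is one common form, in which the query is assigned to the class whose local mean vector lies closest.

```python
import math
from collections import defaultdict


def weighted_distance(a, b, weights):
    """Euclidean distance with a per-attribute weight on each squared difference."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def local_mean_knn(train, labels, query, weights, k=3):
    """Assign `query` to the class whose local mean vector is nearest.

    For each class, the k training points of that class closest to the
    query are averaged; the class with the closest average wins.
    `weights` is an assumed input: one non-negative weight per attribute.
    """
    by_class = defaultdict(list)
    for point, label in zip(train, labels):
        by_class[label].append(point)

    best_label, best_dist = None, math.inf
    for label, points in by_class.items():
        # k nearest neighbors of the query within this class only
        nearest = sorted(
            points, key=lambda p: weighted_distance(p, query, weights)
        )[:k]
        # Local mean vector of those neighbors, attribute by attribute
        mean = [sum(column) / len(nearest) for column in zip(*nearest)]
        dist = weighted_distance(mean, query, weights)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

Because each class is represented by an average of several neighbors, a single outlier has less influence than under majority voting, and ties between classes can only occur when two local means are exactly equidistant. Setting a weight to zero removes that attribute from the similarity computation entirely.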

    Exploring and Evaluating the Scalability and Efficiency of Apache Spark using Educational Datasets

    Research into the combination of data mining and machine learning technology with web-based education systems (known as education data mining, or EDM) is becoming imperative in order to enhance the quality of education by moving beyond traditional methods. With the worldwide growth of information and communication technology (ICT), data are becoming available at a significantly large volume, with high velocity and extensive variety. In this thesis, four popular data mining methods are applied on Apache Spark, using large datasets from Online Cognitive Learning Systems to explore the scalability and efficiency of Spark. Datasets of various sizes are tested on Spark MLlib with different running configurations and parameter tunings. The thesis presents useful strategies for allocating computing resources and tuning to take full advantage of the in-memory system of Apache Spark to conduct the tasks of data mining and machine learning. Moreover, it offers insights that education experts and data scientists can use to manage and improve the quality of education, as well as to analyze and discover hidden knowledge in the era of big data.

    Algoritmo de classificação por particionamento hierárquico (Hierarchical Partitioning Classification Algorithm)

    This dissertation proposes a new method of hierarchical partitioning classification that aims not only to return a response regarding the class of an element, but also to provide more information about the classification process and the spatial arrangement of the classes along the attribute space. Through iterative partitioning and the use of concepts such as divergence between distributions, the method seeks to find regions where there is a predominant class and regions where overlap between classes makes classification more complex. Experiments with artificial and real databases were performed to demonstrate the competitiveness of the method and its advantage in separating regions of easy classification from more complex regions, both for its own classification and to obtain more information on the performance of other well-known methods.

    Transforming big data into smart data: An insight on the use of the k-nearest neighbors algorithm to obtain quality data

    The k-nearest neighbours algorithm is characterised as a simple yet effective data mining technique. Its main drawback appears when massive amounts of data, likely to contain noise and imperfections, are involved, turning the algorithm into an imprecise and especially inefficient technique. These disadvantages have been the subject of research for many years, and, among other approaches, data preprocessing techniques such as instance reduction or missing values imputation have targeted these weaknesses. As a result, these issues have turned into strengths, and the k-nearest neighbours rule has become a core algorithm to identify and correct imperfect data, removing noisy and redundant samples or imputing missing values, and thus transforming Big Data into Smart Data, which is data of sufficient quality to expect a good outcome from any data mining algorithm. The role of this smart data gleaning algorithm in a supervised learning context is investigated. This includes a brief overview of Smart Data, current and future trends for the k-nearest neighbours algorithm in the Big Data context, and the existing data preprocessing techniques based on this algorithm. We present the emerging big data-ready versions of these algorithms and develop some new methods to cope with Big Data. We carry out a thorough experimental analysis on a series of big datasets that provides guidelines on how to use the k-nearest neighbours algorithm to obtain Smart/Quality Data for a high-quality data mining process. Moreover, multiple Spark Packages have been developed including all the Smart Data algorithms analysed.
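One way the k-nearest neighbours rule can be used to remove noisy samples is an edited-nearest-neighbour style filter, sketched below. This is a small single-machine illustration of the general idea, not one of the Spark packages or algorithms analysed in the paper; the function names are hypothetical, and plain Euclidean distance is assumed.

```python
import math
from collections import Counter


def knn_labels(data, labels, index, k):
    """Labels of the k nearest neighbors of data[index], excluding the point itself."""
    others = [i for i in range(len(data)) if i != index]
    others.sort(key=lambda i: math.dist(data[i], data[index]))
    return [labels[i] for i in others[:k]]


def edited_nearest_neighbor(data, labels, k=3):
    """Return the indices of samples whose label matches their neighborhood.

    A sample is treated as noise, and dropped, when the majority label
    among its k nearest neighbors differs from its own label.
    """
    kept = []
    for i in range(len(data)):
        votes = Counter(knn_labels(data, labels, i, k))
        majority, _ = votes.most_common(1)[0]
        if majority == labels[i]:
            kept.append(i)
    return kept
```

This brute-force version compares every pair of points, so its cost grows quadratically with the number of samples. That cost is the scalability problem that motivates distributed variants for large datasets.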

    Computational intelligent impact force modeling and monitoring in HISLO conditions for maximizing surface mining efficiency, safety, and health

    Shovel-truck systems are the most widely employed excavation and material handling systems for surface mining operations. During this process, a high-impact shovel loading operation (HISLO) produces large forces that cause extreme whole body vibrations (WBV) that can severely affect the safety and health of haul truck operators. Previously developed solutions have failed to produce satisfactory results, as the vibrations at the truck operator seat still exceed the “Extremely Uncomfortable Limits”. This study was a novel effort to develop a deep learning-based solution to the HISLO problem. It developed a rigorous mathematical model and a 3D virtual simulation model to capture the dynamic impact force for a multi-pass shovel loading operation. The research further involved the application of artificial intelligence and machine learning to implement impact force detection in real time. Experimental results showed impact force magnitudes of 571 kN and 422 kN for the first and second shovel passes, respectively, obtained through an accurate representation of HISLO with continuous flow modelling using a coupled FEA-DEM methodology. The novel ‘DeepImpact’ model showed exceptional performance, with R², RMSE, and MAE values of 0.9948, 10.750, and 6.33, respectively, during model validation. This research was a pioneering effort for advancing knowledge and frontiers in addressing the WBV challenges in deploying heavy mining machinery in safe and healthy large surface mining environments. The smart and intelligent real-time monitoring system from this study, along with process optimization, minimizes the impact force on the truck surface, which in turn reduces the level of vibration on the operator, thus leading to safer and healthier mining work environments --Abstract, page iii
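For readers unfamiliar with the three validation metrics quoted above, the sketch below shows how R², RMSE, and MAE are conventionally computed from true and predicted values. It uses the standard textbook definitions; the function name is hypothetical, and nothing here reproduces the study's data or its 'DeepImpact' model.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute R², RMSE, and MAE using their standard definitions."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    errors = [t - p for t, p in zip(y_true, y_pred)]

    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when all true values are identical")

    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
    }
```

RMSE and MAE are expressed in the units of the predicted quantity, while R² is unitless. RMSE penalises large individual errors more heavily than MAE, so RMSE is always at least as large as MAE on the same data.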

    Context-aware mobility analytics and trip planning

    The study of user mobility aims to understand and analyse the movement of individuals in the spatial and temporal domains. Mobility analytics and trip planning are two vital components of user mobility that provide end users with easy-to-access navigational support through urban spaces and beyond. Mobility context describes the situational factors that can influence user mobility decisions. Context-awareness in mobility analytics and trip planning enables a wide range of end users to make effective mobility decisions. With the ubiquity of urban sensing technologies, various situational factors related to user mobility decisions can now be collected at low cost and effort. This huge volume of data collected from heterogeneous data sources can facilitate context-aware mobility analytics and trip planning through intelligent analysis of mobility contexts, mobility context prediction, mobility context representation and integration considering different user perspectives. In each chapter of this thesis such issues are addressed through the development of case-specific solutions and real-world deployments. Mobility analytics includes prediction and analysis of many diverse mobility contexts. In this thesis, we present several real-world user mobility scenarios to conduct intelligent contextual analysis leveraging existing statistical methods. The factors related to user mobility decisions are collected and fused from various publicly available open datasets. We also provide future prediction of important mobility contexts which can be utilized for mobility decision making. The performance of context prediction tasks can be affected by the imbalance in context distribution. Another aspect of context prediction is that the knowledge from domain experts can enhance the prediction performance; however, it is very difficult to infer and incorporate into mobility analytics applications.
    We present a number of data-driven solutions aiming to address the imbalanced context distribution and domain knowledge incorporation problems for mobility context prediction. Given an imbalanced dataset, we design and implement a framework for context prediction leveraging existing data mining and sampling techniques. Furthermore, we propose a technique for incorporating domain knowledge in feature weight computation to enhance the task of mobility context prediction. In this thesis, we address key issues related to trip planning. Mobility context inference is a challenging problem in many real-world trip planning scenarios. We introduce a framework that can fuse contextual information captured from heterogeneous data sources to infer mobility contexts. In this work, we utilize public datasets to infer mobility contexts and compute trip plans. We propose graph-based context representation and query-based adaptation techniques on top of the existing methods to facilitate trip planning tasks. The effectiveness of trip plans relies on the efficient integration of mobility contexts considering different user perspectives. Given a contextual graph, we introduce a framework that can handle multiple user perspectives concurrently to compute and recommend trip plans to the end user. This thesis contains efficient techniques that can be employed in the area of urban mobility, especially context-aware mobility analytics and trip planning. This research is built on top of the existing predictive analytics and trip planning techniques to solve problems of contextual analysis, prediction, context representation and integration in trip planning for real-world scenarios. The contributions of this research enable data-driven decision support for traveling smarter through urban spaces and beyond.
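The abstract does not name the sampling techniques it uses for imbalanced context distributions. As one illustrative possibility, random oversampling of minority classes can be sketched as follows; the function name and the fixed seed are assumptions made for the example, and the thesis may well rely on different techniques.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until classes are balanced.

    Returns new lists; the original samples come first, followed by the
    duplicates. `seed` is an assumed parameter that makes runs repeatable.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class

    out_samples, out_labels = list(samples), list(labels)
    for label, count in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == label]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(label)
    return out_samples, out_labels
```

Oversampling should be applied to the training split only. If duplicates are created before the data is split, copies of the same sample can land in both the training and test sets and inflate the measured prediction performance.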