6 research outputs found

    Rails Quality Data Modelling via Machine Learning-Based Paradigms

    Rekomendasi Anotasi Otomatis pada Konten Pembelajaran MOOC (Automatic Annotation Recommendation for MOOC Learning Content)

    University teaching and learning can now be carried out entirely online through sites that implement the Massive Open Online Course (MOOC) system. In practice, however, when teachers create a new course on a MOOC site, they rarely describe the course in detail. On MOOCs that use Moodle in particular, teachers tend to provide only file attachments or weekly assignments as material. In this final project, learning content is labeled by annotating it automatically: the course content is extracted and then classified. The data used are the course title and description, learning outcomes, or course topics. Classification is then performed both without Machine Learning, by applying several rules, and with Machine Learning, using Random Forest, Support Vector Machine, and Naive Bayes. Of the four methods tested, the rule-based method without Machine Learning gave the lowest accuracy, 71.7%. The best result, an accuracy of 93.3%, was obtained with a Random Forest Classifier whose training data had been oversampled with ADASYN. This model is also considered the best because it produced appropriate labels on new test data. Keywords: MOOC, Naïve Bayes, Random Forest, SV
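
The rule-based baseline mentioned in the abstract above can be pictured with a minimal sketch. The labels, keywords, and function names here (`RULES`, `label_course`, `accuracy`) are hypothetical illustrations, not the rules or code used in the thesis; the sketch only shows the general shape of keyword-rule labeling and how an accuracy figure is computed.

```python
import re
from typing import Dict, List, Sequence

# Hypothetical keyword rules (label -> keywords). These are assumptions for
# illustration only and are not taken from the thesis.
RULES: Dict[str, List[str]] = {
    "programming": ["python", "java", "algorithm"],
    "mathematics": ["calculus", "algebra", "probability"],
}


def label_course(text: str, rules: Dict[str, List[str]] = RULES,
                 default: str = "other") -> str:
    """Return the label whose keywords occur most often in the text.

    Falls back to `default` when no keyword matches at all.
    """
    words = re.findall(r"[a-z]+", text.lower())
    scores = {
        label: sum(words.count(keyword) for keyword in keywords)
        for label, keywords in rules.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


def accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of predictions that equal the expected label."""
    if len(predicted) != len(expected) or not expected:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)
```

A call such as `label_course("Introduction to Python and algorithm design")` would apply the hypothetical rules to a course description. The Machine Learning variants described in the abstract replace the hand-written rules with a trained classifier but can be scored with the same accuracy measure.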

    Facilitating and Enhancing the Performance of Model Selection for Energy Time Series Forecasting in Cluster Computing Environments

    Applying Machine Learning (ML) manually to a given problem setting is a tedious and time-consuming process that brings many challenges, especially in the context of Big Data. In such a context, gaining insight, finding patterns, and extracting knowledge from large datasets are complex tasks. Additionally, the configuration of the underlying Big Data infrastructure makes configuring and running ML tasks more complex. With the growing interest in ML over the last few years, people without extensive ML expertise in particular have a high demand for frameworks that assist them in applying the right ML algorithm to their problem setting. This is especially true in the field of smart energy system applications, where more and more ML algorithms are used, e.g. for time series forecasting. Generally, two groups of non-expert users who perform energy time series forecasting are distinguished. The first includes users who are familiar with statistics and ML but are not able to write the programming code needed to train and evaluate ML models using the well-known trial-and-error approach. Such an approach is time-consuming and wastes resources on constructing multiple models. The second group is even less experienced in programming and is not knowledgeable in statistics and ML, but wants to apply given ML solutions to its problem settings. The goal of this thesis is to scientifically explore, in the context of concrete use cases in the energy domain, how such non-expert users can be optimally supported in creating and performing ML tasks in practice on cluster computing environments. To support the first group of non-expert users, an easy-to-use, modular, extendable, microservice-based ML solution for instrumenting and evaluating ML algorithms on top of a Big Data technology stack is conceptualized and evaluated.
Our proposed solution facilitates applying the trial-and-error approach by hiding the low-level complexities from users and provides the conditions needed to perform ML tasks efficiently in cluster computing environments. To support the second group of non-expert users, the first solution is extended to realize meta learning approaches for automated model selection. We evaluate how meta learning can be applied efficiently to data analytics for smart energy systems, to assist energy system experts who are not data analytics experts in applying the right ML algorithms to their data analytics problems. To enhance the predictive performance of meta learning, an efficient characterization of energy time series datasets is required. To this end, Descriptive Statistics Time based Meta Features (DSTMF), a new kind of meta features, is designed to accurately capture the deep characteristics of energy time series datasets. We find that DSTMF outperforms the other state-of-the-art meta feature sets introduced in the literature for characterizing energy time series datasets, in terms of both the accuracy of meta learning models and the time needed to extract the features. The predictive performance of the meta learning classification model is further enhanced by training the meta learner on new meta examples. To this end, we propose two new approaches for generating new energy time series datasets to be used as training meta examples by the meta learner, depending on the type of time series dataset (i.e. generation or energy consumption time series). We find that extending the original training sets with new meta examples generated by our approaches outperforms the case in which the original is extended by new simulated energy time series datasets.
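
To make the idea of meta-feature-based model selection more concrete, here is a minimal sketch. It is not the DSTMF definition, which the abstract does not spell out: the feature set, the fixed-window scheme, and the names `window_meta_features` and `recommend_model` are assumptions chosen for illustration. The sketch summarizes a series with descriptive statistics computed over the whole series and over non-overlapping time windows, then recommends the algorithm attached to the nearest stored meta example.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple


def window_meta_features(series: Sequence[float], window: int) -> Dict[str, float]:
    """Descriptive statistics of a series, globally and per time window.

    Illustrative feature set only (an assumption, not the thesis's DSTMF).
    Trailing values that do not fill a whole window are ignored by the
    window-level statistics.
    """
    if window <= 0 or len(series) < window:
        raise ValueError("window must be positive and no longer than the series")
    values = list(series)
    chunks = [values[i:i + window]
              for i in range(0, len(values) - window + 1, window)]
    means = [statistics.fmean(chunk) for chunk in chunks]
    stdevs = [statistics.pstdev(chunk) for chunk in chunks]
    return {
        "mean": statistics.fmean(values),
        "stdev": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
        "mean_of_window_means": statistics.fmean(means),
        "stdev_of_window_means": statistics.pstdev(means),
        "mean_of_window_stdevs": statistics.fmean(stdevs),
    }


def recommend_model(query: Dict[str, float],
                    meta_examples: List[Tuple[Dict[str, float], str]]) -> str:
    """1-nearest-neighbour meta learner over meta-feature dictionaries.

    Each meta example pairs a feature dictionary with the name of the
    algorithm that performed best on that dataset (hypothetical labels).
    """
    def distance(a: Dict[str, float], b: Dict[str, float]) -> float:
        return math.sqrt(sum((a[key] - b[key]) ** 2 for key in a))

    return min(meta_examples, key=lambda example: distance(query, example[0]))[1]
```

In this sketch the features are compared on their raw scales; a practical meta learner would normally normalize them first, and the thesis evaluates trained classification models rather than a plain nearest-neighbour lookup.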

    Image-based Modeling of Flow through Porous Media: Development of Multiscale Techniques for the Pore Level

    Increasingly, imaging technology allows porous media problems to be modeled at microscopic and sub-microscopic levels with finer resolution. However, the physical domain size required to be representative of the media prohibits comprehensive micro-scale simulation. A hybrid or multiscale approach is necessary to overcome this challenge. In this work, a technique was developed for determining the characteristic scales of porous materials, and a multiscale modeling methodology was developed to better understand the interaction/dependence of phenomena occurring at different microscopic scales. The multiscale method couples microscopic simulations at the pore and sub-pore scales. Network modeling is a common pore-scale technique which employs severe assumptions, making it more computationally efficient than direct numerical simulation, enabling simulation over larger length scales. However, microscopic features of the medium are lost in the discretization of a material into a network of interconnected pores and throats. In contrast, detailed microstructure and flow patterns can be captured by modern meshing and direct numerical simulation techniques, but these models are computationally expensive. In this study, a data-driven multiscale technique has been developed that couples the two types of models, taking advantage of the benefits of each. Specifically, an image-based physically-representative pore network model is coupled to an FEM (finite element method) solver that operates on unstructured meshes capable of resolving details orders of magnitude smaller than the pore size. In addition to allowing simulation at multiple scales, the current implementation couples the models using a machine learning approach, where results from the FEM model are used to learn network model parameters. Examples of the model operating on real materials are given that demonstrate improvements in network modeling enabled by the multiscale framework. 
The framework enables more advanced multiscale and multiphysics modeling – an application to particle straining problems is shown. More realistic network filtration simulations are possible by incorporating information from the sub-pore scale. New insights into the size exclusion mechanism of particulate filtration were gained in the process of generating data for machine learning of conductivity reduction due to particle trapping. Additional tests are required to validate the multiscale network filtration model and to compare it with experimental findings in the literature.
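
The coupling idea described above, in which fine-scale results are used to learn network model parameters, can be illustrated with a deliberately simple sketch. It assumes that each throat's conductance is first estimated with the Hagen-Poiseuille formula for a cylindrical tube and that a single scalar correction factor is then fitted by least squares to reference conductances from a finer-scale solver. This is an assumption made for illustration; the abstract does not state which parameters or which learning method the work uses, and the function names are hypothetical.

```python
import math
from typing import Sequence


def hagen_poiseuille_conductance(radius: float, length: float,
                                 viscosity: float) -> float:
    """Hydraulic conductance of a cylindrical tube: pi * r^4 / (8 * mu * L)."""
    if radius <= 0 or length <= 0 or viscosity <= 0:
        raise ValueError("radius, length and viscosity must be positive")
    return math.pi * radius ** 4 / (8.0 * viscosity * length)


def fit_correction_factor(analytic: Sequence[float],
                          reference: Sequence[float]) -> float:
    """Least-squares factor c that minimizes sum((reference - c * analytic)^2).

    `analytic` holds the tube-formula conductances and `reference` holds the
    conductances from a finer-scale simulation of the same throats.
    """
    if len(analytic) != len(reference) or not analytic:
        raise ValueError("inputs must be non-empty and of equal length")
    denominator = sum(a * a for a in analytic)
    if denominator == 0:
        raise ValueError("analytic conductances must not all be zero")
    numerator = sum(a * r for a, r in zip(analytic, reference))
    return numerator / denominator
```

A fitted factor below one would indicate that the idealized tube overestimates the conductance of the real throat geometry. A richer learned model could take throat shape descriptors as inputs instead of a single global factor.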