130,521 research outputs found

    Data Mining and Prediction Tools (for Predicting Students' Success in Programming Course)

    This project addresses the extraction and analysis of data from different educational settings, such as computer-based or web-based educational systems (i.e. a course management system), classroom environment factors, and psychosocial factors in the university that can affect students, and the use of these data to foresee students' learning patterns. The vast amount of data from different educational settings can be used to predict students' performance in a particular course. However, no existing tool can automatically manage, extract and analyze this kind of information. In addition, most current data mining tools are too complex for educators to use, and their features go well beyond what an educator might require. This project uses data mining techniques to analyze different types of data gathered from different educational settings. It aims to develop a new data mining and prediction tool that analyzes these data to help lecturers predict students' performance in a programming course. The scope of study is one of the programming courses in the university, Advanced Business Application Programming (ABAP), and the university's E-Learning System. The main contribution of this project is a new data mining and analysis tool that produces prediction output to support the lecturer's decision making and improve the learning process in a particular programming course.

    Data Mining Applications in Higher Education and Academic Intelligence Management

    Higher education institutions are a nucleus of research and future development, acting in a competitive environment with the prerequisite mission to generate, accumulate and share knowledge. The chain of generating knowledge inside and among external organizations (such as companies, other universities, partners, community) is considered essential to reduce the limitations of internal resources and could be plainly improved with the use of data mining technologies. In recent years data mining has proven to be a pioneering field of research and investigation that draws on a large variety of techniques applied in a multitude of areas, both in business and higher education, relating interdisciplinary studies and development and covering a large variety of practice. Universities require a substantial amount of significant knowledge mined from their past and current data sets using special methods and processes. The ways in which information and knowledge are represented and delivered to university managers are in continuous transformation due to the involvement of information and communication technologies in all academic processes. Higher education institutions have long been interested in predicting the paths of students and alumni (Luan, 2004), thus identifying which students will join particular course programs (Kalathur, 2006) and which students will require assistance in order to graduate. Another important concern is academic failure among students, which has long fuelled a large number of debates. Researchers (Vandamme et al., 2007) attempted to classify students into different clusters with dissimilar risks of exam failure, and also to detect with realistic accuracy what and how much the students know, in order to deduce specific learning gaps (Piementel & Omar, 2005).
Distance and online education, together with intelligent tutoring systems and their capability to record their exchanges with students (Mostow et al., 2005), present various feasible information sources for data mining processes. Studies based on collecting and interpreting the information from several courses could possibly assist teachers and students in the web-based learning setting (Myller et al., 2002). Scientists (Anjewierden et al., 2007) derived models for classifying chat messages using data mining techniques, in order to offer learners real-time adaptive feedback which could result in the improvement of learning environments. In the scientific literature there are some studies which seek to classify students in order to predict their final grade based on features extracted from logged data in educational web-based systems (Minaei-Bidgoli & Punch, 2003). A combination of multiple classifiers led to a significant improvement in classification performance through weighting the feature vectors. The author’s research directions in data mining consist in finding feasible ways to offer the managers of higher education institutions ample knowledge to prepare new hypotheses in a short period of time, which was formerly rigid or unachievable in view of large datasets and earlier methods. Therefore, the aim is to put forward a way to understand the students’ opinions, satisfaction and discontent in each element of the educational process, to predict their preference for certain fields of study, their choice of continuing education and academic failure, and to offer accurate correlations between their knowledge and the requirements of the labor market. Some of the most interesting data mining processes in the educational field are illustrated in the present chapter, in which the author adds his own ideas and applications to educational issues using specific data mining techniques. The organization of this chapter is as follows.
Section 2 offers an insight into how data mining processes are being applied across the large spectrum of education, presenting recent applications and studies published in the scientific literature that are significant to the development of this emerging science. In Section 3 the author introduces his work through a number of newly proposed directions and applications conducted on data collected from the students of the Babes-Bolyai University, using specific data mining classification learning and clustering methods. Section 4 presents the integration of data mining processes and their particular role in higher education issues and management, for the conception of an Academic Intelligence Management. Interrelated future research and plans are discussed as a conclusion in Section 5.
Keywords: data mining, data clustering, higher education, decision trees, C4.5 algorithm, k-means, decision support, academic intelligence management

    Crosslingual Document Embedding as Reduced-Rank Ridge Regression

    There has recently been much interest in extending vector-based word representations to multiple languages, such that words can be compared across languages. In this paper, we shift the focus from words to documents and introduce a method for embedding documents written in any language into a single, language-independent vector space. For training, our approach leverages a multilingual corpus where the same concept is covered in multiple languages (but not necessarily via exact translations), such as Wikipedia. Our method, Cr5 (Crosslingual reduced-rank ridge regression), starts by training a ridge-regression-based classifier that uses language-specific bag-of-word features in order to predict the concept that a given document is about. We show that, when constraining the learned weight matrix to be of low rank, it can be factored to obtain the desired mappings from language-specific bags-of-words to language-independent embeddings. As opposed to most prior methods, which use pretrained monolingual word vectors, postprocess them to make them crosslingual, and finally average word vectors to obtain document vectors, Cr5 is trained end-to-end and is thus natively crosslingual as well as document-level. Moreover, since our algorithm uses the singular value decomposition as its core operation, it is highly scalable. Experiments show that our method achieves state-of-the-art performance on a crosslingual document retrieval task. Finally, although not trained for embedding sentences and words, it also achieves competitive performance on crosslingual sentence and word retrieval tasks. Comment: In The Twelfth ACM International Conference on Web Search and Data Mining (WSDM '19)

    Data Mining Models for Short Term Solar Radiation Prediction and Forecast-Based Assessment of Photovoltaic Facilities

    Solar radiation prediction is useful for integrating photovoltaic power plants into the electrical system. Integrating energy generation in urban environments is attractive because that is where the most energy is consumed, and it avoids wasting energy in transport infrastructure. Renewable energies are often the easiest to integrate into these environments because they require less infrastructure and cause fewer problems related to noise, dirt, pollution, etc. The overall objective of this thesis is to develop data mining models to forecast global solar radiation 24 hours ahead and to use these predictions to evaluate the performance of photovoltaic systems. The specific objectives are: 1. Propose an index that removes the seasonal and daily trends observed in global hourly radiation data. 2. Analyze the different sources of meteorological variables that can be used to predict solar radiation, and use APIs to access external sources of meteorological data. 3. Develop data mining models that capture the relationships observed between the next day's radiation values and the current day's radiation and other meteorological parameters. 4. Develop a web system that includes the proposed models for short-term radiation forecasting and integrates them into the evaluation models of photovoltaic systems. Chapter 3 introduces the methods and models used in this work (Cumulative Probability Distribution Function, Artificial Neural Networks and Support Vector Machines). Classification methods are also presented (Decision Trees and Support Vector Machines for Classification), along with performance metrics to measure the accuracy of the proposed models.
The data sets and data sources used in this work to test the proposed models are presented, including data from the meteorological station installed at the University of Malaga, data from the OpenWeatherMap website and data from AEMET (Agencia Estatal de Meteorología). Chapter 4 is dedicated to solar radiation fundamentals, including astronomical concepts related to the Earth-Sun position; the characterization of hourly solar radiation series; the clearness index, used to remove seasonal trends; the persistence model, used as a baseline for the proposed models; and the forecast skill, which is based on the persistence model and also used as a reference. Chapter 5 introduces a model to characterize hourly global solar radiation using statistical methods such as CPDF and K-means, together with the clearness index. This model aims to predict the hourly solar radiation using the daily clearness index as input. Chapter 6 details the proposed model to forecast the hourly global solar radiation using data mining methods and daily profiles of the clearness index. K-means is again used to cluster daily solar radiation profiles, and a new variable is then defined from the daily clearness index profiles. Support Vector Machines, Decision Trees and Artificial Neural Networks are used to predict the desired hourly solar radiation values. Chapter 7 presents a methodology to assess the performance of solar power plants based on forecasted solar radiation. An OPC-based system is presented, which is able to obtain data from a large variety of equipment, followed by an algorithm to assess the performance of the plants.
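
The abstract above names the clearness index, the persistence model and the forecast skill without giving formulas. The following is a minimal sketch under commonly used definitions, which are assumptions here and not taken from the thesis: the clearness index is the ratio of measured to extraterrestrial radiation, persistence predicts tomorrow's hourly values from today's, and skill is 1 - RMSE(model) / RMSE(persistence). All function names and sample values are hypothetical.

```python
import math


def clearness_index(measured, extraterrestrial):
    """Hour-by-hour ratio of measured to extraterrestrial radiation (assumed definition)."""
    return [g / g0 if g0 > 0 else 0.0 for g, g0 in zip(measured, extraterrestrial)]


def rmse(predicted, observed):
    """Root mean squared error between two equally long series."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


def forecast_skill(model_pred, persistence_pred, observed):
    """1 - RMSE_model / RMSE_persistence (assumed definition); positive means the model beats persistence."""
    return 1.0 - rmse(model_pred, observed) / rmse(persistence_pred, observed)


# Hypothetical hourly radiation values in W/m^2, for illustration only.
today = [100.0, 300.0, 500.0, 300.0]
tomorrow = [120.0, 360.0, 520.0, 280.0]   # what was actually observed the next day
model = [110.0, 350.0, 510.0, 290.0]      # output of some forecasting model
persistence = today                        # persistence: tomorrow looks like today

skill = forecast_skill(model, persistence, tomorrow)
print(f"forecast skill vs persistence: {skill:.3f}")
```

A skill of 0 means the model is no better than persistence, and negative values mean it is worse.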

    Comparison of The Data-Mining Methods in Predicting The Risk Level of Diabetes

    Diabetes mellitus is an illness caused by an excessively high blood glucose level, which occurs because the body cannot release or use insulin normally. The purpose of this research is to compare two data mining methods, logistic regression and a Bayesian method, for predicting the risk level of diabetes using a web-based application and nine attributes of patient data. The data consist of 1450 patients from RSD BALUNG JEMBER, collected from 26 September 2014 to 30 April 2015. The performance of the two methods is measured by a discrimination score using the ROC (Receiver Operating Characteristic) curve. The experimental results show that the two methods have different strengths and that both perform well. On the same dataset, the Bayesian method has the highest accuracy, with a score of 0.91, while logistic regression has the highest ROC score, 0.988, compared with 0.964 for the Bayesian method. In this research, a further advantage of the Bayesian method is that it can use numerical as well as categorical attributes.
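
The result above, where one method wins on accuracy and the other on ROC score, can look contradictory. The sketch below shows how the two metrics can disagree. It computes the area under the ROC curve as the fraction of positive/negative pairs ranked correctly, and accuracy at a fixed threshold of 0.5 (an assumption). The labels and scores are invented for illustration and are unrelated to the study's data or models.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of correct predictions when scores >= threshold are labelled positive."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical risk scores from two classifiers on the same eight patients.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores_a = [0.45, 0.40, 0.35, 0.30, 0.20, 0.15, 0.10, 0.05]  # perfect ranking, but all below 0.5
scores_b = [0.90, 0.80, 0.70, 0.20, 0.30, 0.10, 0.10, 0.05]  # one positive ranked below a negative

for name, scores in (("A", scores_a), ("B", scores_b)):
    print(name, "AUC:", roc_auc(labels, scores), "accuracy:", accuracy(labels, scores))
```

Classifier A ranks every positive above every negative, so it has the higher AUC, but its scores are poorly calibrated for the 0.5 threshold, so classifier B has the higher accuracy.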

    Regularly expected reference-time as a metric of web cache replacement policy

    Internet access has grown significantly. In fact, more than one user often accesses the same object, so there is an opportunity to reduce this redundancy by placing an intermediate storage called a cache. With this approach, bandwidth consumption and the response time perceived by users can be improved. When the size of the web cache is limited, the objects in it need to be managed so that the hit ratio and byte hit ratio are maximized. Previous research shows that the performance of cache replacement depends on user/program access behavior. Therefore, the success of IRT in memory cache replacement does not guarantee the same result in a web cache environment. Researchers have explored the regularity of user access and included this characteristic in a metric for web cache replacement. Other researchers use the regularity to predict the next occurrences and combine it with the frequency of past occurrences. For prediction, they use a statistical or data mining approach. However, computing the prediction takes time. Therefore, this paper proposes a simple approach to predicting the next object reference. This approach is based on the assumption that an object may be accessed regularly by users, as in DA-IRT, and this regularity is used to calculate the time of the next object reference, called the regularly expected reference time (RERT). The object with the longer RERT is evicted sooner from the web cache. Experimental results show that the performance of RERT depends on user access behavior and is opposite to that of the DA-IRT policy.
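
The abstract states the eviction rule (the object with the longer RERT is evicted first) but not how RERT is computed. The sketch below illustrates the rule under an assumed estimate: RERT is the last access time plus the mean gap between past accesses, and an object seen only once is treated as having an infinitely distant next reference. The class `RertCache` and its methods are hypothetical and are not the paper's implementation.

```python
class RertCache:
    """Toy cache that evicts the object whose expected next reference is farthest away."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.history = {}  # key -> list of access timestamps, oldest first

    def expected_next_reference(self, key):
        """Assumed RERT estimate: last access time plus the mean inter-reference gap."""
        times = self.history[key]
        if len(times) < 2:
            return float("inf")  # assumption: no regularity observed yet
        mean_gap = (times[-1] - times[0]) / (len(times) - 1)
        return times[-1] + mean_gap

    def access(self, key, now):
        """Record an access at time `now`; return the evicted key, or None."""
        evicted = None
        if key not in self.history and len(self.history) >= self.capacity:
            evicted = max(self.history, key=self.expected_next_reference)
            del self.history[evicted]
        self.history.setdefault(key, []).append(now)
        return evicted


cache = RertCache(capacity=2)
for key, t in [("a", 0), ("a", 10), ("b", 15), ("b", 17), ("b", 19), ("a", 20)]:
    cache.access(key, t)

# "a" recurs every 10 time units (RERT 30); "b" recurs every 2 (RERT 21).
evicted = cache.access("c", 21)
print("evicted:", evicted)
```

In this trace a least-recently-used policy would evict `b`, whose last access (time 19) is older than that of `a` (time 20), whereas the RERT rule evicts `a` because its next reference is expected later.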

    Customer purchase behavior prediction in E-commerce: a conceptual framework and research agenda

    Digital retailers are experiencing an increasing number of transactions coming from their consumers online, a consequence of the convenience of buying goods via E-commerce platforms. Such interactions compose complex behavioral patterns which can be analyzed through predictive analytics to enable businesses to understand consumer needs. Despite this abundance of big data and of possible tools to analyze them, a systematic review of the literature is missing. Therefore, this paper presents a systematic literature review of recent research dealing with customer purchase prediction in the E-commerce context. The main contributions are a novel analytical framework and a research agenda in the field. The framework reveals three main tasks in this review, namely, the prediction of customer intents, buying sessions, and purchase decisions. Those are followed by their employed predictive methodologies and are analyzed from three perspectives. Finally, the research agenda presents major open issues for further research in the field of online purchase behavior prediction.