Reducing the Effects of Detrimental Instances
Not all instances in a data set are equally beneficial for inducing a model
of the data. Some instances (such as outliers or noise) can be detrimental.
However, at least initially, the instances in a data set are generally
considered equally in machine learning algorithms. Many current approaches for
handling noisy and detrimental instances make a binary decision about whether
an instance is detrimental or not. In this paper, we 1) extend this paradigm by
weighting the instances on a continuous scale and 2) present a methodology for
measuring how detrimental an instance may be for inducing a model of the data.
We call our method of identifying and weighting detrimental instances reduced
detrimental instance learning (RDIL). We examine RDIL on a set of 54 data sets
and 5 learning algorithms and compare RDIL with other weighting and filtering
approaches. RDIL is especially useful for learning algorithms where every
instance can affect the classification boundary and the training instances are
considered individually, such as multilayer perceptrons (MLPs) trained with
backpropagation. Our results also suggest that a more accurate estimate
of which instances are detrimental can have a significant positive impact for
handling them.
Comment: 6 pages, 5 tables, 2 figures. arXiv admin note: substantial text
overlap with arXiv:1403.189
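The continuous weighting idea above can be sketched with a simple stand-in heuristic. The function below is an assumption for illustration, not the paper's RDIL procedure: each instance is scored by the fraction of its nearest neighbours that share its label, so likely-detrimental instances (outliers, label noise) land near 0 on a continuous scale rather than being filtered with a binary decision.

```python
import math
from collections import Counter

def instance_weights(X, y, k=3):
    """Weight each instance by the fraction of its k nearest
    neighbours (excluding itself) that share its label -- a
    continuous score in [0, 1]. Low weight suggests the instance
    may be detrimental for inducing a model of the data."""
    n = len(X)
    weights = []
    for i in range(n):
        # Distances to every other instance, nearest first.
        dists = sorted(
            (math.dist(X[i], X[j]), j) for j in range(n) if j != i
        )
        votes = Counter(y[j] for _, j in dists[:k])
        weights.append(votes[y[i]] / k)
    return weights

# Toy data: two clusters plus one mislabelled point near cluster 0.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (0.5, 0.5)]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # the last label disagrees with its region
w = instance_weights(X, y, k=3)
```

Such weights could then be passed to any learner that accepts per-instance weights, instead of discarding instances outright.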
An Easy to Use Repository for Comparing and Improving Machine Learning Algorithm Usage
The results from most machine learning experiments are used for a specific
purpose and then discarded. This results in a significant loss of information
and requires rerunning experiments to compare learning algorithms. This also
requires implementing another algorithm for comparison, which may not
always be correctly implemented. By storing the results from previous
experiments, machine learning algorithms can be compared easily and the
knowledge gained from them can be used to improve their performance. The
purpose of this work is to provide easy access to previous experimental results
for learning and comparison. These stored results are comprehensive -- storing
the prediction for each test instance as well as the learning algorithm,
hyperparameters, and training set that were used. Previous results are
particularly important for meta-learning, which, in a broad sense, is the
process of learning from previous machine learning results such that the
learning process is improved. While other experiment databases do exist, one of
our focuses is on easy access to the data. We provide meta-learning data sets
that are ready to be downloaded for meta-learning experiments. In addition,
queries to the underlying database can be made if specific information is
desired. We also differ from previous experiment databases in that our
database is designed at the instance level, where an instance is an example in
a data set. We store the predictions of a learning algorithm trained on a
specific training set for each instance in the test set. Data set level
information can then be obtained by aggregating the results from the instances.
The instance level information can be used for many tasks such as determining
the diversity of a classifier or algorithmically determining the optimal subset
of training instances for a learning algorithm.
Comment: 7 pages, 1 figure, 6 tables
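The instance-level design described above can be illustrated with a minimal in-memory sketch. The schema and column names here are assumptions, not the repository's actual layout: one row per (algorithm, hyperparameters, training set, test instance), from which data-set-level metrics fall out of a simple aggregation.

```python
import sqlite3

# Hypothetical instance-level results store: one row per prediction
# of a learning algorithm, trained on a specific training set, for
# each instance in the test set.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE predictions (
    algorithm TEXT, hyperparams TEXT, train_set TEXT,
    instance_id INTEGER, true_label TEXT, predicted TEXT)""")
rows = [
    ("MLP", "lr=0.01", "iris-fold1", 0, "setosa",    "setosa"),
    ("MLP", "lr=0.01", "iris-fold1", 1, "virginica", "versicolor"),
    ("MLP", "lr=0.01", "iris-fold1", 2, "virginica", "virginica"),
    ("kNN", "k=3",     "iris-fold1", 0, "setosa",    "setosa"),
    ("kNN", "k=3",     "iris-fold1", 1, "virginica", "virginica"),
    ("kNN", "k=3",     "iris-fold1", 2, "virginica", "virginica"),
]
con.executemany("INSERT INTO predictions VALUES (?,?,?,?,?,?)", rows)

# Data-set-level information (here, accuracy per algorithm) is
# obtained by aggregating over the stored instance-level results.
acc = dict(con.execute("""
    SELECT algorithm, AVG(true_label = predicted)
    FROM predictions GROUP BY algorithm""").fetchall())
```

Keeping predictions at the instance level is what makes meta-learning tasks like measuring classifier diversity possible later, since per-instance agreement between algorithms can be recomputed from the same table.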
A data mining approach to guide students through the enrollment process based on academic performance
Student academic performance at universities is crucial for education
management systems. Many actions and decisions are made based on it, specifically the enrollment process. During enrollment, students have to decide which courses to sign up for. This research presents the rationale behind the design of a recommender system to support the enrollment process using the students' academic performance
record. To build this system, the CRISP-DM methodology was applied to data from students of the Computer Science Department at University of Lima, Perú. One of the main contributions of this work is the use of two synthetic attributes to improve the relevance of the recommendations made. The first attribute estimates the inherent
difficulty of a given course. The second attribute, named potential, is a measure of the competence of a student for a given course based on the grades obtained in related courses. Data was mined using C4.5, KNN (K-nearest neighbor), Naïve Bayes, Bagging and Boosting, and a set of experiments was developed in order to determine the best algorithm for this application domain. Results indicate that Bagging is the best
method regarding predictive accuracy. Based on these results, the "Student Performance Recommender System" (SPRS) was developed, including a learning engine. SPRS was tested with a sample group of 39 students during the enrollment process. Results showed that the system had a very good performance under real-life conditions.
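The two synthetic attributes can be sketched as follows. The formulas, the grade records, and the 0-20 scale are illustrative assumptions (the abstract does not specify how difficulty and potential are computed): difficulty is taken as the complement of a course's mean grade, and potential as a student's mean grade over related courses.

```python
from statistics import mean

# Hypothetical grade records: (student, course, grade on a 0-20 scale).
records = [
    ("ana", "calculus1", 14), ("ben", "calculus1", 9),
    ("cruz", "calculus1", 11), ("ana", "algebra", 16),
    ("ben", "algebra", 12), ("cruz", "algebra", 15),
]

def difficulty(course):
    """Synthetic attribute 1: inherent difficulty of a course,
    sketched as the complement of its normalised mean grade
    (higher value = harder course)."""
    grades = [g for _, c, g in records if c == course]
    return 1 - mean(grades) / 20

def potential(student, related):
    """Synthetic attribute 2: a student's competence for a course,
    sketched as the normalised mean grade in its related courses."""
    grades = [g for s, c, g in records if s == student and c in related]
    return mean(grades) / 20
```

In a full pipeline, these two derived attributes would be appended to each training example before mining with C4.5, KNN, Naïve Bayes, Bagging, or Boosting.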
Learning to Auto Weight: Entirely Data-driven and Highly Efficient Weighting Framework
Example weighting algorithms are an effective solution to the training bias
problem; however, most previous methods are limited by human
knowledge and require laborious tuning of hyperparameters. In this paper, we
propose a novel example weighting framework called Learning to Auto Weight
(LAW). The proposed framework finds step-dependent weighting policies
adaptively, and can be jointly trained with target networks without any
assumptions or prior knowledge about the dataset. It consists of three key
components: Stage-based Searching Strategy (3SM) is adopted to shrink the huge
searching space in a complete training process; Duplicate Network Reward (DNR)
gives more accurate supervision by removing randomness during the searching
process; Full Data Update (FDU) further improves the updating efficiency.
Experimental results demonstrate the superiority of weighting policy explored
by LAW over standard training pipeline. Compared with baselines, LAW can find a
better weighting schedule which achieves much more superior accuracy on both
biased CIFAR and ImageNet.
Comment: Accepted by AAAI 202
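The notion of a step-dependent weighting policy can be illustrated with a toy sketch. The linear loss-to-weight form and the three-stage split below are assumptions for exposition, not LAW's learned parameterisation: the training run is divided into stages, and each stage applies its own policy mapping a per-example loss to a weight.

```python
def make_policy(slope, bias):
    """One stage's policy: weight = clip(slope * loss + bias, 0, 1).
    The linear form is a placeholder for a learned policy."""
    return lambda loss: max(0.0, min(1.0, slope * loss + bias))

# Three stages: early training down-weights high-loss (possibly noisy)
# examples gently; later stages do so more aggressively.
stages = [make_policy(-0.1, 1.0), make_policy(-0.3, 1.0), make_policy(-0.5, 1.0)]

def weight(step, total_steps, loss):
    """Step-dependent weighting: pick the policy for the current
    stage of training, then score this example's loss with it."""
    stage = min(len(stages) - 1, step * len(stages) // total_steps)
    return stages[stage](loss)
```

Searching over per-stage policies rather than one global policy is what shrinks the search space in the spirit of the stage-based strategy described above.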
Development and validation of the social emotional competence questionnaire (SECQ)
Reliable and valid measures of children's and adolescents' social emotional
competence (SEC) need to be developed in order to assess their social
emotional development and provide appropriate intervention in child and
adolescent development. A pool of 25 items was created for the Social
Emotional Competence Questionnaire (SECQ) that represented five dimensions
of SEC: self-awareness, social awareness, self-management, relationship
management and responsible decision-making. A series of four studies are
reported relating to the development and validation of the measure.
Confirmatory factor analyses of the responses of 444 fourth-graders showed an
acceptable fit of the model. The model was replicated with another 356
secondary school students. Additional studies revealed good internal
consistency. The significant correlations among the five SEC components and
academic performance provided evidence for the predictive validity of the
instrument. With multiple samples, these results showed that the scale holds
promise as a reliable, valid measure of SEC.
Open Issues, Research Challenges, and Survey on Education Sector in India and Exploring Machine Learning Algorithm to Mitigate These Challenges
The nation's core sector is education, but dealing with problems in educational institutions, particularly in higher education, is a challenging task. The growth of education and technology has led to a number of research challenges that have attracted significant attention, as well as a notable increase in the amount of data available in academic databases. Higher education institutions today are concerned with outcome-based education and various techniques to assess a student's knowledge level or capacity for learning. In general, there are more contributors in the academic field than there are authors. Research is being done in this field to determine the best algorithms and features that are crucial for predicting future outcomes. This survey can help educational institutions assess themselves and find any gaps that need to be filled in order to fulfil their purpose and vision. As higher education systems have grown in size, Machine Learning (ML) approaches have been explored to address these issues.
Crack detection in paintings using convolutional neural networks
The accurate detection of cracks in paintings, which generally portray rich and varying content, is a challenging task. Traditional crack detection methods often fall short on recent acquisitions of paintings, as they are poorly adapted to high resolutions and do not make use of the other imaging modalities often at hand. Furthermore, many paintings portray a complex or cluttered composition, significantly complicating a precise detection of cracks when using only photographic material. In this paper, we propose a fast crack detection algorithm based on deep convolutional neural networks (CNN) that is capable of combining several imaging modalities, such as regular photographs, infrared photography and X-ray images. Moreover, we propose an efficient solution to improve the CNN-based localization of the actual crack boundaries and extend the CNN architecture such that areas where it makes little sense to run expensive learning models are ignored. This allows us to process large resolution scans of paintings more efficiently. The proposed on-line method is capable of continuously learning from newly acquired visual data, thus further improving classification results as more data becomes available. A case study on multimodal acquisitions of the Ghent Altarpiece, taken during the currently ongoing conservation-restoration treatment, shows improvements over the state-of-the-art in crack detection methods and demonstrates the potential of our proposed method in assisting art conservators.
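Two of the ideas above, stacking co-registered imaging modalities as input channels and skipping regions where running the expensive model makes little sense, can be sketched without any deep-learning framework. All function names and the variance-based gate are illustrative assumptions, not the authors' implementation.

```python
from statistics import pvariance

def stack_modalities(photo, infrared, xray):
    """Combine co-registered patches of the three modalities into one
    multi-channel input for a crack classifier."""
    return [photo, infrared, xray]

def worth_classifying(patch, min_variance=1e-4):
    """Cheap gate: a nearly uniform patch is unlikely to contain crack
    edges, so the expensive model need not be run on it."""
    return pvariance(patch) >= min_variance

def detect(patches, cnn):
    """Run the (expensive) classifier only on patches whose photo
    channel passes the gate; everything else is labelled crack-free."""
    return [cnn(p) if worth_classifying(p[0]) else 0 for p in patches]

# Usage on toy 16-pixel patches: a flat region and a textured one.
flat = [0.5] * 16
textured = [0.0, 1.0] * 8
dummy_cnn = lambda patch: 1  # stand-in for the real multimodal CNN
result = detect(
    [stack_modalities(flat, flat, flat),
     stack_modalities(textured, textured, textured)],
    dummy_cnn)
```

Gating before classification is what lets large-resolution scans be processed efficiently: the costly model touches only the patches that could plausibly contain cracks.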