Reducing the Effects of Detrimental Instances
Not all instances in a data set are equally beneficial for inducing a model
of the data. Some instances (such as outliers or noise) can be detrimental.
However, at least initially, the instances in a data set are generally
considered equally in machine learning algorithms. Many current approaches for
handling noisy and detrimental instances make a binary decision about whether
an instance is detrimental or not. In this paper, we 1) extend this paradigm by
weighting the instances on a continuous scale and 2) present a methodology for
measuring how detrimental an instance may be for inducing a model of the data.
We call our method of identifying and weighting detrimental instances reduced
detrimental instance learning (RDIL). We examine RDIL on a set of 54 data sets
and 5 learning algorithms and compare RDIL with other weighting and filtering
approaches. RDIL is especially useful for learning algorithms where every
instance can affect the classification boundary and the training instances are
considered individually, such as multilayer perceptrons (MLPs) trained with
backpropagation. Our results also suggest that a more accurate estimate
of which instances are detrimental can have a significant positive impact for
handling them.
Comment: 6 pages, 5 tables, 2 figures. arXiv admin note: substantial text
overlap with arXiv:1403.189
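The continuous weighting idea above can be sketched with a simple stand-in heuristic. The function below is an assumption for illustration, not the paper's RDIL procedure: each instance is scored by the fraction of its nearest neighbours that share its label, so likely-detrimental instances (outliers, label noise) land near 0 on a continuous scale rather than being filtered with a binary decision.

```python
import math
from collections import Counter

def instance_weights(X, y, k=3):
    """Weight each instance by the fraction of its k nearest
    neighbours (excluding itself) that share its label -- a
    continuous score in [0, 1]. Low weight suggests the instance
    may be detrimental for inducing a model of the data."""
    n = len(X)
    weights = []
    for i in range(n):
        # Distances to every other instance, nearest first.
        dists = sorted(
            (math.dist(X[i], X[j]), j) for j in range(n) if j != i
        )
        votes = Counter(y[j] for _, j in dists[:k])
        weights.append(votes[y[i]] / k)
    return weights

# Toy data: two clusters plus one mislabelled point near cluster 0.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (0.5, 0.5)]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # the last label disagrees with its region
w = instance_weights(X, y, k=3)
```

Such weights could then be passed to any learner that accepts per-instance weights, instead of discarding instances outright.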
An Easy to Use Repository for Comparing and Improving Machine Learning Algorithm Usage
The results from most machine learning experiments are used for a specific
purpose and then discarded. This results in a significant loss of information
and requires rerunning experiments to compare learning algorithms. This also
requires implementing another algorithm for comparison, which may not
always be correctly implemented. By storing the results from previous
experiments, machine learning algorithms can be compared easily and the
knowledge gained from them can be used to improve their performance. The
purpose of this work is to provide easy access to previous experimental results
for learning and comparison. These stored results are comprehensive -- storing
the prediction for each test instance as well as the learning algorithm,
hyperparameters, and training set that were used. Previous results are
particularly important for meta-learning, which, in a broad sense, is the
process of learning from previous machine learning results such that the
learning process is improved. While other experiment databases do exist, one of
our focuses is on easy access to the data. We provide meta-learning data sets
that are ready to be downloaded for meta-learning experiments. In addition,
queries to the underlying database can be made if specific information is
desired. We also differ from previous experiment databases in that our
database is designed at the instance level, where an instance is an example in
a data set. We store the predictions of a learning algorithm trained on a
specific training set for each instance in the test set. Data set level
information can then be obtained by aggregating the results from the instances.
The instance level information can be used for many tasks such as determining
the diversity of a classifier or algorithmically determining the optimal subset
of training instances for a learning algorithm.
Comment: 7 pages, 1 figure, 6 tables
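The instance-level design described above can be illustrated with a minimal in-memory sketch. The schema and column names here are assumptions, not the repository's actual layout: one row per (algorithm, hyperparameters, training set, test instance), from which data-set-level metrics fall out of a simple aggregation.

```python
import sqlite3

# Hypothetical instance-level results store: one row per prediction
# of a learning algorithm, trained on a specific training set, for
# each instance in the test set.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE predictions (
    algorithm TEXT, hyperparams TEXT, train_set TEXT,
    instance_id INTEGER, true_label TEXT, predicted TEXT)""")
rows = [
    ("MLP", "lr=0.01", "iris-fold1", 0, "setosa",    "setosa"),
    ("MLP", "lr=0.01", "iris-fold1", 1, "virginica", "versicolor"),
    ("MLP", "lr=0.01", "iris-fold1", 2, "virginica", "virginica"),
    ("kNN", "k=3",     "iris-fold1", 0, "setosa",    "setosa"),
    ("kNN", "k=3",     "iris-fold1", 1, "virginica", "virginica"),
    ("kNN", "k=3",     "iris-fold1", 2, "virginica", "virginica"),
]
con.executemany("INSERT INTO predictions VALUES (?,?,?,?,?,?)", rows)

# Data-set-level information (here, accuracy per algorithm) is
# obtained by aggregating over the stored instance-level results.
acc = dict(con.execute("""
    SELECT algorithm, AVG(true_label = predicted)
    FROM predictions GROUP BY algorithm""").fetchall())
```

Keeping predictions at the instance level is what makes meta-learning tasks like measuring classifier diversity possible later, since per-instance agreement between algorithms can be recomputed from the same table.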
A data mining approach to guide students through the enrollment process based on academic performance
Student academic performance at universities is crucial for education
management systems. Many actions and decisions are made based on it, specifically the enrollment process. During enrollment, students have to decide which courses to sign up for. This research presents the rationale behind the design of a recommender system to support the enrollment process using the students' academic performance
record. To build this system, the CRISP-DM methodology was applied to data from students of the Computer Science Department at University of Lima, Perú. One of the main contributions of this work is the use of two synthetic attributes to improve the relevance of the recommendations made. The first attribute estimates the inherent
difficulty of a given course. The second attribute, named potential, is a measure of the competence of a student for a given course based on the grades obtained in related courses. Data was mined using C4.5, KNN (K-nearest neighbor), Naïve Bayes, Bagging and Boosting, and a set of experiments was developed in order to determine the best algorithm for this application domain. Results indicate that Bagging is the best
method regarding predictive accuracy. Based on these results, the "Student Performance Recommender System" (SPRS) was developed, including a learning engine. SPRS was tested with a sample group of 39 students during the enrollment process. Results showed that the system had a very good performance under real-life conditions.
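The two synthetic attributes can be sketched as follows. The formulas, the grade records, and the 0-20 scale are illustrative assumptions (the abstract does not specify how difficulty and potential are computed): difficulty is taken as the complement of a course's mean grade, and potential as a student's mean grade over related courses.

```python
from statistics import mean

# Hypothetical grade records: (student, course, grade on a 0-20 scale).
records = [
    ("ana", "calculus1", 14), ("ben", "calculus1", 9),
    ("cruz", "calculus1", 11), ("ana", "algebra", 16),
    ("ben", "algebra", 12), ("cruz", "algebra", 15),
]

def difficulty(course):
    """Synthetic attribute 1: inherent difficulty of a course,
    sketched as the complement of its normalised mean grade
    (higher value = harder course)."""
    grades = [g for _, c, g in records if c == course]
    return 1 - mean(grades) / 20

def potential(student, related):
    """Synthetic attribute 2: a student's competence for a course,
    sketched as the normalised mean grade in its related courses."""
    grades = [g for s, c, g in records if s == student and c in related]
    return mean(grades) / 20
```

In a full pipeline, these two derived attributes would be appended to each training example before mining with C4.5, KNN, Naïve Bayes, Bagging, or Boosting.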
Learning to Auto Weight: Entirely Data-driven and Highly Efficient Weighting Framework
Example weighting algorithms are an effective solution to the training bias
problem; however, most previous methods are limited by human
knowledge and require laborious tuning of hyperparameters. In this paper, we
propose a novel example weighting framework called Learning to Auto Weight
(LAW). The proposed framework finds step-dependent weighting policies
adaptively, and can be jointly trained with target networks without any
assumptions or prior knowledge about the dataset. It consists of three key
components: Stage-based Searching Strategy (3SM) is adopted to shrink the huge
searching space in a complete training process; Duplicate Network Reward (DNR)
gives more accurate supervision by removing randomness during the searching
process; Full Data Update (FDU) further improves the updating efficiency.
Experimental results demonstrate the superiority of weighting policy explored
by LAW over standard training pipeline. Compared with baselines, LAW can find a
better weighting schedule which achieves much more superior accuracy on both
biased CIFAR and ImageNet.
Comment: Accepted by AAAI 202
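The notion of a step-dependent weighting policy can be illustrated with a toy sketch. The linear loss-to-weight form and the three-stage split below are assumptions for exposition, not LAW's learned parameterisation: the training run is divided into stages, and each stage applies its own policy mapping a per-example loss to a weight.

```python
def make_policy(slope, bias):
    """One stage's policy: weight = clip(slope * loss + bias, 0, 1).
    The linear form is a placeholder for a learned policy."""
    return lambda loss: max(0.0, min(1.0, slope * loss + bias))

# Three stages: early training down-weights high-loss (possibly noisy)
# examples gently; later stages do so more aggressively.
stages = [make_policy(-0.1, 1.0), make_policy(-0.3, 1.0), make_policy(-0.5, 1.0)]

def weight(step, total_steps, loss):
    """Step-dependent weighting: pick the policy for the current
    stage of training, then score this example's loss with it."""
    stage = min(len(stages) - 1, step * len(stages) // total_steps)
    return stages[stage](loss)
```

Searching over per-stage policies rather than one global policy is what shrinks the search space in the spirit of the stage-based strategy described above.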
Development and validation of the social emotional competence questionnaire (SECQ)
Reliable and valid measures of children's and adolescents' social emotional
competence (SEC) need to be developed in order to assess their social
emotional development and provide appropriate intervention in child and
adolescent development. A pool of 25 items was created for the Social
Emotional Competence Questionnaire (SECQ) that represented five dimensions
of SEC: self-awareness, social awareness, self-management, relationship
management and responsible decision-making. A series of four studies are
reported relating to the development and validation of the measure.
Confirmatory factor analyses of the responses of 444 fourth-graders showed an
acceptable fit of the model. The model was replicated with another 356
secondary school students. Additional studies revealed good internal
consistency. The significant correlations among the five SEC components and
academic performance provided evidence for the predictive validity of the
instrument. With multiple samples, these results showed that the scale holds
promise as a reliable, valid measure of SEC.
Open Issues, Research Challenges, and Survey on Education Sector in India and Exploring Machine Learning Algorithm to Mitigate These Challenges
The nation's core sector is education, but dealing with problems in educational institutions, particularly in higher education, is a challenging task. The growth of education and technology has led to a number of research challenges that have attracted significant attention, as well as a notable increase in the amount of data available in academic databases. Higher education institutions today are concerned with outcome-based education and various techniques to assess a student's knowledge level or capacity for learning. In general, there are more contributors in the academic field than there are authors. Research is being done in this field to determine the best algorithms and features that are crucial for predicting future outcomes. This survey can help educational institutions assess themselves and find any gaps that need to be filled in order to fulfil their purpose and vision. As higher education systems have grown in size, Machine Learning (ML) approaches have been explored to address these issues.
Crack detection in paintings using convolutional neural networks
The accurate detection of cracks in paintings, which generally portray rich and varying content, is a challenging task. Traditional crack detection methods often fall short on recent acquisitions of paintings, as they are poorly adapted to high resolutions and do not make use of the other imaging modalities often at hand. Furthermore, many paintings portray a complex or cluttered composition, significantly complicating a precise detection of cracks when using only photographic material. In this paper, we propose a fast crack detection algorithm based on deep convolutional neural networks (CNN) that is capable of combining several imaging modalities, such as regular photographs, infrared photography and X-ray images. Moreover, we propose an efficient solution to improve the CNN-based localization of the actual crack boundaries and extend the CNN architecture such that areas where it makes little sense to run expensive learning models are ignored. This allows us to process large resolution scans of paintings more efficiently. The proposed on-line method is capable of continuously learning from newly acquired visual data, thus further improving classification results as more data becomes available. A case study on multimodal acquisitions of the Ghent Altarpiece, taken during the currently ongoing conservation-restoration treatment, shows improvements over the state-of-the-art in crack detection methods and demonstrates the potential of our proposed method in assisting art conservators.
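Two of the ideas above, stacking co-registered imaging modalities as input channels and skipping regions where running the expensive model makes little sense, can be sketched without any deep-learning framework. All function names and the variance-based gate are illustrative assumptions, not the authors' implementation.

```python
from statistics import pvariance

def stack_modalities(photo, infrared, xray):
    """Combine co-registered patches of the three modalities into one
    multi-channel input for a crack classifier."""
    return [photo, infrared, xray]

def worth_classifying(patch, min_variance=1e-4):
    """Cheap gate: a nearly uniform patch is unlikely to contain crack
    edges, so the expensive model need not be run on it."""
    return pvariance(patch) >= min_variance

def detect(patches, cnn):
    """Run the (expensive) classifier only on patches whose photo
    channel passes the gate; everything else is labelled crack-free."""
    return [cnn(p) if worth_classifying(p[0]) else 0 for p in patches]

# Usage on toy 16-pixel patches: a flat region and a textured one.
flat = [0.5] * 16
textured = [0.0, 1.0] * 8
dummy_cnn = lambda patch: 1  # stand-in for the real multimodal CNN
result = detect(
    [stack_modalities(flat, flat, flat),
     stack_modalities(textured, textured, textured)],
    dummy_cnn)
```

Gating before classification is what lets large-resolution scans be processed efficiently: the costly model touches only the patches that could plausibly contain cracks.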