A study of hierarchical and flat classification of proteins
Automatic classification of proteins using machine learning is an important problem that has received significant attention in the literature. One feature of this problem is that expert-defined hierarchies of protein classes exist and can potentially be exploited to improve classification performance. In this article we investigate empirically whether this is the case for two such hierarchies. We compare multi-class classification techniques that exploit the information in those class hierarchies with those that do not, using logistic regression, decision trees, bagged decision trees, and support vector machines as the underlying base learners. In particular, we compare hierarchical and flat variants of ensembles of nested dichotomies. The latter have been shown to deliver strong classification performance in multi-class settings. We present experimental results for synthetic, fold recognition, enzyme classification, and remote homology detection data. Our results show that exploiting the class hierarchy improves performance on the synthetic data but not on the protein classification problems. Based on this, we recommend that strong flat multi-class methods be used as a baseline to establish the benefit of exploiting class hierarchies in this area.
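A nested dichotomy arranges the classes in a binary tree, trains one binary classifier at each internal node, and scores a class by multiplying the branch probabilities on the path from the root to that class's leaf. In the minimal Python sketch below, the hierarchical variant uses a fixed tree that mirrors a class hierarchy, while the flat variant averages several randomly drawn trees, as in standard ensembles of nested dichotomies. The nearest-centroid `prob_left` scorer is a hypothetical stand-in for the logistic regression, decision tree, or SVM base learners used in the study, and the class labels, data, and helper names are made up for illustration.

```python
import math
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple, Union

Point = Tuple[float, ...]
# A class tree is a class label (leaf) or a pair of subtrees (one dichotomy).
Tree = Union[str, Tuple["Tree", "Tree"]]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def centroid(points: Sequence[Point]) -> Point:
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def leaves(tree: Tree) -> List[str]:
    return [tree] if isinstance(tree, str) else leaves(tree[0]) + leaves(tree[1])


@dataclass
class Node:
    classes: List[str]
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    c_left: Optional[Point] = None
    c_right: Optional[Point] = None

    def prob_left(self, x: Point) -> float:
        # Hypothetical stand-in for the base learner: P(left group | x) grows
        # as x gets closer to the left group's centroid than to the right one's.
        return sigmoid(math.dist(x, self.c_right) - math.dist(x, self.c_left))


def fit(tree: Tree, data: Dict[str, List[Point]]) -> Node:
    """Fit one binary model per internal node of the class tree."""
    if isinstance(tree, str):
        return Node([tree])
    left_pts = [p for c in leaves(tree[0]) for p in data[c]]
    right_pts = [p for c in leaves(tree[1]) for p in data[c]]
    return Node(leaves(tree), fit(tree[0], data), fit(tree[1], data),
                centroid(left_pts), centroid(right_pts))


def random_tree(classes: List[str], rng: random.Random) -> Tree:
    """Flat variant: recursively split the classes into two random non-empty subsets."""
    if len(classes) == 1:
        return classes[0]
    shuffled = list(classes)
    rng.shuffle(shuffled)
    cut = rng.randint(1, len(shuffled) - 1)
    return (random_tree(shuffled[:cut], rng), random_tree(shuffled[cut:], rng))


def class_probs(node: Node, x: Point) -> Dict[str, float]:
    """P(c | x) = product of the binary probabilities on the path from root to leaf c."""
    if node.left is None:
        return {node.classes[0]: 1.0}
    p = node.prob_left(x)
    probs = {c: p * q for c, q in class_probs(node.left, x).items()}
    probs.update({c: (1 - p) * q for c, q in class_probs(node.right, x).items()})
    return probs


def ensemble_probs(models: List[Node], x: Point) -> Dict[str, float]:
    """Ensemble of nested dichotomies: average the members' class distributions."""
    avg: Dict[str, float] = {}
    for m in models:
        for c, q in class_probs(m, x).items():
            avg[c] = avg.get(c, 0.0) + q / len(models)
    return avg


data = {
    "a": [(0.0, 0.0), (0.2, 0.1)], "b": [(0.0, 1.0), (0.1, 0.9)],
    "c": [(5.0, 5.0), (5.1, 4.9)], "d": [(5.0, 6.0), (4.9, 6.1)],
}
hierarchical = fit((("a", "b"), ("c", "d")), data)  # fixed, expert-style hierarchy
flat_end = [fit(random_tree(sorted(data), random.Random(s)), data) for s in range(10)]
probs = class_probs(hierarchical, (0.05, 0.05))
print(max(probs, key=probs.get))  # → a
```

Because each member's class probabilities are products along a single tree, they always sum to one, so averaging them across randomly drawn trees still yields a valid distribution.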
Human-Machine Collaborative Optimization via Apprenticeship Scheduling
Coordinating agents to complete a set of tasks with intercoupled temporal and
resource constraints is computationally challenging, yet human domain experts
can solve these difficult scheduling problems using paradigms learned through
years of apprenticeship. A process for manually codifying this domain knowledge
within a computational framework is necessary to scale beyond the
"single-expert, single-trainee" apprenticeship model. However, human domain
experts often have difficulty describing their decision-making processes,
causing the codification of this knowledge to become laborious. We propose a
new approach for capturing domain-expert heuristics through a pairwise ranking
formulation. Our approach is model-free and does not require enumerating or
iterating through a large state space. We empirically demonstrate that this
approach accurately learns multifaceted heuristics on a synthetic data set
incorporating job-shop scheduling and vehicle routing problems, as well as on
two real-world data sets consisting of demonstrations of experts solving a
weapon-to-target assignment problem and a hospital resource allocation problem.
We also demonstrate that policies learned from human scheduling demonstration
via apprenticeship learning can substantially improve the efficiency of a
branch-and-bound search for an optimal schedule. We employ this human-machine
collaborative optimization technique on a variant of the weapon-to-target
assignment problem. We demonstrate that this technique generates solutions
substantially superior to those produced by human domain experts at a rate up
to 9.5 times faster than an optimization approach and can be applied to
optimally solve problems twice as complex as those solved by a human
demonstrator.
Comment: Portions of this paper were published in the Proceedings of the
International Joint Conference on Artificial Intelligence (IJCAI) in 2016 and
in the Proceedings of Robotics: Science and Systems (RSS) in 2016. The paper
consists of 50 pages with 11 figures and 4 tables.
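One common way to set up a pairwise ranking formulation is to turn each demonstrated decision into comparisons between the task the expert picked and each task the expert passed over, train a binary classifier on the feature differences, and, at decision time, choose the candidate that wins the most comparisons. The sketch below assumes that setup. The two task features (deadline, travel distance), the earliest-deadline "expert", and the stochastic-gradient logistic regression are hypothetical choices for illustration, not the paper's features, data, or learner.

```python
import math
import random
from typing import List, Tuple

# One demonstrated decision: (feature vector of each candidate task, index the expert picked).
Demo = Tuple[List[List[float]], int]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pairwise_examples(demos: List[Demo]) -> Tuple[List[List[float]], List[int]]:
    """Turn each decision into 'picked vs. passed-over' comparisons on feature differences."""
    X: List[List[float]] = []
    y: List[int] = []
    for candidates, chosen in demos:
        for j, other in enumerate(candidates):
            if j == chosen:
                continue
            diff = [a - b for a, b in zip(candidates[chosen], other)]
            X.append(diff)                # picked minus other -> label 1
            y.append(1)
            X.append([-d for d in diff])  # other minus picked -> label 0
            y.append(0)
    return X, y


def train_logistic(X: List[List[float]], y: List[int],
                   epochs: int = 200, lr: float = 0.1) -> List[float]:
    """Stochastic-gradient logistic regression; no bias, since the pairs are antisymmetric."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wk * xk for wk, xk in zip(w, xi)))
            w = [wk + lr * (yi - p) * xk for wk, xk in zip(w, xi)]
    return w


def choose(w: List[float], candidates: List[List[float]]) -> int:
    """Pick the candidate with the highest summed pairwise preference score."""
    def score(i: int) -> float:
        return sum(
            sigmoid(sum(wk * (a - b) for wk, a, b in zip(w, candidates[i], candidates[j])))
            for j in range(len(candidates))
            if j != i
        )
    return max(range(len(candidates)), key=score)


# Hypothetical synthetic "expert" who always schedules the earliest-deadline task.
# Hypothetical task features: [deadline, travel_distance].
rng = random.Random(1)
demos: List[Demo] = []
for _ in range(30):
    tasks = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(4)]
    demos.append((tasks, min(range(4), key=lambda i: tasks[i][0])))

w = train_logistic(*pairwise_examples(demos))
print(choose(w, [[3.0, 5.0], [1.0, 5.0], [7.0, 5.0]]))  # → 1
```

Working on feature differences means the learner never has to enumerate the scheduling state space. It only sees which of two candidates an expert preferred, which is consistent with the model-free framing described above.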
A critical assessment of imbalanced class distribution problem: the case of predicting freshmen student attrition
Predicting student attrition is an intriguing yet challenging problem for any academic institution. Class-imbalanced data are common in the field of student retention, mainly because many students enroll but comparatively few drop out. Classification techniques applied to imbalanced datasets can yield deceptively high prediction accuracy, because the overall predictive accuracy is driven mainly by the majority class at the expense of very poor performance on the crucial minority class. In this study, we compared different data-balancing techniques to improve the predictive accuracy on the minority class while maintaining satisfactory overall classification performance. Specifically, we tested three balancing techniques, namely oversampling, under-sampling, and the synthetic minority over-sampling technique (SMOTE), along with four popular classification methods: logistic regression, decision trees, neural networks, and support vector machines. We used a large, feature-rich institutional student dataset (covering the years 2005 to 2011) to assess the efficacy of both the balancing techniques and the prediction methods. The results indicated that the support vector machine combined with the SMOTE data-balancing technique achieved the best classification performance, with 90.24% overall accuracy on the 10-fold holdout sample. All three data-balancing techniques improved the prediction accuracy for the minority class. By applying sensitivity analyses to the developed models, we also identified the most important variables for accurately predicting student attrition. These models have the potential to identify at-risk students accurately and help reduce student dropout rates.
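The three balancing techniques differ in how they even out the class counts. Random oversampling duplicates minority samples, random under-sampling discards majority samples, and SMOTE creates new minority points by interpolating between a minority sample and one of its k nearest minority-class neighbours. Below is a simplified pure-Python sketch of all three. It assumes toy 2-D points and picks a random base sample for each synthetic point. It is not the study's implementation; libraries such as imbalanced-learn provide tested versions.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, ...]


def random_oversample(minority: List[Point], target: int, rng: random.Random) -> List[Point]:
    """Duplicate randomly chosen minority samples until there are `target` of them."""
    return minority + [rng.choice(minority) for _ in range(target - len(minority))]


def random_undersample(majority: List[Point], target: int, rng: random.Random) -> List[Point]:
    """Keep a random subset of `target` majority samples."""
    return rng.sample(majority, target)


def smote(minority: List[Point], n_new: int, k: int, rng: random.Random) -> List[Point]:
    """Create `n_new` synthetic points on segments between minority-class neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority-class neighbours of the base sample (excluding itself).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(minority[j], base),
        )[:k]
        nb = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to nb, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic


rng = random.Random(42)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(90)]
minority = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(10)]

smote_minority = minority + smote(minority, len(majority) - len(minority), k=5, rng=rng)
over_minority = random_oversample(minority, len(majority), rng)
under_majority = random_undersample(majority, len(minority), rng)
print(len(smote_minority), len(over_minority), len(under_majority))  # → 90 90 10
```

Unlike plain oversampling, SMOTE's interpolated points are new rather than exact copies. This is one plausible reason it can help a margin-based learner such as an SVM, although the abstract reports only the outcome, not the cause.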
Intelligent systems in manufacturing: current developments and future prospects
Global competition and rapidly changing customer requirements are forcing ever more frequent changes in manufacturing environments. Enterprises are required to constantly redesign their products and continuously reconfigure their manufacturing systems. Traditional approaches to manufacturing systems do not fully satisfy this new situation. Many authors have proposed that artificial intelligence (AI) will bring the flexibility and efficiency that manufacturing systems need. This paper reviews AI techniques used in manufacturing systems. It first defines the components of a simplified intelligent manufacturing system (IMS) and the different AI techniques to be considered, and then shows how these techniques are used for the components of an IMS.
A survey on utilization of data mining approaches for dermatological (skin) diseases prediction
Due to recent advances in technology, large volumes of medical data are being collected. These data contain valuable information, so data mining techniques can be used to extract useful patterns from them. This paper introduces data mining and its various techniques and surveys the available literature on medical data mining, focusing mainly on the application of data mining to skin diseases. A categorization is provided based on the different data mining techniques, and the utility of the various data mining methodologies is highlighted. Association mining is generally suitable for extracting rules and has been used especially in cancer diagnosis. Classification is a robust method in medical data mining. In this paper, we summarize the different uses of classification in dermatology, where it is one of the most important methods for diagnosing erythemato-squamous diseases. Methods in this area include neural networks, genetic algorithms, and fuzzy classification. Clustering is a useful method in medical image mining. Clustering techniques aim to find structure in the given data by grouping items according to their similarity in terms of the data's characteristics. Clustering has some applications in dermatology. Besides introducing different mining methods, we investigate some of the challenges that exist in mining skin data.
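As a minimal illustration of grouping records by similarity, the sketch below implements k-means (Lloyd's algorithm) with Euclidean distance. k-means is chosen here only as an example; the survey does not single it out. The six two-feature records are hypothetical.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, ...]


def kmeans(points: List[Point], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means: assign each point to its nearest centroid, then recompute the centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k randomly chosen data points

    def nearest(p: Point) -> int:
        # Index of the centroid most similar to p (smallest Euclidean distance).
        return min(range(k), key=lambda c: math.dist(p, centroids[c]))

    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        updated = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if updated == centroids:  # centroids unchanged, so assignments cannot change
            break
        centroids = updated
    return centroids, [nearest(p) for p in points]


# Hypothetical two-feature records forming two well-separated groups.
records = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centres, labels = kmeans(records, k=2)
print(labels)
```

The cluster numbering depends on the random initialisation, so only the grouping (which records share a label) is meaningful. That grouping is the "structure" a clustering step is meant to expose.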