API design for machine learning software: experiences from the scikit-learn project
Scikit-learn is an increasingly popular machine learning library. Written
in Python, it is designed to be simple and efficient, accessible to
non-experts, and reusable in various contexts. In this paper, we present and
discuss our design choices for the application programming interface (API) of
the project. In particular, we describe the simple and elegant interface shared
by all learning and processing units in the library and then discuss its
advantages in terms of composition and reusability. The paper also comments on
implementation details specific to the Python ecosystem and analyzes obstacles
faced by users and developers of the library.
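The shared interface and its composition benefits can be illustrated with a toy sketch. The classes below (MeanCenterer, SignClassifier, SimplePipeline) are hypothetical and written only for this illustration; they are not scikit-learn code. They assume the fit/transform/predict naming convention that the library is known for.

```python
# Toy sketch of a shared estimator interface.
# All classes here are hypothetical illustrations, not scikit-learn code.


class MeanCenterer:
    """Transformer: learns column means in fit, subtracts them in transform."""

    def fit(self, X, y=None):
        n = len(X)
        self.means_ = [sum(col) / n for col in zip(*X)]
        return self  # returning self allows call chaining

    def transform(self, X):
        return [[v - m for v, m in zip(row, self.means_)] for row in X]


class SignClassifier:
    """Predictor: labels a sample 1 if its feature sum is positive, else 0."""

    def fit(self, X, y=None):
        return self  # nothing to learn in this toy model

    def predict(self, X):
        return [1 if sum(row) > 0 else 0 for row in X]


class SimplePipeline:
    """Chains transformers and a final predictor behind the same interface."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y=None):
        # Fit each transformer in turn and pass its output to the next step.
        for step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)
        self.steps[-1].fit(X, y)
        return self

    def predict(self, X):
        for step in self.steps[:-1]:
            X = step.transform(X)
        return self.steps[-1].predict(X)


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
model = SimplePipeline([MeanCenterer(), SignClassifier()]).fit(X)
predictions = model.predict(X)
```

Because the pipeline exposes the same fit and predict methods as its parts, it can itself be used wherever a single estimator is expected, which is the kind of composition the abstract refers to.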
QuesNet: A Unified Representation for Heterogeneous Test Questions
Understanding learning materials (e.g. test questions) is a crucial issue in
online learning systems, which can promote many applications in the education
domain. Unfortunately, many supervised approaches suffer from the problem of
scarce human labeled data, whereas abundant unlabeled resources are highly
underutilized. To alleviate this problem, an effective solution is to use
pre-trained representations for question understanding. However, existing
pre-training methods in the NLP area are ill-suited to learning test question
representations due to several domain-specific characteristics in education.
First, questions usually comprise heterogeneous data including content text,
images and side information. Second, they contain both basic linguistic
information and domain logic and knowledge. To this end, in this paper,
we propose a novel pre-training method, namely QuesNet, for comprehensively
learning question representations. Specifically, we first design a unified
framework to aggregate question information with its heterogeneous inputs into
a comprehensive vector. Then we propose a two-level hierarchical pre-training
algorithm to learn a better understanding of test questions in an unsupervised
way. Here, a novel holed language model objective is developed to extract
low-level linguistic features, and a domain-oriented objective is proposed to
learn high-level logic and knowledge. Moreover, we show that QuesNet can be
fine-tuned effectively for many question-based tasks. We conduct
extensive experiments on large-scale real-world question data, where the
experimental results clearly demonstrate the effectiveness of QuesNet for
question understanding as well as its superior applicability.
FSL-BM: Fuzzy Supervised Learning with Binary Meta-Feature for Classification
This paper introduces a novel real-time Fuzzy Supervised Learning with Binary
Meta-Feature (FSL-BM) algorithm for big data classification tasks. The study of
real-time algorithms addresses several major concerns, namely accuracy, memory
consumption, the ability to stretch assumptions, and time complexity. Attaining
a fast computational model providing fuzzy logic and supervised learning is one
of the main challenges in machine learning. In this research paper, we
present the FSL-BM algorithm as an efficient solution for supervised learning
with fuzzy logic processing, using a binary meta-feature representation together
with Hamming distance and a hash function to relax assumptions. While many studies focused on
reducing time complexity and increasing accuracy during the last decade, the
novel contribution of this proposed solution comes through the integration of
Hamming distance, a hash function, binary meta-features, and binary
classification to provide a real-time supervised method. The Hash Table (HT)
component gives fast access to existing indices and, therefore, generation of
new indices in constant time complexity, which supersedes existing fuzzy
supervised algorithms with better or comparable results. To summarize, the main
contribution of this technique for real-time Fuzzy Supervised Learning is to
represent the hypothesis through binary input as a meta-feature space and to
create the Fuzzy Supervised Hash table to train and validate the model.
Comment: FICC201
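The combination of a hash table and Hamming distance over binary codes can be sketched as follows. This is not the FSL-BM algorithm, whose details the abstract does not give. It is a minimal illustration under stated assumptions: samples are already encoded as integers whose bits are the binary meta-features, exact matches are resolved by a constant-time dictionary lookup, and unseen codes fall back to the stored code with the smallest Hamming distance. The class name HammingHashClassifier and its methods are hypothetical.

```python
from collections import Counter, defaultdict


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two integer codes differ."""
    return bin(a ^ b).count("1")


class HammingHashClassifier:
    """Hypothetical sketch: hash table of binary codes with a Hamming fallback."""

    def __init__(self):
        # Maps each binary code to a count of the labels seen for it.
        self.table = defaultdict(Counter)

    def fit(self, codes, labels):
        for code, label in zip(codes, labels):
            self.table[code][label] += 1
        return self

    def predict_one(self, code):
        # Exact match: average-case constant-time hash lookup.
        if code in self.table:
            return self.table[code].most_common(1)[0][0]
        # Fallback (assumption): linear scan for the nearest stored code.
        nearest = min(self.table, key=lambda k: hamming(k, code))
        return self.table[nearest].most_common(1)[0][0]


clf = HammingHashClassifier().fit(
    [0b0000, 0b0001, 0b1111, 0b1110], ["a", "a", "b", "b"]
)
exact = clf.predict_one(0b1111)  # code seen during training
fuzzy = clf.predict_one(0b0111)  # unseen code, one bit away from 0b1111
```

The fallback scan here is linear in the number of stored codes; it is included only to show where Hamming distance would enter, and makes no claim about how the paper achieves its stated time complexity.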