Learning the Relation between Code Features and Code Transforms with Structured Prediction
We present in this paper the first approach for structurally predicting code
transforms at the level of AST nodes using conditional random fields. Our
approach first learns offline a probabilistic model that captures how certain
code transforms are applied to certain AST nodes, and then uses the learned
model to predict transforms for new, unseen code snippets. We implement our
approach in the context of repair transform prediction for Java programs. Our
implementation contains a set of carefully designed code features, deals with
the training data imbalance issue, and comprises transform constraints that are
specific to code. We conduct a large-scale experimental evaluation based on a
dataset of 4,590,679 bug fixing commits from real-world Java projects. The
experimental results show that our approach predicts the code transforms with a
success rate varying from 37.1% to 61.1% depending on the transforms
kLog: A Language for Logical and Relational Learning with Kernels
We introduce kLog, a novel approach to statistical relational learning.
Unlike standard approaches, kLog does not represent a probability distribution
directly. It is rather a language to perform kernel-based learning on
expressive logical and relational representations. kLog allows users to specify
learning problems declaratively. It builds on simple but powerful concepts:
learning from interpretations, entity/relationship data modeling, logic
programming, and deductive databases. Access by the kernel to the rich
representation is mediated by a technique we call graphicalization: the
relational representation is first transformed into a graph --- in particular,
a grounded entity/relationship diagram. Subsequently, a choice of graph kernel
defines the feature space. kLog supports mixed numerical and symbolic data, as
well as background knowledge in the form of Prolog or Datalog programs as in
inductive logic programming systems. The kLog framework can be applied to
tackle the same range of tasks that has made statistical relational learning so
popular, including classification, regression, multitask learning, and
collective classification. We also report empirical comparisons, showing
that kLog can be either more accurate, or much faster at the same level of
accuracy, than Tilde and Alchemy. kLog is GPLv3 licensed and is available at
http://klog.dinfo.unifi.it along with tutorials
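
The graphicalization step described in the kLog abstract can be illustrated with a minimal sketch. This is not kLog's own code or API: the function name `graphicalize`, the tuple encoding of entities and relationships, and the toy molecule data are all assumptions made for illustration. The sketch assumes that each ground entity and each ground relationship becomes a vertex, and that a relationship vertex is linked to every entity it mentions, which yields a bipartite graph that a graph kernel could then consume.

```python
def graphicalize(entities, relationships):
    """Build a bipartite entity/relationship graph (illustrative only).

    entities:      iterable of (kind, identifier) tuples
    relationships: iterable of (name, (entity, entity, ...)) tuples
    Returns (vertices, edges): vertices maps each vertex to its role,
    edges is a set of (relationship_vertex, entity_vertex) pairs.
    """
    vertices = {}
    edges = set()
    for entity in entities:
        vertices[entity] = "entity"
    for name, args in relationships:
        rel_vertex = (name, args)
        vertices[rel_vertex] = "relationship"
        for entity in args:
            # Link the relationship to every entity it mentions.
            edges.add((rel_vertex, entity))
    return vertices, edges


# Hypothetical toy interpretation: two atoms joined by one bond.
toy_entities = [("atom", "a1"), ("atom", "a2")]
toy_relationships = [("bond", (("atom", "a1"), ("atom", "a2")))]
toy_vertices, toy_edges = graphicalize(toy_entities, toy_relationships)
```

In this encoding every edge joins a relationship vertex to an entity vertex, so the graph stays bipartite by construction.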
Machine learning to analyze single-case data : a proof of concept
Visual analysis is the most commonly used method for interpreting data from single-case designs, but levels of interrater agreement remain a concern. Although structured
aids to visual analysis such as the dual-criteria (DC) method may increase interrater
agreement, the accuracy of the analyses may still benefit from improvements. Thus, the
purpose of our study was to (a) examine correspondence between visual analysis and
models derived from different machine learning algorithms, and (b) compare the
accuracy, Type I error rate and power of each of our models with those produced by
the DC method. We trained our models on a previously published dataset and then
conducted analyses on both nonsimulated and simulated graphs. All our models
derived from machine learning algorithms matched the interpretation of the visual
analysts more frequently than the DC method. Furthermore, the machine learning
algorithms outperformed the DC method on accuracy, Type I error rate, and power.
Our results support the somewhat unorthodox proposition that behavior analysts may
use machine learning algorithms to supplement their visual analysis of single-case data,
but more research is needed to examine the potential benefits and drawbacks of such an
approach
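
For readers unfamiliar with the dual-criteria (DC) method used as the comparison baseline above, here is a minimal sketch. It assumes the usual description of the method: a mean line and a least-squares trend line are fitted to the baseline phase and projected into the treatment phase, and the number of treatment points falling beyond both lines is compared against a required count. The required count used here (the smallest count whose binomial tail probability at p = 0.5 falls below 0.05) is an assumption of this sketch and may differ from the published DC criteria. All function names are hypothetical.

```python
from math import comb
from statistics import mean


def fit_trend(baseline):
    """Least-squares line through baseline points at x = 0, 1, 2, ..."""
    xs = range(len(baseline))
    x_bar, y_bar = mean(xs), mean(baseline)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, baseline))
    slope = sxy / sxx  # requires at least two baseline points
    return slope, y_bar - slope * x_bar


def binomial_criterion(n, alpha=0.05):
    """Smallest k with P(X >= k) < alpha for X ~ Binomial(n, 0.5)."""
    for k in range(n + 1):
        tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
        if tail < alpha:
            return k
    return n + 1  # no count is rare enough for this n


def dual_criteria(baseline, treatment, increase_expected=True):
    """Return (points beyond both lines, points required, change detected)."""
    level = mean(baseline)
    slope, intercept = fit_trend(baseline)
    start = len(baseline)
    count = 0
    for i, y in enumerate(treatment):
        trend = intercept + slope * (start + i)  # projected trend line
        if increase_expected:
            count += y > level and y > trend
        else:
            count += y < level and y < trend
    needed = binomial_criterion(len(treatment))
    return count, needed, count >= needed


# Made-up data: a flat baseline followed by a clear increase.
result = dual_criteria([2, 3, 2, 3, 2], [6, 7, 8, 7, 9])
```

With five treatment points this sketch requires all five to exceed both projected lines, which the made-up example satisfies.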
Feature Reinforcement Learning: Part I: Unstructured MDPs
General-purpose, intelligent, learning agents cycle through sequences of
observations, actions, and rewards that are complex, uncertain, unknown, and
non-Markovian. On the other hand, reinforcement learning is well-developed for
small finite state Markov decision processes (MDPs). Up to now, extracting the
right state representations out of bare observations, that is, reducing the
general agent setup to the MDP framework, is an art that involves significant
effort by designers. The primary goal of this work is to automate the reduction
process and thereby significantly expand the scope of many existing
reinforcement learning algorithms and the agents that employ them. Before we
can think of mechanizing this search for suitable MDPs, we need a formal
objective criterion. The main contribution of this article is to develop such a
criterion. I also integrate the various parts into one learning algorithm.
Extensions to more realistic dynamic Bayesian networks are developed in Part
II. The role of POMDPs is also considered there.
Comment: 24 LaTeX pages, 5 diagram
Predicting ConceptNet Path Quality Using Crowdsourced Assessments of Naturalness
In many applications, it is important to characterize the way in which two
concepts are semantically related. Knowledge graphs such as ConceptNet provide
a rich source of information for such characterizations by encoding relations
between concepts as edges in a graph. When two concepts are not directly
connected by an edge, their relationship can still be described in terms of the
paths that connect them. Unfortunately, many of these paths are uninformative
and noisy, which means that the success of applications that use such path
features crucially relies on their ability to select high-quality paths. In
existing applications, this path selection process is based on relatively
simple heuristics. In this paper we instead propose to learn to predict path
quality from crowdsourced human assessments. Since we are interested in a
generic task-independent notion of quality, we simply ask human participants to
rank paths according to their subjective assessment of the paths' naturalness,
without attempting to define naturalness or steering the participants towards
particular indicators of quality. We show that a neural network model trained
on these assessments is able to predict human judgments on unseen paths with
near optimal performance. Most notably, we find that the resulting path
selection method is substantially better than the current heuristic approaches
at identifying meaningful paths.
Comment: In Proceedings of the Web Conference (WWW) 201
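
To make the notion of "paths that connect two concepts" concrete, the following sketch enumerates candidate paths in a tiny knowledge graph. It only produces the candidates that a quality predictor, such as the one the abstract describes, would then rank; it is not the authors' model. The relation names follow ConceptNet's style, but the edges, the function name `simple_paths`, and the `max_edges` limit are assumptions made for illustration.

```python
def simple_paths(graph, source, target, max_edges=3):
    """Yield cycle-free paths as lists [concept, relation, concept, ...]."""
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target and len(path) > 1:
            yield path
            continue
        if (len(path) - 1) // 2 >= max_edges:
            continue  # path already uses the maximum number of edges
        for relation, neighbour in graph.get(node, []):
            if neighbour not in path[::2]:  # concepts sit at even positions
                stack.append((neighbour, path + [relation, neighbour]))


# Hypothetical toy graph; these edges are not taken from ConceptNet itself.
toy_graph = {
    "dog": [("IsA", "pet"), ("AtLocation", "kennel")],
    "pet": [("AtLocation", "house")],
    "kennel": [("RelatedTo", "house")],
}
candidate_paths = sorted(simple_paths(toy_graph, "dog", "house"))
```

Each returned path alternates concepts and relations, so it can be passed directly to whatever scoring function ranks paths by naturalness.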
NLSC: Unrestricted Natural Language-based Service Composition through Sentence Embeddings
Current approaches for service composition (assemblies of atomic services)
require developers to use: (a) domain-specific semantics to formalize services
that restrict the vocabulary for their descriptions, and (b) translation
mechanisms for service retrieval to convert unstructured user requests to
strongly-typed semantic representations. In our work, we argue that the effort of
developing service descriptions, request translations, and matching mechanisms
could be reduced by using unrestricted natural language, allowing both (1)
end-users to intuitively express their needs using natural language, and (2)
service developers to develop services without relying on syntactic/semantic
description languages. Although there are some natural language-based service
composition approaches, they restrict service retrieval to syntactic/semantic
matching. With recent developments in machine learning and natural language
processing, we motivate the use of sentence embeddings by leveraging richer
semantic representations of sentences for service description, matching and
retrieval. Experimental results show that service composition development
effort may be reduced by more than 44% while keeping a high precision/recall
when matching high-level user requests with low-level service method
invocations.
Comment: This paper will appear at SCC'19 (IEEE International Conference on
Services Computing) on July 1
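
The matching step in the NLSC abstract (comparing a free-text request against free-text service descriptions in an embedding space) can be sketched as follows. This is a sketch under strong assumptions, not the NLSC implementation: `embed` is a toy bag-of-words stand-in that captures only word overlap, whereas the paper's point is that a trained sentence encoder would capture richer semantics. The service names and descriptions are hypothetical.

```python
from collections import Counter
from math import sqrt


def embed(sentence):
    """Toy stand-in for a sentence encoder: lowercase word counts."""
    return Counter(sentence.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def best_service(request, services):
    """Return (similarity, name) of the best-matching service description."""
    request_vec = embed(request)
    return max(
        (cosine(request_vec, embed(description)), name)
        for name, description in services.items()
    )


# Hypothetical service registry with natural-language descriptions.
toy_services = {
    "send_email": "send an email message to a contact",
    "get_weather": "get the weather forecast for a city",
    "book_taxi": "book a taxi ride to a destination",
}
score, chosen = best_service("what is the weather forecast in my city", toy_services)
```

Swapping `embed` for a real sentence encoder that returns dense vectors would leave the ranking logic unchanged, provided `cosine` is adapted to that vector type.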