13 research outputs found
API design for machine learning software: experiences from the scikit-learn project
scikit-learn is an increasingly popular machine learning library. Written in Python, it is designed to be simple and efficient, accessible to non-experts, and reusable in various contexts. In this paper, we present and discuss our design choices for the application programming interface (API) of the project. In particular, we describe the simple and elegant interface shared by all learning and processing units in the library and then discuss its advantages in terms of composition and reusability. The paper also comments on implementation details specific to the Python ecosystem and analyzes obstacles faced by users and developers of the library.
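The shared interface the abstract describes can be illustrated with a short sketch. Every estimator learns from data through `fit`. Predictors then expose `predict`/`score`, transformers expose `transform`, and a `Pipeline` that chains them is itself an estimator with the same methods. The dataset (iris) and the choice of `StandardScaler` and `LogisticRegression` are illustrative assumptions, not examples taken from the paper.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Transformer: fit() learns the per-feature mean/std, transform() applies them.
scaler = StandardScaler().fit(X_train)

# Predictor: fit() learns the model, score() reports mean accuracy.
clf = LogisticRegression(max_iter=1000).fit(scaler.transform(X_train), y_train)
acc_manual = clf.score(scaler.transform(X_test), y_test)

# Composition: the pipeline has the same fit/score interface as its parts.
pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression(max_iter=1000))])
pipe.fit(X_train, y_train)
acc_pipe = pipe.score(X_test, y_test)

# Nested hyperparameters are addressed as <step>__<param>.
pipe.set_params(clf__C=0.5)
print(acc_manual, acc_pipe, pipe.get_params()["clf__C"])
```

Because the pipeline exposes `get_params`/`set_params` with the `step__parameter` naming, tools such as grid search can tune it exactly like a single estimator. This is one concrete form of the composability the paper discusses.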
Adversarial Attacks on Classifiers for Eye-based User Modelling
An ever-growing body of work has demonstrated the rich information content available in eye movements for user modelling, e.g. for predicting users' activities, cognitive processes, or even personality traits. We show that state-of-the-art classifiers for eye-based user modelling are highly vulnerable to adversarial examples: small artificial perturbations in gaze input that can dramatically change a classifier's predictions. We generate these adversarial examples using the Fast Gradient Sign Method (FGSM), which linearises the loss function to find suitable perturbations. On the sample task of eye-based document type recognition, we study the success of different adversarial attack scenarios: with and without knowledge about classifier gradients (white-box vs. black-box) as well as with and without targeting the attack to a specific class. In addition, we demonstrate the feasibility of defending against adversarial attacks by adding adversarial examples to a classifier's training data.
Comment: 9 pages, 7 figures
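The attack can be sketched on a toy model. Untargeted FGSM perturbs an input x with true label y as x' = x + ε·sign(∇ₓL(x, y)), which increases the loss. A targeted variant instead steps x' = x − ε·sign(∇ₓL(x, t)) to decrease the loss for a chosen target class t. Both need the input gradient, i.e. the white-box setting. The sketch below assumes a hand-picked 3-class linear softmax classifier with 2 features and cross-entropy loss. It is not the gaze classifier from the paper, and ε is deliberately large for such a small toy.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()


def grad_input(W, b, x, label):
    """Gradient of the cross-entropy loss -log p[label] w.r.t. the input x
    for a linear softmax classifier p = softmax(W @ x + b)."""
    p = softmax(W @ x + b)
    p[label] -= 1.0  # p - onehot(label)
    return W.T @ p


def fgsm(W, b, x, eps, label, targeted=False):
    """One FGSM step. Untargeted: increase the loss of the true label.
    Targeted: decrease the loss of the target label."""
    step = eps * np.sign(grad_input(W, b, x, label))
    return x - step if targeted else x + step


def predict(W, b, x):
    return int(np.argmax(W @ x + b))


# Hypothetical toy classifier: 3 classes, 2 input features.
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.zeros(3)
x = np.array([1.0, 0.5])  # classified as class 0

x_untargeted = fgsm(W, b, x, eps=1.5, label=0)
x_targeted = fgsm(W, b, x, eps=1.5, label=2, targeted=True)
print(predict(W, b, x), predict(W, b, x_untargeted), predict(W, b, x_targeted))  # 0 1 2
```

The adversarial-training defence described above would add such perturbed inputs, with their correct label (class 0 here), to the training set.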
Linguistically Informed Information Retrieval for Contact Center Automation
Customer service departments need to handle an increasing volume of textual data in the form of electronic mail. To handle this volume, some kind of automated processing is required. The aim of the research described in this thesis is to employ techniques from the fields of information retrieval (IR) and natural language processing (NLP) to automate part of the customer service pipeline.
UvA-DARE (Digital Academic Repository) Multi-Emotion Detection in User-Generated Reviews
Expressions of emotion abound in user-generated content, whether it be in blogs, reviews, or on social media. Much work has been devoted to detecting and classifying these emotions, but little of it has acknowledged the fact that emotionally charged text may express multiple emotions at the same time. We describe a new dataset of user-generated movie reviews annotated for emotional expressions, and experimentally validate two algorithms that can detect multiple emotions in each sentence of these reviews.
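Detecting several emotions per sentence is a multi-label classification problem. One common baseline, binary relevance, trains one independent binary classifier per emotion. The sketch below shows this with scikit-learn. It is not necessarily either of the two algorithms evaluated in the paper. The sentences, the four-emotion label set, and the TF-IDF features are invented for illustration. Six examples are far too few for meaningful predictions, so only the shape of the data flow matters here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy sentences, each annotated with a *set* of emotions.
sentences = [
    "I loved every minute, what a joy",
    "The ending made me cry and then smile",
    "Terrifying and sad at the same time",
    "Boring, I was angry I paid for this",
    "A scary film that left me shaking",
    "Pure happiness from start to finish",
]
labels = [
    {"joy"},
    {"sadness", "joy"},
    {"fear", "sadness"},
    {"anger"},
    {"fear"},
    {"joy"},
]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)  # one binary column per emotion

# Binary relevance: one logistic regression per emotion column.
model = make_pipeline(TfidfVectorizer(), OneVsRestClassifier(LogisticRegression()))
model.fit(sentences, Y)

pred = model.predict(["a sad and scary story"])  # 0/1 indicator row
print(mlb.classes_, mlb.inverse_transform(pred))
```

Binary relevance ignores correlations between emotions. That simplicity is why it is a common starting point rather than a final answer.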
PCR-GLOBWB_model: eWaterCycle Development Version
This is the alpha release of the PCR-GLOBWB model, as used in the eWaterCycle project.
This version contains a BMI interface for PCR-GLOBWB, among other improvements.
It is strongly advised to use the main version of PCR-GLOBWB whenever possible, as it contains multiple improvements to the model itself that are not in this version.
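BMI (the CSDMS Basic Model Interface) lets a coupling framework drive a model step by step through a fixed set of methods, such as `initialize`, `update`, `get_value`, `set_value` and `finalize`. The toy sketch below implements only a subset of those methods, following the BMI naming convention. The single-bucket model, its variable names (`storage`, `precipitation`) and parameters are hypothetical and have nothing to do with PCR-GLOBWB's actual hydrology or variables.

```python
import numpy as np


class ToyBucketBmi:
    """Hypothetical single-bucket water balance exposing a few BMI-style methods.
    Not PCR-GLOBWB: a minimal illustration of the calling pattern only."""

    def initialize(self, config_file=None):
        # A real model would read its settings from config_file; the toy hardcodes them.
        self._time, self._dt, self._end = 0.0, 1.0, 10.0  # days
        self._storage = np.zeros(1)  # water storage [mm]
        self._precip = np.zeros(1)   # precipitation forcing [mm/day]
        self._k = 0.1                # linear-reservoir outflow coefficient [1/day]

    def update(self):
        # Explicit Euler step: dS/dt = P - k * S
        outflow = self._k * self._storage
        self._storage += (self._precip - outflow) * self._dt
        self._time += self._dt

    def get_current_time(self):
        return self._time

    def get_end_time(self):
        return self._end

    def get_value(self, name, dest):
        # BMI-style: copy the variable into a caller-provided buffer.
        if name != "storage":
            raise KeyError(name)
        dest[:] = self._storage
        return dest

    def set_value(self, name, src):
        if name != "precipitation":
            raise KeyError(name)
        self._precip[:] = src

    def finalize(self):
        pass  # nothing to release in the toy


# A coupling framework would drive the model like this:
model = ToyBucketBmi()
model.initialize()
model.set_value("precipitation", np.array([5.0]))  # constant forcing, 5 mm/day
while model.get_current_time() < model.get_end_time():
    model.update()
storage = model.get_value("storage", np.empty(1))
model.finalize()
print(storage)
```

Because the driver only sees these method names, the same loop could in principle run any model that implements them. This interchangeability is the main motivation for a BMI interface.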
PattyAnalytics
Patty Analytics aims to register point clouds that were generated from photos or video to an absolute position, scale and orientation. Point clouds generated from photos are generally messy; they have holes and floating unidentified objects. In our scripts we assume the following information is available: a map (drivemap) that has an extremely low resolution but good absolute coordinates, and a footprint polygon giving the approximate latitude, longitude and area of the object (x and y coordinates). Finally, we have the high-resolution point cloud of the object. Because of the way this point cloud is created, it is densest at the object, since the photos usually center on it. In some cases, camera positions relative to the object are also available.
Reusable point cloud analytics software, including segmentation, registration and file format conversion. It makes use of the Python bindings of the Point Cloud Library (PCL).
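Registering a cloud to an absolute position, scale and orientation amounts to estimating a similarity transform dst ≈ s·R·src + t. Given corresponding points, one standard closed-form least-squares solution is Umeyama's method, sketched below with NumPy. The sketch assumes that correspondences between the photo-derived cloud and absolutely referenced points are already known. It is not PattyAnalytics' actual pipeline, and the function name `estimate_similarity` and the synthetic test data are hypothetical.

```python
import numpy as np


def estimate_similarity(src, dst):
    """Least-squares s, R, t such that dst ≈ s * R @ src + t (Umeyama's method).
    src, dst: (n, 3) arrays of corresponding points."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)  # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0  # force a proper rotation, not a reflection
    R = U @ S @ Vt
    var_s = (xs ** 2).sum() / len(src)  # variance of the source points
    s = np.trace(np.diag(D) @ S) / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t


# Synthetic check: hypothetical local cloud mapped by a known transform.
rng = np.random.default_rng(0)
src = rng.normal(size=(50, 3))
theta = np.deg2rad(30.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
s_true, t_true = 2.5, np.array([120.0, 480.0, 3.0])
dst = s_true * src @ R_true.T + t_true

s, R, t = estimate_similarity(src, dst)
print(s, t)
```

With noisy real data and unknown correspondences, a closed-form step like this is typically combined with an iterative matching method such as ICP rather than used alone.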
evidence
This release fixes a small bug in how fragments are displayed in the UI.
Texcavator
Texcavator allows you to use full-text search on the newspaper archive of the Dutch Royal Library. On top of that, it allows for visualizations like word clouds, time lines and heat maps. It also provides services to enhance your search experience like filtering, stopword removal, normalization and stemming.
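Query-side normalization and stopword removal can be sketched in a few lines. The stopword list below is a tiny hypothetical sample. Diacritic stripping via Unicode NFKD decomposition is one possible normalization choice. Stemming is left out because it would normally come from a language-specific stemmer. None of this is taken from Texcavator's code.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny Dutch stopword list for illustration only.
STOPWORDS = {"de", "het", "een", "en", "van", "in"}


def normalize(text):
    """Lowercase and strip diacritics (e.g. 'Café' -> 'cafe')."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def preprocess(query):
    """Normalize, tokenize on word characters, and drop stopwords."""
    tokens = re.findall(r"\w+", normalize(query))
    return [tok for tok in tokens if tok not in STOPWORDS]


print(preprocess("De Crisis van het Café"))  # ['crisis', 'cafe']
```

The same function would have to be applied to the indexed documents as well as to queries, otherwise normalized query terms would fail to match unnormalized text.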
python-pcl
Python bindings for the Point Cloud Library (PCL).