76,780 research outputs found
CASSL: Curriculum Accelerated Self-Supervised Learning
Recent self-supervised learning approaches focus on using a few thousand data
points to learn policies for high-level, low-dimensional action spaces.
However, scaling this framework for high-dimensional control require either
scaling up the data collection efforts or using a clever sampling strategy for
training. We present a novel approach - Curriculum Accelerated Self-Supervised
Learning (CASSL) - to train policies that map visual information to high-level,
higher- dimensional action spaces. CASSL orders the sampling of training data
based on control dimensions: the learning and sampling are focused on few
control parameters before other parameters. The right curriculum for learning
is suggested by variance-based global sensitivity analysis of the control
space. We apply our CASSL framework to learning how to grasp using an adaptive,
underactuated multi-fingered gripper, a challenging system to control. Our
experimental results indicate that CASSL provides significant improvement and
generalization compared to baseline methods such as staged curriculum learning
(8% increase) and complete end-to-end learning with random exploration (14%
improvement) tested on a set of novel objects
Contextualised Browsing in a Digital Library's Living Lab
Contextualisation has proven to be effective in tailoring \linebreak search
results towards the users' information need. While this is true for a basic
query search, the usage of contextual session information during exploratory
search especially on the level of browsing has so far been underexposed in
research. In this paper, we present two approaches that contextualise browsing
on the level of structured metadata in a Digital Library (DL), (1) one variant
bases on document similarity and (2) one variant utilises implicit session
information, such as queries and different document metadata encountered during
the session of a users. We evaluate our approaches in a living lab environment
using a DL in the social sciences and compare our contextualisation approaches
against a non-contextualised approach. For a period of more than three months
we analysed 47,444 unique retrieval sessions that contain search activities on
the level of browsing. Our results show that a contextualisation of browsing
significantly outperforms our baseline in terms of the position of the first
clicked item in the result set. The mean rank of the first clicked document
(measured as mean first relevant - MFR) was 4.52 using a non-contextualised
ranking compared to 3.04 when re-ranking the result lists based on similarity
to the previously viewed document. Furthermore, we observed that both
contextual approaches show a noticeably higher click-through rate. A
contextualisation based on document similarity leads to almost twice as many
document views compared to the non-contextualised ranking.Comment: 10 pages, 2 figures, paper accepted at JCDL 201
Vaex: Big Data exploration in the era of Gaia
We present a new Python library called vaex, to handle extremely large
tabular datasets, such as astronomical catalogues like the Gaia catalogue,
N-body simulations or any other regular datasets which can be structured in
rows and columns. Fast computations of statistics on regular N-dimensional
grids allows analysis and visualization in the order of a billion rows per
second. We use streaming algorithms, memory mapped files and a zero memory copy
policy to allow exploration of datasets larger than memory, e.g. out-of-core
algorithms. Vaex allows arbitrary (mathematical) transformations using normal
Python expressions and (a subset of) numpy functions which are lazily evaluated
and computed when needed in small chunks, which avoids wasting of RAM. Boolean
expressions (which are also lazily evaluated) can be used to explore subsets of
the data, which we call selections. Vaex uses a similar DataFrame API as
Pandas, a very popular library, which helps migration from Pandas.
Visualization is one of the key points of vaex, and is done using binned
statistics in 1d (e.g. histogram), in 2d (e.g. 2d histograms with colormapping)
and 3d (using volume rendering). Vaex is split in in several packages:
vaex-core for the computational part, vaex-viz for visualization mostly based
on matplotlib, vaex-jupyter for visualization in the Jupyter notebook/lab based
in IPyWidgets, vaex-server for the (optional) client-server communication,
vaex-ui for the Qt based interface, vaex-hdf5 for hdf5 based memory mapped
storage, vaex-astro for astronomy related selections, transformations and
memory mapped (column based) fits storage. Vaex is open source and available
under MIT license on github, documentation and other information can be found
on the main website: https://vaex.io, https://docs.vaex.io or
https://github.com/maartenbreddels/vaexComment: 14 pages, 8 figures, Submitted to A&A, interactive version of Fig 4:
https://vaex.io/paper/fig
A geo-temporal information extraction service for processing descriptive metadata in digital libraries
In the context of digital map libraries, resources are usually described according to metadata records that define the relevant subject, location, time-span, format and keywords. On what concerns locations and time-spans, metadata records are often incomplete or they provide information in a way that is not machine-understandable (e.g. textual descriptions). This paper presents techniques for extracting geotemporal information from text, using relatively simple text mining methods that leverage on a Web gazetteer service. The idea is to go from human-made geotemporal referencing (i.e. using place and period names in textual expressions) into geo-spatial coordinates and time-spans. A prototype system, implementing the proposed methods, is described in detail. Experimental results demonstrate the efficiency and accuracy of the proposed approaches
Customer churn prediction in telecom using machine learning and social network analysis in big data platform
Customer churn is a major problem and one of the most important concerns for
large companies. Due to the direct effect on the revenues of the companies,
especially in the telecom field, companies are seeking to develop means to
predict potential customer to churn. Therefore, finding factors that increase
customer churn is important to take necessary actions to reduce this churn. The
main contribution of our work is to develop a churn prediction model which
assists telecom operators to predict customers who are most likely subject to
churn. The model developed in this work uses machine learning techniques on big
data platform and builds a new way of features' engineering and selection. In
order to measure the performance of the model, the Area Under Curve (AUC)
standard measure is adopted, and the AUC value obtained is 93.3%. Another main
contribution is to use customer social network in the prediction model by
extracting Social Network Analysis (SNA) features. The use of SNA enhanced the
performance of the model from 84 to 93.3% against AUC standard. The model was
prepared and tested through Spark environment by working on a large dataset
created by transforming big raw data provided by SyriaTel telecom company. The
dataset contained all customers' information over 9 months, and was used to
train, test, and evaluate the system at SyriaTel. The model experimented four
algorithms: Decision Tree, Random Forest, Gradient Boosted Machine Tree "GBM"
and Extreme Gradient Boosting "XGBOOST". However, the best results were
obtained by applying XGBOOST algorithm. This algorithm was used for
classification in this churn predictive model.Comment: 24 pages, 14 figures. PDF https://rdcu.be/budK
TopExNet: Entity-Centric Network Topic Exploration in News Streams
The recent introduction of entity-centric implicit network representations of
unstructured text offers novel ways for exploring entity relations in document
collections and streams efficiently and interactively. Here, we present
TopExNet as a tool for exploring entity-centric network topics in streams of
news articles. The application is available as a web service at
https://topexnet.ifi.uni-heidelberg.de/ .Comment: Published in Proceedings of the Twelfth ACM International Conference
on Web Search and Data Mining, WSDM 2019, Melbourne, VIC, Australia, February
11-15, 201
- …