
CASSL: Curriculum Accelerated Self-Supervised Learning

Author: Gandhi Dhiraj
Gupta Abhinav
Murali Adithyavairavan
Pinto Lerrel
Publication date: 12/02/2018

Recent self-supervised learning approaches focus on using a few thousand data points to learn policies for high-level, low-dimensional action spaces. However, scaling this framework to high-dimensional control requires either scaling up the data collection effort or using a clever sampling strategy for training. We present a novel approach - Curriculum Accelerated Self-Supervised Learning (CASSL) - to train policies that map visual information to high-level, higher-dimensional action spaces. CASSL orders the sampling of training data based on control dimensions: the learning and sampling are focused on a few control parameters before other parameters. The right curriculum for learning is suggested by variance-based global sensitivity analysis of the control space. We apply our CASSL framework to learning how to grasp using an adaptive, underactuated multi-fingered gripper, a challenging system to control. Our experimental results indicate that CASSL provides significant improvement and generalization compared to baseline methods such as staged curriculum learning (8% increase) and complete end-to-end learning with random exploration (14% improvement) tested on a set of novel objects.

arXiv.org e-Print Archive

Crossref

Contextualised Browsing in a Digital Library's Living Lab

Author: Belkin Nicholas J.
Carevic Zeljko
Kanoulas Evangelos
Mayr Philipp
Pharo Nils
Sepliarskaia Anna
White Ryen W
Publication venue: Association for Computing Machinery (ACM)
Publication date: 17/04/2018

Contextualisation has proven to be effective in tailoring search results towards the users' information need. While this is true for a basic query search, the usage of contextual session information during exploratory search, especially on the level of browsing, has so far been underexposed in research. In this paper, we present two approaches that contextualise browsing on the level of structured metadata in a Digital Library (DL): (1) one variant is based on document similarity and (2) one variant utilises implicit session information, such as queries and different document metadata encountered during a user's session. We evaluate our approaches in a living lab environment using a DL in the social sciences and compare our contextualisation approaches against a non-contextualised approach. For a period of more than three months we analysed 47,444 unique retrieval sessions that contain search activities on the level of browsing. Our results show that a contextualisation of browsing significantly outperforms our baseline in terms of the position of the first clicked item in the result set. The mean rank of the first clicked document (measured as mean first relevant - MFR) was 4.52 using a non-contextualised ranking compared to 3.04 when re-ranking the result lists based on similarity to the previously viewed document. Furthermore, we observed that both contextual approaches show a noticeably higher click-through rate. A contextualisation based on document similarity leads to almost twice as many document views compared to the non-contextualised ranking. Comment: 10 pages, 2 figures, paper accepted at JCDL 201
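The MFR figure quoted in the abstract above is the mean rank of the first clicked item across sessions. A minimal sketch of that calculation follows. The function names and the session representation (one list of click flags per result list) are hypothetical, and skipping sessions without any click is an assumption made here, not something the abstract states.

```python
from statistics import mean


def first_click_rank(clicked_flags):
    """Return the 1-based rank of the first clicked item, or None if nothing was clicked."""
    for rank, clicked in enumerate(clicked_flags, start=1):
        if clicked:
            return rank
    return None


def mean_first_relevant(sessions):
    """Average the first-click rank over sessions.

    Assumption: sessions with no click are ignored rather than penalised.
    """
    ranks = [r for r in map(first_click_rank, sessions) if r is not None]
    return mean(ranks) if ranks else None


# Hypothetical click logs: each inner list is one ranked result list.
example_sessions = [
    [False, True],                 # first click at rank 2
    [True],                        # first click at rank 1
    [False, False, False, True],   # first click at rank 4
    [False, False],                # no click, skipped
]
mfr = mean_first_relevant(example_sessions)
```

A lower value means the first clicked document sits nearer the top of the list, which is the direction of improvement the abstract reports.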

arXiv.org e-Print Archive

Crossref

Vaex: Big Data exploration in the era of Gaia

Author: Breddels Maarten A.
Veljanoski Jovan
Publication venue: EDP Sciences
Publication date: 08/01/2018

We present a new Python library called vaex, to handle extremely large tabular datasets, such as astronomical catalogues like the Gaia catalogue, N-body simulations or any other regular datasets which can be structured in rows and columns. Fast computations of statistics on regular N-dimensional grids allow analysis and visualization in the order of a billion rows per second. We use streaming algorithms, memory mapped files and a zero memory copy policy to allow exploration of datasets larger than memory, e.g. out-of-core algorithms. Vaex allows arbitrary (mathematical) transformations using normal Python expressions and (a subset of) numpy functions which are lazily evaluated and computed when needed in small chunks, which avoids wasting RAM. Boolean expressions (which are also lazily evaluated) can be used to explore subsets of the data, which we call selections. Vaex uses a DataFrame API similar to that of Pandas, a very popular library, which helps migration from Pandas. Visualization is one of the key points of vaex, and is done using binned statistics in 1d (e.g. histogram), in 2d (e.g. 2d histograms with colormapping) and 3d (using volume rendering). Vaex is split into several packages: vaex-core for the computational part, vaex-viz for visualization mostly based on matplotlib, vaex-jupyter for visualization in the Jupyter notebook/lab based on IPyWidgets, vaex-server for the (optional) client-server communication, vaex-ui for the Qt based interface, vaex-hdf5 for hdf5 based memory mapped storage, vaex-astro for astronomy related selections, transformations and memory mapped (column based) fits storage. Vaex is open source and available under the MIT license on github; documentation and other information can be found on the main website: https://vaex.io, https://docs.vaex.io or https://github.com/maartenbreddels/vaex Comment: 14 pages, 8 figures, Submitted to A&A, interactive version of Fig 4: https://vaex.io/paper/fig
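The abstract above describes binned statistics computed by streaming over small chunks, with expressions and selections evaluated only when a chunk is processed. The sketch below illustrates that general idea in plain Python. It is not the vaex API: `chunked`, `binned_counts`, and their parameters are hypothetical names introduced for illustration.

```python
def chunked(values, size):
    """Yield successive slices of `values`, standing in for reading a large file in pieces."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def binned_counts(chunks, lo, hi, n_bins, expr=lambda x: x, select=lambda x: True):
    """Count values on a regular 1d grid, one chunk at a time.

    `expr` is a transformation and `select` a boolean filter; both are
    applied only while a chunk is being processed, so no transformed copy
    of the whole dataset is ever held in memory.
    """
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for chunk in chunks:
        for x in chunk:
            if not select(x):
                continue
            y = expr(x)
            if lo <= y < hi:
                # Clamp to guard against floating-point rounding at the upper edge.
                index = min(int((y - lo) / width), n_bins - 1)
                counts[index] += 1
    return counts


# Histogram of 0..9 in five bins, processed three values at a time.
plain = binned_counts(chunked(list(range(10)), 3), 0, 10, 5)

# Histogram of x*x for the even values of 0..5 only.
squared_even = binned_counts(
    chunked(list(range(6)), 2), 0, 20, 4,
    expr=lambda x: x * x,
    select=lambda x: x % 2 == 0,
)
```

Only the fixed-size `counts` grid persists between chunks, which is what makes this kind of computation workable on data larger than memory.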

arXiv.org e-Print Archive

Proceedings - University of Groningen

University of Groningen

EDP Sciences OAI-PMH repository (1.2.0)

ARTS repository - University of Groningen

Dissertations of the University of Groningen

A geo-temporal information extraction service for processing descriptive metadata in digital libraries

Author: Borbinha José
Manguinhas H.
Martins Bruno
Siabato Vaca Willington Libardo
Publication venue: E.T.S.I. en Topografía, Geodesia y Cartografía (UPM)
Publication date: 01/01/2009

In the context of digital map libraries, resources are usually described according to metadata records that define the relevant subject, location, time-span, format and keywords. Regarding locations and time-spans, metadata records are often incomplete or they provide information in a way that is not machine-understandable (e.g. textual descriptions). This paper presents techniques for extracting geotemporal information from text, using relatively simple text mining methods that leverage a Web gazetteer service. The idea is to go from human-made geotemporal referencing (i.e. using place and period names in textual expressions) to geo-spatial coordinates and time-spans. A prototype system, implementing the proposed methods, is described in detail. Experimental results demonstrate the efficiency and accuracy of the proposed approaches.

Archivo Digital UPM

Customer churn prediction in telecom using machine learning and social network analysis in big data platform

Author: Ahmad Abdelrahim Kasem
Aljoumaa Kadan
Jafar Assef
Publication venue: Springer Science and Business Media LLC
Publication date: 01/03/2019

Customer churn is a major problem and one of the most important concerns for large companies. Due to the direct effect on the revenues of the companies, especially in the telecom field, companies are seeking to develop means to predict which customers are likely to churn. Therefore, finding factors that increase customer churn is important in order to take the necessary actions to reduce this churn. The main contribution of our work is to develop a churn prediction model which assists telecom operators in predicting the customers who are most likely to churn. The model developed in this work uses machine learning techniques on a big data platform and builds a new way of feature engineering and selection. In order to measure the performance of the model, the Area Under Curve (AUC) standard measure is adopted, and the AUC value obtained is 93.3%. Another main contribution is to use the customer social network in the prediction model by extracting Social Network Analysis (SNA) features. The use of SNA enhanced the performance of the model from 84% to 93.3% as measured by AUC. The model was prepared and tested in a Spark environment by working on a large dataset created by transforming big raw data provided by the SyriaTel telecom company. The dataset contained all customers' information over 9 months, and was used to train, test, and evaluate the system at SyriaTel. The model was tested with four algorithms: Decision Tree, Random Forest, Gradient Boosted Machine Tree "GBM" and Extreme Gradient Boosting "XGBOOST". The best results were obtained by applying the XGBOOST algorithm, which was used for classification in this churn predictive model. Comment: 24 pages, 14 figures. PDF https://rdcu.be/budK

arXiv.org e-Print Archive

Directory of Open Access Journals

TopExNet: Entity-Centric Network Topic Exploration in News Streams

Author: Almasian Satya
Gertz Michael
Spitz Andreas
Publication venue: Association for Computing Machinery (ACM)
Publication date: 31/05/2019

The recent introduction of entity-centric implicit network representations of unstructured text offers novel ways for exploring entity relations in document collections and streams efficiently and interactively. Here, we present TopExNet as a tool for exploring entity-centric network topics in streams of news articles. The application is available as a web service at https://topexnet.ifi.uni-heidelberg.de/. Comment: Published in Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, WSDM 2019, Melbourne, VIC, Australia, February 11-15, 201

arXiv.org e-Print Archive

Crossref