Search CORE

8,677 research outputs found

Analyzing and Visualizing State Sequences in R with TraMineR

Author: Alexis Gabadinho
Gilbert Ritschard
Matthias Studer
Nicolas S Müller
Publication venue
Publication date
Field of study

This article describes the many capabilities offered by the TraMineR toolbox for categorical sequence data. It focuses more specifically on the analysis and rendering of state sequences. Addressed features include the description of sets of sequences by means of transversal aggregated views, the computation of longitudinal characteristics of individual sequences and the measure of pairwise dissimilarities. Special emphasis is put on the multiple ways of visualizing sequences. The core element of the package is the state se- quence object in which we store the set of sequences together with attributes such as the alphabet, state labels and the color palette. The functions can then easily retrieve this information to ensure presentation homogeneity across all printed and graphical displays. The article also demonstrates how TraMineRÃ¢ÂÂs outcomes give access to advanced analyses such as clustering and statistical modeling of sequence data.

Research Papers in Economics

Feature-based time-series analysis

Author: Fulcher Ben D.
Publication venue
Publication date: 01/10/2017
Field of study

This work presents an introduction to feature-based time-series analysis. The time series as a data type is first described, along with an overview of the interdisciplinary time-series analysis literature. I then summarize the range of feature-based representations for time series that have been developed to aid interpretable insights into time-series structure. Particular emphasis is given to emerging research that facilitates wide comparison of feature-based representations that allow us to understand the properties of a time-series dataset that make it suited to a particular feature-based representation or analysis algorithm. The future of time-series analysis is likely to embrace approaches that exploit machine learning methods to partially automate human learning to aid understanding of the complex dynamical patterns in the time series we measure from the world.Comment: 28 pages, 9 figure

arXiv.org e-Print Archive

Crossref

Forecasting the cost of processing multi-join queries via hashing for main-memory databases (Extended version)

Author: Ailamaki A.
Boncz P. A.
Boncz P. A.
Chen M.-S.
Chen M.-S.
DeWitt D. J.
Lang H.
Li Y.
Liu B.
Lohman G. M.
Lu H.
Manegold S.
Ono K.
Schneider D. A.
Shatdal A.
Stillger M.
Zhang N.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 21/07/2015
Field of study

Database management systems (DBMSs) carefully optimize complex multi-join queries to avoid expensive disk I/O. As servers today feature tens or hundreds of gigabytes of RAM, a significant fraction of many analytic databases becomes memory-resident. Even after careful tuning for an in-memory environment, a linear disk I/O model such as the one implemented in PostgreSQL may make query response time predictions that are up to 2X slower than the optimal multi-join query plan over memory-resident data. This paper introduces a memory I/O cost model to identify good evaluation strategies for complex query plans with multiple hash-based equi-joins over memory-resident data. The proposed cost model is carefully validated for accuracy using three different systems, including an Amazon EC2 instance, to control for hardware-specific differences. Prior work in parallel query evaluation has advocated right-deep and bushy trees for multi-join queries due to their greater parallelization and pipelining potential. A surprising finding is that the conventional wisdom from shared-nothing disk-based systems does not directly apply to the modern shared-everything memory hierarchy. As corroborated by our model, the performance gap between the optimal left-deep and right-deep query plan can grow to about 10X as the number of joins in the query increases.Comment: 15 pages, 8 figures, extended version of the paper to appear in SoCC'1

arXiv.org e-Print Archive

Crossref

Unsupervised, Efficient and Semantic Expertise Retrieval

Author: Bailey P.
Balog K.
Cao Y.
Craswell N.
Craswell N.
Davenport T. H.
Glorot X.
Hinton G. E.
Kiros R.
Maybury M. T.
Mikolov T.
Mikolov T.
Mnih A.
Mnih A.
Moreira C.
Rumelhart D.
Shaw J. A.
Sorg P.
Vapnik V.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016
Field of study

We introduce an unsupervised discriminative model for the task of retrieving experts in online document collections. We exclusively employ textual evidence and avoid explicit feature engineering by learning distributed word representations in an unsupervised way. We compare our model to state-of-the-art unsupervised statistical vector space and probabilistic generative approaches. Our proposed log-linear model achieves the retrieval performance levels of state-of-the-art document-centric methods with the low inference cost of so-called profile-centric approaches. It yields a statistically significant improved ranking over vector space and generative models in most cases, matching the performance of supervised methods on various benchmarks. That is, by using solely text we can do as well as methods that work with external evidence and/or relevance feedback. A contrastive analysis of rankings produced by discriminative and generative approaches shows that they have complementary strengths due to the ability of the unsupervised discriminative model to perform semantic matching.Comment: WWW2016, Proceedings of the 25th International Conference on World Wide Web. 201

arXiv.org e-Print Archive

Crossref

UvA-DARE

International Migration, Integration and Social Cohesion online publications