
    Toward a generic representation of random variables for machine learning

    This paper presents a pre-processing step and a distance which improve the performance of machine learning algorithms working on independent and identically distributed stochastic processes. We introduce a novel non-parametric approach to represent random variables which splits apart dependency and distribution without losing any information. We also propose an associated metric leveraging this representation and its statistical estimate. Besides experiments on synthetic datasets, the benefits of our contribution are illustrated through the example of clustering financial time series, for instance prices from the credit default swaps market. Results are available on the website www.datagrapple.com and an IPython Notebook tutorial is available at www.datagrapple.com/Tech for reproducible research. Comment: submitted to Pattern Recognition Letters.
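
    The abstract describes a representation that separates a variable's dependence structure from its marginal distribution, plus a metric built on both parts. The sketch below is a minimal illustration of that idea, assuming a distance that mixes a rank-based (copula-like) term with a term comparing empirical quantile functions; the function name, the mixing weight theta, and the equal-length-sample assumption are illustrative, not the authors' exact formulation.

```python
import numpy as np
from scipy.stats import rankdata

def blended_distance(x, y, theta=0.5):
    """Illustrative distance between two equal-length samples.

    Mixes a dependency term (squared distance between normalized ranks,
    i.e. a copula-like view of the joint behaviour) with a distribution
    term (squared distance between sorted samples, i.e. the empirical
    quantile functions). theta weights the two parts.
    """
    n = len(x)
    u, v = rankdata(x) / n, rankdata(y) / n          # dependency part
    dep = np.mean((u - v) ** 2)
    dist = np.mean((np.sort(x) - np.sort(y)) ** 2)   # distribution part
    return np.sqrt(theta * dep + (1 - theta) * dist)

# Usage: two correlated series with different marginal distributions.
rng = np.random.default_rng(0)
a = rng.normal(size=1000)
b = 0.8 * a + 0.2 * rng.standard_t(df=3, size=1000)
print(blended_distance(a, b))
```

    Weighting theta toward the dependency term groups variables that move together regardless of their marginals, while weighting toward the distribution term groups variables with similar marginals regardless of co-movement.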

    Common Biases In Business Research


    Methodological considerations concerning manual annotation of musical audio in function of algorithm development

    In research on musical audio-mining, annotated music databases are needed which allow the development of computational tools that extract from the musical audio stream the kind of high-level content that users can deal with in Music Information Retrieval (MIR) contexts. The notion of musical content, and therefore the notion of annotation, is ill-defined, however, in both the syntactic and the semantic sense. As a consequence, annotation has been approached from a variety of perspectives (though mainly linguistic-symbolic in orientation), and a general methodology is lacking. This paper is a step towards the definition of a general framework for manual annotation of musical audio in support of a computational approach to musical audio-mining based on algorithms that learn from annotated data.
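
    As a concrete illustration of what "annotated data" could look like in such a framework, the sketch below shows one possible schema for time-stamped, layered annotations of a recording; the field names and the JSON serialization are assumptions for illustration, not a format proposed in the paper.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class Annotation:
    """One manually annotated segment of a recording (illustrative schema).

    start/end are in seconds; layer names the annotation perspective
    (e.g. 'structure', 'instrumentation', 'mood'); annotator records who
    produced the label so inter-annotator agreement can be checked later.
    """
    recording_id: str
    start: float
    end: float
    layer: str
    label: str
    annotator: str

annotations = [
    Annotation("track_001", 0.0, 14.2, "structure", "intro", "ann_A"),
    Annotation("track_001", 14.2, 45.8, "structure", "verse", "ann_A"),
    Annotation("track_001", 0.0, 45.8, "mood", "calm", "ann_B"),
]

# Serialize to JSON so the annotated database can feed learning algorithms.
print(json.dumps([asdict(a) for a in annotations], indent=2))
```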

    Why We Read Wikipedia

    Wikipedia is one of the most popular sites on the Web, with millions of users relying on it to satisfy a broad range of information needs every day. Although it is crucial to understand what exactly these needs are in order to be able to meet them, little is currently known about why users visit Wikipedia. The goal of this paper is to fill this gap by combining a survey of Wikipedia readers with a log-based analysis of user activity. Based on an initial series of user surveys, we build a taxonomy of Wikipedia use cases along several dimensions, capturing users' motivations to visit Wikipedia, the depth of knowledge they are seeking, and their knowledge of the topic of interest prior to visiting Wikipedia. Then, we quantify the prevalence of these use cases via a large-scale user survey conducted on live Wikipedia with almost 30,000 responses. Our analyses highlight the variety of factors driving users to Wikipedia, such as current events, media coverage of a topic, personal curiosity, work or school assignments, or boredom. Finally, we match survey responses to the respondents' digital traces in Wikipedia's server logs, enabling the discovery of behavioral patterns associated with specific use cases. For instance, we observe long and fast-paced page sequences across topics for users who are bored or exploring randomly, whereas those using Wikipedia for work or school spend more time on individual articles focused on topics such as science. Our findings advance our understanding of reader motivations and behavior on Wikipedia and can have implications for developers aiming to improve Wikipedia's user experience, editors striving to cater to their readers' needs, third-party services (such as search engines) providing access to Wikipedia content, and researchers aiming to build tools such as recommendation engines. Comment: Published in WWW'17; v2 fixes caption of Table
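
    A minimal sketch of the log-matching step described above, assuming hypothetical column names: survey responses keyed by session are joined to per-session behavioural signals (pages viewed, time per page) derived from pageview logs. This illustrates the general approach, not the paper's actual pipeline or schema.

```python
import pandas as pd

# Hypothetical log of pageviews: one row per article view within a session.
logs = pd.DataFrame({
    "session_id": ["s1", "s1", "s1", "s2", "s2"],
    "article":    ["Cat", "Dog", "Fox", "Entropy", "Boltzmann"],
    "dwell_sec":  [20, 15, 10, 240, 300],
})

# Hypothetical survey responses, one row per session that answered the survey.
survey = pd.DataFrame({
    "session_id": ["s1", "s2"],
    "use_case":   ["bored/random", "work/school"],
})

# Per-session behavioural signals: how many pages, and how long per page.
per_session = logs.groupby("session_id").agg(
    pages=("article", "count"),
    median_dwell_sec=("dwell_sec", "median"),
).reset_index()

# Join behaviour to the self-reported use case and compare the groups.
joined = per_session.merge(survey, on="session_id")
print(joined.groupby("use_case")[["pages", "median_dwell_sec"]].mean())
```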

    Honesty and Integrity in Economics

    When economists are looked at individually, there is little reason to think that they lack integrity or are dishonest. Yet, when we look at academic papers written by economists, we can see biases. This paper tries to reconcile these two observations by arguing that the constraints the profession sets on permitted practices are loose enough to allow economists to maintain their biases while conforming to the mores of their profession. There is little reason to think that economics is worse in this respect than some other fields. Keywords: honesty, integrity, culture of economics, significance tests, data mining

    Fundamental principles in drawing inference from sequence analysis

    Individual life courses are dynamic and can be represented as a sequence of states for some portion of their experience. More generally, such sequences have been studied in many fields across the social sciences, for example sociology, linguistics and psychology, and the conceptualisation of subjects progressing through a sequence of states is common. However, many models and datasets allow only for the treatment of aggregates or transitions rather than the interpretation of whole sequences. The temporal aspect of the analysis is fundamental to any inference about the evolution of the subjects, but assumptions about time are not normally made explicit. Moreover, without a clear idea of what sequences look like, it is impossible to determine, when something is not seen, whether it was actually absent. Some principles are proposed which link the ideas of sequences, hypothesis, analytical framework, categorisation and representation, each being underpinned by the consideration of time. To make inferences about sequences, one needs to understand what the sequences represent, the hypotheses and assumptions that can be made about them, the categories within them, and the data representation at each stage. These ideas are obvious in themselves, but they are interlinked, imposing restrictions on each other and on the inferences which can be drawn.
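
    The contrast between aggregate or transition views and whole-sequence interpretation can be made concrete with a small sketch. Below, two hypothetical state sequences produce identical transition counts yet differ as trajectories, which a whole-sequence comparison (here a simple edit distance, used as one stand-in for sequence-analysis distances) does detect. The state coding and the sequences are invented for illustration.

```python
from collections import Counter

# Two hypothetical life-course sequences coded as yearly states:
# S = in school, E = employed, U = unemployed.
seq_a = list("SSEEEEUUEE")
seq_b = list("SSEEUUEEEE")

# Aggregate view: transition counts discard the ordering of the whole sequence.
print(Counter(zip(seq_a, seq_a[1:])))   # same counts for both sequences...
print(Counter(zip(seq_b, seq_b[1:])))

# Whole-sequence view: an edit (Levenshtein) distance keeps the ordering.
def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

print(edit_distance(seq_a, seq_b))      # ...yet the trajectories differ
```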

    On the Optimization of Visualizations of Complex Phenomena

    The problem of perceptually optimizing complex visualizations is a difficult one, involving perceptual as well as aesthetic issues. In our experience, controlled experiments are quite limited in their ability to uncover interrelationships among visualization parameters, and thus may not be the most useful way to develop rules of thumb or theory to guide the production of high-quality visualizations. In this paper, we propose a new experimental approach to optimizing visualization quality that integrates some of the strong points of controlled experiments with methods more suited to investigating complex, highly coupled phenomena. We use human-in-the-loop experiments to search through visualization parameter space, generating large databases of rated visualization solutions. This is followed by data mining to extract results such as exemplar visualizations, guidelines for producing visualizations, and hypotheses about strategies leading to strong visualizations. The approach can easily address both perceptual and aesthetic concerns, and can handle complex parameter interactions. We suggest a genetic algorithm as a valuable way of guiding the human-in-the-loop search through visualization parameter space. We describe our methods for using clustering, histogramming, principal component analysis, and neural networks for data mining. The experimental approach is illustrated with a study of the problem of optimal texturing for viewing layered surfaces so that both surfaces are maximally observable.
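
    A minimal sketch of the human-in-the-loop genetic search described above, assuming a small vector of numeric visualization parameters and a placeholder scoring function standing in for the human rating of each rendered visualization; the operators and settings are generic illustrations, not the study's configuration. Every rated candidate is kept so the resulting database can later be mined (clustered, reduced with PCA, and so on).

```python
import random

N_PARAMS = 4      # e.g. texture density, contrast, opacity, luminance (assumed)
POP_SIZE = 8
GENERATIONS = 5

def rate(params):
    """Placeholder for the human-in-the-loop rating.

    In the experiments a person would view the visualization rendered from
    `params` and assign a quality score; here a synthetic score is returned
    so the sketch runs on its own.
    """
    return -sum((p - 0.5) ** 2 for p in params) + random.gauss(0, 0.01)

def crossover(a, b):
    cut = random.randrange(1, N_PARAMS)
    return a[:cut] + b[cut:]

def mutate(p, prob=0.2, scale=0.1):
    return [min(1.0, max(0.0, x + random.gauss(0, scale))) if random.random() < prob else x
            for x in p]

population = [[random.random() for _ in range(N_PARAMS)] for _ in range(POP_SIZE)]
rated = []  # database of (params, score) pairs for later data mining

for _ in range(GENERATIONS):
    scored = sorted(((rate(p), p) for p in population), reverse=True)
    rated.extend((p, s) for s, p in scored)
    parents = [p for _, p in scored[: POP_SIZE // 2]]      # best-rated settings survive
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP_SIZE - len(parents))]
    population = parents + children

best_score, best_params = max((rate(p), p) for p in population)
print("best parameter vector:", [round(x, 2) for x in best_params])
print("rated visualizations collected:", len(rated))
```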