30 research outputs found

    Discovering Periodic Patterns in Historical News

    We address the problem of observing periodic changes in the behaviour of a large population, by analysing the daily contents of newspapers published in the United States and United Kingdom from 1836 to 1922. This is done by analysing the daily time series of the relative frequency of the 25K most frequent words for each country, resulting in the study of 50K time series for 31,755 days. Behaviours that are found to be strongly periodic include seasonal activities, such as hunting and harvesting. A strong connection with natural cycles is found, with a pronounced presence of fruits, vegetables, flowers and game. Periodicities dictated by religious or civil calendars are also detected and show a different wave-form than those provoked by weather. States that can be revealed include the presence of infectious disease, with clear annual peaks for fever, pneumonia and diarrhoea. Overall, 2% of the words are found to be strongly periodic, and the period most frequently found is 365 days. Comparisons between UK and US, and between modern and historical news, reveal how the fundamental cycles of life are shaped by the seasons, but also how this effect has been reduced in modern times.

    A Dataset for Learning Graph Representations to Predict Customer Returns in Fashion Retail

    We present a novel dataset collected by ASOS (a major online fashion retailer) to address the challenge of predicting customer returns in a fashion retail ecosystem. With the release of this substantial dataset we hope to motivate further collaboration between research communities and the fashion industry. We first explore the structure of this dataset with a focus on the application of Graph Representation Learning in order to exploit the natural data structure and provide statistical insights into particular features within the data. In addition to this, we show examples of a return prediction classification task with a selection of baseline models (i.e. with no intermediate representation learning step) and a graph representation based model. We show that in a downstream return prediction classification task, an F1-score of 0.792 can be found using a Graph Neural Network (GNN), improving upon other models discussed in this work. Alongside this increased F1-score, we also present a lower cross-entropy loss by recasting the data into a graph structure, indicating more robust predictions from a GNN based solution. These results provide evidence that GNNs could provide more impactful and usable classifications than other baseline models on the presented dataset and with this motivation, we hope to encourage further research into graph-based approaches using the ASOS GraphReturns dataset. Comment: The ASOS GraphReturns dataset can be found at https://osf.io/c793h/. Accepted at FashionXRecSys 2022 workshop. Published Version.

    ASOS graph returns dataset

    No full text
    A graph dataset of anonymised customer returns in online fashion retail.

    Représentation et apprentissage à partir de textes pour des informations émotionnelles et pour des informations dynamiques

    No full text
    Automatic knowledge extraction from texts consists in mapping low-level information, as carried by the words and phrases extracted from documents, to higher-level information. The choice of data representation for describing documents is, thus, essential and the definition of a learning algorithm is subject to their specifics. This thesis addresses these two issues in the context of emotional information on the one hand and dynamic information on the other. In the first part, we consider the task of emotion extraction, for which the semantic gap is wider than it is with more traditional thematic information. Therefore, we propose to study representations aimed at modeling the many nuances of natural language used for describing emotional, hence subjective, information. Furthermore, we propose to study the integration of semantic knowledge which provides, from a characterization perspective, support for extracting the emotional content of documents and, from a prediction perspective, assistance to the learning algorithm. In the second part, we study information dynamics: any corpus of documents published over the Internet can be associated with sources in perpetual activity which exchange information in a continuous movement. We explore three main lines of work: automatically identified sources; the communities they form in a dynamic and very sparse description space; and the noteworthy themes they develop. For each we propose original extraction methods which we apply to a corpus of real data we have collected from information streams over the Internet.

    Data from: Circadian mood variations in Twitter content

    No full text
    Background: Circadian regulation of sleep, cognition, and metabolic state is driven by a central clock, which is in turn entrained by environmental signals. Understanding the circadian regulation of mood, which is vital for coping with day-to-day needs, requires large datasets and has classically utilised subjective reporting. Methods: In this study, we use a massive dataset of over 800 million Twitter messages collected over 4 years in the United Kingdom. We extract robust signals of the changes that happened during the course of the day in the collective expression of emotions and fatigue. We use methods of statistical analysis and Fourier analysis to identify periodic structures, extrema, change-points, and compare the stability of these events across seasons and weekends. Results: We reveal strong, but different, circadian patterns for positive and negative moods. The cycles of fatigue and anger appear remarkably stable across seasons and weekend/weekday boundaries. Positive mood and sadness interact more in response to these changing conditions. Anger and, to a lesser extent, fatigue show a pattern that inversely mirrors the known circadian variation of plasma cortisol concentrations. Most quantities show a strong inflexion in the morning. Conclusion: Since circadian rhythm and sleep disorders have been reported across the whole spectrum of mood disorders, we suggest that analysis of social media could provide a valuable resource to the understanding of mental disorders.

    Diurnal variations of psychometric indicators in Twitter content

    The psychological state of a person is characterised by cognitive and emotional variables which can be inferred by psychometric methods. Using the word lists from the Linguistic Inquiry and Word Count, designed to infer a range of psychological states from the word usage of a person, we studied temporal changes in the average expression of psychological traits in the general population. We sampled the contents of Twitter in the United Kingdom at hourly intervals for a period of four years, revealing a strong diurnal rhythm in most of the psychometric variables, and finding that two independent factors can explain 85% of the variance across their 24-h profiles. The first has peak expression time starting at 5am/6am; it correlates with measures of analytical thinking, with the language of drive (e.g. power and achievement), and personal concerns. It is anticorrelated with the language of negative affect and social concerns. The second factor has peak expression time starting at 3am/4am; it correlates with the language of existential concerns, and anticorrelates with expression of positive emotions. Overall, we see strong evidence that our language changes dramatically between night and day, reflecting changes in our concerns and underlying cognitive and emotional processes. These shifts occur at times associated with major changes in neural activity and hormonal levels.

    FindMyPast Daily Words

    No full text
    Time series of daily frequencies of 25K words over 87 years in UK historical newspapers. Release of the daily frequency of the 25K most published words in news content in the United Kingdom between 1st January 1836 and 31st December 1922. The frequency was measured from a representative set of newspapers across the United Kingdom at the time.

    Levels of periodicity in the temporal expression of the 73 psychometric variables at period 24-h.

    No full text
    The label of a category has size proportional to the percentage of variance explained in the four-year fluctuations of the indicator by the profile of diurnal variation.