5 research outputs found

    Two to Five Truths in Non-Negative Matrix Factorization

    In this paper, we explore the role of matrix scaling on a matrix of counts when building a topic model using non-negative matrix factorization. We present a scaling inspired by the normalized Laplacian (NL) for graphs that can greatly improve the quality of a non-negative matrix factorization. The results parallel those in the spectral graph clustering work of \cite{Priebe:2019}, where the authors proved that spectral clustering with adjacency spectral embedding (ASE) was more likely to discover core-periphery partitions and that Laplacian spectral embedding (LSE) was more likely to discover affinity partitions. In text analysis, non-negative matrix factorization (NMF) is typically applied to a matrix of co-occurrence counts of "contexts" and "terms". The matrix scaling inspired by LSE gives significant improvement for text topic models in a variety of datasets. We illustrate the dramatic difference that matrix scaling can make to the quality of an NMF topic model on three datasets where human annotation is available. Using the adjusted Rand index (ARI), a measure of cluster similarity, we see an increase of 50% for Twitter data and over 200% for a newsgroup dataset versus using counts, which is the analogue of ASE. For clean data, such as those from the Document Understanding Conference, NL gives over 40% improvement over ASE. We conclude with some analysis of this phenomenon and some connections of this scaling with other matrix scaling methods.
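The abstract does not give the exact form of the scaling, so the following is only a minimal sketch under an assumption: that the NL-inspired scaling divides each count by the square root of the product of its row sum and column sum, by analogy with the normalized Laplacian of a graph. The function name `nl_scale` and the toy matrix are hypothetical, and the scaled matrix would then be passed to an NMF routine in place of the raw counts.

```python
import math


def nl_scale(counts):
    """Scale a non-negative count matrix (list of rows) as
    a_ij / sqrt(row_sum_i * col_sum_j).

    Assumed form of a normalized-Laplacian-style scaling; the paper's
    exact definition may differ.
    """
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    scaled = []
    for i, row in enumerate(counts):
        scaled.append([
            # A nonzero count implies its row and column sums are positive.
            value / math.sqrt(row_sums[i] * col_sums[j]) if value else 0.0
            for j, value in enumerate(row)
        ])
    return scaled


# Hypothetical 2x2 context-by-term count matrix.
counts = [[2, 0], [2, 4]]
scaled = nl_scale(counts)
for row in scaled:
    print([round(v, 4) for v in row])
```

Using the raw `counts` matrix directly corresponds to the unscaled baseline that the abstract describes as the analogue of ASE.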

    Trajectories of change in Mediterranean Holocene vegetation through classification of pollen data

    © 2017 Springer-Verlag GmbH Germany, part of Springer Nature. Quantification of vegetation cover from pollen analysis has been a goal of palynologists since the advent of the method in 1916 by the great Lennart von Post. Pollen-based research projects are becoming increasingly ambitious in scale, and the emergence of spatially extensive open-access datasets, advanced methods and computer power has facilitated sub-continental analysis of Holocene pollen data. This paper presents results of one such study, focussing on the Mediterranean basin. Pollen data from 105 fossil sequences have been extracted from the European Pollen Database, harmonised by both taxonomy and chronologies, and subjected to a hierarchical agglomerative clustering method to synthesise the dataset into 16 main groupings. A particular focus of analysis was to describe the common transitions from one group to another to understand pathways of Holocene vegetation change in the Mediterranean. Two pollen-based indices of human impact (OJC: Oleaceae, Juglans, Castanea; API: anthropogenic pollen indicators) have been used to infer the degree of human modification of vegetation within each pollen grouping. Pollen-inferred cluster groups that are interpreted as representing more natural vegetation states show a restricted number of pathways of change. A set of cluster groups was identified that closely resemble anthropogenically-disturbed vegetation, and might be considered anthromes (anthropogenic biomes). These clusters show a very wide set of potential pathways, implying that all potential vegetation communities identified through this analysis have been altered in response to land exploitation and transformation by human societies in combination with other factors, such as climatic change. Future work to explain these ecosystem pathways will require developing complementary datasets from the social sciences and humanities (archaeology and historical sources), along with synthesis of the climatic records from the region.
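The transition analysis described in this abstract can be illustrated with a small sketch. It assumes that each pollen sequence has already been reduced to a chronologically ordered list of cluster-group labels; the site names, the labels and the function `count_transitions` are hypothetical, and this is not the authors' actual procedure.

```python
from collections import Counter


def count_transitions(sequences):
    """Count changes of cluster group between consecutive samples.

    `sequences` maps a site name to its cluster labels ordered from
    oldest to youngest. Samples that stay in the same group are ignored.
    """
    transitions = Counter()
    for labels in sequences.values():
        for prev, curr in zip(labels, labels[1:]):
            if prev != curr:
                transitions[(prev, curr)] += 1
    return transitions


# Hypothetical cluster-group labels (1..16) for three sites.
sequences = {
    "site_a": [3, 3, 7, 7, 12],
    "site_b": [3, 7, 7, 12, 12],
    "site_c": [5, 5, 12],
}
transitions = count_transitions(sequences)
for (src, dst), n in transitions.most_common():
    print(f"group {src} -> group {dst}: {n}")
```

A group with many distinct outgoing pairs in `transitions` would correspond to what the abstract calls a wide set of potential pathways.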

    Physical activity in older age: perspectives for healthy ageing and frailty.

    Regular physical activity helps to improve physical and mental functions as well as reverse some effects of chronic disease to keep older people mobile and independent. Despite the highly publicised benefits of physical activity, the overwhelming majority of older people in the United Kingdom do not meet the minimum physical activity levels needed to maintain health. The sedentary lifestyles that predominate in older age result in premature onset of ill health, disease and frailty. Local authorities have a responsibility to promote physical activity amongst older people, but knowing how to stimulate regular activity at the population level is challenging. The physiological rationale for physical activity, risks of adverse events, and societal and psychological factors are discussed with a view to informing public health initiatives for the relatively healthy older person as well as those with physical frailty. The evidence shows that regular physical activity is safe for healthy and for frail older people, and that the risks of developing major cardiovascular and metabolic diseases, obesity, falls, cognitive impairments, osteoporosis and muscular weakness are decreased by regularly completing activities ranging from low-intensity walking through to more vigorous sports and resistance exercises. Yet participation in physical activities remains low amongst older adults, particularly those living in less affluent areas. Older people may be encouraged to increase their activities if influenced by clinicians, family or friends, keeping costs low and enjoyment high, facilitating group-based activities and raising self-efficacy for exercise.

    26th Annual Computational Neuroscience Meeting (CNS*2017): Part 3 - Meeting Abstracts - Antwerp, Belgium. 15–20 July 2017

    This work was produced as part of the activities of FAPESP Research, Disseminations and Innovation Center for Neuromathematics (grant 2013/07699-0, S. Paulo Research Foundation). NLK is supported by a FAPESP postdoctoral fellowship (grant 2016/03855-5). ACR is partially supported by a CNPq fellowship (grant 306251/2014-0).

    occams: A Text Summarization Package

    Extractive text summarization selects a small subset of sentences from a document, which gives good “coverage” of a document. When given a set of term weights indicating the importance of the terms, the concept of coverage may be formalized into a combinatorial optimization problem known as the budgeted maximum coverage problem. Extractive methods in this class are known to be among the best of classic extractive summarization systems. This paper gives a synopsis of the software package occams, which is a multilingual extractive single and multi-document summarization package based on an algorithm giving an optimal approximation to the budgeted maximum coverage problem. The occams package is written in Python and provides an easy-to-use modular interface, allowing it to work in conjunction with popular Python NLP packages, such as nltk, stanza or spacy.
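To make the budgeted maximum coverage formulation concrete, here is a minimal sketch of a simple greedy heuristic that repeatedly picks the sentence with the best ratio of newly covered term weight to length. It is not the occams API and not the approximation algorithm the package implements; the function `greedy_budgeted_coverage`, the toy sentences and the weights are all hypothetical.

```python
def greedy_budgeted_coverage(sentences, term_weights, budget):
    """Greedy heuristic for budgeted maximum coverage.

    `sentences` is a list of (length, set_of_terms) pairs with length > 0.
    Returns the indices of the selected sentences and the covered terms.
    """
    covered = set()
    selected = []
    remaining = budget
    candidates = set(range(len(sentences)))
    while candidates:
        best, best_ratio = None, 0.0
        for i in sorted(candidates):
            length, terms = sentences[i]
            if length > remaining:
                continue  # Sentence no longer fits in the budget.
            # Only terms not yet covered contribute to the gain.
            gain = sum(term_weights.get(t, 0.0) for t in terms - covered)
            ratio = gain / length
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            break  # Nothing fits, or nothing adds new weight.
        selected.append(best)
        covered |= sentences[best][1]
        remaining -= sentences[best][0]
        candidates.remove(best)
    return selected, covered


# Hypothetical sentences as (length in words, terms) and term weights.
sentences = [
    (4, {"nmf", "topic", "model"}),
    (3, {"topic", "model"}),
    (5, {"pollen", "holocene", "vegetation"}),
    (2, {"nmf"}),
]
term_weights = {"nmf": 3.0, "topic": 2.0, "model": 1.0,
                "pollen": 2.0, "holocene": 2.0, "vegetation": 1.0}
selected, covered = greedy_budgeted_coverage(sentences, term_weights, budget=9)
print(selected, sorted(covered))
```

The ratio-based greedy rule is a common starting point; algorithms with stronger approximation guarantees add further steps that this sketch leaves out.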