62 research outputs found

    Uncovering the structures of privacy research using bibliometric network analysis and topic modelling

    Purpose: This paper argues that privacy research is divided into distinct communities and rarely considered as a singular field, harming its disciplinary identity. The authors collected 119,810 publications and over 3 million references to perform a bibliometric domain analysis as a quantitative approach to uncover the structures within the privacy research field. Design/methodology/approach: The bibliometric domain analysis consists of a combined directed network and topic model of published privacy research. The network contains 83,159 publications and 462,633 internal references. A latent Dirichlet allocation (LDA) topic model from the same dataset offers an additional lens on structure by classifying each publication on 36 topics with the network data. The combined outcomes of these methods are used to investigate the structural position and topical make-up of the privacy research communities. Findings: The authors identified the research communities and categorised their structural positioning. Four communities form the core of privacy research: individual privacy and law, cloud computing, location data and privacy-preserving data publishing. The latter is a macro-community of data mining, anonymity metrics and differential privacy. Surrounding the core are applied communities. Further removed are communities with little influence, most notably the medical communities that make up 14.4% of the network. The topic model shows system design as a potentially latent community. Noteworthy is the absence of a centralised body of knowledge on organisational privacy management. Originality/value: This is the first in-depth, quantitative mapping study of all privacy research.

    Narrow lenses for capturing the complexity of fisheries: A topic analysis of fisheries science from 1990 to 2016

    Despite increased fisheries science output and publication outlets, the global crisis in fisheries management is as present as ever. Since a narrow research focus may be a contributing factor to this failure, this study uncovers topics in fisheries research and their trends over time. This interdisciplinary research evaluates whether science is diversifying fisheries research topics in an attempt to capture the complexity of the fisheries system, or whether it is multiplying research on similar topics, attempting to achieve an in-depth, but possibly marginal, understanding of a few selected components of this system. By utilizing latent Dirichlet allocation as a generative probabilistic topic model, we analyse a unique dataset consisting of 46,582 full-text articles published in the last 26 years in 21 specialized scientific fisheries journals. Among the 25 topics uncovered by the model, only one (Fisheries management) refers to the human dimension of fisheries understood as socio-ecological complex adaptive systems. The most prevalent topics in our dataset directly relating to fisheries refer to Fisheries management, Stock assessment, and Fishing gear, with Fisheries management attracting the most interest. We propose directions for future research focus that most likely could contribute to providing useful advice for successful management of fisheries
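
As a rough illustration of the kind of topic model named in this abstract, the sketch below fits latent Dirichlet allocation to a tiny tokenised corpus. It is not the authors' pipeline: the abstract does not say which inference algorithm was used, so collapsed Gibbs sampling is an assumption here, and the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy corpus are all hypothetical.

```python
import random


def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (an assumed inference scheme).

    docs is a list of token lists. Returns (theta, top_words): theta[d][k] is
    the estimated share of topic k in document d, and top_words[k] lists the
    most frequent words assigned to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables that the sampler keeps up to date.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    # Smoothed document-topic proportions.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    # Three most frequent words per topic.
    top_words = [
        [vocab[v] for v in sorted(range(n_words), key=lambda v: -row[v])[:3]]
        for row in topic_word
    ]
    return theta, top_words


# Hypothetical toy corpus; real studies use tens of thousands of articles.
toy_docs = [
    ["stock", "assessment", "biomass", "stock"],
    ["trawl", "gear", "net", "gear"],
    ["stock", "biomass", "catch"],
    ["net", "trawl", "gear"],
]
theta, top_words = lda_gibbs(toy_docs, n_topics=2)
```

Each row of `theta` sums to one, so it can be read as the topic mix of one document. Topic prevalence over time, as studied in the abstract, would then be an average of these rows grouped by publication year.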

    Responding to the voice of the markets: an analysis of Tripadvisor reviews of UK retail markets

    © 2020, Emerald Publishing Limited. Purpose: The purpose of this paper is to investigate the experience of visitors to UK markets by analysing their Tripadvisor reviews to identify perceived experiential dimensions with a view to informing actions by those responsible for market management to provide a better consumer experience. Design/methodology/approach: This research analysed 41,071 Tripadvisor reviews of 61 UK markets. A latent Dirichlet allocation machine learning algorithm was applied to identify the experience dimensions of visitors. A text analysis was performed to indicate salience and valence of commonly used words. Findings: Five dimensions of experience are identified: atmosphere, merchandise, local variety, food and disappointment, together with the underlying factors that drive positive experience. Practical implications: Place and market managers should assess and position their market informed by diverse experiential dimensions. They should also improve and enhance the experience of visitors according to the underlying factors of each dimension. Originality/value: Retail markets have historically played an important role in the development of urban places. However, the ability to continue performing this role requires a greater understanding of how markets are perceived by those who use them. One way to achieve this is to use emergent technologies to inform decision-making by those responsible for their management. This study demonstrates the potential of a new analytical technique using digital technologies to improve one of the oldest forms of retailing.

    The computational neurology of active vision

    In this thesis, we appeal to recent developments in theoretical neurobiology – namely, active inference – to understand the active visual system and its disorders. Chapter 1 reviews the neurobiology of active vision. This introduces some of the key conceptual themes around attention and inference that recur through subsequent chapters. Chapter 2 provides a technical overview of active inference, and its interpretation in terms of message passing between populations of neurons. Chapter 3 applies the material in Chapter 2 to provide a computational characterisation of the oculomotor system. This deals with two key challenges in active vision: deciding where to look, and working out how to look there. The homology between this message passing and the brain networks solving these inference problems provides a basis for in silico lesion experiments, and an account of the aberrant neural computations that give rise to clinical oculomotor signs (including internuclear ophthalmoplegia). Chapter 4 picks up on the role of uncertainty resolution in deciding where to look, and examines the role of beliefs about the quality (or precision) of data in perceptual inference. We illustrate how abnormal prior beliefs influence inferences about uncertainty and give rise to neuromodulatory changes and visual hallucinatory phenomena (of the sort associated with synucleinopathies). We then demonstrate how synthetic pharmacological perturbations that alter these neuromodulatory systems give rise to the oculomotor changes associated with drugs acting upon these systems. Chapter 5 develops a model of visual neglect, using an oculomotor version of a line cancellation task. We then test a prediction of this model using magnetoencephalography and dynamic causal modelling. Chapter 6 concludes by situating the work in this thesis in the context of computational neurology. This illustrates how the variational principles used here to characterise the active visual system may be generalised to other sensorimotor systems and their disorders.

    Bayesian approaches of mixture copulas with applications

    Copula theory has become one of the most important frameworks and methodologies for modeling the dependence among random variables. Rather than using point performance metrics such as Pearson linear correlation, copula functions enable us to construct the multivariate distributions among the concerned random variables by starting from the corresponding marginal distributions. Hence, it gives us a full description of the dependence mode. The most frequently used copula models are parametric copulas such as Gaussian, Clayton, and Gumbel copulas. However, in many practical scenarios, these copulas often fail to fully describe the dependence, as real data often contain complex patterns with multiple modes. In addition, classic copulas are mostly studied in their bivariate form, leaving the application of copulas to higher-dimensional data non-trivial. This thesis intends to approach the above-mentioned problems by applying Bayesian samplers to mixture copulas. In particular, we study the problems of estimating, selecting, and simulating mixture components of copulas by using Bayesian approaches. Families of multivariate elliptical and skew-elliptical copulas are given special attention as they can be naturally extended to higher dimensions. For applications, we apply our proposed approaches to study the dependence among financial markets. Meanwhile, we extend the application of our Bayesian mixture copulas to improve the oversampling methods for imbalance learning problems in the field of data science. The thesis mainly consists of four major parts. In the first part, we applied the Bayesian sparse finite mixture model to the copula mixture modeling, which enables us to estimate and select the correct finite mixture copulas simultaneously without having to repeatedly estimate various forms of models and compare their AICs or BICs. The second part focused on the construction of infinite mixture t copulas using the Dirichlet process prior. Although we concentrate on t copulas due to their usefulness in financial applications, this approach can be extended to more general copulas. The approaches further advance the previously proposed finite mixture Bayesian approaches despite being more complicated in terms of modeling. The third part further extends previous parts to construct the non-parametric Bayesian copula mixture models for serially correlated data. In particular, we discuss the modeling of the hidden Markov models (HMM) with multivariate emission distributions. We use copula theories to decompose the construction of multivariate emission distributions into univariate marginal distributions and a dependence structure. Meanwhile, many real-life applications of HMM have an unknown number of states, which need to be manually specified by analysts if the classic HMM method is used. Introducing the hierarchical Dirichlet process into the Copula-HMM model enables us to infer the number of unknown states from the dataset automatically. We thoroughly introduce the inference method of this non-parametric Bayesian copula-HMM model therein. The final part is about the introduction and study of the evaluation metrics of imbalance learning problems as well as applying the mixture copulas approach to solving the data imbalance. One major obstacle of applying the copulas approach to imbalanced datasets is the high dimensional features of many tasks. On the other hand, data science applications often include features that are discrete-valued, while most of the copulas literature only deals with continuous random vectors. Therefore, we develop the MCMC approaches for estimating the mixed valued copulas (i.e., the copula contains both continuous and discrete valued variables) and apply them to estimate the dataset and perform the oversampling. The Bayesian approach would be useful in these tasks as the real applications often involve large, high-dimensional datasets, whereas the classic MLE approaches struggle in this case due to the exponential complexity in evaluating the discrete dimensions. The approaches are applied to a simulated dataset to demonstrate their validity. Meanwhile, the real oversampling task is performed using mixture copulas, and the results are compared with the classic random oversampling and the SMOTE approaches.
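
The idea of building a joint distribution from marginals plus a dependence structure can be made concrete with a small forward simulation. The sketch below draws from a two-component mixture of bivariate Gaussian copulas and then applies inverse marginal CDFs. It covers simulation only, not the Bayesian estimation the thesis develops. The Gaussian components, the exponential and logistic marginals, and all function names and parameter values are assumptions chosen for brevity; the thesis itself concentrates on t and other elliptical and skew-elliptical copulas.

```python
import math
import random
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal CDF


def sample_mixture_copula(n, weights, rhos, seed=0):
    """Draw n pairs (u, v) from a mixture of bivariate Gaussian copulas.

    weights are the mixture probabilities and rhos the correlation of each
    component. Both coordinates are uniform on (0, 1) marginally.
    """
    rng = random.Random(seed)
    eps = 1e-12  # keep uniforms strictly inside (0, 1)
    pairs = []
    for _ in range(n):
        # Pick a mixture component, then sample correlated normals from it.
        rho = rng.choices(rhos, weights=weights)[0]
        g1 = rng.gauss(0.0, 1.0)
        g2 = rng.gauss(0.0, 1.0)
        z1 = g1
        z2 = rho * g1 + math.sqrt(1.0 - rho * rho) * g2
        # The probability integral transform maps normals to uniforms.
        u = min(max(_PHI(z1), eps), 1.0 - eps)
        v = min(max(_PHI(z2), eps), 1.0 - eps)
        pairs.append((u, v))
    return pairs


def exponential_ppf(u, rate):
    """Inverse CDF of an exponential marginal (assumed for illustration)."""
    return -math.log(1.0 - u) / rate


def logistic_ppf(u, loc, scale):
    """Inverse CDF of a logistic marginal (assumed for illustration)."""
    return loc + scale * math.log(u / (1.0 - u))


# Dependence comes from the copula; the marginals are chosen independently.
uniforms = sample_mixture_copula(1000, weights=[0.6, 0.4], rhos=[0.9, 0.2])
samples = [(exponential_ppf(u, 2.0), logistic_ppf(v, 0.0, 1.0)) for u, v in uniforms]
```

Swapping the two inverse CDFs changes the marginals without touching the dependence structure, which is the separation the abstract relies on when it decomposes multivariate emission distributions.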

    Pre-trained Convolutional Networks and generative statistical models: a study in semi-supervised learning

    Comparative study of the performance of convolutional networks using pretrained models and statistical generative models on image classification tasks in semi-supervised environments. Study of multiple ensembles using these techniques and data generated from estimated pdfs. Pretrained ConvNets, LDA, pLSA, Fisher Vectors, sparse-coded SPMs and TSVMs are the key models studied.

    Modeling and Understanding Communities in Online Social Media using Probabilistic Methods

    The amount of multimedia content is on a constant increase, and people interact with each other and with content on a daily basis through social media systems. The goal of this thesis was to model and understand emerging online communities that revolve around multimedia content, more specifically photos, by using large-scale data and probabilistic models in a quantitative approach. The dissertation has four contributions. First, using data from two online photo management systems, this thesis examined different aspects of the behavior of users of these systems pertaining to the uploading and sharing of photos with other users and online groups. Second, probabilistic topic models were used to model online entities, such as users and groups of users, and the new proposed representations were shown to be useful for further understanding such entities, as well as to have practical applications in search and recommendation scenarios. Third, by jointly modeling users from two different social photo systems, it was shown that differences at the level of vocabulary exist, and different sharing behaviors can be observed. Finally, by modeling online user groups as entities in a topic-based model, hyper-communities were discovered in an automatic fashion based on various topic-based representations. These hyper-communities were shown, both through an objective and a subjective evaluation with a number of users, to be generally homogeneous, and therefore likely to constitute a viable exploration technique for online communities

    A comparison of the CAR and DAGAR spatial random effects models with an application to diabetics rate estimation in Belgium

    When hierarchically modelling an epidemiological phenomenon on a finite collection of sites in space, one must always take a latent spatial effect into account in order to capture the correlation structure that links the phenomenon to the territory. In this work, we compare two autoregressive spatial models that can be used for this purpose: the classical CAR model and the more recent DAGAR model. Unlike the former, the latter has a desirable property: its ρ parameter can be naturally interpreted as the average neighbor pair correlation and, in addition, this parameter can be directly estimated when the effect is modelled using a DAGAR rather than a CAR structure. As an application, we model the diabetics rate in Belgium in 2014 and show the adequacy of these models in predicting the response variable when no covariates are available.
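
To show what a CAR spatial effect looks like in practice, the sketch below builds the precision matrix of a proper CAR model, Q = τ(D − ρW), where W is a binary adjacency matrix and D holds the neighbour counts on its diagonal. This is one common parameterisation and is assumed here, since the abstract does not state which CAR variant the authors use. The function name `car_precision` and the four-site path graph are hypothetical, and the DAGAR construction is not shown.

```python
def car_precision(adjacency, rho, tau=1.0):
    """Precision matrix Q = tau * (D - rho * W) of a proper CAR model.

    adjacency is a symmetric 0/1 matrix W with a zero diagonal, rho is the
    spatial dependence parameter and tau a precision scale.
    """
    n = len(adjacency)
    precision = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal entry: tau times the number of neighbours of site i.
                precision[i][j] = tau * sum(adjacency[i])
            else:
                # Off-diagonal entry: nonzero only for neighbouring sites.
                precision[i][j] = -tau * rho * adjacency[i][j]
    return precision


# Hypothetical map: four sites in a row, each adjacent to the next.
path_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
Q = car_precision(path_graph, rho=0.9)
```

In this form ρ scales the off-diagonal entries of the precision matrix rather than any correlation directly, which is consistent with the abstract's remark that the DAGAR parameter is the one with a natural correlation interpretation.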