107 research outputs found
Why We Read Wikipedia
Wikipedia is one of the most popular sites on the Web, with millions of users
relying on it to satisfy a broad range of information needs every day. Although
it is crucial to understand what exactly these needs are in order to be able to
meet them, little is currently known about why users visit Wikipedia. The goal
of this paper is to fill this gap by combining a survey of Wikipedia readers
with a log-based analysis of user activity. Based on an initial series of user
surveys, we build a taxonomy of Wikipedia use cases along several dimensions,
capturing users' motivations to visit Wikipedia, the depth of knowledge they
are seeking, and their knowledge of the topic of interest prior to visiting
Wikipedia. Then, we quantify the prevalence of these use cases via a
large-scale user survey conducted on live Wikipedia with almost 30,000
responses. Our analyses highlight the variety of factors driving users to
Wikipedia, such as current events, media coverage of a topic, personal
curiosity, work or school assignments, or boredom. Finally, we match survey
responses to the respondents' digital traces in Wikipedia's server logs,
enabling the discovery of behavioral patterns associated with specific use
cases. For instance, we observe long and fast-paced page sequences across
topics for users who are bored or exploring randomly, whereas those using
Wikipedia for work or school spend more time on individual articles focused on
topics such as science. Our findings advance our understanding of reader
motivations and behavior on Wikipedia and can have implications for developers
aiming to improve Wikipedia's user experience, editors striving to cater to
their readers' needs, third-party services (such as search engines) providing
access to Wikipedia content, and researchers aiming to build tools such as
recommendation engines.Comment: Published in WWW'17; v2 fixes caption of Table
Data mining: a tool for detecting cyclical disturbances in supply networks.
Disturbances in supply chains may be either exogenous or endogenous. The ability automatically to detect, diagnose, and distinguish between the causes of disturbances is of prime importance to decision makers in order to avoid uncertainty. The spectral principal component analysis (SPCA) technique has been utilized to distinguish between real and rogue disturbances in a steel supply network. The data set used was collected from four different business units in the network and consists of 43 variables; each is described by 72 data points. The present paper will utilize the same data set to test an alternative approach to SPCA in detecting the disturbances. The new approach employs statistical data pre-processing, clustering, and classification learning techniques to analyse the supply network data. In particular, the incremental k-means
clustering and the RULES-6 classification rule-learning algorithms, developed by the present authors’ team, have been applied to identify important patterns in the data set. Results show that the proposed approach has the capability automatically to detect and characterize network-wide cyclical disturbances and generate hypotheses about their root cause
Improved comprehensibility and reliability of explanations via restricted halfspace discretization
Abstract. A number of two-class classification methods first discretize each attribute of two given training sets and then construct a propositional DNF formula that evaluates to True for one of the two discretized training sets and to False for the other one. The formula is not just a classification tool but constitutes a useful explanation for the differences between the two underlying populations if it can be comprehended by humans and is reliable. This paper shows that comprehensibility as well as reliability of the formulas can sometimes be improved using a discretization scheme where linear combinations of a small number of attributes are discretized
Caspase-8 binding to cardiolipin in giant unilamellar vesicles provides a functional docking platform for bid
Caspase-8 is involved in death receptor-mediated apoptosis in type II cells, the proapoptotic programme of which is triggered by truncated Bid. Indeed, caspase-8 and Bid are the known intermediates of this signalling pathway. Cardiolipin has been shown to provide an anchor and an essential activating platform for caspase-8 at the mitochondrial membrane surface. Destabilisation of this platform alters receptor-mediated apoptosis in diseases such as Barth Syndrome, which is characterised by the presence of immature cardiolipin which does not allow caspase-8 binding. We used a simplified in vitro system that mimics contact sites and/or cardiolipin-enriched microdomains at the outer mitochondrial surface in which the platform consisting of caspase-8, Bid and cardiolipin was reconstituted in giant unilamellar vesicles. We analysed these vesicles by flow cytometry and confirm previous results that demonstrate the requirement for intact mature cardiolipin for caspase-8 activation and Bid binding and cleavage. We also used confocal microscopy to visualise the rupture of the vesicles and their revesiculation at smaller sizes due to alteration of the curvature following caspase-8 and Bid binding. Biophysical approaches, including Laurdan fluorescence and rupture/tension measurements, were used to determine the ability of these three components (cardiolipin, caspase-8 and Bid) to fulfil the minimal requirements for the formation and function of the platform at the mitochondrial membrane. Our results shed light on the active functional role of cardiolipin, bridging the gap between death receptors and mitochondria
Mining Exceptional Social Behaviour
Essentially, our lives are made of social interactions. These can be recorded through personal gadgets as well as sensors adequately attached to people for research purposes. In particular, such sensors may record real time location of people. This location data can then be used to infer interactions, which may be translated into behavioural patterns. In this paper, we focus on the automatic discovery of exceptional social behaviour from spatio-temporal data. For that, we propose a method for Exceptional Behaviour Discovery (EBD). The proposed method combines Subgroup Discovery and Network Science techniques for finding social behaviour that deviates from the norm. In particular, it transforms movement and demographic data into attributed social interaction networks, and returns descriptive subgroups. We applied the proposed method on two real datasets containing location data from children playing in the school playground. Our results indicate that this is a valid approach which is able to obtain meaningful knowledge from the data.This work has been partially supported by the German Research Foundation (DFG) project “MODUS” (under grant AT 88/4-1). Furthermore, the research leading to these results has received funding (JG) from ESRC grant ES/N006577/1. This work was financed by the project Kids First, project number 68639
Transductive Learning for Spatial Data Classification
Learning classifiers of spatial data presents several issues, such as the heterogeneity of spatial objects, the implicit definition of spatial relationships among objects, the spatial autocorrelation and the abundance of unlabelled data which potentially convey a large amount of information. The first three issues are due to the inherent structure of spatial units of analysis, which can be easily accommodated if a (multi-)relational data mining approach is considered. The fourth issue demands for the adoption of a transductive setting, which aims to make predictions for a given set of unlabelled data. Transduction is also motivated by the contiguity of the concept of positive autocorrelation, which typically affect spatial phenomena, with the smoothness assumption which characterize the transductive setting. In this work, we investigate a relational approach to spatial classification in a transductive setting. Computational solutions to the main difficulties met in this approach are presented. In particular, a relational upgrade of the nave Bayes classifier is proposed as discriminative model, an iterative algorithm is designed for the transductive classification of unlabelled data, and a distance measure between relational descriptions of spatial objects is defined in order to determine the k-nearest neighbors of each example in the dataset. Computational solutions have been tested on two real-world spatial datasets. The transformation of spatial data into a multi-relational representation and experimental results are reported and commented
- …