
    Effective Unsupervised Author Disambiguation with Relative Frequencies

    This work addresses the problem of author name homonymy in the Web of Science. Aiming for an efficient, simple and straightforward solution, we introduce a novel probabilistic similarity measure for author name disambiguation based on feature overlap. Using the researcher-ID available for a subset of the Web of Science, we evaluate the application of this measure in the context of agglomeratively clustering author mentions. We focus on a concise evaluation that shows clearly for which problem setups and at which time during the clustering process our approach works best. In contrast to most other works in this field, we are sceptical towards the performance of author name disambiguation methods in general and compare our approach to the trivial single-cluster baseline. Our results are presented separately for each correct clustering size as we can explain that, when treating all cases together, the trivial baseline and more sophisticated approaches are hardly distinguishable in terms of evaluation results. Our model shows state-of-the-art performance for all correct clustering sizes without any discriminative training and with tuning only one convergence parameter. Comment: Proceedings of JCDL 201

    An Efficient Rigorous Approach for Identifying Statistically Significant Frequent Itemsets

    As advances in technology allow for the collection, storage, and analysis of vast amounts of data, the task of screening and assessing the significance of discovered patterns is becoming a major challenge in data mining applications. In this work, we address significance in the context of frequent itemset mining. Specifically, we develop a novel methodology to identify a meaningful support threshold s* for a dataset, such that the number of itemsets with support at least s* represents a substantial deviation from what would be expected in a random dataset with the same number of transactions and the same individual item frequencies. These itemsets can then be flagged as statistically significant with a small false discovery rate. We present extensive experimental results to substantiate the effectiveness of our methodology. Comment: A preliminary version of this work was presented in ACM PODS 2009. 20 pages, 0 figures

    Uncertainty-wise Test Case Generation and Minimization for Cyber-Physical Systems

    Cyber-Physical Systems (CPSs) typically operate in highly indeterminate environmental conditions, which require the development of testing methods that must explicitly consider uncertainty in test design, test generation, and test optimization. Towards this direction, we propose a set of uncertainty-wise test case generation and test case minimization strategies that rely on test-ready models explicitly specifying subjective uncertainty. We propose two test case generation strategies and four test case minimization strategies based on the Uncertainty Theory and multi-objective search. These strategies include a novel methodology for designing and introducing indeterminacy sources in the environment during test execution and a novel set of uncertainty-wise test verdicts. We performed an extensive empirical study to select the best algorithm out of eight commonly used multi-objective search algorithms, for each of the four minimization strategies, with five use cases of two industrial CPS case studies. The minimized set of test cases obtained with the best algorithm for each minimization strategy were executed on the two real CPSs. The results showed that our best test strategy managed to observe 51% more uncertainties due to unknown indeterminate behaviors of the physical environments of the CPSs as compared to the other test strategies. Also, the same test strategy managed to observe 118% more unknown uncertainties as compared to the unique number of known uncertainties.

    Depression, Relationship Quality, and Couples’ Demand/Withdraw and Demand/Submit Sequential Interactions

    This study investigated the associations among depression, relationship quality, and demand/withdraw and demand/submit behavior in couples’ conflict interactions. Two 10-min conflict interactions were coded for each couple (N = 97) using Structural Analysis of Social Behavior (SASB; Benjamin, 1979a, 1987, 2000a). Depression was assessed categorically (via the presence of depressive disorders) and dimensionally (via symptom reports). Results revealed that relationship quality was negatively associated with demanding behavior, as well as receiving submissive or withdrawing behavior from one’s partner. Relationship quality was positively associated with withdrawal. Demanding behavior was positively associated with women’s depression symptoms but negatively associated with men’s depression symptoms. Sequential analysis revealed couples’ behavior was highly stable across time. Initiation of demand/withdraw and demand/submit sequences was negatively associated with partners’ relationship adjustment. Female demand/male withdraw was positively associated with men’s depression diagnosis. Results underscore the importance of sequential analysis when investigating associations among depression, relationship quality, and couples’ interpersonal behavior.

    A temporal switch model for estimating transcriptional activity in gene expression

    Motivation: The analysis and mechanistic modelling of time series gene expression data provided by techniques such as microarrays, NanoString, reverse transcription–polymerase chain reaction and advanced sequencing are invaluable for developing an understanding of the variation in key biological processes. We address this by proposing the estimation of a flexible dynamic model, which decouples temporal synthesis and degradation of mRNA and, hence, allows for transcriptional activity to switch between different states. Results: The model is flexible enough to capture a variety of observed transcriptional dynamics, including oscillatory behaviour, in a way that is compatible with the demands imposed by the quality, time-resolution and quantity of the data. We show that the timing and number of switch events in transcriptional activity can be estimated alongside individual gene mRNA stability with the help of a Bayesian reversible jump Markov chain Monte Carlo algorithm. To demonstrate the methodology, we focus on modelling the wild-type behaviour of a selection of 200 circadian genes of the model plant Arabidopsis thaliana. The results support the idea that using a mechanistic model to identify transcriptional switch points is likely to strongly contribute to efforts in elucidating and understanding key biological processes, such as transcription and degradation

    Assessment of apparent nonstationarity in time series of annual inflow, daily precipitation, and atmospheric circulation indices: A case study from southwest Western Australia

    The southwest region of Western Australia has experienced a sustained sequence of low annual inflows to major water supply dams over the past 30 years. Until recently, the dominant interpretation of this phenomenon has been predicated on the existence of one or more sharp breaks (change or jump points), with inflows fluctuating around relatively constant levels between them. This paper revisits this interpretation. To understand the mechanisms behind the changes, we also analyze daily precipitation series at multiple sites in the vicinity and time series for several indices of regional atmospheric circulation that may be considered as drivers of regional precipitation. We focus on the winter half-year for the region (May to October) as up to 80% of annual precipitation occurs during this "season". We find that the decline in the annual inflow is in fact more consistent with a smooth declining trend than with a sequence of sharp breaks, the decline is associated with decreases both in the frequency of daily precipitation occurrence and in wet-day amounts, and the decline in regional precipitation is strongly associated with a marked decrease in moisture content in the lower troposphere, an increase in regionally averaged sea level pressure in the first half of the season, and intraseasonal changes in the regional north-south sea level pressure gradient. Overall, our approach provides an integrated understanding of the linkages between declining dam inflows, declining precipitation, and changes in regional atmospheric circulation that favor drier conditions

    Estimation of temporal covariances in pathogen dynamics using Bayesian multivariate autoregressive models

    It is well recognised that animal and plant pathogens form complex ecological communities of interacting organisms within their hosts, and there is growing interest in the health implications of such pathogen interactions. Although community ecology approaches have been used to identify pathogen interactions at the within-host scale, methodologies enabling robust identification of interactions from population-scale data such as that available from health authorities are lacking. To address this gap, we developed a statistical framework that jointly identifies interactions between multiple viruses from contemporaneous non-stationary infection time series. Our conceptual approach is derived from a Bayesian multivariate disease mapping framework. Importantly, our approach captures within- and between-year dependencies in infection risk while controlling for confounding factors such as seasonality, demographics and infection frequencies, allowing genuine pathogen interactions to be distinguished from simple correlations. We validated our framework using a broad range of synthetic data. We then applied it to diagnostic data available for five respiratory viruses co-circulating in a major urban population between 2005 and 2013: adenovirus, human coronavirus, human metapneumovirus, influenza B virus and respiratory syncytial virus. We found positive and negative covariances indicative of epidemiological interactions among specific virus pairs. This statistical framework enables a community ecology perspective to be applied to infectious disease epidemiology with important utility for public health planning and preparedness
