42 research outputs found

    Orometric methods in bounded metric data

    A large amount of the data stored in knowledge graphs (KG) is metric. For example, the Wikidata KG contains a plenitude of metric facts about geographic entities like cities or celestial objects. In this paper, we propose a novel approach that transfers orometric (topographic) measures to bounded metric spaces. While these methods were originally designed to identify relevant mountain peaks on the surface of the earth, we demonstrate a way to use them for metric data sets in general, notably metric sets of items enclosed in knowledge graphs. Based on this, we present a method for identifying outstanding items using the transferred valuation functions isolation and prominence. Building on this, we envision an item recommendation process. To demonstrate the relevance of the valuations for such processes, we evaluate the usefulness of isolation and prominence empirically in a machine learning setting. In particular, we find structurally relevant items in the geographic population distributions of Germany and France. © 2020, The Author(s)
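    As a rough illustration of the isolation valuation mentioned above, the sketch below uses the usual topographic reading: the isolation of an item is its distance to the nearest item with a strictly larger height. The function name `isolation`, the toy coordinates, and the choice of infinity for the highest item are assumptions made for this example; the paper's definition for bounded metric spaces may differ in detail.

```python
import math

def isolation(points, heights):
    """Distance from each item to the nearest item with a strictly larger height.

    Assumption: the highest item has no higher neighbour and gets math.inf.
    """
    result = []
    for p, h in zip(points, heights):
        # Distances to every item that is strictly higher than the current one.
        higher = [math.dist(p, q) for q, g in zip(points, heights) if g > h]
        result.append(min(higher) if higher else math.inf)
    return result

# Hypothetical toy data: 2-D positions and a "height" such as population.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
heights = [10, 50, 20]
iso = isolation(points, heights)
```

    Euclidean distance is used here only for convenience; any metric on the item set could be substituted.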

    Stream-based active learning for sliding windows under the influence of verification latency

    Stream-based active learning (AL) strategies minimize the labeling effort by querying labels that improve the classifier’s performance the most. So far, these strategies neglect the fact that an oracle or expert requires time to provide a queried label. We show that existing AL methods deteriorate or even fail under the influence of such verification latency. The problem with these methods is that they estimate a label’s utility on the currently available labeled data. However, by the time this label arrives, some of the current data may have become outdated and new labels may have arrived. In this article, we propose to simulate the available data at the time when the label would arrive. Therefore, our method Forgetting and Simulating (FS) forgets outdated information and simulates the delayed labels to get more realistic utility estimates. We assume that the label’s arrival date is known a priori and that the classifier’s training data is bounded by a sliding window. Our extensive experiments show that FS improves stream-based AL strategies in settings with both constant and variable verification latency.

    Active Selection of Classification Features

    Some data analysis applications involve datasets in which explanatory variables are expensive or tedious to acquire, but auxiliary data are readily available and might help to construct an insightful training set. An example is neuroimaging research on mental disorders, specifically learning a diagnosis/prognosis model based on variables derived from expensive Magnetic Resonance Imaging (MRI) scans, which often requires large sample sizes. Auxiliary data, such as demographics, might help in selecting a smaller sample that comprises the individuals with the most informative MRI scans. In the active learning literature, this problem has not yet been studied, despite promising results in related problem settings that concern the selection of instances or instance-feature pairs. Therefore, we formulate this complementary problem of Active Selection of Classification Features (ASCF): Given is a primary task that requires learning a model f: x -> y to explain/predict the relationship between an expensive-to-acquire set of variables x and a class label y. The ASCF task is then to use a set of readily available selection variables z to select those instances that will improve the primary task's performance most when their expensive features x are acquired and included in the primary training set. We propose two utility-based approaches for this problem, and evaluate their performance on three public real-world benchmark datasets. In addition, we illustrate the use of these approaches to efficiently acquire MRI scans in the context of neuroimaging research on mental disorders, based on a simulated study design with real MRI data. Comment: Accepted for publication at the 19th Intelligent Data Analysis Symposium, 2021. The final authenticated publication will be made available online at springer.co

    The sialic acid binding activity of the S protein facilitates infection by porcine transmissible gastroenteritis coronavirus

    Background: Transmissible gastroenteritis virus (TGEV) has a sialic acid binding activity that is believed to be important for enteropathogenicity, but that has so far appeared to be dispensable for infection of cultured cells. The aims of this study were to determine the effect of sialic acid binding on the infection of cultured cells under unfavorable conditions, and to compare TGEV strains and mutants, as well as the avian coronavirus IBV, with respect to their dependence on the sialic acid binding activity. Methods: The infectivity of different viruses was analyzed by a plaque assay after adsorption times of 5, 20, and 60 min. Prior to infection, cultured cells were either treated with neuraminidase to deplete sialic acids from the cell surface, or mock-treated. In a second approach, pre-treatment of the virus with porcine intestinal mucin was performed, followed by the plaque assay after a 5 min adsorption time. Student's t-test was used to verify the significance of the results. Results: Desialylation of cells only had a minor effect on the infection by TGEV strain Purdue 46 when an adsorption period of 60 min was allowed for initiation of infection. However, when the adsorption time was reduced to 5 min, the infectivity on desialylated cells decreased by more than 60%. A TGEV PUR46 mutant (HAD3) deficient in sialic acid binding showed a 77% lower titer than the parental virus after a 5 min adsorption time. After an adsorption time of 60 min, the titer of HAD3 was 58% lower than that of TGEV PUR46. Another TGEV strain, TGEV Miller, and IBV Beaudette showed a reduction in infectivity after neuraminidase treatment of the cultured cells irrespective of the virion adsorption time. Conclusions: Our results suggest that the sialic acid binding activity facilitates the infection by TGEV under unfavorable environmental conditions. The dependence on the sialic acid binding activity for an efficient infection differs in the analyzed TGEV strains.

    Beyond Adaptation: Understanding Distributional Changes (Dagstuhl Seminar 20372)

    This report documents the program and the outcomes of Dagstuhl Seminar 20372 "Beyond Adaptation: Understanding Distributional Changes". It was centered around the aim to establish a better understanding of the causes, nature and consequences of distributional changes. Four key research questions were identified and discussed during the seminar. These were the practical relevance of different scenarios and types of change, the modelling of change, the detection and measuring of change, and the adaptation to change. The seminar brought together participants from several distinct communities in which parts of these questions are already studied, albeit in separate lines of research. These included data stream mining, where the focus is on concept drift detection and adaptation, transfer learning and domain adaptation in machine learning and algorithmic learning theory, change point detection in statistics, and the evolving and adaptive systems community. In this way, the seminar helped to stimulate research towards a thorough understanding of distributional changes.

    ACE - A Novel Approach for the Statistical Analysis of Pairwise Connectivity

    Analysing correlations between streams of events is an important problem. It arises, for example, in neuroscience, when the connectivity of neurons is to be inferred from spike trains that record neurons' individual spiking activity. While some approaches for inferring delayed synaptic connections have recently been proposed, they are limited in the types of connectivities and delays they are able to handle, or require computation-intensive procedures. This paper proposes a faster and more flexible approach for analysing such delayed correlated activity: a statistical approach for the Analysis of Connectivity in spiking Events (ACE), based on the idea of hypothesis testing. It first computes, for any pair of a source and a target neuron, the inter-spike delays between subsequent source and target spikes. Then, it derives a null model for the distribution of inter-spike delays for uncorrelated neurons. Finally, it compares the observed distribution of inter-spike delays to this null model and infers pairwise connectivity based on Pearson's chi-squared test statistic. Thus, ACE is capable of detecting connections with a priori unknown, non-discrete (and potentially large) inter-spike delays, which might vary between pairs of neurons. Since ACE works incrementally, it has potential for being used in online processing. In our experiments, we visualise the advantages of ACE in varying experimental scenarios (except for one special case) and on a state-of-the-art dataset which has been generated for neuroscientific research under highly realistic conditions.
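    The three steps named in this abstract (compute delays, build a null model, compare with a chi-squared statistic) can be sketched as follows. This is not the authors' implementation. Reading "inter-spike delay" as the time from a source spike to the next target spike is an interpretation, and the exponential null model (which holds if the target fires as a homogeneous Poisson process independent of the source) is swapped in as an assumption, because the abstract does not state the null model ACE derives. All function names and data are hypothetical.

```python
import bisect
import math

def next_spike_delays(source, target):
    """Delay from each source spike to the next target spike (inputs sorted)."""
    delays = []
    for s in source:
        j = bisect.bisect_right(target, s)  # first target spike strictly after s
        if j < len(target):
            delays.append(target[j] - s)
    return delays

def chi_squared_vs_exponential(delays, rate, edges):
    """Pearson chi-squared statistic of binned delays against an exponential null.

    edges: increasing bin edges starting at 0; a final open bin covers
    [edges[-1], inf). The exponential null is an assumption of this sketch.
    """
    n = len(delays)

    def cdf(t):
        return 1.0 - math.exp(-rate * t)

    bounds = list(edges) + [math.inf]
    stat = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        observed = sum(lo <= d < hi for d in delays)
        upper = 1.0 if hi == math.inf else cdf(hi)
        expected = n * (upper - cdf(lo))
        stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical toy spike trains: the target fires 0.2 time units after the source.
source = [1.0, 2.0, 3.0, 4.0]
target = [1.2, 2.2, 3.2, 4.2]
delays = next_spike_delays(source, target)
stat = chi_squared_vs_exponential(delays, rate=1.0, edges=[0.0, 0.5, 1.0])
```

    A large statistic suggests that the observed delays deviate from the assumed null. Turning it into a decision would require a chi-squared quantile with the appropriate degrees of freedom, and in practice far more spikes than in this toy example.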