
    Using Permuted States and Validated Simulation to Analyze Conflict Rates in Optimistic Replication

    Optimistic replication provides high data availability in the presence of network outages. Although widely deployed, this relaxed consistency model introduces concurrent updates, whose behavior is poorly understood due to the vast state space. This paper introduces the notion of permuted states to eliminate system states that are redundant and unreachable, which can constitute the majority of states (4069 out of 4096 for four replicas). With the aid of permuted states, we are for the first time able to construct analytical models beyond the two-replica case. By examining the analysis for 2 to 4 replicas, we can demystify the process of forming identical conflicts, the most common conflict type at high replication factors. Additionally, we have automated and optimized the generation of permuted states, which allows us to explore higher replication factors (up to 10 replicas) using hybrid techniques. It also allows us to validate our results with existing simulations based on actual replication mechanisms, which previously were analytically validated with only one pair of replicas. Finally, we have discovered that update locality and bimodal access patterns are the primary factors contributing to the formation of identical conflicts.
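    The state-space reduction idea can be illustrated with a toy sketch (our construction, not the paper's model): if a system state were just a tuple of per-replica values, then states that differ only by a permutation of interchangeable replicas can be collapsed to a canonical sorted form. The paper's actual states track version vectors and reachability, so its counts (4069 of 4096 eliminated) do not follow from this simplification.

```python
from itertools import product

# Toy illustration of "permuted states": collapse states that differ only
# by a permutation of the (interchangeable) replicas. Canonical form =
# sorted tuple. This is our simplification, not the paper's state model.

def canonical(state):
    return tuple(sorted(state))

def count_states(n_replicas, n_values):
    raw = list(product(range(n_values), repeat=n_replicas))
    distinct = {canonical(s) for s in raw}
    return len(raw), len(distinct)

raw, collapsed = count_states(4, 4)
# 4 replicas, each in one of 4 values: 256 raw states collapse to 35
# permutation-distinct states (multisets of size 4 drawn from 4 values).
```

    Even this crude symmetry argument shrinks the enumeration substantially, which is what makes analysis beyond the two-replica case tractable.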

    Evaluation of clustering results and novel cluster algorithms

    Cluster analysis is frequently performed in many application fields to find groups in data. For example, in medicine, researchers have used gene expression data to cluster patients suffering from a particular disease (e.g., breast cancer), in order to detect new disease subtypes. Many cluster algorithms and methods for cluster validation, i.e., methods for evaluating the quality of cluster analysis results, have been proposed in the literature. However, open questions about the evaluation of both clustering results and novel cluster algorithms remain. It has rarely been discussed whether a) interesting clustering results or b) promising performance evaluations of newly presented cluster algorithms might be over-optimistic, in the sense that these good results cannot be replicated on new data or in other settings. Such questions are relevant in light of the so-called "replication crisis"; in various research disciplines such as medicine, biology, psychology, and economics, many results have turned out to be non-replicable, casting doubt on the trustworthiness and reliability of scientific findings. This crisis has led to increasing popularity of "metascience". Metascientific studies analyze problems that have contributed to the replication crisis (e.g., questionable research practices), and propose and evaluate possible solutions. So far, metascientific studies have mainly focused on issues related to significance testing. In contrast, this dissertation addresses the reliability of a) clustering results in applied research and b) results concerning newly presented cluster algorithms in the methodological literature. Different aspects of this topic are discussed in three Contributions. The first Contribution presents a framework for validating clustering results on validation data. Using validation data is vital to examine the replicability and generalizability of results. 
While applied researchers sometimes use validation data to check their clustering results, our article is the first to review the different approaches in the literature and to structure them in a systematic manner. We demonstrate that many classical cluster validation techniques, such as internal and external validation, can be combined with validation data. Our framework provides guidance to applied researchers who wish to evaluate their own clustering results or the results of other teams on new data. The second Contribution applies the framework from Contribution 1 to quantify over-optimistic bias in the context of a specific application field, namely unsupervised microbiome research. We analyze over-optimism effects which result from the multiplicity of analysis strategies for cluster analysis and network learning. The plethora of possible analysis strategies poses a challenge for researchers who are often uncertain about which method to use. Researchers might be tempted to try different methods on their dataset and look for the method yielding the "best" result. If only the "best" result is selectively reported, this may cause "overfitting" of the method to the dataset and the result might not be replicable on validation data. We quantify such over-optimism effects for four illustrative types of unsupervised research tasks (clustering of bacterial genera, hub detection in microbial association networks, differential network analysis, and clustering of samples). Contributions 1 and 2 consider the evaluation of clustering results and thus adopt a metascientific perspective on applied research. In contrast, the third Contribution is a metascientific study about methodological research on the development of new cluster algorithms. This Contribution analyzes the over-optimistic evaluation and reporting of novel cluster algorithms. 
    As an illustrative example, we consider the recently proposed cluster algorithm "Rock"; initially deemed promising, it later turned out not to be generally better than its competitors. We demonstrate how Rock can nevertheless appear to outperform competitors via optimization of the evaluation design, namely the used data types, data characteristics, the algorithm's parameters, and the choice of competing algorithms. The study is a cautionary tale that illustrates how easy it can be for researchers to claim apparent "superiority" of a new cluster algorithm. This, in turn, stresses the importance of strategies for avoiding the problems of over-optimism, such as neutral benchmark studies.
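    The selection effect behind this over-optimism can be sketched in a few lines (the names and the noise model are ours, not from the dissertation): when many analysis strategies are tried on the same data and only the best-looking one is reported, the reported score is the maximum of many noisy draws, while that same strategy's score on validation data is just an ordinary draw.

```python
import random

# Sketch of over-optimism through strategy selection: scores here are pure
# noise, so any apparent "best method" is an artifact of multiple trials.
def score(strategy, dataset):
    # Deterministic pseudo-random score for a (strategy, dataset) pair.
    return random.Random(strategy * 7919 + dataset).random()

N_STRATEGIES = 50
TRAIN, VALIDATION = 1, 2

# Select the strategy that looks best on the training data...
best = max(range(N_STRATEGIES), key=lambda s: score(s, TRAIN))

train_score = score(best, TRAIN)       # reported (over-optimistic) score
valid_score = score(best, VALIDATION)  # same strategy on fresh data

# train_score is the maximum of 50 uniform draws (50/51 ~ 0.98 in
# expectation); valid_score is a single uniform draw (0.5 in expectation).
```

    Evaluating the selected strategy on held-out validation data, as the framework in Contribution 1 recommends, exposes exactly this gap.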

    Analysis of recombination in molecular sequence data

    We present Recco, a new and fast method for analyzing a multiple alignment with respect to recombination. Recco is based on a dynamic program that explains one sequence in the alignment with the other sequences using mutation and recombination. The dynamic program allows for an intuitive visualization of the optimal solution and introduces a parameter α controlling the number of recombinations in the solution. Recco performs a parametric analysis with respect to α and orders all Pareto-optimal solutions by increasing number of recombinations. α is also directly related to the Savings value, a quantitative and intuitive measure of the preference for recombination in the solution. The Savings value and the solutions have a simple interpretation regarding the ancestry of the sequences in the alignment, and the output of the method is usually easy to understand. The distribution of the Savings value for non-recombining alignments is estimated by processing column permutations of the alignment, and p-values are provided for recombination in the alignment, in a sequence, and at a breakpoint position. Recco also uses the p-values to suggest a single solution, or recombinant structure, for the explained sequence. Recco was validated on a large set of simulated alignments, where its recombination detection performance was superior to all current methods. The analysis of real alignments confirmed that Recco is among the best methods for recombination analysis and that its output is often easier to interpret than that of other methods.
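    The core dynamic program can be sketched as follows (a minimal reimplementation of the idea as we read it, not the published Recco code; the sequences and costs are illustrative): each target position is explained by one source sequence, a mismatch costs 1 (mutation), switching source sequences costs α (recombination), so larger α yields fewer breakpoints in the optimal explanation.

```python
def explain(target, sources, alpha):
    """Minimum-cost explanation of `target`; returns (cost, breakpoints)."""
    n, m = len(target), len(sources)
    cost = [[0.0] * m for _ in range(n)]     # cost[i][s]: best cost ending
    came_from = [[0] * m for _ in range(n)]  # at position i using source s
    for s in range(m):
        cost[0][s] = 0 if sources[s][0] == target[0] else 1
        came_from[0][s] = s
    for i in range(1, n):
        prev = cost[i - 1]
        best_t = min(range(m), key=lambda t: prev[t])
        for s in range(m):
            mismatch = 0 if sources[s][i] == target[i] else 1
            stay, switch = prev[s], prev[best_t] + alpha
            if stay <= switch:               # prefer no recombination on ties
                cost[i][s] = mismatch + stay
                came_from[i][s] = s
            else:                            # recombination: jump sources
                cost[i][s] = mismatch + switch
                came_from[i][s] = best_t
    s = min(range(m), key=lambda t: cost[n - 1][t])
    breakpoints = 0
    for i in range(n - 1, 0, -1):            # backtrack, counting switches
        t = came_from[i][s]
        breakpoints += t != s
        s = t
    return min(cost[n - 1]), breakpoints

# Low alpha favors recombination; high alpha suppresses it:
# explain("AAATTT", ["AAAAAA", "TTTTTT"], alpha=0.5) -> (0.5, 1)
# explain("AAATTT", ["AAAAAA", "TTTTTT"], alpha=10)  -> (3, 0)
```

    Sweeping α from high to low and recording each solution as it changes is, in spirit, the parametric analysis that orders the Pareto-optimal solutions by their number of recombinations.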

    Performance mapping of a class of fully decoupled architecture


    Timely and reliable evaluation of the effects of interventions: a framework for adaptive meta-analysis (FAME)

    Most systematic reviews are retrospective and use aggregate data (AD) from publications, meaning they can be unreliable, lag behind therapeutic developments, and fail to influence ongoing or new trials. Commonly, the potential influence of unpublished or ongoing trials is overlooked when interpreting results, or when determining the value of updating the meta-analysis or the need to collect individual participant data (IPD). Therefore, we developed a Framework for Adaptive Meta-analysis (FAME) to determine prospectively the earliest opportunity for reliable AD meta-analysis. We illustrate FAME using two systematic reviews in men with metastatic (M1) and non-metastatic (M0) hormone-sensitive prostate cancer (HSPC).

    Neural foundations of cooperative social interactions

    The embodied-embedded-enactive-extended (4E) approach to studying cognition suggests that interaction with the world is a crucial component of our cognitive processes. We spend most of our time interacting with other people; therefore, studying cognition without interaction is incomplete. Until recently, social neuroscience focused only on studying isolated human and animal brains, leaving interaction unexplored. To fill this gap, we studied interacting participants, focusing on both intra- and inter-brain (hyperscanning) neural activity. In the first study, we invited dyads to perform a visual task in both a cooperative and a competitive context while we measured EEG. We found that mid-frontal activity around 200-300 ms after receiving monetary rewards was sensitive to social context and differed between cooperative and competitive situations. In the second study, we asked participants to coordinate their movements with each other and with a robotic partner. We found significantly stronger EEG amplitudes at frontocentral electrodes when people interacted with a robotic partner. Lastly, we performed a comprehensive literature review and the first meta-analysis in the emerging field of hyperscanning, validating it as a method to study social interaction. Taken together, our results showed that adding a second participant (human or AI/robotic) furthered our understanding of human cognition. We learned that activity at frontocentral electrodes is sensitive to social context and to the type of partner (human or robotic). In both studies, the participants' interaction was required to reveal these novel neural processes involved in action monitoring. Similarly, studying inter-brain neural activity allows for the exploration of new aspects of cognition. Many cognitive functions involved in successful social interactions are accompanied by neural synchrony between brains, suggesting an extended form of cognition.

