Beyond actions: exploring the discovery of tactics from user logs
Search log analysis has become a common practice to gain insights into user search behaviour; it helps gain an understanding of user needs and preferences, as well as an insight into how well a system supports such needs. Currently, log analysis is typically focused on low-level user actions, i.e. logged events such as issued queries and clicked results, and often only a selection of such events are logged and analysed. However, types of logged events may differ widely from interface to interface, making comparison between systems difficult. Further, the interpretation of the meaning of and subsequent analysis of a selection of events may lead to conclusions out of context—e.g. the statistics of observed query reformulations may be influenced by the existence of a relevance feedback component. Alternatively, in lab studies user activities can be analysed at a higher level, such as search tactics and strategies, abstracted away from detailed interface implementation. Unfortunately, until now the required manual codings that map logged events to higher-level interpretations have prevented large-scale use of this type of analysis. In this paper, we propose a new method for analysing search logs by (semi-)automatically identifying user search tactics from logged events, allowing large-scale analysis that is comparable across search systems. In addition, as the resulting analysis is at a tactical level we reduce potential issues surrounding the need for interpretation of low-level user actions for log analysis. We validate the efficiency and effectiveness of the proposed tactic identification method using logs of two reference search systems of different natures: a product search system and a video search system. With the identified tactics, we perform a series of novel log analyses in terms of entropy rate of user search tactic sequences, demonstrating how this type of analysis allows comparisons of user search behaviours across systems of different nature and design. 
This analysis provides insights not achievable with traditional log analysis.
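The abstract's entropy-rate analysis of tactic sequences can be illustrated with a short sketch. This is not the paper's implementation: it estimates entropy rate with a simple first-order Markov model over hypothetical tactic labels, purely to show the kind of quantity being compared across systems.

```python
from collections import Counter
import math

def markov_entropy_rate(tactics):
    """First-order Markov estimate of the entropy rate (bits per tactic)
    of a sequence of search-tactic labels."""
    pairs = Counter(zip(tactics, tactics[1:]))   # transition counts
    starts = Counter(tactics[:-1])               # source-state counts
    n = len(tactics) - 1
    h = 0.0
    for (s, t), c in pairs.items():
        p_pair = c / n              # joint probability p(s, t)
        p_cond = c / starts[s]      # conditional probability p(t | s)
        h -= p_pair * math.log2(p_cond)
    return h

# A perfectly repetitive session has entropy rate 0; more varied
# tactic sequences score higher.
print(markov_entropy_rate(["query", "scan", "query", "scan", "query"]))  # 0.0
```

Lower entropy rates indicate more predictable tactic sequences, which is what makes the measure comparable across interfaces with different event vocabularies.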
Pull request latency explained: an empirical overview
Pull request latency evaluation is an essential application of effort evaluation in the pull-based development scenario. It can help reviewers sort the pull request queue, remind developers about the review processing time, speed up the review process and accelerate software development. There is a lack of work that systematically organizes the factors that affect pull request latency. Also, there is no related work discussing the differences and variations in characteristics in different scenarios and contexts. In this paper, we collected relevant factors through a literature review approach. Then we assessed their relative importance in five scenarios and six different contexts using a mixed-effects linear regression model. The most important factors differ across scenarios. The length of the description is most important when pull requests are submitted. The existence of comments is most important when closing pull requests, when using CI tools, and when the contributor and the integrator are different. When comments exist, the latency of the first comment is most important. Meanwhile, the influence of factors may change in different contexts. For example, the number of commits in a pull request has a more significant impact on pull request latency when closing than when submitting, due to changes in contributions brought about by the review process. Both human and bot comments are positively correlated with pull request latency. In contrast, the bot's first comments are more strongly correlated with latency, but the number of bot comments is less correlated. Future research and tool implementations need to consider the impact of different contexts. Researchers can conduct related studies based on our publicly available datasets and replication scripts.
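The mixed-effects linear regression the abstract mentions can be sketched with statsmodels' `mixedlm`: factors enter as fixed effects while a grouping variable (here, the project) gets a random intercept. The column names (`latency`, `desc_len`, `has_comments`, `project`) and the synthetic data are illustrative assumptions, not the paper's dataset schema.

```python
# Sketch: mixed-effects regression of PR latency on two factors,
# with a per-project random intercept. Data are synthetic.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "project": rng.choice(["p1", "p2", "p3", "p4"], size=n),
    "desc_len": rng.integers(0, 500, size=n),      # description length
    "has_comments": rng.integers(0, 2, size=n),    # any comments? 0/1
})
# Simulated latency: small effect of description length, larger effect
# of comments, plus noise.
df["latency"] = (0.01 * df["desc_len"] + 2.0 * df["has_comments"]
                 + rng.normal(0, 1, size=n))

model = smf.mixedlm("latency ~ desc_len + has_comments",
                    df, groups=df["project"])
result = model.fit()
print(result.params["has_comments"])  # fixed-effect estimate near 2.0
```

Comparing the magnitudes of the fixed-effect coefficients across refitted scenarios is what gives the "relative importance in different contexts" reading the abstract describes.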
pkaPS: prediction of protein kinase A phosphorylation sites with the simplified kinase-substrate binding model
BACKGROUND: Protein kinase A (cAMP-dependent kinase, PKA) is a serine/threonine kinase for which ca. 150 substrate proteins are known. Based on a refinement of the recognition motif using the available experimental data, we wished to apply the simplified substrate protein binding model for accurate prediction of PKA phosphorylation sites, an approach that was previously successful for the prediction of lipid posttranslational modifications and of the PTS1 peroxisomal translocation signal. RESULTS: Approximately 20 sequence positions flanking the phosphorylated residue on both sides have been found to be restricted in their sequence variability (region -18...+23 with the site at position 0). The conserved physical pattern can be rationalized in terms of a qualitative binding model with the catalytic cleft of protein kinase A. Positions -6...+4 surrounding the phosphorylation site are influenced by direct interaction with the kinase to a varying degree. This sequence stretch is embedded in an intrinsically disordered region composed preferentially of hydrophilic residues with flexible backbones and small side chains. This knowledge has been incorporated into a simplified analytical model of productive binding of substrate proteins with PKA. CONCLUSION: The scoring function of the pkaPS predictor can confidently discriminate PKA phosphorylation sites from serines/threonines with non-permissive sequence environments (sensitivity of ~96% at a specificity of ~94%). The tool "pkaPS" has been applied to the whole human proteome. Among the newly predicted PKA targets, there are entirely uncharacterized protein groups as well as apparently well-known families such as those of the ribosomal proteins L21e, L22 and L6. AVAILABILITY: The supplementary data as well as the prediction tool as a WWW server are available at .
REVIEWERS: Erik van Nimwegen (Biozentrum, University of Basel, Switzerland), Sandor Pongor (International Centre for Genetic Engineering and Biotechnology, Trieste, Italy), Igor Zhulin (University of Tennessee, Oak Ridge National Laboratory, USA)
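The idea of scoring the sequence window around a candidate serine/threonine can be illustrated with a toy position-weight sketch. The weights below encode only the textbook PKA consensus (arginines at -3/-2, a hydrophobic residue at +1); they are an illustrative assumption, not the pkaPS scoring function, which models a much wider window and physical binding.

```python
# Toy position-specific scoring of a candidate PKA site. The weights
# and the hydrophobic set are illustrative, not pkaPS parameters.
def score_site(seq, pos):
    """Score the window around seq[pos]; the residue must be S or T."""
    if seq[pos] not in "ST":
        return float("-inf")        # not a phospho-acceptor residue
    score = 0.0
    # Arginines at -3 and -2 are the strongest PKA determinants.
    if pos >= 3 and seq[pos - 3] == "R":
        score += 2.0
    if pos >= 2 and seq[pos - 2] == "R":
        score += 2.0
    # A hydrophobic residue at +1 also favours binding.
    if pos + 1 < len(seq) and seq[pos + 1] in "AILMFV":
        score += 1.0
    return score

print(score_site("LRRASLG", 4))  # canonical kemptide site -> 5.0
```

A real predictor would sum contributions over the full -18...+23 region and calibrate a decision threshold against known positive and negative sites.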
Towards Automated Circuit Discovery for Mechanistic Interpretability
Recent work in mechanistic interpretability has reverse-engineered nontrivial
behaviors of transformer models. These contributions required considerable
effort and researcher intuition, which makes it difficult to apply the same
methods to understand the complex behavior that current models display. At
their core, however, the workflow for these discoveries is surprisingly similar.
Researchers create a data set and metric that elicit the desired model
behavior, subdivide the network into appropriate abstract units, replace
activations of those units to identify which are involved in the behavior, and
then interpret the functions that these units implement. By varying the data
set, metric, and units under investigation, researchers can understand the
functionality of each neural network region and the circuits they compose. This
work proposes a novel algorithm, Automatic Circuit DisCovery (ACDC), to
automate the identification of the important units in the network. Given a
model's computational graph, ACDC finds subgraphs that explain a behavior of
the model. ACDC was able to reproduce a previously identified circuit for
Python docstrings in a small transformer, identifying 6/7 important attention
heads that compose up to 3 layers deep, while including 91% fewer
connections.
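The core loop the abstract describes, ablating units or edges and keeping only those whose removal changes the behavior metric, can be sketched on a toy computational graph. Here nodes just sum weighted inputs and the metric is the output value; the graph, weights, and threshold are illustrative assumptions, not ACDC's transformer-level ablations.

```python
# Sketch of ACDC-style greedy edge pruning on a toy computational graph.
# An edge is dropped when ablating it moves the metric by less than tau.
def run(edges, inputs, order):
    """Evaluate the graph: each node sums its weighted incoming values."""
    vals = dict(inputs)
    for node in order:
        vals[node] = sum(vals[src] * w for src, dst, w in edges if dst == node)
    return vals[order[-1]]

edges = [("a", "h", 1.0), ("b", "h", 0.01),
         ("h", "out", 1.0), ("b", "out", 0.02)]
inputs = {"a": 1.0, "b": 1.0}
order = ["h", "out"]          # topological order of computed nodes
tau = 0.05                    # ablation threshold

full = run(edges, inputs, order)   # metric on the unablated graph
kept = list(edges)
for e in list(edges):
    trial = [x for x in kept if x != e]
    if abs(run(trial, inputs, order) - full) < tau:
        kept = trial          # ablating e barely changed the metric: prune it

print(kept)  # only the a -> h -> out path survives
```

The surviving subgraph is the "circuit": the minimal set of connections that still reproduces the behavior within tolerance.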
Braid: Weaving Symbolic and Neural Knowledge into Coherent Logical Explanations
Traditional symbolic reasoning engines, while attractive for their precision
and explicability, have a few major drawbacks: the use of brittle inference
procedures that rely on exact matching (unification) of logical terms, an
inability to deal with uncertainty, and the need for a precompiled rule-base of
knowledge (the "knowledge acquisition" problem). To address these issues, we
devise a novel logical reasoner called Braid, that supports probabilistic
rules, and uses the notion of custom unification functions and dynamic rule
generation to overcome the brittle matching and knowledge-gap problem prevalent
in traditional reasoners. In this paper, we describe the reasoning algorithms
used in Braid, and their implementation in a distributed task-based framework
that builds proof/explanation graphs for an input query. We use a simple QA
example from a children's story to motivate Braid's design and explain how the
various components work together to produce a coherent logical explanation.
Finally, we evaluate Braid on the ROC Story Cloze test and achieve close to
state-of-the-art results while providing frame-based explanations.
Comment: Accepted at AAAI-202
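The "custom unification" idea, matching logical terms by similarity rather than exact symbol equality, can be illustrated with a small sketch. The trigram-Jaccard similarity and the threshold here are stand-ins for exposition; Braid's actual unification functions are pluggable and not specified by this abstract.

```python
# Toy soft unification: two term symbols "match" when a user-supplied
# similarity function scores them above a threshold, instead of
# requiring exact equality as classical unification does.
def trigrams(word):
    return {word[i:i + 3] for i in range(len(word) - 2)}

def soft_unify(a, b, threshold=0.4):
    """Return True if symbols a and b match under soft unification."""
    if a == b:
        return True               # exact match always unifies
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return False
    jaccard = len(ta & tb) / len(ta | tb)
    return jaccard >= threshold

print(soft_unify("eat", "eat"))          # exact match -> True
print(soft_unify("running", "runnin"))   # near-identical forms -> True
print(soft_unify("run", "sleep"))        # unrelated -> False
```

Relaxing unification this way lets a rule whose head mentions one predicate fire on a semantically close predicate in the knowledge base, which is how the abstract's "brittle matching" problem is avoided.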
CBR and MBR techniques: review for an application in the emergencies domain
The purpose of this document is to provide an in-depth analysis of current reasoning engine practice and the integration strategies of Case Based Reasoning and Model Based Reasoning that will be used in the design and development of the RIMSAT system.
RIMSAT (Remote Intelligent Management Support and Training) is a European Commission funded project designed to:
a. Provide an innovative, 'intelligent', knowledge-based solution aimed at improving the quality of critical decisions.
b. Enhance the competencies and responsiveness of individuals and organisations involved in highly complex, safety-critical incidents, irrespective of their location.
In other words, RIMSAT aims to design and implement a decision support system that applies Case-Based Reasoning and Model-Based Reasoning technology to the management of emergency situations.
This document is part of a deliverable for the RIMSAT project, and although it has been written in close contact with the requirements of the project, it provides an overview wide enough to serve as a state of the art in integration strategies between CBR and MBR technologies.
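The CBR "retrieve" step at the heart of such a system can be sketched briefly: find the stored incident case most similar to a new situation and reuse its response plan. The features, cases, and plans below are invented examples for illustration, not RIMSAT's case base.

```python
# Sketch of case-based retrieval for an emergency-management setting:
# nearest stored case (Euclidean distance over numeric features) wins.
import math

cases = [
    ({"severity": 3, "casualties": 0, "indoors": 1}, "evacuate building"),
    ({"severity": 8, "casualties": 5, "indoors": 0}, "dispatch field hospital"),
    ({"severity": 5, "casualties": 1, "indoors": 1}, "send fire crew"),
]

def distance(a, b):
    """Euclidean distance between two feature dictionaries."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))

def retrieve(query):
    """Return the response plan of the nearest stored case."""
    return min(cases, key=lambda c: distance(c[0], query))[1]

new_incident = {"severity": 7, "casualties": 4, "indoors": 0}
print(retrieve(new_incident))  # -> "dispatch field hospital"
```

An MBR component would complement this by checking the retrieved plan against a model of the domain, which is the integration question the review surveys.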