Decentralized Non-Convex Learning with Linearly Coupled Constraints
Motivated by the need for decentralized learning, this paper aims at
designing a distributed algorithm for solving nonconvex problems with general
linear constraints over a multi-agent network. In the considered problem, each
agent owns some local information and a local variable for jointly minimizing a
cost function, but local variables are coupled by linear constraints. Most of
the existing methods for such problems are applicable only to convex problems
or to problems with specific linear constraints. A distributed algorithm for
such problems with general linear constraints in the nonconvex setting is still
lacking. In this paper, to tackle this problem, we propose a new algorithm,
called "proximal dual consensus" (PDC) algorithm, which combines a proximal
technique and a dual consensus method. We establish theoretical convergence
conditions and show that the proposed PDC algorithm can converge to an
ε-Karush-Kuhn-Tucker solution within O(1/ε)
iterations. For computation reduction, the PDC algorithm can choose to perform
cheap gradient descent per iteration while preserving the same order of
iteration complexity. Numerical results are presented
to demonstrate the good performance of the proposed algorithms for solving a
regression problem and a classification problem over a network where agents
have only partial observations of data features.
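For orientation, a linearly coupled multi-agent problem of the kind described above can be written generically as follows. The notation (N agents, local costs f_i, local variables x_i, constraint data A_i and b) is assumed here for illustration and is not taken from the paper, which may also allow inequality constraints.

```latex
\[
\begin{aligned}
\min_{x_1,\dots,x_N}\quad & \sum_{i=1}^{N} f_i(x_i) \\
\text{s.t.}\quad & \sum_{i=1}^{N} A_i x_i = b
\end{aligned}
\]
```

In this sketch, agent i knows only f_i and A_i, and each f_i may be nonconvex. The coupling arises because the single constraint involves every x_i at once.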
LawBench: Benchmarking Legal Knowledge of Large Language Models
Large language models (LLMs) have demonstrated strong capabilities in various
aspects. However, when applying them to the highly specialized, safety-critical
legal domain, it is unclear how much legal knowledge they possess and whether
they can reliably perform legal-related tasks. To address this gap, we propose
a comprehensive evaluation benchmark, LawBench. LawBench has been meticulously
crafted to precisely assess LLMs' legal capabilities at three
cognitive levels: (1) Legal knowledge memorization: whether LLMs can memorize
needed legal concepts, articles and facts; (2) Legal knowledge understanding:
whether LLMs can comprehend entities, events and relationships within legal
text; (3) Legal knowledge applying: whether LLMs can properly utilize their
legal knowledge and make necessary reasoning steps to solve realistic legal
tasks. LawBench contains 20 diverse tasks covering 5 task types: single-label
classification (SLC), multi-label classification (MLC), regression, extraction
and generation. We perform extensive evaluations of 51 LLMs on LawBench,
including 20 multilingual LLMs, 22 Chinese-oriented LLMs and 9 legal-specific
LLMs. The results show that GPT-4 remains the best-performing LLM in the legal
domain, surpassing the others by a significant margin. While fine-tuning LLMs
on legal-specific text brings certain improvements, we are still a long way
from obtaining usable and reliable LLMs in legal tasks. All data, model
predictions and evaluation code are released at
https://github.com/open-compass/LawBench/. We hope this benchmark provides an
in-depth understanding of LLMs' domain-specific capabilities and speeds up
the development of LLMs in the legal domain.
DomPep—A General Method for Predicting Modular Domain-Mediated Protein-Protein Interactions
Protein-protein interactions (PPIs) are frequently mediated by the binding of a modular domain in one protein to a short, linear peptide motif in its partner. The advent of proteomic methods such as peptide and protein arrays has led to the accumulation of a wealth of interaction data for modular interaction domains. Although several computational programs have been developed to predict modular domain-mediated PPI events, they are often restricted to a given domain type. We describe DomPep, a method that can potentially be used to predict PPIs mediated by any modular domain. DomPep combines proteomic data with sequence information to achieve high accuracy and high coverage in PPI prediction. Proteomic binding data were employed to determine a simple yet novel parameter, Ligand-Binding Similarity, which in turn is used to calibrate Domain Sequence Identity and Position-Weighted-Matrix distance, two parameters that are used in constructing prediction models. Moreover, DomPep can be used to predict PPIs both for domains with experimental binding data and for those without. Using the PDZ and SH2 domain families as test cases, we show that DomPep can predict PPIs with accuracies superior to existing methods. To evaluate DomPep as a discovery tool, we deployed DomPep to identify interactions mediated by three human PDZ domains. Subsequent in-solution binding assays validated the high accuracy of DomPep in predicting authentic PPIs at the proteome scale. Because DomPep makes use of only interaction data and the primary sequence of a domain, it can be readily expanded to include other types of modular domains.
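To make the two sequence-derived parameters more concrete, the following is a minimal sketch of how a sequence identity and a distance between position weight matrices might be computed. Both definitions (identity over non-gap positions of pre-aligned sequences, Euclidean distance between matrices) and the function names are assumptions for illustration. The abstract does not give DomPep's actual formulas or its calibration against Ligand-Binding Similarity.

```python
import math

def sequence_identity(a, b):
    """Fraction of identical residues over non-gap positions of two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore any column where either sequence has a gap character.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x == y for x, y in pairs) / len(pairs) if pairs else 0.0

def pwm_distance(p, q):
    """Euclidean distance between two position weight matrices of equal shape."""
    return math.sqrt(
        sum((x - y) ** 2 for row_p, row_q in zip(p, q) for x, y in zip(row_p, row_q))
    )

# Hypothetical aligned domain fragments and toy 2x2 matrices.
print(sequence_identity("ACDE-F", "ACDQGF"))
print(pwm_distance([[1.0, 0.0], [0.5, 0.5]], [[0.0, 1.0], [0.5, 0.5]]))
```

A real position weight matrix would have one column per amino acid (20 in total) rather than the two used in this toy example.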
Phosphorylation of Mcm2 modulates Mcm2–7 activity and affects the cell’s response to DNA damage
The S-phase kinase DDK controls DNA replication through phosphorylation of the replicative helicase Mcm2–7. We show that phosphorylation of Mcm2 at S164 and S170 is not essential for viability. However, the relevance of Mcm2 phosphorylation is demonstrated by the sensitivity of a strain containing alanine at these positions (mcm2AA) to methyl methanesulfonate (MMS) and caffeine. Consistent with a role for Mcm2 phosphorylation in response to DNA damage, the mcm2AA strain accumulates more RPA foci than wild type. An allele with the phosphomimetic mutations S164E and S170E (mcm2EE) suppresses the MMS and caffeine sensitivity caused by deficiencies in DDK function. In vitro, phosphorylation of Mcm2 or Mcm2EE reduces the helicase activity of Mcm2–7 while increasing DNA binding. The reduced helicase activity likely results from the increased DNA binding, since relaxing DNA binding with salt restores helicase activity. The finding that the ATP site mutant mcm2K549R has higher DNA binding and lower ATPase activity than mcm2EE, but like mcm2AA results in drug sensitivity, supports a model whereby a specific range of Mcm2–7 activity is required in response to MMS and caffeine. We propose that phosphorylation of Mcm2 fine-tunes the activity of Mcm2–7, which in turn modulates DNA replication in response to DNA damage.
Identification and Characterization of a Leucine-Rich Repeat Kinase 2 (LRRK2) Consensus Phosphorylation Motif
Mutations in LRRK2 (leucine-rich repeat kinase 2) have been identified as major genetic determinants of Parkinson's disease (PD). The most prevalent mutation, G2019S, increases LRRK2's kinase activity; therefore, understanding the sites and substrates that LRRK2 phosphorylates is critical to understanding its role in disease aetiology. Since the physiological substrates of this kinase are unknown, we set out to reveal potential targets of LRRK2 G2019S by identifying its favored phosphorylation motif. An unbiased screen of an oriented peptide library elucidated F/Y-x-T-x-R/K as the core dependent substrate sequence. Bioinformatic analysis of the consensus phosphorylation motif identified several novel candidate substrates that potentially function in neuronal pathophysiology. Peptides corresponding to the most PD-relevant proteins were efficiently phosphorylated by LRRK2 in vitro. Interestingly, the phosphomotif was also identified within LRRK2 itself. Autophosphorylation was detected by mass spectrometry and biochemical means at the only F-x-T-x-R site (Thr1410) within LRRK2. The relevance of this site was assessed by measuring effects of mutations on autophosphorylation, kinase activity, GTP binding, GTP hydrolysis, and LRRK2 multimerization. These studies indicate that modification of Thr1410 subtly regulates GTP hydrolysis by LRRK2, but with minimal effects on other parameters measured. Together, the identification of LRRK2's phosphorylation consensus motif and the functional consequences of its phosphorylation provide insights into downstream LRRK2-signaling pathways.
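As a rough illustration of how a consensus motif such as F/Y-x-T-x-R/K can be used to search protein sequences for candidate sites, the sketch below scans a sequence with a regular expression. The function name `find_motif_sites` and the example sequence are hypothetical. The abstract does not describe how the study's bioinformatic analysis was actually implemented.

```python
import re

# F/Y-x-T-x-R/K as a regular expression; "." stands for any residue (x).
# The zero-width lookahead lets overlapping matches be reported as well.
MOTIF = re.compile(r"(?=([FY].T.[RK]))")

def find_motif_sites(sequence):
    """Return (1-based position of the central Thr, matched 5-mer) for every hit."""
    return [(m.start() + 3, m.group(1)) for m in MOTIF.finditer(sequence.upper())]

# Made-up sequence containing two motif instances.
example = "MAFQTPRGGYATAKLL"
print(find_motif_sites(example))  # → [(5, 'FQTPR'), (12, 'YATAK')]
```

A match only flags a candidate site. As the abstract notes, candidates still require experimental confirmation, for example by in vitro kinase assays.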
Computational Prediction and Experimental Verification of New MAP Kinase Docking Sites and Substrates Including Gli Transcription Factors
In order to fully understand protein kinase networks, new methods are needed to identify regulators and substrates of kinases, especially for weakly expressed proteins. Here we have developed a hybrid computational search algorithm that combines machine learning and expert knowledge to identify kinase docking sites, and used this algorithm to search the human genome for novel MAP kinase substrates and regulators, focusing on the JNK family of MAP kinases. Predictions were tested by peptide array followed by rigorous biochemical verification with in vitro binding and kinase assays on wild-type and mutant proteins. Using this procedure, we found new ‘D-site’ class docking sites in previously known JNK substrates (hnRNP-K, PPM1J/PP2Czeta), as well as new JNK-interacting proteins (MLL4, NEIL1). Finally, we identified new D-site-dependent MAPK substrates, including the hedgehog-regulated transcription factors Gli1 and Gli3, suggesting that a direct connection between MAP kinase and hedgehog signaling may occur at the level of these key regulators. These results demonstrate that a genome-wide search for MAP kinase docking sites can be used to find new docking sites and substrates.
Structural prediction of the interaction of the tumor suppressor p27KIP1 with cyclin A/CDK2 identifies a novel catalytically relevant determinant
Federated Learning-Based Multi-Energy Load Forecasting Method Using CNN-Attention-LSTM Model
The Integrated Energy Microgrid (IEM) has emerged as a critical energy utilization mechanism for alleviating environmental and economic pressures. As a part of demand-side energy prediction, multi-energy load forecasting is a vital precondition for the planning and operation scheduling of IEMs. In order to increase data diversity and improve model generalization while protecting data privacy, this paper proposes a method that uses the CNN-Attention-LSTM model based on federated learning to forecast the multi-energy load of IEMs. CNN-Attention-LSTM is the global model for extracting features. Federated learning (FL) helps IEMs to train a forecasting model in a distributed manner without sharing local data. This paper examines the individual, central, and federated models with four federated learning strategies (FedAvg, FedAdagrad, FedYogi, and FedAdam). Moreover, considering that FL uses communication technology, the impact of false data injection attacks (FDIA) is also investigated. The results show that federated models can achieve an accuracy comparable to the central model while having a higher precision than individual models, and that FedAdagrad has the best prediction performance. Furthermore, FedAdagrad can maintain stability when attacked by false data injection.
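To illustrate the aggregation idea behind FedAvg, the simplest of the four strategies named above, here is a minimal sketch in which a server averages client parameter vectors weighted by local sample counts. The function name `fedavg`, the flattened-list representation of model parameters, and the example numbers are assumptions for illustration and do not reproduce the paper's implementation.

```python
def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by each client's sample count."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(n_params)
    ]

# Three hypothetical microgrids, each holding a 2-parameter local model.
clients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sizes = [100, 100, 200]  # number of local training samples per client
print(fedavg(clients, sizes))  # → [3.5, 4.5]
```

Only parameters leave each client, never the raw load data. FedAdagrad, FedYogi, and FedAdam are generally described as applying an adaptive optimizer on the server to the aggregated update instead of using the plain average directly.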
Progress on China nuclear data processing code system
China is developing the nuclear data processing code Ruler, which can be used to produce multi-group cross sections and related quantities from evaluated nuclear data in the ENDF format [1]. Ruler includes modules for reconstructing cross sections over the entire energy range, generating Doppler-broadened cross sections at a given temperature, producing effective self-shielded cross sections in the unresolved energy range, calculating scattering cross sections in the thermal energy range, generating group cross sections and matrices, and preparing WIMS-D format data files for the reactor physics code WIMS-D [2]. Ruler is written in Fortran 90 and has been tested on 32-bit computers running the Windows XP and Linux operating systems. Ruler was verified by comparison with calculation results obtained with the NJOY99 [3] processing code and validated using the WIMSD5B code.
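As a sketch of what generating a group cross section involves, the example below collapses pointwise data into one group constant using the standard flux-weighted average, sigma_g = ∫ sigma(E) phi(E) dE / ∫ phi(E) dE, evaluated with the trapezoidal rule. The function names, the energy grid, and the weighting flux are hypothetical. The abstract does not describe Ruler's actual numerical scheme or weighting functions.

```python
def trapz(y, x):
    """Trapezoidal-rule integral of tabulated values y over grid points x."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(x) - 1))

def group_cross_section(energies, sigma, flux):
    """Flux-weighted average cross section over one energy group."""
    numerator = trapz([s * f for s, f in zip(sigma, flux)], energies)
    denominator = trapz(flux, energies)
    return numerator / denominator

# Toy data: a constant cross section must collapse to that same constant,
# whatever the shape of the weighting flux.
energies = [1.0, 2.0, 3.0]
sigma = [2.0, 2.0, 2.0]
flux = [1.0, 3.0, 1.0]
print(group_cross_section(energies, sigma, flux))  # → 2.0
```

A production code repeats this for each group between its energy boundaries and for each reaction. The choice of weighting flux affects the resulting group constants.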