
    On the complexity of undominated core and farsighted solution concepts in coalition games

    In this paper, we study the computational complexity of solution concepts in the context of coalitional games. First, we distinguish two different kinds of core, the undominated core and the excess core, and investigate the difference and relationship between them. Second, we thoroughly investigate the computational complexity of the undominated core and of three farsighted solution concepts: the farsighted core, the farsighted stable set, and the largest consistent set.
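
    As a rough illustration of the dominance reasoning involved (not the paper's algorithm), the sketch below brute-forces core membership for a small characteristic function game; the game v, the player set, and the tolerance are assumed for the example.

        # A minimal sketch of checking core membership in a characteristic
        # function game. The game v, player set, and tolerance are
        # illustrative assumptions, not taken from the paper.
        from itertools import combinations

        def is_in_core(payoff, v, players, tol=1e-9):
            """Return True if `payoff` (dict player -> value) is efficient
            and no coalition can improve on its share."""
            grand = frozenset(players)
            # Efficiency: the grand coalition's value is fully distributed.
            if abs(sum(payoff[p] for p in players) - v[grand]) > tol:
                return False
            # Coalitional rationality: no coalition S earns less than v(S).
            for r in range(1, len(players)):
                for S in combinations(players, r):
                    if sum(payoff[p] for p in S) < v[frozenset(S)] - tol:
                        return False
            return True

        # Example: a 3-player game where splitting the surplus equally is stable.
        players = [1, 2, 3]
        v = {frozenset(S): (3.0 if len(S) == 3 else 0.0)
             for r in range(1, 4) for S in combinations(players, r)}
        print(is_in_core({1: 1.0, 2: 1.0, 3: 1.0}, v, players))  # True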

    An exploration strategy for non-stationary opponents

    The success or failure of any learning algorithm is partially due to the exploration strategy it employs. However, most exploration strategies assume that the environment is stationary and non-strategic. In this work we shed light on how to design exploration strategies in non-stationary and adversarial environments. Our proposed adversarial drift exploration (DE) is able to efficiently explore the state space while keeping track of regions of the environment that have changed. This exploration is general enough to be applied in single-agent non-stationary environments as well as in multiagent settings where the opponent changes its strategy over time. We use a two-agent strategic interaction setting to test this new type of exploration, where the opponent switches between different behavioral patterns to emulate a non-deterministic, stochastic, and adversarial environment. The agent's objective is to learn a model of the opponent's strategy in order to act optimally. Our contribution is twofold. First, we present DE as a strategy for switch detection. Second, we propose a new algorithm called R-max# for learning and planning against non-stationary opponents. To handle such opponents, R-max# reasons and acts in terms of two objectives: (1) to maximize utilities in the short term while learning and (2) to eventually explore opponent behavioral changes. We provide theoretical results showing that R-max# is guaranteed to detect the opponent's switch and learn a new model with finite sample complexity. R-max# makes efficient use of exploration experiences, which results in rapid adaptation and efficient DE for dealing with the non-stationary nature of the opponent. We show experimentally that DE outperforms state-of-the-art algorithms explicitly designed for modeling opponents (in terms of average rewards) in two complementary domains.
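
    A minimal sketch of the optimism-plus-forgetting idea behind this style of drift exploration follows; it is in the spirit of R-max# rather than a faithful implementation, and the parameters m, r_max, and the reset period tau are assumptions.

        # A minimal sketch of R-max-style optimism with periodic forgetting,
        # loosely in the spirit of R-max#. Parameters m, r_max, and tau are
        # illustrative assumptions, not the paper's values.
        from collections import defaultdict

        class DriftExplorer:
            def __init__(self, m=10, r_max=1.0, tau=500):
                self.m, self.r_max, self.tau = m, r_max, tau
                self.counts = defaultdict(int)   # (state, action) visit counts
                self.t = 0

            def optimistic_reward(self, state, action, observed_reward):
                # Under-visited pairs get the optimistic bonus r_max, which
                # drives the planner to (re)visit them; well-visited pairs
                # use the real observed reward.
                if self.counts[(state, action)] < self.m:
                    return self.r_max
                return observed_reward

            def step(self, state, action):
                self.counts[(state, action)] += 1
                self.t += 1
                if self.t % self.tau == 0:
                    # Periodically forget counts so stale models of a
                    # switching opponent are re-explored, not trusted forever.
                    self.counts.clear()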

    Efficiently detecting switches against non-stationary opponents

    Interactions in multiagent systems are generally more complicated than single-agent ones. Game theory provides solutions for how to act in multiagent scenarios; however, it assumes that all agents will act rationally. Moreover, some works also assume the opponent will use a stationary strategy. These assumptions usually do not hold in real-world scenarios, where agents have limited capacities and may deviate from a perfectly rational response. Our goal is still to act optimally in these cases, by learning the appropriate response and without any prior policies on how to act. Thus, we focus on the problem in which another agent in the environment uses different stationary strategies over time. This turns the problem into learning in a non-stationary environment, which poses a problem for most learning algorithms. This paper introduces DriftER, an algorithm that (1) learns a model of the opponent, (2) uses it to obtain an optimal policy, and then (3) determines when it must re-learn due to an opponent strategy change. We provide theoretical results showing that DriftER is guaranteed to detect switches with high probability. We also provide empirical results showing that our approach outperforms state-of-the-art algorithms, first in normal-form games such as the prisoner's dilemma and then in a more realistic scenario, the Power TAC simulator.
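
    The sketch below illustrates the switch-detection idea at the heart of this approach: monitor the opponent model's recent prediction error and flag a likely strategy change when it grows. The window size and error threshold are illustrative assumptions, not DriftER's actual statistical test.

        # A minimal sketch of switch detection via the opponent model's
        # windowed prediction error. Window and threshold are assumptions.
        from collections import deque

        class SwitchDetector:
            def __init__(self, window=50, threshold=0.3):
                self.errors = deque(maxlen=window)
                self.threshold = threshold

            def observe(self, predicted_action, actual_action):
                # Record whether the learned opponent model predicted
                # the opponent's actual move.
                self.errors.append(predicted_action != actual_action)

            def switched(self):
                # Signal a likely strategy switch once the windowed error
                # rate of the opponent model exceeds the threshold.
                if len(self.errors) < self.errors.maxlen:
                    return False
                return sum(self.errors) / len(self.errors) > self.threshold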

    Medicinal chemistry strategies towards the development of non-covalent SARS-CoV-2 Mpro inhibitors

    The main protease (Mpro) of SARS-CoV-2 is an attractive target for anti-COVID-19 therapy because of its high conservation and major role in the virus life cycle. The covalent Mpro inhibitor nirmatrelvir (in combination with ritonavir, a pharmacokinetic enhancer) and the non-covalent inhibitor ensitrelvir have shown efficacy in clinical trials and have been approved for therapeutic use. Effective antiviral drugs are needed to fight the pandemic, and non-covalent Mpro inhibitors are promising alternatives due to their high selectivity and favorable druggability. Numerous non-covalent Mpro inhibitors with desirable properties have been developed based on available crystal structures of Mpro. In this article, we describe the medicinal chemistry strategies applied for the discovery and optimization of non-covalent Mpro inhibitors, followed by a general overview and critical analysis of the available information. Prospective viewpoints and insights into current strategies for the development of non-covalent Mpro inhibitors are also discussed.

    Robust estimation of bacterial cell count from optical density

    Optical density (OD) is widely used to estimate the density of cells in liquid culture, but it cannot be compared between instruments without a standardized calibration protocol and is challenging to relate to actual cell count. We address this with an interlaboratory study comparing three simple, low-cost, and highly accessible OD calibration protocols across 244 laboratories, applied to eight strains of constitutive GFP-expressing E. coli. Based on our results, we recommend calibrating OD to estimated cell count using a serial dilution of silica microspheres, which produces highly precise calibration (95.5% of residuals <1.2-fold), is easily assessed for quality control, characterizes the instrument's effective linear range, and can be combined with fluorescence calibration to obtain units of Molecules of Equivalent Fluorescein (MEFL) per cell, allowing direct comparison and data fusion with flow cytometry measurements. In our study, fluorescence per cell measurements showed only a 1.07-fold mean difference between plate reader and flow cytometry data.
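
    The recommended calibration can be pictured with a short sketch: fit blank-corrected OD against known microsphere counts from a serial dilution, then use the fitted slope to convert sample OD readings into estimated cell counts. All numbers below are illustrative, not values from the study.

        # A minimal sketch of OD-to-count calibration from a serial dilution
        # of microspheres. All counts and OD readings are made-up examples.
        import numpy as np

        # Known particle counts per well from a 2-fold serial dilution of
        # silica microspheres, and the blank-corrected OD measured for each.
        particles = np.array([3.0e8, 1.5e8, 7.5e7, 3.75e7, 1.875e7])
        od = np.array([0.60, 0.31, 0.15, 0.074, 0.038])

        # Within the instrument's linear range, OD is proportional to
        # particle count; fit the slope through the origin by least squares.
        slope = np.sum(od * particles) / np.sum(od * od)

        def od_to_count(sample_od):
            """Estimate cell count from a blank-corrected OD reading."""
            return slope * sample_od

        print(f"{od_to_count(0.25):.3e} estimated cells")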

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that covers a variety of research fields, such that newly developed literature search techniques can be compared, improved, and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH (RELISH) consortium, consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180,000 PubMed-listed articles with regard to their respective seed (input) articles. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields, or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency, and PubMed Related Articles) had similar overall performance. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to capture all relevant articles. The established database server, located at https://relishdb.ict.griffith.edu.au, is freely available for downloading annotation data and for the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new, powerful techniques for title- and title/abstract-based search engines for relevant articles in biomedical research.
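
    One of the baselines, TF-IDF with cosine similarity, is easy to sketch with scikit-learn; the toy seed and candidate abstracts below are illustrative, and this is not the consortium's evaluation code.

        # A minimal sketch of TF-IDF cosine similarity between a seed
        # abstract and candidates, using scikit-learn. Toy documents only.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        seed = "optical density calibration of bacterial cell counts"
        candidates = [
            "calibrating plate reader optical density to cell number",
            "deep reinforcement learning for ranking in search engines",
        ]

        vectorizer = TfidfVectorizer(stop_words="english")
        matrix = vectorizer.fit_transform([seed] + candidates)

        # Rank candidates by cosine similarity to the seed document.
        scores = cosine_similarity(matrix[0], matrix[1:]).ravel()
        for doc, score in sorted(zip(candidates, scores), key=lambda x: -x[1]):
            print(f"{score:.3f}  {doc}")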

    POLICY ADVICE, NON-CONVEX AND DISTRIBUTED OPTIMIZATION IN REINFORCEMENT LEARNING

    Transfer learning is a method in machine learning that tries to use previous training knowledge to speed up the learning process. Policy advice is a type of transfer learning method where a student agent is able to learn faster via advice from a teacher agent. Here, the agent who provides advice (actions) is called the teacher agent; the agent who receives advice (actions) is the student agent. However, both this and other current reinforcement learning transfer methods have little theoretical analysis. This dissertation formally defines a setting where multiple teacher agents can provide advice to a student and introduces an algorithm to leverage both autonomous exploration and the teachers' advice. Regret bounds are provided, and negative transfer is formally defined and studied.

    On the other hand, policy search is a class of reinforcement learning algorithms for finding optimal policies to control problems with limited feedback. These methods have shown successful applications in high-dimensional problems, such as robotics control. Though successful, current methods can lead to unsafe policy parameters damaging hardware units. Motivated by such constraints, Bhatnagar et al. and others proposed projection-based methods for safe policies [8]. These methods, however, can only handle convex policy constraints. In this dissertation, we contribute the first safe policy search reinforcement learner capable of operating under non-convex policy constraints. This is achieved by observing a connection between non-convex variational inequalities and policy search problems. We provide two algorithms, i.e., Mann and two-step iteration, to solve the above and prove convergence in the non-convex stochastic setting.

    Lastly, lifelong reinforcement learning is a framework similar to transfer learning that allows agents to learn multiple consecutive tasks sequentially online. Current methods, however, suffer from scalability issues when the agent has to solve a large number of tasks. In this dissertation, we remedy the above drawbacks and propose a novel scalable technique for lifelong reinforcement learning. We derive an algorithm which assumes the availability of multiple processing units and computes shared repositories and local policies using only local information exchange.
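
    The policy-advice setting can be pictured with a small sketch in which a student follows a teacher's suggested action while an advice budget lasts, then falls back to its own epsilon-greedy policy. The budget mechanism, epsilon, and the teacher function are assumptions for illustration, not the dissertation's algorithm.

        # A minimal sketch of budgeted action advice: follow the teacher
        # while advice remains, else act epsilon-greedily on own Q-values.
        # Budget, epsilon, and the teacher are illustrative assumptions.
        import random

        def choose_action(state, q_values, actions, teacher, budget, epsilon=0.1):
            """Return (action, remaining_budget)."""
            if budget > 0:
                return teacher(state), budget - 1   # spend one unit of advice
            if random.random() < epsilon:
                return random.choice(actions), 0    # autonomous exploration
            best = max(actions, key=lambda a: q_values.get((state, a), 0.0))
            return best, 0                          # greedy on learned values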

    Accelerating Ranking in E-Commerce Search Engines through Contextual Factor Selection

    In large-scale search systems, the quality of the ranking results is continually improved by introducing more factors from complex procedures. Meanwhile, the increase in factors demands more computational resources and increases system response latency. It has been observed that, in certain contexts, a search instance may require only a small set of useful factors, rather than all factors, in order to return high-quality results. Therefore, removing ineffective factors accordingly can significantly improve system efficiency. In this paper, we report our experience incorporating our Contextual Factor Selection (CFS) approach into the Taobao e-commerce platform to optimize the selection of factors based on the context of each search query, achieving high-quality search results while significantly reducing latency. This problem is treated as a combinatorial optimization problem that can be tackled through a sequential decision-making procedure. The problem is solved efficiently by CFS through a deep reinforcement learning method with reward shaping, which addresses the reward signal scarcity and wide reward signal distribution found in real-world search engines. Through extensive offline experiments based on data from the Taobao.com platform, CFS is shown to significantly outperform state-of-the-art approaches. Online deployment on Taobao.com demonstrated that CFS is able to reduce average search latency by more than 40% compared to the previous approach, with negligible reduction in search result quality. Under peak usage during the Singles' Day Shopping Festival (November 11th) in 2017, CFS reduced peak-load search latency by 33% compared to the previous approach, helping Taobao.com achieve 40% higher revenue than in the same period during 2016.

    Corrigendum: The spelling of coauthor Yusen Zhan's name in the paper "Accelerating Ranking in E-Commerce Search Engines through Contextual Factor Selection" has been corrected from Zan to Zhan. The original spelling was a typographical error.
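
    The per-query factor-selection idea can be pictured with a short sketch: a learned policy decides, factor by factor, whether the expected ranking benefit justifies the added latency. The policy, costs, and budget below are illustrative stand-ins, not the deployed CFS model.

        # A minimal sketch of sequential factor selection under a latency
        # budget. Policy, factor costs, and budget are made-up examples.
        def select_factors(query_context, factors, policy, latency_budget):
            """Greedily keep factors the policy scores as useful, within budget."""
            chosen, spent = [], 0.0
            for f in factors:
                if spent + f["cost"] > latency_budget:
                    continue  # skip factors that would blow the latency budget
                if policy(query_context, f["name"]) > 0.5:
                    chosen.append(f["name"])
                    spent += f["cost"]
            return chosen, spent

        # Toy usage with a hypothetical policy that avoids one costly factor.
        factors = [{"name": "bm25", "cost": 1.0},
                   {"name": "deep_semantic", "cost": 8.0},
                   {"name": "personalization", "cost": 3.0}]
        policy = lambda ctx, name: 0.2 if name == "deep_semantic" else 0.9
        print(select_factors({"q": "red dress"}, factors, policy, latency_budget=5.0))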