62 research outputs found

    Importance-aware Co-teaching for Offline Model-based Optimization

    Full text link
    Offline model-based optimization aims to find a design that maximizes a property of interest using only an offline dataset, with applications in robot, protein, and molecule design, among others. A prevalent approach is gradient ascent, where a proxy model is trained on the offline dataset and then used to optimize the design. This method suffers from an out-of-distribution issue, where the proxy is not accurate for unseen designs. To mitigate this issue, we explore using a pseudo-labeler to generate valuable data for fine-tuning the proxy. Specifically, we propose Importance-aware Co-Teaching for Offline Model-based Optimization (ICT). This method maintains three symmetric proxies with their mean ensemble as the final proxy, and comprises two steps. The first step is pseudo-label-driven co-teaching. In this step, one proxy is iteratively selected as the pseudo-labeler for designs near the current optimization point, generating pseudo-labeled data. Subsequently, a co-teaching process identifies small-loss samples as valuable data and exchanges them between the other two proxies for fine-tuning, promoting knowledge transfer. This procedure is repeated three times, with a different proxy chosen as the pseudo-labeler each time, ultimately enhancing the ensemble performance. To further improve the accuracy of pseudo-labels, we perform a secondary step of meta-learning-based sample reweighting, which assigns importance weights to samples in the pseudo-labeled dataset and updates them via meta-learning. ICT achieves state-of-the-art results across multiple design-bench tasks, achieving the best mean rank of 3.1 and median rank of 2 among 15 methods. Our source code can be found here. Comment: Accepted by NeurIPS 2023.
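    A minimal sketch of the pseudo-label-driven co-teaching step, written in PyTorch with hypothetical proxy architectures and hyperparameters (the meta-learning-based sample reweighting step is omitted):
```python
import torch
import torch.nn as nn

def make_proxy(dim):
    # Simple MLP proxy; the paper's actual architecture may differ.
    return nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

def co_teach_step(proxies, optimizers, x_current, n_samples=128, keep_ratio=0.5, sigma=0.1):
    # One round in which each proxy takes a turn as the pseudo-labeler for designs
    # sampled near the current optimization point x_current.
    dim = x_current.numel()
    for k in range(len(proxies)):
        labeler = proxies[k]
        students = [p for i, p in enumerate(proxies) if i != k]
        opts = [o for i, o in enumerate(optimizers) if i != k]
        # Pseudo-label designs near the current optimization point.
        x_near = x_current.unsqueeze(0) + sigma * torch.randn(n_samples, dim)
        with torch.no_grad():
            y_pseudo = labeler(x_near).squeeze(-1)
            # Each student ranks the pseudo-labeled samples by its own loss.
            losses = [(s(x_near).squeeze(-1) - y_pseudo) ** 2 for s in students]
        n_keep = int(keep_ratio * n_samples)
        small_loss_idx = [torch.topk(-l, n_keep).indices for l in losses]
        # Exchange: each student fine-tunes on the small-loss samples chosen by its peer.
        for student, opt, idx in zip(students, opts, reversed(small_loss_idx)):
            opt.zero_grad()
            loss = ((student(x_near[idx]).squeeze(-1) - y_pseudo[idx]) ** 2).mean()
            loss.backward()
            opt.step()

# Example usage with made-up dimensions:
proxies = [make_proxy(32) for _ in range(3)]
optimizers = [torch.optim.Adam(p.parameters(), lr=1e-3) for p in proxies]
co_teach_step(proxies, optimizers, torch.zeros(32))
```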

    Parallel-mentoring for Offline Model-based Optimization

    Full text link
    We study offline model-based optimization to maximize a black-box objective function with a static dataset of designs and scores. These designs encompass a variety of domains, including materials, robots, and DNA sequences. A common approach trains a proxy on the static dataset to approximate the black-box objective function and performs gradient ascent to obtain new designs. However, this often results in poor designs due to proxy inaccuracies for out-of-distribution designs. Recent studies indicate that: (a) gradient ascent with a mean ensemble of proxies generally outperforms simple gradient ascent, and (b) a trained proxy provides weak ranking supervision signals for design selection. Motivated by (a) and (b), we propose parallel-mentoring as an effective and novel method that facilitates mentoring among parallel proxies, creating a more robust ensemble to mitigate the out-of-distribution issue. We focus on the three-proxy case, and our method consists of two modules. The first module, voting-based pairwise supervision, operates on three parallel proxies and captures their ranking supervision signals as pairwise comparison labels. These labels are combined through majority voting to generate consensus labels, which incorporate ranking supervision signals from all proxies and enable mutual mentoring. However, label noise arises due to possibly incorrect consensus. To alleviate this, we introduce an adaptive soft-labeling module with soft labels initialized as consensus labels. Based on bi-level optimization, this module fine-tunes the proxies in the inner level and learns more accurate labels in the outer level to adaptively mentor the proxies, resulting in a more robust ensemble. Experiments validate the effectiveness of our method. Our code is available here. Comment: Accepted by NeurIPS 2023.
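    A minimal sketch of the voting-based pairwise supervision module, in PyTorch with hypothetical proxy and optimizer objects; the adaptive soft-labeling module (the bi-level optimization part) is left out:
```python
import torch
import torch.nn as nn

def pairwise_consensus(proxies, x_i, x_j):
    # Each proxy votes on whether design x_i should outrank x_j;
    # majority voting yields the consensus pairwise comparison label.
    with torch.no_grad():
        votes = torch.stack([(p(x_i) > p(x_j)).float() for p in proxies])
    return (votes.mean(dim=0) > 0.5).float()

def mentoring_step(proxies, optimizers, x_i, x_j):
    # Fine-tune every proxy toward the consensus label,
    # treating its score difference as a ranking logit.
    label = pairwise_consensus(proxies, x_i, x_j)
    criterion = nn.BCEWithLogitsLoss()
    for proxy, opt in zip(proxies, optimizers):
        opt.zero_grad()
        loss = criterion(proxy(x_i) - proxy(x_j), label)
        loss.backward()
        opt.step()

# Example usage with made-up proxies and a batch of design pairs:
proxies = [nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1)) for _ in range(3)]
optimizers = [torch.optim.Adam(p.parameters(), lr=1e-3) for p in proxies]
mentoring_step(proxies, optimizers, torch.randn(8, 16), torch.randn(8, 16))
```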

    Gradient-based Bi-level Optimization for Deep Learning: A Survey

    Full text link
    Bi-level optimization, especially the gradient-based category, has been widely used in the deep learning community, including for hyperparameter optimization and meta-knowledge extraction. Bi-level optimization embeds one problem within another, and the gradient-based category solves the outer-level task by computing the hypergradient, which is much more efficient than classical methods such as evolutionary algorithms. In this survey, we first give a formal definition of gradient-based bi-level optimization. Next, we delineate criteria to determine whether a research problem is apt for bi-level optimization and provide a practical guide on structuring such problems into a bi-level optimization framework, which is particularly beneficial for those new to this domain. More specifically, there are two formulations: the single-task formulation to optimize hyperparameters such as regularization parameters and distilled data, and the multi-task formulation to extract meta-knowledge such as the model initialization. With a bi-level formulation in hand, we then discuss four bi-level optimization solvers for updating the outer variable: explicit gradient update, proxy update, implicit function update, and closed-form update. Finally, we wrap up the survey by highlighting two prospective future directions: (1) Effective Data Optimization for Science, examined through the lens of task formulation, and (2) Accurate Explicit Proxy Update, analyzed from an optimization standpoint. Comment: AI4Science; Bi-level Optimization; Hyperparameter Optimization; Meta Learning; Implicit Function
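    As an illustration, here is a toy example of the explicit gradient (unrolled) solver in PyTorch: the outer variable is an L2 regularization strength, the inner problem fits a linear model, and the hypergradient flows through one differentiable inner step. All data, step sizes, and iteration counts are made up.
```python
import torch

# Toy bi-level problem: the inner level fits weights w on training data with an L2 penalty
# whose strength lam is the outer variable; the outer level minimizes validation loss.
torch.manual_seed(0)
X_tr, y_tr = torch.randn(64, 5), torch.randn(64)
X_val, y_val = torch.randn(64, 5), torch.randn(64)

lam = torch.tensor(0.1, requires_grad=True)      # outer variable (hyperparameter)
w = torch.zeros(5, requires_grad=True)           # inner variable
outer_opt = torch.optim.Adam([lam], lr=1e-2)

for step in range(200):
    # Inner update, kept differentiable so the hypergradient can flow through it.
    inner_loss = ((X_tr @ w - y_tr) ** 2).mean() + lam * (w ** 2).sum()
    g = torch.autograd.grad(inner_loss, w, create_graph=True)[0]
    w_new = w - 0.1 * g                          # one unrolled inner gradient step
    # Outer update: hypergradient of the validation loss w.r.t. lam via the unrolled step.
    outer_loss = ((X_val @ w_new - y_val) ** 2).mean()
    outer_opt.zero_grad()
    outer_loss.backward()
    outer_opt.step()
    w = w_new.detach().requires_grad_(True)      # commit the inner step
```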

    Interferon regulatory factor 2 binding protein 2b regulates neutrophil versus macrophage fate during zebrafish definitive myelopoiesis

    Get PDF

    GLM-130B: An Open Bilingual Pre-trained Model

    Full text link
    We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model with 130 billion parameters. It is an attempt to open-source a 100B-scale model at least as good as GPT-3 (davinci) and to unveil how models of such a scale can be successfully pre-trained. Over the course of this effort, we faced numerous unexpected technical and engineering challenges, particularly with loss spikes and divergence. In this paper, we introduce the training process of GLM-130B, including its design choices, training strategies for both efficiency and stability, and engineering efforts. The resultant GLM-130B model offers significant outperformance over GPT-3 175B (davinci) on a wide range of popular English benchmarks, while the performance advantage is not observed in OPT-175B and BLOOM-176B. It also consistently and significantly outperforms ERNIE TITAN 3.0 260B -- the largest Chinese language model -- across related benchmarks. Finally, we leverage a unique scaling property of GLM-130B to reach INT4 quantization without post-training, with almost no performance loss, making it the first among 100B-scale models and, more importantly, allowing its effective inference on 4×RTX 3090 (24G) or 8×RTX 2080 Ti (11G) GPUs, the most affordable GPUs required for using 100B-scale models. The GLM-130B model weights are publicly accessible and its code, training logs, related toolkit, and lessons learned are open-sourced at https://github.com/THUDM/GLM-130B/. Comment: Accepted to ICLR 2023.
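    For intuition, the following sketch shows generic weight-only INT4 absmax quantization and dequantization in PyTorch; it illustrates the idea of mapping weights to 4-bit integers plus per-group scales, not GLM-130B's exact quantization recipe or inference kernels.
```python
import torch

def quantize_int4_absmax(w, group_size=128):
    # Symmetric absmax INT4 quantization of a 2D weight matrix, per group of columns.
    # Assumes in_features is divisible by group_size; values are stored in an int8 container.
    out_features, in_features = w.shape
    w = w.reshape(out_features, in_features // group_size, group_size)
    scale = w.abs().amax(dim=-1, keepdim=True) / 7.0   # symmetric 4-bit range [-7, 7]
    q = torch.clamp(torch.round(w / scale), -8, 7).to(torch.int8)
    return q, scale

def dequantize_int4(q, scale):
    # Recover an approximate float weight matrix from integers and per-group scales.
    return (q.float() * scale).reshape(q.shape[0], -1)

# Quick check of reconstruction error on a random weight matrix:
w = torch.randn(256, 1024)
q, s = quantize_int4_absmax(w)
err = (dequantize_int4(q, s) - w).abs().mean()
```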

    Robust estimation of bacterial cell count from optical density

    Get PDF
    Optical density (OD) is widely used to estimate the density of cells in liquid culture, but it cannot be compared between instruments without a standardized calibration protocol and is challenging to relate to actual cell count. We address this with an interlaboratory study comparing three simple, low-cost, and highly accessible OD calibration protocols across 244 laboratories, applied to eight strains of constitutive GFP-expressing E. coli. Based on our results, we recommend calibrating OD to estimated cell count using serial dilution of silica microspheres, which produces highly precise calibration (95.5% of residuals <1.2-fold), is easily assessed for quality control, also assesses the instrument's effective linear range, and can be combined with fluorescence calibration to obtain units of Molecules of Equivalent Fluorescein (MEFL) per cell, allowing direct comparison and data fusion with flow cytometry measurements: in our study, fluorescence-per-cell measurements showed only a 1.07-fold mean difference between plate reader and flow cytometry data.
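    A small sketch (in Python, with entirely made-up numbers) of how a microsphere serial-dilution calibration can be turned into a conversion from blank-corrected OD to an estimated particle or cell count; the protocol details and quality checks described in the study are not reproduced here.
```python
import numpy as np

# Hypothetical calibration data: a two-fold serial dilution of silica microspheres
# at a known starting count per well, with blank-corrected OD600 readings.
particles_per_well = 3.0e8 / (2.0 ** np.arange(8))   # known microsphere counts
od_readings = np.array([1.20, 0.61, 0.30, 0.15, 0.076, 0.038, 0.019, 0.010])

# Fit a proportionality constant (particles per OD unit) within the linear range,
# using a least-squares line through the origin. The 0.8 cutoff is an assumption.
linear = od_readings < 0.8
slope = np.sum(particles_per_well[linear] * od_readings[linear]) / np.sum(od_readings[linear] ** 2)

def od_to_cell_count(od_blanked):
    # Convert a blank-corrected OD reading to an estimated particle/cell count per well.
    return slope * od_blanked

estimate = od_to_cell_count(0.25)
```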

    Handwritten Notes Vs Typed Notes

    No full text
    Researchers have debated whether handwritten notes are better than typed notes. We have had professors ban electronics in class and show strong support for written notes. Prior studies have used different experiments to test the quality of both; researchers share different views on this matter, and most support written notes. Researchers who support written notes think that "writing by hand strengthens the learning process. When typing on a keyboard, this process may be impaired" (Science Daily) and that "students who took notes on laptops performed worse on conceptual questions than students who took notes longhand" (Mueller and Oppenheimer). The researcher who supports typed notes thinks that "typing notes produced higher retention scores than handwriting notes…; typing as a method of note-taking may be an influential factor in memory retention, particularly in a lecture context" (Ian Schoen). We asked MTH 165 and MTH 141 students to report their note-taking style and their opinion of their class performance to help us understand the issue. The research question was "Are handwritten notes more effective than typed notes for helping undergraduate students retain information during lectures?" We hypothesized that handwritten notes are more effective than typed notes in terms of information retention. To get insights, we sent an anonymous survey with four questions to students in our study groups: ● What kind of method do you use to take lecture notes? ● How well do you think you are retaining the information from the lecture? ● How well are you doing in this class? ● Do you want to change your note-taking method? If so, why? Thirty-two people responded to the survey. None of the respondents used typed notes. We found that most students (56.3%) think they retain the lecture information well, and most students (59.4%) are doing well in their classes. As predicted by our hypothesis, those who take notes by hand retain lecture material well, and most students do not want to change their note-taking method. The main flaw of this project is that we did not anticipate that it is almost impossible for math students to type notes, due to the many complicated diagrams, so we failed to get a sample of students who type notes and missed out on those insights. We could have done better by making the sample larger and asking students of all disciplines to share what they do in class. References: Schoen, Ian. "Effects of Method and Context of Note-taking on Memory: Handwriting versus Typing in Lecture and Textbook Reading Contexts" (2012). Pitzer Senior Theses, Paper 20. http://scholarship.claremont.edu/pitzer_theses/20; Science Daily (n.d.). Retrieved December 04, 2017, from https://www.sciencedaily.com/releases/2011/01/110119095458.htm; Mueller, P. A., & Oppenheimer, D. M. (2014). The Pen Is Mightier Than the Keyboard. Psychological Science, 25(6), 1159-1168. doi:10.1177/095679761452458