Importance-aware Co-teaching for Offline Model-based Optimization
Offline model-based optimization aims to find a design that maximizes a
property of interest using only an offline dataset, with applications in robot,
protein, and molecule design, among others. A prevalent approach is gradient
ascent, where a proxy model is trained on the offline dataset and then used to
optimize the design. This method suffers from an out-of-distribution issue,
where the proxy is not accurate for unseen designs. To mitigate this issue, we
explore using a pseudo-labeler to generate valuable data for fine-tuning the
proxy. Specifically, we propose \textit{\textbf{I}mportance-aware
\textbf{C}o-\textbf{T}eaching for Offline Model-based
Optimization}~(\textbf{ICT}). This method maintains three symmetric proxies
with their mean ensemble as the final proxy, and comprises two steps. The first
step is \textit{pseudo-label-driven co-teaching}. In this step, one proxy is
iteratively selected as the pseudo-labeler for designs near the current
optimization point, generating pseudo-labeled data. Subsequently, a co-teaching
process identifies small-loss samples as valuable data and exchanges them
between the other two proxies for fine-tuning, promoting knowledge transfer.
This procedure is repeated three times, with a different proxy chosen as the
pseudo-labeler each time, ultimately enhancing the ensemble performance. To
further improve accuracy of pseudo-labels, we perform a secondary step of
\textit{meta-learning-based sample reweighting}, which assigns importance
weights to samples in the pseudo-labeled dataset and updates them via
meta-learning. ICT achieves state-of-the-art results across multiple
design-bench tasks, attaining the best mean and median ranks among the
compared methods. Our source code can be found here.
Comment: Accepted by NeurIPS 202
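The pseudo-label-driven co-teaching step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the proxies are arbitrary callables, and the squared-error loss and `keep_ratio` fraction of small-loss samples retained are illustrative choices.

```python
import numpy as np

def small_loss_selection(losses, keep_ratio):
    """Return indices of the keep_ratio fraction of samples with smallest loss."""
    k = max(1, int(len(losses) * keep_ratio))
    return np.argsort(losses)[:k]

def co_teaching_round(pseudo_x, proxy_a, proxy_b, pseudo_y, keep_ratio=0.5):
    """One co-teaching exchange on pseudo-labeled data: each proxy selects
    its small-loss samples, which are handed to the *other* proxy for
    fine-tuning (the fine-tuning step itself is omitted here)."""
    loss_a = (proxy_a(pseudo_x) - pseudo_y) ** 2
    loss_b = (proxy_b(pseudo_x) - pseudo_y) ** 2
    idx_for_b = small_loss_selection(loss_a, keep_ratio)  # A teaches B
    idx_for_a = small_loss_selection(loss_b, keep_ratio)  # B teaches A
    return idx_for_a, idx_for_b
```

In the full method this round is repeated three times, rotating which of the three proxies acts as the pseudo-labeler while the other two exchange small-loss samples.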
Parallel-mentoring for Offline Model-based Optimization
We study offline model-based optimization to maximize a black-box objective
function with a static dataset of designs and scores. These designs encompass a
variety of domains, including materials, robots and DNA sequences. A common
approach trains a proxy on the static dataset to approximate the black-box
objective function and performs gradient ascent to obtain new designs. However,
this often results in poor designs due to the proxy inaccuracies for
out-of-distribution designs. Recent studies indicate that: (a) gradient ascent
with a mean ensemble of proxies generally outperforms simple gradient ascent,
and (b) a trained proxy provides weak ranking supervision signals for design
selection. Motivated by (a) and (b), we propose \textit{parallel-mentoring} as
an effective and novel method that facilitates mentoring among parallel
proxies, creating a more robust ensemble to mitigate the out-of-distribution
issue. We focus on the three-proxy case and our method consists of two modules.
The first module, \textit{voting-based pairwise supervision}, operates on three
parallel proxies and captures their ranking supervision signals as pairwise
comparison labels. These labels are combined through majority voting to
generate consensus labels, which incorporate ranking supervision signals from
all proxies and enable mutual mentoring. However, label noise arises due to
possible incorrect consensus. To alleviate this, we introduce an
\textit{adaptive soft-labeling} module with soft-labels initialized as
consensus labels. Based on bi-level optimization, this module fine-tunes
proxies in the inner level and learns more accurate labels in the outer level
to adaptively mentor proxies, resulting in a more robust ensemble. Experiments
validate the effectiveness of our method. Our code is available here.
Comment: Accepted by NeurIPS 202
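The voting-based pairwise supervision module can be illustrated with a small sketch. The encoding of ranking signals as binary pairwise labels and the strict-majority rule below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def pairwise_labels(scores):
    """Pairwise comparison labels from one proxy's predicted scores:
    label[i, j] = 1 if design i is ranked strictly above design j, else 0."""
    return (scores[:, None] > scores[None, :]).astype(int)

def consensus_labels(proxy_scores):
    """Majority vote over the pairwise labels of all proxies
    (three proxies in the paper's setting)."""
    votes = sum(pairwise_labels(s) for s in proxy_scores)
    return (votes * 2 > len(proxy_scores)).astype(int)  # strict majority
```

In the full method these consensus labels serve as the initial soft-labels, which the adaptive soft-labeling module then refines via bi-level optimization.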
Gradient-based Bi-level Optimization for Deep Learning: A Survey
Bi-level optimization, especially the gradient-based category, has been
widely used in the deep learning community including hyperparameter
optimization and meta-knowledge extraction. Bi-level optimization embeds one
problem within another and the gradient-based category solves the outer-level
task by computing the hypergradient, which is much more efficient than
classical methods such as the evolutionary algorithm. In this survey, we first
give a formal definition of the gradient-based bi-level optimization. Next, we
delineate criteria to determine if a research problem is apt for bi-level
optimization and provide a practical guide on structuring such problems into a
bi-level optimization framework, a feature particularly beneficial for those
new to this domain. More specifically, there are two formulations: the
single-task formulation to optimize hyperparameters such as regularization
parameters and the distilled data, and the multi-task formulation to extract
meta-knowledge such as the model initialization. With a bi-level formulation,
we then discuss four bi-level optimization solvers to update the outer variable
including explicit gradient update, proxy update, implicit function update, and
closed-form update. Finally, we wrap up the survey by highlighting two
prospective future directions: (1) Effective Data Optimization for Science
examined through the lens of task formulation. (2) Accurate Explicit Proxy
Update analyzed from an optimization standpoint.
Comment: AI4Science; Bi-level Optimization; Hyperparameter Optimization; Meta-Learning; Implicit Function
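As a toy illustration of the explicit gradient update family of solvers mentioned above, the following computes a hypergradient by differentiating through a single unrolled inner gradient step. The one-dimensional quadratic losses are invented purely for illustration.

```python
def inner_step(w, lam, alpha=0.1):
    """One gradient step on the inner (training) loss
    L_train(w, lam) = (w - 1)^2 + lam * w^2."""
    grad_w = 2 * (w - 1) + 2 * lam * w
    return w - alpha * grad_w

def outer_loss(w):
    """Outer (validation) loss, evaluated at the inner solution."""
    return (w - 0.8) ** 2

def hypergradient(w, lam, alpha=0.1):
    """Explicit hypergradient d L_val(w') / d lam, obtained by applying the
    chain rule through the single unrolled inner step w' = inner_step(w, lam)."""
    w1 = inner_step(w, lam, alpha)
    dw1_dlam = -alpha * 2 * w  # derivative of the inner update w.r.t. lam
    return 2 * (w1 - 0.8) * dw1_dlam
```

The outer variable `lam` (here a regularization weight) would then be updated by gradient descent on this hypergradient; practical solvers unroll many inner steps or use implicit/closed-form alternatives.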
Interferon regulatory factor 2 binding protein 2b regulates neutrophil versus macrophage fate during zebrafish definitive myelopoiesis
GLM-130B: An Open Bilingual Pre-trained Model
We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language
model with 130 billion parameters. It is an attempt to open-source a 100B-scale
model at least as good as GPT-3 (davinci) and unveil how models of such a scale
can be successfully pre-trained. Over the course of this effort, we face
numerous unexpected technical and engineering challenges, particularly on loss
spikes and divergence. In this paper, we introduce the training process of
GLM-130B including its design choices, training strategies for both efficiency
and stability, and engineering efforts. The resultant GLM-130B model offers
significant outperformance over GPT-3 175B (davinci) on a wide range of popular
English benchmarks while the performance advantage is not observed in OPT-175B
and BLOOM-176B. It also consistently and significantly outperforms ERNIE TITAN
3.0 260B -- the largest Chinese language model -- across related benchmarks.
Finally, we leverage a unique scaling property of GLM-130B to reach INT4
quantization without post training, with almost no performance loss, making it
the first among 100B-scale models and more importantly, allowing its effective
inference on 4×RTX 3090 (24G) or 8×RTX 2080 Ti (11G) GPUs, the
most affordable GPUs required for using 100B-scale models. The GLM-130B model
weights are publicly accessible and its code, training logs, related toolkit,
and lessons learned are open-sourced at
\url{https://github.com/THUDM/GLM-130B/}.
Comment: Accepted to ICLR 202
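Post-training INT4 weight quantization of the kind mentioned above can be sketched generically. The symmetric per-row scheme below is an illustrative assumption, not GLM-130B's actual quantization code.

```python
import numpy as np

def quantize_int4(w):
    """Symmetric per-row INT4 quantization: map each row of a weight matrix
    to integers in [-8, 7] with one floating-point scale per row."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from INT4 codes and per-row scales."""
    return q.astype(np.float32) * scale
```

Storing 4-bit codes plus a per-row scale cuts weight memory roughly 4× relative to FP16, which is what makes 100B-scale inference feasible on the consumer GPUs listed above.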
Robust estimation of bacterial cell count from optical density
Optical density (OD) is widely used to estimate the density of cells in liquid culture, but it cannot be compared between instruments without a standardized calibration protocol and is challenging to relate to actual cell count. We address this with an interlaboratory study comparing three simple, low-cost, and highly accessible OD calibration protocols across 244 laboratories, applied to eight strains of constitutive GFP-expressing E. coli. Based on our results, we recommend calibrating OD to estimated cell count using serial dilution of silica microspheres, which produces highly precise calibration (95.5% of residuals <1.2-fold), is easily assessed for quality control, and also measures the instrument's effective linear range. This calibration can further be combined with fluorescence calibration to obtain units of Molecules of Equivalent Fluorescein (MEFL) per cell, allowing direct comparison and data fusion with flow cytometry measurements: in our study, fluorescence per cell showed only a 1.07-fold mean difference between plate reader and flow cytometry data.
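In its simplest form, the recommended microsphere calibration reduces to fitting a proportionality between measured OD and known particle count across a serial dilution, then applying that factor to new readings. The through-the-origin least-squares fit below is a simplified sketch of that idea, not the study's protocol.

```python
import numpy as np

def fit_od_calibration(od, particle_count):
    """Least-squares slope (particles per OD unit) through the origin,
    fit on serial-dilution measurements of silica microspheres with
    known particle counts."""
    od = np.asarray(od, dtype=float)
    n = np.asarray(particle_count, dtype=float)
    return (od @ n) / (od @ od)

def od_to_count(od, slope):
    """Convert an OD reading into an estimated particle (cell) count."""
    return slope * od
```

A practical pipeline would also restrict the fit to the instrument's effective linear range, which the dilution series itself reveals.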
Handwritten Notes Vs Typed Notes
Researchers have long debated whether handwritten notes are better than typed
notes. We have had professors who banned electronics in class and showed strong support
for written notes. Prior studies have used different experiments to
test the quality of both, and researchers hold differing views on the matter; most support written
notes. Researchers who support written notes argue that “writing by hand strengthens the learning
process. When typing on a keyboard, this process may be impaired” (Science Daily) and that
“students who took notes on laptops performed worse on conceptual questions than students who
took notes longhand” (Mueller and Oppenheimer). A researcher who supports typed notes found that
“Typing notes produced higher retention scores than handwriting notes…; typing as a method of
note-taking may be an influential factor in memory retention, particularly in a lecture context”
(Ian Schoen).
We asked MTH 165 and MTH 141 students to provide information on their note-taking style and
their opinion of their performance in class, to help us understand the issue. Our research
question was: “Are handwritten notes more effective than typed notes at helping undergraduate
students retain information during lectures?” We hypothesized that handwritten notes are
more effective than typed notes for information retention.
To get insights, we sent out an anonymous survey asking students in
our study groups four questions:
● What kind of method do you use to take lecture notes?
● How well do you think you are retaining the information from the lecture?
● How well are you doing in this class?
● Do you want to change your note-taking method? If so, why?
Thirty-two people responded to the survey. None of the respondents used typed notes. We
found that most students (56.3%) think they retain the lecture information well, and
most students (59.4%) are doing well in their classes. As predicted by our hypothesis, those who
take notes by hand retain lecture material well. Most students also do not want to change their
note-taking method.
The main flaw of this project is that we did not anticipate that it is almost impossible for math
students to type notes, given all the complicated diagrams, so we failed to get samples of
students who type notes and missed out on their insights. We could have done better
by making the sample size bigger and asking students of all disciplines to share what they do in
class.
References
Schoen, Ian. (2012). Effects of Method and Context of Note-taking on Memory: Handwriting
versus Typing in Lecture and Textbook Reading Contexts. Pitzer Senior Theses, Paper 20.
http://scholarship.claremont.edu/pitzer_theses/20
Science Daily. (n.d.). Retrieved December 04, 2017, from
https://www.sciencedaily.com/releases/2011/01/110119095458.htm
Mueller, P. A., & Oppenheimer, D. M. (2014). The Pen Is Mightier Than the Keyboard.
Psychological Science, 25(6), 1159-1168. doi:10.1177/095679761452458