89 research outputs found
The Girl with the Peanut Necklace: Experiences of Infertility and in vitro Fertilization in China
A 2014-2015 William Prize for best essay in East Asian Studies was awarded to Ruoxi Yu (Berkeley College) for her essay submitted to the Department of Anthropology, “The Girl with the Peanut Necklace: Experiences of Infertility and in vitro Fertilization in China.” (Marcia Inhorn, William K. Lanman Jr. Professor of Anthropology, advisor; Susan Brownell, Professor of Anthropology at UMSL, secondary reader.)
Ruoxi Yu’s essay, “The Girl with the Peanut Necklace: Experiences of Infertility and in vitro Fertilization in China,” situates original research within the history of the one-child birth control policy and the tension between the demands of the family and the state. The first thing that strikes one about this senior essay is that, at 130 pages, it is not far from being a dissertation. It is based on 10 weeks of ethnographic fieldwork in an infertility clinic in Tianjin combined with two semesters of library research and writing. The quality of the ethnographic research is remarkable for an undergraduate. The setting was very sensitive and required sitting around the clinic waiting for an opportunity to draw a patient into conversation, eventually asking for permission to conduct an interview. Ruoxi’s social skills and facility in Chinese enabled her to interview a number of women who divulged intimate details of their lives. Even many anthropology Ph.D. students find it difficult to pull meaningful information out of the messiness of real life and to organize it within academic frameworks. In the end, Ruoxi successfully draws on medical anthropology and feminist theory to link her research results with time-honored anthropological debates about the Chinese family, and also with recent thinking about medical technologies and their relationship with the state.
Does GNN Pretraining Help Molecular Representation?
Extracting informative representations of molecules using graph neural
networks (GNNs) is crucial in AI-driven drug discovery. Recently, the graph
research community has been trying to replicate the success of self-supervised
pretraining in natural language processing, with several successes claimed.
However, we find that the benefit brought by self-supervised pretraining on small
molecular data can be negligible in many cases. We conduct thorough ablation
studies on the key components of GNN pretraining, including pretraining
objectives, data splitting methods, input features, pretraining dataset scales,
and GNN architectures, to see how they affect the accuracy of the downstream
tasks. Our first important finding is that, in many settings, self-supervised
graph pretraining does not have statistically significant advantages over
non-pretraining methods. Secondly, although noticeable improvement can be
observed with additional supervised pretraining, the improvement may diminish
with richer features or more balanced data splits. Thirdly, hyperparameters can
have a larger impact on downstream-task accuracy than the choice of
pretraining tasks, especially when the downstream tasks are small in scale.
Finally, we conjecture that the complexity of some pretraining methods on small
molecules may be insufficient, and support this with empirical evidence on
different pretraining datasets.
Estimating the Distribution of Random Parameters in a Diffusion Equation Forward Model for a Transdermal Alcohol Biosensor
We estimate the distribution of random parameters in a distributed parameter
model with unbounded input and output for the transdermal transport of ethanol
in humans. The model takes the form of a diffusion equation with the input
being the blood alcohol concentration and the output being the transdermal
alcohol concentration. Our approach is based on the idea of reformulating the
underlying dynamical system in such a way that the random parameters are now
treated as additional space variables. When the distribution to be estimated is
assumed to be defined in terms of a joint density, estimating the distribution
is equivalent to estimating the diffusivity in a multi-dimensional diffusion
equation and thus well-established finite dimensional approximation schemes,
functional analytic based convergence arguments, optimization techniques, and
computational methods may all be employed. We use our technique to estimate a
bivariate normal distribution based on data for multiple drinking episodes from
a single subject.
Comment: 10 pages
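The reformulation described above can be illustrated with a minimal sketch: the random diffusivity q becomes an extra grid axis, the heat equation is solved for every q value simultaneously, and the output is weighted by an assumed density over q. This is only an illustration of the idea, not the paper's actual scheme; the grid sizes, boundary conditions, and the N(1.0, 0.15²) density are all hypothetical.

```python
import numpy as np

# Minimal sketch (hypothetical discretization, not the paper's scheme):
# treat the random diffusivity q as an extra grid axis, solve the heat
# equation u_t = q * u_xx for every q simultaneously, then weight the
# outer-boundary output by an assumed normal density over q.

nx, nq, nt = 21, 15, 2000
x = np.linspace(0.0, 1.0, nx)
q = np.linspace(0.5, 1.5, nq)          # grid over the random parameter
dx, dt = x[1] - x[0], 1e-4             # dt*q_max/dx^2 = 0.06 < 0.5: stable

u = np.zeros((nq, nx))
u[:, 0] = 1.0                          # unit input at the inner boundary

for _ in range(nt):                    # explicit Euler in time
    lap = (u[:, :-2] - 2 * u[:, 1:-1] + u[:, 2:]) / dx**2
    u[:, 1:-1] += dt * q[:, None] * lap
    u[:, 0] = 1.0                      # Dirichlet input
    u[:, -1] = u[:, -2]                # no-flux outer boundary

# Assumed N(1.0, 0.15^2) density over q; population-mean sensor output
w = np.exp(-0.5 * ((q - 1.0) / 0.15) ** 2)
w /= w.sum()
mean_output = float(w @ u[:, -1])
```

Estimating the distribution then amounts to adjusting the parameters of the density (here the mean and standard deviation) so that the weighted output matches observed data.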
Just Fine-tune Twice: Selective Differential Privacy for Large Language Models
With the increasing adoption of NLP models in real-world products, it becomes
more and more important to protect these models from privacy leakage. Because
private information in language data is sparse, previous research formalized a
Selective-Differential-Privacy (SDP) notion to provide protection for sensitive
tokens detected by policy functions, and proved its effectiveness on RNN-based
models. However, the previous mechanism requires separating the private and
public model parameters and thus cannot be applied to large attention-based models. In
this paper, we propose a simple yet effective just-fine-tune-twice privacy
mechanism to first fine-tune on in-domain redacted data and then on in-domain
private data, to achieve SDP for large Transformer-based language models. We
also design explicit and contextual policy functions to provide protections at
different levels. Experiments show that our models achieve strong performance
while staying robust to the canary insertion attack. We further show that even
under low-resource settings with a small amount of in-domain data, SDP can
still improve the model utility. We will release the code, data, and models to
facilitate future research.
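An "explicit" policy function of the kind the abstract mentions can be sketched as a set of pattern rules that flag sensitive spans, with a redaction pass replacing them before the first fine-tuning stage. The patterns and mask tokens below are illustrative assumptions, not the paper's actual policy functions.

```python
import re

# Hypothetical explicit policy function: regex rules flag sensitive
# tokens, and a redaction pass replaces them with mask tokens before
# fine-tuning on the redacted corpus. (Illustrative patterns only.)

PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace every span matched by a policy rule with a mask token."""
    for label, pat in PATTERNS.items():
        text = pat.sub(f"<{label}>", text)
    return text

example = "Reach me at jane.doe@example.com or 555-123-4567."
redacted = redact(example)
# redacted == "Reach me at <EMAIL> or <PHONE>."
```

The second stage would then fine-tune on the unredacted private data with a differentially private optimizer, so that the redacted first pass carries no formal privacy cost.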
Selective Differential Privacy for Language Modeling
With the increasing applications of language models, it has become crucial to
protect these models from leaking private information. Previous work has
attempted to tackle this challenge by training RNN-based language models with
differential privacy guarantees. However, applying classical differential
privacy to language models leads to poor model performance as the underlying
privacy notion is over-pessimistic and provides undifferentiated protection for
all tokens in the data. Given that the private information in natural language
is sparse (for example, the bulk of an email might not carry personally
identifiable information), we propose a new privacy notion, selective
differential privacy, to provide rigorous privacy guarantees on the sensitive
portion of the data to improve model utility. To realize such a new notion, we
develop a corresponding privacy mechanism, Selective-DPSGD, for RNN-based
language models. Besides language modeling, we also apply the method to a more
concrete application--dialog systems. Experiments on both language modeling and
dialog system building show that the proposed privacy-preserving mechanism
achieves better utility while remaining safe under various privacy attacks
compared to the baselines. The data and code are released at
https://github.com/wyshi/lm_privacy to facilitate future research.
Comment: NAACL 2022
Revisiting Data-Free Knowledge Distillation with Poisoned Teachers
Data-free knowledge distillation (KD) helps transfer knowledge from a
pre-trained model (known as the teacher model) to a smaller model (known as the
student model) without access to the original training data used for training
the teacher model. However, the security of the synthetic or
out-of-distribution (OOD) data required in data-free KD is largely unknown and
under-explored. In this work, we make the first effort to uncover the security
risk of data-free KD w.r.t. untrusted pre-trained models. We then propose
Anti-Backdoor Data-Free KD (ABD), the first plug-in defensive method for
data-free KD methods to mitigate the chance of potential backdoors being
transferred. We empirically evaluate the effectiveness of the proposed ABD in
diminishing transferred backdoor knowledge while maintaining downstream
performance comparable to vanilla KD. We envision this work as a first step
toward raising awareness of and mitigating potential backdoors in data-free
KD. Code is released at https://github.com/illidanlab/ABD.
Comment: Accepted to ICML 2023
Convergence and Disparities in Higher Education Fiscal Expenditures in China: A Regional Perspective
This research investigates the disparities and convergence in higher education fiscal expenditures across different regions in China. The study utilises Gini coefficient analysis and σ-convergence/β-convergence tests to quantify the extent of disparities and explore convergence trends over a twelve-year investigation period (2007–2018). The results shed light on the imbalances in resource allocation and provide valuable insights into the efforts required to achieve a more equitable distribution of fiscal resources for higher education. The findings reveal significant disparities in higher education fiscal expenditures between the Eastern, Central, Western, and Northeastern regions, with the Eastern region exhibiting the largest gap compared to others. Remarkably, the disparity between the Eastern and Central regions is even greater than that between the Eastern and Western regions, emphasising the need for targeted interventions to address regional imbalances. Over the study period, the gap between the Eastern and Central regions remained consistently higher than other regional disparities. 
Moreover, the research shows a general trend towards narrowing regional fiscal expenditure disparities, with the most pronounced convergence observed between the Central and Northeastern regions. The Western region exhibits slightly larger disparities than the Central and Northeastern regions, possibly attributed to greater fiscal policy support and lower student enrollments. Nevertheless, the fiscal expenditure gap between the Western and Central regions has shown a trend towards reduction. The study also explores absolute and conditional β-convergence, revealing notable convergence patterns in the Eastern and Central regions. However, the Western and Northeastern regions exhibit varying degrees of convergence, indicating the necessity for region-specific convergence mechanisms. To achieve a balanced allocation of financial resources for higher education across regions, the study recommends targeted fiscal policies, additional funding, and improved transparency and accountability. Policymakers should focus on enhancing convergence mechanisms to ensure a more equitable distribution of resources and foster the sustainable development of higher education throughout the country. While this research provides valuable insights, it is essential to consider other potential factors influencing fiscal expenditure disparities, such as policy orientation, economic disparities, and demographic structures, for a more comprehensive understanding. Future research may benefit from qualitative investigations to further explore the complexities of higher education fiscal expenditure imbalances and identify effective policy interventions.
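The two disparity measures the study reports can be sketched briefly: a Gini coefficient over regional expenditures, and σ-convergence as a declining coefficient of variation across years. The per-student figures below are hypothetical placeholders, not the study's data.

```python
import numpy as np

# Sketch of the study's two measures (illustrative data, not the actual
# expenditure figures): a Gini coefficient across regions, and
# sigma-convergence as a falling coefficient of variation over time.

def gini(x):
    """Gini coefficient of a 1-D array of non-negative values."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    return (2 * (ranks * x).sum() / (n * x.sum())) - (n + 1) / n

# Hypothetical per-student expenditure by region, two years apart
y2007 = [9.0, 5.0, 4.0, 4.5]   # East, Central, West, Northeast
y2018 = [12.0, 9.5, 9.0, 8.5]

def cv(x):
    """Coefficient of variation, the usual sigma-convergence statistic."""
    return np.std(x) / np.mean(x)

# Disparities narrow if both the Gini and the CV fall over the period
narrowed = (gini(y2018) < gini(y2007)) and (cv(y2018) < cv(y2007))
```

β-convergence would additionally regress each region's expenditure growth rate on its initial level; a negative coefficient indicates poorer regions catching up.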
Excitement Surfeited Turns to Errors: Deep Learning Testing Framework Based on Excitable Neurons
Despite impressive capabilities and outstanding performance, deep neural
networks (DNNs) have drawn increasing public concern about their security,
due to frequently occurring erroneous behaviors. It is therefore necessary to
test DNNs systematically before they are deployed to real-world applications.
Existing testing methods have provided fine-grained metrics based on neuron
coverage and proposed various approaches to improve such metrics. However, it
has gradually been realized that higher neuron coverage does not necessarily
indicate a better ability to identify defects that lead to errors. Moreover,
coverage-guided methods cannot hunt errors caused by faulty training
procedures, so the robustness improvement of DNNs obtained by retraining on
these testing examples is unsatisfactory. To address this challenge, we
introduce the concept of excitable neurons based on the Shapley value and
design a novel white-box testing framework for DNNs, named DeepSensor. It is
motivated by our observation that neurons bearing larger responsibility for
model-loss changes under small perturbations are more likely related to
incorrect corner cases caused by potential defects. By maximizing the number
of excitable neurons with respect to various wrong model behaviors, DeepSensor
can generate testing examples that effectively trigger more errors due to
adversarial inputs, polluted data, and incomplete training. Extensive
experiments on both image classification models and speaker recognition
models demonstrate the superiority of DeepSensor.
Comment: 32 pages
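The Shapley-value idea behind neuron attribution can be sketched on a toy network: estimate each hidden unit's marginal contribution to the loss by enabling units in random orders and averaging the loss changes. This is an illustrative Monte Carlo Shapley estimator only, not the actual DeepSensor algorithm; the network, masking scheme, and loss are all assumptions.

```python
import numpy as np

# Illustrative-only sketch of Shapley-style neuron attribution (not the
# DeepSensor algorithm): estimate each hidden unit's marginal
# contribution to the loss by masking units in random orders.

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 5))            # toy 3-input, 5-hidden network
W2 = rng.normal(size=(5, 1))
x, y = rng.normal(size=(1, 3)), np.array([[1.0]])

def loss(mask):
    h = np.maximum(x @ W1, 0.0) * mask  # ReLU hidden layer, units masked
    return float(((h @ W2 - y) ** 2).mean())

def shapley(n_units=5, n_perm=200):
    phi = np.zeros(n_units)
    for _ in range(n_perm):
        order = rng.permutation(n_units)
        mask = np.zeros(n_units)
        prev = loss(mask)               # start with all units off
        for j in order:
            mask[j] = 1.0
            cur = loss(mask)
            phi[j] += cur - prev        # marginal effect of enabling j
            prev = cur
    return phi / n_perm

phi = shapley()
# Efficiency property: phi sums to loss(all on) - loss(all off)
```

Units with the largest-magnitude attributions would be the "excitable" candidates; a testing framework could then search for inputs that maximize how many such units flip the loss.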