
    Strong scaling of general-purpose molecular dynamics simulations on GPUs

    We describe a highly optimized implementation of MPI domain decomposition in a GPU-enabled, general-purpose molecular dynamics code, HOOMD-blue (Anderson and Glotzer, arXiv:1308.5587). Our approach is inspired by a traditional CPU-based code, LAMMPS (Plimpton, J. Comp. Phys. 117, 1995), but is implemented within a code that was designed for execution on GPUs from the start (Anderson et al., J. Comp. Phys. 227, 2008). The software supports short-ranged pair and bond force fields and achieves optimal GPU performance using an autotuning algorithm. We demonstrate equivalent or superior scaling on up to 3,375 GPUs in Lennard-Jones and dissipative particle dynamics (DPD) simulations of up to 108 million particles. GPUDirect RDMA capabilities in recent GPU generations provide better performance in full double precision calculations. For a representative polymer physics application, HOOMD-blue 1.0 provides an effective GPU vs. CPU node speed-up of 12.5x. Comment: 30 pages, 14 figures
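
    The abstract mentions an autotuning algorithm for choosing GPU kernel launch parameters. As a rough illustration of that general idea (not HOOMD-blue's actual implementation), the Python sketch below times a kernel-like callable over a set of candidate block sizes and keeps the fastest; the run_kernel callable and the candidate sizes are hypothetical.

        # Minimal sketch of launch-parameter autotuning, assuming a hypothetical
        # run_kernel(block_size) callable; this is not HOOMD-blue's actual API.
        import time

        def autotune(run_kernel, candidate_block_sizes=(32, 64, 128, 256), n_samples=5):
            """Time each candidate configuration and return the fastest block size."""
            best_size, best_time = None, float("inf")
            for block_size in candidate_block_sizes:
                run_kernel(block_size)                  # warm-up call
                start = time.perf_counter()
                for _ in range(n_samples):
                    run_kernel(block_size)
                elapsed = (time.perf_counter() - start) / n_samples
                if elapsed < best_time:
                    best_size, best_time = block_size, elapsed
            return best_size

    In a production code this selection would typically be repeated per kernel and cached, since the best configuration depends on system size and hardware.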

    Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback

    A key technology for the development of large language models (LLMs) is instruction tuning, which helps align the models' responses with human expectations to realize impressive learning abilities. The two major approaches to instruction tuning are supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), which are currently applied to produce the best commercial LLMs (e.g., ChatGPT). To improve the accessibility of LLMs for research and development efforts, various instruction-tuned open-source LLMs have also been introduced recently, e.g., Alpaca and Vicuna, to name a few. However, existing open-source LLMs have only been instruction-tuned for English and a few popular languages, which limits their impact on and accessibility to many other languages in the world. Among the few very recent works that explore instruction tuning for LLMs in multiple languages, SFT has been the only approach used to instruction-tune LLMs for multiple languages. This has left a significant gap for RLHF-based fine-tuned LLMs in diverse languages and raised important questions about how RLHF can boost the performance of multilingual instruction tuning. To overcome this issue, we present Okapi, the first system with instruction-tuned LLMs based on RLHF for multiple languages. Okapi introduces instruction and response-ranked data in 26 diverse languages to facilitate experiments and the development of future multilingual LLM research. We also present benchmark datasets to enable the evaluation of generative LLMs in multiple languages. Our experiments demonstrate the advantages of RLHF over SFT for multilingual instruction tuning across different base models and datasets. Our framework and resources are released at https://github.com/nlp-uoregon/Okapi
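
    RLHF rests on a reward model trained from response-ranked data such as the rankings Okapi releases. As a generic illustration of that training step (not Okapi's released code), the sketch below computes the standard pairwise ranking loss between the scores of a chosen and a rejected response; the reward_model module and the token-id tensors are hypothetical placeholders.

        # Generic pairwise ranking loss for training a reward model on
        # response-ranked data; reward_model is a hypothetical module that
        # maps a batch of token-id sequences to one scalar score each.
        import torch
        import torch.nn.functional as F

        def reward_ranking_loss(reward_model, chosen_ids, rejected_ids):
            """Bradley-Terry style objective: score the chosen response above the rejected one."""
            score_chosen = reward_model(chosen_ids)      # shape: (batch,)
            score_rejected = reward_model(rejected_ids)  # shape: (batch,)
            return -F.logsigmoid(score_chosen - score_rejected).mean()

    The fitted reward model then provides the scalar signal that the RL stage (e.g., PPO) optimizes against the SFT policy.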

    CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages

    The driving factors behind the development of large language models (LLMs) with impressive learning capabilities are their colossal model sizes and extensive training datasets. Along with the progress in natural language processing, LLMs have frequently been made accessible to the public to foster deeper investigation and applications. However, the training datasets for these LLMs, especially for the recent state-of-the-art models, are often not fully disclosed. Creating training data for high-performing LLMs involves extensive cleaning and deduplication to ensure the necessary level of quality. This lack of transparency around training data has thus hampered research on attributing and addressing hallucination and bias issues in LLMs, hindering replication efforts and further advancements in the community. These challenges become even more pronounced in multilingual learning scenarios, where the available multilingual text datasets are often inadequately collected and cleaned. Consequently, there is a lack of open-source, readily usable datasets to effectively train LLMs in multiple languages. To overcome this issue, we present CulturaX, a substantial multilingual dataset with 6.3 trillion tokens in 167 languages, tailored for LLM development. Our dataset undergoes meticulous cleaning and deduplication through a rigorous multi-stage pipeline to achieve the best quality for model training, including language identification, URL-based filtering, metric-based cleaning, document refinement, and data deduplication. CulturaX is fully released to the public on HuggingFace to facilitate research and advancements in multilingual LLMs: https://huggingface.co/datasets/uonlp/CulturaX. Comment: Ongoing work
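
    CulturaX is distributed through the HuggingFace hub at the URL above, so it can be consumed with the datasets library. The sketch below streams one language subset and applies a crude length-based filter as a stand-in for the metric-based cleaning stage; the "en" configuration name and the "text" field are assumptions to verify against the dataset card.

        # Sketch: stream a CulturaX language subset and apply a simple
        # metric-based filter; the "en" config and "text" field are assumed.
        from datasets import load_dataset

        stream = load_dataset("uonlp/CulturaX", "en", split="train", streaming=True)

        def keep(example, min_chars=200):
            """Drop very short documents as a crude quality heuristic."""
            return len(example["text"]) >= min_chars

        for i, example in enumerate(ex for ex in stream if keep(ex)):
            print(example["text"][:80])
            if i == 2:
                break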

    Rigid body constraints realized in massively-parallel molecular dynamics on graphics processing units

    Molecular dynamics (MD) methods compute the trajectory of a system of point particles in response to a potential function by numerically integrating Newton's equations of motion. Extending these basic methods with rigid body constraints enables composite particles with complex shapes, such as anisotropic nanoparticles, grains, molecules, and rigid proteins, to be modeled. Rigid body constraints are added to the GPU-accelerated MD package HOOMD-blue, version 0.10.0. The software can now simulate systems of particles, rigid bodies, or mixed systems in the microcanonical (NVE), canonical (NVT), and isothermal-isobaric (NPT) ensembles. It can also apply the FIRE energy minimization technique to these systems. In this paper, we detail the massively parallel scheme that implements these algorithms and discuss how our design is tuned for the maximum possible performance. Two case studies, patchy spheres and tethered nanorods, are included to demonstrate the performance attained. In typical cases, HOOMD-blue on a single GTX 480 executes 2.5-3.6 times faster than LAMMPS executing the same simulation on any number of CPU cores in parallel. Simulations with rigid bodies may now be run with larger systems and for longer time scales on a single workstation than was previously possible even on large clusters.
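
    A rigid body constraint keeps each constituent particle at a fixed displacement from the body's center of mass, so particle positions are reconstructed each step from the center of mass and an orientation quaternion. The NumPy sketch below illustrates that reconstruction for one body; it is a schematic of the general technique, not HOOMD-blue's GPU implementation.

        # Sketch: rebuild constituent-particle positions of a rigid body from its
        # center of mass and orientation quaternion (w, x, y, z).
        import numpy as np

        def quat_to_matrix(q):
            """Rotation matrix for a unit quaternion (w, x, y, z)."""
            w, x, y, z = q
            return np.array([
                [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
                [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
                [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
            ])

        def constituent_positions(com, quat, body_frame_displacements):
            """Rotate body-frame displacements and translate to the center of mass."""
            R = quat_to_matrix(quat)
            return com + body_frame_displacements @ R.T

        # Example: a linear three-site body rotated 90 degrees about z.
        com = np.array([1.0, 0.0, 0.0])
        quat = np.array([np.cos(np.pi/4), 0.0, 0.0, np.sin(np.pi/4)])
        sites = np.array([[-0.5, 0, 0], [0.0, 0, 0], [0.5, 0, 0]])
        print(constituent_positions(com, quat, sites))

    Forces and torques on the constituent particles are then summed back onto the body, whose center of mass and quaternion are the quantities actually integrated.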

    Multiplexing siRNAs to compress RNAi-based screen size in human cells

    Here we describe a novel strategy that uses multiplexes of synthetic small interfering RNAs (siRNAs) corresponding to multiple gene targets in order to compress RNA interference (RNAi) screen size. Before investigating the practical use of this strategy, we first characterized the gene-specific RNAi induced by a large subset (258 siRNAs, 129 genes) of the entire siRNA library used in this study (∼800 siRNAs, ∼400 genes). We next demonstrated that multiplexed siRNAs could silence at least six genes to the same degree as when the genes were targeted individually. The entire library was then used in a screen in which randomly multiplexed siRNAs were assayed for their effect on cell viability. Using this strategy, several gene targets that influenced the viability of a breast cancer cell line were identified. This study suggests that the screening of randomly multiplexed siRNAs may provide an important avenue towards the identification of candidate gene targets for downstream functional analyses and may also be useful for the rapid identification of positive controls for use in novel assay systems. This approach is likely to be especially applicable where assay costs or platform limitations are prohibitive.
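
    The compression comes from a simple combinatorial step: siRNAs against different genes are assigned to random pools, the pools are assayed together, and hits are traced back to candidate genes for individual follow-up. The sketch below illustrates only that random pooling step, with a hypothetical library layout, and is not the authors' protocol.

        # Sketch: assign one siRNA per gene to random multiplexed pools so that
        # N genes are screened in roughly N / pool_size wells.
        import random

        def make_pools(gene_to_sirnas, pool_size=6, seed=0):
            """Randomly group genes, picking one siRNA per gene for each pool."""
            rng = random.Random(seed)
            genes = list(gene_to_sirnas)
            rng.shuffle(genes)
            pools = []
            for i in range(0, len(genes), pool_size):
                chunk = genes[i:i + pool_size]
                pools.append({g: rng.choice(gene_to_sirnas[g]) for g in chunk})
            return pools

        # Hypothetical library: ~400 genes with two siRNAs each.
        library = {f"GENE_{n}": [f"siRNA_{n}a", f"siRNA_{n}b"] for n in range(400)}
        print(len(make_pools(library)))   # about 400 / 6, i.e. ~67 wells instead of 400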

    A hidden HIV epidemic among women in Vietnam

    Background: The HIV epidemic in Vietnam is still concentrated among high-risk populations, including injecting drug users (IDU) and female sex workers (FSW). The response of the government has focused on the recognized high-risk populations, mainly young male drug users. This concentration on one high-risk population may leave other populations under-protected or unprepared for the risk and the consequences of HIV infection. In particular, women's risks of exposure and needs for care may not receive sufficient attention as long as the perception persists that the epidemic is predominantly among young males. Without more knowledge of the epidemic among women, policy makers and planners cannot ensure that programs will also serve women's needs.
    Methods: More than 300 documents appearing in the period 1990 to 2005 were gathered and reviewed to build an understanding of HIV infection and related risk behaviors among women and of the changes over time that may suggest needed policy changes.
    Results: It appears that the risk of HIV transmission among women in Vietnam has been underestimated; the reported data may represent as little as 16% of the real number. Although modeling predicted that there would be 98,500 HIV-infected women in 2005, only 15,633 were accounted for in reports from the health system. That could mean that in 2005, up to 83,000 women infected with HIV had not been detected by the health care system, for a number of possible reasons. For both detection and prevention, these women can be divided into sub-groups with different risk characteristics. They can be infected by sharing needles and syringes with IDU partners, or by having unsafe sex with clients, husbands or lovers. However, most new infections among women can be traced to sexual relations with young male injecting drug users engaged in extramarital sex. Each of these groups may need different interventions to increase the detection rate and thus ensure that the women receive the care they need.
    Conclusion: Women in Vietnam are increasingly at risk of HIV transmission, but that risk is under-reported and under-recognized. The reasons are that women are not getting tested, are not aware of risks, do not protect themselves and are not being protected by men. Based on this information, policy-makers and planners can develop better prevention and care programs that not only address women's needs but also reduce further spread of the infection among the general population.
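
    The 16% figure follows directly from the two numbers quoted in the Results; the short calculation below simply makes the arithmetic explicit.

        # Detection rate implied by the figures quoted in the abstract (2005).
        modelled_cases = 98_500      # women estimated to be living with HIV
        reported_cases = 15_633      # cases recorded by the health system

        detection_rate = reported_cases / modelled_cases
        undetected = modelled_cases - reported_cases
        print(f"{detection_rate:.0%} detected, about {undetected:,} undetected")
        # -> 16% detected, about 82,867 (~83,000) undetected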

    One health, une seule santé

    One Health ("Une seule santé") is a global strategy that aims to develop interdisciplinary collaborations for human, animal, and environmental health. It promotes an integrated, systemic, and unified approach to health at the local, national, and global scales in order to better confront emerging diseases with pandemic potential, as well as to adapt to present and future environmental impacts. Although this movement is spreading, the literature in French remains scarce. Translated from English, coordinated by eminent epidemiologists, and drawing on a broad range of scientific approaches rarely brought together around health, this book retraces the origins of the concept and presents practical content on methodological tools, data collection, surveillance techniques, and study designs. It combines research and practice in a single volume and constitutes a unique reference work for global health.

    Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking

    The potential of the diverse chemistries present in natural products (NP) for biotechnology and medicine remains untapped because NP databases are not searchable with raw data and the NP community has no way to share data other than in published papers. Although mass spectrometry techniques are well suited to high-throughput characterization of natural products, there is a pressing need for an infrastructure to enable sharing and curation of data. We present Global Natural Products Social molecular networking (GNPS, http://gnps.ucsd.edu), an open-access knowledge base for community-wide organization and sharing of raw, processed or identified tandem mass spectrometry (MS/MS) data. In GNPS, crowdsourced curation of freely available, community-wide reference MS libraries will underpin improved annotations. Data-driven social networking should facilitate the identification of spectra and foster collaborations. We also introduce the concept of 'living data' through continuous reanalysis of deposited data.
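
    Molecular networking connects MS/MS spectra whose fragmentation patterns are similar, typically via a cosine-type score between peak lists. The sketch below shows a simplified cosine similarity on binned spectra as a rough illustration of the idea; GNPS's actual scoring is a modified cosine with peak matching and precursor-mass shifts.

        # Simplified spectral similarity: bin each peak list by m/z and take the
        # cosine of the resulting intensity vectors.
        import numpy as np

        def binned_vector(peaks, bin_width=1.0, max_mz=2000.0):
            """peaks: list of (mz, intensity) pairs -> fixed-length intensity vector."""
            vec = np.zeros(int(max_mz / bin_width))
            for mz, intensity in peaks:
                if mz < max_mz:
                    vec[int(mz / bin_width)] += intensity
            return vec

        def cosine_score(peaks_a, peaks_b):
            a, b = binned_vector(peaks_a), binned_vector(peaks_b)
            denom = np.linalg.norm(a) * np.linalg.norm(b)
            return float(a @ b / denom) if denom else 0.0

        spec1 = [(105.1, 20.0), (231.2, 100.0), (345.3, 55.0)]
        spec2 = [(105.1, 18.0), (231.2, 90.0), (360.2, 40.0)]
        print(cosine_score(spec1, spec2))   # edge added to the network if above a cutoff

    Spectrum pairs scoring above a chosen cutoff become edges, and the connected components of the resulting graph group putatively related molecules.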

    Situation et perspectives mondiales du riz (deuxième partie)

    Nguyen Dac Simone A. Situation et perspectives mondiales du riz (deuxième partie). In: L'information géographique, volume 59, n°2, 1995. pp. 75-79

    Situation et perspectives mondiales du riz (première partie)

    Nguyen Dac Simone A. Situation et perspectives mondiales du riz (première partie). In: L'information géographique, volume 59, n°2, 1995. pp. 57-61