36 research outputs found

    Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback

    A key technology for the development of large language models (LLMs) is instruction tuning, which helps align the models' responses with human expectations to realize impressive learning abilities. The two major approaches to instruction tuning are supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), which are currently applied to produce the best commercial LLMs (e.g., ChatGPT). To improve the accessibility of LLMs for research and development efforts, various instruction-tuned open-source LLMs have also been introduced recently, e.g., Alpaca and Vicuna, to name a few. However, existing open-source LLMs have only been instruction-tuned for English and a few popular languages, limiting their impact and accessibility for many other languages in the world. Among the few very recent works that explore instruction tuning for LLMs in multiple languages, SFT has been the only approach used. This has left a significant gap for fine-tuned LLMs based on RLHF in diverse languages and raised important questions about how RLHF can boost the performance of multilingual instruction tuning. To overcome this issue, we present Okapi, the first system with instruction-tuned LLMs based on RLHF for multiple languages. Okapi introduces instruction and response-ranked data in 26 diverse languages to facilitate the experiments and development of future multilingual LLM research. We also present benchmark datasets to enable the evaluation of generative LLMs in multiple languages. Our experiments demonstrate the advantages of RLHF for multilingual instruction tuning over SFT for different base models and datasets. Our framework and resources are released at https://github.com/nlp-uoregon/Okapi

    CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages

    The driving factors behind the development of large language models (LLMs) with impressive learning capabilities are their colossal model sizes and extensive training datasets. Along with the progress in natural language processing, LLMs have frequently been made accessible to the public to foster deeper investigation and applications. However, the training datasets for these LLMs, especially the recent state-of-the-art models, are often not fully disclosed. Creating training data for high-performing LLMs involves extensive cleaning and deduplication to ensure the necessary level of quality. The lack of transparency for training data has thus hampered research on attributing and addressing hallucination and bias issues in LLMs, hindering replication efforts and further advancements in the community. These challenges become even more pronounced in multilingual learning scenarios, where the available multilingual text datasets are often inadequately collected and cleaned. Consequently, there is a lack of open-source and readily usable datasets to effectively train LLMs in multiple languages. To overcome this issue, we present CulturaX, a substantial multilingual dataset with 6.3 trillion tokens in 167 languages, tailored for LLM development. Our dataset undergoes meticulous cleaning and deduplication through a rigorous multi-stage pipeline to achieve the best quality for model training, including language identification, URL-based filtering, metric-based cleaning, document refinement, and data deduplication. CulturaX is fully released to the public on HuggingFace to facilitate research and advancements in multilingual LLMs: https://huggingface.co/datasets/uonlp/CulturaX. Comment: Ongoing Work
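
The stages named in the CulturaX abstract (language identification, URL-based filtering, metric-based cleaning, data deduplication) can be illustrated with a minimal sketch. It is not the authors' pipeline: the `lang` field on each document, the blocklist `BLOCKED_DOMAINS`, the threshold `MIN_WORDS`, and the function name `clean_corpus` are all hypothetical, and exact hashing of normalized text stands in for whatever deduplication method the dataset actually uses.

```python
import hashlib
from urllib.parse import urlparse

# Hypothetical values, chosen only for illustration.
BLOCKED_DOMAINS = {"spam.example"}
MIN_WORDS = 5


def clean_corpus(docs, lang):
    """Filter and deduplicate documents shaped like {"url", "lang", "text"}."""
    seen_hashes = set()
    kept = []
    for doc in docs:
        # Language identification: assumed to be precomputed in doc["lang"].
        if doc["lang"] != lang:
            continue
        # URL-based filtering against a (hypothetical) domain blocklist.
        if urlparse(doc["url"]).hostname in BLOCKED_DOMAINS:
            continue
        # Document refinement: collapse runs of whitespace.
        text = " ".join(doc["text"].split())
        # Metric-based cleaning: drop documents that are too short.
        if len(text.split()) < MIN_WORDS:
            continue
        # Deduplication: exact match on case-folded, normalized text.
        digest = hashlib.sha256(text.lower().encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue
        seen_hashes.add(digest)
        kept.append({**doc, "text": text})
    return kept
```

Exact hashing only catches documents that are identical after normalization; catching near-duplicates would need a fuzzy method, which this sketch does not attempt.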

    Oseltamivir Is Adequately Absorbed Following Nasogastric Administration to Adult Patients with Severe H5N1 Influenza

    In the absence of a parenteral drug, oral oseltamivir is currently recommended by the WHO for treating H5N1 influenza. Whether oseltamivir absorption is adequate in severe influenza is unknown. We measured the steady-state plasma concentrations of nasogastrically administered oseltamivir 150 mg bid and its active metabolite, oseltamivir carboxylate (OC), in three mechanically ventilated patients with severe H5N1 (male, 30 yrs; pregnant female, 22 yrs) and severe H3N2 (female, 76 yrs). Treatments were started 6, 7 and 8 days after illness onset, respectively. Both females were sampled while on continuous venovenous haemofiltration. Admission and follow-up specimens (trachea, nose, throat, rectum, blood) were tested for RNA viral load by reverse transcriptase PCR. In vitro virus susceptibility to OC was measured by a neuraminidase inhibition assay. Admission creatinine clearances were 66 (male, H5N1), 82 (female, H5N1) and 6 (H3N2) ml/min. Corresponding AUC0–12 values (5932, 10,951 and 34,670 ng.h/ml) and trough OC concentrations (376, 575 and 2730 ng/ml) were higher than previously reported in healthy volunteers; the trough concentrations were 545- to 3956-fold higher than the IC50 (0.69 ng/ml) of the H5N1 virus isolated from the H5N1-infected female. Two patients with follow-up respiratory specimens cleared their viruses after 5 (H5N1 male) and 5 (H3N2 female) days of oseltamivir. Both female patients died of respiratory failure; the male survived. Oseltamivir 150 mg bid was well absorbed and converted extensively to OC. Virus was cleared in two patients but two patients died, suggesting antiviral efficacy but poor clinical efficacy.
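
The fold range quoted in the abstract can be checked directly from the reported trough OC concentrations and the H5N1 IC50. The short sketch below only redoes that division; the patient labels and the helper name `fold_over_ic50` are introduced here for illustration.

```python
# Trough oseltamivir carboxylate (OC) concentrations in ng/ml, as reported in the abstract.
TROUGH_OC_NG_ML = {"H5N1 male": 376, "H5N1 female": 575, "H3N2 female": 2730}

# IC50 in ng/ml of the H5N1 isolate, as reported in the abstract.
H5N1_IC50_NG_ML = 0.69


def fold_over_ic50(concentration_ng_ml, ic50_ng_ml=H5N1_IC50_NG_ML):
    """Return how many times the concentration exceeds the IC50."""
    return concentration_ng_ml / ic50_ng_ml


fold = {patient: fold_over_ic50(c) for patient, c in TROUGH_OC_NG_ML.items()}

for patient, value in fold.items():
    print(f"{patient}: about {value:.0f}-fold the IC50")
```

The lowest and highest troughs (376 and 2730 ng/ml) give roughly the 545-fold and 3956-fold bounds stated in the abstract, with the middle patient falling in between.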

    Epidemiology, Clinical Manifestations, and Outcomes of Streptococcus suis Infection in Humans

    Streptococcus suis, a bacterium that affects pigs, is a neglected pathogen that causes systemic disease in humans. We conducted a systematic review and meta-analysis to summarize global estimates of the epidemiology, clinical characteristics, and outcomes of this zoonosis. We searched the main literature databases for all studies through December 2012 using the search term "streptococcus suis." The prevalence of S. suis infection is highest in Asia; the primary risk factors are occupational exposure and eating contaminated food. The pooled proportions of case-patients with pig-related occupations and a history of eating high-risk food were 38.1% and 37.3%, respectively. The main clinical syndrome was meningitis (pooled rate 68.0%), followed by sepsis, arthritis, endocarditis, and endophthalmitis. The pooled case-fatality rate was 12.8%. Sequelae included hearing loss (39.1%) and vestibular dysfunction (22.7%). Our analysis identified gaps in the literature, particularly in assessing risk factors and sequelae of this infection.

    Risk Factors of Streptococcus suis Infection in Vietnam. A Case-Control Study

    Background: Streptococcus suis infection, an emerging zoonosis, is an increasing public health problem across South East Asia and the most common cause of acute bacterial meningitis in adults in Vietnam. Little is known of the risk factors underlying the disease. Methods and Findings: A case-control study with appropriate hospital and matched community controls for each patient was conducted between May 2006 and June 2009. Potential risk factors were assessed using a standardized questionnaire and investigation of throat and rectal S. suis carriage in cases, controls and their pigs, using real-time PCR and culture of swab samples. We recruited 101 cases of S. suis meningitis, 303 hospital controls and 300 community controls. By multivariate analysis, risk factors identified for S. suis infection as compared to either control group included eating "high risk" dishes such as undercooked pig blood and pig intestine (OR1 = 2.22; 95% CI = [1.15-4.28] and OR2 = 4.44; 95% CI = [2.15-9.15]), occupations related to pigs (OR1 = 3.84; 95% CI = [1.32-11.11] and OR2 = 5.52; 95% CI = [1.49-20.39]), and exposures to pigs or pork in the presence of skin injuries (OR1 = 7.48; 95% CI = [1.97-28.44] and OR2 = 15.96; 95% CI = [2.97-85.72]). S. suis specific DNA was detected in rectal and throat swabs of 6 patients and was cultured from 2 rectal samples, but was not detected in such samples of 1522 healthy individuals or patients without S. suis infection. Conclusions: This case-control study, the largest prospective epidemiological assessment of this disease, has identified the most important risk factors associated with S. suis bacterial meningitis to be eating "high risk" dishes popular in parts of Asia, occupational exposure to pigs and pig products, and preparation of pork in the presence of skin lesions. These risk factors can be addressed in public health campaigns aimed at preventing S. suis infection.

    The Vietnam Initiative on Zoonotic Infections (VIZIONS): A Strategic Approach to Studying Emerging Zoonotic Infectious Diseases

    The effect of newly emerging or re-emerging infectious diseases of zoonotic origin in human populations can be catastrophic, and large-scale investigations of such diseases are highly challenging. The monitoring of emergence events is subject to ascertainment bias, whether at the level of species discovery, emerging disease events, or disease outbreaks in human populations. Disease surveillance is generally performed post hoc, driven by a response to recent events and by the availability of detection and identification technologies. Additionally, the inventory of pathogens that exist in mammalian and other reservoirs is incomplete, and identifying those with the potential to cause disease in humans is rarely possible in advance. A major step in understanding the burden and diversity of zoonotic infections, the local behavioral and demographic risks of infection, and the risk of emergence of these pathogens in human populations is to establish surveillance networks in populations that maintain regular contact with diverse animal populations, and to simultaneously characterize pathogen diversity in human and animal populations. Vietnam has been an epicenter of disease emergence over the last decade, and practices at the human/animal interface may increase the likelihood of spillover of zoonotic pathogens into humans. To tackle the scientific issues surrounding the origins and emergence of zoonotic infections in Vietnam, we have established The Vietnam Initiative on Zoonotic Infections (VIZIONS). This countrywide project, in which several international institutions collaborate with Vietnamese organizations, is combining clinical data, epidemiology, high-throughput sequencing, and social sciences to address relevant one-health questions. Here, we describe the primary aims of the project, the infrastructure established to address our scientific questions, and the current status of the project.
Our principal objective is to develop an integrated approach to the surveillance of pathogens circulating in both human and animal populations and assess how frequently they are exchanged. This infrastructure will facilitate systematic investigations of pathogen ecology and evolution, enhance understanding of viral cross-species transmission events, and identify relevant risk factors and drivers of zoonotic disease emergence.

    Autonomous Driving System and Power Transmission on 1/10 RC Car

    This project aims to implement an autonomous system that combines adaptive cruise control (ACC), trajectory generation, a trajectory tracking controller and a half-toroidal continuously variable transmission (CVT) on a radio-controlled car. The results show that the car can track trajectories, drive smoothly at the desired speed, and keep a safe distance from the car in front, and that it has a wider speed range and smoother acceleration. With the success of this project, students interested in control theory or related areas will be able to take the project further.

    Selecting Optimal Context Sentences for Event-Event Relation Extraction

    Understanding events entails recognizing the structural and temporal orders between event mentions to build event structures/graphs for input documents. To achieve this goal, our work addresses the problems of subevent relation extraction (SRE) and temporal event relation extraction (TRE), which aim to predict subevent and temporal relations between two given event mentions/triggers in texts. Recent state-of-the-art methods for such problems have employed transformer-based language models (e.g., BERT) to induce effective contextual representations for input event mention pairs. However, a major limitation of existing transformer-based models for SRE and TRE is that they can only encode input texts of limited length (i.e., up to 512 sub-tokens in BERT), and are thus unable to effectively capture important context sentences that are farther away in the documents. In this work, we introduce a novel method to better model document-level context with important context sentences for event-event relation extraction. Our method seeks to identify the most important context sentences for a given entity mention pair in a document and pack them into shorter documents to be consumed entirely by transformer-based language models for representation learning. The REINFORCE algorithm is employed to train models, with novel reward functions that capture model performance as well as context-based and knowledge-based similarity between sentences for our problem. Extensive experiments demonstrate the effectiveness of the proposed method with state-of-the-art performance on benchmark datasets.
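
The idea of packing the most important context sentences into a shorter document that fits a length limit can be sketched as a greedy, budgeted selection. This is not the paper's method: there the selection is learned with REINFORCE, whereas here the importance `scores` are simply hypothetical inputs, whitespace word counts stand in for BERT sub-tokens, and `pack_context` is a name made up for this example.

```python
def pack_context(sentences, scores, anchor_ids, budget=512):
    """Greedily pick high-scoring sentences that fit within a token budget.

    sentences:  list of sentence strings, in document order
    scores:     one (hypothetical) importance score per sentence
    anchor_ids: indices of sentences holding the two event mentions; always kept
    budget:     maximum total length, counted here in whitespace tokens
    """
    def n_tokens(sentence):
        return len(sentence.split())

    chosen = set(anchor_ids)
    used = sum(n_tokens(sentences[i]) for i in chosen)

    # Consider the remaining sentences from most to least important.
    candidates = [i for i in range(len(sentences)) if i not in chosen]
    candidates.sort(key=lambda i: scores[i], reverse=True)

    for i in candidates:
        cost = n_tokens(sentences[i])
        if used + cost <= budget:
            chosen.add(i)
            used += cost

    # Restore document order so the packed text stays readable.
    return [sentences[i] for i in sorted(chosen)]
```

For instance, `pack_context(["a b c", "d e", "f g h i", "j"], [0.1, 0.9, 0.8, 0.2], anchor_ids=[0, 3], budget=6)` keeps the two anchor sentences plus `"d e"`, because the next-best sentence no longer fits in the budget.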

    FAMIE: A Fast Active Learning Framework for Multilingual Information Extraction

    This paper presents FAMIE, a comprehensive and efficient active learning (AL) toolkit for multilingual information extraction. FAMIE is designed to address a fundamental problem in existing AL frameworks where annotators need to wait for a long time between annotation batches due to the time-consuming nature of model training and data selection at each AL iteration. This hinders the engagement, productivity, and efficiency of annotators. Based on the idea of using a small proxy network for fast data selection, we introduce a novel knowledge distillation mechanism to synchronize the proxy network with the main large model (i.e., BERT-based) to ensure the appropriateness of the selected annotation examples for the main model. Our AL framework can support multiple languages. The experiments demonstrate the advantages of FAMIE in terms of competitive performance and time efficiency for sequence labeling with AL. We publicly release our code (https://github.com/nlp-uoregon/famie) and demo website (http://nlp.uoregon.edu:9000/). A demo video for FAMIE is provided at: https://youtu.be/I2i8n_jAyrY. Comment: Accepted to NAACL 2022 (System Demonstrations)

    Assessment of Raw Water Quality for the Improvement Plan at the Thu Duc Water Treatment Plant in Vietnam

    A conventional water treatment process is currently operated at Thu Duc Water Treatment Plant (TDWTP, Ho Chi Minh City, Vietnam), where raw water is collected from the Dong Nai River at the Hoa An water intake and pumping station. The raw water quality currently fluctuates because of run-off flows, which have been increasing recently. This directly affects the operation and performance of the existing treatment process at TDWTP, since the current treatment units are all based on traditional technologies and have been operating for a long time. This study evaluates the quality of raw water collected at the Hoa An intake station during 2018–2020, with the aim of supporting plans to improve the process and enhance operating efficiency at TDWTP. The raw water quality is evaluated by investigating physico-chemical and biological parameters over 36 months of monitoring. This produces reliable results that can then be used as a scientific database for the improvement plan at TDWTP. The results show that the changes in water quality during the investigated period are complex, and the concentrations of most monitored parameters fluctuate strongly with the seasons. Specifically, the amounts of organic matter, microorganisms, and nitrogen compounds (NH4+, NO2-, NO3-) tend to increase strongly, which may be due to urbanization and industrialization. The management of run-off flows upstream of the water intake and pumping station is also an important aspect that needs to be considered to prevent the spread of pollution. In addition, the effects of climate change are an important cause of the seasonal changes in flow and water quality. These issues make it a big challenge for TDWTP to maintain treatment efficiency and overall performance. This study also proposes several management and technical solutions to address future changes in raw water quality, which may be useful for TDWTP when considering improvements to the treatment process.