
271 research outputs found

Quality control and bias adjustment of crowdsourced wind speed observations

Author: Chen Jieyu
Saunders Kate
Whan Kirien
Publication venue: John Wiley and Sons
Publication date: 01/01/2021

Wind observations collected at citizen weather stations (CWSs) could be an invaluable resource in climate and meteorology studies, yet these observations are underutilised because scientists do not have confidence in their quality. These wind speed observations have systematic biases, likely caused by improper instrumentation and station sitings. Such systematic biases introduce spatial inconsistencies that prevent comparison of these stations spatially and limit the possible usage of the data. In this paper, we address these issues by improving and developing new methods for identifying suspect observations and adjusting systematic biases. Our complete quality control and bias adjustment procedure consists of four steps: (a) performing within-station quality control tests to check the plausible range and the temporal consistency of observations, (b) adjusting the systematic bias using empirical quantile mapping, (c) implementing between-station quality control to compare observations from neighbouring stations to identify spatially inconsistent observations, and (d) providing estimates of the true wind when CWSs falsely report zero wind speeds, as a complement to the bias adjustment. We apply these methods to CWSs from the Weather Observation Website (WOW) in the Netherlands, comparing the crowdsourced data with official data, and statistically assessing the improvements in data quality after each step. The results demonstrate that the crowdsourced wind speed data are more comparable with official data after quality control checks and bias adjustment steps. Our quality assessment methods therefore give confidence in CWSs, converting their observations into a usable data product and an invaluable resource for applications in need of additional wind observations.
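Step (b) of the abstract above, empirical quantile mapping, can be illustrated with a minimal sketch. This is a generic textbook form of the technique, not the authors' implementation: the function name, the argument names, and the choice of 101 quantile levels are assumptions made for illustration, and the zero-wind handling of step (d) is not covered.

```python
import numpy as np

def empirical_quantile_mapping(cws_train, ref_train, cws_new, n_quantiles=101):
    """Map station wind speeds onto a reference distribution.

    cws_train : historical observations from the citizen station
    ref_train : reference (e.g. official) observations for the same period
    cws_new   : station observations to be adjusted
    All names here are hypothetical; the abstract does not specify an API.
    """
    probs = np.linspace(0.0, 1.0, n_quantiles)
    cws_q = np.quantile(cws_train, probs)  # empirical quantiles of the station
    ref_q = np.quantile(ref_train, probs)  # matching quantiles of the reference

    # np.interp needs increasing x-coordinates, so drop tied station quantiles
    # (ties arise, for example, when many observations are exactly zero).
    cws_q, first_idx = np.unique(cws_q, return_index=True)
    ref_q = ref_q[first_idx]

    # Piecewise-linear transfer function between matched quantiles.
    # Values outside the training range are clamped to the end quantiles.
    return np.interp(cws_new, cws_q, ref_q)

# Usage with synthetic data: a station that reports half the true wind speed.
rng = np.random.default_rng(0)
reference = rng.weibull(2.0, size=1000) * 5.0
station = 0.5 * reference
adjusted = empirical_quantile_mapping(station, reference, station)
```

In this synthetic case the mapping recovers the reference values because the bias is a pure scaling; real station biases are generally not this simple.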

KITopen

TU Delft Repository

Queensland University of Technology ePrints Archive

Safer-Instruct: Aligning Language Models with Automated Preference Data

Author: Chen Kai
Shi Taiwei
Zhao Jieyu
Publication date: 14/11/2023

Reinforcement Learning from Human Feedback (RLHF) is a vital strategy for enhancing model safety in language models. However, annotating preference data for RLHF is a resource-intensive and creativity-demanding process, while automatic generation methods face limitations in data diversity and quality. In response, we present Safer-Instruct, a novel pipeline for semi-automatically constructing large-scale preference datasets. Our approach leverages reversed instruction tuning, instruction induction, and expert model evaluation to efficiently generate high-quality preference data without human annotators. We evaluate Safer-Instruct using LLaMA for instruction induction and GPT-4 as an expert model, generating approximately 10K preference samples. Finetuning an Alpaca model on this dataset demonstrates improved harmlessness while maintaining competitive performance on conversation and downstream tasks. Safer-Instruct addresses the challenges in preference data acquisition, advancing the development of safer and more responsible AI systems. Our code and data are available at https://github.com/uscnlp-lime/safer-instruct

Comment: 11 pages

arXiv.org e-Print Archive

Generative machine learning methods for multivariate ensemble post-processing

Author: Chen Jieyu
Janke Tim
Lerch Sebastian
Steinke Florian
Publication date: 26/09/2022

Ensemble weather forecasts based on multiple runs of numerical weather prediction models typically show systematic errors and require post-processing to obtain reliable forecasts. Accurately modeling multivariate dependencies is crucial in many practical applications, and various approaches to multivariate post-processing have been proposed where ensemble predictions are first post-processed separately in each margin and multivariate dependencies are then restored via copulas. These two-step methods share common key limitations, in particular the difficulty to include additional predictors in modeling the dependencies. We propose a novel multivariate post-processing method based on generative machine learning to address these challenges. In this new class of nonparametric data-driven distributional regression models, samples from the multivariate forecast distribution are directly obtained as output of a generative neural network. The generative model is trained by optimizing a proper scoring rule which measures the discrepancy between the generated and observed data, conditional on exogenous input variables. Our method does not require parametric assumptions on univariate distributions or multivariate dependencies and allows for incorporating arbitrary predictors. In two case studies on multivariate temperature and wind speed forecasting at weather stations over Germany, our generative model shows significant improvements over state-of-the-art methods and particularly improves the representation of spatial dependencies.
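The abstract above does not name the proper scoring rule used for training. As an illustration only, the energy score is one widely used multivariate proper scoring rule that compares generated samples with an observation; the sketch below computes its standard sample-based estimate. The function and argument names are assumptions, and nothing here is taken from the authors' code.

```python
import numpy as np

def energy_score(samples, obs):
    """Sample-based energy score (lower is better).

    samples : array of shape (m, d), m draws from the forecast distribution
    obs     : array of shape (d,), the observed multivariate outcome
    """
    samples = np.asarray(samples, dtype=float)
    obs = np.asarray(obs, dtype=float)
    m = samples.shape[0]

    # Mean Euclidean distance between each sample and the observation.
    term_obs = np.linalg.norm(samples - obs, axis=1).mean()

    # Mean pairwise distance among samples, weighted by 1/2.
    pairwise = samples[:, None, :] - samples[None, :, :]
    term_spread = np.linalg.norm(pairwise, axis=2).sum() / (2.0 * m * m)

    return term_obs - term_spread

# Usage: a forecast centred on the observation scores better than a shifted one.
obs = np.array([1.0, 0.0])
centred = np.array([[0.0, 0.0], [2.0, 0.0]])
shifted = centred + np.array([10.0, 0.0])
score_centred = energy_score(centred, obs)
score_shifted = energy_score(shifted, obs)
```

In a training loop, a score of this kind would be averaged over a batch and minimised with respect to the generator's parameters, which requires a differentiable implementation in the chosen deep learning framework rather than plain NumPy.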

arXiv.org e-Print Archive

Semantic Parsing with Dual Learning

Author: Cao Ruisheng
Li Jieyu
Liu Chen
Yu Kai
Zhu Su
Publication date: 01/01/2019

Semantic parsing converts natural language queries into structured logical forms. The paucity of annotated training samples is a fundamental challenge in this field. In this work, we develop a semantic parsing framework with the dual learning algorithm, which enables a semantic parser to make full use of data (labeled and even unlabeled) through a dual-learning game. This game between a primal model (semantic parsing) and a dual model (logical form to query) forces them to regularize each other, and can obtain feedback signals from prior knowledge. By utilizing prior knowledge of logical form structures, we propose a novel reward signal at the surface and semantic levels which tends to generate complete and reasonable logical forms. Experimental results show that our approach achieves new state-of-the-art performance on the ATIS dataset and competitive performance on the Overnight dataset.

Comment: Accepted by ACL 2019 Long Paper

arXiv.org e-Print Archive

Crossref

Improved Ferroelectric and Leakage Properties of Ce Doped in BiFeO3

Author: Alima Bai
Jieyu Chen
Shifeng Zhao
Publication venue: Hindawi Limited
Publication date: 01/01/2014

Ce-doped BiFeO3 thin films with a perovskite structure were prepared using a solution-gelation method. The results show that the ferroelectric properties are enhanced after Ce doping. The enhancement is attributed to a structural transformation and to the reduced leakage current after doping with the rare-earth metal Ce. The phase structure of the films transforms from rhombohedral symmetry to a coexistence of tetragonal and orthorhombic symmetry. In addition, the number of Fe2+ ions is reduced, which leads to decreased leakage in the Ce-doped BiFeO3 thin films. The present work provides a viable way to improve the ferroelectric and leakage properties of multiferroic BiFeO3-based thin films.

Crossref

Directory of Open Access Journals

DMA

Author: Chen Wang
Fei Wang
Jieyu Liu
Mengdan Li
Shengnan Liu
Shuhua Xi
Siqi Cao
Publication venue: Hindawi Limited
Publication date: 01/01/2015

Crossref

ASTormer: An AST Structure-aware Transformer Decoder for Text-to-SQL

Author: Cao Ruisheng
Chen Lu
Li Jieyu
Ma Da
Xu Hongshen
Yu Kai
Zhang Hanchong
Publication date: 28/10/2023

Text-to-SQL aims to generate an executable SQL program given the user utterance and the corresponding database schema. To ensure the well-formedness of output SQLs, one prominent approach adopts a grammar-based recurrent decoder to produce the equivalent SQL abstract syntax tree (AST). However, previous methods mainly utilize an RNN-series decoder, which 1) is time-consuming and inefficient and 2) introduces very few structure priors. In this work, we propose an AST structure-aware Transformer decoder (ASTormer) to replace traditional RNN cells. The structural knowledge, such as node types and positions in the tree, is seamlessly incorporated into the decoder via both absolute and relative position embeddings. Besides, the proposed framework is compatible with different traversing orders even considering adaptive node selection. Extensive experiments on five text-to-SQL benchmarks demonstrate the effectiveness and efficiency of our structured decoder compared to competitive baselines.

arXiv.org e-Print Archive

Generative machine learning methods for multivariate ensemble post-processing

Author: Chen Jieyu
Janke Tim
Lerch Sebastian
Steinke Florian
Publication venue: Karlsruher Institut für Technologie
Publication date: 26/09/2022

Ensemble weather forecasts based on multiple runs of numerical weather prediction models typically show systematic errors and require post-processing to obtain reliable forecasts. Accurately modeling multivariate dependencies is crucial in many practical applications, and various approaches to multivariate post-processing have been proposed where ensemble predictions are first post-processed separately in each margin and multivariate dependencies are then restored via copulas. These two-step methods share common key limitations, in particular the difficulty to include additional predictors in modeling the dependencies. We propose a novel multivariate post-processing method based on generative machine learning to address these challenges. In this new class of nonparametric data-driven distributional regression models, samples from the multivariate forecast distribution are directly obtained as output of a generative neural network. The generative model is trained by optimizing a proper scoring rule which measures the discrepancy between the generated and observed data, conditional on exogenous input variables. Our method does not require parametric assumptions on univariate distributions or multivariate dependencies and allows for incorporating arbitrary predictors. In two case studies on multivariate temperature and wind speed forecasting at weather stations over Germany, our generative model shows significant improvements over state-of-the-art methods and particularly improves the representation of spatial dependencies.

arXiv.org e-Print Archive

KITopen