21 research outputs found

    ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate

    Text evaluation has historically posed significant challenges, often demanding substantial labor and time cost. With the emergence of large language models (LLMs), researchers have explored LLMs' potential as alternatives for human evaluation. While these single-agent-based approaches show promise, experimental results suggest that further advancements are needed to bridge the gap between their current effectiveness and human-level evaluation quality. Recognizing that best practices of human evaluation processes often involve multiple human annotators collaborating in the evaluation, we resort to a multi-agent debate framework, moving beyond single-agent prompting strategies. The multi-agent-based approach enables a group of LLMs to synergize with an array of intelligent counterparts, harnessing their distinct capabilities and expertise to enhance efficiency and effectiveness in handling intricate tasks. In this paper, we construct a multi-agent referee team called ChatEval to autonomously discuss and evaluate the quality of generated responses from different models on open-ended questions and traditional natural language generation (NLG) tasks. Our analysis shows that ChatEval transcends mere textual scoring, offering a human-mimicking evaluation process for reliable assessments. Our code is available at https://github.com/chanchimin/ChatEval

    Effects of solar wind density and velocity variations on the Martian ionosphere and plasma transport - a MHD model study

    Solar wind dynamic pressure, which combines solar wind density and velocity, is an important external driver that controls the Martian plasma environment. In this study, a 3D magnetohydrodynamic model is applied to investigate the separate influences of solar wind density and velocity on the Martian ionosphere. The spatial distributions of ions in the dayside and near-nightside ionosphere under different solar wind densities and velocities are analyzed, as well as the ion transport process. We find that for the same dynamic pressure condition, the ionosphere extends to higher altitudes under higher solar wind density, indicating that a solar wind velocity enhancement event is more efficient at compressing the Martian ionosphere. A higher solar wind velocity will result in a stronger induced magnetic field, shielding the Martian ionosphere and preventing the penetration of solar wind particles. For the same dynamic pressure, increasing the density (decreasing the velocity) leads to a higher horizontal ion velocity, facilitating day-to-night plasma transport. As a result, the ionosphere extends farther into the nightside. Also, the ion outflow flux is larger for high density, which may lead to a higher escape rate. Moreover, the strong crustal fields in the southern hemisphere also have a significant effect on the ionosphere, hindering horizontal ion transport. An additional outflow channel is also provided by the crustal field on the southern dayside, causing the flow pattern to respond differently at local and global scales as the solar wind conditions are varied
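
    The trade-off between density and velocity at fixed dynamic pressure can be made concrete with a small sketch. It assumes the standard definition of dynamic pressure (mass density times velocity squared) and a proton-only solar wind. The function names and the sample values (4 cm^-3, 400 km/s) are illustrative assumptions, not quantities taken from the study.

```python
import math

# Proton mass in kg (CODATA value, rounded). Assumption: the solar wind is
# treated as protons only, ignoring helium and heavier ions.
PROTON_MASS_KG = 1.67262192e-27


def dynamic_pressure_npa(density_cm3: float, velocity_kms: float) -> float:
    """Dynamic pressure rho * v**2 in nPa for density in cm^-3 and speed in km/s."""
    rho = PROTON_MASS_KG * density_cm3 * 1e6   # kg per cubic metre
    v = velocity_kms * 1e3                     # metres per second
    return rho * v ** 2 * 1e9                  # Pa converted to nPa


def velocity_at_fixed_pressure(pressure_npa: float, density_cm3: float) -> float:
    """Speed in km/s that yields the given dynamic pressure at the given density."""
    rho = PROTON_MASS_KG * density_cm3 * 1e6
    return math.sqrt(pressure_npa * 1e-9 / rho) / 1e3


# Illustrative values only: doubling the density at constant dynamic pressure
# lowers the required speed by a factor of sqrt(2).
p_ref = dynamic_pressure_npa(4.0, 400.0)
v_dense = velocity_at_fixed_pressure(p_ref, 8.0)
```

    Because pressure scales linearly with density but quadratically with velocity, many different density and velocity pairs give the same dynamic pressure, which is why the two can be studied separately.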

    Semiparametric Estimation and Inference in Causal Inference and Measurement Error Models

    This dissertation research focuses on theoretical and practical developments in semiparametric modeling and statistical inference for high-dimensional data and measurement error data. In the causal inference framework, when evaluating the effectiveness of medical treatments or social intervention policies, the average treatment effect is of fundamental importance. We focus on propensity score modelling in treatment effect problems and develop new robust tools to overcome the curse of dimensionality. Furthermore, estimating and testing the effect of covariates of interest while accommodating many other covariates is an important problem in many scientific practices, including but not limited to empirical economics, public health and medical research. However, when the covariates of interest are measured with error, evaluating the effect precisely requires reducing the bias caused by measurement error and adjusting for the confounding effects simultaneously. We design a general methodology for a general single-index semiparametric measurement error model and for a class of Poisson models, and introduce a bias-correction approach to construct a class of locally efficient estimators. We derive the corresponding estimating procedures and examine the asymptotic properties. Extensive simulation studies have been conducted to verify the performance of our semiparametric approaches

    Effective Learning During COVID-19: Multilevel Covariates Matching and Propensity Score Matching

    In large-scale observational data with a hierarchical structure, both clusters and interventions often have more than two levels. Popular methods in the binary treatment literature do not naturally extend to the hierarchical multilevel treatment case. For example, most K-12 schools and universities moved to an unprecedented hybrid learning model during the COVID-19 pandemic, where learning modes include hybrid and fully remote learning, while students were clustered within a class and school region. It is challenging to evaluate the effect of multilevel treatments on learning outcomes in hierarchically structured data. In this paper, we study a covariates matching method and develop a generalized propensity score matching method to reduce the bias in estimating the intervention effect. We also propose simple algorithms to assess the covariate balance for each approach. We examine the finite sample performance of the methods via simulation studies and apply the proposed methods to analyze the effectiveness of learning modes during the COVID-19 pandemic

    An alternative robust estimator of average treatment effect in causal inference

    The problem of estimating the average treatment effect is important when evaluating the effectiveness of medical treatments or social intervention policies. Most of the existing methods for estimating the average treatment effect rely, in one way or another, on parametric assumptions about the propensity score model or the outcome regression model. In reality, both models are prone to misspecification, which can have undue influence on the estimated average treatment effect. We propose an alternative robust approach to estimating the average treatment effect based on observational data in the challenging situation when neither a plausible parametric outcome model nor a reliable parametric propensity score model is available. Our estimator can be considered a robust extension of the popular class of propensity score weighted estimators. This approach has the advantage of being robust, flexible, and data adaptive, and it can handle many covariates simultaneously. Adopting a dimension reduction approach, we estimate the propensity score weights semiparametrically by using a non‐parametric link function to relate the treatment assignment indicator to a low‐dimensional structure of the covariates, typically formed by several linear combinations of the covariates. We develop a class of consistent estimators for the average treatment effect and study their theoretical properties. We demonstrate the robust performance of the estimators on simulated data and a real data example investigating the effect of maternal smoking on babies’ birth weight
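
    For orientation, the class of propensity score weighted estimators that this abstract extends can be sketched as follows. This is a minimal illustration of a normalized (Hajek-style) inverse probability weighted estimator, not the semiparametric estimator proposed in the paper. The function name `ipw_ate` and the toy numbers are hypothetical, and the propensity scores are assumed to be supplied by some separate model.

```python
from typing import Sequence


def ipw_ate(outcomes: Sequence[float],
            treated: Sequence[int],
            propensity: Sequence[float]) -> float:
    """Normalized inverse-probability-weighted estimate of the average treatment effect.

    propensity[i] is an estimate of P(treated = 1 | covariates of unit i).
    How it is obtained (parametric, semiparametric, ...) is left open here.
    """
    if not (len(outcomes) == len(treated) == len(propensity)):
        raise ValueError("inputs must have the same length")

    num1 = den1 = num0 = den0 = 0.0
    for y, t, e in zip(outcomes, treated, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        if t == 1:
            w = 1.0 / e            # treated units are weighted by 1 / e(x)
            num1 += w * y
            den1 += w
        else:
            w = 1.0 / (1.0 - e)    # control units are weighted by 1 / (1 - e(x))
            num0 += w * y
            den0 += w

    if den1 == 0.0 or den0 == 0.0:
        raise ValueError("need at least one treated and one control unit")

    # Difference of the two weighted outcome means.
    return num1 / den1 - num0 / den0


# Toy data (hypothetical): equal propensities reduce this to a difference in means.
ate_toy = ipw_ate([3.0, 1.0, 4.0, 2.0], [1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
```

    The estimate is only as good as the propensity scores fed into it, which is the sensitivity to model misspecification that the abstract sets out to address.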

    Longitudinal Impacts of Religious Profiles on Substance Abuse Among Emerging Adults: A Fusion of Unsupervised and Supervised Learning Approach

    This study aims to assess the longitudinal patterns of multifaceted religious profiles and their relationships with illegal substance abuse among young people transitioning from late adolescence to early adulthood. A novel longitudinal approach integrating cutting-edge unsupervised and supervised learning techniques is proposed to analyze the data from the National Longitudinal Survey of Youth 1997. The results show that emerging adults who are highly religious in either the subjective (e.g., religious beliefs) or the objective (e.g., religious attendance) domain are much less likely to abuse illegal substances than their religiously disengaged peers. Religiosity, whether subjective or objective, tends to be protective, but its effect is most prominent among young people most profoundly devoted to both religious beliefs and behaviors. Nevertheless, possessing a strong commitment to religious beliefs without accompanying frequent religious behaviors may put emerging adults at greater risk for illicit substance abuse, compared to those who hold a high level of religious beliefs but do not engage in corresponding religious behaviors frequently

    Primitive Photosynthetic Architectures Based on Self-Organization and Chemical Evolution of Amino Acids and Metal Ions

    The emergence of light-energy-utilizing metabolism is likely to be a critical milestone in prebiotic chemistry and the origin of life. However, how the primitive pigment is spontaneously generated still remains unknown. Herein, a primitive pigment model based on adaptive self-organization of amino acids (Cystine, Cys) and metal ions (zinc ion, Zn2+) followed by chemical evolution under hydrothermal conditions is developed. The resulting hybrid microspheres are composed of radially aligned cystine/zinc (Cys/Zn) assembly decorated with carbonate-doped zinc sulfide (C-ZnS) nanocrystals. The part of C-ZnS can work as a light-harvesting antenna to capture ultraviolet and visible light, and use it in various photochemical reactions, including hydrogen (H2) evolution, carbon dioxide (CO2) photoreduction, and reduction of nicotinamide adenine dinucleotide (NAD+) to nicotinamide adenine dinucleotide hydride (NADH). Additionally, guest molecules (e.g., glutamate dehydrogenase, GDH) can be encapsulated within the hierarchical Cys/Zn framework, which facilitates sustainable photoenzymatic synthesis of glutamate. This study helps deepen insight into the emergent functionality (conversion of light energy) and complexity (hierarchical architecture) from interaction and reaction of prebiotic molecules. The primitive pigment model is also promising to work as an artificial photosynthetic microreactor

    A Systematic Comparison on Prevailing Intrusion Detection Models

    Modern vehicles have become connected via On-Board Units (OBUs) involving many complex embedded and networked devices with steadily increasing processing and communication resources. Those devices exchange information through intra-vehicle networks to implement various functionalities and perform actions. Vehicles’ connectivity has also been extended to external networks through vehicle-to-everything technologies, enabling communications with other vehicles, infrastructures, and smart devices. Alongside the significant increase in quality of service, the connectivity of modern vehicles increases their vulnerability to cyber-attacks targeting both intra-vehicle and external networks. To secure communications in vehicular networks, there has been a consistent effort to develop intrusion detection systems based on machine learning techniques to detect and ultimately react to malicious cyber-attacks. In this article, we study several machine learning algorithms, deep learning models, and hyper-parameter optimization techniques to detect attacks on vehicular networks. Experimental results on well-known data sets such as CICIDS2017, NSL-KDD, IoTID20, KDDCup99, and UNSW-NB15 indicate that learning-based algorithms can detect various types of intrusion attacks with significant performance