Unveiling the Tapestry of Automated Essay Scoring: A Comprehensive Investigation of Accuracy, Fairness, and Generalizability
Automatic Essay Scoring (AES) is a well-established educational pursuit that employs machine learning to evaluate student-authored essays. While much effort has been made in this area, current research primarily focuses on either (i) boosting the predictive accuracy of an AES model for a specific prompt (i.e., developing prompt-specific models), which often relies heavily on labeled data from the same target prompt; or (ii) assessing the applicability of AES models developed on non-target prompts to the intended target prompt (i.e., developing AES models in a cross-prompt setting). Given the inherent bias in machine learning and its potential impact on marginalized groups, it is imperative to investigate whether such bias exists in current AES methods and, if identified, how it interacts with an AES model's accuracy and generalizability. Thus, our study aimed to uncover the intricate relationship between an AES model's accuracy, fairness, and generalizability, contributing practical insights for developing effective AES models in real-world education. To this end, we meticulously selected nine prominent AES methods and evaluated their performance using seven distinct metrics on an open-source dataset, which contains over 25,000 essays and various demographic information about students such as gender, English language learner status, and economic status. Through extensive evaluations, we demonstrated that: (1) prompt-specific models tend to outperform their cross-prompt counterparts in predictive accuracy; (2) prompt-specific models frequently exhibit greater bias with respect to students of different economic statuses than cross-prompt models; (3) in the pursuit of generalizability, traditional machine learning models (e.g., SVM) coupled with carefully engineered features hold greater potential for achieving both high accuracy and fairness than complex neural network models.
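The abstract does not name the seven metrics; as a purely illustrative sketch, quadratic weighted kappa (a standard accuracy measure in AES work) and a simple between-group error gap (one common fairness proxy) could be computed as follows. The column names ("score", "pred", "economic_status") are hypothetical, not taken from the paper's dataset.

    # Illustrative only: QWK for accuracy plus a between-group error gap as a
    # crude fairness proxy; neither is confirmed as one of the paper's metrics.
    import pandas as pd
    from sklearn.metrics import cohen_kappa_score

    def qwk(y_true, y_pred):
        # Quadratic weighted kappa, the usual agreement metric in AES studies.
        return cohen_kappa_score(y_true, y_pred, weights="quadratic")

    def group_gap(df, group_col="economic_status"):
        err = df["pred"] - df["score"]            # signed prediction error
        by_group = err.groupby(df[group_col]).mean()
        return by_group.max() - by_group.min()    # spread across groups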
Variants of Tagged Sentential Decision Diagrams
A recently proposed canonical form of Boolean functions, namely tagged sentential decision diagrams (TSDDs), exploits both the standard and zero-suppressed trimming rules. The standard rules minimize the size of sentential decision diagrams (SDDs), while the zero-suppressed trimming rules serve the same purpose for zero-suppressed sentential decision diagrams (ZSDDs). The original TSDDs, which we call zero-suppressed TSDDs (ZTSDDs), first fully apply the zero-suppressed trimming rules and then the standard ones. In this paper, we present a variant of TSDDs, which we call standard TSDDs (STSDDs), obtained by reversing the order of the trimming rules. We then prove the canonicity of STSDDs and present algorithms for binary operations on TSDDs. In addition, we offer two kinds of implementations of STSDDs and ZTSDDs, thereby obtaining three variants of the original TSDDs. Experimental evaluations demonstrate that the four versions of TSDDs have a size advantage over SDDs and ZSDDs.
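The standard versus zero-suppressed distinction mirrors the better-known reduction rules for BDDs versus ZDDs; as a rough analogy only (these are not the paper's SDD-level trimming rules), node elimination in the two styles looks like this, with each node written as (var, low, high):

    # Rough analogy only: classic BDD vs. ZDD node-elimination rules, which
    # the SDD/ZSDD trimming rules generalize. NOT the paper's TSDD algorithm.
    def trim_standard(var, low, high):
        # BDD-style rule: a node whose two branches coincide is redundant.
        return low if low == high else (var, low, high)

    def trim_zero_suppressed(var, low, high, FALSE="0"):
        # ZDD-style rule: a node whose high (var = 1) branch is the false
        # terminal is suppressed, keeping only the low branch.
        return low if high == FALSE else (var, low, high)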
An Investigation of Darwiche and Pearl's Postulates for Iterated Belief Update
Belief revision and update, two significant types of belief change, both focus on how an agent modifies her beliefs in the presence of new information. The most striking difference between them is that the former studies the change of beliefs in a static world, while the latter concentrates on a dynamically changing world. The famous AGM and KM postulates were proposed to capture rational belief revision and update, respectively. However, both are too permissive to exclude some unreasonable changes under iteration. In response to this weakness, the DP postulates and their extensions for iterated belief revision were presented. Furthermore, Rodrigues integrated these postulates into belief update. Unfortunately, his approach does not meet the basic requirement of iterated belief update. This paper is intended to solve this problem with Rodrigues's approach. Firstly, we present a modification of the original KM postulates based on belief states. Subsequently, we migrate several well-known postulates for iterated belief revision to iterated belief update. Moreover, we provide exact semantic characterizations based on partial preorders for each of the proposed postulates. Finally, we analyze the compatibility between the above iterated postulates and the KM postulates for belief update.
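For reference (as commonly stated in the literature, not quoted from this paper), the four DP postulates for iterated revision over a belief state \Psi with revision operator \circ are:

    (C1) if \beta \models \mu, then (\Psi \circ \mu) \circ \beta \equiv \Psi \circ \beta
    (C2) if \beta \models \neg\mu, then (\Psi \circ \mu) \circ \beta \equiv \Psi \circ \beta
    (C3) if \Psi \circ \beta \models \mu, then (\Psi \circ \mu) \circ \beta \models \mu
    (C4) if \Psi \circ \beta \not\models \neg\mu, then (\Psi \circ \mu) \circ \beta \not\models \neg\mu

The migration the paper describes can be read as developing analogues of such postulates with \circ replaced by an update operator.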
TCon: A transparent congestion control deployment platform for optimizing WAN transfers
Nowadays, many web services (e.g., cloud storage) are deployed inside datacenters and may trigger transfers to clients through the WAN. TCP congestion control is a vital component for improving the performance (e.g., latency) of these services. Given the complex networking environment, the default congestion control algorithms on servers may not always be the most efficient, and new advanced algorithms will continue to be proposed. However, adjusting the congestion control algorithm usually requires modifying the TCP stacks of servers, which is difficult if not impossible, especially considering the different operating systems and configurations on servers. In this paper, we propose TCon, a light-weight, flexible and scalable platform that allows administrators (or operators) to deploy any appropriate congestion control algorithm transparently, without making any changes to the TCP stacks of servers. We have implemented TCon in Open vSwitch (OVS) and conducted extensive test-bed experiments by transparently deploying the BBR congestion control algorithm over TCon. Test-bed results show that BBR over TCon works effectively and its performance stays close to that of a native implementation on servers, reducing latency by 12.76% on average.
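The abstract does not spell out the enforcement mechanism. One way such transparent enforcement is commonly realized in middleboxes (an assumption for illustration, not a confirmed description of TCon's internals) is to compute a congestion window at the vSwitch from observed ACKs and clamp the receive window forwarded to the server:

    # Hypothetical sketch of receive-window clamping, a common trick for
    # imposing a congestion window from a middlebox without touching the
    # server's TCP stack. Not taken from the TCon paper's code.
    def rewrite_ack(ack_rwnd_bytes, cwnd_bytes, wscale=0):
        # A server never sends faster than min(its own cwnd, peer rwnd), so
        # advertising min(rwnd, our cwnd) imposes the middlebox's pace.
        allowed = min(ack_rwnd_bytes, cwnd_bytes)
        # The TCP header carries the window in units of 2**wscale bytes.
        return allowed >> wscale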
A fine-grained and transparent congestion control enforcement scheme
In practice, a single TCP congestion control algorithm is often used to handle all TCP connections on a Web server, e.g., Cubic by default on Linux. Given the complex and ever-changing networking environment, the default congestion control algorithm may not always be the most suitable one. Adjusting congestion control to meet different networking scenarios usually requires modifying the servers' TCP stacks. This is difficult, if not impossible, due to the various operating systems and different configurations on the servers. In this paper, we propose Mystique, a light-weight and flexible scheme that allows administrators (or operators) to deploy any congestion control scheme transparently without changing the existing TCP stacks on servers. We have implemented Mystique in Open vSwitch (OVS) and conducted extensive test-bed experiments in public cloud environments. We have extensively evaluated Mystique, and the results demonstrate that it is able to effectively adapt to varying network conditions and can always employ the most suitable congestion control for each TCP connection. Mystique can significantly reduce latency by up to 37.8% in comparison with other congestion controls.
Mystique: a fine-grained and transparent congestion control enforcement scheme
TCP congestion control is a vital component for the latency of Web services. In practice, a single congestion control mechanism is often used to handle all TCP connections on a Web server, e.g., Cubic by default on Linux. Given the complex and ever-changing networking environment, the default congestion control may not always be the most suitable one. Adjusting congestion control to meet different networking scenarios usually requires modifying the TCP stacks on a server. This is difficult, if not impossible, due to the various operating system and application configurations on production servers. In this paper, we propose Mystique, a light-weight, flexible, and dynamic congestion control switching scheme that allows network or server administrators to deploy any congestion control scheme transparently without modifying the existing TCP stacks on servers. We have implemented Mystique in Open vSwitch (OVS) and conducted extensive testbed experiments in both public and private cloud environments. Experiment results demonstrate that Mystique is able to effectively adapt to varying network conditions and can always employ the most suitable congestion control for each TCP connection. More specifically, Mystique can significantly reduce latency by 18.13% on average when compared with individual congestion controls.
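As a hedged illustration of what per-connection switching could look like (the actual selection policy is not described in this abstract; the thresholds and algorithm choices below are invented for the example):

    # Invented illustration of per-connection congestion-control selection
    # driven by measured path conditions; thresholds are arbitrary and this
    # is not Mystique's actual policy.
    def pick_algorithm(rtt_ms, loss_rate, bdp_bytes):
        if loss_rate > 0.01 and rtt_ms > 100:
            return "bbr"      # lossy long paths: rate-based control
        if bdp_bytes > 1_000_000:
            return "cubic"    # large pipes: aggressive window growth
        return "reno"         # short, clean paths: simple AIMD suffices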
Diagnosing and Rectifying Fake OOD Invariance: A Restructured Causal Approach
Invariant representation learning (IRL) encourages prediction from invariant causal features to labels deconfounded from the environments, advancing the technical roadmap of out-of-distribution (OOD) generalization. Despite the spotlight on these methods, a recent theoretical result verified that some causal features recovered by IRL merely appear domain-invariant in the training environments but fail in unseen domains. Such fake invariance severely endangers OOD generalization, since the trustworthy objective cannot be diagnosed and existing causal remedies cannot rectify it. In this paper, we review an IRL family (InvRat) under the Partially and Fully Informative Invariant Feature Structural Causal Models (PIIF SCM / FIIF SCM), respectively, to certify their weaknesses in representing fake invariant features; we then unify their causal diagrams to propose the ReStructured SCM (RS-SCM). RS-SCM can ideally rebuild the spurious and the fake invariant features simultaneously. Given this, we further develop an approach based on conditional mutual information with respect to RS-SCM and rigorously rectify the spurious and fake invariant effects. It can be easily implemented by a small feature-selection subnet introduced into the IRL family, which is alternately optimized to achieve our goal. Experiments verify the superiority of our approach in combating the fake invariance issue across a variety of OOD generalization benchmarks.
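For context (standard in the IRL literature rather than specific to RS-SCM): InvRat-style methods enforce invariance by requiring the label Y to be independent of the environment E given the learned representation Z, which is exactly a vanishing conditional mutual information constraint:

    I(Y; E \mid Z) = \mathbb{E}_{Z}\big[\, D_{\mathrm{KL}}\big( p(Y, E \mid Z) \,\|\, p(Y \mid Z)\, p(E \mid Z) \big) \,\big] = 0 \iff Y \perp E \mid Z

The fake-invariance problem the paper targets arises when this constraint holds over the training environments yet breaks in unseen ones.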
Generalized Linear Integer Numeric Planning
Classical planning aims to find a sequence of actions that guarantees goal achievement from an initial state. The representative framework of classical planning is based on propositional logic. Due to the weak expressiveness of propositional logic, many applications of interest cannot be formalized as classical planning problems. Extensions such as numeric planning and generalized planning (GP) have therefore been proposed. Qualitative numeric planning (QNP) is a decidable class of numeric and generalized extensions and serves as a numeric abstraction of GP. However, QNP is still far from perfect and needs further improvement. In this paper, we introduce another generalized version of numeric planning, namely generalized linear integer numeric planning (GLINP), which is a more suitable abstract framework for GP than QNP. In addition, we develop a general framework to synthesize solutions to GLINP problems. Finally, we evaluate our approach on a number of benchmarks, and the experimental results justify the effectiveness and scalability of our proposed approach.
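To make the "generalized" aspect concrete (a toy example of ours, not from the paper): a generalized plan is a single policy that solves a whole family of numeric instances, e.g., driving x to 0 from any initial integer x >= 0 via a linear integer effect:

    # Toy illustration (not from the paper): one generalized plan solving
    # every instance of a numeric problem family, here "reach x = 0" for
    # any initial non-negative integer x.
    def generalized_plan(x):
        steps = []
        while x > 0:          # qualitative condition on a numeric variable
            x -= 1            # action with a linear integer effect
            steps.append("dec(x)")
        return steps          # generalized_plan(3) -> ['dec(x)'] * 3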