Raman fingerprint of semi-metal WTe2 from bulk to monolayer
Tungsten ditelluride (WTe2), a layered transition-metal dichalcogenide (TMD),
has recently been shown to exhibit an extremely large magnetoresistance effect, which is
unique among TMDs and appears to be correlated with its distinctive electronic
structure. Here, we report the observation of six Raman peaks
corresponding to the A_2^4, A_1^9, A_1^8, A_1^6, A_1^5 and A_1^2 phonons, from
the 33 Raman-active modes predicted for WTe2. This provides direct evidence to
distinguish the space group of WTe2 from that of other TMDs. Moreover, the
Raman evolution of WTe2 from bulk to monolayer is clearly revealed. Interestingly,
the A_2^4 mode, centered at ~109.8 cm^-1, is forbidden in the monolayer, which may
be attributed to the change of point group from C2v (bulk) to C2h (monolayer).
Our work characterizes all observed Raman
peaks in the bulk and few-layer samples and provides a route to study the
physical properties of two-dimensional WTe2.
Postglacial sea-level change: novel insights from physical and statistical modelling
Developing accurate projections of future sea-level change is a key challenge for the science community under the current warming climate. Because modern instrumental sea-level observations extend back only to the 19th-20th centuries, projections based on them capture mainly short-term effects, leaving the physical processes that dominate over longer timescales underestimated. An essential step towards accurate and robust long-term sea-level projections is therefore to investigate the physical processes that shape the spatio-temporal evolution of sea-level change over centennial to millennial timescales. Because palaeo sea-level observations are sometimes scarce and often noisy, the mechanisms of sea-level change over geological timescales are still not well understood, and many outstanding questions remain. This thesis develops novel physical and statistical models to better understand the mechanisms behind postglacial sea-level change. Specifically, it focuses on three outstanding problems that are important not only for postglacial sea-level change but also for understanding past ice-sheet dynamics and palaeoclimate change.
Firstly, a statistical framework is developed to invert the sources of meltwater pulse 1A, the largest and most rapid global sea-level rise event of the last deglaciation, with sophisticated treatment of uncertainties associated with sea-level reconstructions and geophysical modelling. The results suggest there were contributions from North America, 12.0 m (5.6-15.4 m; 95% probability), Scandinavia, 4.6 m (3.2-6.4 m), and Antarctica, 1.3 m (0-5.9 m), giving a total global mean sea-level rise of 17.9 m (15.7-20.2 m) in 500 years.
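The thesis's actual inversion relies on full geophysical modelling and a careful probabilistic treatment of reconstruction and modelling uncertainties; purely as an illustration of the underlying idea, site-level sea-level jumps can be related to source contributions through a fingerprint matrix and solved for, as in the toy sketch below (the matrix values, site names and observed jumps are hypothetical, not from the thesis).

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical fingerprint matrix: fraction of each source's barystatic
# contribution expressed as local sea-level rise at each observation site.
# Rows: sites (e.g. three far-field reef sites); columns: sources
# (North America, Scandinavia, Antarctica). Values are made up for illustration.
F = np.array([
    [1.05, 0.95, 1.10],
    [0.80, 1.00, 1.05],
    [0.95, 0.90, 1.15],
])
observed_jump = np.array([17.0, 14.5, 16.5])  # hypothetical site-level jumps (m)

# Non-negative least squares gives an estimate of each source's contribution;
# the thesis uses a Bayesian framework with explicit uncertainties instead.
contrib, residual = nnls(F, observed_jump)
print(dict(zip(["North America", "Scandinavia", "Antarctica"], contrib.round(1))))
```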
Secondly, the missing ice problem (the distinct imbalance between the observed global mean sea-level rise and the reconstructed amount of ice-sheet melt) is revisited by including an additional physical process, sediment isostatic adjustment (SIA), which has not previously been considered in this problem. In particular, this thesis investigates the impact of SIA on local relative sea-level (RSL) variation across the Great Barrier Reef (GBR), the world's largest mixed carbonate-siliciclastic sediment system. Based on a Bayesian calibration method, SIA can contribute up to 1.1 m of relative sea-level rise on the outer shelf of the southern central GBR from 28 ka to present. Because the SIA-induced RSL rise is unrelated to ice mass loss, failing to correct for this signal leads to a systematic overestimation of grounded ice volume. Therefore, incorporating the SIA process will reduce the estimate of global grounded ice volume at the Last Glacial Maximum (LGM), which can help to mitigate the missing ice problem.
Lastly, robust global barystatic sea-level maps with minimal dependence on the detailed geometry of past ice-sheet change are reconstructed. Estimating such maps requires physical simulation of relative sea level for thousands of different ice histories, which is computationally prohibitive. To address this, this thesis develops a statistical emulator that mimics the behaviour of the physics-based model while being computationally much cheaper to evaluate. The results highlight the Seychelles as an exceptionally good place to map barystatic sea level throughout the last deglaciation, because RSL at this location departs only slightly from global barystatic sea level, with minor dependence on the assumed ice history.
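The emulator in the thesis stands in for an expensive glacial isostatic adjustment simulation; as a minimal sketch of the idea (not the thesis's actual model), a Gaussian process can be trained on a modest number of simulator runs and then queried cheaply for thousands of candidate ice histories. The toy "simulator" and its parameterisation below are placeholders.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def expensive_simulator(ice_params: np.ndarray) -> float:
    # Stand-in for a physics-based RSL prediction at one site and epoch.
    return 120.0 * ice_params[0] - 15.0 * ice_params[1] ** 2

# A small design of simulator runs over normalised ice-history parameters.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(40, 2))
y = np.array([expensive_simulator(x) for x in X])

# Fit the emulator once, then evaluate many candidate ice histories cheaply,
# with a predictive standard deviation quantifying emulator uncertainty.
gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), normalize_y=True)
gp.fit(X, y)
mean, std = gp.predict(rng.uniform(0, 1, size=(5, 2)), return_std=True)
```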
Together, these physical and statistical models provide powerful tools for gaining novel insights into the mechanisms of postglacial sea-level change, and hence have the potential to yield more robust, accurate and trustworthy sea-level projections.
An Open Source Data Contamination Report for Large Language Models
Data contamination in model evaluation has become increasingly prevalent with
the growing popularity of large language models. It allows models to "cheat"
via memorisation instead of displaying true capabilities. Therefore,
contamination analysis has become a crucial part of reliable model evaluation
to validate results. However, existing contamination analysis is usually
conducted internally by large language model developers and often lacks
transparency and completeness. This paper presents an extensive data
contamination report for over 15 popular large language models across six
widely used multiple-choice QA benchmarks. We also introduce an open-source
pipeline that enables the community to perform contamination analysis on
customised data and models. Our experiments reveal varying contamination levels
ranging from 1% to 45% across benchmarks, with the degree of contamination
increasing rapidly over time. Performance analysis of large language models
indicates that data contamination does not necessarily inflate benchmark
performance: while significant accuracy boosts of up to 14% and 7% are observed
on the contaminated C-Eval and Hellaswag benchmarks, only a minimal increase is
noted on contaminated MMLU. We also find that larger models appear to gain a
greater advantage than smaller models on contaminated test sets.
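The paper's open-source pipeline implements its own detection strategy; a minimal sketch of the generic idea, flagging benchmark items that share long n-grams with a model's training corpus, might look as follows (the function names, the 13-gram window and the matching rule are illustrative assumptions, not the authors' implementation).

```python
from typing import Iterable, List, Set

def ngrams(text: str, n: int = 13) -> Set[str]:
    # Lowercase, split on whitespace, and slide a window of n tokens.
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def contamination_rate(benchmark: Iterable[str],
                       training_docs: Iterable[str],
                       n: int = 13) -> float:
    """Fraction of benchmark items sharing at least one n-gram with the training data."""
    train_ngrams: Set[str] = set()
    for doc in training_docs:
        train_ngrams |= ngrams(doc, n)
    items: List[str] = list(benchmark)
    flagged = sum(1 for item in items if ngrams(item, n) & train_ngrams)
    return flagged / max(len(items), 1)
```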
LatestEval: Addressing Data Contamination in Language Model Evaluation through Dynamic and Time-Sensitive Test Construction
Data contamination in evaluation has become increasingly prevalent with the
emergence of language models pre-trained on very large, automatically crawled
corpora. This problem poses significant challenges for the accurate
assessment of model capabilities and generalisation. In this paper, we propose
LatestEval, an automatic method that leverages the most recent texts to create
uncontaminated reading comprehension evaluations. LatestEval avoids data
contamination by only using texts published within a recent time window,
ensuring no overlap with the training corpora of pre-trained language models.
We develop the LatestEval automated pipeline to 1) gather the latest texts; 2)
identify key information; and 3) construct questions targeting that information
while removing the existing answers from the context. This encourages models to
infer the answers themselves from the remaining context rather than simply
copying and pasting. Our experiments demonstrate that language models exhibit negligible
memorisation behaviours on LatestEval as opposed to previous benchmarks,
suggesting a significantly reduced risk of data contamination and leading to a
more robust evaluation. Data and code are publicly available at:
https://github.com/liyucheng09/LatestEval.
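The actual LatestEval pipeline uses richer key-information extraction and question construction than this; the toy sketch below only illustrates the core masking idea of removing an answer span from a recent text so that a model must infer it rather than copy-paste it (the regex heuristics and the example sentence are hypothetical).

```python
import re
from typing import Optional, Tuple

def make_cloze_item(passage: str) -> Optional[Tuple[str, str, str]]:
    """Toy LatestEval-style item: hide a date or capitalised phrase and ask for it."""
    # Prefer a date-like span; fall back to a multi-word capitalised phrase.
    match = re.search(r"\b\d{1,2} [A-Z][a-z]+ \d{4}\b", passage) \
        or re.search(r"\b(?:[A-Z][a-z]+ ){1,3}[A-Z][a-z]+\b", passage)
    if match is None:
        return None
    answer = match.group(0)
    context = passage.replace(answer, "____", 1)  # remove the answer from the context
    question = "What has been removed from the passage (____)?"
    return context, question, answer

# Example usage with a made-up recent sentence:
item = make_cloze_item("The summit was held in Wellington on 3 March 2024, officials said.")
```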
Evaluating Large Language Models for Generalization and Robustness via Data Compression
Existing methods for evaluating large language models face challenges such as
data contamination, sensitivity to prompts, and the high cost of benchmark
creation. To address this, we propose an evaluation approach based on lossless
data compression that tests how models' predictive abilities generalize
after their training cutoff. Specifically, we collect comprehensive test data
spanning 83 months from 2017 to 2023 and split the data into training and
testing periods according to models' training data cutoff. We measure: 1) the
compression performance on the testing period as a measure of generalization on
unseen data; and 2) the performance gap between the training and testing period
as a measure of robustness. Our experiments test 14 representative large
language models with various sizes on sources including Wikipedia, news
articles, code, arXiv papers, and multi-modal data. We find that the
compression performance of many models degrades significantly after their cutoff date,
but models such as Mistral and Llama-2 demonstrate a good balance between
performance and robustness. Results also suggest that models struggle to
generalize on news and code data, but work especially well on arXiv papers. We
also find that the context size and the tokenization implementation have a
significant impact on the overall compression performance.
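The paper's exact protocol (data slicing by month, sliding windows, per-domain aggregation) is more involved; the sketch below only shows how a causal LM's log-likelihood translates into an equivalent lossless code length in bits per character, which is the kind of quantity compared across the training and testing periods. The "gpt2" model name is just a placeholder.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def bits_per_character(text: str, model_name: str = "gpt2") -> float:
    """Code length assigned to `text` by a causal LM, in bits per character.

    Under arithmetic coding, the ideal code length is -log2 p(text), so a lower
    value means better compression (and better prediction of unseen data).
    """
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # loss is the mean negative log-likelihood per predicted token, in nats.
        loss = model(ids, labels=ids).loss
    total_bits = loss.item() * (ids.shape[1] - 1) / math.log(2)
    return total_bits / len(text)
```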
Compressing Context to Enhance Inference Efficiency of Large Language Models
Large language models (LLMs) have achieved remarkable performance across various
tasks. However, they face challenges in managing long documents and extended
conversations, due to significantly increased computational requirements, both
in memory and inference time, and potential context truncation when the input
exceeds the LLM's fixed context length. This paper proposes a method called
Selective Context that enhances the inference efficiency of LLMs by identifying
and pruning redundancy in the input context to make the input more compact. We
test our approach using common data sources requiring long context processing:
arXiv papers, news articles, and long conversations, on tasks of summarisation,
question answering, and response generation. Experimental results show that
Selective Context significantly reduces memory cost and decreases generation
latency while maintaining performance comparable to that achieved when the
full context is used. Specifically, we achieve a 50% reduction in context
cost, resulting in a 36% reduction in inference memory usage and a 32%
reduction in inference time, while observing only a minor drop of 0.023 in
BERTScore and 0.038 in faithfulness on four downstream applications, indicating
that our method strikes a good balance between efficiency and performance.
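Selective Context as described in the paper scores and prunes lexical units (phrases) by their self-information under a small causal LM; the toy sketch below drops individual low-information tokens instead, just to make the mechanism concrete (the keep ratio and the "gpt2" scorer are illustrative choices, not the authors' exact setup).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def prune_low_information_tokens(text: str, keep_ratio: float = 0.5,
                                 model_name: str = "gpt2") -> str:
    """Toy Selective-Context-style pruning: keep only the most surprising tokens."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Self-information of each token given its prefix: -log p(t_i | t_<i).
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    info = -logprobs[torch.arange(ids.shape[1] - 1), ids[0, 1:]]
    keep = int(info.numel() * keep_ratio)
    kept_positions = sorted(torch.topk(info, keep).indices.tolist())
    # Always keep the first token (it has no score), then the selected ones in order.
    kept_ids = [ids[0, 0].item()] + [ids[0, i + 1].item() for i in kept_positions]
    return tok.decode(kept_ids)
```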
Metaphor Detection via Explicit Basic Meanings Modelling
One notable trend in metaphor detection is the adoption of linguistic
theories such as the metaphor identification procedure (MIP) for model
architecture design. While MIP specifies that the metaphoricity of a
lexical unit is determined by the contrast between its contextual
meaning and its basic meaning, existing work does not strictly follow
this principle, typically using the aggregated meaning to approximate
the basic meaning of target words. In this paper, we propose a novel metaphor
detection method, which models the basic meaning of the word based on literal
annotation from the training set, and then compares this with the contextual
meaning in a target sentence to identify metaphors. Empirical results show that
our method significantly outperforms the state-of-the-art method by 1.0% in F1
score. Moreover, our performance even reaches the theoretical upper bound on
the VUA18 benchmark for targets with basic annotations, which demonstrates the
importance of modelling basic meanings for metaphor detection.
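The paper's model is trained end-to-end with the basic meaning built from literal training annotations; the sketch below conveys the comparison in a simplified, training-free form, contrasting the contextual embedding of a target word with a "basic meaning" vector averaged over literal example sentences (the encoder, the pooling and the threshold are illustrative assumptions, not the authors' architecture).

```python
import torch
from typing import List
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased")
enc.eval()

def word_embedding(sentence: str, word: str) -> torch.Tensor:
    """Mean hidden state of the sub-tokens covering the first occurrence of `word`."""
    start = sentence.lower().index(word.lower())  # assumes `word` occurs in `sentence`
    end = start + len(word)
    batch = tok(sentence, return_tensors="pt", return_offsets_mapping=True)
    offsets = batch.pop("offset_mapping")[0]
    with torch.no_grad():
        hidden = enc(**batch).last_hidden_state[0]
    mask = [(s < end and e > start) for s, e in offsets.tolist()]
    return hidden[torch.tensor(mask)].mean(dim=0)

def is_metaphor(target_sentence: str, word: str, literal_sentences: List[str],
                threshold: float = 0.45) -> bool:
    """Flag `word` as metaphorical if its contextual meaning departs from its basic meaning."""
    basic = torch.stack([word_embedding(s, word) for s in literal_sentences]).mean(dim=0)
    contextual = word_embedding(target_sentence, word)
    sim = torch.cosine_similarity(contextual, basic, dim=0)
    return sim.item() < threshold
```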
On Robustness and Generalization of ML-Based Congestion Predictors to Valid and Imperceptible Perturbations
There is substantial interest in the use of machine learning (ML)-based
techniques throughout the electronic computer-aided design (CAD) flow,
particularly methods based on deep learning. However, while deep learning
methods have achieved state-of-the-art performance in several applications,
recent work has demonstrated that neural networks are generally vulnerable to
small, carefully chosen perturbations of their input (e.g. a single pixel
change in an image). In this work, we investigate robustness in the context of
ML-based EDA tools -- particularly for congestion prediction. As far as we are
aware, we are the first to explore this concept in the context of ML-based EDA.
We first describe a novel notion of imperceptibility designed specifically
for VLSI layout problems defined on netlists and cell placements. Our
definition of imperceptibility is characterized by a guarantee that a
perturbation to a layout will not alter its global routing. We then demonstrate
that state-of-the-art CNN and GNN-based congestion models exhibit brittleness
to imperceptible perturbations. Namely, we show that when a small number of
cells (e.g. 1%-5% of cells) have their positions shifted such that a measure of
global congestion is guaranteed to remain unaffected, the predictions can
change drastically: for example, adversarially shifting 1% of the design by
0.001% of the layout space results in a predicted decrease in congestion of up
to 90%, even though the perturbation implies no change in actual congestion.
In other words, the quality of a predictor can be made
arbitrarily poor (i.e. can be made to predict that a design is
"congestion-free") for an arbitrary input layout. Next, we describe a simple
technique to train predictors that improves robustness to these perturbations.
Our work indicates that CAD engineers should be cautious when integrating
neural network-based mechanisms in EDA flows to ensure robust and high-quality
results.
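The paper constructs its adversarial, routing-preserving perturbations with a more principled procedure; purely as a rough illustration of the robustness question it raises, one can probe a trained congestion predictor with tiny random shifts of a small fraction of cells and record the worst-case change in its output (the `predict` callable and all magnitudes below are placeholders).

```python
import numpy as np

def congestion_sensitivity(predict, cell_xy: np.ndarray,
                           frac_cells: float = 0.01,
                           shift_frac: float = 1e-5,
                           layout_size: float = 1.0,
                           trials: int = 100,
                           seed: int = 0) -> float:
    """Worst-case relative change in predicted congestion under tiny random shifts.

    `predict` maps an (N, 2) array of cell positions to a scalar congestion score.
    The paper uses guaranteed routing-preserving, adversarially chosen perturbations;
    this sketch only probes random ones, so it gives a lower bound on brittleness.
    """
    rng = np.random.default_rng(seed)
    base = predict(cell_xy)
    n_moved = max(1, int(frac_cells * len(cell_xy)))
    worst = 0.0
    for _ in range(trials):
        perturbed = cell_xy.copy()
        idx = rng.choice(len(cell_xy), size=n_moved, replace=False)
        perturbed[idx] += rng.uniform(-1, 1, size=(n_moved, 2)) * shift_frac * layout_size
        worst = max(worst, abs(predict(perturbed) - base) / max(abs(base), 1e-12))
    return worst
```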