250 research outputs found
Online Robot Introspection via Wrench-based Action Grammars
Robotic failure is all too common in unstructured robot tasks. Despite
well-designed controllers, robots often fail due to unexpected events. How do
robots measure unexpected events? Many do not. Most robots are driven by the
sense-plan-act paradigm; more recently, however, robots are moving to a
sense-plan-act-verify paradigm. In this work, we present a principled
methodology to bootstrap online robot introspection for contact tasks. In
effect, we are trying to enable the robot to answer the question: what did I
do? Is my behavior as expected or not? To this end, we analyze noisy wrench
data and postulate that the latter inherently contains patterns that can be
effectively represented by a vocabulary. The vocabulary is generated by
segmenting and encoding the data. When the wrench information represents a
sequence of sub-tasks, we can think of the vocabulary as forming a sentence (a
set of words with grammar rules) for each sub-task, allowing the latter to be
uniquely represented. The grammar, which can also include unexpected events,
was classified in offline and online scenarios as well as for simulated and
real robot experiments. Multiclass Support Vector Machines (SVMs) were used
offline, while online probabilistic SVMs were used to give temporal
confidence to the introspection result. The contribution of our work is the
presentation of a generalizable online semantic scheme that enables a robot to
understand its high-level state whether nominal or abnormal. It is shown to
work in offline and online scenarios for a particularly challenging contact
task: snap assemblies. We perform the snap assembly in simulated and real
one-arm experiments and in a simulated two-arm experiment. This verification
mechanism can be used by high-level planners or reasoning systems to enable
intelligent failure recovery or determine the next most optimal manipulation
skill to be used. Comment: arXiv admin note: substantial text overlap with arXiv:1609.0494
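The segment-and-encode idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' method: the fixed window size, the slope threshold, the three-word vocabulary (`increase`, `decrease`, `constant`), and the synthetic `force_z` samples are all assumptions made for illustration.

```python
from statistics import mean

def slope(window):
    """Least-squares slope of the samples against their indices."""
    n = len(window)
    x_bar = (n - 1) / 2
    y_bar = mean(window)
    num = sum((i - x_bar) * (y - y_bar) for i, y in enumerate(window))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den

def encode(signal, window_size=4, threshold=0.1):
    """Split a 1-D force signal into fixed windows and label each by its trend.

    window_size and threshold are illustrative choices, not values from the paper.
    """
    words = []
    for start in range(0, len(signal) - window_size + 1, window_size):
        s = slope(signal[start:start + window_size])
        if s > threshold:
            words.append("increase")
        elif s < -threshold:
            words.append("decrease")
        else:
            words.append("constant")
    return words

def compress(words):
    """Merge consecutive repeats so the sentence records only changes in trend."""
    out = []
    for w in words:
        if not out or out[-1] != w:
            out.append(w)
    return out

# Synthetic single-axis force trace (made-up numbers): idle, contact, hold, release.
force_z = [0.0, 0.02, -0.01, 0.01, 0.0, 1.0, 2.1, 3.0,
           3.0, 3.05, 2.95, 3.0, 3.0, 2.0, 1.0, 0.0]
sentence = compress(encode(force_z))
print(sentence)
```

A classifier such as the SVMs mentioned in the abstract would then take sentences of this kind as input; that step is omitted here.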
A Novel Prognostic Predictor of Immune Micro-environment and Therapeutic Response in Kidney Renal Clear Cell Carcinoma based on Necroptosis-related Gene Signature
Background: Necroptosis, a caspase-independent form of cell death, plays a pivotal role in cancer biology. Although necroptosis is closely associated with oncogenesis, cancer metastasis, and immunity, few studies have examined the role of necroptosis-related genes (NRGs) in the highly immunogenic cancer type kidney renal clear cell carcinoma (KIRC). Methods: Clinicopathological and transcriptomic data were extracted from the TCGA database. After division into training and test cohorts, a three-NRG (TLR3, FASLG, ZBP1) risk model was identified in the training cohort by LASSO regression. Overall survival (OS) was compared between risk groups by Kaplan-Meier analysis, and the comparison was validated in the test cohort. A Cox proportional hazards regression model was used to assess the impact of clinicopathological factors and risk score on survival. The ESTIMATE and CIBERSORT algorithms were used to evaluate the immune microenvironment, and enrichment analysis was conducted to explore biological significance. Correlation analysis was applied between checkpoint gene expression and risk score, and between gene expression and therapeutic response. Gene expression from TCGA was verified with GEO datasets and immunohistochemistry (IHC) analysis. Results: The NRG signature predicted poorer OS in the high-risk group, which was also verified in the test cohort. Risk score also independently predicted survival outcome in KIRC. Significant differences in the immune microenvironment and checkpoint gene expression were found between risk groups, with immune functional enrichment in the high-risk group. Interestingly, therapeutic response was correlated with NRG expression. NRG expression in TCGA was consistent with that in GEO datasets and IHC analysis.
Conclusion: The NRG signature functions as a novel prognostic predictor of immune microenvironment and therapeutic response in KIRC.
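A gene-signature risk score of this kind is commonly computed as a weighted sum of expression values, with patients then split into risk groups. The sketch below assumes that form and a median cutoff; the abstract states neither. The coefficients, patient IDs, and expression values are hypothetical placeholders, not results from the study.

```python
from statistics import median

# Hypothetical coefficients: placeholders, NOT the values fitted in the study.
COEFFICIENTS = {"TLR3": -0.2, "FASLG": 0.3, "ZBP1": 0.1}

def risk_score(expression):
    """Weighted sum of gene expression values using the signature coefficients."""
    return sum(COEFFICIENTS[gene] * expression[gene] for gene in COEFFICIENTS)

def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the cohort median score.

    The median cutoff is an assumed convention, not one stated in the abstract.
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}

# Made-up expression values for four hypothetical patients.
patients = {
    "P1": {"TLR3": 5.0, "FASLG": 1.0, "ZBP1": 2.0},
    "P2": {"TLR3": 2.0, "FASLG": 4.0, "ZBP1": 3.0},
    "P3": {"TLR3": 3.0, "FASLG": 2.0, "ZBP1": 1.0},
    "P4": {"TLR3": 1.0, "FASLG": 3.0, "ZBP1": 4.0},
}
scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = split_by_median(scores)
print(groups)
```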
Research into Satisfaction with Industrial Heritage Renewal Based on the SEM-IPA Model: A Case Study of the Dongguan Jianyuzhou Park
The objective of this study was to establish a satisfaction evaluation model of industrial heritage renewal by evaluating the performance of old industrial buildings after renewal, then judging the adaptability and accuracy of the transformation strategy and putting forward targeted improvement suggestions. The data were collected through a questionnaire survey and analyzed with the Statistical Package for the Social Sciences (SPSS) software using a structural equation model (SEM) and importance–performance analysis (IPA). The results show that (1) history, culture, and placeness have the largest average satisfaction and a positive and significant impact on overall satisfaction; (2) the building regeneration effect has a significant impact on overall satisfaction; and (3) social influence, external environmental perception, and economic benefits have no significant effect on overall satisfaction, indicating the need to strengthen public participation for further research into environmental creation, market-based operation, and land conservation. This case study reflects the distinctive features of internationalization and localization, and provides an indication of local practice in China for world urban renewal and architectural heritage theories.
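Importance–performance analysis is usually read off a four-quadrant grid. The sketch below uses the conventional quadrant names and places the crosshairs at the mean importance and mean performance, which is one common choice and not necessarily the one used in the study. The attribute names are loosely taken from the abstract; the numeric ratings are invented for illustration.

```python
from statistics import mean

def ipa_quadrants(items):
    """Assign each attribute to an IPA quadrant.

    items maps attribute name -> (importance, performance).
    Crosshairs are placed at the mean of each axis (an assumed convention).
    """
    imp_cut = mean(imp for imp, _ in items.values())
    perf_cut = mean(perf for _, perf in items.values())
    result = {}
    for name, (imp, perf) in items.items():
        if imp >= imp_cut:
            result[name] = ("keep up the good work" if perf >= perf_cut
                            else "concentrate here")
        else:
            result[name] = ("possible overkill" if perf >= perf_cut
                            else "low priority")
    return result

# Invented ratings on a 1-5 scale; NOT data from the survey.
ratings = {
    "history and culture": (4.5, 4.4),
    "building regeneration": (4.2, 3.4),
    "economic benefits": (3.1, 3.2),
    "external environment": (3.4, 4.0),
}
quadrants = ipa_quadrants(ratings)
print(quadrants)
```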
Fake Alignment: Are LLMs Really Aligned Well?
The growing awareness of safety concerns in large language models (LLMs) has
sparked considerable interest in the evaluation of safety within current
research endeavors. This study investigates an interesting issue pertaining to
the evaluation of LLMs, namely the substantial discrepancy in performance
between multiple-choice questions and open-ended questions. Inspired by
research on jailbreak attack patterns, we argue this is caused by mismatched
generalization. That is, the LLM does not have a comprehensive understanding of
the complex concept of safety. Instead, it only remembers what to answer for
open-ended safety questions, which makes it unable to solve other forms of
safety tests. We refer to this phenomenon as fake alignment and construct a
comparative benchmark to empirically verify its existence in LLMs. Such fake
alignment renders previous evaluation protocols unreliable. To address this, we
introduce the Fake alIgNment Evaluation (FINE) framework and two novel
metrics--Consistency Score (CS) and Consistent Safety Score (CSS), which
jointly assess two complementary forms of evaluation to quantify fake alignment
and obtain corrected performance estimates. Applying FINE to 14 widely-used
LLMs reveals that several models with purported safety are poorly aligned in
practice. Our work highlights potential limitations in prevailing alignment
methodologies.
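The abstract names the two metrics without giving formulas. One plausible reading is sketched below: CS as the fraction of questions on which the open-ended and multiple-choice verdicts agree, and CSS as the fraction on which both verdicts are safe. These definitions, the function names, and the sample verdicts are assumptions for illustration and may differ from the paper's.

```python
def consistency_score(open_safe, mc_safe):
    """Fraction of questions where both evaluation forms give the same verdict."""
    pairs = list(zip(open_safe, mc_safe))
    return sum(o == m for o, m in pairs) / len(pairs)

def consistent_safety_score(open_safe, mc_safe):
    """Fraction of questions judged safe under both evaluation forms."""
    pairs = list(zip(open_safe, mc_safe))
    return sum(o and m for o, m in pairs) / len(pairs)

# Made-up verdicts for five questions (True = judged safe).
open_ended = [True, True, True, False, True]
multiple_choice = [True, False, True, False, False]

cs = consistency_score(open_ended, multiple_choice)
css = consistent_safety_score(open_ended, multiple_choice)
print(cs, css)
```

Under this reading, a model that looks safe on open-ended questions alone (4 of 5 here) drops to a CSS of 2 of 5 once agreement with the multiple-choice form is required.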
Evaluating Hallucinations in Chinese Large Language Models
In this paper, we establish a benchmark named HalluQA (Chinese Hallucination
Question-Answering) to measure the hallucination phenomenon in Chinese large
language models. HalluQA contains 450 meticulously designed adversarial
questions, spanning multiple domains, and takes into account Chinese historical
culture, customs, and social phenomena. During the construction of HalluQA, we
consider two types of hallucinations: imitative falsehoods and factual errors,
and we construct adversarial samples based on GLM-130B and ChatGPT. For
evaluation, we design an automated evaluation method using GPT-4 to judge
whether a model output is hallucinated. We conduct extensive experiments on 24
large language models, including ERNIE-Bot, Baichuan2, ChatGLM, Qwen, and
SparkDesk. Out of the 24 models, 18 achieved non-hallucination rates lower than
50%. This indicates that HalluQA is highly challenging. We analyze the primary
types of hallucinations in different types of models and their causes.
Additionally, we discuss which types of hallucinations should be prioritized
for different types of models. Comment: Work in progress