Concept Design and Analysis of a Novel Steamer-Filling Robot
Steamer-filling is a critically important step in the liquor-making process, directly affecting both liquor yield and liquor quality. To date, however, this step is still dominated by manual operation. In view of the working environment and labor shortages in this industry, a novel dedicated steamer-filling robot is proposed in this paper. First, the steamer-filling operation is described, and the structural composition and functions of the robot are introduced in detail. Second, the kinematics of the robot, in terms of position analysis and workspace, is analyzed in detail. Third, experimental analyses are carried out to verify the validity and efficiency of the robot system. Finally, conclusions and future development directions are presented.
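As a minimal illustration of the kind of workspace analysis the abstract refers to, the sketch below estimates the reachable workspace of a hypothetical planar 2-DOF arm by Monte Carlo sampling of admissible joint angles. The abstract does not give the robot's kinematic parameters, so the link lengths and joint limits here are illustrative assumptions, not values from the paper.

import numpy as np

# Hypothetical planar 2-DOF arm (link lengths and joint limits are
# illustrative assumptions, not parameters from the paper).
L1, L2 = 0.8, 0.6             # link lengths in meters
Q1 = (-np.pi / 2, np.pi / 2)  # joint 1 limits (rad)
Q2 = (-np.pi, np.pi)          # joint 2 limits (rad)

def forward_kinematics(q1, q2):
    """End-effector position for joint angles (q1, q2)."""
    x = L1 * np.cos(q1) + L2 * np.cos(q1 + q2)
    y = L1 * np.sin(q1) + L2 * np.sin(q1 + q2)
    return x, y

# Monte Carlo workspace estimate: sample admissible joint angles
# and collect the reachable end-effector positions.
rng = np.random.default_rng(0)
q1 = rng.uniform(*Q1, size=100_000)
q2 = rng.uniform(*Q2, size=100_000)
xs, ys = forward_kinematics(q1, q2)
print(f"x range: [{xs.min():.2f}, {xs.max():.2f}] m")
print(f"y range: [{ys.min():.2f}, {ys.max():.2f}] m")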
Estimating Early Fundraising Performance of Innovations via Graph-based Market Environment Model
Well begun is half done. In the crowdfunding market, the early fundraising performance of a project is a major concern for both creators and platforms. However, estimating early fundraising performance before a project is published is very challenging and still under-explored. To that end, in this paper we present a focused study of this important problem from a market-modeling view. Specifically, we propose a Graph-based Market Environment model (GME) that estimates the early fundraising performance of a target project by exploiting the market environment. In addition, we discriminatively model market competition and market evolution by designing two graph-based neural network architectures and incorporating them into a joint optimization stage. Finally, we conduct extensive experiments on real-world crowdfunding data collected from Indiegogo.com. The experimental results clearly demonstrate the effectiveness of our proposed model for modeling and estimating the early fundraising performance of the target project.
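The abstract does not detail the two graph-based architectures, so the following is only a minimal sketch of the general idea under stated assumptions: one message-passing layer over a competition graph of projects, feeding a regressor that scores early fundraising performance. All layer sizes, names, and the random inputs are illustrative, not the paper's GME.

import torch
import torch.nn as nn

class MarketGNN(nn.Module):
    """Toy graph-based market model (illustrative, not the paper's GME).

    Projects are nodes; adj is a normalized adjacency matrix of the
    competition graph. One message-passing layer mixes each project's
    features with its competitors', then a regressor predicts early
    fundraising performance.
    """

    def __init__(self, in_dim: int, hidden: int = 64):
        super().__init__()
        self.msg = nn.Linear(in_dim, hidden)        # competitor messages
        self.self_loop = nn.Linear(in_dim, hidden)  # project's own features
        self.readout = nn.Sequential(nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        h = self.self_loop(x) + adj @ self.msg(x)   # aggregate neighbors
        return self.readout(h).squeeze(-1)          # one score per project

# Usage on a tiny random market of 5 projects with 16 features each.
x = torch.randn(5, 16)
adj = torch.softmax(torch.randn(5, 5), dim=-1)  # stand-in normalized graph
model = MarketGNN(in_dim=16)
print(model(x, adj))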
KMF: Knowledge-Aware Multi-Faceted Representation Learning for Zero-Shot Node Classification
Recently, Zero-Shot Node Classification (ZNC) has emerged as a crucial task in graph data analysis. The task is to predict the labels of nodes from unseen classes, i.e., classes that are unobserved during training. Existing work mainly uses Graph Neural Networks (GNNs) to associate feature prototypes with label semantics, thus enabling knowledge transfer from seen to unseen classes. However, previous work has neglected the multi-faceted semantic orientation of the feature-semantic alignment: the content of a node usually covers diverse topics that are relevant to the semantics of multiple labels. It is necessary to separate and assess the semantic factors that strongly affect this alignment in order to improve the generality of such models. To this end, we propose a Knowledge-Aware Multi-Faceted framework (KMF) that enriches label semantics via topics extracted from a Knowledge Graph (KG). The content of each node is then reconstructed into a topic-level representation that offers multi-faceted, fine-grained semantic relevance to different labels. Because of the particularity of graph instance (i.e., node) representations, a novel geometric constraint is developed to alleviate the prototype drift caused by node information aggregation. Finally, we conduct extensive experiments on several public graph datasets and design an application to zero-shot cross-domain recommendation. The quantitative results demonstrate both the effectiveness and the generalization of KMF in comparison with state-of-the-art baselines.
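KMF's topic extraction and geometric constraint are not specified in the abstract, so this sketch only illustrates the generic prototype-matching step that the abstract attributes to existing ZNC work: node embeddings produced by some GNN are compared against class prototypes derived from label semantics, and unseen classes are predicted by nearest-prototype matching. All shapes and values are assumptions.

import numpy as np

def l2_normalize(v, axis=-1):
    return v / (np.linalg.norm(v, axis=axis, keepdims=True) + 1e-12)

# Toy setup (illustrative assumptions): 4 nodes embedded by some GNN in
# an 8-dim space, and 3 classes described by label-semantic vectors in
# the same space (e.g., projected embeddings of the label names).
rng = np.random.default_rng(0)
node_emb = l2_normalize(rng.normal(size=(4, 8)))      # GNN outputs
class_protos = l2_normalize(rng.normal(size=(3, 8)))  # label semantics

# Zero-shot prediction: cosine similarity to every class prototype,
# including classes never seen during training.
scores = node_emb @ class_protos.T  # (4 nodes, 3 classes)
pred = scores.argmax(axis=1)
print("predicted classes:", pred)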
A Survey on Multimodal Large Language Models
Multimodal Large Language Models (MLLMs) have recently become a rising research hotspot; they use powerful Large Language Models (LLMs) as a brain to perform multimodal tasks. The surprising emergent capabilities of MLLMs, such as writing stories based on images and OCR-free math reasoning, are rare in traditional methods, suggesting a potential path to artificial general intelligence. In this paper, we aim to trace and summarize the recent progress of MLLMs. First, we present the formulation of MLLMs and delineate related concepts. Then, we discuss the key techniques and applications, including Multimodal Instruction Tuning (M-IT), Multimodal In-Context Learning (M-ICL), Multimodal Chain of Thought (M-CoT), and LLM-Aided Visual Reasoning (LAVR). Finally, we discuss existing challenges and point out promising research directions. Because the era of MLLMs has only just begun, we will keep updating this survey and hope it can inspire more research. An associated GitHub repository collecting the latest papers is available at https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models.
Comment: Project page: https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models
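The survey names M-ICL among its key techniques; as a rough illustration of what multimodal in-context learning looks like in practice, the sketch below builds a prompt that interleaves a few image-answer demonstrations before a query image. The message structure and field names are an illustrative convention, not an API from the survey.

# Minimal sketch of a multimodal in-context prompt (M-ICL): a few
# (image, answer) demonstrations interleaved before the query image.
# The dict structure is an illustrative convention, not a fixed API.
def build_micl_prompt(demos, query_image, instruction):
    messages = [{"type": "text", "text": instruction}]
    for image, answer in demos:
        messages.append({"type": "image", "image": image})
        messages.append({"type": "text", "text": f"Answer: {answer}"})
    messages.append({"type": "image", "image": query_image})
    messages.append({"type": "text", "text": "Answer:"})
    return messages

prompt = build_micl_prompt(
    demos=[("cat.jpg", "a cat"), ("dog.jpg", "a dog")],
    query_image="bird.jpg",
    instruction="Name the main object in each image.",
)
print(prompt)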
Deep Technology Tracing for High-tech Companies
Technological change and innovation are vitally important, especially for high-tech companies. However, the factors influencing future research and development (R&D) trends are complicated and varied, which makes technology tracing for high-tech companies quite difficult. To this end, in this paper we develop a novel data-driven solution, the Deep Technology Forecasting (DTF) framework, to automatically find the most likely technology directions customized to each high-tech company. Specifically, DTF consists of three components: Potential Competitor Recognition (PCR), Collaborative Technology Recognition (CTR), and a Deep Technology Tracing (DTT) neural network. PCR and CTR capture competitive relations among enterprises and collaborative relations among technologies, respectively, while DTT models the dynamic interactions between companies and technologies with these relations involved. Finally, we evaluate the DTF framework on real-world patent data, and the experimental results clearly show that DTF can precisely anticipate the future technology emphases of companies by exploiting hybrid factors.
Comment: 6 pages, 7 figures
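The abstract names DTF's components without describing them, so the sketch below shows one plausible reading of the two recognition steps on toy data: a PCR-style step that flags companies with strongly overlapping patent-class portfolios as potential competitors, and a CTR-style step that links technologies co-occurring within a portfolio. The data layout and threshold are assumptions, not the paper's method.

from itertools import combinations

# Toy patent data (illustrative): each company lists the technology
# classes appearing in its recent patents.
portfolios = {
    "CompanyA": {"H04L", "G06F", "G06N"},
    "CompanyB": {"G06F", "G06N", "H01L"},
    "CompanyC": {"A61K", "C07D"},
}

def jaccard(a, b):
    return len(a & b) / len(a | b)

# PCR-style step (one plausible reading): companies whose patent-class
# portfolios overlap strongly are treated as potential competitors.
for (ca, pa), (cb, pb) in combinations(portfolios.items(), 2):
    sim = jaccard(pa, pb)
    if sim > 0.3:  # assumed threshold
        print(f"potential competitors: {ca} - {cb} (overlap={sim:.2f})")

# CTR-style step: technologies co-occurring in a portfolio are linked.
cooc = {}
for classes in portfolios.values():
    for t1, t2 in combinations(sorted(classes), 2):
        cooc[(t1, t2)] = cooc.get((t1, t2), 0) + 1
print("technology co-occurrence:", cooc)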
Woodpecker: Hallucination Correction for Multimodal Large Language Models
Hallucination is a big shadow hanging over the rapidly evolving Multimodal Large Language Models (MLLMs); it refers to the phenomenon of generated text that is inconsistent with the image content. To mitigate hallucinations, existing studies mainly resort to instruction tuning, which requires retraining the models on specific data. In this paper, we pave a different way, introducing a training-free method named Woodpecker. Like a woodpecker that heals trees, it picks out and corrects hallucinations in the generated text. Concretely, Woodpecker consists of five stages: key concept extraction, question formulation, visual knowledge validation, visual claim generation, and hallucination correction. Implemented in a post-remedy manner, Woodpecker can easily serve different MLLMs while remaining interpretable, since the intermediate outputs of the five stages can be inspected. We evaluate Woodpecker both quantitatively and qualitatively and show the great potential of this new paradigm. On the POPE benchmark, our method obtains a 30.66%/24.33% improvement in accuracy over the baseline MiniGPT-4/mPLUG-Owl. The source code is released at https://github.com/BradyFU/Woodpecker.
Comment: 16 pages, 7 figures. Code website: https://github.com/BradyFU/Woodpecker
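The abstract lists Woodpecker's five stages but not their implementations, so this sketch shows only the post-remedy control flow as a pipeline of placeholder functions; each stage body is a hypothetical stand-in, not the paper's actual logic.

# Sketch of Woodpecker's five-stage, training-free control flow.
# The stage bodies are hypothetical placeholders; only the ordering
# (post-hoc correction of already-generated text) follows the abstract.

def extract_key_concepts(text):
    return [w for w in text.split() if w.istitle()]  # placeholder

def formulate_questions(concepts):
    return [f"Is there a {c} in the image?" for c in concepts]

def validate_visual_knowledge(image, questions):
    # Placeholder: a real system would query detectors/VQA models here.
    return {q: True for q in questions}

def generate_visual_claims(evidence):
    return [q for q, supported in evidence.items() if supported]

def correct_hallucinations(text, claims):
    # Placeholder: a real system would rewrite unsupported statements.
    return text  # unchanged when every claim is supported

def woodpecker_pipeline(image, generated_text):
    concepts = extract_key_concepts(generated_text)
    questions = formulate_questions(concepts)
    evidence = validate_visual_knowledge(image, questions)
    claims = generate_visual_claims(evidence)
    return correct_hallucinations(generated_text, claims)

print(woodpecker_pipeline("img.jpg", "A Dog sits next to a Ball."))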
Efficiently Measuring the Cognitive Ability of LLMs: An Adaptive Testing Perspective
Large language models (LLMs), like ChatGPT, have shown some human-like cognitive abilities. To compare the abilities of different models, benchmarks (i.e., sets of standard test questions) from different fields (e.g., literature, biology, and psychology) are often adopted, and results under traditional metrics such as accuracy, recall, and F1 are reported. However, from a cognitive-science perspective, this way of evaluating LLMs can be inefficient and inaccurate. Inspired by Computerized Adaptive Testing (CAT) as used in psychometrics, we propose an adaptive testing framework for LLM evaluation. Rather than using a standard test set and simply reporting accuracy, this approach dynamically adjusts the characteristics of the test questions, such as difficulty, based on the model's performance. This allows a more accurate estimate of a model's abilities from fewer questions. More importantly, it allows LLMs to be compared with humans easily, which is essential for NLP models that aim for human-level ability. Our diagnostic reports find that ChatGPT often behaves like a "careless student", prone to slips and occasional guessing. We conduct a fine-grained diagnosis and rank six of the latest instruction-tuned LLMs on three aspects, Subject Knowledge, Mathematical Reasoning, and Programming, where GPT4 significantly outperforms the other models and reaches the cognitive ability of middle-level students. Different tests for different models, via efficient adaptive testing: we believe this has the potential to become a new norm in evaluating large language models.
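The abstract does not spell out the item-selection rule, so below is a minimal CAT sketch under a standard two-parameter logistic (2PL) IRT model, the textbook setup that CAT builds on: repeatedly pick the unasked question with maximum Fisher information at the current ability estimate, observe the (here simulated) response, and re-estimate ability by Newton's method. The item bank and all numbers are illustrative assumptions, not the paper's configuration.

import numpy as np

rng = np.random.default_rng(0)

# Toy item bank under a 2PL IRT model (illustrative values):
# discrimination a_i, difficulty b_i,
# P(correct | theta) = sigmoid(a_i * (theta - b_i)).
a = rng.uniform(0.8, 2.0, size=30)
b = rng.normal(0.0, 1.0, size=30)
true_theta = 0.7  # the examinee's (e.g., an LLM's) latent ability

def p_correct(theta, i):
    return 1.0 / (1.0 + np.exp(-a[i] * (theta - b[i])))

theta, asked, responses = 0.0, [], []
for _ in range(10):
    # Adaptive item selection: maximize Fisher information
    # I_i(theta) = a_i^2 * P_i * (1 - P_i) over unasked items.
    info = [a[i] ** 2 * p_correct(theta, i) * (1 - p_correct(theta, i))
            if i not in asked else -np.inf
            for i in range(30)]
    i = int(np.argmax(info))
    asked.append(i)
    responses.append(float(rng.random() < p_correct(true_theta, i)))

    # Re-estimate ability by Newton steps on the 2PL log-likelihood,
    # clipped to a bounded range to keep early estimates stable.
    for _ in range(10):
        P = np.array([p_correct(theta, j) for j in asked])
        y = np.array(responses)
        grad = np.sum(a[asked] * (y - P))
        hess = -np.sum(a[asked] ** 2 * P * (1 - P))
        theta = float(np.clip(theta - grad / hess, -4.0, 4.0))

print(f"estimated ability: {theta:.2f} (true ability: {true_theta})")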