Large Language Models are Zero Shot Hypothesis Proposers
Significant scientific discoveries have driven the progress of human
civilisation. The explosion of scientific literature and data has created
information barriers across disciplines that have slowed the pace of scientific
discovery. Large Language Models (LLMs) hold a wealth of global and
interdisciplinary knowledge that promises to break down these information
barriers and foster a new wave of scientific discovery. However, the potential
of LLMs for scientific discovery has not been formally explored. In this paper,
we start by investigating whether LLMs can propose scientific hypotheses. To
this end, we construct a dataset consisting of background knowledge and hypothesis
pairs from biomedical literature. The dataset is divided into training, seen,
and unseen test sets based on the publication date to control visibility. We
subsequently evaluate the hypothesis generation capabilities of various
top-tier instructed models in zero-shot, few-shot, and fine-tuning settings,
including both closed and open-source LLMs. Additionally, we introduce an
LLM-based multi-agent cooperative framework with different role designs and
external tools to enhance the capabilities related to generating hypotheses. We
also design four metrics through a comprehensive review to evaluate the
generated hypotheses for both ChatGPT-based and human evaluations. Through
experiments and analyses, we arrive at the following findings: 1) LLMs
surprisingly generate untrained yet validated hypotheses from the test
literature. 2) Increasing uncertainty facilitates candidate generation,
potentially enhancing zero-shot hypothesis generation capabilities. These
findings strongly support the potential of LLMs as catalysts for new scientific
discoveries and guide further exploration.
Comment: Instruction Workshop @ NeurIPS 202
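The split into training, seen and unseen test sets by publication date can be illustrated with a minimal sketch. It rests on assumptions: the record type `Pair`, the function `split_by_date` and its boundary parameters `train_end` and `seen_end` are hypothetical, and placing `seen_end` at the evaluated model's knowledge cutoff (so that the unseen set postdates its pretraining data) is an assumed reading of "to control visibility", not the paper's documented procedure.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Pair:
    """Hypothetical record: one background-knowledge/hypothesis pair."""
    background: str
    hypothesis: str
    published: date


def split_by_date(pairs, train_end, seen_end):
    """Hypothetical chronological split (boundary dates are assumptions):
    published <  train_end             -> training set
    train_end <= published < seen_end  -> seen test set
    published >= seen_end              -> unseen test set
    """
    train, seen, unseen = [], [], []
    for p in pairs:
        if p.published < train_end:
            train.append(p)
        elif p.published < seen_end:
            seen.append(p)
        else:
            unseen.append(p)
    return train, seen, unseen
```

Because the split is purely chronological, no pair in the unseen test set can have appeared in text published before `seen_end`.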
Robust semi-supervised classification based on data augmented online ELMs with deep features
Abstract
One important strategy in semi-supervised learning is to use the predicted pseudo labels of unlabeled data to reduce the overdependence of supervised learning algorithms on ground-truth labels. However, the performance of such semi-supervised methods relies heavily on the quality of the pseudo labels. To address this issue, a robust semi-supervised classification method named data augmented online extreme learning machines (ELMs) with deep features (DF-DAELM) is proposed. The method first extracts features and infers labels for unlabeled data through self-training. Then, using the learned features and inferred labels, two noise-robust shallow classifiers based on data augmentation (SLI-OELM and CR-OELM) are proposed to eliminate the adverse effects of noise on classifier training. Specifically, SLI-OELM, a data augmentation method inspired by label smoothing, uses stochastic linear interpolation to improve the robustness of ELM-based classifiers. Furthermore, based on the smoothness assumption, CR-OELM uses an ℓ₂-norm consistency regularization term to implicitly weight noisy samples. Comprehensive experiments show that DF-DAELM achieves performance competitive with, or better than, related state-of-the-art methods on CIFAR-10/100 and SVHN. Meanwhile, experiments on the MNIST dataset with different noise levels and sample sizes show the superior performance of the proposed classifiers, especially when the sample size is small (≤ 20K) and the noise is strong (40% to 80%).
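Two ingredients of the abstract are easier to grasp in code: an ELM, whose hidden layer is random and fixed while the output weights come from a single regularized least-squares solve, and augmentation by stochastic linear interpolation of samples and their (soft) labels. The following is a minimal pure-Python sketch under stated assumptions, not DF-DAELM, SLI-OELM or CR-OELM: it uses a batch (not online sequential) ELM with a sigmoid hidden layer, substitutes a mixup-style interpolation with λ ~ Beta(α, α) for the paper's interpolation scheme, and omits the consistency regularization term. All names (`mixup_pairs`, `solve`, `hidden`, `train_elm`, `predict`) and defaults (`alpha=0.2`, `n_hidden=20`, `C=100.0`) are hypothetical.

```python
import math
import random

# Hypothetical sketch: batch regularized ELM + mixup-style interpolation.
# This is NOT the paper's SLI-OELM/CR-OELM, only an illustration of the ideas.


def mixup_pairs(features, labels, alpha=0.2, rng=None):
    """Mixup-style stochastic linear interpolation of (feature, soft-label) pairs.

    Each sample i is blended with a random partner j using lam ~ Beta(alpha, alpha):
    x' = lam * x_i + (1 - lam) * x_j and y' = lam * y_i + (1 - lam) * y_j.
    """
    rng = rng or random.Random()
    n = len(features)
    mixed_x, mixed_y = [], []
    for i in range(n):
        j = rng.randrange(n)
        lam = rng.betavariate(alpha, alpha)
        mixed_x.append([lam * a + (1 - lam) * b for a, b in zip(features[i], features[j])])
        mixed_y.append([lam * a + (1 - lam) * b for a, b in zip(labels[i], labels[j])])
    return mixed_x, mixed_y


def solve(A, B):
    """Solve A X = B (A square) by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(a) + list(b) for a, b in zip(A, B)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def hidden(x, W, b):
    """Random sigmoid hidden layer; W and b are sampled once and never trained."""
    return [1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + bj)))
            for w, bj in zip(W, b)]


def train_elm(X, T, n_hidden=20, C=100.0, seed=0):
    """Batch regularized ELM: beta = (H^T H + I / C)^-1 H^T T."""
    rng = random.Random(seed)
    W = [[rng.uniform(-1, 1) for _ in X[0]] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    H = [hidden(x, W, b) for x in X]
    k = len(T[0])
    HtH = [[sum(h[p] * h[q] for h in H) + (1.0 / C if p == q else 0.0)
            for q in range(n_hidden)] for p in range(n_hidden)]
    HtT = [[sum(h[p] * t[c] for h, t in zip(H, T)) for c in range(k)]
           for p in range(n_hidden)]
    return W, b, solve(HtH, HtT)


def predict(model, x):
    """Return the index of the highest output score."""
    W, b, beta = model
    h = hidden(x, W, b)
    scores = [sum(hp * row[c] for hp, row in zip(h, beta)) for c in range(len(beta[0]))]
    return max(range(len(scores)), key=scores.__getitem__)
```

Training on the union of original and interpolated pairs, e.g. `train_elm(X + mx, T + my)`, is one simple way to use the soft labels; since the output weights have a closed-form solution, refitting after augmentation costs only one linear solve.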
Synergistic organic dye degradation and hydrogen production using Bi2Te3/Te/C single-catalyst nanowires
Over-consumption of limited fossil fuels has caused serious environmental pollution and a global energy crisis, threatening human life and biodiversity. As an ideal, environmentally friendly renewable energy source, hydrogen can meet human demand for clean energy. Investigating whether hydrogen can be generated catalytically during wastewater treatment is therefore highly worthwhile. Herein, Bi2Te3/Te/C heterojunction nanowires with a high specific surface area and a rich pore structure were synthesized. The efficient catalytic degradation process is accompanied by hydrogen generation. Methylene blue and methyl orange were catalytically degraded in less than 20 s and 150 s, respectively. Meanwhile, in scaled-up degradation/hydrogen production experiments, fast and efficient H2 production from NaBH4 was achieved in the presence of the Bi2Te3/Te/C nanowires. The efficient synergistic dye degradation and hydrogen production are attributed to efficient carrier transfer and accumulation at the heterointerface. Unlike previous work, rapid degradation of organic dyes and hydrogen production by decomposition of NaBH4 were achieved without high-cost catalysts such as precious metals. This work could provide an alternative pathway for degrading organic matter in wastewater through synergistic heterogeneous catalysis while recovering by-products such as hydrogen.
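For a sense of scale for H2 production from NaBH4, the theoretical yield can be estimated from the textbook hydrolysis reaction NaBH4 + 2 H2O → NaBO2 + 4 H2. This sketch assumes complete hydrolysis and ideal-gas behaviour, so it gives an upper bound that ignores any NaBH4 consumed by side reactions; it is not a calculation reported in the paper, and the helper `theoretical_h2` is hypothetical.

```python
# Hypothetical helper: theoretical H2 yield assuming complete hydrolysis
# NaBH4 + 2 H2O -> NaBO2 + 4 H2 and ideal-gas behaviour (an upper bound).

# Standard atomic masses in g/mol
M_NA, M_B, M_H = 22.990, 10.81, 1.008
M_NABH4 = M_NA + M_B + 4 * M_H  # about 37.83 g/mol
R_L_ATM = 0.082057  # gas constant in L*atm/(mol*K)


def theoretical_h2(mass_nabh4_g, temp_k=273.15, pressure_atm=1.0):
    """Return (mol H2, litres H2) obtainable from mass_nabh4_g grams of NaBH4."""
    mol_nabh4 = mass_nabh4_g / M_NABH4
    mol_h2 = 4 * mol_nabh4  # 4 mol H2 per mol NaBH4
    volume_l = mol_h2 * R_L_ATM * temp_k / pressure_atm  # V = nRT / P
    return mol_h2, volume_l
```

For example, 0.1 g of NaBH4 corresponds to at most roughly 0.0106 mol of H2, about 0.24 L at 0 °C and 1 atm.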
The Sixth Visual Object Tracking VOT2018 Challenge Results
The Visual Object Tracking challenge VOT2018 is the sixth annual tracker benchmarking activity organized by the VOT initiative. Results of over eighty trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The evaluation included the standard VOT and other popular methodologies for short-term tracking analysis, as well as a “real-time” experiment simulating a situation where a tracker processes images as if provided by a continuously running sensor. A long-term tracking sub-challenge has been introduced to the set of standard VOT sub-challenges. The new sub-challenge focuses on long-term tracking properties, namely coping with target disappearance and reappearance. A new dataset has been compiled, and a performance evaluation methodology that focuses on long-term tracking capabilities has been adopted. The VOT toolkit has been updated to support both the standard short-term and the new long-term tracking sub-challenges. The performance of the tested trackers typically far exceeds standard baselines. The source code for most of the trackers is publicly available from the VOT page. The dataset, the evaluation kit and the results are publicly available at the challenge website (http://votchallenge.net).
Funding agencies: Slovenian Research Agency [P2-0214, P2-0094, J2-8175]; Czech Science Foundation [GACR P103/12/G084]; WASP; VR (EMC2); SSF (SymbiCloud); SNIC; AIT Strategic Research Programme 2017 Visua
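Region overlap (intersection over union) is the basic quantity behind tracker accuracy measures, and a long-term evaluation must also score frames where the tracker reports the target as absent or the target is not visible. The sketch below illustrates both under simplifying assumptions: axis-aligned (x, y, width, height) boxes, and precision, recall and F-score at a single confidence threshold, with `None` marking an absent prediction or an invisible target. The exact measures are defined in the challenge report and implemented in the VOT toolkit; `iou` and `long_term_scores` are hypothetical helpers, not toolkit functions.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical helpers; assumes axis-aligned boxes (x, y, width, height).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def long_term_scores(preds: Sequence[Optional[Box]], gts: Sequence[Optional[Box]]):
    """Precision, recall and F-score at one confidence threshold (simplified).

    None in preds: the tracker reports the target absent.
    None in gts: the target is not visible in that frame.
    Frames where only one side is None contribute zero overlap.
    """
    overlaps = [iou(p, g) if p is not None and g is not None else 0.0
                for p, g in zip(preds, gts)]
    n_pred = sum(p is not None for p in preds)  # frames with a reported box
    n_gt = sum(g is not None for g in gts)      # frames with a visible target
    precision = sum(overlaps) / n_pred if n_pred else 0.0
    recall = sum(overlaps) / n_gt if n_gt else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```

For example, a tracker that matches the target perfectly in one frame, reports nothing in a frame where the target is visible, and reports a box in a frame where the target is absent scores 0.5 precision, 0.5 recall and 0.5 F-score.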