Prompt-engineered Large Language Models (LLMs) have gained widespread adoption across various applications due to their ability to
perform complex tasks without requiring additional training. Despite their impressive performance, there is considerable scope for improvement,
particularly in addressing the limitations of individual models. One promising avenue is the use of ensemble learning strategies, which
combine the strengths of multiple models to enhance overall performance. In this study, we investigate the effectiveness of stacking ensemble
techniques for chat-based LLMs in text classification tasks, with a focus on phishing URL detection. Specifically, we introduce and evaluate
three stacking methods: (1) prompt-based stacking, which uses multiple prompts to generate diverse responses from a single LLM; (2) model-based
stacking, which combines responses from multiple LLMs using a unified prompt; and (3) hybrid stacking, which integrates the first two
approaches by employing multiple prompts across different LLMs to generate responses. For each of these methods, we explore meta-learners
of varying complexities, ranging from Logistic Regression to BERT. Additionally, we investigate the impact of including the input text as
a feature for the meta-learner. Our results demonstrate that stacking ensembles consistently outperform individual models, achieving superior
performance with minimal training and computational overhead.
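To make the stacking pipeline concrete, the following is a minimal sketch of model-based stacking with a logistic-regression meta-learner, assuming the base LLMs' binary phishing votes have already been collected. The example URLs, the llm_votes array, and the character n-gram featurization of the input text are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of model-based stacking for phishing URL detection.
# Base-model outputs are assumed pre-collected; `llm_votes` and the
# example data below are hypothetical placeholders, not the paper's code.
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical base-level predictions: one 0/1 "phishing" vote per LLM
# (rows = URLs, columns = LLMs), plus gold labels for meta-training.
urls = ["http://paypa1-login.example.com/verify", "https://github.com"]
llm_votes = np.array([[1, 1, 0],   # votes from 3 prompted LLMs for URL 0
                      [0, 0, 0]])  # votes for URL 1
labels = np.array([1, 0])          # 1 = phishing, 0 = benign

# Optionally include the input text itself as extra meta-features
# (character n-grams are a common choice for URLs).
vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(2, 4))
text_features = vectorizer.fit_transform(urls)
meta_features = hstack([csr_matrix(llm_votes), text_features])

# Meta-learner: logistic regression stacked on top of the LLM votes.
meta_learner = LogisticRegression().fit(meta_features, labels)
print(meta_learner.predict(meta_features))
```

Prompt-based and hybrid stacking would follow the same pattern, with the vote columns coming from multiple prompts on one LLM, or from multiple prompts across multiple LLMs, respectively; swapping the meta-learner (e.g., for BERT) changes only the final stage.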