E-commerce applications such as faceted product search and product comparison rely on structured product data in the form of attribute/value pairs. Many vendors on e-commerce platforms, however, do not provide structured product descriptions but instead describe their offers in free-text titles and descriptions. To process such offers, it is necessary to extract attribute/value pairs from these textual product descriptions.
State-of-the-art attribute/value extraction techniques rely on pre-trained
language models (PLMs), such as BERT. Two major drawbacks of these models for
attribute/value extraction are that (i) the models require significant amounts
of task-specific training data and (ii) the fine-tuned models face challenges
in generalizing to attribute values not included in the training data. This
paper explores the potential of large language models (LLMs) as a training
data-efficient and robust alternative to PLM-based attribute/value extraction
methods. We consider hosted LLMs, such as GPT-3.5 and GPT-4, as well as
open-source LLMs based on Llama2. We evaluate the models in a zero-shot
scenario and in a scenario where task-specific training data is available. In
the zero-shot scenario, we compare various prompt designs for representing
information about the target attributes of the extraction. In the scenario with
training data, we investigate (i) the provision of example attribute values,
(ii) the selection of in-context demonstrations, and (iii) the fine-tuning of
GPT-3.5. Our experiments show that GPT-4 achieves an average F1-score of 85% on
the two evaluation datasets while the best PLM-based techniques perform on
average 5% worse using the same amount of training data. GPT-4 achieves a 10%
higher F1-score than the best open-source LLM. The fine-tuned GPT-3.5 model
reaches a performance similar to that of GPT-4 while being significantly more
cost-efficient.