SEALM: Semantically Enriched Attributes with Language Models for Linkage Recommendation

Abstract

International Conference on Enterprise Information Systems (ICEIS 2025), Porto, Portugal, April 4-6, 2025

Matching attributes from different repositories is an important step in schema integration, which consolidates heterogeneous data silos. Recommending linkages between relevant attributes requires a contextually rich representation of each attribute, particularly when more than two database schemas are to be integrated. This paper introduces the SEALM approach, which generates a data catalog of semantically rich attribute descriptions using Generative Language Models, based on a new technique that employs six variations of available metadata information. Instead of using raw attribute metadata, we generate SEALM descriptions, which are used to recommend linkages with an unsupervised matching pipeline that involves a novel multi-source Blocking algorithm. Experiments on multiple schemas yield a 5% to 20% recall improvement in recommending linkages with SEALM-based attribute descriptions generated by the smallest Llama3.1:8B model compared to existing techniques. With SEALM, we only need to process the small fraction of attributes to be integrated rather than exhaustively inspecting all combinations of potential linkages.

Leonard Traeger was partially supported by a Technology Catalyst Fund TCF24KAR11131049602 by UMBC and a grant project PLan CV (reference number 03FHP109) by the German Federal Ministry of Education and Research (BMBF) and Joint Science Conference (GWK).
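
The idea of recommending linkages from attribute descriptions across more than two schemas, without comparing every pair, can be illustrated with a small sketch. This is not the SEALM pipeline or its multi-source Blocking algorithm, neither of which is specified in the abstract. It swaps in plain token blocking with Jaccard similarity, and the catalog entries, the `recommend_linkages` function, the stopword list, and the `min_score` threshold are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "of", "a", "an", "for", "in", "to", "and"}


def tokenize(description):
    """Lowercase a description and split it into a set of non-stopword tokens."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in description.lower())
    return {word for word in cleaned.split() if word not in STOPWORDS}


def recommend_linkages(catalog, min_score=0.25):
    """Recommend cross-schema attribute pairs via token blocking.

    catalog maps (schema, attribute) -> textual description.
    Only attributes that share at least one token (a "block") and come
    from different schemas are compared, so not every pair is inspected.
    """
    tokens = {key: tokenize(text) for key, text in catalog.items()}

    # Build blocks: token -> set of attributes whose description contains it.
    blocks = defaultdict(set)
    for key, toks in tokens.items():
        for tok in toks:
            blocks[tok].add(key)

    # Collect candidate pairs that co-occur in a block and span two schemas.
    pairs = set()
    for members in blocks.values():
        for left, right in combinations(sorted(members), 2):
            if left[0] != right[0]:
                pairs.add((left, right))

    # Score candidates with Jaccard similarity of their token sets.
    scored = []
    for left, right in pairs:
        a, b = tokens[left], tokens[right]
        score = len(a & b) / len(a | b)
        if score >= min_score:
            scored.append((left, right, score))
    return sorted(scored, key=lambda item: (-item[2], item[0], item[1]))


if __name__ == "__main__":
    # Made-up descriptions standing in for generated attribute descriptions.
    catalog = {
        ("crm", "cust_nm"): "Full name of the customer",
        ("erp", "client_name"): "Name of the customer placing an order",
        ("shop", "order_ts"): "Timestamp when the order was created",
        ("erp", "ord_date"): "Date the order was created",
    }
    for left, right, score in recommend_linkages(catalog):
        print(left, right, round(score, 2))
```

In this toy catalog the pair `erp.client_name` / `shop.order_ts` shares only the token "order" and falls below the threshold, while attributes from the same schema are never compared with each other.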


MD-SOAR Maryland Shared Open Access Repository
