An Algorithm for Matching Heterogeneous Financial Databases: a Case Study for COMPUSTAT/CRSP and I/B/E/S Databases

Authors: Huerta Ramon, Rodriguez-Lujan Irene
Publication venue: 'Redfame Publishing'
Publication date: 01/01/2016

Rigorous linking of financial databases is a necessary step for testing trading strategies that incorporate multimodal sources of information. This paper proposes a machine learning solution for matching companies across heterogeneous financial databases. Our method, named Financial Attribute Selection Distance (FASD), has two stages, one for each of the two interrelated tasks commonly involved in heterogeneous database matching problems: schema matching and entity matching. FASD's schema matching procedure is based on the Kullback-Leibler divergence of string and numeric attributes. FASD's entity matching solution relies on learning a company distance flexible enough to handle the numeric and string attribute links found by the schema matching algorithm and to incorporate different string matching approaches, such as edit-based and token-based metrics. The parameters of the distance are optimized using the F-score as the cost function. FASD is able to match the joint Compustat/CRSP and Institutional Brokers' Estimate System (I/B/E/S) databases with an F-score above 0.94 using only a hundred manually labeled company links.
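
As a rough illustration of the schema matching idea in the abstract, the sketch below links attributes of two tables by comparing the distributions of their values with the Kullback-Leibler divergence. It is not the FASD procedure itself: the character-frequency histograms, the add-one smoothing, the symmetrized divergence, and the column names (`conm`, `gvkey`, `cname`, `code`) are all assumptions made for the example.

```python
import math
import string
from collections import Counter

# Assumed alphabet for the illustration: uppercase letters and digits.
ALPHABET = string.ascii_uppercase + string.digits


def char_distribution(values, alphabet=ALPHABET):
    """Character-frequency distribution of a column, with add-one smoothing."""
    counts = Counter(ch for v in values for ch in v.upper() if ch in alphabet)
    total = sum(counts.values()) + len(alphabet)
    return [(counts[ch] + 1) / total for ch in alphabet]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def match_schema(table_a, table_b):
    """Link each column of table_a to the column of table_b whose value
    distribution is closest under the symmetrized KL divergence."""
    dists_b = {name: char_distribution(vals) for name, vals in table_b.items()}
    links = {}
    for name_a, vals_a in table_a.items():
        p = char_distribution(vals_a)
        links[name_a] = min(
            dists_b,
            key=lambda nb: kl_divergence(p, dists_b[nb]) + kl_divergence(dists_b[nb], p),
        )
    return links


# Hypothetical toy tables: one name-like column and one numeric-code column each.
table_a = {
    "conm": ["APPLE INC", "MICROSOFT CORP", "INTL BUSINESS MACHINES CORP"],
    "gvkey": ["001690", "012141", "006066"],
}
table_b = {
    "cname": ["APPLE INC.", "MICROSOFT CORPORATION", "INTERNATIONAL BUSINESS MACHINES"],
    "code": ["100234", "558712", "904411"],
}

links = match_schema(table_a, table_b)
print(links)
```

In this toy setting the name-like columns are linked to each other and the numeric code columns are linked to each other, because letter-heavy and digit-heavy distributions diverge strongly. A second stage, such as the learned company distance described in the abstract, would then compare records across the linked attributes.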

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Redfame Publishing: E-Journals

Biblos-e Archivo

Entwicklung eines Matching- und Mappingverfahrens zur Verbesserung der XML-Schemaevolution (Development of a Matching and Mapping Method for Improving XML Schema Evolution)

Author: Deffke Jan
Publication date: 01/08/2013

Universität Rostock, Lehrstuhl Datenbank- und Informationssysteme: Dbis Repository