    An Algorithm for Matching Heterogeneous Financial Databases: a Case Study for COMPUSTAT/CRSP and I/B/E/S Databases

    Rigorous linking of financial databases is a necessary step for testing trading strategies that incorporate multimodal sources of information. This paper proposes a machine learning solution for matching companies across heterogeneous financial databases. Our method, named Financial Attribute Selection Distance (FASD), has two stages, one for each of the two interrelated tasks commonly involved in heterogeneous database matching problems: schema matching and entity matching. FASD's schema matching procedure is based on the Kullback-Leibler divergence of string and numeric attributes. FASD's entity matching solution relies on learning a company distance that is flexible enough to handle the numeric and string attribute links found by the schema matching algorithm and to incorporate different string matching approaches, such as edit-based and token-based metrics. The parameters of the distance are optimized using the F-score as the cost function. FASD is able to match the joint Compustat/CRSP and Institutional Brokers' Estimate System (I/B/E/S) databases with an F-score over 0.94 using only a hundred manually labeled company links.
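
    The two stages described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes string attributes only, represents each attribute by a smoothed character-frequency distribution, uses Levenshtein distance as the edit-based metric and Jaccard distance as the token-based metric, and combines them with fixed, hand-picked weights. All function names, column names, and records below are hypothetical.

```python
import math
import string
from collections import Counter

ALPHABET = string.ascii_lowercase + " "  # assumed symbol set for string attributes


def char_distribution(values, alphabet=ALPHABET):
    """Laplace-smoothed character frequencies of a column (an assumed representation)."""
    counts = Counter(ch for v in values for ch in v.lower() if ch in alphabet)
    total = sum(counts.values()) + len(alphabet)
    return {ch: (counts[ch] + 1) / total for ch in alphabet}


def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same support."""
    return sum(p[k] * math.log(p[k] / q[k]) for k in p)


def match_schema(table_a, table_b):
    """Stage 1: link each column of table_a to the column of table_b with lowest KL."""
    dists_b = {name: char_distribution(vals) for name, vals in table_b.items()}
    links = {}
    for name_a, vals_a in table_a.items():
        p = char_distribution(vals_a)
        links[name_a] = min(dists_b, key=lambda nb: kl_divergence(p, dists_b[nb]))
    return links


def levenshtein(s, t):
    """Edit-based metric: classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]


def edit_distance_norm(s, t):
    """Levenshtein distance scaled to [0, 1] by the longer string's length."""
    longest = max(len(s), len(t))
    return levenshtein(s, t) / longest if longest else 0.0


def jaccard_distance(s, t):
    """Token-based metric: 1 - |A & B| / |A | B| over whitespace tokens."""
    a, b = set(s.split()), set(t.split())
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def company_distance(rec_a, rec_b, links, weights):
    """Stage 2: weighted sum of edit and token distances over the linked columns."""
    total = 0.0
    for col_a, col_b in links.items():
        w_edit, w_token = weights[col_a]
        x, y = rec_a[col_a].lower(), rec_b[col_b].lower()
        total += w_edit * edit_distance_norm(x, y) + w_token * jaccard_distance(x, y)
    return total


def f_score(predicted, truth):
    """F1 of predicted links against labeled links (the cost used to tune weights)."""
    tp = len(predicted & truth)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(truth)
    return 2 * precision * recall / (precision + recall)


# Made-up toy tables; the column names are illustrative, not the databases' real schemas.
table_a = {"conm": ["Apple Inc", "Microsoft Corp", "Alphabet Inc"],
           "tic": ["AAPL", "MSFT", "GOOGL"]}
table_b = {"ticker": ["MSFT", "AAPL", "GOOG"],
           "cname": ["MICROSOFT CORP", "APPLE INC", "ALPHABET INC"]}

links = match_schema(table_a, table_b)
weights = {"conm": (0.5, 0.5), "tic": (1.0, 0.0)}  # hand-picked, not learned
same = company_distance({"conm": "Apple Inc", "tic": "AAPL"},
                        {"cname": "APPLE INC", "ticker": "AAPL"}, links, weights)
other = company_distance({"conm": "Apple Inc", "tic": "AAPL"},
                         {"cname": "MICROSOFT CORP", "ticker": "MSFT"}, links, weights)
print(links, same, other)
```

    In a setting like the one the abstract describes, the weights and the decision threshold on the distance would be tuned to maximize `f_score` over a small set of manually labeled company links rather than fixed by hand.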