3 research outputs found

    Automated Keyword Generation in the Public Radio Sector Using Word Embeddings

    Public broadcasters find themselves in a difficult situation when it comes to digital offerings. A growing number of use cases require metadata, e.g. to allow radio editors to search for content pieces, to set up content-based recommendation services, to allow users to browse by categories or tags, or to optimize content for search engines. Broadcasters need proper metadata to manage digital products and to offer new and timely services. Public broadcasters often have their own taxonomy of keywords at hand. However, manually distilling metadata, in particular in the form of keywords, may become an operational bottleneck, whereas automatic keyword generation does not always provide the desired accuracy and requires continuous human effort for training classifiers and monitoring their accuracy. Building upon more recent word embedding techniques, we present a novel approach that assigns keywords from a taxonomy to documents on the basis of distributed representations of words and documents and does not require annotation by human experts. We evaluate it on a large dataset from a German nationwide broadcaster. Preliminary results suggest that keywords can be generated automatically in an unsupervised way in the public radio sector.
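
    The abstract above describes assigning taxonomy keywords to documents via distributed representations, without labeled training data. One common way to realize that idea, shown here only as a minimal sketch and not as the authors' method, is to average word vectors into a document vector and rank taxonomy keywords by cosine similarity. The toy vectors in `EMBEDDINGS`, the function names, and the two-entry taxonomy are all hypothetical; a real system would load embeddings trained on a large corpus.

```python
import math

# Toy 3-dimensional word vectors with hypothetical values, for illustration only.
EMBEDDINGS = {
    "election": [0.9, 0.1, 0.0],
    "parliament": [0.8, 0.2, 0.1],
    "vote": [0.85, 0.15, 0.05],
    "football": [0.1, 0.9, 0.0],
    "goal": [0.2, 0.8, 0.1],
    "politics": [0.9, 0.1, 0.1],
    "sports": [0.1, 0.9, 0.1],
}


def mean_vector(tokens):
    """Average the vectors of all known tokens; None if no token is known."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_keywords(text, taxonomy):
    """Rank taxonomy keywords by similarity to the document vector."""
    doc_vec = mean_vector(text.lower().split())
    if doc_vec is None:
        return []
    scored = []
    for keyword in taxonomy:
        kw_vec = mean_vector(keyword.lower().split())
        if kw_vec is not None:
            scored.append((keyword, cosine(doc_vec, kw_vec)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Usage: the political document should rank "politics" above "sports".
ranking = rank_keywords("parliament election vote", ["politics", "sports"])
```

    No classifier is trained in this sketch: the only inputs are the embeddings and the taxonomy, which is what makes the approach unsupervised. How many top-ranked keywords to keep, or which similarity threshold to apply, is left open here.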

    Designing Radio in a Personalized World

    Radio broadcasting is currently undergoing major changes. Radio broadcasting agencies are experiencing a shift from the classic linear stream to non-linear, personalized playouts on smart devices. Many broadcasting agencies are experimenting with ways to innovate their offerings, but designing personalized radio is not straightforward. We contribute a design science artefact that considers requirements from different stakeholders (listeners, broadcasting agencies and the public) and present design requirements and design principles with a corresponding architecture for personalized radio.

    Improving Recall and Precision in Unsupervised Multi-Label Document Classification Tasks by Combining Word Embeddings with TF-IDF

    Multi-label document classification is a common task and has become increasingly important for current business needs. However, generating keywords is not easy because, in addition to methodological challenges, labeled training data for supervised classification is not always available in the desired amount or quality. Therefore, methods that do not require labeled training data (e.g., unsupervised learning or statistical approaches) are valuable in practice. As none of these approaches alone provides optimal results in terms of recall and precision, we show that it is worth examining existing approaches for complementary strengths in order to combine them. We found such complementary strengths for an unsupervised word embedding method and the term frequency–inverse document frequency (TF-IDF) method and propose a combined approach. For evaluation, we test the combined approach on a data set from a public broadcaster in Germany and show that recall and precision can be significantly improved.
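
    The abstract does not state how the two methods are combined. As a hedged illustration only, the sketch below computes a smoothed TF-IDF score for each taxonomy keyword that occurs literally in a document and blends it with an embedding-based similarity score through a weighted sum. The weight `alpha`, the max-normalization (which assumes non-negative scores), the toy corpus, and the values in `embedding_scores` are all assumptions made for this example, not details from the paper.

```python
import math
from collections import Counter


def tfidf_scores(doc_tokens, corpus, taxonomy):
    """TF-IDF of each taxonomy keyword in doc_tokens, relative to corpus."""
    counts = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for keyword in taxonomy:
        tf = counts[keyword] / len(doc_tokens) if doc_tokens else 0.0
        df = sum(1 for doc in corpus if keyword in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF
        scores[keyword] = tf * idf
    return scores


def normalize(scores):
    """Scale scores so the largest becomes 1.0 (assumes non-negative scores)."""
    top = max(scores.values(), default=0.0)
    return {k: (v / top if top > 0 else 0.0) for k, v in scores.items()}


def combine(embedding_scores, tfidf, alpha=0.5):
    """Weighted sum of the two normalized score sets (alpha is an assumption)."""
    emb = normalize(embedding_scores)
    tfi = normalize(tfidf)
    keys = emb.keys() | tfi.keys()
    return {k: alpha * emb.get(k, 0.0) + (1 - alpha) * tfi.get(k, 0.0) for k in keys}


# Toy tokenized corpus and taxonomy, hypothetical and for illustration only.
corpus = [
    ["election", "vote", "parliament", "politics"],
    ["football", "goal", "match"],
    ["budget", "parliament", "debate"],
]
taxonomy = ["politics", "sports", "economy"]

# Hypothetical embedding similarities for the first document.
embedding_scores = {"politics": 0.95, "sports": 0.20, "economy": 0.60}

tfidf = tfidf_scores(corpus[0], corpus, taxonomy)
combined = combine(embedding_scores, tfidf)
best = max(combined, key=combined.get)
```

    In this sketch TF-IDF only rewards keywords that appear verbatim in the text, while the embedding score can still rank related keywords that never occur, which is one plausible reading of the complementary strengths the abstract mentions.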