
    Discovering Synonyms and Other Related Words

    Discovering synonyms and other related words among the words in a document collection can be seen as a clustering problem, where we expect the words in a cluster to be closely related to one another. The intuition is that words occurring in similar contexts tend to convey similar meanings. We introduce a way to use translation dictionaries for several languages to evaluate the rate of synonymy found in the word clusters. We also apply the information radius to calculate similarities between words using a full dependency syntactic feature space, and introduce a method for recalculating similarities during clustering as a fast approximation of the high-dimensional feature space. Finally, we show that 69–79% of the words in the clusters we discover are useful for thesaurus construction.

    Peer reviewed
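    The information radius mentioned in the abstract is a symmetric, always-finite relative of the Kullback-Leibler divergence: IRad(p, q) = D(p || m) + D(q || m), where m = (p + q) / 2. The following is a minimal Python sketch, assuming each word is represented by raw counts of its dependency features. The feature labels and counts below are invented for illustration and are not taken from the paper, whose exact normalization and feature extraction may differ. Lower values mean more similar contexts.

```python
import math
from collections import Counter


def to_distribution(counts):
    """Normalize raw feature counts into a probability distribution."""
    total = sum(counts.values())
    return {f: c / total for f, c in counts.items()}


def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    Assumes q[f] > 0 wherever p[f] > 0, which always holds when q is the
    midpoint distribution m used below.
    """
    return sum(pf * math.log2(pf / q[f]) for f, pf in p.items() if pf > 0)


def information_radius(counts_a, counts_b):
    """IRad(p, q) = D(p || m) + D(q || m), with m = (p + q) / 2.

    Returns 0 for identical distributions and at most 2 bits for
    distributions with disjoint support.
    """
    p = to_distribution(counts_a)
    q = to_distribution(counts_b)
    features = set(p) | set(q)
    m = {f: (p.get(f, 0.0) + q.get(f, 0.0)) / 2 for f in features}
    return kl(p, m) + kl(q, m)


# Hypothetical (relation, head word) dependency features for three words.
coffee = Counter({("obj", "drink"): 10, ("mod", "hot"): 5})
tea = Counter({("obj", "drink"): 8, ("mod", "hot"): 4, ("mod", "green"): 3})
car = Counter({("obj", "drive"): 7, ("mod", "fast"): 6})

print(information_radius(coffee, tea))  # small: many shared contexts
print(information_radius(coffee, car))  # maximal: no shared contexts
```

    Because the midpoint m is nonzero wherever either word has a feature, the measure never diverges on sparse dependency data, which is one reason it is a common choice for distributional word similarity.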

    Mylly - The Mill : A new platform for processing speech and text corpora easily and efficiently

    Speech and language researchers need to manage and analyze increasing quantities of material. Various tools are available for different stages of the work, but they often require the researcher to use different interfaces and to convert the output of each tool into suitable input for the next one. The Language Bank of Finland (Kielipankki) is developing an online platform called Mylly for processing speech and language data in a graphical user interface that integrates different tools into a single workflow. Mylly provides tools and computational resources for processing material and for inspecting the results. The tools plugged into Mylly include a parser, morphological analyzers, generic finite-state technology, and a speech recognizer. Users can upload data and download any intermediate result in the tool chain. Mylly runs on CSC’s Taito cluster and is an instance of the Chipster platform. Access rights to Mylly are granted for academic use. The Language Bank of Finland is a collection of corpora, tools and other services maintained by FIN-CLARIN, a consortium of Finnish universities and research organizations coordinated by the University of Helsinki. The technological infrastructure for the Language Bank of Finland is provided by CSC – IT Center for Science.

    Peer reviewed