Experiment on Methods for Clustering and Categorization of Polish Text

Author: Dabrowska-Boruch Agnieszka
Fraczek Rafał
Jamro Ernest
Pietroń Marcin
Russek Paweł
Wiatr Kazimierz
Wielgosz Maciej
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 09/05/2017

The main goal of this work was to experimentally verify methods for the challenging task of categorizing and clustering Polish text. Supervised learning was employed for categorization and unsupervised learning for clustering. The methods were examined in depth on a custom-built corpus of Polish texts, which the authors assembled from Internet resources. The corpus data was acquired from a news portal and was therefore already sorted by type by journalists according to their specialization. The presented algorithms employ the Vector Space Model (VSM) and the TF-IDF (Term Frequency-Inverse Document Frequency) weighting scheme. A series of experiments revealed certain properties of the algorithms and their accuracy. Accuracy was evaluated as the algorithms' ability to match the human arrangement of the documents by topic. For both categorization and clustering, the authors used the F-measure to assess the quality of allocation.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)
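
The abstract above names the Vector Space Model and TF-IDF weighting without showing how they work. The following is a minimal sketch of the general technique, not the authors' implementation: the function names `tf_idf_vectors` and `cosine`, the toy three-document corpus, and the choice of relative term frequency with an unsmoothed natural-log IDF are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenized document.

    Assumed weighting: relative term frequency times log(N / df).
    Other TF-IDF variants exist; the paper's exact variant is not stated here.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus (already tokenized, diacritics omitted).
docs = [
    "mecz pilka gol".split(),
    "mecz gol trener".split(),
    "budzet podatek sejm".split(),
]
vectors = tf_idf_vectors(docs)
sim_sport = cosine(vectors[0], vectors[1])  # two documents sharing terms
sim_mixed = cosine(vectors[0], vectors[2])  # two documents with no shared terms
```

In this sketch the two documents that share vocabulary score higher than the pair with no terms in common. Similarity of this kind is the signal that clustering and categorization over a VSM typically build on.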

Visual Affect Around the World: A Large-scale Multilingual Visual Sentiment Ontology

Author: Balahur A.
Bautin M.
Dryer M. S.
Esuli A.
Güngördü Z.
Krizhevsky A.
Lang P.
Lee J. H.
McCarthy E. D.
Mesquita B.
Mihalcea R.
Mikolov T.
Plutchik R.
Schmid H.
Vessel E. A.
You Q.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 19/08/2015

Every culture and language is unique. Our work expressly focuses on the uniqueness of culture and language in relation to human affect, specifically sentiment and emotion semantics, and how they manifest in social multimedia. We develop sets of sentiment- and emotion-polarized visual concepts by adapting semantic structures called adjective-noun pairs, originally introduced by Borth et al. (2013), but in a multilingual context. We propose a new language-dependent method for automatic discovery of these adjective-noun constructs. We show how this pipeline can be applied on a social multimedia platform for the creation of a large-scale multilingual visual sentiment concept ontology (MVSO). Unlike the flat structure in Borth et al. (2013), our unified ontology is organized hierarchically by multilingual clusters of visually detectable nouns and subclusters of emotionally biased versions of these nouns. In addition, we present an image-based prediction task to show how generalizable language-specific models are in a multilingual context. A new, publicly available dataset of >15.6K sentiment-biased visual concepts across 12 languages with language-specific detector banks, >7.36M images and their metadata is also released. Comment: 11 pages, to appear at ACM MM'1

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Spectral Clustering Wikipedia Keyword-Based Search Results

Author: Julian Szymański
Tomasz Dziubich
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2017

Frontiers - Publisher Connector

Comparison of Latent Semantic Analysis and Probabilistic Latent Semantic Analysis for Documents Clustering

Author: Kitowski Jacek
Kuta Marcin
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 04/02/2015

In this paper we compare the usefulness of statistical dimensionality-reduction techniques for improving the clustering of documents in Polish. We start with partitional and agglomerative algorithms applied to the Vector Space Model. Then we investigate two transformations: Latent Semantic Analysis and Probabilistic Latent Semantic Analysis. The results show an advantage of the Latent Semantic Analysis technique over the probabilistic model. We also analyse the time and memory consumption of these transformations and present runtime details for an IBM BladeCenter HS21 machine.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Benchmarking High Performance Architectures With Natural Language Processing Algorithms

Author: Jacek Kitowski
Marcin Kuta
Publication venue: AGH University of Science and Technology Press
Publication date: 01/01/2011

Natural Language Processing algorithms are resource demanding, especially when tuning to an inflective language like Polish is needed. The paper presents the time and memory requirements of part-of-speech tagging and clustering algorithms applied to two corpora of the Polish language. The algorithms are benchmarked on three high-performance platforms of different architectures. Additionally, sequential versions and OpenMP implementations of the clustering algorithms were compared.

AGH (Akademia Górniczo-Hutnicza) University of Science and Technology: Journals

Computer Science Journal (AGH University of Science and Technology, Krakow)

Directory of Open Access Journals

Automatic domain-specific learning: towards a methodology for ontology enrichment

Author: Mestre-Mestre Eva M.
Ureña Gómez-Moreno Pedro
Publication venue: Serv. Publicaciones de la Universidad de Las Palmas de Gran Canaria
Publication date: 01/01/2017

[EN] At the current rate of technological development, in a world where enormous amounts of data are constantly created and in which the Internet is used as the primary means of information exchange, there is a need for tools that help process, analyze and use that information. However, while the growth of information offers many opportunities for social and scientific advance, it has also highlighted the difficulties of extracting meaningful patterns from massive data. Ontologies have been claimed to play a major role in the processing of large-scale data, as they serve as universal models of knowledge representation, and are being studied as possible solutions to this problem. This paper presents a method for the automatic expansion of ontologies based on corpus and terminological data exploitation. The proposed "ontology enrichment method" (OEM) consists of a sequence of tasks aimed at classifying an input keyword automatically under its corresponding node within a target ontology. Results show that the method can be successfully applied for the automatic classification of specialized units into a reference ontology. Financial support for this research has been provided by the DGI, Spanish Ministry of Education and Science, grant FFI2011-29798-C0201. Ureña Gómez-Moreno, P.; Mestre-Mestre, EM. (2017). Automatic domain-specific learning: towards a methodology for ontology enrichment. LFE. Revista de Lenguas para Fines Específicos. 23(2):63-85. http://hdl.handle.net/10251/148357S638523

RiuNet

Portal digital de revistas científicas de la ULPGC (Universidad de Las Palmas de Gran Canaria)

DIALNET

Natural language processing and cognitive science : proceedings 2018

Author: Lubaszewski Wiesław
Sedes Florence
Sharp Bernadette
Publication venue: Jagiellonian Library
Publication date: 01/01/2018

Jagiellonian University Repository