
    TensorLayer: A Versatile Library for Efficient Deep Learning Development

    Deep learning has enabled major advances in the fields of computer vision, natural language processing, and multimedia, among many others. Developing a deep learning system is arduous and complex, as it involves constructing neural network architectures, managing training/trained models, tuning the optimization process, preprocessing and organizing data, etc. TensorLayer is a versatile Python library that aims at helping researchers and engineers efficiently develop deep learning systems. It offers rich abstractions for neural networks, model and data management, and parallel workflow mechanisms. While boosting efficiency, TensorLayer maintains both performance and scalability. TensorLayer was released in September 2016 on GitHub, and has helped people from academia and industry develop real-world applications of deep learning.
    Comment: ACM Multimedia 201

    A Semantic Information Retrieval Approach to Solving the Paper-Reviewer Assignment Problem Using a Neural Network Language Model

    The task of assigning papers to reviewers is crucial to effective peer review of academic conference and journal articles. For reviewers to perform well, it is important that the papers assigned to them are related to their knowledge domain. Manually ensuring that paper submissions are assigned to reviewers within their knowledge domain can be very cumbersome and inefficient, and an inefficient assignment of papers to the invited reviewers can result in low-quality and unfair reviews. In the extant literature, automated reviewer assignment systems have been built to address this challenge as an information retrieval problem. To leverage recent advances in artificial neural networks for solving natural language problems, a neural network language model, Word2Vec, was used to derive suitability scores based on the semantic relatedness between a submitted paper and a reviewer's representation. An Integer Linear Programming model then used the suitability scores to optimise the assignments while balancing the workload across reviewers. To test our model and compare it with other information retrieval models used for the paper-reviewer assignment problem, a system was implemented in the Python programming language. The Natural Language Toolkit (NLTK) was used to perform natural language processing on the experimental datasets, and the Gensim Python library was used to implement the models. OR-Tools, a Python library for operations research, was used to implement the exact Integer Linear Programming optimisation for the assignments. The Django framework was used to make the project portable for the web, MySQL was the database management system used to manage the database, and Celery was used to run large tasks in the background.
From our experiment, we explored whether Word2Vec could be used for paper-reviewer assignment, and our results indicate that Word2Vec performed approximately on par with Latent Semantic Indexing. The assignment system was further evaluated against ground truth; the results show that Word2Vec provided more accurate semantic assignments than Latent Semantic Indexing.
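The pipeline described above can be sketched roughly as follows. This is an illustration, not the authors' implementation: the toy word vectors stand in for embeddings a trained Gensim Word2Vec model would provide, and the brute-force search over a tiny instance stands in for the OR-Tools Integer Linear Programming solver. The objective and the per-reviewer load cap mirror the abstract's formulation: maximise total suitability while keeping reviewer workloads balanced.

```python
from itertools import product
from math import sqrt

# Toy word vectors; a real system would use embeddings from a trained
# Word2Vec model (e.g. gensim's) instead of this hand-built table.
VEC = {
    "neural":   [0.9, 0.1, 0.0],
    "network":  [0.8, 0.2, 0.1],
    "database": [0.1, 0.9, 0.2],
    "query":    [0.0, 0.8, 0.3],
}

def doc_vector(words):
    """Represent a document as the average of its known word vectors."""
    vs = [VEC[w] for w in words if w in VEC]
    return [sum(c) / len(vs) for c in zip(*vs)]

def cosine(a, b):
    """Cosine similarity: the suitability score between two documents."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def assign(papers, reviewers, max_load):
    """Find the paper-to-reviewer assignment maximising total suitability,
    subject to a per-reviewer load cap (workload balance).  Brute force is
    used here for clarity; a real system would express the same objective
    and constraints as an ILP and hand it to a solver such as OR-Tools."""
    scores = [[cosine(doc_vector(p), doc_vector(r)) for r in reviewers]
              for p in papers]
    best, best_total = None, float("-inf")
    for choice in product(range(len(reviewers)), repeat=len(papers)):
        if any(choice.count(r) > max_load for r in range(len(reviewers))):
            continue  # would overload some reviewer
        total = sum(scores[p][r] for p, r in enumerate(choice))
        if total > best_total:
            best, best_total = choice, total
    return best

papers = [["neural", "network"], ["database", "query"]]
reviewers = [["network", "neural"], ["query", "database"]]
print(assign(papers, reviewers, max_load=1))  # → (0, 1)
```

With `max_load=1` each reviewer takes exactly one paper, so the solver matches each paper to its semantically closest reviewer.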

    Introducing a Gold Standard Corpus from Young Multilinguals for the Evaluation of Automatic UD-PoS Taggers for Italian

    Part-of-speech (PoS) tagging constitutes a common task in Natural Language Processing (NLP), given its widespread applicability. However, with the advance of new information technologies and language variation, the contents and methods of PoS-tagging have changed. Most existing Italian data for this task originate from standard texts, whose language use is far from the multifaceted, informal situations of real life. Automatic PoS-tagging models trained on such data do not perform reliably on non-standard language, such as social media content or language learners' texts. Our aim is to provide additional training and evaluation data from language learners, tagged in Universal Dependencies (UD), as well as to test current automatic PoS-tagging systems and evaluate their performance on such data. We use a multilingual corpus of young language learners, LEONIDE, to create a tagged gold standard for evaluating UD PoS-tagging performance on non-standard Italian. With version 3.7 of Stanza, a Python NLP package, we apply the available automatic PoS-taggers, namely ISDT, ParTUT, POSTWITA, TWITTIRÒ and VIT, trained on both standard and non-standard data, to our dataset. Our results show that the above taggers, trained on non-standard data or multilingual Treebanks, can achieve up to 95% accuracy on multilingual learner data if combined.
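The abstract's "if combined" result implies an ensemble of taggers. One simple combination scheme (a sketch under our own assumptions, not necessarily the authors' method) is per-token majority voting over the UPOS tags each tagger predicts for the same tokenisation; the tagger names and predictions below are purely illustrative.

```python
from collections import Counter

def combine_taggers(tag_sequences):
    """Combine aligned per-token predictions from several PoS-taggers by
    majority vote.  On a tie, Counter preserves first-seen order, so the
    earliest-listed tagger's prediction wins."""
    combined = []
    for position_tags in zip(*tag_sequences):
        tag, _count = Counter(position_tags).most_common(1)[0]
        combined.append(tag)
    return combined

def accuracy(pred, gold):
    """Token-level tagging accuracy against a gold standard."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)

# Hypothetical UPOS predictions from three taggers on the same sentence.
isdt     = ["DET", "NOUN", "VERB", "ADJ"]
postwita = ["DET", "NOUN", "VERB", "NOUN"]
vit      = ["DET", "PROPN", "VERB", "ADJ"]
gold     = ["DET", "NOUN", "VERB", "ADJ"]

voted = combine_taggers([isdt, postwita, vit])
print(voted, accuracy(voted, gold))  # → ['DET', 'NOUN', 'VERB', 'ADJ'] 1.0
```

Here the vote corrects each individual tagger's single error, which is the intuition behind the combined taggers outperforming any one of them.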