
1,522 research outputs found

A Machine Learning Approach For Opinion Holder Extraction In Arabic Language

Author: AbdelRahman Samir
Elarnaoty Mohamed
Fahmy Aly
Publication venue: 'Academy and Industry Research Collaboration Center (AIRCC)'
Publication date: 06/04/2012

Opinion mining aims at extracting useful subjective information from reliable amounts of text. Opinion holder recognition is a task that has not yet been considered for the Arabic language. This task essentially requires a deep understanding of clause structures. Unfortunately, the lack of a robust, publicly available Arabic parser further complicates the research. This paper presents leading research on opinion holder extraction in Arabic news that is independent of any lexical parser. We investigate constructing a comprehensive feature set to compensate for the lack of structural parsing output. The proposed feature set is adapted from previous work on English, coupled with our proposed semantic field and named entity features. Our feature analysis is based on Conditional Random Fields (CRF) and semi-supervised pattern recognition techniques. Different research models are evaluated via cross-validation experiments, achieving an F-measure of 54.03. We publicly release our research outcome corpus and lexicon to the opinion mining community to encourage further research.

arXiv.org e-Print Archive

Crossref

BULETIN PUSAT SUKAN UNIVERSITI EDISI 5/2020

Author: UNIT PENGURUSAN STRATEGIK DAN KORPORAT
Publication venue: UNIVERSITI TUN HUSSEIN ONN MALAYSIA
Publication date: 01/10/2020

UTHM Institutional Repository

Strategies Used in Translation of Scientific Texts to Cope with Lexical Gaps (Case of Biomass Gasification and Pyrolysis Book)

Author: Pezeshki Mahshid
Tabrizi Hossein Heidari
Publication venue: 'Academy Publication'
Publication date: 31/05/2015

Lexical gaps in translation have been deeply debated throughout the history of translation studies. Many theories have been put forward to explain the possible strategies for filling lexical gaps. This descriptive study investigates some of these strategies as used in the translation of a specialized technical book, Biomass Gasification & Pyrolysis (Practical Design and Theory). This book was a suitable choice because it covers modern topics, some of which have not been discussed, or did not even exist, in the target language (Persian). In this study, seventy new terms that had not previously been employed in Persian (in that field) were selected and examined. Qualitative and quantitative analysis of the words indicated that loan word, loan translation, and loanblend were the most prominent strategies for coping with new lexical items. It also showed that loan translation had the highest rate of usage (68.5%) among the techniques and is widely preferred in scientific contexts.

Undergraduates’ interest towards learning genetics concepts through integrated STEM problem-based learning approach

Author: Abdul Rahim Shamimah Parveen
Abdul Talib Corrienna
Amiruddin Mohd Hasril
Ismail Mohd Erfy
Samsudin Mohd Ali
Publication date: 01/01/2020

A scientific and innovative society can be produced by giving priority to Science, Technology, Engineering, and Mathematics (STEM), as emphasized by the Malaysian Higher Education Blueprint (2015-2025). STEM needs to be implemented in higher education because universities need to produce competent graduates to support economic growth and sustainable development. Learning STEM through Problem Based Learning might allow undergraduates to become more enthusiastic, as problem-based instruction incorporated with STEM uses teamwork and problem-solving techniques to engage first-year undergraduates fully with the learning. This study was conducted to investigate whether an Integrated STEM Problem Based Learning module could enhance and retain interest in genetics concepts among first-year undergraduates. Topics in genetics are considered difficult not only to teach but also to learn. In this research, to overcome the difficulties of learning genetics concepts, genetics-related topics were chosen to introduce STEM through a problem-based learning approach, which might help first-year undergraduates acquire deep genetics content knowledge. This is vital for first-year undergraduates, as the knowledge gained in their first semester will be applied in upcoming courses throughout their undergraduate programs of study. A pre-experimental research design with a one-group posttest design was applied. A total of 50 participants, first-year undergraduates from the Faculty of Biology at one of the public universities in Malaysia, were involved. The Genetics Interest Questionnaire was used to study whether the STEM Problem Based Learning module could enhance and retain interest in genetics concepts. The research has proven that Integrated STEM through a problem-based learning approach could enhance and retain interest in learning genetics concepts among first-year undergraduates.

UTHM Institutional Repository

Universiti Teknologi Malaysia Institutional Repository

A review of sentiment analysis research in Arabic language

Author: Cambria Erik
HajHmida Moez Ben
Oueslati Oumaima
Ounelli Habib
Publication venue: 'Elsevier BV'
Publication date: 01/01/2020

Sentiment analysis is a task of natural language processing which has recently attracted increasing attention. However, sentiment analysis research has mainly been carried out for the English language. Although Arabic is ramping up as one of the most used languages on the Internet, only a few studies have focused on Arabic sentiment analysis so far. In this paper, we carry out an in-depth qualitative study of the most important research works in this context by presenting the limits and strengths of existing approaches. In particular, we survey both approaches that leverage machine translation or transfer learning to adapt English resources to Arabic and approaches that stem directly from the Arabic language.

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

A Survey on Semantic Processing Techniques

Author: Cambria Erik
Chen Guanyi
He Kai
Mao Rui
Ni Jinjie
Yang Zonglin
Zhang Xulang
Publication date: 22/10/2023

Semantic processing is a fundamental research domain in computational linguistics. In the era of powerful pre-trained language models and large language models, the advancement of research in this domain appears to be decelerating. However, the study of semantics is multi-dimensional in linguistics. The research depth and breadth of computational semantic processing can be largely improved with new technologies. In this survey, we analyze five semantic processing tasks, namely word sense disambiguation, anaphora resolution, named entity recognition, concept extraction, and subjectivity detection. We study relevant theoretical research in these fields, advanced methods, and downstream applications. We connect the surveyed tasks with downstream applications because this may inspire future scholars to fuse these low-level semantic processing tasks with high-level natural language processing tasks. The review of theoretical research may also inspire new tasks and technologies in the semantic processing domain. Finally, we compare the different semantic processing techniques and summarize their technical trends, application trends, and future directions.

Comment: Published at Information Fusion, Volume 101, 2024, 101988, ISSN 1566-2535. The equal contribution mark is missing in the published version due to the publication policies. Please contact Prof. Erik Cambria for details.

arXiv.org e-Print Archive

One Model to Rule them all: Multitask and Multilingual Modelling for Lexical Analysis

Author: Bjerva Johannes
Publication date: 01/01/2017

When learning a new skill, you take advantage of your preexisting skills and knowledge. For instance, if you are a skilled violinist, you will likely have an easier time learning to play the cello. Similarly, when learning a new language, you take advantage of the languages you already speak. For instance, if your native language is Norwegian and you decide to learn Dutch, the lexical overlap between these two languages will likely benefit your rate of language acquisition. This thesis deals with the intersection of learning multiple tasks and learning multiple languages in the context of Natural Language Processing (NLP), which can be defined as the study of computational processing of human language. Although these two types of learning may seem different on the surface, we will see that they share many similarities. The traditional approach in NLP is to consider a single task for a single language at a time. However, recent advances allow for broadening this approach by considering data for multiple tasks and languages simultaneously. This is an important approach to explore further, as the key to improving the reliability of NLP, especially for low-resource languages, is to take advantage of all relevant data whenever possible. In doing so, the hope is that in the long term, low-resource languages can benefit from the advances made in NLP, which are currently to a large extent reserved for high-resource languages. This, in turn, may have positive consequences for, e.g., language preservation, as speakers of minority languages will be under less pressure to use high-resource languages. In the short term, answering the specific research questions posed should be of use to NLP researchers working towards the same goal.

Comment: PhD thesis, University of Groningen

arXiv.org e-Print Archive

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Models to represent linguistic linked data

Author: A. GÓMEZ-PÉREZ
Borin
Crystal
E. MONTIEL-PONSODA
Ehrmann
Farrar
Fellbaum
Fellbaum
Hanks
Hayes
Hellmann
Ide
J. BOSQUE-GIL
J. GRACIA
Klimek
Mel’cuk
Mel’cuk
Menke
Ogden
Peirce
Pustejovsky
Schuurman
Trippel
Vila-Suero
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/2018

As the interest of the Semantic Web and computational linguistics communities in linguistic linked data (LLD) keeps increasing and the number of contributions that dwell on LLD rapidly grows, scholars (and linguists in particular) interested in the development of LLD resources sometimes find it difficult to determine which mechanism is suitable for their needs and which challenges have already been addressed. This review seeks to present the state of the art on the models, ontologies and their extensions to represent language resources as LLD by focusing on the nature of the linguistic content they aim to encode. Four basic groups of models are distinguished in this work: models to represent the main elements of lexical resources (group 1), vocabularies developed as extensions to models in group 1 and ontologies that provide more granularity on specific levels of linguistic analysis (group 2), catalogues of linguistic data categories (group 3) and other models such as corpora models or service-oriented ones (group 4). Contributions encompassed in these four groups are described, highlighting their reuse by the community and the modelling challenges that are still to be faced.

Crossref

Repositorio Universidad de Zaragoza

Archivo Digital UPM

Design of a Controlled Language for Critical Infrastructures Protection

Author: CANTARELLA SIMONA
FERIGATO Carlo
OWUSU EVANS BOATENG
Publication venue: European Language Resources Association
Publication date: 28/03/2012

We describe a project for the construction of a controlled language for critical infrastructures protection (CIP). This project originates from the need to coordinate and categorize communications on CIP at the European level. These communications can take the form of official documents, reports on incidents, informal communications and plain e-mail. To achieve our goal, we explore the application of traditional library science tools to the construction of controlled languages. Our starting point is an analogous work done during the sixties in the field of nuclear science, known as the Euratom Thesaurus.

JRC.G.6-Security technology assessment

JRC Publications Repository