461 research outputs found

Discovering missing Wikipedia inter-language links by means of cross-lingual word sense disambiguation

Authors: De Cock Martine; Hoste Veronique; Lefever Els
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2012

Wikipedia is a very popular online multilingual encyclopedia that contains millions of articles covering most written languages. Wikipedia pages contain monolingual hypertext links to other pages, as well as inter-language links to the corresponding pages in other languages. These inter-language links, however, are not always complete. We present a prototype for a cross-lingual link discovery tool that discovers missing Wikipedia inter-language links to corresponding pages in other languages for ambiguous nouns. Although the framework of our approach is language-independent, we built a prototype for our application using Dutch as an input language and Spanish, Italian, English, French and German as target languages. The input for our system is a set of Dutch pages for a given ambiguous noun, and the output of the system is a set of links to the corresponding pages in our five target languages. Our link discovery application contains two submodules. In a first step, all pages are retrieved that contain a translation (in our five target languages) of the ambiguous word in the page title (Greedy crawler module), whereas in a second step, all corresponding pages are linked between the focus language (Dutch in our case) and the five target languages (Cross-lingual web page linker module). We treat this second step as a disambiguation task and apply a cross-lingual Word Sense Disambiguation framework to determine whether two pages refer to the same content or not.

Ghent University Academic Bibliography

Using parallel corpora for word sense disambiguation

Authors: De Cock Martine; Hoste Veronique; Lefever Els
Publication date: 01/01/2011

Ghent University Academic Bibliography

TA-COS 2018 : 2nd Workshop on Text Analytics for Cybersecurity and Online Safety : Proceedings

Authors: De Pauw Guy; Desmet Bart; Lefever Els
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2018

Ghent University Academic Bibliography

Evaluation of automatic hypernym extraction from technical corpora in English and Dutch

Authors: Hoste Veronique; Lefever Els; Van de Kauter Marjan
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2014

In this research, we evaluate different approaches for the automatic extraction of hypernym relations from English and Dutch technical text. The detected hypernym relations should enable us to semantically structure automatically obtained term lists from domain- and user-specific data. We investigated three different hypernymy extraction approaches for Dutch and English: a lexico-syntactic pattern-based approach, a distributional model and a morpho-syntactic method. To test the performance of the different approaches on domain-specific data, we collected and manually annotated English and Dutch data from two technical domains, viz. the dredging and financial domains. The experimental results show that the morpho-syntactic approach in particular obtains good results for automatic hypernym extraction from technical and domain-specific texts.

Ghent University Academic Bibliography

Normalization of Dutch user-generated content

Authors: De Clercq Orphée; Desmet Bart; Hoste Veronique; Lefever Els; Schulz Sarah
Publication venue: INCOMA
Publication date: 01/01/2013

This paper describes a phrase-based machine translation approach to normalize Dutch user-generated content (UGC). We compiled a corpus of three different social media genres (text messages, message board posts and tweets) to have a sample of this recent domain. We describe the various characteristics of this noisy text material and explain how it has been manually normalized using newly developed guidelines. For the automatic normalization task we focus on text messages, and find that a cascaded SMT system, in which a token-based module is followed by a translation at the character level, gives the best word error rate reduction. After these initial experiments, we investigate the system's robustness on the complete domain of UGC by testing it on the other two social media genres, and find that the cascaded approach performs best on these genres as well. To our knowledge, we deliver the first proof-of-concept system for Dutch UGC normalization, which can serve as a baseline for future work.
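
The abstract reports results as a reduction in word error rate (WER) but does not say how the metric was computed. A minimal sketch of the conventional definition follows, assuming whitespace tokenization and WER = token-level edit distance divided by the number of reference tokens. The function name and the example strings are hypothetical and are not taken from the paper or its corpus.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Token-level Levenshtein distance divided by the reference length.

    Assumption: tokens are split on whitespace; the paper may tokenize differently.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between the reference tokens processed
    # so far and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_tok == hyp_tok else 1)
            deletion = prev[j] + 1       # reference token missing from hypothesis
            insertion = curr[j - 1] + 1  # extra token in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one of four reference tokens is still unnormalized.
score = word_error_rate("ik ben er morgen", "ik ben er mrgn")
```

Under this definition, a WER reduction would be measured by scoring the raw text and the normalized output against the same manually normalized reference.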

CiteSeerX

Ghent University Academic Bibliography

Archivsystem Ask23

Monitoring the reduction in shrinkage cracking of mortars containing superabsorbent polymers

Authors: Aggelis Dimitrios G.; De Belie Nele; De Boe Emanuel; Lefever Gerlinde; Snoeck Didier; Van Hemelrijck Danny
Publication venue: RILEM Publications SARL
Publication date: 01/01/2017

Ultra-high performance concrete (UHPC) is characterized by a low water-to-cement ratio, leading to improved durability and mechanical properties. However, the risk of autogenous shrinkage and cracking due to restrained shrinkage increases, which may affect the durability of UHPC as cracks form pathways for ingress of aggressive liquids and gases. These negative features can be prevented by the use of superabsorbent polymers (SAPs) in the mixture. SAPs reduce autogenous shrinkage by means of internal curing: they absorb water during the hydration process and release it again to the cementitious matrix when water shortage arises. In this way, hydration can continue and shrinkage is diminished.

Ghent University Academic Bibliography

Noise or music? Investigating the usefulness of normalisation for robust sentiment analysis on social media data

Authors: De Clercq Orphée; Desmet Bart; Hoste Veronique; Lefever Els; Van de Kauter Marjan; Van Hee Cynthia
Publication date: 01/01/2017

In the past decade, sentiment analysis research has thrived, especially on social media. While this data genre is suitable to extract opinions and sentiment, it is known to be noisy. Complex normalisation methods have been developed to transform noisy text into its standard form, but their effect on tasks like sentiment analysis remains underinvestigated. Sentiment analysis approaches mostly include spell checking or rule-based normalisation as preprocessing and rarely investigate its impact on the task performance. We present an optimised sentiment classifier and investigate to what extent its performance can be enhanced by integrating SMT-based normalisation as preprocessing. Experiments on a test set comprising a variety of user-generated content genres revealed that normalisation improves sentiment classification performance on tweets and blog posts, showing the model’s ability to generalise to other data genres.

Ghent University Academic Bibliography

The development of a novel SNP genotyping assay to differentiate cacao clones

Authors: Coppieters Frauke; De Wever Jocelyn; Dewettinck Koen; Everaert Helena; Lefever Steve; Messens Kathy; Rottiers Hayley
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

In this study, a double-mismatch allele-specific (DMAS) qPCR SNP genotyping method has been designed, tested and validated specifically for cacao, using 65 well-annotated international cacao reference accessions retrieved from the Center for Forestry Research and Technology Transfer (CEFORTT) and the International Cocoa Quarantine Centre (ICQC). In total, 42 DMAS-qPCR SNP genotyping assays have been validated, with a 98.05% overall efficiency in calling the correct genotype. In addition, the test allowed for the identification of 15.38% off-types and two duplicates, highlighting the problem of mislabeling in cacao collections and the need for conclusive genotyping assays. The developed method showed on average a high genetic diversity (He = 0.416) and information index (I = 0.601), making it applicable to assess intra-population variation. Furthermore, only the 13 most informative markers were needed to achieve maximum differentiation. This simple, effective method provides robust and accurate genotypic data which allows for more efficient resource management (e.g. tackling mislabeling, conserving valuable genetic material, parentage analysis, genetic diversity studies), thus contributing to an increased knowledge of the genetic background of cacao worldwide. Notably, the described method can easily be integrated in other laboratories for a wide range of objectives and organisms.
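
The abstract quotes an average genetic diversity (He) and information index (I) without giving the formulas. A minimal sketch under the usual population-genetics definitions follows, assuming He is the expected heterozygosity 1 - Σ p_i² and I is the Shannon index -Σ p_i ln p_i, both computed per locus from allele frequencies and then averaged over markers. Whether the authors applied exactly these estimators (for example, with a sample-size correction) is not stated. The function names and the allele frequencies in the example are hypothetical.

```python
import math
from typing import Sequence


def expected_heterozygosity(allele_freqs: Sequence[float]) -> float:
    """He = 1 - sum(p_i^2) for one locus (no sample-size correction applied)."""
    return 1.0 - sum(p * p for p in allele_freqs)


def shannon_information_index(allele_freqs: Sequence[float]) -> float:
    """I = -sum(p_i * ln(p_i)) for one locus; zero-frequency alleles contribute 0."""
    return -sum(p * math.log(p) for p in allele_freqs if p > 0)


def mean_over_loci(per_locus_values: Sequence[float]) -> float:
    """Average a per-locus statistic across all markers."""
    return sum(per_locus_values) / len(per_locus_values)


# Invented allele frequencies for three biallelic SNP markers.
loci = [(0.5, 0.5), (0.7, 0.3), (0.9, 0.1)]
mean_he = mean_over_loci([expected_heterozygosity(f) for f in loci])
mean_i = mean_over_loci([shannon_information_index(f) for f in loci])
```

For a biallelic SNP both statistics peak when the two alleles are equally frequent (He = 0.5, I = ln 2), which gives a scale for reading the averages reported in the abstract.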

Ghent University Academic Bibliography

methGraph: A genome visualization tool for PCR-based methylation assays

Authors: Arányi Tamás; De Paepe Anne; Hoebeeck Jasmien; Lefever Steve; Pattyn Filip; Speleman Franki; Tusnády Gábor; Vandesompele Jo
Publication date: 01/01/2010

Ghent University Academic Bibliography

Target enrichment using parallel nanoliter quantitative PCR amplification

Authors: De Wilde Bram; Derveaux Stefaan; Dong Wes; Dunne Jude; Hellemans Jan; Husain Syed; Lefever Steve; Vandesompele Jo
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

Background: Next-generation targeted resequencing is replacing Sanger sequencing at a high pace in routine genetic diagnosis. The need for well-validated, high-quality enrichment platforms to complement the bench-top next-generation sequencing devices is high. Results: We used the WaferGen Smartchip platform to perform highly parallelized PCR-based target enrichment for a set of known cancer genes in a well-characterized set of cancer cell lines from the NCI60 panel. Optimization of PCR assay design and cycling conditions resulted in a high enrichment efficiency. We provide proof of a high mutation rediscovery rate and have included technical replicates to enable SNP calling validation, demonstrating the high reproducibility of our enrichment platform. Conclusions: Here we present our custom-developed quantitative PCR-based target enrichment platform. Using highly parallel nanoliter singleplex PCR reactions makes this a flexible and efficient platform. The high mutation validation rate shows this platform’s promise as a targeted resequencing method for multi-gene routine sequencing diagnostics.

Springer - Publisher Connector

Ghent University Academic Bibliography

PubMed Central