
12 research outputs found

Splitting Arabic Texts into Elementary Discourse Units

Author: Abdul-Mageed M.
Abu-Jbara A.
Afantenos S.
Afantenos S. D.
Al-Saif A.
Belguith H. L.
Boujelben I.
Charoensuk J.
Da Cunha I.
Darwish K.
Diab M.
Eskander R.
Farah Benamara Zitoune
Fisher S.
Green S.
Gridach M.
Habash N.
Iskandar Keskes
Kamp H.
Keskes I.
Khalifa I.
Lamia Hadrich Belguith
Lüngen H.
Maamouri M.
Mourad A.
Nivre J.
Polanyi L.
Prasad A.
Sadat F.
Sawalha M.
Subba R.
Sumita K.
Tofiloski M.
Trigui O.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/06/2014

In this article, we present the first work investigating the feasibility of segmenting Arabic discourse into elementary discourse units within the Segmented Discourse Representation Theory framework. We first describe our annotation scheme, which defines a set of principles to guide the segmentation process. Two corpora have been annotated according to this scheme: elementary school textbooks and newspaper documents extracted from the syntactically annotated Arabic Treebank. We then propose a multiclass supervised learning approach that predicts nested units. Our approach uses a combination of punctuation, morphological, lexical, and shallow syntactic features. We investigate how each feature contributes to the learning process. We show that an extensive morphological analysis is crucial to achieving good results on both corpora. In addition, we show that adding chunks does not boost the performance of our system.

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

Deep learning for detection and segmentation of artefact and disease instances in gastrointestinal endoscopy

Author: Ali S.
Bailey A.
Balasubramanian V.
Bano S.
Boutry N.
Braden B.
Cannizzaro R.
Chavan A.
Chen H.
Choi Y.
Daul C.
Dmitrieva M.
East J.
Gao X.
Ghatwary N.
Gridach M.
Guo Y.
Hekalo A.
Hu H.
Huynh L.
Krenzer A.
Lamarque D.
Liao Y.
Matuszewski B.
Nguyen N.
Polat G.
Raj A.
Realdon S.
Rezvy S.
Rittscher J.
Stoyanov D.
Subramanian A.
Temizel A.
Tran D.
Tran-Nguyen T.
Voiculescu I.
Yoganand V.
Publication venue: Elsevier
Publication date: 01/01/2021

The Endoscopy Computer Vision Challenge (EndoCV) is a crowd-sourcing initiative to address eminent problems in developing reliable computer-aided detection and diagnosis endoscopy systems and to suggest a pathway for clinical translation of technologies. Whilst endoscopy is a widely used diagnostic and treatment tool for hollow organs, endoscopists often face several core challenges, mainly: 1) the presence of multi-class artefacts that hinder visual interpretation, and 2) difficulty in identifying subtle precancerous precursors and cancer abnormalities. Artefacts often affect the robustness of deep learning methods applied to the gastrointestinal tract organs, as they can be confused with tissue of interest. The EndoCV2020 challenges are designed to address research questions in these remits. In this paper, we present a summary of methods developed by the top 17 teams and provide an objective comparison of state-of-the-art methods and methods designed by the participants for two sub-challenges: i) artefact detection and segmentation (EAD2020), and ii) disease detection and segmentation (EDD2020). Multi-center, multi-organ, multi-class, and multi-modal clinical endoscopy datasets were compiled for both the EAD2020 and EDD2020 sub-challenges. The out-of-sample generalization ability of detection algorithms was also evaluated. Whilst most teams focused on accuracy improvements, only a few methods hold credibility for clinical usability. The best-performing teams provided solutions to tackle class imbalance and variabilities in size, origin, modality and occurrences by exploring data augmentation, data fusion, and optimal class thresholding techniques.

Middlesex University Research Repository

OXENDONET: A dilated convolutional neural networks for endoscopic artefact segmentation

Author: Gridach M
Voiculescu I
Publication venue: CEUR Workshop Proceedings
Publication date: 01/01/2020

Medical image segmentation plays a key role in many generic applications such as population analysis and, more accessibly, can be made into a crucial tool in diagnosis and treatment planning. Its output can vary from extracting practical clinical information such as pathologies (detection of cancer), to measuring anatomical structures (kidney volume, cartilage thickness, bone angles). Many prior approaches to this problem are based on one of two main architectures: a fully convolutional network or a U-Net-based architecture. These methods rely on multiple pooling and striding layers to increase the receptive field size of neurons. For a segmentation task, however, the way pooling layers are used reduces the feature map size and leads to the loss of important spatial information. In this paper, we propose a novel neural network, which we call OxEndoNet. Our network uses the pyramid dilated module (PDM), consisting of multiple dilated convolutions stacked in parallel. The PDM eliminates the need for striding layers and has a very large receptive field while maintaining spatial resolution. We combine several pyramid dilated modules to form our final OxEndoNet network. The proposed network is able to capture small and complex variations in the challenging problem of Endoscopy Artefact Detection and Segmentation, where objects vary largely in scale and size.
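The idea of parallel dilated convolutions described in the abstract can be illustrated with a minimal one-dimensional sketch in plain Python. This is not the OxEndoNet implementation: the function names, the zero padding, and the choice to sum the branches are all assumptions made for illustration (the abstract does not say how the branches are combined). It only shows that a dilated convolution widens the span of input positions seen by each output while the output keeps the input's length, with no striding or pooling.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded 1D dilated convolution that preserves the input length."""
    k = len(kernel)
    span = (k - 1) * dilation            # distance between first and last kernel tap
    pad = span // 2                      # left padding; the remainder goes on the right
    padded = [0.0] * pad + list(signal) + [0.0] * (span - pad)
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(signal))
    ]


def pyramid_dilated_module(signal, kernel, dilations):
    """Run one branch per dilation rate in parallel and sum the branches.

    Summation is an assumption of this sketch, not a detail from the paper.
    """
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(values) for values in zip(*branches)]


def effective_extent(kernel_size, dilation):
    """Number of input positions spanned by one dilated kernel."""
    return (kernel_size - 1) * dilation + 1


impulse = [0.0] * 4 + [1.0] + [0.0] * 4
response = pyramid_dilated_module(impulse, [1.0, 1.0, 1.0], dilations=[1, 2])
```

Here `response` has the same length as `impulse`, and `effective_extent(3, 4)` evaluates to 9: a three-tap kernel with dilation 4 spans nine input positions without any downsampling.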

Oxford University Research Archive

Dopnet: Densely Oriented Pooling Network for medical image segmentation

Author: Gridach M
Voiculescu I
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2021

Since manual annotation of medical images is time-consuming for clinical experts, reliable automatic segmentation would be the ideal way to handle large medical datasets. Deep learning-based models have been the dominant approach, achieving remarkable performance on various medical segmentation tasks. The size of the feature being segmented can vary significantly relative to the other features in a medical image, which can be challenging. In this paper, we propose a Densely Oriented Pooling Network (DOPNet) to capture variation in feature size in medical images and preserve spatial interconnection. DOPNet is based on two interdependent ideas: dense connectivity and the pooling oriented layer. When tested on three publicly available medical image segmentation datasets, the proposed model achieves leading performance.

Oxford University Research Archive

Self-knowledge distillation for first trimester ultrasound saliency prediction

Author: Drukker L
Gridach M
Noble JA
Papageorghiou AT
Savochkina E
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2022

Self-knowledge distillation (SKD) is a recent and promising machine learning approach where a shallow student network is trained to distill its own knowledge. By contrast, in traditional knowledge distillation a student model distills its knowledge from a large teacher network model, which involves vast computational complexity and a large storage size. Consequently, SKD is a useful approach to model medical imaging problems with scarce data. We propose an original SKD framework to predict where a sonographer should look next using a multi-modal ultrasound and gaze dataset. We design a novel Wide Feature Distillation module, which is applied to intermediate feature maps in the form of transformations. The module applies a more refined feature map filtering, which is important when predicting gaze for fetal anatomy that varies in size. Our architecture design includes a ReSL loss that enables a student network to learn useful information whilst discarding the rest. The proposed network is validated on a large multi-modal ultrasound dataset acquired during routine first trimester fetal ultrasound scanning. Experimental results show that the novel SKD approach outperforms alternative state-of-the-art architectures on all saliency metrics.

Oxford University Research Archive

The AnIta-Lemmatiser: a tool for accurate lemmatisation of Italian texts

Author: A. Hardie
A.K. Ingason
E. Airio
F. Carota
F. Eynde Van
H. Hammarström
J. Plisson
M. Creutz
M. Gridach
R. Delmonte
T. Mauro De
Publication venue: Springer Verlag
Publication date: 01/01/2013

This paper presents the AnIta-Lemmatiser, an automatic tool to lemmatise Italian texts. It is based on a powerful morphological analyser enriched with a large lexicon and some heuristic techniques to select the most appropriate lemma among those that can be morphologically associated with an ambiguous wordform. The heuristics are essentially based on the frequency-of-use tags provided by the De Mauro/Paravia electronic dictionary. The AnIta-Lemmatiser ranked second in the Lemmatisation Task of the EVALITA 2011 evaluation campaign. Beyond the official lemmatiser used for EVALITA, some further improvements are presented.

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Arabic named entity recognition: A bidirectional GRU-CRF approach

Author: D Nadeau
F Huang
FA Gers
G Hinton
K Shaalan
M Gridach
MA Zahran
R Collobert
R Pascanu
S Abdallah
S Ferrández
S Hochreiter
TH Cao
Y Bengio
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Previous Named Entity Recognition (NER) models for Modern Standard Arabic (MSA) rely heavily on hand-engineered features and gazetteers, which are time-consuming to build. In this paper, we introduce a novel neural network architecture based on a bidirectional Gated Recurrent Unit (GRU) combined with Conditional Random Fields (CRF). Our neural network uses minimal features: pretrained word representations learned from unannotated corpora and character-level embeddings of words. This novel architecture allowed us to eliminate the need for most handcrafted features. We evaluate our system on a publicly available dataset, where we achieve results comparable to those of the previous best-performing systems.

Crossref

DI-fusion

A semi-supervised approach for extracting TCM clinical terms based on feature words

Author: A Graves
ASAA Mohammed
B Liu
F Olsson
G Nestler
J Lei
J Wang
JI Kazama
JS Justeson
KT Frantzi
M Gridach
M Habibi
M Riedl
N Peng
Q Liu
R Collobert
S Yadav
T Mikolov
T Yu
Y Chen
Y Kim
Y Wu
Publication venue: 'Springer Science and Business Media LLC'
Publication date

Crossref

Recent advances in Swedish and Spanish medical entity recognition in clinical texts using deep neural approaches

Author: A Casillas
A Pérez
Alicia Pérez
Arantza Casillas
E Grave
H Dalianis
I Martinez Soriano
JPC Chiu
L Yao
M Gridach
M Oronoz
Maite Oronoz
Maryam Habibi
O Uzuner
P Bojanowski
PB Jensen
R Collobert
R Roller
R Weegar
R Östling
Rebecka Weegar
S Almgren
S Hochreiter
SVS Pakhomov
T Mikolov
V Yadav
X Dong
Y Wu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Background: Text mining and natural language processing of clinical text, such as notes from electronic health records, require specific consideration of the specialized characteristics of these texts. Deep learning methods could potentially mitigate domain-specific challenges such as limited access to in-domain tools and data sets. Methods: A bi-directional Long Short-Term Memory network is applied to clinical notes in Spanish and Swedish for the task of medical named entity recognition. Several types of embeddings, generated from both in-domain and out-of-domain text corpora, and a number of generation and combination strategies for embeddings have been evaluated in order to investigate different input representations and the influence of domain on the final results. Results: For Spanish, a micro-averaged F1-score of 75.25 was obtained, and for Swedish, the corresponding score was 76.04. The best results for both languages were achieved using embeddings generated from in-domain corpora extracted from electronic health records, but embeddings generated from related domains were also found to be beneficial. Conclusions: A recurrent neural network with in-domain embeddings improved medical named entity recognition compared to shallow learning methods, showing this combination to be suitable for entity recognition in clinical text for both languages. The publication cost of this article was funded by Stockholm University Librar
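For readers unfamiliar with the metric reported in the abstract, the following is a minimal sketch of how a micro-averaged F1-score is computed: true positives, false positives, and false negatives are pooled across all entity types before precision and recall are calculated. The entity type names and the counts are invented for illustration and are not taken from the paper.

```python
def micro_f1(counts):
    """Micro-averaged F1 from per-class counts of tp, fp and fn."""
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts per entity type (not results from the paper).
example_counts = {
    "Disorder": {"tp": 8, "fp": 2, "fn": 2},
    "Drug": {"tp": 2, "fp": 0, "fn": 2},
}
score = micro_f1(example_counts)
```

Because the counts are pooled first, frequent entity types weigh more heavily in the score than rare ones, unlike macro-averaging, which averages the per-type F1 values.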

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Archivo Digital para la Docencia y la Investigación
