Joint Entity Extraction and Assertion Detection for Clinical Text
Negative medical findings are prevalent in clinical reports, yet
discriminating them from positive findings remains a challenging task for
information extraction. Most existing systems treat this task as a
pipeline of two separate tasks, i.e., named entity recognition (NER) and
rule-based negation detection. We treat it as a multi-task problem and
present a novel end-to-end neural model to jointly extract entities and
negations. We extend a standard hierarchical encoder-decoder NER model,
first adopting a shared encoder followed by separate decoders for the two tasks.
This architecture performs considerably better than the previous rule-based and
machine learning-based systems. To overcome the problem of increased parameter
size, especially in low-resource settings, we propose the Conditional Softmax
Shared Decoder architecture, which achieves state-of-the-art results for NER and
negation detection on the 2010 i2b2/VA challenge dataset and a proprietary
de-identified clinical dataset.
Comment: Accepted at the 57th Annual Meeting of the Association for
Computational Linguistics (ACL 2019)
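The abstract does not spell out how the Conditional Softmax Shared Decoder conditions one task on the other. The Python sketch below is a hypothetical, per-token simplification, not the authors' implementation: a shared hidden state feeds an entity softmax, and the negation softmax receives that hidden state concatenated with the entity probabilities. The function names (`dense`, `conditional_softmax_step`), the label sets and all weights are assumptions for illustration only.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dense(x: Sequence[float], weights: Sequence[Sequence[float]],
          bias: Sequence[float]) -> List[float]:
    """Affine layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def conditional_softmax_step(hidden, w_ent, b_ent, w_neg, b_neg
                             ) -> Tuple[List[float], List[float]]:
    """Hypothetical per-token step of a conditional softmax decoder.

    The entity distribution is computed first; the negation head then sees
    the shared hidden state concatenated with those entity probabilities,
    so its output is conditioned on the entity prediction.
    """
    p_entity = softmax(dense(hidden, w_ent, b_ent))
    p_negation = softmax(dense(list(hidden) + p_entity, w_neg, b_neg))
    return p_entity, p_negation


# Toy example: 2-dim hidden state, 3 entity tags (e.g. O, B-PROBLEM,
# I-PROBLEM), 2 negation tags (affirmed, negated). All weights are made up.
hidden = [0.5, -1.0]
w_ent = [[0.1, 0.2], [1.0, -0.5], [0.3, 0.3]]
b_ent = [0.0, 0.0, 0.0]
w_neg = [[0.2, 0.1, 0.0, 0.5, 0.5],   # 2 hidden + 3 entity-prob inputs
         [-0.2, 0.4, 0.0, 1.0, 1.0]]
b_neg = [0.0, 0.0]

p_ent, p_neg = conditional_softmax_step(hidden, w_ent, b_ent, w_neg, b_neg)
print("entity:", [round(p, 3) for p in p_ent])
print("negation:", [round(p, 3) for p in p_neg])
```

Sharing one decoder and feeding the entity distribution forward is one way to add the negation task with only a small extra output layer, which matches the parameter-size motivation stated in the abstract.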
Assessment of Lead, Zinc and Cadmium Contamination in the Fruit of Palestinian Date Palm Cultivars Growing at Jericho Governorate
Phoenix dactylifera L. fruits were studied to assess whether they are safe for human consumption and to evaluate the date fruit as a biomonitor of heavy metal pollution in Palestine. This research measured the levels of toxic heavy metals (Pb, Cd, and Zn) in thirty-five date varieties collected from three locations (NARC, DH and ADS) in Jericho using atomic absorption spectrometry. Mean values of the heavy metals were calculated and reported. The concentrations of heavy metals in the date fruit flesh were relatively higher than those in the fruit washing residue. Heavy metal levels in date palm fruits collected from the NARC station (in the city center) were higher than those from the ADS and DH stations (far from the city center), due to greater human activity and heavier vehicular traffic. The results of this study show that most of the studied heavy metals are within safe limits with respect to the maximum allowable levels (MAL) in some date cultivars. Keywords: Phoenix dactylifera L., Heavy metals, Lead, Zinc, Cadmium, Jericho. DOI: 10.7176/JBAH/10-2-02 Publication date: January 31st 202
Cytotoxic Activity of Cyclamen Persicum Ethanolic Extract on MCF-7, PC-3 and LNCaP Cancer Cell Lines
It is important to develop new approaches that increase the efficacy of cancer treatments. Today, the use of natural products to treat cancer is very common. In addition, studying plants endemic to Palestine and determining the biological activities of their extracts is extremely important because of their potential for new drug development. Cyclamen persicum is used in traditional medicine as an anti-rheumatic and to treat diarrhea, abdominal pain, edema, abscesses, eczema, cancer and other ailments. In this study, the cytotoxic effects of ethanolic extracts of C. persicum tubers and leaves were evaluated against the MCF-7, PC-3 and LNCaP cancer cell lines using the mitochondrial dehydrogenase enzyme (MTT) method. The results showed remarkable cytotoxic activity of C. persicum extracts against breast and prostate adenocarcinoma cells. For the tuber extract, the IC50 value was 0.05 mg/ml for all three cell lines. For the leaf extract, the IC50 value was 0.25 mg/ml for the PC-3 and MCF-7 cell lines, whereas LNCaP cell inhibition remained below 30% at all tested leaf extract concentrations. MCF-7 cells exhibited the highest sensitivity to the C. persicum extracts compared with the PC-3 and LNCaP cell lines. In contrast, LNCaP cells generally exhibited the lowest sensitivity to the extracts. These results indicate that C. persicum is a good source of natural products with antitumor compounds that can be further exploited to develop a potential therapeutic anticancer agent. Key words: Cyclamen persicum, Cytotoxicity, MTT assay, LNCaP, MCF-7, PC-3. DOI: 10.7176/JNSR/10-2-05 Publication date: January 31st 202
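The abstract reports IC50 values without stating how they were derived. As a hedged illustration, the sketch below computes percent inhibition from MTT absorbance readings (assuming blank-corrected values) and estimates IC50 by linear interpolation between the two tested concentrations that bracket 50% inhibition; fitting a dose-response curve, such as a four-parameter logistic model, is a common alternative. The function names and the concentration/inhibition values in the example are hypothetical, not data from this study.

```python
from typing import Optional, Sequence


def percent_inhibition(a_treated: float, a_control: float) -> float:
    """Growth inhibition (%) from MTT absorbance, relative to untreated control.

    Simplified: assumes absorbances are already blank-corrected.
    """
    return 100.0 * (1.0 - a_treated / a_control)


def ic50_interpolated(concs: Sequence[float],
                      inhibitions: Sequence[float]) -> Optional[float]:
    """Estimate IC50 by linear interpolation.

    `concs` must be sorted ascending and aligned with `inhibitions` (%).
    Returns None when no pair of adjacent concentrations brackets 50%,
    e.g. when inhibition stays below 50% at every tested concentration.
    """
    points = list(zip(concs, inhibitions))
    for (c0, i0), (c1, i1) in zip(points, points[1:]):
        if i0 <= 50.0 <= i1:
            if i1 == i0:          # flat segment exactly at 50%
                return c0
            return c0 + (50.0 - i0) * (c1 - c0) / (i1 - i0)
    return None


# Hypothetical dose-response readings (mg/ml, % inhibition), not study data.
concs = [0.0125, 0.025, 0.05, 0.1]
inhib = [20.0, 35.0, 50.0, 70.0]
print(ic50_interpolated(concs, inhib))               # about 0.05
print(ic50_interpolated([0.1, 0.2], [10.0, 25.0]))   # None: 50% never reached
```

The `None` case mirrors a situation like the leaf extract on LNCaP cells, where inhibition never reached 50% and no IC50 can be read off the tested range.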
Predicting Student Performance Based on the Student's Academic Profile
Data mining is an important field that has been widely used in different domains. One of the fields that makes use of data mining is Educational Data Mining. In this study, we apply machine learning models to data obtained from Palestine Technical University-Kadoorie (PTUK) in Tulkarm for students in the department of computer engineering and applied computing. Students in both fields study the same major courses, C++ and Java; therefore, we focused on these courses to predict students' performance. The goal of our study is to predict students' performance as measured by their GPA in the major. Many techniques are used in the educational data mining field. We applied three models to the obtained data that are commonly used in the educational data mining field: a decision tree with the information gain measure, a decision tree with the Gini index measure, and the naive Bayes model. We used these models in our work because they are efficient and fast at data classification and prediction. The results suggest that the decision tree with the information gain measure outperforms the other models, with an accuracy of 0.66. We took a closer look at the key features on which we trained our models, namely the students' branch of study at school, their field of study at the university, and whether or not they have a scholarship. These features influence the prediction. For example, the accuracy of the decision tree with the information gain measure increases to 0.71 when applied to the subset of students who studied in the scientific branch in high school. This study is important for both the students and the higher management of PTUK, since the university will be able to make predictions about students' performance. In the experiments carried out, the model's predictions were in line with the actual expectations.
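The two decision-tree variants in this study differ in the impurity measure used to choose splits. The sketch below computes the impurity decrease of a categorical split with entropy (information gain, as in ID3) and with the Gini index (as in CART). It uses a hypothetical six-student toy table; the column names `school_branch`, `scholarship` and `gpa_band` are assumptions, not PTUK fields or data.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels: Sequence[str]) -> float:
    """Gini impurity."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(rows: List[Dict[str, str]], feature: str, target: str,
               impurity: Callable[[Sequence[str]], float]) -> float:
    """Impurity decrease from a multiway split on a categorical feature.

    impurity=entropy gives information gain; impurity=gini gives the
    Gini decrease.
    """
    groups: Dict[str, List[str]] = {}
    for r in rows:
        groups.setdefault(r[feature], []).append(r[target])
    weighted = sum(len(g) / len(rows) * impurity(g) for g in groups.values())
    return impurity([r[target] for r in rows]) - weighted


# Hypothetical toy records; column names are illustrative, not PTUK fields.
rows = [
    {"school_branch": "scientific", "scholarship": "yes", "gpa_band": "high"},
    {"school_branch": "scientific", "scholarship": "no", "gpa_band": "high"},
    {"school_branch": "scientific", "scholarship": "no", "gpa_band": "low"},
    {"school_branch": "literary", "scholarship": "yes", "gpa_band": "high"},
    {"school_branch": "literary", "scholarship": "no", "gpa_band": "low"},
    {"school_branch": "literary", "scholarship": "no", "gpa_band": "low"},
]

for feature in ("school_branch", "scholarship"):
    ig = split_gain(rows, feature, "gpa_band", entropy)
    gd = split_gain(rows, feature, "gpa_band", gini)
    print(f"{feature}: information gain={ig:.3f}, Gini decrease={gd:.3f}")
```

On this toy table both measures prefer `scholarship` (information gain ≈ 0.459, Gini decrease 0.250) over `school_branch` (≈ 0.082 and ≈ 0.056). On real data the two criteria can rank features differently, which is one reason the two tree variants may differ in accuracy.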
Offensive Hebrew Corpus and Detection using BERT
Offensive language detection has been well studied in many languages, but it
is lagging behind in low-resource languages, such as Hebrew. In this paper, we
present a new offensive language corpus in Hebrew. A total of 15,881 tweets
were retrieved from Twitter. Each was labeled with one or more of five classes
(abusive, hate, violence, pornographic, or non-offensive) by Arabic-Hebrew
bilingual speakers. The annotation process was challenging because each annotator
had to be familiar with Israeli culture, politics, and practices to
understand the context of each tweet. We fine-tuned two Hebrew BERT models,
HeBERT and AlephBERT, using our proposed dataset and another published dataset,
D_OLaH. We observed that our data boosts HeBERT performance by 2% when combined
with D_OLaH. Fine-tuning AlephBERT on our data and testing on D_OLaH yields 69%
accuracy, while fine-tuning on D_OLaH and testing on our data yields 57%
accuracy, which may indicate the generalizability of our data.
Our dataset and fine-tuned models are available on GitHub and Huggingface.
Comment: 8 pages, 1 figure, The 20th ACS/IEEE International Conference on
Computer Systems and Applications (AICCSA)
SALMA: Arabic Sense-Annotated Corpus and WSD Benchmarks
SALMA, the first Arabic sense-annotated corpus, consists of ~34K tokens,
which are all sense-annotated. The corpus is annotated using two different
sense inventories simultaneously (Modern and Ghani). SALMA's novelty lies in how
tokens and senses are associated. Instead of linking a token to only one
intended sense, SALMA links a token to multiple senses and provides a score to
each sense. A smart web-based annotation tool was developed to support scoring
multiple senses against a given word. In addition to sense annotations, we also
annotated the corpus using six types of named entities. The quality of our
annotations was assessed using various metrics (Kappa, Linear Weighted Kappa,
Quadratic Weighted Kappa, Mean Absolute Error, and Root Mean Square Error),
which show very high inter-annotator agreement. To establish a Word Sense
Disambiguation baseline using our SALMA corpus, we developed an end-to-end Word
Sense Disambiguation system using Target Sense Verification. We used this
system to evaluate three Target Sense Verification models available in the
literature. Our best model achieved an accuracy of 84.2% using Modern and
78.7% using Ghani. The full corpus and the annotation tool are open-source and
publicly available at https://sina.birzeit.edu/salma/
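The agreement metrics listed for SALMA can be computed from two annotators' scores for the same items. The sketch below assumes integer scores on a small ordinal scale (the actual SALMA scoring scale is not stated here) and implements Cohen's kappa with no, linear, or quadratic disagreement weights, plus mean absolute error and root mean square error. The function names and example scores are hypothetical.

```python
import math
from typing import Sequence


def weighted_kappa(a: Sequence[int], b: Sequence[int],
                   categories: Sequence[int], weighting: str = "none") -> float:
    """Cohen's kappa between two annotators, optionally weighted.

    weighting: "none" (plain Cohen's kappa), "linear", or "quadratic".
    Computed as 1 - sum(w * observed) / sum(w * expected); needs >= 2 categories.
    """
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(a)
    obs = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[idx[x]][idx[y]] += 1
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def w(i: int, j: int) -> float:
        if weighting == "linear":
            return abs(i - j) / (k - 1)
        if weighting == "quadratic":
            return ((i - j) / (k - 1)) ** 2
        return 0.0 if i == j else 1.0

    num = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    den = sum(w(i, j) * row[i] * col[j] / n for i in range(k) for j in range(k))
    if den == 0:
        raise ValueError("kappa is undefined when expected disagreement is zero")
    return 1.0 - num / den


def mae(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute error between two score lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def rmse(a: Sequence[float], b: Sequence[float]) -> float:
    """Root mean square error between two score lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Hypothetical sense scores on an assumed 1-5 scale from two annotators.
ann1 = [5, 4, 1, 3, 5, 2, 1, 4]
ann2 = [5, 3, 1, 3, 4, 2, 1, 5]
scale = [1, 2, 3, 4, 5]
for mode in ("none", "linear", "quadratic"):
    print(mode, round(weighted_kappa(ann1, ann2, scale, mode), 3))
print("MAE", mae(ann1, ann2), "RMSE", round(rmse(ann1, ann2), 3))
```

Linear and quadratic weights penalize near-misses (e.g. 4 vs 5) less than distant disagreements, which suits graded sense scores better than plain kappa, where any mismatch counts as full disagreement.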
ArBanking77: Intent Detection Neural Model and a New Dataset in Modern and Dialectical Arabic
This paper presents the ArBanking77, a large Arabic dataset for intent
detection in the banking domain. We arabized and localized the original
English Banking77 dataset, which consists of 13,083 queries, into the
ArBanking77 dataset of 31,404 queries in both Modern Standard Arabic (MSA)
and Palestinian dialect, with each query classified into one of the 77 classes
(intents). Furthermore, we present a neural model, based on AraBERT, fine-tuned
on ArBanking77, which achieved F1-scores of 0.9209 and 0.8995 on MSA and
Palestinian dialect, respectively. We performed extensive experimentation in
which we simulated low-resource settings, where the model is trained on a
subset of the data and augmented with noisy queries to simulate colloquial
terms, mistakes and misspellings found in real NLP systems, especially live
chat queries. The data and the models are publicly available at
https://sina.birzeit.edu/arbanking77
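The abstract mentions augmenting training data with noisy queries but does not describe the noise model. One simple, hypothetical scheme is sketched below: each synthetic copy of a query receives a single random character deletion, duplication, or adjacent swap, using a seeded generator for reproducibility. The function names and the choice of edit operations are assumptions, not the authors' procedure.

```python
import random
from typing import List


def add_typo(query: str, rng: random.Random) -> str:
    """Apply one random character edit: delete, duplicate, or swap neighbours."""
    if len(query) < 2:
        return query
    i = rng.randrange(len(query) - 1)   # position with a right-hand neighbour
    op = rng.choice(("delete", "duplicate", "swap"))
    if op == "delete":
        return query[:i] + query[i + 1:]
    if op == "duplicate":
        return query[:i] + query[i] + query[i:]
    return query[:i] + query[i + 1] + query[i] + query[i + 2:]


def augment(queries: List[str], copies: int, seed: int = 0) -> List[str]:
    """Return the clean queries followed by `copies` noisy variants of each."""
    rng = random.Random(seed)
    noisy = [add_typo(q, rng) for q in queries for _ in range(copies)]
    return list(queries) + noisy


clean = ["What is my current balance?", "I want to transfer money"]
for q in augment(clean, copies=2, seed=42):
    print(q)
```

Python string slicing works on Unicode code points, so the same functions apply unchanged to Arabic queries; a realistic noise model for colloquial Arabic would likely also need dialect-specific spelling variants, which this sketch does not attempt.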
Lexical Diversity in Kinship Across Languages and Dialects
Languages are known to describe the world in diverse ways. Across lexicons,
diversity is pervasive, appearing through phenomena such as lexical gaps and
untranslatability. However, in computational resources, such as multilingual
lexical databases, diversity is hardly ever represented. In this paper, we
introduce a method to enrich computational lexicons with content relating to
linguistic diversity. The method is verified through two large-scale case
studies on kinship terminology, a domain known to be diverse across languages
and cultures: one case study deals with seven Arabic dialects, while the other
deals with three Indonesian languages. Our results, made available as browsable
and downloadable computational resources, extend prior linguistics research on
kinship terminology and provide insight into the extent of diversity even
within linguistically and culturally close communities.