94 research outputs found

    Joint Entity Extraction and Assertion Detection for Clinical Text

    Negative medical findings are prevalent in clinical reports, yet discriminating them from positive findings remains a challenging task for information extraction. Most existing systems treat this task as a pipeline of two separate tasks, i.e., named entity recognition (NER) and rule-based negation detection. We consider this as a multi-task problem and present a novel end-to-end neural model to jointly extract entities and negations. We extend a standard hierarchical encoder-decoder NER model and first adopt a shared encoder followed by separate decoders for the two tasks. This architecture performs considerably better than the previous rule-based and machine learning-based systems. To overcome the problem of increased parameter size, especially in low-resource settings, we propose the Conditional Softmax Shared Decoder architecture, which achieves state-of-the-art results for NER and negation detection on the 2010 i2b2/VA challenge dataset and a proprietary de-identified clinical dataset. Comment: Accepted at the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019).

    Assessment of Lead, Zinc and Cadmium Contamination in the Fruit of Palestinian Date Palm Cultivars Growing at Jericho Governorate

    Phoenix dactylifera L. fruits were studied to assess whether they are safe for human consumption and to evaluate the date fruit as a bio-monitor of heavy metal pollution in Palestine. The study measured the levels of toxic heavy metals (Pb, Cd, and Zn) in thirty-five date varieties collected from three locations in Jericho (NARC, DH and ADS) using atomic absorption spectrometry. Mean values of heavy metals were calculated and expressed. The concentrations of heavy metals in the flesh of the date fruits were relatively higher than those in the fruit washing residue. Heavy metal levels in date palm fruits collected from the NARC station (in the city center) were higher than those from the ADS and DH stations (far from the city center), owing to greater human activity and heavier vehicular traffic. The results show that most of the studied heavy metals are within safe limits with respect to maximum allowable levels (MAL) in some date cultivars. Keywords: Phoenix dactylifera L., Heavy metals, Lead, Zinc, Cadmium, Jericho. DOI: 10.7176/JBAH/10-2-02 Publication date: January 31st 202

    Relational data clustering algorithms with biomedical applications


    Cytotoxic Activity of Cyclamen Persicum Ethanolic Extract on MCF-7, PC-3 and LNCaP Cancer Cell Lines

    It is important to develop new approaches to increase the efficacy of cancer treatments. The use of natural products to treat cancer is now very common. In addition, working with plants that are endemic to Palestine and determining the biological activities of their extracts is extremely important because of the potential for new drug development. Cyclamen persicum is used in traditional medicine as an anti-rheumatic and to treat diarrhea, abdominal pains, edema, abscesses, eczema, cancer and other ailments. In this study, the cytotoxic effects of ethanolic extracts of C. persicum tubers and leaves were studied against the MCF-7, PC-3 and LNCaP cancer cell lines using the mitochondrial dehydrogenase enzyme method. The results showed remarkable cytotoxic activity of C. persicum extracts against breast and prostate adenocarcinoma. For the tuber extract, the IC50 value was 0.05 mg/ml for all three cell lines. For the leaf extract, the IC50 value was 0.25 mg/ml for the PC-3 and MCF-7 cell lines, while LNCaP cell inhibition was less than 30% at all tested leaf extract concentrations. MCF-7 cells exhibited the highest sensitivity to the C. persicum extracts compared with the PC-3 and LNCaP cell lines. In contrast, LNCaP cells generally exhibited the lowest sensitivity to the extracts. These results indicate that C. persicum is a good source of natural products with antitumor compounds that can be further exploited for the development of a potential therapeutic anticancer agent. Key words: Cyclamen persicum, Cytotoxicity, MTT assay, LNCaP, MCF-7, PC-3. DOI: 10.7176/JNSR/10-2-05 Publication date: January 31st 202

    Predicting Student Performance Based on the Student's Academic Profile (التنبؤ بأداء الطلاب بناء على ملف الطالب الأكاديمي)

    Data mining is an important field that has been widely used in different domains. One of the fields that makes use of data mining is Educational Data Mining. In this study, we apply machine learning models to data obtained from Palestine Technical University-Kadoorie (PTUK) in Tulkarm for students in the departments of computer engineering and applied computing. Students in both fields study the same major courses, C++ and Java. Therefore, we focused on these courses to predict students' performance. The goal of our study is to predict students' performance, measured by GPA in the major. Many techniques are used in the educational data mining field. We applied three models that are commonly used in the field to the obtained data: the decision tree with the information gain measure, the decision tree with the Gini index measure, and the naive Bayes model. We used these models in our work because they are efficient and fast at data classification and prediction. The results suggest that the decision tree with the information gain measure outperforms the other models, with 0.66 accuracy. We took a deeper look at the key features on which we trained our models, specifically the students' branch of study at school, their field of study at the university, and whether or not they have a scholarship. These features influence the prediction. For example, the accuracy of the decision tree with the information gain measure increases to 0.71 when applied to the subset of students who studied in the scientific branch at high school. This study is important for both the students and the higher management of PTUK. The university will be able to make predictions about the performance of its students. In the experiments carried out, the model's predictions were in line with actual expectations.
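
    The two decision tree split criteria named in the abstract above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names and the toy records (a hypothetical school-branch feature and a binned GPA label) are assumptions made for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(labels, feature_values, impurity):
    """Impurity reduction obtained by splitting on a categorical feature.

    With impurity=entropy this is information gain; with impurity=gini
    it is the Gini-based gain used by CART-style trees.
    """
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    weighted = sum(len(g) / n * impurity(g) for g in groups.values())
    return impurity(labels) - weighted


# Hypothetical toy records: school branch vs. binned GPA.
branch = ["scientific", "scientific", "literary", "literary"]
gpa_bin = ["high", "high", "low", "low"]

info_gain = split_gain(gpa_bin, branch, entropy)
gini_gain = split_gain(gpa_bin, branch, gini)
```

    A tree learner evaluates such a gain for every candidate feature and splits on the one with the largest value; the two criteria often agree on the ranking but can differ on skewed class distributions.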

    Offensive Hebrew Corpus and Detection using BERT

    Offensive language detection has been well studied in many languages, but it is lagging behind in low-resource languages, such as Hebrew. In this paper, we present a new offensive language corpus in Hebrew. A total of 15,881 tweets were retrieved from Twitter. Each was labeled with one or more of five classes (abusive, hate, violence, pornographic, or none offensive) by Arabic-Hebrew bilingual speakers. The annotation process was challenging, as each annotator is expected to be familiar with Israeli culture, politics, and practices to understand the context of each tweet. We fine-tuned two Hebrew BERT models, HeBERT and AlephBERT, using our proposed dataset and another published dataset. We observed that our data boosts HeBERT performance by 2% when combined with D_OLaH. Fine-tuning AlephBERT on our data and testing on D_OLaH yields 69% accuracy, while fine-tuning on D_OLaH and testing on our data yields 57% accuracy, which may be an indication of the generalizability our data offers. Our dataset and fine-tuned models are available on GitHub and Huggingface. Comment: 8 pages, 1 figure, The 20th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA)

    SALMA: Arabic Sense-Annotated Corpus and WSD Benchmarks

    SALMA, the first Arabic sense-annotated corpus, consists of ~34K tokens, all of which are sense-annotated. The corpus is annotated using two different sense inventories simultaneously (Modern and Ghani). SALMA's novelty lies in how tokens and senses are associated. Instead of linking a token to only one intended sense, SALMA links a token to multiple senses and provides a score for each sense. A smart web-based annotation tool was developed to support scoring multiple senses against a given word. In addition to sense annotations, we also annotated the corpus using six types of named entities. The quality of our annotations was assessed using various metrics (Kappa, Linear Weighted Kappa, Quadratic Weighted Kappa, Mean Average Error, and Root Mean Square Error), which show very high inter-annotator agreement. To establish a Word Sense Disambiguation baseline using our SALMA corpus, we developed an end-to-end Word Sense Disambiguation system using Target Sense Verification. We used this system to evaluate three Target Sense Verification models available in the literature. Our best model achieved an accuracy of 84.2% using Modern and 78.7% using Ghani. The full corpus and the annotation tool are open-source and publicly available at https://sina.birzeit.edu/salma/
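
    The kappa variants listed among the agreement metrics above can be sketched in one function. This is an illustrative sketch only: it assumes the sense scores are ordinal integers in `0..num_levels-1`, and the function name and toy ratings are hypothetical rather than taken from the SALMA tooling.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, num_levels, weight="none"):
    """Cohen's kappa for two raters, optionally linearly or quadratically weighted.

    Ratings are assumed to be integers in range(num_levels), with num_levels >= 2.
    """
    n = len(rater_a)

    def disagreement(i, j):
        if weight == "linear":
            return abs(i - j) / (num_levels - 1)
        if weight == "quadratic":
            return ((i - j) / (num_levels - 1)) ** 2
        return 0.0 if i == j else 1.0  # unweighted kappa

    # Observed disagreement, averaged over the rated items.
    observed = sum(disagreement(a, b) for a, b in zip(rater_a, rater_b)) / n

    # Disagreement expected by chance from the two raters' marginal counts.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(
        disagreement(i, j) * count_a[i] * count_b[j]
        for i in range(num_levels)
        for j in range(num_levels)
    ) / (n * n)

    if expected == 0:
        raise ValueError("kappa is undefined when both raters use a single level")
    return 1.0 - observed / expected


# Hypothetical scores from two annotators on a 3-level scale.
annotator_1 = [0, 1, 2, 1]
annotator_2 = [0, 1, 2, 1]
perfect = weighted_kappa(annotator_1, annotator_2, num_levels=3, weight="quadratic")
chance = weighted_kappa([0, 0, 1, 1], [0, 1, 0, 1], num_levels=2)
```

    The weighted variants penalize a disagreement more the further apart the two scores are, which suits graded sense scores better than treating every mismatch equally.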

    ArBanking77: Intent Detection Neural Model and a New Dataset in Modern and Dialectical Arabic

    This paper presents ArBanking77, a large Arabic dataset for intent detection in the banking domain. Our dataset was Arabized and localized from the original English Banking77 dataset, which consists of 13,083 queries, yielding the ArBanking77 dataset with 31,404 queries in both Modern Standard Arabic (MSA) and Palestinian dialect, with each query classified into one of the 77 classes (intents). Furthermore, we present a neural model, based on AraBERT and fine-tuned on ArBanking77, which achieved F1-scores of 0.9209 and 0.8995 on MSA and Palestinian dialect, respectively. We performed extensive experimentation in which we simulated low-resource settings, where the model is trained on a subset of the data and augmented with noisy queries to simulate the colloquial terms, mistakes and misspellings found in real NLP systems, especially live chat queries. The data and the models are publicly available at https://sina.birzeit.edu/arbanking77
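
    One simple way to produce noisy queries of the kind described above is random character-level editing. The sketch below is an assumption about how such augmentation could look, not the procedure used for ArBanking77; the function name, the three edit operations and the sample query are all hypothetical.

```python
import random


def add_noise(query, rate, rng):
    """Return a copy of `query` with random character-level edits.

    Each non-space character is edited with probability `rate` by one of
    three operations: delete it, duplicate it, or swap it with its successor.
    """
    chars = list(query)
    out = []
    i = 0
    while i < len(chars):
        ch = chars[i]
        if ch != " " and rng.random() < rate:
            op = rng.choice(["delete", "duplicate", "swap"])
            if op == "delete":
                i += 1
                continue
            if op == "duplicate":
                out.extend([ch, ch])
                i += 1
                continue
            if op == "swap" and i + 1 < len(chars):
                out.extend([chars[i + 1], ch])
                i += 2
                continue
        out.append(ch)  # unchanged (also the fallback for a swap at the last position)
        i += 1
    return "".join(out)


# Hypothetical query; a seeded generator keeps the augmentation reproducible.
original = "check my card balance"
noisy = add_noise(original, rate=0.15, rng=random.Random(7))
```

    Training on a mix of clean and noisy copies of each query, with the intent label unchanged, is the usual way such noise is applied.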

    Lexical Diversity in Kinship Across Languages and Dialects

    Languages are known to describe the world in diverse ways. Across lexicons, diversity is pervasive, appearing through phenomena such as lexical gaps and untranslatability. However, in computational resources, such as multilingual lexical databases, diversity is hardly ever represented. In this paper, we introduce a method to enrich computational lexicons with content relating to linguistic diversity. The method is verified through two large-scale case studies on kinship terminology, a domain known to be diverse across languages and cultures: one case study deals with seven Arabic dialects, while the other deals with three Indonesian languages. Our results, made available as browseable and downloadable computational resources, extend prior linguistics research on kinship terminology and provide insight into the extent of diversity even within linguistically and culturally close communities.