6,330 research outputs found
Court Judgment Decision Support System Based on Medical Text Mining
Medical damage is a common problem faced by hospitals around the world and is closely watched by countries and the World Health Organization. As the number of medical damage dispute lawsuits grows rapidly, many countries face the problem of how to improve the efficiency of the judicial system while guaranteeing the quality of trials. Therefore, in addition to reform of the system, a decision support system can effectively improve judicial decisions. This paper takes medical damage judgment documents in China as an example and proposes a court judgment decision support system (CJ-DSS) based on medical text mining and automatic classification technology. The system predicts the trial result of a new lawsuit document, rejected or non-rejected, from the verdicts of previous cases. Using these cases, the study found that a combined feature extraction method improves the performance of three kinds of classifiers, Support Vector Machine (SVM), Artificial Neural Network (ANN), and K-Nearest Neighbor (KNN), although the degree of improvement obtained with the DF-CHI combined feature extraction method differs among them. In addition, an integrated (ensemble) learning algorithm also improves the classification performance of the overall system.
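The abstract does not say how document frequency (DF) and chi-square (CHI) are combined. The sketch below assumes one plausible reading: filter terms by a minimum DF, then rank the survivors by their chi-square score against the verdict label. The toy documents, the `df_chi_select` name, and the `min_df` and `top_k` parameters are hypothetical and not taken from the paper.

```python
from collections import Counter

def df_chi_select(docs, labels, min_df=2, top_k=3):
    """Keep terms with document frequency >= min_df, then rank them by chi-square."""
    n = len(docs)
    doc_sets = [set(d.split()) for d in docs]
    df = Counter(t for s in doc_sets for t in s)
    scores = {}
    for term, freq in df.items():
        if freq < min_df:
            continue  # DF filter: drop rare terms
        best = 0.0
        for cls in set(labels):
            in_cls = sum(1 for y in labels if y == cls)
            n11 = sum(1 for s, y in zip(doc_sets, labels) if term in s and y == cls)
            n10 = freq - n11          # term present, other class
            n01 = in_cls - n11        # term absent, this class
            n00 = n - n11 - n10 - n01  # term absent, other class
            denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
            chi = n * (n11 * n00 - n10 * n01) ** 2 / denom if denom else 0.0
            best = max(best, chi)
        scores[term] = best
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:top_k]]

# Hypothetical toy corpus with the two verdict classes named in the abstract.
docs = [
    "hospital fault proven compensation",
    "hospital fault proven damages",
    "hospital no fault claim dismissed",
    "hospital evidence insufficient claim dismissed",
]
labels = ["non-rejected", "non-rejected", "rejected", "rejected"]
selected = df_chi_select(docs, labels, min_df=2, top_k=3)
print(selected)  # ['claim', 'dismissed', 'proven']
```

A term such as "hospital" that appears in every document passes the DF filter but scores zero on chi-square, which is the usual motivation for combining the two criteria.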
Cross-Lingual Adaptation using Structural Correspondence Learning
Cross-lingual adaptation, a special case of domain adaptation, refers to the
transfer of classification knowledge between two languages. In this article we
describe an extension of Structural Correspondence Learning (SCL), a recently
proposed algorithm for domain adaptation, for cross-lingual adaptation. The
proposed method uses unlabeled documents from both languages, along with a word
translation oracle, to induce cross-lingual feature correspondences. From these
correspondences a cross-lingual representation is created that enables the
transfer of classification knowledge from the source to the target language.
The main advantages of this approach over other approaches are its resource
efficiency and task specificity.
We conduct experiments in the area of cross-language topic and sentiment
classification involving English as source language and German, French, and
Japanese as target languages. The results show a significant improvement of the
proposed method over a machine translation baseline, reducing the relative
error due to cross-lingual adaptation by an average of 30% (topic
classification) and 59% (sentiment classification). We further report on
empirical analyses that reveal insights into the use of unlabeled data, the
sensitivity with respect to important hyperparameters, and the nature of the
induced cross-lingual correspondences.
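The correspondence step described above can be sketched, in heavily simplified form, as follows. This is an illustration under stated assumptions rather than the authors' implementation: the pivot predictors here are plain difference-of-frequency weights (a stand-in for trained linear classifiers), only the top singular direction is extracted (by power iteration instead of a full SVD), and the toy documents, pivot pairs, and the `scl_projection` name are hypothetical.

```python
import math

# Hypothetical unlabeled documents from both languages.
UNLABELED = [
    "good excellent film", "good excellent book",
    "bad awful film", "bad awful book",
    "gut ausgezeichnet film", "gut ausgezeichnet buch",
    "schlecht furchtbar film", "schlecht furchtbar buch",
]
# Pivot pairs as a word translation oracle might return them (assumed).
PIVOTS = [("good", "gut"), ("bad", "schlecht")]

def scl_projection(docs, pivots, iters=20):
    """Return one cross-lingual projection weight per non-pivot word."""
    doc_sets = [set(d.split()) for d in docs]
    pivot_words = {w for pair in pivots for w in pair}
    feats = sorted({w for s in doc_sets for w in s} - pivot_words)
    # One linear pivot predictor per pair, using non-pivot features only.
    W = []  # W[j][i]: weight of feature i for predicting pivot pair j
    for pair in pivots:
        pos = [s for s in doc_sets if s & set(pair)]
        neg = [s for s in doc_sets if not s & set(pair)]
        W.append([
            sum(f in s for s in pos) / max(len(pos), 1)
            - sum(f in s for s in neg) / max(len(neg), 1)
            for f in feats
        ])
    # Power iteration for the top singular direction in feature space.
    # Starting from the largest predictor assumes it is not orthogonal
    # to that direction.
    v = max(W, key=lambda row: sum(x * x for x in row))[:]
    for _ in range(iters):
        coeffs = [sum(w * x for w, x in zip(row, v)) for row in W]
        v = [sum(c * row[i] for c, row in zip(coeffs, W)) for i in range(len(feats))]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        v = [x / norm for x in v]
    return dict(zip(feats, v))

theta = scl_projection(UNLABELED, PIVOTS)
for word in ("excellent", "ausgezeichnet", "awful", "furchtbar", "film"):
    print(word, round(theta[word], 3))
```

In this toy setting, words that co-occur with the same pivot pair in either language receive projection weights of the same sign, so a classifier trained on projected English documents could, in principle, be applied to projected German ones.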
Information retrieval and text mining technologies for chemistry
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
A.V. and M.K. acknowledge funding from the European Community’s Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.
Performance Analysis of Machine Learning Approaches in Automatic Classification of Arabic Language
Text classification (TC) is a crucial subject, as the number of digital files available on the internet is enormous. The goal of TC is to categorize texts into a set of predetermined groups. Far more studies have been conducted on English datasets than on Arabic datasets. Therefore, this research analyzes the performance of automatic TC of the Arabic language using Machine Learning (ML) approaches. Further, Single-label Arabic News Articles Datasets (SANAD) are introduced, which contain three different datasets, namely Akhbarona, Khaleej, and Arabiya. Initially, the collected texts are pre-processed through tokenization and stemming. Three stemming settings are employed, namely light stemming, Khoja stemming, and no stemming, to evaluate the effect of the pre-processing technique on Arabic TC performance. Feature extraction and feature weighting are then performed; term weighting uses the term frequency-inverse document frequency (tf-idf) method. This research selects C4.5, Support Vector Machine (SVM), and Naïve Bayes (NB) as classification algorithms. The results indicate that the SVM and NB methods attained higher accuracy than the C4.5 method, and NB achieved the maximum accuracy of 99.9%.
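The pipeline described in this abstract (tokenization, optional stemming, tf-idf weighting, Naïve Bayes classification) can be sketched roughly as follows. This is a minimal illustration, not the authors' setup: the toy English documents stand in for the SANAD datasets, the `stem` hook is a no-op corresponding to the "no stemming" setting (light and Khoja stemming are not implemented), the idf variant `log(N/df) + 1` is one common choice assumed here, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def tokenize(text, stem=lambda w: w):
    """Whitespace tokenization; `stem` is a pluggable hook (identity = no stemming)."""
    return [stem(w) for w in text.lower().split()]

def fit_idf(docs_tokens):
    n = len(docs_tokens)
    df = Counter(t for toks in docs_tokens for t in set(toks))
    return {t: math.log(n / df[t]) + 1.0 for t in df}

def vectorize(tokens, idf):
    """tf-idf weights for known terms; unseen terms are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    return {t: (c / len(tokens)) * idf[t] for t, c in tf.items()}

def train_nb(vectors, labels, alpha=1.0):
    """Multinomial Naive Bayes over tf-idf weights with additive smoothing."""
    vocab = {t for v in vectors for t in v}
    class_docs = Counter(labels)
    weight = defaultdict(Counter)
    for v, y in zip(vectors, labels):
        weight[y].update(v)  # accumulate per-class term weights
    model = {}
    for y in class_docs:
        total = sum(weight[y].values()) + alpha * len(vocab)
        model[y] = (
            math.log(class_docs[y] / len(labels)),
            {t: math.log((weight[y][t] + alpha) / total) for t in vocab},
        )
    return model

def predict_nb(model, vector):
    def score(y):
        prior, loglik = model[y]
        return prior + sum(w * loglik[t] for t, w in vector.items() if t in loglik)
    return max(model, key=score)

# Hypothetical toy training data.
train = [
    ("match goal team", "sports"), ("team wins match", "sports"),
    ("bank market shares", "finance"), ("market bank profit", "finance"),
]
tokens = [tokenize(text) for text, _ in train]
idf = fit_idf(tokens)
model = train_nb([vectorize(t, idf) for t in tokens], [y for _, y in train])
print(predict_nb(model, vectorize(tokenize("team match"), idf)))   # sports
print(predict_nb(model, vectorize(tokenize("bank profit"), idf)))  # finance
```

Swapping the `stem` argument for a real Arabic stemmer would let the same sketch compare stemming settings, which is the kind of comparison the abstract reports.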
- …