2 research outputs found

    The RareDis corpus: A corpus annotated with rare diseases, their signs and symptoms

    Rare diseases affect a small number of people compared to the general population. However, more than 6,000 different rare diseases exist and, in total, they affect more than 300 million people worldwide. A central problem shared by rare diseases is the delay in diagnosis and the sparse information available to researchers, clinicians, and patients. Finding a diagnosis can be a very long and frustrating experience for patients and their families. The average diagnostic delay is between 6 and 8 years. Many of these diseases manifest differently among patients, which further hampers their detection and the choice of the correct treatment. Therefore, there is an urgent need to increase the scientific and medical knowledge about rare diseases. Natural Language Processing (NLP) can help to extract relevant information about rare diseases to facilitate their diagnosis and treatment, but most NLP techniques require manually annotated corpora. Therefore, our goal is to create a gold standard corpus annotated with rare diseases and their clinical manifestations. It could be used to train and test NLP approaches, and the information extracted through NLP could enrich the knowledge of rare diseases and thereby help to reduce the diagnostic delay and improve the treatment of rare diseases. The paper describes the selection of 1,041 texts to be included in the corpus, the annotation process and the annotation guidelines. The entities (disease, rare disease, symptom, sign and anaphor) and the relationships (produces, is a, is acron, is synon, increases risk of, anaphora) were annotated. The RareDis corpus contains annotations for more than 5,000 rare diseases and almost 6,000 clinical manifestations. Moreover, the Inter Annotator Agreement evaluation shows a relatively high agreement (F1-measure equal to 83.5% under exact match criteria for the entities and equal to 81.3% for the relations).
Based on these results, this corpus is of high quality and represents a significant step for the field, since there is a scarcity of available corpora annotated with rare diseases. This could open the door to further NLP applications, which would facilitate the diagnosis and treatment of these rare diseases and, therefore, would dramatically improve the quality of life of these patients. This work was supported by the Madrid Government (Comunidad de Madrid) under the Multiannual Agreement with UC3M in the line of "Fostering Young Doctors Research" (NLP4RARE-CM-UC3M) and in the context of the V PRICIT (Regional Programme of Research and Technological Innovation); the Multiannual Agreement with UC3M in the line of "Excellence of University Professors (EPUC3M17)"; and a grant from Spanish Ministry of Economy and Competitiveness (SAF2017-86810-R).
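
The exact-match F1 agreement reported in the abstract above can be illustrated with a minimal sketch. It is not the authors' evaluation script: the function name `exact_match_f1`, the `(start, end, label)` span representation, and the label strings are assumptions made for illustration. One annotator is treated as the reference and the other as the prediction, and two annotations agree only if offsets and label are identical.

```python
def exact_match_f1(annotator_a, annotator_b):
    """F1 between two annotation sets under exact match.

    Each annotation is a hypothetical (start, end, label) tuple; two
    annotations count as agreeing only when all three fields are equal.
    """
    a, b = set(annotator_a), set(annotator_b)
    if not a and not b:
        return 1.0  # convention assumed here: two empty sets agree fully
    agreed = len(a & b)
    precision = agreed / len(b) if b else 0.0
    recall = agreed / len(a) if a else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical spans for one text; the second annotator marks a longer SIGN span.
spans_a = [(0, 14, "RAREDISEASE"), (30, 38, "SIGN"), (50, 60, "SYMPTOM")]
spans_b = [(0, 14, "RAREDISEASE"), (30, 40, "SIGN"), (50, 60, "SYMPTOM")]
agreement = exact_match_f1(spans_a, spans_b)
```

Because the boundary mismatch on the SIGN span counts as a full disagreement, exact match is the stricter of the usual matching criteria.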

    SISTEM DETEKSI KERUSAKAN PADA SISTEM OPERASI MENGGUNAKAN METODE TF-IDF DAN COSINE SIMILARITY (Operating System Damage Detection System Using the TF-IDF and Cosine Similarity Methods)

    Operating system damage covers errors in the operating system, including damage to software and hardware. The detection system is expected to be more flexible than an ordinary expert system: in an ordinary expert system the consultation is guided, whereas in a detection system based on text similarity the user can describe the problem in free text through the user consultation menu. The system uses the Term Frequency-Inverse Document Frequency (TF-IDF) method. Once a query describing the operating system malfunction is entered, the query and the text documents in the database are preprocessed, and each word is assigned a weight that reflects its relationship to a document. After the word weighting step, the documents are matched against the query using the Cosine Similarity method. With a collection of classified texts in the database serving as the knowledge base and the consultation text serving as the query, the resulting system detects operating system damage in two categories, namely software and hardware damage. The system identifies the fault under consultation by checking the similarity between the query text and the knowledge base. The evaluation, carried out using a matrix, shows an accuracy of 70 percent. Future research on error detection using text similarity is expected to increase the reliability of the system and achieve higher evaluation scores.
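
The pipeline described in this abstract (weight words with TF-IDF, then compare the query to each knowledge-base text by cosine similarity) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the example texts in `KNOWLEDGE_BASE`, the whitespace tokenizer, the smoothed IDF formula, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical knowledge base: (description, category) pairs.
KNOWLEDGE_BASE = [
    ("blue screen appears after installing a new driver", "software"),
    ("application crashes when opening after system update", "software"),
    ("computer does not power on and fan is silent", "hardware"),
    ("hard disk makes clicking noise and overheats", "hardware"),
]


def tokenize(text):
    # Assumed preprocessing: lowercase and split on whitespace only.
    return text.lower().split()


def build_idf(docs_tokens):
    # Smoothed inverse document frequency (one common variant).
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(tokens, idf):
    # Relative term frequency times IDF; terms unseen in the knowledge base are dropped.
    tf = Counter(tokens)
    return {t: (c / len(tokens)) * idf[t] for t, c in tf.items() if t in idf}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(query, kb=KNOWLEDGE_BASE):
    # Return the category of the most similar knowledge-base text and its score.
    docs = [tokenize(text) for text, _ in kb]
    idf = build_idf(docs)
    doc_vectors = [tfidf(tokens, idf) for tokens in docs]
    query_vector = tfidf(tokenize(query), idf)
    scores = [cosine(query_vector, vec) for vec in doc_vectors]
    best = max(range(len(kb)), key=scores.__getitem__)
    return kb[best][1], scores[best]


label, score = classify("hard disk is making a clicking noise")
```

A score of 0.0 means the query shares no term with the knowledge base, so a real system would likely reject such a query rather than return a category.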