TREES RECOMMENDATION IN AGROFORESTRY ECOSYSTEM USING NLP

Author: Sawant Omkar
Publication venue: CSUSB ScholarWorks
Publication date: 01/05/2023

Agroforestry has been a challenging sector for growing crops and a variety of trees since ancient times because of erosion and desertification. This Culminating Experience Project explored how a recommendation system can be developed and used in agroforestry. The research questions are: Q1. What methods can be used to improve the accuracy and reliability of soil-based agroforestry tree species recommendation systems? Q2. How can agroforestry tree species recommendation systems be tailored to the needs of different stakeholders, such as smallholder farmers or agribusinesses? Q3. What are the top three tree recommendations, produced with natural language processing, for varying soil content? Data were collected from two datasets: the Agroforestry Database and the European Commission's extension of the periodic Land Use/Land Cover Area Frame Survey. The findings are: 1) Natural language processing techniques such as cosine similarity, count vectorization, and TF-IDF can significantly enhance the system's ability to analyze and process large amounts of data, while data collection, validation, and monitoring improve the accuracy and reliability of soil-based agroforestry tree species recommendation systems. 2) Cosine similarity can recommend tree species based on soil test report data collected by the European Commission's extension of the periodic Land Use/Land Cover Area Frame Survey, and tailoring the recommendations to various soil properties helps smallholders, other stakeholders, and farmers make the best decisions to increase their growth. 3) Natural language processing techniques such as cosine similarity, count vectorization, and TF-IDF can be employed to analyze soil data and identify the tree species most appropriate for different soil types. The conclusions are: 1) By analyzing and processing large volumes of data accurately, the system can make its recommendations more effective and reliable.
2) The system's recommendations can become more relevant, practical, and acceptable, leading to higher adoption rates and better outcomes. 3) The proposed agroforestry tree species recommendation system provides the top three tree recommendations using cosine similarity, TF-IDF, and count vectorization. Areas for future research that emerged from this study include improving the sustainability and productivity of agroforestry practices, enhancing ecosystem services, promoting economic and social benefits, and identifying additional strategies to improve accuracy and reliability by gathering feedback on the tree recommendations directly from stakeholders and farmers during design and development.
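
The abstract names count vectorization, TF-IDF, and cosine similarity as the recommendation techniques but does not show how they fit together. The following is a minimal pure-Python sketch of one plausible arrangement, assuming each species is described by a short text of soil descriptors; the profiles, the "Tree A" to "Tree E" names, and the token vocabulary are hypothetical and not drawn from the Agroforestry Database or the soil survey. It ranks species by cosine similarity between TF-IDF vectors and returns the top three.

```python
import math
from collections import Counter

# Hypothetical species profiles: soil descriptors each tree is suited to.
# These are illustrative tokens, not records from the Agroforestry Database.
TREE_PROFILES = {
    "Tree A": "sandy acidic low_nitrogen well_drained",
    "Tree B": "clay alkaline high_nitrogen waterlogged",
    "Tree C": "loam neutral medium_nitrogen well_drained",
    "Tree D": "sandy neutral low_nitrogen drought",
    "Tree E": "clay acidic high_organic waterlogged",
}


def tokenize(text):
    return text.lower().split()


def tfidf_vectors(docs):
    """Count-vectorize each document, then weight the counts by smoothed IDF."""
    counts = [Counter(tokenize(d)) for d in docs]  # count vectorization
    n = len(docs)
    df = Counter(term for c in counts for term in c)  # document frequency
    # Smoothed IDF, the same formula scikit-learn uses by default.
    idf = {t: math.log((1 + n) / (1 + f)) + 1 for t, f in df.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts], idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def recommend(soil_report, profiles, k=3):
    """Return the k species whose profiles are most similar to the soil report."""
    names = list(profiles)
    vectors, idf = tfidf_vectors([profiles[name] for name in names])
    # Weight the query with the same IDF table; unseen terms are dropped.
    query = {t: tf * idf[t] for t, tf in Counter(tokenize(soil_report)).items() if t in idf}
    scored = sorted(((cosine(query, v), name) for name, v in zip(names, vectors)),
                    key=lambda s: (-s[0], s[1]))  # best score first, ties by name
    return [name for _, name in scored[:k]]


print(recommend("sandy low_nitrogen well_drained", TREE_PROFILES))
# ['Tree A', 'Tree D', 'Tree C']
```

Swapping the TF-IDF weights for raw counts turns this into the plain count-vectorization variant; the ranking step stays the same.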

CSUSB ScholarWorks

Attention is Not Always What You Need: Towards Efficient Classification of Domain-Specific Text

Author: Madhavji Nazim
Steinbacher John
Wahba Yasmen
Publication date: 30/03/2023

For large-scale IT corpora with hundreds of classes organized in a hierarchy, accurately classifying the higher levels of the hierarchy is crucial to prevent errors from propagating to the lower levels. In the business world, an efficient and explainable ML model is preferred over an expensive black-box model, especially if the performance gain is marginal. A current trend in the Natural Language Processing (NLP) community is to employ huge pre-trained language models (PLMs), also known as self-attention models (e.g., BERT), for almost any kind of NLP task (e.g., question answering, sentiment analysis, text classification). Despite the widespread use of PLMs and their impressive performance on a broad range of NLP tasks, there is no clear, well-justified reason for employing them in domain-specific text classification (TC) tasks, given the monosemic nature of the specialized words (i.e., jargon) found in domain-specific text, which renders contextualized embeddings (e.g., PLMs) futile. In this paper, we compare the accuracies of several state-of-the-art (SOTA) models reported in the literature against a Linear SVM classifier with TF-IDF vectorization on three TC datasets. The results show comparable performance for the Linear SVM. The findings of this study show that, for domain-specific TC tasks, a linear model can provide a comparable, cheap, reproducible, and interpretable alternative to attention-based models.
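
The baseline here is a linear SVM over TF-IDF features. The abstract does not say how it was implemented (scikit-learn's `LinearSVC`, which wraps LIBLINEAR, would be a common choice), so the following dependency-free sketch swaps in a Pegasos-style stochastic subgradient solver for the hinge loss. It is binary only, omits the bias term, and uses a hypothetical two-class set of IT tickets; none of it reproduces the paper's datasets or setup.

```python
import math
import random
from collections import Counter


def tfidf_fit(docs):
    """Learn smoothed IDF weights from the training documents."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d.lower().split()))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf_transform(doc, idf):
    """TF-IDF vector (sparse dict), L2-normalised; out-of-vocabulary words are ignored."""
    tf = Counter(t for t in doc.lower().split() if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos: stochastic subgradient descent on the L2-regularised hinge loss.
    Labels must be +1/-1; no bias term, to keep the sketch short."""
    rng = random.Random(seed)
    w, t = {}, 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(w.get(f, 0.0) * v for f, v in X[i].items())
            for f in w:  # regularisation step: shrink by (1 - eta * lam) = 1 - 1/t
                w[f] *= 1.0 - 1.0 / t
            if margin < 1:  # hinge loss active: step towards the example
                for f, v in X[i].items():
                    w[f] = w.get(f, 0.0) + eta * y[i] * v
    return w


def predict(w, x):
    score = sum(w.get(f, 0.0) * v for f, v in x.items())
    return 1 if score >= 0 else -1


# Hypothetical IT tickets: +1 = networking, -1 = database.
DOCS = [
    "router packet loss vpn", "vpn tunnel down router", "dns resolution packet timeout",
    "sql query deadlock index", "database index rebuild slow query", "table lock sql timeout",
]
LABELS = [1, 1, 1, -1, -1, -1]

idf = tfidf_fit(DOCS)
w = train_linear_svm([tfidf_transform(d, idf) for d in DOCS], LABELS)
print(predict(w, tfidf_transform("vpn router packet", idf)))   # 1
print(predict(w, tfidf_transform("sql index deadlock", idf)))  # -1
```

Because the trained model is just one weight per term, sorting `w` by value shows which jargon pushes a ticket towards each class, which is the interpretability property the abstract credits to linear models.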

arXiv.org e-Print Archive

Learning to write medical reports from EEG data

Author: Ana Maria Amaro de Sousa
Publication date: 22/07/2022

Repositório Aberto da Universidade do Porto

Toward More Predictive Models by Leveraging Multimodal Data

Author: Srinivasan Sudarshan
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 15/05/2020

Data often comes in both structured and unstructured forms. Both forms contain information that machine learning models can exploit to improve their predictive performance on a task. However, integrating features from both forms is difficult, all the more so for models that operate under time constraints. Time-constrained models are machine learning models whose input must preserve time causality, such as models that predict a future event from past data. Most previous work lacks a dedicated pipeline that generalizes to different tasks and domains, especially under time constraints. In this work, we present a systematic, domain-agnostic pipeline for integrating features from structured and unstructured data while maintaining time causality when building models. We focus on the healthcare and consumer market domains, preprocessing data, building models, and performing experiments to demonstrate the generalizability of the pipeline. More specifically, we focus on the task of identifying patients at risk of imminent ICU admission. We use our pipeline to solve this task and show how augmenting unstructured data with structured data improves model performance. We found that combining structured and unstructured data yields a performance improvement of up to 8.5
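
The key constraint in this pipeline is time causality: features for a prediction made at time T may only use data recorded before T. The sketch below illustrates that idea under simplifying assumptions that are not from the thesis: a hypothetical `Event` record, the most recent value per lab as the structured features, bag-of-words counts over prior notes as the unstructured features, and 0.0 as a naive fill for missing labs.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Event:
    patient_id: str
    time: datetime
    kind: str          # "lab" (structured) or "note" (unstructured)
    name: str = ""     # lab name, for structured events
    value: float = 0.0
    text: str = ""     # free text, for notes


def build_features(events, patient_id, cutoff, labs, vocab):
    """Fuse structured and unstructured features for one patient.

    Only events strictly before `cutoff` are used, so no information from the
    prediction window leaks into the features (time causality)."""
    past = [e for e in events if e.patient_id == patient_id and e.time < cutoff]
    latest = {}
    for e in sorted((e for e in past if e.kind == "lab"), key=lambda e: e.time):
        latest[e.name] = e.value  # later readings overwrite earlier ones
    words = Counter(tok for e in past if e.kind == "note" for tok in e.text.lower().split())
    structured = [latest.get(lab, 0.0) for lab in labs]  # 0.0 = naive missing-value fill
    unstructured = [float(words[tok]) for tok in vocab]   # bag-of-words counts
    return structured + unstructured


EVENTS = [
    Event("p1", datetime(2020, 1, 1, 8), "lab", "heart_rate", 88.0),
    Event("p1", datetime(2020, 1, 1, 9), "note", text="Patient short of breath"),
    Event("p1", datetime(2020, 1, 1, 10), "lab", "heart_rate", 120.0),
    Event("p1", datetime(2020, 1, 1, 12), "note", text="Transferred to ICU"),
]
print(build_features(EVENTS, "p1", datetime(2020, 1, 1, 11),
                     labs=("heart_rate", "lactate"), vocab=("breath", "icu")))
# [120.0, 0.0, 1.0, 0.0]
```

The 12:00 note mentioning the ICU is excluded at an 11:00 cutoff; letting it in would leak the outcome into the features, which is exactly what a causal pipeline must prevent.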

University of Tennessee, Knoxville: Trace

Linking social media, medical literature, and clinical notes using deep learning.

Author: Asghari Mohsen
Publication venue: ThinkIR: The University of Louisville's Institutional Repository
Publication date: 01/08/2021

Researchers analyze data, information, and knowledge from many sources, in many formats, and with many methods. The dominant data formats are text and images. In the healthcare industry, professionals generate a large quantity of unstructured data. The complexity of this data and the lack of computational power cause delays in analysis. However, with emerging deep learning algorithms and access to computational resources such as graphics processing units (GPUs) and tensor processing units (TPUs), processing text and images is becoming more accessible. Deep learning algorithms achieve remarkable results in natural language processing (NLP) and computer vision. In this study, we focus on NLP in the healthcare industry and collect data not only from electronic medical records (EMRs) but also from medical literature and social media. We propose a framework for linking social media, medical literature, and EMR clinical notes using deep learning algorithms. Connecting data sources requires defining a link between them, and our key is finding concepts in the medical text. The National Library of Medicine (NLM) provides the Unified Medical Language System (UMLS), which we use as the foundation of our own system. We recognize the dynamic nature of social media and apply supervised and semi-supervised methodologies to generate concepts. Named entity recognition (NER) allows efficient extraction of information, or entities, from medical literature, and we extend the model to process EMR clinical notes via transfer learning. The result is an integrated, end-to-end, web-based system that unifies social media, literature, and clinical notes and improves access to medical knowledge for the public and experts.
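
The linking idea is that documents from different sources become connected when they mention the same concept. The thesis finds concepts with supervised and semi-supervised NER built on UMLS; the sketch below swaps in a much simpler greedy longest-match dictionary lookup purely to illustrate the linking step. The lexicon, the `CONCEPT_*` identifiers (which are not real UMLS CUIs), and the example documents are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical lexicon mapping surface forms to concept IDs.
# The IDs are made up for illustration; they are NOT real UMLS CUIs.
LEXICON = {
    "fever": "CONCEPT_FEVER",
    "high temperature": "CONCEPT_FEVER",
    "shortness of breath": "CONCEPT_DYSPNEA",
    "sob": "CONCEPT_DYSPNEA",
    "cannot breathe": "CONCEPT_DYSPNEA",
}


def extract_concepts(text, lexicon, max_len=3):
    """Greedy longest-match dictionary lookup (a stand-in for a trained NER model)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    found, i = set(), 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):  # longest span first
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                found.add(lexicon[phrase])
                i += n
                break
        else:  # no lexicon entry starts at this token
            i += 1
    return found


def link_documents(docs, lexicon):
    """Index (source, doc_id) pairs by the concepts they mention."""
    index = defaultdict(list)
    for source, doc_id, text in docs:
        for concept in sorted(extract_concepts(text, lexicon)):
            index[concept].append((source, doc_id))
    return dict(index)


DOCS = [
    ("social_media", "tweet1", "Ugh, high temperature and I cannot breathe"),
    ("literature", "paper1", "Fever and shortness of breath are common presentations"),
    ("clinical_note", "note1", "Pt c/o SOB, afebrile"),
]
for concept, mentions in link_documents(DOCS, LEXICON).items():
    print(concept, mentions)
```

The lay phrasing in the tweet, the formal term in the paper, and the abbreviation in the note all resolve to the same concept. A fixed lexicon cannot keep up with new social media slang, which is the gap the thesis addresses with learned, semi-supervised models.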

University of Louisville

A General-Purpose Text Classifier for Finnish Patient Record Texts

Author: Pursiainen Eetu
Publication date: 23/10/2017

Medical texts are an underused source of data in clinical analytics. Extracting the relevant information from unstructured texts is difficult, and while some tools are available, they are usually targeted at English texts. The situation is worse for smaller languages such as Finnish. In this work, we reviewed the text mining and natural language processing literature, focusing on methods suited to analyzing medical texts. Based on the review, we created an algorithm for information extraction from patient record texts that is generally applicable to classifying medical texts. As part of this thesis, we built a text mining tool for Finnish medical texts that works through text classification. The algorithm can be used to detect various medical conditions solely from patient record texts. Its use is, however, limited by the need for a fairly large amount of training data.

Aaltodoc Publication Archive

Enrichment of ontologies using machine learning and summarization

Author: Liu Hao
Publication venue: Digital Commons @ NJIT
Publication date: 31/08/2020

Biomedical ontologies are structured knowledge systems in biomedicine. They play a major role in enabling precise communication in support of healthcare applications, e.g., Electronic Healthcare Records (EHR) systems. Biomedical ontologies are used in many different contexts to facilitate information and knowledge management. The most widely used clinical ontology is SNOMED CT. Placing a new concept into its proper position in an ontology is a fundamental task in the ontology's lifecycle of curation and enrichment. A large biomedical ontology, which typically consists of many tens of thousands of concepts and relationships, can be viewed as a complex network with concepts as nodes and relationships as links. Such a large node-link diagram can easily overwhelm humans trying to understand or work with it. Adding concepts is a challenging and time-consuming task that requires domain knowledge and ontology skills. IS-A links (aka subclass links) are the most important relationships of an ontology, enabling the inheritance of other relationships. The position of a concept, represented by its IS-A links to other concepts, determines how accurately it is modeled. Therefore, considering as many candidate parent concepts as possible leads to better modeling of the concept. Traditionally, curators rely on classifiers to place concepts into ontologies. However, this assumes that the relationships of both the new concept and the existing concepts are modeled accurately. Since many concepts in existing ontologies are underspecified in terms of their relationships, the placement by classifiers may be wrong. If the curator does not manually check the automatic placement by classifier programs, concepts may end up in the wrong positions in the IS-A hierarchy. A user searching for such a concept without knowing its precise name would not find it in its expected location.
Automated or semi-automated techniques that can place a concept, or narrow down where it should be inserted, are highly desirable. Hence, this dissertation addresses the problem of concept placement by automatically identifying IS-A links and potential parent concepts correctly and effectively for new concepts, with the assistance of two powerful techniques: Machine Learning (ML) and Abstraction Networks (AbNs). Modern neural networks have revolutionized Machine Learning in vision and Natural Language Processing (NLP). They also show great promise for ontology-related tasks, including ontology enrichment, i.e., the insertion of new concepts. This dissertation presents research using ML and AbNs to achieve knowledge enrichment of ontologies. AbNs are compact summary networks that preserve a significant amount of the semantics and structure of the underlying ontologies. An Abstraction Network is automatically derived from the ontology itself. It consists of nodes, where each node represents a set of concepts that are similar in their structure and semantics. Various kinds of AbNs have been previously developed by the Structural Analysis of Biomedical Ontologies Center (SABOC) to support the summarization, visualization, and quality assurance (QA) of biomedical ontologies. Two basic kinds of AbNs are the Area Taxonomy and the Partial-area Taxonomy, which have been developed for various biomedical ontologies (e.g., SNOMED CT of SNOMED International and NCIt of the National Cancer Institute). This dissertation presents four enrichment studies of SNOMED CT, utilizing both ML and AbN-based techniques.
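
Two ideas from this abstract can be combined in a small illustration: an Area Taxonomy groups concepts that are structurally alike, and IS-A links make children inherit their parents' relationships. The sketch below assumes the area definition commonly used for Area Taxonomies (an area is the set of concepts sharing exactly the same set of relationship types) and uses inheritance as a coarse filter on candidate parents for a new concept. The toy concepts and relationship-type names are hypothetical, not extracted from SNOMED CT, and this filter is an illustration rather than the dissertation's ML or AbN-based placement method.

```python
from collections import defaultdict

# Toy concepts with the relationship types they carry (illustrative only,
# not extracted from SNOMED CT).
CONCEPTS = {
    "Clinical finding": set(),
    "Disorder of lung": {"finding_site"},
    "Pneumonia": {"finding_site", "associated_morphology"},
    "Fracture of femur": {"finding_site", "associated_morphology"},
    "Bacterial pneumonia": {"finding_site", "associated_morphology", "causative_agent"},
}


def build_areas(concepts):
    """Group concepts that have exactly the same set of relationship types."""
    areas = defaultdict(set)
    for name, rel_types in concepts.items():
        areas[frozenset(rel_types)].add(name)
    return dict(areas)


def candidate_parents(new_rel_types, areas):
    """Keep concepts from areas whose relationship types are a subset of the new
    concept's: a child inherits its parents' relationships, so a valid parent
    cannot carry a relationship type the child lacks."""
    new_set = frozenset(new_rel_types)
    return sorted(c for types, members in areas.items() if types <= new_set for c in members)


areas = build_areas(CONCEPTS)
print(len(areas))                                  # 4
print(candidate_parents({"finding_site"}, areas))  # ['Clinical finding', 'Disorder of lung']
```

Working at the level of areas means the subset test runs once per area rather than once per concept, which is one reason a compact summary network helps narrow the search before finer-grained (for example, learned) ranking.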

Digital Commons @ New Jersey Institute of Technology (NJIT)