Developing Assamese Information Retrieval System Considering NLP Techniques: an attempt for a low resourced language
This paper covers the activities involved in developing a monolingual Information Retrieval (IR) system for Assamese, an Indo-Aryan language. In a multilingual country like India, where 23 official languages exist, the task of digitizing local-language content is growing tremendously. To meet each individual's need for relevant information, monolingual information retrieval in one's own language is essential. The work aims to develop a search engine that retrieves relevant information for a submitted query in the user's own language. Various linguists and researchers collaborated on the work, provided valuable information, and developed important resources. Many informative resources, language resources, tools, and technologies were researched, analyzed, developed, and applied in implementing the overall pipeline. The search engine is built on the open-source search platforms Solr and Nutch, with NLP applications embedded in it. Computational Linguistics, or Natural Language Processing (NLP), enhances the performance of the IR system. Each phase of the system is described in detail in this paper and explained step by step. This work is a remarkable contribution to Assamese language technology and an important application of NLP.
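The abstract does not detail the pipeline, so the following is only a minimal sketch of the general idea of monolingual retrieval with a language-specific preprocessing step, not the authors' Solr/Nutch configuration. The stop-word list, the punctuation set, and the function names `tokenize`, `build_index`, and `search` are all assumptions made for illustration.

```python
import unicodedata
from collections import defaultdict

# Illustrative assumptions, not taken from the paper:
STOP_WORDS = {"আৰু"}   # a one-word stop list ("and")
PUNCT = "।,.?!"        # includes the danda used as a sentence terminator


def tokenize(text):
    """Normalize Unicode, split on whitespace, strip punctuation, drop stop words."""
    text = unicodedata.normalize("NFC", text)
    tokens = (tok.strip(PUNCT) for tok in text.split())
    return [t for t in tokens if t and t not in STOP_WORDS]


def build_index(docs):
    """Build an inverted index mapping each term to the set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {1: "অসমীয়া ভাষা।", 2: "অসম আৰু ভাষা"}
index = build_index(docs)
print(search(index, "অসম ভাষা"))
```

A production engine such as Solr adds ranking, stemming, and much more; the sketch only shows where language-aware tokenization sits relative to indexing and querying.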
Smartphone assisting convolutional neural networks for soil texture classification in dry and wet humid conditions in West Guwahati, Assam
Determining soil texture with the hydrometer or pipette method requires expertise, although these methods are accurate. A soil expert may help the farmer identify the soil texture by visually inspecting the soil, but this is not always accurate. This paper presents smartphone image-based classification of sand and clay soils in wet and dry humid conditions using a Self Convolution Neural Network (SCNN) and a fine-tuned MobileNet. A dataset of 576 soil images was prepared using a low-cost smartphone under natural light conditions. Different augmentation techniques such as shift, range, rotation, and zoom were applied to increase the number of images in the dataset. The best performance of the MobileNet was reported at epoch 15, with a testing and training loss of 0.0091 and 0.0194, respectively. Though the SCNN model performed best at epoch 10 with a testing accuracy of 99.85%, the MobileNet reported less computation time (167.8 s) than the SCNN (273.2 s). The precision and recall of the models were 99.62 (MobileNet) and 99.84 (SCNN). The SCNN was the best model in terms of accuracy, whereas the MobileNet was the best model in terms of computing time across the different humid conditions. The model can be used to replicate the traditional soil texture analysis method, and farmers can use it for better productivity.
Unsupervised Extractive News Articles Summarization leveraging Statistical, Topic-Modelling and Graph-based Approaches
Because of the large amounts of data available and the exponential rate at which they are generated, manual summarization is time-consuming, prone to bias, and dependent on professional linguistic experts. To avoid these issues and to generate succinct summaries, automatic text summarization is important. Three different approaches, namely a statistical approach, Term Frequency-Inverse Document Frequency (TF-IDF); a topic modeling approach, Latent Semantic Analysis (LSA); and a graph-based approach, TextRank, were applied to generate concise summaries for the benchmark British Broadcasting Corporation (BBC) news articles summarization dataset. The domain-specific implementations of each approach in the five domains of the dataset and their domain-agnostic prospects were explored in the paper while drawing various insights. The generated summaries were evaluated using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) framework, leveraging precision, recall, and F-measure metrics. The approaches not only achieved a commendable ROUGE score but also outperformed previous works on the dataset. Pages 952-962.
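As a rough illustration of the statistical approach and the evaluation metrics named above, the sketch below scores sentences by TF-IDF and computes a unigram (ROUGE-1 style) precision, recall, and F-measure. It is a simplified sketch under stated assumptions, not the paper's implementation: each sentence is treated as a "document" for the IDF term, the sentence splitter and tokenizer are naive, and the function names `tfidf_summary` and `rouge1` are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase alphanumeric tokens only.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tfidf_summary(text, k=2):
    """Return the k sentences with the highest TF-IDF score, in original order."""
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Document frequency: number of sentences that contain each term.
    df = Counter(term for toks in tokens for term in set(toks))
    scores = []
    for toks in tokens:
        if not toks:
            scores.append(0.0)
            continue
        tf = Counter(toks)
        # Sum of (relative term frequency) * (inverse sentence frequency).
        scores.append(sum((c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()))
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def rouge1(candidate, reference):
    """Unigram-overlap precision, recall, and F-measure."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # clipped unigram matches
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


precision, recall, f_measure = rouge1("the cat sat", "the cat sat on the mat")
print(precision, recall, round(f_measure, 4))
```

LSA and TextRank would replace only the scoring step; the selection of top-ranked sentences and the ROUGE-style comparison against a reference summary stay the same in this sketch.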