131 research outputs found

    HSRA: Hindi stopword removal algorithm

    In the last few years, electronic documents have been the main source of data in many research areas such as Web Mining, Information Retrieval, Artificial Intelligence, and Natural Language Processing. Text processing plays a vital role in handling structured and unstructured data from the web, and preprocessing is the main step in any text processing system. One significant preprocessing technique is the elimination of function words, also known as stopwords, which affects the performance of text processing tasks. An efficient stopword removal technique is therefore required in all text processing tasks. In this paper, we propose a stopword removal algorithm for the Hindi language that is based on a Deterministic Finite Automaton (DFA). Most existing stopword removal techniques rely on a dictionary containing a stopword list: pattern matching is applied, and each matched pattern, which is a stopword, is removed from the document. Because the search takes a long time, this approach is inefficient and expensive. In comparison, our algorithm was tested on 200 documents and achieved 99% accuracy while also being time efficient.
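
    The abstract does not describe how the automaton is built, so the following is only a minimal sketch of the general idea, assuming a trie-shaped DFA constructed from a stopword list. The names `build_dfa`, `is_stopword`, `remove_stopwords`, and the six-word `SAMPLE_STOPWORDS` list are hypothetical and not taken from the paper.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical sample list for illustration; not the list used in the paper.
SAMPLE_STOPWORDS = ["का", "की", "के", "है", "और", "में"]


def build_dfa(stopwords: Iterable[str]) -> Tuple[List[Dict[str, int]], Set[int]]:
    """Build a trie-shaped DFA. State 0 is the start state; each accepting
    state marks the end of one stopword."""
    transitions: List[Dict[str, int]] = [{}]
    accepting: Set[int] = set()
    for word in stopwords:
        state = 0
        for ch in word:
            if ch not in transitions[state]:
                transitions.append({})
                transitions[state][ch] = len(transitions) - 1
            state = transitions[state][ch]
        accepting.add(state)
    return transitions, accepting


def is_stopword(token: str, transitions: List[Dict[str, int]], accepting: Set[int]) -> bool:
    """Run the token through the DFA, one code point per transition."""
    state = 0
    for ch in token:
        nxt = transitions[state].get(ch)
        if nxt is None:
            return False  # missing transition acts as the dead state
        state = nxt
    return state in accepting


def remove_stopwords(text: str, transitions: List[Dict[str, int]], accepting: Set[int]) -> List[str]:
    """Split on whitespace and keep only tokens the DFA rejects."""
    return [t for t in text.split() if not is_stopword(t, transitions, accepting)]
```

    Each token is checked in time proportional to its own length, independent of the size of the stopword list, which is the usual motivation for preferring an automaton over repeated list searches.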

    HSAS: Hindi Subjectivity Analysis System

    With the development of Web 2.0, there is an abundance of documents expressing users' opinions, attitudes, and sentiments in textual form. This user-generated textual content is an important source of information that helps organizations and governments make sound decisions. Textual information can be categorized into two types: facts and opinions. Subjectivity analysis is the automatic extraction of subjective information from the opinions posted by users; it divides the content into subjective and objective sentences. Most work in subjectivity analysis exists for English-language data, but with the introduction of the Unicode UTF-8 standard, Hindi-language content on the web is growing very rapidly. In this paper, the Hindi Subjectivity Analysis System (HSAS) is proposed. It explores two different methods of generating a subjectivity lexicon using resources available in English, and compares them on the task of sentence-level subjectivity analysis. The first method uses the English-language OpinionFinder subjectivity lexicon. The second method uses a small seed word list in Hindi and expands it to generate a subjectivity lexicon. Different evaluation strategies are used to validate the lexicon. We achieved 71.4% agreement with human annotators and ~80% classification accuracy on a parallel data set in English and Hindi. Extensive simulations conducted on the test dataset confirm the validity of the suggested method.
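
    As a rough illustration of how a subjectivity lexicon can drive sentence-level classification, the sketch below labels a sentence as subjective when it contains at least `threshold` lexicon words. The four-word `SUBJECTIVE_WORDS` set, the function name, and the threshold rule are assumptions made for this example; the abstract does not specify the decision rule used by HSAS.

```python
from typing import Set

# Hypothetical toy lexicon; a real one would come from a translated
# resource or an expanded seed list.
SUBJECTIVE_WORDS: Set[str] = {"अच्छा", "बुरा", "सुंदर", "खराब"}


def classify_sentence(sentence: str, lexicon: Set[str], threshold: int = 1) -> str:
    """Label a sentence by counting how many of its tokens are in the lexicon."""
    # Treat the danda (Hindi full stop) as a separator before splitting.
    tokens = sentence.replace("।", " ").split()
    hits = sum(1 for token in tokens if token in lexicon)
    return "subjective" if hits >= threshold else "objective"
```

    For example, `classify_sentence("यह घर सुंदर है।", SUBJECTIVE_WORDS)` matches the lexicon word "सुंदर", while a purely factual sentence with no lexicon words falls through to the objective label.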

    HMDSAD: Hindi multi-domain sentiment aware dictionary

    Sentiment analysis is a fast-growing subarea of Natural Language Processing that extracts a user's opinion and classifies it by polarity into positive, negative, or neutral classes. This classification is required for many purposes, such as opinion mining, opinion summarization, contextual advertising, and market analysis, but it is domain dependent. The words used to convey sentiment in one domain differ from those used in another, and it is costly to annotate corpora in every possible domain of interest before training a classifier. We attempt to solve this problem by creating a sentiment-aware dictionary using data from multiple domains. The source domain data is labeled as positive or negative at the document level, and the target domain data is unlabeled. The dictionary is created using both source and target domain data. Words that express positive or negative sentiment in the labeled data are assigned relatedness weights, which signify their co-occurrence frequency with words expressing similar sentiment in the target domain. This work is carried out in Hindi, the official language of India. The number of web pages in Hindi has grown very quickly since the introduction of the UTF-8 encoding. The dictionary can be used to classify unlabeled data in the target domain by training a classifier.

    Cloud-based smart water quality monitoring system using IoT sensors and machine learning

    Low water quality is a major concern in urban as well as rural areas. Consumption of contaminated water leads to several health hazards, and early detection of water quality problems can prevent most of these health-related issues. Parameters such as conductivity, pH, nitrate, biochemical oxygen demand, and fecal coliform are significant in determining the quality of water. These parameters, which are collected from groundwater samples at different places, are highly correlated with each other. Therefore, machine learning algorithms are used for classification. The data collected from sensors are further analyzed using Ubidots, a cloud-based environment, to support distributed computing. The cloud environment is connected to display units and mobile devices. To predict the quality of water, it is necessary to check the values associated with the quality attributes; for that reason, a decision tree classification model is used. The dataset is broken into subsets, with decision nodes and leaf nodes determining the classifications. The IoT-based sensors are deployed in the water tank to measure the quality parameters, which are then sent to the cloud. The proposed framework predicts the water quality and assesses the performance of the decision tree classifier. The decision tree is used to infer decision rules based on the various parameters read by the sensors.
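
    To make the idea of a decision node concrete, here is a small sketch of how a single split can be chosen by minimizing weighted Gini impurity, which is one common criterion for decision trees. It covers one level only, not a full tree, and the abstract does not say which criterion the authors used. The readings in `SAMPLE_ROWS` and the labels in `SAMPLE_LABELS` are made up for illustration and are not data from the paper.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Made-up sensor readings for illustration only; not data from the paper.
SAMPLE_ROWS: List[Dict[str, float]] = [
    {"ph": 7.0, "nitrate": 5.0},
    {"ph": 7.2, "nitrate": 8.0},
    {"ph": 6.9, "nitrate": 40.0},
    {"ph": 7.1, "nitrate": 55.0},
]
SAMPLE_LABELS: List[str] = ["safe", "safe", "unsafe", "unsafe"]


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows: List[Dict[str, float]], labels: List[str]) -> Tuple[str, float, float]:
    """Return (feature, threshold, weighted_gini) for the best single split."""
    best: Tuple[str, float, float] = ("", 0.0, float("inf"))
    n = len(rows)
    for feature in rows[0]:
        values = sorted({row[feature] for row in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (feature, threshold, score)
    return best
```

    On the toy readings, the nitrate feature separates the two classes perfectly, so the chosen rule reads as "nitrate <= 24.0 implies safe". A full tree would apply the same search recursively to each resulting subset.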

    Product Recommendation System using Scalable Alternating Least Square Algorithm and Collaborative Filtering using Apache Spark in E-Commerce

    Recommender systems are widely used in numerous domains, such as e-commerce and entertainment, to enhance businesses by increasing the chance of sales. Earlier research has focused more on traditional Machine Learning (ML) and Artificial Intelligence (AI)-based approaches. Developing a scalable recommender system has been challenging with respect to high availability and fault tolerance. The traditional collaborative filtering approach used in recommender systems also faces challenges due to the absence of explicit product ratings by customers and the cold start problem. We propose a scalable Alternating Least Squares (ALS) and collaborative filtering-based approach for the recommender system. The experimental results of the proposed hybrid approach show improved performance compared with the traditional approach.
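
    The alternating structure of ALS can be shown with a deliberately tiny, pure-Python sketch that fits a rank-1 factorization to the observed ratings. This is not the Spark implementation the paper relies on, and it is not distributed; the function name `als_rank1`, the rank-1 restriction, and the `lam` regularization parameter are assumptions made to keep the example short.

```python
from typing import Dict, List, Tuple

Ratings = Dict[Tuple[int, int], float]  # (user, item) -> observed rating


def als_rank1(ratings: Ratings, n_users: int, n_items: int,
              lam: float = 0.1, iters: int = 200) -> Tuple[List[float], List[float]]:
    """Fit rating[user, item] ~ u[user] * v[item] on observed entries only."""
    u = [1.0] * n_users
    v = [1.0] * n_items
    for _ in range(iters):
        # Fix the item factors and solve each user factor in closed form.
        for user in range(n_users):
            num = sum(r * v[j] for (i, j), r in ratings.items() if i == user)
            den = lam + sum(v[j] ** 2 for (i, j) in ratings if i == user)
            if den > 0:
                u[user] = num / den
        # Fix the user factors and solve each item factor in closed form.
        for item in range(n_items):
            num = sum(r * u[i] for (i, j), r in ratings.items() if j == item)
            den = lam + sum(u[i] ** 2 for (i, j) in ratings if j == item)
            if den > 0:
                v[item] = num / den
    return u, v
```

    The product `u[user] * v[item]` then serves as the predicted rating for a pair that was never observed. Each half-step is an independent least-squares problem per user or per item, which is what makes the method straightforward to parallelize.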

    Development of Facial Expression Classifier using Neural Networks

    A person's emotional and mental well-being, together with age, sex, and race, can be easily read from the face. Facial expressions play a crucial role in day-to-day social interactions, and an individual's emotional level as well as behavioral manners can be interpreted from them. Facial expression classification is an evolving, demanding, and intriguing problem in computer vision. It has potential applications in robotics, behavioral science, human-computer interaction, video games, etc. It assists in building more intelligent systems that are better able to interpret human emotions. In this paper, a facial expression classifier based on Convolutional Neural Networks (CNNs) is proposed. CNNs are biologically inspired variants of multi-layer perceptron (MLP) networks. They use an architecture that is particularly well suited to classifying images. Detection of facial expression can be enhanced by

    Sentiment Analysis for E-Commerce Products Using Natural Language Processing

    Sentiment analysis is one of the ways to evaluate the attitude of consumers towards products and services. E-commerce businesses have grown considerably in recent years. Customers' opinions and preferences are collected and analyzed to boost online businesses. Collecting real-time structured and unstructured data and performing sentiment analysis on it are challenging tasks that need to be addressed. We have used PySpark and resilient distributed dataset (RDD) based sentiment analysis with Spark NLP to address scalability and availability issues in sentiment analysis on an e-commerce platform. We have also used Flask-based RESTful APIs and Scrapy for web scraping to collect useful data from an e-commerce site. Our findings indicate that the proposed method of Natural Language Processing (NLP) for e-commerce products in real time has enhanced efficiency in terms of scalability, availability, and faster data collection

    Sentiment analysis in a resource scarce language: Hindi

    A common human behavior is to seek others' opinions before making a decision. With the tremendous availability of documents that express opinions on different issues, the challenge is to analyze them and produce useful knowledge from them. Much work in the area of Sentiment Analysis is available for the English language. Over the last few years, opinion-rich resources have been booming in other languages, and hence there is a need to perform Sentiment Analysis in those languages. In this paper, a Sentiment Analysis in Hindi Language

    Reputation systems: Evaluating reputation among all good sellers

    A reputation system helps people choose whom to trust, encourages trustworthy behavior, and discourages participation by the unskilled or dishonest. The “all good reputation” problem is common in current reputation systems, especially in the e-commerce domain, and makes it difficult for buyers to choose credible sellers. Observing the high growth of online data in the Hindi language, in this paper we propose a reputation system for this language. The functions of this system include (1) review mining for different criteria of online transactions, (2) calculation of a reputation rating using a Bayesian method, (3) calculation of a reputation weight for each criterion from user reviews, using typed dependency relation representation and the Latent Dirichlet Allocation topic modeling technique, and (4) ranking of sellers based on the computed reputation score. Extensive simulations conducted on eBay dataset and TripAdvisor dataset show its
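
    The abstract does not say which Bayesian method is used for step (2). One widely used formulation, assumed here purely for illustration, is the expected value of a Beta posterior over positive and negative feedback counts. The function names and the seller data below are hypothetical.

```python
from typing import Dict, List, Tuple


def beta_reputation(positive: int, negative: int) -> float:
    """Expected value of a Beta(positive + 1, negative + 1) posterior.

    Assumed formulation; the paper's exact Bayesian method is not given
    in the abstract.
    """
    return (positive + 1) / (positive + negative + 2)


def rank_sellers(feedback: Dict[str, Tuple[int, int]]) -> List[str]:
    """Sort seller names by reputation score, highest first."""
    return sorted(feedback, key=lambda name: beta_reputation(*feedback[name]), reverse=True)
```

    Under this formulation, two sellers with only positive feedback still receive different scores when the amount of evidence differs: 5 positive reviews give 6/7, while 50 positive reviews give 51/52. That property is one way to separate sellers who would otherwise all look equally good.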

    Generating multilingual subjectivity resources using english language

    Text data can be of two types: facts and opinions. With the introduction of the UTF-8 standard and the development of Web 2.0, there is an abundance of opinionated text data available in many languages on the web. Subjectivity analysis aims to divide such opinionated data into subjective and objective sentences and to automatically extract subjective information from it. Many subjectivity resources as well as subjectivity analysis works are available for the English language. In this paper, we examine different methods of generating subjectivity resources in Hindi and other Indian languages using resources and tools available in English. Two methods are proposed using word-level subjectivity annotations. These methods use, respectively, the English-language OpinionFinder subjectivity lexicon and a small seed word list in Hindi that can be expanded to generate a subjectivity lexicon. Four methods are proposed using sentence-level subjectivity annotations. These methods use subjectivity-annotated corpora and tools available in English. Different evaluation strategies are used to validate the generated lexicon and corpora in Hindi. The simulations conducted confirm that these methods are effective in rapidly creating subjectivity resources in Hindi and other Indian languages.
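
    How the seed list is expanded is not stated in the abstract. A common approach, assumed here only as an illustration, is to follow synonym links from each seed word for a limited number of steps. The `synonyms` mapping in this sketch is a hypothetical stand-in for a lexical resource, and `expand_seed_list` is not a name from the paper.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Set, Tuple


def expand_seed_list(seeds: Iterable[str], synonyms: Dict[str, List[str]],
                     max_depth: int = 2) -> Set[str]:
    """Breadth-first expansion of a seed list over a synonym mapping."""
    lexicon: Set[str] = set(seeds)
    queue: Deque[Tuple[str, int]] = deque((word, 0) for word in lexicon)
    while queue:
        word, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour in synonyms.get(word, []):
            if neighbour not in lexicon:
                lexicon.add(neighbour)
                queue.append((neighbour, depth + 1))
    return lexicon
```

    Limiting the depth matters because words several synonym steps away from a seed tend to drift in meaning, so a deeper expansion trades precision for coverage.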