4,504 research outputs found

    ENHANCED NEIGHBORHOOD NORMALIZED POINTWISE MUTUAL INFORMATION ALGORITHM FOR CONSTRAINT AWARE DATA CLUSTERING

    Get PDF
    Clustering of similar data items is an important technique in mining useful patterns. To enhance the performance of Clustering, training or learning is an important task. A constraint learning semi-supervised methodology is proposed which incorporates SVM and Normalized Pointwise Mutual Information Computation Strategy to increase the relevance as well as the performance efficiency of clustering. The SVM Classifier is of Hard Margin Type to roughly classify the initial set. A recursive re-clustering approach is proposed for achieving higher degree of relevance in the final clustered set by incorporating ENNPI algorithm. An overall enriched F-Measure value of 94.09% is achieved as compared to existing algorithms

    Enhanced neighborhood normalized pointwise mutual information algorithm for constraint aware data clustering

    Get PDF
    Clustering of similar data items is an important technique in mining useful patterns. To enhance the performance of Clustering, training or learning is an important task. A constraint learning semi-supervised methodology is proposed which incorporates SVM and Normalized Point wise Mutual Information Computation Strategy to increase the relevance as well as the performance efficiency of clustering. The SVM Classifier is of Hard Margin Type to roughly classify the initial set. A recursive re-clustering approach is proposed for achieving higher degree of relevance in the final clustered set by incorporating ENNPI algorithm. An overall enriched F-Measure value of 94.09% is achieved as compared to existing algorithms

    Business Intelligence from Web Usage Mining

    Full text link
    The rapid e-commerce growth has made both business community and customers face a new situation. Due to intense competition on one hand and the customer's option to choose from several alternatives business community has realized the necessity of intelligent marketing strategies and relationship management. Web usage mining attempts to discover useful knowledge from the secondary data obtained from the interactions of the users with the Web. Web usage mining has become very critical for effective Web site management, creating adaptive Web sites, business and support services, personalization, network traffic flow analysis and so on. In this paper, we present the important concepts of Web usage mining and its various practical applications. We further present a novel approach 'intelligent-miner' (i-Miner) to optimize the concurrent architecture of a fuzzy clustering algorithm (to discover web data clusters) and a fuzzy inference system to analyze the Web site visitor trends. A hybrid evolutionary fuzzy clustering algorithm is proposed in this paper to optimally segregate similar user interests. The clustered data is then used to analyze the trends using a Takagi-Sugeno fuzzy inference system learned using a combination of evolutionary algorithm and neural network learning. Proposed approach is compared with self-organizing maps (to discover patterns) and several function approximation techniques like neural networks, linear genetic programming and Takagi-Sugeno fuzzy inference system (to analyze the clusters). The results are graphically illustrated and the practical significance is discussed in detail. Empirical results clearly show that the proposed Web usage-mining framework is efficient

    Leveraging Mobile App Classification and User Context Information for Improving Recommendation Systems

    Get PDF
    Mobile apps play a significant role in current online environments where there is an overwhelming supply of information. Although mobile apps are part of our daily routine, searching and finding mobile apps is becoming a nontrivial task due to the current volume, velocity and variety of information. Therefore, app recommender systems provide users’ desired apps based on their preferences. However, current recommender systems and their underlying techniques are limited in effectively leveraging app classification schemes and context information. In this thesis, I attempt to address this gap by proposing a text analytics framework for mobile app recommendation by leveraging an app classification scheme that incorporates the needs of users as well as the complexity of the user-item-context information in mobile app usage pattern. In this recommendation framework, I adopt and empirically test an app classification scheme based on textual information about mobile apps using data from Google Play store. In addition, I demonstrate how context information such as user social media status can be matched with app classification categories using tree-based and rule-based prediction algorithms. Methodology wise, my research attempts to show the feasibility of textual data analysis in profiling apps based on app descriptions and other structured attributes, as well as explore mechanisms for matching user preferences and context information with app usage categories. Practically, the proposed text analytics framework can allow app developers reach a wider usage base through better understanding of user motivation and context information

    Unsupervised Machine Learning Approach for Tigrigna Word Sense Disambiguation

    Get PDF
    All human languages have words that can mean different things in different contexts. Word sense disambiguation (WSD) is an open problem of natural language processing, which governs the process of identifying which sense of a word (i.e. meaning) is used in a sentence, when the word has multiple meanings (polysemy). We use unsupervised machine learning techniques to address the problem of automatically deciding the correct sense of an ambiguous word Tigrigna texts based on its surrounding context. And we report experiments on four selected Tigrigna ambiguous words due to lack of sufficient training data; these are መደብ read as “medeb” has three different meaning (Program, Traditional bed and Grouping), ሓለፈ read as “halefe”; has four dissimilar meanings (Pass, Promote, Boss and Pass away), ሃደመ read as “hademe”; has two different meaning (Running and Building house) and, ከበረ read as “kebere”; has two different meaning (Respecting and Expensive).Finally we tested five clustering algorithms (simple k means, hierarchical agglomerative: Single, Average and complete link and Expectation Maximization algorithms) in the existing implementation of Weka 3.8.1 package. “Use training set” evaluation mode was selected to learn the selected algorithms in the preprocessed dataset. We have evaluated the algorithms for the four ambiguous words and achieved the best accuracy within the range of 67 to 83.3 for EM which is encouraging result. Keywords: Attribute- Relation File Format, Cross Validation, Consonant Vowel, Machine Readable Dictionary, Natural Language Processing, System for Ethiopic Representation in ASCII, Word Sense Disambiguatio

    Exploiting the conceptual space in hybrid recommender systems: a semantic-based approach

    Full text link
    Tesis doctoral inédita. Universidad Autónoma de Madrid, Escuela Politécnica Superior, octubre de 200

    Text-based Sentiment Analysis and Music Emotion Recognition

    Get PDF
    Nowadays, with the expansion of social media, large amounts of user-generated texts like tweets, blog posts or product reviews are shared online. Sentiment polarity analysis of such texts has become highly attractive and is utilized in recommender systems, market predictions, business intelligence and more. We also witness deep learning techniques becoming top performers on those types of tasks. There are however several problems that need to be solved for efficient use of deep neural networks on text mining and text polarity analysis. First of all, deep neural networks are data hungry. They need to be fed with datasets that are big in size, cleaned and preprocessed as well as properly labeled. Second, the modern natural language processing concept of word embeddings as a dense and distributed text feature representation solves sparsity and dimensionality problems of the traditional bag-of-words model. Still, there are various uncertainties regarding the use of word vectors: should they be generated from the same dataset that is used to train the model or it is better to source them from big and popular collections that work as generic text feature representations? Third, it is not easy for practitioners to find a simple and highly effective deep learning setup for various document lengths and types. Recurrent neural networks are weak with longer texts and optimal convolution-pooling combinations are not easily conceived. It is thus convenient to have generic neural network architectures that are effective and can adapt to various texts, encapsulating much of design complexity. This thesis addresses the above problems to provide methodological and practical insights for utilizing neural networks on sentiment analysis of texts and achieving state of the art results. Regarding the first problem, the effectiveness of various crowdsourcing alternatives is explored and two medium-sized and emotion-labeled song datasets are created utilizing social tags. One of the research interests of Telecom Italia was the exploration of relations between music emotional stimulation and driving style. Consequently, a context-aware music recommender system that aims to enhance driving comfort and safety was also designed. To address the second problem, a series of experiments with large text collections of various contents and domains were conducted. Word embeddings of different parameters were exercised and results revealed that their quality is influenced (mostly but not only) by the size of texts they were created from. When working with small text datasets, it is thus important to source word features from popular and generic word embedding collections. Regarding the third problem, a series of experiments involving convolutional and max-pooling neural layers were conducted. Various patterns relating text properties and network parameters with optimal classification accuracy were observed. Combining convolutions of words, bigrams, and trigrams with regional max-pooling layers in a couple of stacks produced the best results. The derived architecture achieves competitive performance on sentiment polarity analysis of movie, business and product reviews. Given that labeled data are becoming the bottleneck of the current deep learning systems, a future research direction could be the exploration of various data programming possibilities for constructing even bigger labeled datasets. Investigation of feature-level or decision-level ensemble techniques in the context of deep neural networks could also be fruitful. Different feature types do usually represent complementary characteristics of data. Combining word embedding and traditional text features or utilizing recurrent networks on document splits and then aggregating the predictions could further increase prediction accuracy of such models

    A smart itsy bitsy spider for the Web

    Get PDF
    Artificial Intelligence Lab, Department of MIS, University of ArizonaAs part of the ongoing Illinois Digital Library Initiative project, this research proposes an intelligent agent approach to Web searching. In this experiment, we developed two Web personal spiders based on best first search and genetic algorithm techniques, respectively. These personal spiders can dynamically take a userâ s selected starting homepages and search for the most closely related homepages in the Web, based on the links and keyword indexing. A graphical, dynamic, Java-based interface was developed and is available for Web access. A system architecture for implementing such an agent-based spider is presented, followed by detailed discussions of benchmark testing and user evaluation results. In benchmark testing, although the genetic algorithm spider did not outperform the best first search spider, we found both results to be comparable and complementary. In user evaluation, the genetic algorithm spider obtained significantly higher recall value than that of the best first search spider. However, their precision values were not statistically different. The mutation process introduced in genetic algorithm allows users to find other potential relevant homepages that cannot be explored via a conventional local search process. In addition, we found the Java-based interface to be a necessary component for design of a truly interactive and dynamic Web agent
    corecore