94 research outputs found

    A Partitioning Based Algorithm to Fuzzy Tricluster

    Fuzzy clustering allows an object to belong to multiple clusters and represents the affiliation of objects to clusters by memberships. It is extended to fuzzy coclustering by assigning membership functions to both objects and features. In this paper we propose a new fuzzy triclustering (FTC) algorithm for automatic categorization of three-dimensional data collections. FTC specifies a membership function for each dimension and is able to generate fuzzy clusters simultaneously on three dimensions. Thus FTC divides a three-dimensional cube into many small blocks, which should be triclusters with strong coherent bonding among their members. Experimental studies on MovieLens demonstrate the strength of FTC in terms of accuracy compared to some recent popular fuzzy clustering and coclustering approaches.
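
The idea of representing affiliation by memberships can be illustrated with the standard fuzzy c-means membership formula. This is only a sketch for one point and fixed cluster centers, not the FTC algorithm from the abstract; the function name `fuzzy_memberships` and the fuzzifier default `m = 2.0` are assumptions made for illustration.

```python
from typing import List, Sequence


def fuzzy_memberships(point: Sequence[float],
                      centers: Sequence[Sequence[float]],
                      m: float = 2.0) -> List[float]:
    """Return standard fuzzy c-means memberships of one point (requires m > 1)."""
    # Squared Euclidean distance from the point to every cluster center.
    d2 = [sum((p - c) ** 2 for p, c in zip(point, center)) for center in centers]
    # A point lying exactly on a center belongs fully to it
    # (split evenly if it coincides with several centers).
    zeros = [i for i, d in enumerate(d2) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(centers))]
    # u_i = d_i^(-2/(m-1)) / sum_j d_j^(-2/(m-1)); d2 already holds d^2.
    exponent = -1.0 / (m - 1.0)
    weights = [d ** exponent for d in d2]
    total = sum(weights)
    return [w / total for w in weights]
```

Memberships of a point always sum to 1, so an object close to one center still keeps a small, non-zero affiliation with the others.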

    Clustering in Aggregated User Profiles Across Multiple Social Networks

    A social network is an abstraction of related groups interacting among themselves to develop relationships. To analyze these relationships and the psychology behind them, clustering plays a vital role. Clustering enhances the predictability and discovery of like-mindedness among users. This article uses ensemble K-means clustering to extract entities and their corresponding interests, according to skills and location, by aggregating user profiles across multiple online social networks. The proposed ensemble clustering uses the well-known K-means algorithm to improve results for the aggregated user profiles. The approach produces an ensemble similarity measure and provides 70% better results than taking a fixed value of K or guessing a value of K while not altering the clustering method. The paper states that good ensemble clusters can be generated to envisage the discoverability of a user for a particular interest.
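
One common way to build an ensemble similarity measure without fixing K is a co-association matrix: run K-means for several values of K and record how often each pair of items ends up in the same cluster. The sketch below assumes that construction; the abstract does not say which measure the paper uses. The function names, the one-dimensional toy data, and the seeding scheme are all hypothetical.

```python
import random
from typing import List


def kmeans_1d(values: List[float], k: int, seed: int, iters: int = 50) -> List[int]:
    """Plain K-means on one-dimensional data; returns a cluster label per value."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)  # initialise centers at k distinct data points
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value goes to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels


def co_association(values: List[float], ks: List[int],
                   runs_per_k: int = 5) -> List[List[float]]:
    """Fraction of K-means runs in which each pair of items shares a cluster."""
    n = len(values)
    counts = [[0] * n for _ in range(n)]
    total = 0
    for k in ks:
        for run in range(runs_per_k):
            labels = kmeans_1d(values, k, seed=1000 * k + run)
            total += 1
            for a in range(n):
                for b in range(n):
                    if labels[a] == labels[b]:
                        counts[a][b] += 1
    return [[c / total for c in row] for row in counts]
```

Pairs that are grouped together across many values of K receive a similarity near 1, so the final grouping depends less on any single choice of K.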

    Flexible document organization: comparing fuzzy and possibilistic approaches

    System flexibility means the ability of a system to manage imprecise and/or uncertain information. Many commercially available Information Retrieval Systems (IRS) address this issue at the level of query formulation. Another way to make an IRS flexible is through the flexible organization of documents. Such organization can be carried out using clustering algorithms by which documents can be automatically organized in multiple clusters simultaneously. Fuzzy and possibilistic clustering algorithms are examples of methods by which documents can belong to more than one cluster simultaneously with different membership degrees. The interpretation of these membership degrees can be used to quantify the compatibility of a document with a particular topic. The topics are represented by clusters and the clusters are identified by one or more descriptors extracted by a proposed method. We aim to investigate whether the performance of each clustering algorithm can affect the extraction of meaningful overlapping cluster descriptors. Experiments were carried out using well-known collections of documents, and the predictive power of the descriptors extracted from both fuzzy and possibilistic document clustering was evaluated. The results prove that descriptors extracted after both fuzzy and possibilistic clustering are effective and can improve the flexible organization of documents. Funding: CAPES (Coordination for the Improvement of Higher Level Personnel) (PDSE grant 5983-11-8); FAPESP (Sao Paulo Research Foundation) (grant 2011/19850-9).

    Detection of Masses in Digital Mammograms using K-means and Support Vector Machine

    Breast cancer is a serious public health problem in several countries. Computer Aided Detection/Diagnosis systems (CAD/CADx) have been used with relative success to aid health care professionals. The goal of such systems is to support the specialist by aiding the detection of different types of cancer at an early stage. This work presents a methodology for mass detection on digitized mammograms using the K-means algorithm for image segmentation and a co-occurrence matrix to describe the texture of segmented structures. Classification of these structures is accomplished through Support Vector Machines, which separate them into two groups, masses and non-masses, using shape and texture descriptors. The methodology achieved 85% accuracy.
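
The texture step can be illustrated with a gray-level co-occurrence matrix, which counts how often a pixel of level i has a neighbor of level j at a fixed offset. This is a minimal sketch on a tiny quantized image, not the paper's pipeline; the function names, the offset defaults, and the choice of contrast as the example descriptor are assumptions.

```python
from typing import List


def cooccurrence_matrix(image: List[List[int]], levels: int,
                        dx: int = 1, dy: int = 0) -> List[List[int]]:
    """Count pairs (image[r][c], image[r + dy][c + dx]) for every valid pixel."""
    glcm = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Skip neighbors that fall outside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                glcm[image[r][c]][image[r2][c2]] += 1
    return glcm


def contrast(glcm: List[List[int]]) -> float:
    """Contrast descriptor: mean squared gray-level difference over counted pairs."""
    total = sum(sum(row) for row in glcm)
    weighted = sum((i - j) ** 2 * count
                   for i, row in enumerate(glcm)
                   for j, count in enumerate(row))
    return weighted / total
```

Descriptors such as contrast, computed per segmented region, could then be passed to a classifier together with shape features.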

    Representation Learning for Words and Entities

    This thesis presents new methods for unsupervised learning of distributed representations of words and entities from text and knowledge bases. The first algorithm presented in the thesis is a multi-view algorithm for learning representations of words called Multiview Latent Semantic Analysis (MVLSA). By incorporating up to 46 different types of co-occurrence statistics for the same vocabulary of English words, I show that MVLSA outperforms other state-of-the-art word embedding models. Next, I focus on learning entity representations for search and recommendation and present the second method of this thesis, Neural Variational Set Expansion (NVSE). NVSE is also an unsupervised learning method, but it is based on the Variational Autoencoder framework. Evaluations with human annotators show that NVSE can facilitate better search and recommendation of information gathered from noisy, automatic annotation of unstructured natural language corpora. Finally, I move from unstructured data and focus on structured knowledge graphs. I present novel approaches for learning embeddings of vertices and edges in a knowledge graph that obey logical constraints. Comment: PhD thesis, Machine Learning, Natural Language Processing, Representation Learning, Knowledge Graphs, Entities, Word Embeddings, Entity Embedding

    Linking Folksonomies and Ontologies for Supporting Knowledge Sharing: a State of the Art

    Deliverable of the ISICIL ANR-funded project. Social tagging systems have recently become very popular as a means to classify large sets of resources shared among on-line communities over the social Web. However, the folksonomies resulting from the use of these systems revealed limitations: tags are ambiguous and their spelling may vary, and folksonomies are difficult to exploit in order to retrieve or exchange information. This report compares the recent attempts to overcome these limitations and to support the use of folksonomies with formal languages and ontologies from the Semantic Web.

    Representation Learning for Words and Entities

    This thesis presents new methods for unsupervised learning of distributed representations of words and entities from text and knowledge bases. The first algorithm presented in the thesis is a multi-view algorithm for learning representations of words called Multiview LSA (MVLSA). Through experiments on close to 50 different views, I show that MVLSA outperforms other state-of-the-art word embedding models. After that, I focus on learning entity representations for search and recommendation and present the second algorithm of this thesis, called Neural Variational Set Expansion (NVSE). NVSE is also an unsupervised learning method, but it is based on the Variational Autoencoder framework. Evaluations with human annotators show that NVSE can facilitate better search and recommendation of information gathered from noisy, automatic annotation of unstructured natural language corpora. Finally, I move from unstructured data and focus on structured knowledge graphs, presenting novel approaches for learning embeddings of vertices and edges in a knowledge graph that obey logical constraints.

    Naive Bayesian Automatic Classification of Railway Service Complaint Text Based on Eigenvalue Extraction

    Railways have developed rapidly in China for several decades. The hardware of railways has already reached the world's leading level, but the level of service of these railways still has room for improvement. The railway management department receives a large number of passenger complaints every year and records them as text, which needs to be classified and analyzed. Railway complaint text is characterized by wide business coverage, varied events, heavy colloquialism, interference and useless information. When such text is classified directly with traditional text categorization, the classification accuracy is low. The key to the automatic classification of such text lies in eigenvalue extraction. The more accurate the eigenvalue extraction, the higher the accuracy of text classification. In this paper, the TF-IDF algorithm, TextRank algorithm and Word2vec algorithm are selected to extract text eigenvalues, and a railway complaint text classification method is constructed with a naive Bayesian classifier. The three types of eigenvalue extraction algorithms are compared. Eigenvalue extraction based on the TF-IDF algorithm achieves the highest automatic text classification accuracy.
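
The extract-then-classify pipeline can be sketched as follows, reading the abstract's "eigenvalue extraction" as keyword (feature) extraction, which is an interpretation. Only the TF-IDF variant is shown, with a multinomial naive Bayes classifier using add-one smoothing. The function names, the smoothing choice, and the tokenized-input format are assumptions, not details taken from the paper.

```python
import math
from collections import Counter, defaultdict
from typing import List


def tfidf_keywords(docs: List[List[str]], top_n: int) -> List[List[str]]:
    """Keep the top_n highest TF-IDF terms of each tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    selected = []
    for doc in docs:
        counts = Counter(doc)
        scores = {t: (c / len(doc)) * math.log(n_docs / doc_freq[t])
                  for t, c in counts.items()}
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        selected.append(ranked[:top_n])
    return selected


def train_naive_bayes(features: List[List[str]], labels: List[str]):
    """Collect class counts and per-class term counts from keyword lists."""
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    vocab = set()
    for terms, label in zip(features, labels):
        term_counts[label].update(terms)
        vocab.update(terms)
    return class_counts, term_counts, vocab


def predict(model, terms: List[str]) -> str:
    """Return the class with the highest smoothed log-posterior."""
    class_counts, term_counts, vocab = model
    n = sum(class_counts.values())
    best_label, best_score = "", -math.inf
    for label, count in class_counts.items():
        total = sum(term_counts[label].values())
        score = math.log(count / n)
        for t in terms:
            if t in vocab:  # terms never seen in training are ignored
                score += math.log((term_counts[label][t] + 1)
                                  / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

A term that occurs in every document gets a TF-IDF score of zero, so filler words drop out before the classifier sees them.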

    Pattern Recognition

    Pattern recognition is a very wide research field. It involves factors as diverse as sensors, feature extraction, pattern classification, decision fusion, applications and others. The signals processed are commonly one, two or three dimensional; the processing is done in real-time or takes hours and days; some systems look for one narrow object class, others search huge databases for entries with at least a small amount of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms and comprehends several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments within the area of pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented works. Authors of these 25 works present and advocate recent achievements of their research related to the field of pattern recognition.