2,003 research outputs found

    Content-Based Image Retrieval Using Self-Organizing Maps

    Dagstuhl News January - December 2011

    "Dagstuhl News" is a publication edited especially for the members of the Foundation "Informatikzentrum Schloss Dagstuhl" to thank them for their support. The News gives a summary of the scientific work being done at Dagstuhl. Each Dagstuhl Seminar is presented in a short abstract describing the contents and scientific highlights of the seminar, as well as the perspectives or challenges of the research topic.

    Validação de heterogeneidade estrutural em dados de Crio-ME por comitês de agrupadores (Validation of structural heterogeneity in cryo-EM data by cluster ensembles)

    Advisors: Fernando José Von Zuben, Rodrigo Villares Portugal. Dissertation (master's), Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação. Program: Computer Engineering. Degree: Master in Electrical Engineering.
    Abstract: Single Particle Analysis is a technique that allows the study of the three-dimensional structure of proteins and other macromolecular assemblies of biological interest. Its primary data consist of transmission electron microscopy images of multiple copies of the molecule in random orientations. Such images are very noisy because of the low electron dose employed. A 3D reconstruction of the macromolecule can be obtained by combining many images of particles in similar orientations and estimating their relative angles. However, heterogeneous conformational states often coexist in the sample, because the molecular complexes can be flexible and may also interact with other particles. Heterogeneity poses a challenge to the reconstruction of reliable 3D models and degrades their resolution. Among the most popular algorithms used for structural classification are k-means clustering, hierarchical clustering, self-organizing maps and maximum-likelihood estimators. Such approaches are usually intertwined with the reconstruction of the 3D models. Nevertheless, recent work indicates that it is possible to infer information about the structure of the molecules directly from the set of 2D projections. Among these findings is the relationship between structural variability and manifolds in a multidimensional feature space. This dissertation investigates whether an ensemble of unsupervised classification algorithms is able to separate these "conformational manifolds".
    Ensemble or "consensus" methods tend to provide more accurate classification and may achieve satisfactory performance across a wide range of datasets when compared with individual algorithms. We investigate the behavior of six clustering algorithms, both individually and combined in ensembles, for the task of classifying conformational heterogeneity. The approach was tested on synthetic and real datasets containing a mixture of projection images of the Mm-cpn chaperonin in the "open" and "closed" states. It is shown that cluster ensembles can provide useful information for validating structural partitionings independently of 3D reconstruction methods.
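
    The consensus idea in the abstract above can be illustrated with a co-association matrix, one common way to combine several clusterings. This is a minimal sketch under stated assumptions, not the dissertation's method: the abstract does not say which consensus function was used, and the names `co_association`, `consensus_labels` and the 0.5 threshold are hypothetical choices made for illustration.

```python
def co_association(partitions):
    """Fraction of partitions in which each pair of items shares a cluster."""
    n = len(partitions[0])
    runs = len(partitions)
    return [[sum(1 for labels in partitions if labels[i] == labels[j]) / runs
             for j in range(n)] for i in range(n)]


def consensus_labels(partitions, threshold=0.5):
    """Group items whose co-association exceeds the threshold.

    Assumption: the consensus partition is taken as the connected
    components of the thresholded co-association graph.
    """
    matrix = co_association(partitions)
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first walk over all items linked to `start`.
        stack = [start]
        labels[start] = current
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three hypothetical clusterings of four items; label values are arbitrary
# per run, only the co-membership pattern matters.
runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
print(consensus_labels(runs))
```

    Pairs that most runs place together stay together in the consensus, so a single dissenting run (the third one here) does not change the result.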

    Implementation of Fuzzy C-Means for Clustering the Majelis Ulama Indonesia (MUI) Fatwa Documents

    Since its establishment in 1975, the Indonesian Ulema Council (MUI) has issued 201 edicts (fatwas) covering various fields. Text mining is a technique for extracting hidden information from textual data, and clustering is one text-mining method. The present study applies the Fuzzy C-Means clustering method to MUI fatwa documents in order to group existing fatwas by the similarity of the issues they discuss. The Silhouette Coefficient is used to evaluate the resulting clusters; the best value, 0.0982, was obtained with 10 clusters. Grouping fatwas by the similarity of the issues discussed can make searching for Islamic legal rulings in Indonesia easier and faster.
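
    For readers unfamiliar with Fuzzy C-Means, the alternating update at its core can be sketched in a few lines. This is an illustrative sketch only, not the study's implementation: the function name `fuzzy_c_means`, the fuzzifier `m=2.0`, the farthest-point initialisation and the toy 2-D points are all assumptions, and real fatwa documents would first need to be turned into numeric vectors (for example term weights), which is not shown.

```python
import math


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-9):
    """Return (centers, memberships); each membership row sums to 1."""
    n, dim = len(points), len(points[0])
    # Assumed initialisation: first point, then repeatedly the point
    # farthest from the centers chosen so far (deterministic).
    centers = [list(points[0])]
    while len(centers) < c:
        far = max(points, key=lambda p: min(math.dist(p, q) for q in centers))
        centers.append(list(far))
    power = 2.0 / (m - 1.0)
    u = []
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        u = []
        for p in points:
            d = [max(math.dist(p, q), 1e-12) for q in centers]
            u.append([1.0 / sum((d[j] / d[k]) ** power for k in range(c))
                      for j in range(c)])
        # Center update: mean of the points weighted by u_ij^m.
        new_centers = []
        for j in range(c):
            w = [row[j] ** m for row in u]
            total = sum(w)
            new_centers.append([sum(w[i] * points[i][a] for i in range(n)) / total
                                for a in range(dim)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u


# Toy data: two loose groups of 2-D points (hypothetical values).
pts = [[0.0, 0.0], [0.2, 0.9], [1.1, 0.1],
       [10.0, 10.0], [10.3, 11.2], [11.4, 9.8]]
centers, memberships = fuzzy_c_means(pts, c=2)
hard_labels = [row.index(max(row)) for row in memberships]
print(hard_labels)
```

    Unlike k-means, every document keeps a degree of membership in every cluster; taking the largest membership per row gives the hard labels on which a silhouette score could then be computed.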

    A DEEP LEARNING APPROACH FOR SENTIMENT ANALYSIS

    Sentiment Analysis refers to the process of computationally identifying and categorizing opinions expressed in a piece of text, in order to determine whether the writer's attitude towards a particular topic or product is positive, negative, or neutral. The views expressed and related concepts, such as feelings, judgments, and emotions, have recently become a subject of study and research in both academia and industry. Unfortunately, understanding the language of user comments, especially in social networks, is inherently difficult for computers.
    The ways in which humans express themselves in natural language are nearly unlimited, and informal text is riddled with typos, misspellings, slang, badly formed syntactic constructions and platform-specific symbols (e.g. hashtags on Twitter), all of which considerably complicate the task. Recently, deep learning approaches have been emerging as powerful computational models that discover intricate semantic representations of text automatically from data, without hand-crafted feature engineering. These approaches have improved the state of the art in many Sentiment Analysis tasks, including sentiment classification of sentences or documents, sentiment lexicon learning, and more complex problems such as cyberbullying detection. The contributions of this work are twofold. First, for the general Sentiment Analysis problem, we propose a semi-supervised neural network model, based on Deep Belief Networks, that can deal with data uncertainty in text sentences in the Italian language. We test this model on several movie-review datasets from the literature, adopting a vector representation of text (Word2Vec) and using pre-processing methods from Natural Language Processing (NLP). Second, assuming that cyberbullying can be treated as a particular Sentiment Analysis problem, we propose an unsupervised approach to automatic cyberbullying detection in social networks, based both on a Growing Hierarchical Self-Organizing Map (GHSOM) and on a new, specific feature model, and show that our solution can achieve interesting results compared with classical supervised approaches.

    Strategies for image visualisation and browsing

    PhD thesis. The exploration of large information spaces remains a challenging task, even as database management systems and state-of-the-art retrieval algorithms become pervasive. Significant research attention in the multimedia domain is focused on finding automatic algorithms for organising digital image collections into meaningful structures and providing high-level semantic image indices. On the other hand, graphical and interactive methods from the information visualisation domain offer a promising direction for creating efficient user-oriented systems for image management. Methods such as exploratory browsing and query, as well as intuitive visual overviews of an image collection, can assist users in finding patterns and developing an understanding of the structures and content in complex image data-sets. The focus of the thesis is combining the features of automatic data processing algorithms with information visualisation. The first part of the thesis focuses on a layout method for displaying a collection of images indexed by low-level visual descriptors. The proposed solution generates a graphical overview of the data-set as a combination of similarity-based visualisation and a random layout approach. The second part of the thesis deals with the problem of visualisation and exploration for hierarchical organisations of images. In the absence of semantic information, the images are considered the only source of high-level information. The content preview and the display of the hierarchical structure are combined in order to support image retrieval. In addition, novel exploration and navigation methods are proposed to enable the user to find their way through the database structure and retrieve the content. On the other hand, semantic information is available in cases where automatic or semi-automatic image classifiers are employed. The automatic annotation of image items provides what is referred to as higher-level information.
    This type of information is the cornerstone of the multi-concept visualisation framework developed in the third part of the thesis. This solution enables dynamic generation of user queries by combining semantic concepts, supported by content overview and information filtering. Comparative analysis and user tests, performed to evaluate the proposed solutions, focus on the ways information visualisation affects image content exploration and retrieval, how efficient and comfortable users are when using different interaction methods, and the ways users seek information through different types of database organisation.

    Navigating the space of your music

    Thesis (S.M.), Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2008. Includes bibliographical references (p. 121-124). By Anita Shen Lillie.
    Navigating increasingly large personal music libraries is commonplace. Yet most music browsers do not enable their users to explore their collections in a guided and manipulable fashion, often requiring them to have a specific target in mind. MusicBox is a new music browser that provides this interactive control by mapping a music collection into a two-dimensional space, applying principal components analysis (PCA) to a combination of contextual and content-based features of each of the musical tracks. The resulting map shows similar songs close together and dissimilar songs farther apart. MusicBox is fully interactive and highly flexible: users can add and remove features from the included feature list, with PCA recomputed on the fly to remap the data. MusicBox is also extensible; we invite other music researchers to contribute features to its PCA engine. A small user study has shown that MusicBox helps users to find music in their libraries, to discover new music, and to challenge their assumptions about relationships between types of music.

    Earth Observation Open Science and Innovation

    Keywords: geospatial analytics; social observatory; big earth data; open data; citizen science; open innovation; earth system science; crowdsourced geospatial data; science in society; data science