18 research outputs found

    Generative Non-Markov Models for Information Extraction

    Get PDF
    Learning from unlabeled data is a long-standing challenge in machine learning. A principled solution involves modeling the full joint distribution over inputs and the latent structure of interest, and imputing the missing data via marginalization. Unfortunately, such marginalization is expensive for most non-trivial problems, which places practical limits on the expressiveness of generative models. As a result, joint models often encode strict assumptions about the underlying process such as fixed-order Markovian assumptions and employ simple count-based features of the inputs. In contrast, conditional models, which do not directly model the observed data, are free to incorporate rich overlapping features of the input in order to predict the latent structure of interest. It would be desirable to develop expressive generative models that retain tractable inference. This is the topic of this thesis. In particular, we explore joint models which relax fixed-order Markov assumptions, and investigate the use of recurrent neural networks for automatic feature induction in the generative process. We focus on two structured prediction problems: (1) imputing labeled segmentions of input character sequences, and (2) imputing directed spanning trees relating strings in text corpora. These problems arise in many applications of practical interest, but we are primarily concerned with named-entity recognition and cross-document coreference resolution in this work. For named-entity recognition, we propose a generative model in which the observed characters originate from a latent non-Markov process over words, and where the characters are themselves produced via a non-Markov process: a recurrent neural network (RNN). We propose a sampler for the proposed model in which sequential Monte Carlo is used as a transition kernel for a Gibbs sampler. The kernel is amenable to a fast parallel implementation, and results in fast mixing in practice. For cross-document coreference resolution, we move beyond sequence modeling to consider string-to-string transduction. We stipulate a generative process for a corpus of documents in which entity names arise from copying---and optionally transforming---previous names of the same entity. Our proposed model is sensitive to both the context in which the names occur as well as their spelling. The string-to-string transformations correspond to systematic linguistic processes such as abbreviation, typos, and nicknaming, and by analogy to biology, we think of them as mutations along the edges of a phylogeny. We propose a novel block Gibbs sampler for this problem that alternates between sampling an ordering of the mentions and a spanning tree relating all mentions in the corpus

    Avaliação da viabilidade de modelos filogenéticos na classificação de aplicações maliciosas

    Get PDF
    Orientador: André Ricardo Abed GrégioTese (Doutorado) - Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defesa : Curitiba, 03/02/2023Inclui referências: p. 150-170Área de concentração: Ciência da ComputaçãoResumo: Milhares de códigos maliciosos são criados, modificados com apoio de ferramentas de automação e liberados diariamente na rede mundial de computadores. Entre essas ameaças, malware são programas projetados especificamente para interromper, danificar ou obter acesso não autorizado a um sistema ou dispositivo. Para facilitar a identificação e a categorização de comportamentos comuns, estruturas e outras características de malware, possibilitando o desenvolvimento de soluções de defesa, existem estratégias de análise que classificam malware em grupos conhecidos como famílias. Uma dessas estratégias é a Filogenia, técnica baseada na Biologia, que investiga o relacionamento histórico e evolutivo de uma espécie ou outro grupo de elementos. Além disso, a utilização de técnicas de agrupamento em conjuntos semelhantes facilita tarefas de engenharia reversa para análise de variantes desconhecidas. Uma variante se refere a uma nova versão de um código malicioso que é criada a partir de modificações de malware existentes. O presente trabalho investiga a viabilidade do uso de filogenias e de métodos de agrupamento na classificação de variantes de malware para plataforma Android. Inicialmente foram analisados 82 trabalhos correlatos para verificação de configurações de experimentos do estado da arte. Após esse estudo, foram realizados quatro experimentos para avaliar uso de métricas de similaridade e de algoritmos de agrupamento na classificação de variantes e na análise de similaridade entre famílias. Propôs-se então um Fluxo de Atividades para Agrupamento de malware com o objetivo de auxiliar na definição de parâmetros para técnicas de agrupamentos, incluindo métricas de similaridade, tipo de algoritmo de agrupamento a ser utilizado e seleção de características. Como prova de conceito, foi proposto o framework Androidgyny para análise de amostras, extração de características e classificação de variantes com base em medóides (elementos representativos médios de cada grupo) e características exclusivas de famílias conhecidas. Para validar o Androidgyny foram feitos dois experimentos: um comparativo com a ferramenta correlata Gefdroid e outro, com exemplares das 25 famílias mais populosas do dataset Androzoo.Abstract: Thousands of malicious codes are created, modified with the support of tools of automation and released daily on the world wide web. Among these threats, malware are programs specifically designed to interrupt, damage, or gain access unauthorized access to a system or device. To facilitate identification and categorization of common behaviors, structures and other characteristics of malware, enabling the development of defense solutions, there are analysis strategies that classify malware into groups known as families. One of these strategies is Phylogeny, a technique based on the Biology, which investigates the historical and evolutionary relationship of a species or other group of elements. In addition, the use of clustering techniques on similar sets facilitates reverse engineering tasks for analysis of unknown variants. a variant refers to a new version of malicious code that is created from modifications of existing malware. The present work investigates the feasibility of using phylogenies and methods of grouping in the classification of malware variants for the Android platform. Initially 82 related works were analyzed to verify experiment configurations of the state of the art. After this study, four experiments were carried out to evaluate the use of similarity measures and clustering algorithms in the classification of variants and in the similarity analysis between families. In addition to these experiments, a Flow of Activities for Malware grouping with five distinct phases. This flow has purpose of helping to define parameters for clustering techniques, including measures of similarity, type of clustering algorithm to be used and feature selection. After defining the flow of activities, the Androidgyny framework was proposed, a prototype for sample analysis, feature extraction and classification of variants based on medoids and unique features of known families. To validate Androidgyny were Two experiments were carried out: a comparison with the related tool Gefdroid and another with copies of the 25 most populous families in the Androzoo dataset

    Graduate Academic Catalog 2020-2021

    Get PDF

    Iowa State University, Courses and Programs Catalog 2014–2015

    Get PDF
    The Iowa State University Catalog is a one-year publication which lists all academic policies, and procedures. The catalog also includes the following: information for fees; curriculum requirements; first-year courses of study for over 100 undergraduate majors; course descriptions for nearly 5000 undergraduate and graduate courses; and a listing of faculty members at Iowa State University.https://lib.dr.iastate.edu/catalog/1025/thumbnail.jp

    University catalog, 2016-2017

    Get PDF
    The catalog is a comprehensive reference for your academic studies. It includes a list of all degree programs offered at MU, including bachelors, masters, specialists, doctorates, minors, certificates, and emphasis areas. It details the university wide requirements, the curricular requirements for each program, and in some cases provides a sample plan of study. The catalog includes a complete listing and description of approved courses. It also provides information on academic policies, contact information for supporting offices, and a complete listing of faculty members. -- Page 3

    University of Nebraska at Omaha 2020-2021 Course Catalog

    Get PDF
    Located in one of America’s best cities to live, work and learn, the University of Nebraska at Omaha (UNO) is Nebraska’s premier metropolitan university. With more than 15,000 students enrolled in 200-plus programs of study, UNO is recognized nationally for its online education, graduate education, military friendliness, and community engagement efforts. Founded in 1908, UNO has served learners of all backgrounds for more than 100 years and is dedicated to another century of excellence both in the classroom and in the communit
    corecore