26 research outputs found

    AUTOMATIC LANGUAGE IDENTIFICATION OF A TEXT DOCUMENT FOR THE MAJOR EUROPEAN LANGUAGES

    The paper analyses the main methods for automatically identifying the language of a text document and proposes an algorithm that combines the alphabet method, the function-word method, and the alphabet-trigram method. The algorithm draws on minimal statistical and linguistic analysis of language data and provides an effective solution to this task.
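The combination of alphabet, function-word, and trigram evidence named in this abstract can be illustrated with a minimal Python sketch. The abstract does not say how the paper combines the three methods, so the additive voting below is an assumption, and the tables `ALPHABET_HINTS`, `FUNCTION_WORDS`, and `TRIGRAMS` are hypothetical toy data rather than the paper's resources.

```python
from collections import Counter

# Hypothetical toy resources; a real system would derive these from corpora.
ALPHABET_HINTS = {
    "de": set("äöüß"),
    "fr": set("àâçéèêëîïôùûœ"),
    "es": set("ñáíóú¿¡"),
}
FUNCTION_WORDS = {
    "en": {"the", "and", "of", "is", "to"},
    "de": {"der", "die", "und", "ist", "das"},
    "fr": {"le", "la", "et", "est", "les"},
    "es": {"el", "la", "y", "es", "los"},
}
TRIGRAMS = {
    "en": {"the", "ing", "and"},
    "de": {"sch", "ein", "ich"},
    "fr": {"ent", "les", "que"},
    "es": {"que", "ent", "los"},
}


def score_languages(text):
    """Sum three simple votes per language: letters, function words, trigrams."""
    lowered = text.lower()
    words = lowered.split()
    letters = set(lowered)
    # Count every overlapping character trigram in the text.
    trigram_counts = Counter(lowered[i:i + 3] for i in range(len(lowered) - 2))
    scores = {}
    for lang in FUNCTION_WORDS:
        # Assumed combination rule: unweighted sum of the three votes.
        alphabet_vote = len(letters & ALPHABET_HINTS.get(lang, set()))
        word_vote = sum(1 for w in words if w in FUNCTION_WORDS[lang])
        trigram_vote = sum(trigram_counts[t] for t in TRIGRAMS[lang])
        scores[lang] = alphabet_vote + word_vote + trigram_vote
    return scores


def detect_language(text):
    """Return the language code with the highest combined score."""
    scores = score_languages(text)
    return max(scores, key=scores.get)
```

Calling `detect_language` on a short sentence returns one of the four toy language codes. With such small tables the result is only indicative, and a practical system would need weights for the three votes and a rule for ties.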

    Proceedings of the Eighth Italian Conference on Computational Linguistics CLiC-it 2021

    The eighth edition of the Italian Conference on Computational Linguistics (CLiC-it 2021) was held at Università degli Studi di Milano-Bicocca from 26th to 28th January 2022. After the 2020 edition, which was held fully online because of the Covid-19 health emergency, CLiC-it 2021 was the first opportunity for the Italian Computational Linguistics research community to meet in person after more than a year of full or partial lockdown.

    Graph-based approaches to word sense induction

    This thesis is a study of Word Sense Induction (WSI), the Natural Language Processing (NLP) task of automatically discovering word meanings from text. WSI is an open problem in NLP whose solution would be of considerable benefit to many other NLP tasks. It has, however, been studied by relatively few NLP researchers and often in set ways. Scope therefore exists to apply novel methods to the problem, methods that may improve upon those previously applied. This thesis applies a graph-theoretic approach to WSI. In this approach, word senses are identified by finding particular types of subgraphs in word co-occurrence graphs. A number of original methods for constructing, analysing, and partitioning graphs are introduced, with these methods then incorporated into graph-based WSI systems. These systems are then shown, in a variety of evaluation scenarios, to return results that are comparable to those of the current best-performing WSI systems. The main contributions of the thesis are a novel parameter-free soft clustering algorithm that runs in time linear in the number of edges in the input graph, and novel generalisations of the clustering coefficient (a measure of vertex cohesion in graphs) to the weighted case. Further contributions of the thesis include: a review of graph-based WSI systems that have been proposed in the literature; analysis of the methodologies applied in these systems; analysis of the metrics used to evaluate WSI systems; and empirical evidence to verify the usefulness of each novel method introduced in the thesis for inducing word senses.
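For readers unfamiliar with the clustering coefficient mentioned in this abstract, the following Python sketch computes the standard unweighted local version on a toy co-occurrence graph. It does not reproduce the weighted generalisations contributed by the thesis. The function name `local_clustering`, the graph `TOY_GRAPH`, and the convention of returning 0 for vertices with fewer than two neighbours are assumptions made for illustration.

```python
from itertools import combinations


def local_clustering(adjacency, vertex):
    """Unweighted local clustering coefficient of `vertex`.

    `adjacency` maps each vertex to the set of its neighbours in an
    undirected graph. The result is the fraction of neighbour pairs
    that are themselves connected.
    """
    neighbours = adjacency[vertex]
    k = len(neighbours)
    if k < 2:
        return 0.0  # assumed convention: report undefined cases as 0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adjacency[a])
    return 2.0 * links / (k * (k - 1))


# Hypothetical co-occurrence graph around the ambiguous word "bank".
TOY_GRAPH = {
    "bank": {"money", "loan", "river", "water"},
    "money": {"bank", "loan"},
    "loan": {"bank", "money"},
    "river": {"bank", "water"},
    "water": {"bank", "river"},
}

bank_cohesion = local_clustering(TOY_GRAPH, "bank")
money_cohesion = local_clustering(TOY_GRAPH, "money")
```

In this toy graph the neighbours of "bank" fall into two groups that are not linked to each other, so its coefficient is lower than that of "money", whose neighbours are all mutually connected. Graph-based WSI methods exploit this kind of structure, although the sketch does not show how the thesis does so.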

    Contours in Visualization

    This thesis studies the visualization of set collections either via, or defined as, the relations among contours. In the first part, dynamic Euler diagrams are used to communicate and semi-manually improve the results of clustering methods that allow clusters to overlap arbitrarily. The contours of the Euler diagram are rendered as implicit surfaces, called blobs in computer graphics. The interaction metaphor is the moving of items into or out of these blobs. The utility of the method is demonstrated on data arising from the analysis of gene expressions. The method works well for small datasets of up to one hundred items and few clusters. In the second part, these limitations are mitigated by employing GPU-based rendering of Euler diagrams and by mixing textures and colors to better resolve overlapping regions. The GPU-based approach subdivides the screen into triangles on which it performs a contour interpolation, i.e. a fragment shader determines for each pixel which zones of an Euler diagram it belongs to. The rendering speed is thus increased to allow several hundred items. The method is applied to an example comparing different document clustering results. The contour tree compactly describes scalar field topology. From the viewpoint of graph drawing, it is a tree with attributes at vertices and optionally on edges. Standard tree drawing algorithms emphasize structural properties of the tree and neglect the attributes. When popular graph drawing approaches are adapted to the problem of contour tree drawing, they are found to be unable to convey this information. Five aesthetic criteria for drawing contour trees are proposed, and a novel algorithm for drawing contour trees in the plane that satisfies four of these criteria is presented. The implementation is fast and effective for contour tree sizes usually used in interactive systems and also produces readable pictures for larger trees. Dynamical models that explain the formation of spatial structures of RNA molecules have reached a complexity that requires novel visualization methods to analyze these models' validity. The fourth part of the thesis focuses on the visualization of so-called folding landscapes of a growing RNA molecule. Folding landscapes describe the energy of a molecule as a function of its spatial configuration; they are huge and high-dimensional. Their most salient features are described by their so-called barrier tree, a contour tree for discrete observation spaces. The changing folding landscapes of a growing RNA chain are visualized as an animation of the corresponding barrier tree sequence. The animation is created as an adaptation of the foresight layout with tolerance algorithm for dynamic graph layout. The adaptation requires changes to the concept of the supergraph and its layout. The thesis finishes with some thoughts on how these approaches can be combined and how the task the application should support can help inform the choice of visualization modality.

    Doctor of Philosophy

    Rapidly evolving technologies such as chip arrays and next-generation sequencing are uncovering human genetic variants at an unprecedented pace. Unfortunately, this ever-growing collection of gene sequence variation has limited clinical utility without clear association to disease outcomes. As electronic medical records begin to incorporate genetic information, gene variant classification and accurate interpretation of gene test results play a critical role in customizing patient therapy. To verify the functional impact of a given gene variant, laboratories rely on confirming evidence such as previous literature reports, patient history, and disease segregation in a family. By definition, variants of uncertain significance (VUS) lack this supporting evidence, and in such cases computational tools are often used to evaluate the predicted functional impact of a gene mutation. This study evaluates leveraging high-quality genotype-phenotype disease variant data from 20 genes and 3986 variants to develop gene-specific predictors utilizing a combination of changes in primary amino acid sequence, amino acid properties as descriptors of mutation severity, and Naïve Bayes classification. A Primary Sequence Amino Acid Properties (PSAAP) prediction algorithm was then combined with well-established predictors in a weighted Consensus sum in the context of gene-specific reference intervals for known phenotypes. PSAAP and Consensus were also used to evaluate known variants of uncertain significance in the RET proto-oncogene as a model gene. The PSAAP algorithm was successfully extended to many genes and diseases. Gene-specific algorithms typically outperform generalized prediction tools. Characteristic mutation properties of a given gene and disease may be lost when diluted into genome-wide data sets. A reliable computational phenotype classification framework with quantitative metrics and disease-specific reference ranges allows objective evaluation of novel or uncertain gene variants and augments decision making when confirming clinical information is limited.

    Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018

    On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall “Cavallerizza Reale”. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after five years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.

    Annual Report of the Research Academy Leipzig 2007

    Annual Report of the Research Academy Leipzig 2007: Contents - The Research Academy Leipzig - Address on the first anniversary of the founding of the Research Academy Leipzig - The advantages of doctoral schools: a supervisor's perspective - Interdisciplinary qualification measures: the events of the Research Academy Leipzig 2007 - Public presentation - Childcare for the young children of doctoral candidates - The Graduate Centre Mathematics/Computer Science and Natural Sciences - Graduate School Leipzig School of Natural Sciences: Building with Molecules and Nano-objects (BuildMoNa) - German-French Doctoral College Statistical Physics of Complex Systems - International Max Planck Research School Mathematics in the Sciences - International Research Training Group Diffusion in Porous Materials - Research Training Group Analysis, Geometry and their Connection to the Natural Sciences - Research Training Group Knowledge Representation - Research Training Group Mechanistic and Application Aspects of Non-conventional Oxidation Reactions - International Doctoral Programme Research in Frontier Areas of Chemistry - The Graduate Centre Life Sciences - Research Training Group Interdisciplinary Approaches in the Neurosciences (InterNeuro) - Research Training Group Function of Attention in Cognitive Processes - International Doctoral Programme From Signal Processing to Behaviour (IPP Signal) - International Max Planck Research School The Leipzig School of Human Origins - MD-PhD Programme of Leipzig University - Research Training Group Universality and Diversity: Linguistic Structures and Processes - The Graduate Centre Humanities and Social Sciences - International Doctoral Programme Transnationalisation and Regionalisation from the 18th Century to the Present - Research Training Group Critical Junctures of Globalisation - German as a Foreign Language: Transcultural German Studies - Cultural Exchange: Perspectives from Classical Studies, History and Ethnology - Practices of the Social Production of Space in Europe: Geographical, Historical and Sociological Perspectives - Picture credits - Imprint

    The Organization Of Halakhic Knowledge In Early Modern Europe: The Transformation Of A Scholarly Culture

    Tamara Morsel-Eisenberg, David B. Ruderman. Far from being abstract and immaterial, knowledge is impacted in myriad ways by non-intellectual factors, such as technology, organization, culture, and erudite practices. The scholarship of halakha, Jewish religious law, is a millennia-long tradition that was shaped by historical changes in its particular contexts. In sixteenth-century Europe specifically, historical circumstances (the advent of print, the dislocation of the Jewish communities of Ashkenaz (the German lands) reconstructed in Eastern Europe, and the shift to systematic organizational paradigms introduced by newly dominant works) led to a complete reordering of halakha. Drawing upon methods from the history of knowledge, social and cultural history, book history, media studies, and studies of knowledge organization, this dissertation shows that the changes taking place in Europe between the 1470s and the 1570s influenced a profound transformation of the halakhic system. These changes in technology, organization, and community fundamentally transformed Jewish law, which became more ordered and therefore more easily accessible, transmissible, and applicable than its predecessor. To argue this, the dissertation's first two units examine the shift from personal manuscript collections to printed books, from heterogeneous compilations to hyper-structured codifications, and from a panoply of localized customs to unified, universalized Jewish law. The third unit studies the evolution of one form of halakhic writing, the responsum (epistolary exchanges about legal problems), to examine how the above-mentioned changes shaped halakhic texts and their structure. An analysis of responsa as they evolve from letters, to documents in the rabbinic archive, to published works displays the scholarly practices and forms of logic specific to each of these media against the backdrop of the larger shifts in the history of knowledge. As a whole, this study shows that, in the sixteenth century, halakhic culture transformed from a flexible, heterogeneous, and personal universe to an increasingly stable, homogeneous, and generalized legal system that henceforth shaped Jewish legal study and adjudication.