    Finishing the euchromatic sequence of the human genome

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome, including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.
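
The figures quoted in this abstract can be combined into a quick back-of-the-envelope check. The sketch below is only an illustration: it uses the abstract's rounded values, and the function names are hypothetical, not taken from the paper.

```python
# Rounded figures quoted in the abstract (Build 35).
SEQUENCED_BASES = 2.85e9   # nucleotides in the finished sequence
ERROR_RATE = 1 / 100_000   # about 1 error event per 100,000 bases
COVERAGE = 0.99            # about 99% of the euchromatic genome


def expected_errors(bases: float, error_rate: float) -> float:
    """Expected number of residual error events in the sequence."""
    return bases * error_rate


def implied_euchromatic_size(bases: float, coverage: float) -> float:
    """Euchromatic genome size implied by the sequenced bases and coverage."""
    return bases / coverage


if __name__ == "__main__":
    print(round(expected_errors(SEQUENCED_BASES, ERROR_RATE)))
    print(round(implied_euchromatic_size(SEQUENCED_BASES, COVERAGE)))
```

With these rounded inputs, the expected number of residual errors is about 28,500 and the implied euchromatic genome size is about 2.88 billion bases. Both are rough estimates, since the abstract gives its figures only approximately.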

    Semantic analysis with inference: high spots of the football match

    The paper describes a new version of the semantic analyzer SemETAP. Our approach rests on the assumption that the depth of understanding grows with the number of inferences that can be drawn from the text. The salient features of SemETAP are: 1) Intensive use of both linguistic and background knowledge: the former is incorporated in the Combinatorial Dictionary and the Grammar, and the latter is stored in the Ontology and the Repository of Individuals. 2) Words and concepts of the ontology may be supplied with explicit decompositions for inference purposes. 3) Two levels of semantic structure are distinguished: the basic semantic structure (BSemS) interprets the text in terms of ontological elements, and the enhanced semantic structure (EnSemS) extends BSemS by means of a series of inferences. 4) A new logical formalism, Etalog, has been developed, in which all inference rules are written. Semantic analysis with inference allows us to extract implicit information. The analyzer is tested on the task of interpreting the high spots of a football match.
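
The step from a basic to an enhanced semantic structure can be pictured as forward chaining: rules are applied to the known facts until nothing new can be derived. The sketch below is a generic illustration under strong assumptions, not SemETAP's implementation and not Etalog syntax. Facts are plain tuples, rules are ground (no variables), and all names (`enhance`, the football facts) are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Fact = Tuple[str, ...]
# A rule pairs a set of premise facts with one conclusion fact.
# This ground-rule format is an assumption made for illustration only.
Rule = Tuple[FrozenSet[Fact], Fact]


def enhance(basic: Set[Fact], rules: List[Rule]) -> Set[Fact]:
    """Return the basic facts plus everything derivable from them."""
    enhanced = set(basic)
    changed = True
    while changed:  # repeat until a full pass adds no new fact
        changed = False
        for premises, conclusion in rules:
            if premises <= enhanced and conclusion not in enhanced:
                enhanced.add(conclusion)
                changed = True
    return enhanced


# Toy "basic structure" for a football episode (hypothetical facts).
basic = {("shoots", "striker"), ("ball_crosses_goal_line",)}
rules = [
    (frozenset({("shoots", "striker"), ("ball_crosses_goal_line",)}),
     ("scores", "striker")),
    (frozenset({("scores", "striker")}), ("score_changes",)),
]

# Facts that are implicit in the text: present only after inference.
implicit = enhance(basic, rules) - basic
```

Here `implicit` holds the two derived facts, that the striker scores and that the score changes. Neither is stated in the input, which is the sense in which inference extracts implicit information.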

    Knowledge-based approach to Winograd Schema Challenge

    We propose a method to resolve anaphoric pronouns in the framework of the Winograd Schema Challenge (WSC) by means of SemETAP, a knowledge-based semantic analyzer. WSC is a modern version of the famous Turing test. Its objective is to check a machine's ability to exhibit intelligent behavior indistinguishable from that of a human. In contrast to other approaches to WSC, which are based on machine learning, our method uses explicit knowledge. An important advantage of this approach is that it can provide an explanation of the result that humans can understand. SemETAP interprets the text using both linguistic and extralinguistic (background) knowledge. The former is stored in the grammar and the dictionary of the ETAP-4 system, and the latter is provided by the SemETAP ontology, inference rules and the repository of individuals. We show how this knowledge is used for resolving WSC. At the moment, the performance of the algorithm is not high: 54%. This is due to the incompleteness of the background knowledge supplied to the system. It is shown, however, that if the background knowledge is complete and accurate enough, the WSC test is resolved well and it is easy to understand why the system arrived at a particular conclusion.
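
A toy example shows how explicit knowledge can resolve a pronoun and also explain the choice. It uses the well-known trophy/suitcase schema ("The trophy doesn't fit in the suitcase because it is too big/small"). This is only a sketch under stated assumptions and not the SemETAP algorithm: the `Schema` class, the `KNOWLEDGE` table and the `resolve` function are hypothetical, and real analysis works on parsed text, not hand-built records.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Schema:
    relation: str          # relation between the two candidates
    subject: str           # first candidate referent
    obj: str               # second candidate referent
    pronoun_property: str  # what the sentence asserts about the pronoun


# Hypothetical background knowledge:
# (relation, property) -> (role of the referent, human-readable justification)
KNOWLEDGE: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("not_fit_in", "too_big"):
        ("subject", "a thing that is too big does not fit into a container"),
    ("not_fit_in", "too_small"):
        ("obj", "a container that is too small cannot hold a thing"),
}


def resolve(schema: Schema) -> Optional[Tuple[str, str]]:
    """Return (referent, justification), or None if knowledge is missing."""
    entry = KNOWLEDGE.get((schema.relation, schema.pronoun_property))
    if entry is None:
        return None  # incomplete background knowledge: no answer
    role, justification = entry
    referent = schema.subject if role == "subject" else schema.obj
    return referent, justification


big = resolve(Schema("not_fit_in", "trophy", "suitcase", "too_big"))
small = resolve(Schema("not_fit_in", "trophy", "suitcase", "too_small"))
unknown = resolve(Schema("not_fit_in", "trophy", "suitcase", "too_heavy"))
```

The `unknown` case returns `None`, which mirrors the abstract's point that gaps in background knowledge limit performance. The returned justification string mirrors the claim that a knowledge-based answer can be explained to a human.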

    Evolution of genes and genomes on the Drosophila phylogeny

    Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.
