37 research outputs found

    A New Database (GCD) on Genome Composition for Eukaryote and Prokaryote Genome Sequences and Their Initial Analyses

    Eukaryote genomes contain many noncoding regions and are quite complex. To understand this complexity, we constructed the Genome Composition Database (GCD), which holds whole-genome composition statistics for 101 eukaryote genomes and more than 1,000 prokaryote genomes. Frequencies of all possible oligonucleotides of length 1 to 10 were counted for each genome, and these observed values were compared with expected values computed from the observed frequencies of oligonucleotides of length 1–4. Deviations from expected values were much larger for eukaryotes than for prokaryotes, except for fungal genomes. Mammalian genomes showed the largest deviation among animals. The results of the comparison are available online at http://esper.lab.nig.ac.jp/genome-composition-database/
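
The observed-versus-expected comparison described in this abstract can be illustrated with a minimal sketch. It assumes the simplest case only: expected counts derived from observed single-nucleotide frequencies, with bases treated as independent. The function names are hypothetical and are not taken from the GCD itself, which also uses frequencies of length 2 to 4.

```python
from collections import Counter
from math import prod


def kmer_counts(seq, k):
    """Count overlapping oligonucleotides of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def expected_count(kmer, seq):
    """Expected count of kmer if bases were independent,
    using the single-nucleotide frequencies observed in seq."""
    base_freq = {b: c / len(seq) for b, c in Counter(seq).items()}
    positions = len(seq) - len(kmer) + 1
    return positions * prod(base_freq.get(b, 0.0) for b in kmer)


def observed_over_expected(seq, k):
    """Ratio of observed to expected count for every k-mer present in seq."""
    return {w: n / expected_count(w, seq) for w, n in kmer_counts(seq, k).items()}


ratios = observed_over_expected("ACGTACGT", 2)
```

A ratio far from 1 marks an oligonucleotide that is over- or under-represented relative to this simple model.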

    MISHIMA - a new method for high speed multiple alignment of nucleotide sequences of bacterial genome scale data

    Background: Large nucleotide sequence datasets are becoming increasingly common objects of comparison. Complete bacterial genomes are reported almost every day. This creates challenges for developing new multiple sequence alignment methods. Conventional multiple alignment methods are based on pairwise alignment and/or progressive alignment techniques. These approaches have performance problems when the number of sequences is large and when dealing with genome-scale sequences. Results: We present a new method of multiple sequence alignment, called MISHIMA (Method for Inferring Sequence History In terms of Multiple Alignment), that does not depend on pairwise sequence comparison. A new algorithm is used to quickly find rare oligonucleotide sequences shared by all sequences. A divide-and-conquer approach is then applied to break the sequences into fragments that can be aligned independently by an external alignment program. These partial alignments are assembled to form a complete alignment of the original sequences. Conclusions: MISHIMA provides improved performance compared to the commonly used multiple alignment methods. As an example, six complete genome sequences of the bacterial species Helicobacter pylori (about 1.7 Mb each) were successfully aligned in about 6 hours using a single PC.
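
The idea of finding rare oligonucleotides shared by all sequences can be sketched as follows. This is an illustration only, not the MISHIMA algorithm: it assumes that "rare" means "occurring exactly once in each sequence", and the function names are hypothetical.

```python
from collections import Counter


def unique_kmers(seq, k):
    """Return the set of k-mers that occur exactly once in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {w for w, n in counts.items() if n == 1}


def shared_anchors(seqs, k):
    """Map each k-mer that occurs exactly once in every sequence
    to its position in each sequence (requires at least one sequence)."""
    common = set.intersection(*(unique_kmers(s, k) for s in seqs))
    # Each anchor occurs once per sequence, so find() returns its only position.
    return {w: [s.find(w) for s in seqs] for w in sorted(common)}


print(shared_anchors(["AAACGTTT", "CACGTGGG"], 4))  # {'ACGT': [2, 1]}
```

Anchor positions like these could serve as split points, so that the fragments between consecutive anchors are aligned independently and the partial alignments are then joined.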

    Container Flow-Transport Technology for the Preparation of Breeding Grain

    Abstract. A low level of mechanization is one of the main reasons for the high costs of plant breeding and primary seed production. Crop breeders use transport and loading equipment for moving seed material in an unsystematic manner. (Research purpose) To develop a transport support technology for breeding and seed production that covers all transport and loading operations in delivering grain seed from breeding combines to storage facilities, using a container for seed collection, transportation, drying, and storage. (Materials and methods) The authors described a container flow-transport technology for harvesting breeding grain at the primary multiplication stage and developed the technology's machine complex and a database of harvesting and transport machines for seed collection, transportation, drying, and storage. (Results and discussion) The authors determined the range of transport and loading equipment recommended for breeding and seed production under the container method of seed harvesting, transportation, and storage. The proposed range is novel in four respects: the ability to transport containers in 2 rows; a loading height increased from 2 m to 3 m; a maximum boom reach of 3.8 m (vs. 2.7 m); and a load capacity higher by 460 kg. (Conclusions) The authors recommend using the developed methodology to improve and organize the technological process of harvesting, transporting, and postharvest processing of seed grain, and to select machine parameters and technical equipment on farms in the Central region of Russia. They propose designing and manufacturing prototype containers and a loader with a container tilter for primary crop processing.

    A circulating subset of iNKT cells mediates antitumor and antiviral immunity

    Discovery of a novel circulating type of iNKT cell: expected to contribute to the development of immune cell therapies with strong antitumor and antiviral effects. Kyoto University press release. 2022-10-24. Invariant natural killer T (iNKT) cells are a group of innate-like T lymphocytes that recognize lipid antigens. They are thought to be tissue resident and important for systemic and local immune regulation. To investigate the heterogeneity of iNKT cells, we recharacterized iNKT cells in the thymus and peripheral tissues. iNKT cells in the thymus were divided into three subpopulations by the expression of the natural killer cell receptor CD244 and the chemokine receptor CXCR6 and designated as C0 (CD244⁻CXCR6⁻), C1 (CD244⁻CXCR6⁺), or C2 (CD244⁺CXCR6⁺) iNKT cells. The development and maturation of C2 iNKT cells from C0 iNKT cells strictly depended on IL-15 produced by thymic epithelial cells. C2 iNKT cells expressed high levels of IFN-γ and granzymes and exhibited more NK cell–like features, whereas C1 iNKT cells showed more T cell–like characteristics. C2 iNKT cells were influenced by the microbiome and aging and suppressed the expression of the autoimmune regulator AIRE in the thymus. In peripheral tissues, C2 iNKT cells were circulating cells, distinct from conventional tissue-resident C1 iNKT cells. Functionally, C2 iNKT cells protected mice from the tumor metastasis of melanoma cells by enhancing antitumor immunity and promoted antiviral immune responses against influenza virus infection. Furthermore, we identified human CD244⁺CXCR6⁺ iNKT cells with high cytotoxic properties as a counterpart of mouse C2 iNKT cells. Thus, this study reveals a circulating subset of iNKT cells with NK cell–like properties distinct from conventional tissue-resident iNKT cells.

    Development of new methods for evolutionary data analysis

    My PhD study belongs to the field of computational biology and focuses on the development of new methods for molecular biology data analysis. My PhD thesis includes three chapters, each focusing on computational methods for a different stage of a biological study. In the first chapter, titled "MISHIMA: a new method of multiple sequence alignment", I explore the possibility of applying advanced computational techniques to the problem of multiple molecular sequence alignment. Sequence alignment is one of the central tasks in molecular biology: DNA or protein sequences must be aligned before any comparison can be made between them. Although alignment of two sequences already reveals valuable information about their relationship, some studies require multiple sequences aligned together. Such studies include phylogenetic analysis, identification of conserved genome elements, and protein secondary structure prediction.  Common methods of multiple sequence alignment are usually based on pairwise sequence comparison: all pairs of sequences are compared separately, and the multiple alignment is then constructed through a progressive alignment procedure. This approach works well for relatively short sequences, but takes too long for genomic sequences or when the number of sequences is large. The continuously increasing number of available genomic sequences from various organisms calls for more efficient techniques for aligning such large datasets.  The new method of multiple sequence alignment that I developed during the last year, MISHIMA (a Method for Identifying Sequence History In terms of Multiple Alignment), is an attempt to reduce the computational requirements of aligning multiple genomic sequences. This is achieved through a heuristic approach to the quick extraction of potential homology information from the sequences. 
After that, the sequences are aligned using a divide-and-conquer approach: regions of homology shared by multiple sequences are used as points for splitting the sequences into parts, which are aligned independently of each other by a conventional alignment method. The partial alignments are then assembled to construct the final multiple alignment.  The homology extraction step is the key part of this method. It is based on the observation that the chance that a sequence motif (a short sequence fragment) represents a homology signal is related to the frequency of that motif in the sequence dataset. Sequence motifs that are rare or, conversely, very abundant in the dataset are unlikely to occur in a region of homology. On the other hand, motifs that occur exactly once in each of the input sequences have a good chance of belonging to a conserved element, thus revealing probable homology shared by multiple sequences.  The heuristic method of homology extraction used in MISHIMA depends on counting the number of occurrences in the dataset of every sequence motif up to K nucleotides long. The number of possible sequence motifs of length K is very large (it grows as 4^K), so an important problem was how to organize the information about motif frequencies. In MISHIMA I use a dictionary structure to store the motif frequency data efficiently, allowing information about motifs up to 12 nucleotides long to be stored in about 0.5 GB of RAM.  The MISHIMA alignment method was tested with several datasets and compared with alternative methods. One of the datasets consisted of 10 complete mitochondrial genome sequences of mammalian species. MISHIMA successfully constructed the alignment for this dataset in about two minutes. ClustalW (the most widely used multiple alignment software today) takes several hours to align the same data. 
Among the other test datasets was a set of 4 complete genomes of different strains of Streptococcus pyogenes, each about 2 Mb long. MISHIMA aligned this dataset in about 6 hours on a Pentium 4 notebook with 1 GB of RAM. This test shows that the method can bring large-scale genomic multiple alignment within reach of users of ordinary desktop or portable computers.   The second chapter of my work, "SMAP: Alignment with Reference Sequence", describes a technique for assisting a sequencing experiment. In a common whole-genome shotgun sequencing project, a chromosome of the target species is divided into fragments, such as BACs (bacterial artificial chromosomes), with lengths from several to about one hundred kb. These fragments are then sequenced, resulting in a number of sequence reads, usually less than 1 kb in length. These reads are assembled into contigs, the basic units of the resulting sequence. The location of each contig in the genome is not known at this stage.  Analysis of the set of contigs may be easier when the genome of a closely related species has already been determined. In the process of sequencing the genome of species A, the genome of a closely related species B can be used as a reference to supervise and assist the sequencing process. If A and B are close to each other, most of the newly sequenced contigs will be homologous to some part of the reference. This homology suggests their probable location in the genome of A, which can be used to estimate the progress of sequencing. This information can also assist the sequencing process, especially at the late stage of finishing the sequence. Comparison with the reference sequence gives an estimate of the size and location of gaps, the still-unknown regions of the target genome. The reference sequence can also help to assemble the contigs. 
In some cases the information about contig homology to the reference sequence is enough to correctly assemble the continuous sequence of the newly sequenced genome.  To implement this idea I developed SMAP, a software package for assisting a sequencing process with the help of the genomic DNA sequence of a closely related species. Its name comes from the original idea, Sequence MAPping. The BLAST local homology search tool is used to detect homology between the original sequence fragments or contigs and the reference. SMAP then analyzes the BLAST search results and performs the mapping and assembly of the set of contigs. SMAP has already been applied in the sequencing of chimpanzee chromosome 22, with human chromosome 21 sequences used as a reference.   The third part of my study, "Netview: Constructing and visually exploring phylogenetic networks", describes a new method for phylogenetic analysis. The phylogenetic relationship of a group of gene sequences is commonly represented as a tree. However, a non-tree phylogenetic structure may be more appropriate in some cases. Such cases may result from recombination or horizontal gene transfer events. A non-tree structure may also appear because of ambiguity in the sequence data. In this study I propose a method to explore such non-tree structures, based on contradictions between the aligned sequence data and a phylogenetic tree topology constructed using the neighbor-joining method.  The Netview method of network construction is based on a comparison of multiple sequence alignment data with a phylogenetic tree built from that alignment. Every alignment position can be characterized by its relation to the tree: it can either support the tree topology or contradict it. Alignment sites that support the tree topology don't require further analysis, but alignment positions that contradict the tree represent data that may need additional explanation. 
Such sites show a conflict between the sequence data and the tree, so a more complex topology, such as a network, may be needed to explain the data. The Netview method counts the different patterns of conflicting data and constructs a network by introducing an additional dimension to the tree.  I developed a program, Netview, that implements this method. Netview provides a graphical interface that lets the user select a particular pattern of incompatibility between the aligned sequence data and the phylogenetic tree. The network is then reconstructed for the selected pattern. The sequence data, which is shown for each case, also plays an important role in interpreting the observed network structure. Netview also has a convenient 3-dimensional network viewing tool that is useful for navigating and exploring a phylogenetic structure. It is convenient to be able to change the size and projection angle to examine the network carefully.
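
One way to decide whether an alignment site supports or contradicts a tree, as the Netview description requires, is sketched below. It uses Fitch parsimony as a stand-in criterion. This is an assumption: the abstract does not say how Netview classifies sites. The tree encoding (nested 2-tuples of leaf names) and the function names are hypothetical.

```python
def fitch(tree, states):
    """Fitch parsimony on a rooted binary tree given as nested 2-tuples
    of leaf names. Returns (possible root states, number of changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_states, left_changes = fitch(tree[0], states)
    right_states, right_changes = fitch(tree[1], states)
    common = left_states & right_states
    if common:
        return common, left_changes + right_changes
    # No shared state: one extra change is needed at this node.
    return left_states | right_states, left_changes + right_changes + 1


def site_conflicts(tree, states):
    """A site contradicts the tree if it needs more changes than the
    minimum possible for its number of distinct states."""
    _, changes = fitch(tree, states)
    return changes > len(set(states.values())) - 1


tree = (("A", "B"), ("C", "D"))
supporting = {"A": "G", "B": "G", "C": "T", "D": "T"}
conflicting = {"A": "G", "B": "T", "C": "G", "D": "T"}
```

With this tree, `supporting` needs a single change and agrees with the split AB|CD, while `conflicting` needs two changes and would be a candidate for the kind of incompatibility pattern the abstract describes.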

    Human Contamination in Public Genome Assemblies

    Contamination in a genome assembly can lead to wrong or confusing results when the genome is used as a reference in sequence comparison. Although bacterial contamination is well known, the problem of human-originated contamination has received little attention. In this study we surveyed 45,735 available genome assemblies for evidence of human contamination. We used lineage specificity to distinguish between contamination and conservation. We found that 154 genome assemblies contain fragments that, with high confidence, originate from human DNA contamination. The majority of the contaminating human sequences had been present in the reference human genome assembly for over a decade. We recommend that existing contaminated genomes be revised to remove the contaminating sequence, and that new assemblies be thoroughly checked for the presence of human DNA before they are submitted to public databases.

    Prokaryote genomes containing at least 2 kbp of likely human originated (LHO) sequence.

    Prokaryote genomes containing at least 2 kbp of likely human originated (LHO) sequence.

    Phylogenetic trees comparing close homologs of human-originated regions.

    Phylogenetic trees comparing close homologs of human-originated regions.