85 research outputs found

    DeepJoin: Joinable Table Discovery with Pre-trained Language Models

    Because of its usefulness in data enrichment for data analysis tasks, joinable table discovery has become an important operation in data lake management. Existing approaches target equi-joins, the most common way of combining tables for creating a unified view, or semantic joins, which tolerate misspellings and different formats to deliver more join results. They are either exact solutions whose running time is linear in the sizes of the query column and the target table repository, or approximate solutions lacking precision. In this paper, we propose DeepJoin, a deep learning model for accurate and efficient joinable table discovery. Our solution is an embedding-based retrieval, which employs a pre-trained language model (PLM) and is designed as one framework serving both equi- and semantic joins. We propose a set of contextualization options to transform column contents to a text sequence. The PLM reads the sequence and is fine-tuned to embed columns to vectors such that columns are expected to be joinable if they are close to each other in the vector space. Since the output of the PLM is fixed in length, the subsequent search procedure becomes independent of the column size. With a state-of-the-art approximate nearest neighbor search algorithm, the search time is logarithmic in the repository size. To train the model, we devise techniques for preparing training data as well as data augmentation. The experiments on real datasets demonstrate that by training on a small subset of a corpus, DeepJoin generalizes to large datasets and its precision consistently exceeds that of other approximate solutions. DeepJoin is even more accurate than an exact solution to semantic joins when evaluated with labels from experts. Moreover, when equipped with a GPU, DeepJoin is up to two orders of magnitude faster than existing solutions.
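
The retrieval pipeline described in this abstract (serialize a column to text, embed it as a fixed-length vector, return the nearest columns) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `contextualize` pattern is one hypothetical serialization, `embed` is a hashed character-trigram stand-in for the fine-tuned PLM, and `top_k` is a brute-force cosine scan rather than an approximate nearest neighbor index.

```python
import hashlib
import math

DIM = 4096  # hypothetical embedding width; a real PLM fixes its own output size


def contextualize(title, column_name, cells):
    """Serialize a column to one text sequence (a hypothetical pattern)."""
    return f"{title}. {column_name}: " + ", ".join(cells)


def embed(text, dim=DIM):
    """Stand-in for the PLM: hashed character-trigram counts, L2-normalized.

    The output length is always `dim`, regardless of how long the column is.
    """
    vec = [0.0] * dim
    lowered = text.lower()
    for i in range(len(lowered) - 2):
        gram = lowered[i:i + 3]
        digest = hashlib.sha256(gram.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    """Cosine similarity of two vectors that are already L2-normalized."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query_vec, index, k=1):
    """Brute-force scan; an ANN index would replace this loop at scale."""
    ranked = sorted(index.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]


# Toy repository of two columns, embedded once ahead of query time.
repository = {
    "cities": embed(contextualize("Japan travel", "city",
                                  ["Tokyo", "Osaka", "Kyoto", "Nagoya"])),
    "genes": embed(contextualize("Gene panel", "marker",
                                 ["MINT1", "MINT2", "MINT25", "MINT31"])),
}

# The query column contains a misspelling ("Tokio") and different casing.
query = embed(contextualize("Offices", "location",
                            ["Tokio", "osaka", "Nagoya"]))
print(top_k(query, repository, k=1))
```

Because every column is reduced to a vector of the same length, the cost of comparing two columns does not depend on how many cells they contain, which is the property the abstract attributes to the fixed-length PLM output.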

    Transcriptomic changes with increasing algal symbiont reveal the detailed process underlying establishment of coral-algal symbiosis

    To clarify the establishment process of coral-algal symbiotic relationships, coral transcriptome changes during increasing algal symbiont densities were examined in juvenile corals following inoculation with the algae Symbiodinium goreaui (clade C) and S. trenchii (clade D), and comparison of their transcriptomes with aposymbiotic corals by RNA-sequencing. Since Symbiodinium clades C and D showed very different rates of density increase, comparisons were made of early onsets of both symbionts, revealing that the host behaved differently for each. RNA-sequencing showed that the number of differentially-expressed genes in corals colonized by clade D increased ca. two-fold from 10 to 20 days, whereas corals with clade C showed unremarkable changes consistent with a slow rate of density increase. The data revealed dynamic metabolic changes in symbiotic corals. In addition, the endocytosis pathway was also upregulated, while lysosomal digestive enzymes and the immune system tended to be downregulated as the density of clade D algae increased. The present dataset provides an enormous number of candidate symbiosis-related molecules that exhibit the detailed process by which coral-algal endosymbiosis is established

    CpG Island Methylator Phenotype in Primary Gastric Carcinoma

    Gastric cancers (GC) with methylation of multiple CpG islands have a CpG island methylator phenotype (CIMP) and they can have different biological features. The aim of this study was to investigate the DNA methylation status of GCs and its association with their clinicopathological features. We evaluated the methylation status of four genes (MINT1, MINT2, MINT25 and MINT31) in 105 primary GCs using bisulfite-pyrosequencing analysis. We classified tumors as CIMP-high (CIMP-H), CIMP-low (CIMP-L) or CIMP-negative (CIMP-N) based on the methylation of MINT1, MINT2, MINT25, and MINT31. Overall, the prevalence of CIMP-H, CIMP-L and CIMP-N was 22% (23/105), 52% (55/105) and 26% (27/105), respectively. We observed a significant difference in tumor stage (stages I-II vs. stages III-IV) between CIMP-H and CIMP-N tumors (P = 0.0435). No significant differences were observed in clinicopathological characteristics (gender, age, location and tumor differentiation) among the CIMP phenotypes. The prognosis of patients with a CIMP-H tumor is likely to be better than that of patients with CIMP-L or CIMP-N tumors, but these differences are not statistically significant (P = 0.074 and P = 0.200). Our results suggest that CIMP may define a subgroup of GCs with distinct biological features.

    Origins and Evolution of MicroRNA Genes in Drosophila Species

    MicroRNAs (miRs) regulate gene expression at the posttranscriptional level. To obtain some insights into the origins and evolutionary patterns of miR genes, we have identified miR genes in the genomes of 12 Drosophila species by bioinformatics approaches and examined their evolutionary changes. The results showed that the extant and ancestral Drosophila species had more than 100 miR genes and frequent gains and losses of miR genes have occurred during evolution. Although many miR genes appear to have originated from random hairpin structures in intronic or intergenic regions, duplication of miR genes has also contributed to the generation of new miR genes. Estimating the rate of nucleotide substitution of miR genes, we have found that newly arisen miR genes have a substitution rate similar to that of synonymous nucleotide sites in protein-coding genes and evolve almost neutrally. This suggests that most new miR genes have not acquired any important function and would become inactive. By contrast, old miR genes show a substitution rate much lower than the synonymous rate. Moreover, paired and unpaired nucleotide sites of miR genes tend to remain unchanged during evolution. Therefore, once miR genes acquired their functions, they appear to have evolved very slowly, maintaining essentially the same structures for a long time

    Origins and Evolution of MicroRNA Genes in Plant Species

    MicroRNAs (miRNAs) are among the most important regulatory elements of gene expression in animals and plants. However, their origin and evolutionary dynamics have not been studied systematically. In this paper, we identified putative miRNA genes in 11 plant species using the bioinformatic technique and examined their evolutionary changes. Our homology search indicated that no miRNA gene is currently shared between green algae and land plants. The number of miRNA genes has increased substantially in the land plant lineage, but after the divergence of eudicots and monocots, the number has changed in a lineage-specific manner. We found that miRNA genes have originated mainly by duplication of preexisting miRNA genes or protein-coding genes. Transposable elements also seem to have contributed to the generation of species-specific miRNA genes. The relative importance of these mechanisms in plants is quite different from that in Drosophila species, where the formation of hairpin structures in the genomes seems to be a major source of miRNA genes. This difference in the origin of miRNA genes between plants and Drosophila may be explained by the difference in the binding to target mRNAs between plants and animals. We also found that young miRNA genes are less conserved than old genes in plants as well as in Drosophila species. Yet, nearly half of the gene families in the ancestor of flowering plants have been lost in at least one species examined. This indicates that the repertoires of miRNA genes have changed more dynamically than previously thought during plant evolution

    Evolutionary Changes of the Target Sites of Two MicroRNAs Encoded in the Hox Gene Cluster of Drosophila and Other Insect Species

    MicroRNAs (miRs) are noncoding RNAs that regulate gene expression at the post-transcriptional level. In animals, the target sites of a miR are generally located in the 3′ untranslated regions (UTRs) of messenger RNAs. However, how the target sites change during evolution is largely unknown. MiR-iab-4 and miR-iab-4as are known to regulate the expression of two Hox genes, Abd-A and Ubx, in Drosophila melanogaster. We have therefore studied the evolutionary changes of these two miR genes and their target sites of the Hox genes in Drosophila, other insect species, and Daphnia. Our homology search identified a single copy of each miR gene located in the same genomic position of the Hox gene cluster in all species examined. The seed nucleotide sequence was also the same for all species. Searching for the target sites in all Hox genes, we found several target sites of miR-iab-4 and miR-iab-4as in Antp in addition to Abd-A and Ubx in most insect species examined. Our phylogenetic analysis of target sites in Abd-A, Ubx, and Antp showed that the old target sites, which existed before the divergence of the 12 Drosophila species, have been well maintained in most species under purifying selection. By contrast, new target sites, which were generated during Drosophila evolution, were often lost in some species and mostly located in unalignable regions of the 3′ UTRs. These results indicate that these regions can be a potential source of generating new target sites, which results in multiple target genes for each miR in animals

    Experimental Approach Reveals the Role of alx1 in the Evolution of the Echinoderm Larval Skeleton

    Over the course of evolution, the acquisition of novel structures has ultimately led to wide variation in morphology among extant multicellular organisms. Thus, the origins of genetic systems for new morphological structures are a subject of great interest in evolutionary biology. The larval skeleton is a novel structure acquired in some echinoderm lineages via the activation of the adult skeletogenic machinery. Previously, VEGF signaling was suggested to have played an important role in the acquisition of the larval skeleton. In the present study, we compared expression patterns of Alx genes among echinoderm classes to further explore the factors involved in the acquisition of a larval skeleton. We found that the alx1 gene, originally described as crucial for sea urchin skeletogenesis, may have also played an essential role in the evolution of the larval skeleton. Unlike those echinoderms that have a larval skeleton, we found that alx1 of starfish was barely expressed in early larvae that have no skeleton. When alx1 overexpression was induced via injection of alx1 mRNA into starfish eggs, the expression patterns of certain genes, including those possibly involved in skeletogenesis, were altered. This suggested that a portion of the skeletogenic program was induced solely by alx1. However, we observed no obvious external phenotype or skeleton. We concluded that alx1 was necessary but not sufficient for the acquisition of the larval skeleton, which, in fact, requires several genetic events. Based on these results, we discuss how the larval expression of alx1 contributed to the acquisition of the larval skeleton in the putative ancestral lineage of echinoderms.

    The draft genomes of soft-shell turtle and green sea turtle yield insights into the development and evolution of the turtle-specific body plan

    The unique anatomical features of turtles have raised unanswered questions about the origin of their unique body plan. We generated and analyzed draft genomes of the soft-shell turtle (Pelodiscus sinensis) and the green sea turtle (Chelonia mydas); our results indicated the close relationship of the turtles to the bird-crocodilian lineage, from which they split ~267.9–248.3 million years ago (Upper Permian to Triassic). We also found extensive expansion of olfactory receptor genes in these turtles. Embryonic gene expression analysis identified an hourglass-like divergence of turtle and chicken embryogenesis, with maximal conservation around the vertebrate phylotypic period, rather than at later stages that show the amniote-common pattern. Wnt5a expression was found in the growth zone of the dorsal shell, supporting the possible co-option of limb-associated Wnt signaling in the acquisition of this turtle-specific novelty. Our results suggest that turtle evolution was accompanied by an unexpectedly conservative vertebrate phylotypic period, followed by turtle-specific repatterning of development to yield the novel structure of the shell

    Genome of the pitcher plant Cephalotus reveals genetic changes associated with carnivory

    Carnivorous plants exploit animals as a nutritional source and have inspired long-standing questions about the origin and evolution of carnivory-related traits. To investigate the molecular bases of carnivory, we sequenced the genome of the heterophyllous pitcher plant Cephalotus follicularis, in which we succeeded in regulating the developmental switch between carnivorous and non-carnivorous leaves. Transcriptome comparison of the two leaf types and gene repertoire analysis identified genetic changes associated with prey attraction, capture, digestion and nutrient absorption. Analysis of digestive fluid proteins from C. follicularis and three other carnivorous plants with independent carnivorous origins revealed repeated co-options of stress-responsive protein lineages coupled with convergent amino acid substitutions to acquire digestive physiology. These results imply constraints on the available routes to evolve plant carnivory

    Stenting and interventional radiology for obstructive jaundice in patients with unresectable biliary tract carcinomas

    Together with biliary drainage, which is an appropriate procedure for unresectable biliary cancer, biliary stent placement is used to improve symptoms associated with jaundice. Owing to investigations comparing percutaneous transhepatic biliary drainage (PTBD), surgical drainage, and endoscopic drainage, many types of stents are now available that can be placed endoscopically. The stents used are classified roughly as plastic stents and metal stents. Compared with plastic stents, metal stents are of large diameter, and have long-term patency (although they are expensive). For this reason, the use of metal stents is preferred for patients who are expected to survive for more than 6 months, whereas for patients who are likely to survive for less than 6 months, the use of plastic stents is not considered to be improper. Obstruction in a metal stent is caused by a tumor that grows within the stent through the mesh interstices. To overcome such problems, a covered metal stent was developed, and these stents are now used in patients with malignant distal biliary obstruction. However, this type of stent has been reported to have several shortcomings, such as being associated with the development of acute cholecystitis and stent migration. In spite of these shortcomings, evidence is expected to demonstrate its superiority over other types of stent