250 research outputs found
AiM: Taking Answers in Mind to Correct Chinese Cloze Tests in Educational Applications
To automatically correct handwritten assignments, the traditional approach is
to use an OCR model to recognize characters and compare them to answers. The
OCR model easily gets confused on recognizing handwritten Chinese characters,
and the textual information of the answers is missing during the model
inference. However, teachers always have these answers in mind to review and
correct assignments. In this paper, we focus on correcting Chinese cloze
tests and propose a multimodal approach (named AiM). The encoded
representations of answers interact with the visual information of students'
handwriting. Instead of predicting 'right' or 'wrong', we perform the sequence
labeling on the answer text to infer which answer character differs from the
handwritten content in a fine-grained way. We take samples of OCR datasets as
the positive samples for this task, and develop a negative sample augmentation
method to scale up the training data. Experimental results show that AiM
outperforms OCR-based methods by a large margin. Extensive studies demonstrate
the effectiveness of our multimodal approach. Comment: Accepted to COLING 202
Rapid detection of structural variation in a human genome using nanochannel-based genome mapping technology
BACKGROUND: Structural variants (SVs) are less common than single nucleotide polymorphisms and indels in the population, but collectively account for a significant fraction of genetic polymorphism and diseases. Base pair differences arising from SVs are on a much higher order (>100 fold) than point mutations; however, none of the current detection methods are comprehensive, and currently available methodologies are incapable of providing sufficient resolution and unambiguous information across complex regions in the human genome. To address these challenges, we applied a high-throughput, cost-effective genome mapping technology to comprehensively discover genome-wide SVs and characterize complex regions of the YH genome using long single molecules (>150 kb) in a global fashion. RESULTS: Utilizing nanochannel-based genome mapping technology, we obtained 708 insertions/deletions and 17 inversions larger than 1 kb. Excluding the 59 SVs (54 insertions/deletions, 5 inversions) that overlap with N-base gaps in the reference assembly hg19, 666 non-gap SVs remained, and 396 of them (60%) were verified by paired-end data from whole-genome sequencing-based re-sequencing or de novo assembly sequence from fosmid data. Of the remaining 270 SVs, 260 are insertions and 213 overlap known SVs in the Database of Genomic Variants. Overall, 609 out of 666 (90%) variants were supported by experimental orthogonal methods or historical evidence in public databases. At the same time, genome mapping also provides valuable information for complex regions with haplotypes in a straightforward fashion. In addition, with long single-molecule labeling patterns, exogenous viral sequences were mapped on a whole-genome scale, and sample heterogeneity was analyzed at a new level.
CONCLUSION: Our study highlights genome mapping technology as a comprehensive and cost-effective method for detecting structural variation and studying complex regions in the human genome, as well as deciphering viral integration into the host genome. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/2047-217X-3-34) contains supplementary material, which is available to authorized users
Case report of a Li-Fraumeni syndrome-like phenotype with a de novo mutation in CHEK2
BACKGROUND: Cases of multiple tumors are rarely reported in China. In our study, a 57-year-old female patient had concurrent squamous cell carcinoma, mucoepidermoid carcinoma, brain cancer, bone cancer, and thyroid cancer, which has rarely been reported to date. METHODS: To determine the relationship among these multiple cancers, available DNA samples from the thyroid, lung, and skin tumors and from normal thyroid tissue were sequenced using whole exome sequencing. RESULTS: The notable discrepancies of somatic mutations among the 3 tumor tissues indicated that they arose independently, rather than metastasizing from 1 tumor. A novel deleterious germline mutation (chr22:29091846, G->A, p.H371Y) was identified in CHEK2, a Li-Fraumeni syndrome causal gene. Examining the status of this novel mutation in the patient's healthy siblings revealed its de novo origin. CONCLUSION: Our study reports the first case of Li-Fraumeni syndrome-like in Chinese patients and demonstrates the important contribution of de novo mutations in this type of rare disease
Diverse associations between pancreatic intra-, inter-lobular fat and the development of type 2 diabetes in overweight or obese patients
Pancreatic fat is associated with obesity and type 2 diabetes mellitus (T2DM); however, the relationship between different types of pancreatic fat and diabetes status remains unclear. Therefore, we aimed to determine the potential of different types of pancreatic fat accumulation as a risk factor for T2DM in overweight or obese patients. In total, 104 overweight or obese patients were recruited from January 2020 to December 2022. The patients were divided into three groups: normal glucose tolerance (NGT), impaired fasting glucose or glucose tolerance (IFG/IGT), and T2DM. mDixon magnetic resonance imaging (MRI) was used to detect pancreatic fat in all three groups of patients. The pancreatic head fat (PHF), body fat (PBF), and tail fat (PTF) in the IFG/IGT group were 21, 20, and 31% more than those in the NGT group, respectively. PHF, PBF, and PTF were positively associated with glucose metabolic dysfunction markers in the NGT group, and inter-lobular fat volume (IFV) was positively associated with these markers in the IFG/IGT group. The areas under the receiver operating characteristic curves for PHF, PBF, and PTF (used to evaluate their diagnostic potential for glucose metabolic dysfunction) were 0.73, 0.73, and 0.78, respectively, while those for total pancreatic volume (TPV), pancreatic parenchymal volume, IFV, and IFV/TPV were 0.67, 0.67, 0.66, and 0.66, respectively. These results indicate that intra-lobular pancreatic fat, including PHF, PTF, and PBF, may be a potential independent risk factor for the development of T2DM. Additionally, IFV exacerbates glucose metabolic dysfunction. Intra-lobular pancreatic fat indices were better than IFV for the diagnosis of glucose metabolic dysfunction
The Genome of the Netherlands: design, and project goals
Within the Netherlands a national network of biobanks has been established (Biobanking and Biomolecular Research Infrastructure-Netherlands (BBMRI-NL)) as a national node of the European BBMRI. One of the aims of BBMRI-NL is to enrich biobanks with different types of molecular and phenotype data. Here, we describe the Genome of the Netherlands (GoNL), one of the projects within BBMRI-NL. GoNL is a whole-genome-sequencing project in a representative sample consisting of 250 trio-families from all provinces in the Netherlands, which aims to characterize DNA sequence variation in the Dutch population. The parent-offspring trios include adult individuals ranging in age from 19 to 87 years (mean = 53 years; SD = 16 years) from birth cohorts 1910-1994. Sequencing was done on blood-derived DNA from uncultured cells and accomplished coverage was 14-15x. The family-based design represents a unique resource to assess the frequency of regional variants, accurately reconstruct haplotypes by family-based phasing, characterize short indels and complex structural variants, and establish the rate of de novo mutational events. GoNL will also serve as a reference panel for imputation in the available genome-wide association studies in Dutch and other cohorts to refine association signals and uncover population-specific variants. GoNL will create a catalog of human genetic variation in this sample that is uniquely characterized with respect to micro-geographic location and a wide range of phenotypes. The resource will be made available to the research and medical community to guide the interpretation of sequencing projects. The present paper summarizes the global characteristics of the project.
Novel loci and pathways significantly associated with longevity
Only two genome-wide significant loci associated with longevity have been identified so far, probably because of insufficient sample sizes of centenarians, whose genomes may harbor genetic variants associated with health and longevity. Here we report a genome-wide association study (GWAS) of Han Chinese with a sample size 2.7 times the largest previously published GWAS on centenarians. We identified 11 independent loci associated with longevity replicated in Southern-Northern regions of China, including two novel loci (rs2069837-IL6; rs2440012-ANKRD20A9P) with genome-wide significance and the rest with suggestive significance (P < 3.65 × 10^-5). Eight independent SNPs overlapped across Han Chinese, European and U.S. populations, and APOE and 5q33.3 were replicated as longevity loci. Integrated analysis indicates four pathways (starch, sucrose and xenobiotic metabolism; immune response and inflammation; MAPK; calcium signaling) highly associated with longevity (P ≤ 0.006) in Han Chinese. The association with longevity of three of these four pathways (MAPK; immunity; calcium signaling) is supported by findings in other human cohorts. Our novel finding on the association of starch, sucrose and xenobiotic metabolism pathway with longevity is consistent with the previous results from Drosophila. This study suggests protective mechanisms including immunity and nutrient metabolism and their interactions with environmental stress play key roles in human longevity
A high-quality human reference panel reveals the complexity and distribution of genomic structural variants
Structural variation (SV) represents a major source of differences between individual human genomes and has been linked to disease phenotypes. However, the majority of studies neither provide a global view of the full spectrum of these variants nor integrate them into reference panels of genetic variation. Here, we analyse whole genome sequencing data of 769 individuals from 250 Dutch families, and provide a haplotype-resolved map of 1.9 million genome variants across 9 different variant classes, including novel forms of complex indels, and retrotransposition-mediated insertions of mobile elements and processed RNAs. A large proportion are previously underreported variants sized between 21 and 100 bp. We detect 4 megabases of novel sequence, encoding 11 new transcripts. Finally, we show 191 known, trait-associated SNPs to be in strong linkage disequilibrium with SVs and demonstrate that our panel facilitates accurate imputation of SVs in unrelated individuals