
    Interaction-aware Factorization Machines for Recommender Systems

    The Factorization Machine (FM) is a widely used supervised learning approach that effectively models feature interactions. Despite the successful application of FM and its many deep-learning variants, treating every feature interaction equally may degrade performance. For example, interactions involving a useless feature may introduce noise, and the importance of a feature may differ when it interacts with different features. In this work, we propose a novel model named \emph{Interaction-aware Factorization Machine} (IFM) by introducing an Interaction-Aware Mechanism (IAM), comprising a \emph{feature aspect} and a \emph{field aspect}, to learn flexible interactions on two levels. The feature aspect learns feature interaction importance via an attention network, while the field aspect learns the feature interaction effect as a parametric similarity between the feature interaction vector and the corresponding field interaction prototype. IFM introduces more structured control and learns feature interaction importance in a stratified manner, which allows more leverage in tweaking interactions at both the feature-wise and field-wise levels. In addition, we present a more generalized architecture and propose the Interaction-aware Neural Network (INN) and DeepIFM to capture higher-order interactions. To further improve both the performance and efficiency of IFM, a sampling scheme is developed to select interactions based on field-aspect importance. Experimental results on two well-known datasets show the superiority of the proposed models over state-of-the-art methods.
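    The feature-aspect idea described in the abstract — weighting each pairwise FM interaction by an attention score — can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function name, network shape, and all parameters are hypothetical, and the attention network here is a single ReLU layer with softmax normalization over feature pairs.

```python
import numpy as np

rng = np.random.default_rng(0)

def fm_attention_score(x, V, w_att, b_att, h_att):
    """Attention-weighted FM pairwise score (hypothetical feature-aspect sketch).

    x : (n,) feature values; V : (n, k) latent embeddings;
    w_att (d, k), b_att (d,), h_att (d,) : one-layer attention network.
    """
    n = V.shape[0]
    inters, logits = [], []
    for i in range(n):
        for j in range(i + 1, n):
            v = (V[i] * V[j]) * (x[i] * x[j])   # interaction vector for pair (i, j)
            inters.append(v)
            logits.append(h_att @ np.maximum(w_att @ v + b_att, 0.0))
    logits = np.array(logits)
    a = np.exp(logits - logits.max())
    a /= a.sum()                                # softmax attention over feature pairs
    return float(sum(w * v.sum() for w, v in zip(a, inters)))

n, k, d = 5, 4, 8
x = rng.normal(size=n)
V = rng.normal(size=(n, k))
score = fm_attention_score(x, V,
                           w_att=rng.normal(size=(d, k)),
                           b_att=rng.normal(size=d),
                           h_att=rng.normal(size=d))
print(score)
```

    In a trained model the attention parameters would be learned jointly with the embeddings, so that pairs involving uninformative features receive low weights.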

    PI: An open-source software package for validation of the SEQUEST result and visualization of mass spectrum

    Background: Tandem mass spectrometry (MS/MS) has emerged as the leading method for high-throughput protein identification in proteomics. Recent technological breakthroughs have dramatically increased the efficiency of MS/MS data generation. Meanwhile, sophisticated algorithms have been developed for identifying proteins from peptide MS/MS data by searching available protein sequence databases for the peptide most likely to have produced the observed spectrum. The popular SEQUEST algorithm relies on the cross-correlation between the experimental mass spectrum and the theoretical spectrum of a peptide. It uses a simplified fragmentation model that assigns a fixed, identical intensity to all major ions and a fixed, lower intensity to their neutral losses. In this way, the common issues involved in predicting theoretical spectra are circumvented. In practice, however, an experimental spectrum is usually not similar to its SEQUEST-predicted theoretical one, and as a result, incorrect identifications are often generated. Results: A better understanding of peptide fragmentation is required to produce more accurate and sensitive peptide sequencing algorithms. Here, we designed the software PI with novel and effective algorithms that make good use of the intensity properties of a spectrum. Conclusions: Experiments have shown that PI is able to validate and improve the results of SEQUEST to a more satisfactory degree.
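    The cross-correlation scoring that SEQUEST relies on can be illustrated with a small sketch: score a peptide by the correlation of experimental and theoretical spectra at zero lag, minus the mean correlation over a window of lags as a background correction. This is an assumed simplification for illustration, not the actual SEQUEST implementation; the function name, window size, and toy spectra are hypothetical.

```python
import numpy as np

def xcorr_score(experimental, theoretical, max_lag=75):
    """XCorr-style sketch (hypothetical): zero-lag correlation minus the
    mean correlation over nearby lags, as a background correction."""
    corr = np.correlate(experimental, theoretical, mode="full")
    zero = len(experimental) - 1                 # index of lag 0 for equal lengths
    background = corr[zero - max_lag: zero + max_lag + 1].mean()
    return corr[zero] - background

# Toy binned spectra: three shared peaks, one predicted ion unmatched.
spec = np.zeros(200)
spec[[30, 75, 120]] = 1.0                        # experimental peaks
theo = np.zeros(200)
theo[[30, 75, 120, 150]] = 1.0                   # predicted fragment ions
print(xcorr_score(spec, theo))
```

    Matched peaks contribute strongly at zero lag, while the background term penalizes spectra that correlate equally well at arbitrary shifts.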

    ProbPS: A new model for peak selection based on quantifying the dependence of the existence of derivative peaks on primary ion intensity

    Background: The analysis of mass spectra suggests that the existence of derivative peaks is strongly dependent on the intensity of the primary peaks. Peak selection from a tandem mass spectrum is used to filter out noise and contaminant peaks. It is widely accepted that a valid primary peak tends to have high intensity and is accompanied by derivative peaks, including isotopic peaks, neutral-loss peaks, and complementary peaks. Existing models for peak selection, however, assume that the existence of derivative peaks and the intensity of primary peaks are independent; this assumption is contrary to real data and prone to error. Results: In this paper, we propose a statistical model, named ProbPS, that quantitatively measures the dependence of a derivative peak's existence on the primary peak's intensity and uses it to guide peak selection. Our results show that this quantitative understanding can successfully guide the peak selection process. By comparing ProbPS with AuDeNS, we demonstrate the advantages of our method both in filtering out noise peaks and in improving de novo identification. In addition, we present a tag identification approach based on our peak selection method. On a test data set, our tag identification method (876 correct tags in 1000 spectra) outperforms PepNovoTag (790 correct tags in 1000 spectra). Conclusions: ProbPS improves the accuracy of peak selection, which further enhances the performance of de novo sequencing and tag identification. Thus, our model saves valuable computation time and improves the accuracy of the results.
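    The core claim — that the probability of observing a derivative peak depends on primary-peak intensity, contradicting the independence assumption — can be checked empirically by estimating conditional probabilities per intensity bin and comparing them with the marginal (independence) baseline. The sketch below uses synthetic data with the dependence built in; it illustrates the kind of statistic ProbPS quantifies, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic illustration: primary-peak intensities in [0, 1) and whether a
# derivative peak (e.g. isotopic) was observed; existence probability is
# constructed to rise with intensity.
intensity = rng.uniform(0, 1, size=5000)
has_derivative = rng.uniform(size=5000) < intensity

bins = np.linspace(0, 1, 6)                      # 5 equal-width intensity bins
bin_idx = np.digitize(intensity, bins) - 1

# Conditional probability P(derivative | intensity bin) vs. the marginal
# P(derivative) that an independence model would use for every peak.
p_cond = np.array([has_derivative[bin_idx == b].mean() for b in range(5)])
p_marginal = has_derivative.mean()
print(p_cond, p_marginal)
```

    Under independence all five conditional estimates would hover around the marginal; a monotone trend across bins is exactly the dependence a peak-selection model should exploit.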

    Non-invasive preoperative prediction of Edmondson-Steiner grade of hepatocellular carcinoma based on contrast-enhanced ultrasound using ensemble learning

    Purpose: This study aimed to explore the clinical value of non-invasive preoperative Edmondson-Steiner grading of hepatocellular carcinoma (HCC) using contrast-enhanced ultrasound (CEUS). Methods: 212 cases of HCC were retrospectively included, comprising 83 high-grade and 129 low-grade HCCs. Three representative CEUS images were selected from the arterial, portal venous, and delayed phases and stored in a three-dimensional array. ITK-SNAP was used to segment the tumor lesions manually. Radiomics methods were applied to extract high-dimensional features from these contrast-enhanced ultrasound images. The independent-sample t-test and the Least Absolute Shrinkage and Selection Operator (LASSO) were then employed to reduce the feature dimensionality. The selected features were modeled by an ensemble-learning classifier, and the Edmondson-Steiner grade was predicted on an independent testing set using this model. Results: A total of 1338 features were extracted from the 3D images. After dimensionality reduction, 10 features were selected to establish the model. On the independent testing set, the integrated model performed best, with an AUC of 0.931. Conclusion: This study proposed an Edmondson-Steiner grading method for HCC based on CEUS. The method shows good classification performance on an independent testing set and can provide quantitative support for clinical decision-making.
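    The t-test filter, LASSO shrinkage, ensemble classifier pipeline described above can be sketched with scikit-learn on synthetic stand-in data. This is an assumed reconstruction of the workflow, not the study's code: the feature matrix is random with a few planted informative features, the p-value cutoff and classifier choice (a random forest as the ensemble) are hypothetical, and the real study worked on radiomics features from segmented CEUS images.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LassoCV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in: 212 lesions x 1338 features, first 10 carry grade signal.
X = rng.normal(size=(212, 1338))
y = rng.integers(0, 2, size=212)
X[:, :10] += y[:, None] * 1.5

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Step 1: univariate t-test filter between the two grade groups.
_, pvals = ttest_ind(X_tr[y_tr == 0], X_tr[y_tr == 1])
keep = np.where(pvals < 0.05)[0]

# Step 2: LASSO shrinks the filtered set to a sparse signature.
lasso = LassoCV(cv=5, random_state=0).fit(X_tr[:, keep], y_tr)
selected = keep[lasso.coef_ != 0]

# Step 3: ensemble classifier on the selected features, evaluated by AUC.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_tr[:, selected], y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te[:, selected])[:, 1])
print(f"selected {len(selected)} features, test AUC: {auc:.3f}")
```

    Fitting the filter and LASSO on the training split only, as here, is what keeps the reported test-set AUC honest.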

    Upregulation of Barrel GABAergic Neurons Is Associated with Cross-Modal Plasticity in Olfactory Deficit

    Background: Loss of a sensory function is often followed by hypersensitivity of other modalities in mammals, which keeps them well aware of environmental changes. The cellular and molecular mechanisms underlying cross-modal sensory plasticity remain to be documented. Methodology/Principal Findings: Multidisciplinary approaches, including electrophysiology, behavioral tasks, and immunohistochemistry, were used to examine the involvement of specific types of neurons in cross-modal plasticity. We established a mouse model in which olfactory deficit leads to upregulated whisking, and studied how GABAergic neurons are involved in this cross-modal plasticity. While inducing whisker tactile hypersensitivity, the olfactory injury recruits more GABAergic neurons and their fine processes in the barrel cortex, and upregulates their capacity for encoding action potentials. The hyperpolarization driven by inhibitory inputs strengthens the encoding ability of their target cells. Conclusion/Significance: The upregulation of GABAergic neurons and the functional enhancement of neuronal networks may play an important role in cross-modal sensory plasticity. This finding provides clues for developing therapeutic strategies.

    The Genome of Ganoderma lucidum Provides Insights into Triterpene Biosynthesis and Wood Degradation

    BACKGROUND: Ganoderma lucidum (Reishi or Ling Zhi) is one of the most famous Traditional Chinese Medicines and has been widely used in the treatment of various human diseases in Asian countries. It is also a fungus with strong wood-degradation ability and potential in bioenergy production. However, the genes, pathways, and mechanisms underlying these functions are still unknown. METHODOLOGY/PRINCIPAL FINDINGS: The genome of G. lucidum was sequenced and assembled into a 39.9-megabase (Mb) draft genome, which encodes 12,080 protein-coding genes, ∼83% of which are similar to publicly available sequences. We comprehensively annotated the G. lucidum genes and compared them with genes in other fungal genomes. Genes involved in the biosynthesis of the main active ingredients of G. lucidum, ganoderic acids (GAs), were characterized. Among the GA synthases, we identified a fusion gene whose N- and C-terminal regions are homologous to two different enzymes. Moreover, the fusion gene was found only in basidiomycetes. As a white-rot fungus with wood-degradation ability, G. lucidum carries abundant carbohydrate-active enzymes and ligninolytic enzymes in its genome, which were compared with those of other fungi. CONCLUSIONS/SIGNIFICANCE: The genome sequence and thorough annotation of G. lucidum will provide new insights for functional analyses, including its medicinal mechanism. The characterization of genes in triterpene biosynthesis and wood degradation will facilitate bio-engineering research into the production of its active ingredients and bioenergy.

    The Genomes of Oryza sativa: A History of Duplications

    We report improved whole-genome shotgun sequences for the genomes of indica and japonica rice, both with multimegabase contiguity, an almost 1,000-fold improvement over the drafts of 2002. Tested against a nonredundant collection of 19,079 full-length cDNAs, 97.7% of the genes are aligned, without fragmentation, to the mapped super-scaffolds of one or the other genome. We introduce a gene identification procedure for plants that does not rely on similarity to known genes to remove erroneous predictions resulting from transposable elements. Using the available EST data to adjust for residual errors in the predictions, the estimated gene count is at least 38,000–40,000. Only 2%–3% of the genes are unique to any one subspecies, comparable to the amount of sequence that might still be missing. Despite this lack of variation in gene content, there is enormous variation in the intergenic regions. At least a quarter of the two sequences could not be aligned, and where they could be aligned, single nucleotide polymorphism (SNP) rates varied from as little as 3.0 SNP/kb in the coding regions to 27.6 SNP/kb in the transposable elements. A more inclusive new approach for analyzing duplication history is introduced here. It reveals an ancient whole-genome duplication, a recent segmental duplication on Chromosomes 11 and 12, and massive ongoing individual gene duplications. We find 18 distinct pairs of duplicated segments that cover 65.7% of the genome; 17 of these pairs date back to a common time before the divergence of the grasses. More importantly, ongoing individual gene duplications provide a never-ending source of raw material for gene genesis and are major contributors to the differences between members of the grass family.
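    The per-kilobase SNP rates quoted above (3.0 SNP/kb in coding regions, 27.6 SNP/kb in transposable elements) are densities of mismatches over aligned positions. A minimal sketch of that computation, on toy aligned sequences (the function name and gap handling are illustrative assumptions, not the paper's pipeline):

```python
def snp_rate_per_kb(seq_a: str, seq_b: str) -> float:
    """SNP density between two aligned, equal-length sequences, per kilobase.

    Columns containing a gap ('-') are excluded from both the SNP count
    and the aligned-length denominator.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    snps = sum(1 for a, b in zip(seq_a, seq_b)
               if a != b and a != '-' and b != '-')
    aligned = sum(1 for a, b in zip(seq_a, seq_b)
                  if a != '-' and b != '-')
    return 1000.0 * snps / aligned

print(snp_rate_per_kb("ACGTACGTAC", "ACGTACGTAT"))  # 1 mismatch / 10 bp -> 100.0
```

    At genome scale the same ratio is computed per annotation class, which is how coding regions and transposable elements end up with such different rates.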