99 research outputs found
An Exploratory Study of Social Media Analysis for Rare Diseases using Machine Learning Algorithms: A case study of Trigeminal Neuralgia
Rare diseases, affecting approximately 30 million Americans, are often poorly understood by clinicians due to lack of familiarity with the disease and proper research. Patients with rare diseases are often unfavorably treated, especially those with extremely painful chronic orofacial rare disorders. In the absence of structured knowledge, such patients often turn to social media to seek help from peers within patient-oriented communities, thereby generating tremendous amounts of unstructured data daily. We investigate whether we can organize this unstructured data using machine learning to help members of rare disease communities find relevant information more efficiently in real time. We chose Trigeminal Neuralgia (TN), an extremely painful rare disorder, as our case study and collected 20,000 social media TN posts. We categorized TN posts into Twitter (very short) and Facebook (short, medium, long) datasets based on message length and performed three clustering experiments. Results revealed that GSDMM outperformed both K-means and Spherical K-means in clustering Facebook posts, especially short messages, in terms of speed. For long messages, MDS reduction outperformed PCA when both were used with K-means and Spherical K-means. Our study demonstrated the need for further topic modeling within high-level clusters, based on semantic analysis of the posts within each cluster
Accelerating large-scale protein structure alignments with graphics processing units
Background: Large-scale protein structure alignment, an indispensable tool to structural bioinformatics, poses a tremendous challenge on computational resources. To ensure structure alignment accuracy and efficiency, efforts have been made to parallelize traditional alignment algorithms in grid environments. However, these solutions are costly and of limited accessibility. Others trade alignment quality for speedup by using high-level characteristics of structure fragments for structure comparisons. Findings: We present ppsAlign, a parallel protein structure Alignment framework designed and optimized to exploit the parallelism of Graphics Processing Units (GPUs). As a general-purpose GPU platform, ppsAlign could take many concurrent methods, such as TM-align and Fr-TM-align, into the parallelized algorithm design. We evaluated ppsAlign on an NVIDIA Tesla C2050 GPU card, and compared it with existing software solutions running on an AMD dual-core CPU. We observed a 36-fold speedup over TM-align, a 65-fold speedup over Fr-TM-align, and a 40-fold speedup over MAMMOTH. Conclusions: ppsAlign is a high-performance protein structure alignment tool designed to tackle the computational complexity issues from protein structural data. The solution presented in this paper allows large-scale structure comparisons to be performed using massive parallel computing power of GPU.
Novel global effector mining from the transcriptome of early life stages of the soybean cyst nematode Heterodera glycines
Soybean cyst nematode (SCN) Heterodera glycines is an obligate parasite that relies on the secretion of effector proteins to manipulate host cellular processes that favor the formation of a feeding site within host roots to ensure its survival. The sequence complexity and co-evolutionary forces acting upon these effectors remain unknown. Here we generated a de novo transcriptome assembly representing the early life stages of SCN in both a compatible and an incompatible host interaction to facilitate global effector mining efforts in the absence of an available annotated SCN genome. We then employed a dual effector prediction strategy coupling a newly developed nematode effector prediction tool, N-Preffector, with a traditional secreted protein prediction pipeline to uncover a suite of novel effector candidates. Our analysis distinguished between effectors that co-evolve with the host genotype and those conserved by the pathogen to maintain a core function in parasitism and demonstrated that alternative splicing is one mechanism used to diversify the effector pool. In addition, we confirmed the presence of viral and microbial inhabitants with molecular sequence information. This transcriptome represents the most comprehensive whole-nematode sequence currently available for SCN and can be used as a tool for annotation of expected genome assemblies
Finding shortest and nearly shortest path nodes in large substantially incomplete networks
Dynamic processes on networks, be it information transfer in the Internet,
contagious spreading in a social network, or neural signaling, take place along
shortest or nearly shortest paths. Unfortunately, our maps of most large
networks are substantially incomplete due to either the highly dynamic nature
of networks, or high cost of network measurements, or both, rendering
traditional path finding methods inefficient. We find that shortest paths in
large real networks, such as the network of protein-protein interactions (PPI)
and the Internet at the autonomous system (AS) level, are not random but are
organized according to latent-geometric rules. If nodes of these networks are
mapped to points in latent hyperbolic spaces, shortest paths in them align
along geodesic curves connecting endpoint nodes. We find that this alignment is
sufficiently strong to allow for the identification of shortest path nodes even
in the case of substantially incomplete networks. We demonstrate the utility of
latent-geometric path-finding in problems of cellular pathway reconstruction
and communication security
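A minimal sketch of the latent-geometric idea described in this abstract, not the authors' implementation. It assumes each node already has polar coordinates (r, theta) in a hyperbolic plane of curvature -1, and it uses the triangle-inequality "detour" as an illustrative stand-in for proximity to the geodesic between two endpoints. The function names and the toy coordinates are hypothetical.

```python
import math


def hyperbolic_distance(p, q):
    """Distance between points p=(r1, t1) and q=(r2, t2) given in polar
    coordinates on the hyperbolic plane of curvature -1 (hyperbolic law of cosines)."""
    r1, t1 = p
    r2, t2 = q
    x = (math.cosh(r1) * math.cosh(r2)
         - math.sinh(r1) * math.sinh(r2) * math.cos(t1 - t2))
    # Guard against rounding pushing x slightly below 1, where acosh is undefined.
    return math.acosh(max(1.0, x))


def geodesic_detour(a, b, c):
    """Extra length incurred by travelling a -> c -> b instead of a -> b.
    It is zero when c lies on the geodesic segment between a and b.
    Assumption: used here only as a simple proxy for 'closeness to the geodesic'."""
    return (hyperbolic_distance(a, c) + hyperbolic_distance(c, b)
            - hyperbolic_distance(a, b))


def rank_candidates(coords, src, dst):
    """Order all other nodes by how little detour they add between src and dst."""
    others = [n for n in coords if n not in (src, dst)]
    return sorted(
        others,
        key=lambda n: geodesic_detour(coords[src], coords[dst], coords[n]),
    )


# Toy coordinates (hypothetical): "hub" sits at the origin, which lies on the
# geodesic between the two antipodal endpoints "a" and "b".
coords = {
    "a": (2.0, 0.0),
    "b": (2.0, math.pi),
    "hub": (0.0, 0.0),
    "side": (2.0, math.pi / 2),
}
ranking = rank_candidates(coords, "a", "b")
```

In this toy layout the node at the origin adds no detour, so it ranks ahead of the off-geodesic node. No network edges are consulted, which is why a ranking of this kind could still be computed when the observed network is missing links.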
Phytonematode Peptide Effectors Exploit a Host Post‐Translational Trafficking Mechanism to the ER using a Novel Translocation Signal
Summary: Cyst nematodes induce a multicellular feeding site within roots called a syncytium. It remains unknown how root cells are primed for incorporation into the developing syncytium. Furthermore, it is an enigma how CLAVATA3/ESR (CLE) peptide effectors secreted into the cytoplasm of the initial feeding cell could have an effect on plant cells so distant from where the nematode is feeding as the syncytium expands. Here we describe a novel translocation signal within nematode CLE effectors that is recognized by plant cell secretory machinery to redirect these peptides from the cytoplasm to the apoplast of plant cells. We show that the translocation signal is functionally conserved across CLE effectors identified in nematode species spanning three genera and multiple plant species, operative across plant cell types, and can traffic other unrelated small peptides from the cytoplasm to the apoplast of host cells via a previously unknown post‐translational mechanism of ER translocation. Our results uncover an unprecedented mechanism of effector trafficking by any plant pathogen to date and illustrate how phytonematodes can deliver effector proteins into host cells and then hijack plant cellular processes for their export back out of the cell to function as external signaling molecules to distant cells
Protein interaction network of alternatively spliced isoforms from brain links genetic risk factors for autism
Increased risk for autism spectrum disorders (ASD) is attributed to hundreds of genetic loci. The convergence of ASD variants has been investigated using various approaches, including protein interactions extracted from the published literature. However, these datasets are frequently incomplete, carry biases and are limited to interactions of a single splicing isoform, which may not be expressed in the disease-relevant tissue. Here we introduce a new interactome mapping approach by experimentally identifying interactions between brain-expressed alternatively spliced variants of ASD risk factors. The Autism Spliceform Interaction Network reveals that almost half of the detected interactions and about 30% of the newly identified interacting partners represent contribution from splicing variants, emphasizing the importance of isoform networks. Isoform interactions greatly contribute to establishing direct physical connections between proteins from the de novo autism CNVs. Our findings demonstrate the critical role of spliceform networks for translating genetic knowledge into a better understanding of human diseases
Structural Modeling of Protein Interactions by Analogy: Application to PSD-95
We describe comparative patch analysis for modeling the structures of multidomain proteins and protein complexes, and apply it to the PSD-95 protein. Comparative patch analysis is a hybrid of comparative modeling based on a template complex and protein docking, with a greater applicability than comparative modeling and a higher accuracy than docking. It relies on structurally defined interactions of each of the complex components, or their homologs, with any other protein, irrespective of its fold. For each component, its known binding modes with other proteins of any fold are collected and expanded by the known binding modes of its homologs. These modes are then used to restrain conventional molecular docking, resulting in a set of binary domain complexes that are subsequently ranked by geometric complementarity and a statistical potential. The method is evaluated by predicting 20 binary complexes of known structure. It is able to correctly identify the binding mode in 70% of the benchmark complexes compared with 30% for protein docking. We applied comparative patch analysis to model the complex of the third PSD-95, DLG, and ZO-1 (PDZ) domain and the SH3-GK domains in the PSD-95 protein, whose structure is unknown. In the first predicted configuration of the domains, PDZ interacts with SH3, leaving both the GMP-binding site of guanylate kinase (GK) and the C-terminus binding cleft of PDZ accessible, while in the second configuration PDZ interacts with GK, burying both binding sites. We suggest that the two alternate configurations correspond to the different functional forms of PSD-95 and provide a possible structural description for the experimentally observed cooperative folding transitions in PSD-95 and its homologs. More generally, we expect that comparative patch analysis will provide useful spatial restraints for the structural characterization of an increasing number of binary and higher-order protein complexes
Structural Similarity and Classification of Protein Interaction Interfaces
Interactions between proteins play a key role in many cellular processes.
Studying protein-protein interactions that share similar interaction interfaces
may shed light on their evolution and could be helpful in elucidating the
mechanisms behind stability and dynamics of the protein complexes. When two
complexes share structurally similar subunits, the similarity of the interaction
interfaces can be found through a structural superposition of the subunits.
However, an accurate detection of similarity between the protein complexes
containing subunits of unrelated structure remains an open problem
- …