107 research outputs found

    A parallel Euler approach for large-scale biological sequence assembly

    Biological sequence assembly is an essential step in sequencing the genomes of organisms. Sequence assembly is computationally intensive, especially at large scale. Parallel computing is an effective way to reduce computing time and to support the assembly of large numbers of biological fragments. The Euler sequence assembly algorithm is a recently proposed approach. Its advantages are that its computational complexity is polynomial and that it provides a better solution to the notorious “repeat” problem. This paper introduces a parallelization of the Euler sequence assembly algorithm. All the genome fragments generated by whole genome shotgun (WGS) sequencing are assembled as a whole rather than divided into groups, which may incur errors due to inaccurate group partitioning. The implemented system can run on supercomputers, networks of workstations, or even networks of PCs. The experimental results demonstrate the performance of our system.

    A self-learning system for identifying harmful network information

    Network information identification is currently a “hot” topic. This paper designs a self-learning system that uses a neural network algorithm to identify harmful network messages in both Chinese and English. The system segments each message into words and creates a keyword vector that characterizes harmful network information. The backpropagation (BP) algorithm is used to train the neural network. The trained network can be applied to many network applications based on message identification. Experimental results demonstrate that our system has a high degree of accuracy.

    Eulerian superpath approach to correct sequencing error in shotgun assembly

    This work describes an error correction method based on the Euler Superpath problem. Sequence data is mapped to an Euler Superpath dynamically by Merging Transformation. With restriction and guiding rules, data consistency is maintained and error paths are separated from correct data: error edges are mapped to the correct ones, and after substitution (of error edges with correct paths), the corresponding errors in the sequencing data are eliminated.

    Computational insights into substrate binding and catalytic mechanism of the glutaminase domain of glucosamine-6-phosphate synthase (GlmS)

    Glucosamine-6-phosphate synthase (GlmS) is a key enzyme in the biosynthesis of hexosamine across a variety of species including Escherichia coli, fungi, and humans. In particular, its glutaminase domain catalyzes the conversion of glutamine to glutamic acid with the release of ammonia. A catalytically important cysteinyl (Cys1) has been suggested to act as the mechanistic nucleophile after being activated by the N-terminal amine of the glutaminase domain (i.e., its own α-amine). Using molecular dynamics (MD) and quantum mechanics/molecular mechanics (QM/MM) computational methods, we have investigated the active site of the glutaminase domain, the protonation state of its N-terminal amine, substrate binding, and catalytic mechanism. In addition, the potential for an active site histidyl (His71) to alternatively act as the required base was examined. The N-terminal amine is concluded to have a reduced pKa due to being buried within the enzyme and the nearby presence of a protonated arginyl residue. Previous suggestions that this was due in part to hydrogen bonding with the hydroxyl of Thr606 are not supported; such an interaction is not consistent, and accounts for only 4% of the total duration of the MD simulation. The most feasible enzymatic pathway is found to involve a neutral N-terminal Cys1 α-amine acting as a base and directly deprotonating the Cys1SH thiol (i.e., without the involvement of a water). The tetrahedral oxyanion intermediate formed during the mechanism is stabilized by a water and two enzyme residues: Asn98 and Gly99. Furthermore, the overall rate-limiting step of the mechanism is the nucleophilic attack of a water on the thioester cross-linked intermediate, with a barrier of 74.4 kJ mol⁻¹. An alternate mechanism in which His71 acts as the nucleophile-activating base, and which requires the Cys1 α-amine to be protonated, is calculated to be enzymatically feasible but to have a much higher overall rate-limiting barrier of 93.7 kJ mol⁻¹.

    A parallel DNA fragment assembly algorithm based on eulerian superpath approach

    Fragment assembly is among the core problems in genome research. Although many assembly tools based on the "overlap-layout-consensus" paradigm are widely used, for example in the Human Genome Project, they still cannot resolve the "repeats problem" in DNA sequencing. To resolve this problem, Pevzner et al. put forward a new Euler Superpath assembly algorithm. However, it needs a large and complex de Bruijn graph that consumes large amounts of memory and becomes the performance bottleneck. We present a parallel DNA fragment assembly algorithm based on the Eulerian Superpath theory that removes this bottleneck in the current assembly program. The experimental results demonstrate that our approach scales well and can be used for DNA assembly of medium and large eukaryote genomes.
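
    The de Bruijn graph idea mentioned in this abstract can be illustrated with a minimal single-machine sketch. It is not the parallel algorithm from the paper: it assumes error-free reads, keeps one edge per distinct k-mer, and assumes an Eulerian path exists. The function names (`de_bruijn_edges`, `eulerian_path`, `assemble`) and the toy reads are hypothetical.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Turn every distinct k-mer into an edge between its (k-1)-mer prefix and suffix."""
    graph = defaultdict(list)
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer not in seen:  # assumption: each k-mer occurs once in the genome
                seen.add(kmer)
                graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes the graph is non-empty and has an Eulerian path."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    nodes = set(graph) | set(in_deg)
    # Start at the node with one more outgoing than incoming edge, if there is one.
    start = next(
        (n for n in nodes if len(graph.get(n, [])) - in_deg.get(n, 0) == 1),
        next(iter(nodes)),
    )
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: emit the node
    return path[::-1]


def assemble(reads, k):
    """Spell the sequence along the Eulerian path of the de Bruijn graph."""
    path = eulerian_path(de_bruijn_edges(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    print(assemble(["ATGCGT", "GCGTAC"], 4))
```

    When a (k-1)-mer repeats in the genome, the graph admits more than one Eulerian path, which is the "repeats problem" in miniature; the graph also grows with the number of distinct k-mers, which is the memory cost the abstract refers to.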

    CALD: surviving various application-layer DDoS attacks that mimic flash crowd

    Distributed denial of service (DDoS) attacks are a continuing critical threat to the Internet. Derived from the low layers, new application-layer DDoS attacks that use legitimate HTTP requests to overwhelm victim resources are harder to detect. The situation may be more serious when such attacks mimic, or occur during, the flash crowd event of a popular Website. In this paper, we present the design and implementation of CALD, an architectural extension to protect Web servers against various DDoS attacks that masquerade as flash crowds. CALD provides real-time detection using mess tests but differs from other systems that use similar methods. First, CALD uses a front-end sensor to monitor the traffic, which may contain various DDoS attacks or flash crowds. An intense pulse in the traffic indicates the possible existence of anomalies, because this is a basic property of both DDoS attacks and flash crowds. Once abnormal traffic is identified, the sensor sends an ATTENTION signal to activate the attack detection module. Second, CALD dynamically records the average frequency of each source IP and checks the total mess extent. Theoretically, the mess extent of DDoS attacks is larger than that of flash crowds. Thus, with some parameters from the attack detection module, the filter is capable of letting legitimate requests through while stopping the attack traffic. Third, CALD can separate the security modules from the Web servers. As a result, it keeps maximum performance on the kernel Web services, regardless of the harassment from DDoS. In the experiments, the records from www.sina.com and www.taobao.com have demonstrated the value of CALD.

    Discriminating DDoS flows from flash crowds using information distance

    Discriminating DDoS flooding attacks from flash crowds poses a tough challenge for the network security community. Because of the vulnerability of the original design of the Internet, attackers can easily mimic the patterns of legitimate network traffic to fly under the radar. The existing fingerprint or feature based algorithms are incapable of detecting new attack strategies. In this paper, we aim to differentiate DDoS attack flows from flash crowds. We are motivated by the following fact: the attack flows are generated by the same prebuilt program (attack tools), whereas flash crowds come from randomly distributed users all over the Internet. Therefore, the flow similarity among DDoS attack flows is much stronger than that among flash crowds. We employ abstract distance metrics (the Jeffrey distance, the Sibson distance, and the Hellinger distance) to measure the similarity among flows. We compared the three metrics and found that the Sibson distance is the most suitable one for our purpose. We apply our algorithm to real datasets, and the results indicate that the proposed algorithm can differentiate them with an accuracy around 65%.
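
    The three metrics named in this abstract can be sketched as follows, using commonly cited definitions: the Jeffrey distance as the symmetrized Kullback-Leibler divergence, the Sibson distance as the average divergence of each distribution from their midpoint, and the Hellinger distance with a 1/2 normalization so that it lies in [0, 1]. The normalization and the way flows are turned into probability distributions are assumptions here and may differ from the paper's; the sample distributions are made up for illustration.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence D(p || q); assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jeffrey(p, q):
    """Jeffrey distance: D(p || q) + D(q || p). Needs strictly positive p and q."""
    return kl(p, q) + kl(q, p)


def sibson(p, q):
    """Sibson distance: mean divergence of p and q from their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * (kl(p, m) + kl(q, m))


def hellinger(p, q):
    """Hellinger distance, normalized to the range [0, 1]."""
    total = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(total / 2)


if __name__ == "__main__":
    # Hypothetical per-interval packet shares of three flows.
    flow_a = [0.25, 0.25, 0.25, 0.25]
    flow_b = [0.24, 0.26, 0.25, 0.25]  # nearly identical to flow_a
    flow_c = [0.70, 0.10, 0.10, 0.10]  # clearly different from flow_a
    for name, dist in (("jeffrey", jeffrey), ("sibson", sibson), ("hellinger", hellinger)):
        print(name, dist(flow_a, flow_b), dist(flow_a, flow_c))
```

    Under the abstract's premise, flows produced by the same attack tool would yield small pairwise distances, while flows from independent flash-crowd users would yield larger ones. Unlike the Jeffrey distance, the Sibson distance stays finite when one distribution has zero entries.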

    Solvated interaction energy: from small-molecule to antibody drug design

    Scoring functions are ubiquitous in structure-based drug design as an aid to predicting binding modes and estimating binding affinities. Ideally, a scoring function should be broadly applicable, obviating the need to recalibrate and refit its parameters for every new target and class of ligands. Traditionally, drugs have been small molecules, but in recent years biologics, particularly antibodies, have become an increasingly important if not dominant class of therapeutics. This makes the goal of having a transferable scoring function, i.e., one that spans the range of small-molecule to protein ligands, even more challenging. One such broadly applicable scoring function is the Solvated Interaction Energy (SIE), which has been developed and applied in our lab for the last 15 years, leading to several important applications. This physics-based method arose from efforts to understand the physics governing binding events, with particular care given to the role played by solvation. SIE has been used by us and many independent labs worldwide for virtual screening and discovery of novel small-molecule binders or optimization of known drugs. Moreover, without any retraining, it is found to be transferable to predictions of antibody-antigen relative binding affinities and as accurate as functions trained on protein-protein binding affinities. SIE has been incorporated in conjunction with other scoring functions into ADAPT (Assisted Design of Antibody and Protein Therapeutics), our platform for affinity modulation of antibodies. Application of ADAPT resulted in the optimization of several antibodies with 10-to-100-fold improvements in binding affinity. Further applications included broadening the specificity of a single-domain antibody to be cross-reactive with virus variants of both SARS-CoV-1 and SARS-CoV-2, and the design of safer antibodies by engineering of a pH switch to make them more selective towards acidic tumors while sparing normal tissues at physiological pH.