Optimizing Performance of Hadoop with Parameter Tuning
Parameter tuning is an effective way to greatly improve Hadoop performance, but identifying the optimal configuration usually takes too much time because there are so many parameters. Users often adjust too many parameters blindly and are sometimes unsure which ones should be changed first. Classifying the parameters by application type makes optimization easier. In this paper, we introduce a method for classifying these parameters so that users can optimize performance more quickly and effectively for different applications.
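The idea of ranking parameters per application class can be sketched as a simple lookup. This is a minimal illustration, not the paper's classification: the abstract does not list its classes or priorities, so the class names (`cpu_bound`, `io_bound`, `shuffle_heavy`), the grouping, and the `parameters_to_tune` helper are all assumptions. Only the property names themselves are standard Hadoop configuration keys.

```python
# Hypothetical grouping: the abstract does not say which parameters belong to
# which application class, so the lists below are illustrative assumptions.
PARAMETER_PRIORITIES = {
    "cpu_bound": [
        "mapreduce.map.memory.mb",
        "mapreduce.map.java.opts",
        "mapreduce.job.reduces",
    ],
    "io_bound": [
        "mapreduce.task.io.sort.mb",
        "mapreduce.map.output.compress",
        "dfs.blocksize",
    ],
    "shuffle_heavy": [
        "mapreduce.reduce.shuffle.parallelcopies",
        "mapreduce.map.output.compress",
        "mapreduce.task.io.sort.mb",
    ],
}


def parameters_to_tune(app_class, limit=2):
    """Return the first `limit` parameters to try for an application class."""
    if app_class not in PARAMETER_PRIORITIES:
        raise KeyError(f"unknown application class: {app_class!r}")
    return PARAMETER_PRIORITIES[app_class][:limit]


if __name__ == "__main__":
    print(parameters_to_tune("io_bound"))
```

A table like this lets a user start from a short, ordered list instead of the full configuration space. The ordering itself would have to come from measurements on real workloads.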
A Deep Cascade Model for Multi-Document Reading Comprehension
A fundamental trade-off between effectiveness and efficiency needs to be
balanced when designing an online question answering system. Effectiveness
comes from sophisticated functions such as extractive machine reading
comprehension (MRC), while efficiency is obtained from improvements in
preliminary retrieval components such as candidate document selection and
paragraph ranking. Given the complexity of the real-world multi-document MRC
scenario, it is difficult to jointly optimize both in an end-to-end system. To
address this problem, we develop a novel deep cascade learning model, which
progressively evolves from the document-level and paragraph-level ranking of
candidate texts to more precise answer extraction with machine reading
comprehension. Specifically, irrelevant documents and paragraphs are first
filtered out with simple functions for efficiency. Then we
jointly train three modules on the remaining texts to better track the
answer: document extraction, paragraph extraction, and answer
extraction. Experiment results show that the proposed method outperforms the
previous state-of-the-art methods on two large-scale multi-document benchmark
datasets, i.e., TriviaQA and DuReader. In addition, our online system can
stably serve typical scenarios with millions of daily requests in less than
50 ms.
Comment: Accepted at AAAI 201
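The coarse-to-fine filtering described in this abstract can be sketched as follows. This is only an illustration of the cascade idea under stated assumptions: the `token_overlap` scorer and the `cascade` function are hypothetical stand-ins for the "simple functions" the abstract mentions, and the jointly trained neural modules for document, paragraph, and answer extraction are not reproduced here.

```python
def token_overlap(question, text):
    """Cheap relevance score: fraction of question tokens that occur in text."""
    q_tokens = set(question.lower().split())
    t_tokens = set(text.lower().split())
    return len(q_tokens & t_tokens) / len(q_tokens) if q_tokens else 0.0


def cascade(question, documents, doc_keep=2, para_keep=3, scorer=token_overlap):
    """Filter documents, then paragraphs, before any expensive reader runs.

    documents: list of documents, each given as a list of paragraph strings.
    Returns the surviving paragraphs, best first.
    """
    # Stage 1: keep only the top-scoring documents.
    ranked_docs = sorted(
        documents,
        key=lambda doc: scorer(question, " ".join(doc)),
        reverse=True,
    )[:doc_keep]

    # Stage 2: keep only the top-scoring paragraphs of the surviving documents.
    paragraphs = [p for doc in ranked_docs for p in doc]
    ranked_paragraphs = sorted(
        paragraphs, key=lambda p: scorer(question, p), reverse=True
    )[:para_keep]

    # Stage 3 (not shown): an answer-extraction model would read only these.
    return ranked_paragraphs


if __name__ == "__main__":
    docs = [
        ["paris is the capital of france", "france is in europe"],
        ["bananas are yellow"],
        ["the capital of spain is madrid"],
    ]
    print(cascade("what is the capital of france", docs, para_keep=2))
```

Because each stage shrinks the candidate set, the costly reader only sees a handful of paragraphs, which is the efficiency argument the abstract makes.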
PhageTailFinder: A tool for phage tail module detection and annotation
Decades of overconsumption of antimicrobials in the treatment and prevention of bacterial infections have resulted in the increasing emergence of drug-resistant bacteria, which poses a significant challenge to public health, driving the urgent need to find alternatives to conventional antibiotics. Bacteriophages are viruses infecting specific bacterial hosts, often destroying the infected bacterial hosts. Phages attach to and enter their potential hosts using their tail proteins, with the composition of the tail determining the range of potentially infected bacteria. To aid the exploitation of bacteriophages for therapeutic purposes, we developed the PhageTailFinder algorithm to predict tail-related proteins and identify the putative tail module in previously uncharacterized phages. The PhageTailFinder relies on a two-state hidden Markov model (HMM) to predict the probability of a given protein being tail-related. The process takes into account the natural modularity of phage tail-related proteins, rather than simply considering amino acid properties or secondary structures for each protein in isolation. The PhageTailFinder exhibited robust predictive power for phage tail proteins in novel phages due to this sequence-independent operation. The performance of the prediction model was evaluated in 13 extensively studied phages and a sample of 992 complete phages from the NCBI database. The algorithm achieved a high true-positive prediction rate (>80%) in over half (571) of the studied phages, and the ROC value was 0.877 using general models and 0.968 using corresponding morphologic models. It is notable that the median ROC value of 992 complete phages is more than 0.75 even for novel phages, indicating the high accuracy and specificity of the PhageTailFinder. When applied to a dataset containing 189,680 viral genomes derived from 11,810 bulk metagenomic human stool samples, the ROC value was 0.895. 
In addition, tail protein clusters could be identified for further studies with the density-based spatial clustering of applications with noise (DBSCAN) algorithm. The developed PhageTailFinder tool can be accessed either as a web server (http://www.microbiome-bigdata.com/PHISDetector/index/tools/PhageTailFinder) or as a stand-alone program on a standard desktop computer (https://github.com/HIT-ImmunologyLab/PhageTailFinder).
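To make the two-state HMM idea concrete, the sketch below computes the posterior probability that each protein along a genome is in a "tail" state, using the standard forward-backward algorithm. It is not the PhageTailFinder implementation: the function name, the transition and start probabilities, and the per-protein emission likelihoods are all hypothetical values chosen for illustration.

```python
def posterior_tail_probability(emissions, p_stay_tail=0.9, p_stay_other=0.9,
                               p_start_tail=0.5):
    """Forward-backward for a two-state HMM (state 0 = tail, 1 = other).

    emissions: one (likelihood_if_tail, likelihood_if_other) pair per protein,
    in genome order. Likelihoods are assumed to be strictly positive.
    Returns P(state = tail | all observations) for each protein.
    """
    n = len(emissions)
    if n == 0:
        return []
    states = (0, 1)
    # trans[i][j] = P(next state = j | current state = i)
    trans = [[p_stay_tail, 1.0 - p_stay_tail],
             [1.0 - p_stay_other, p_stay_other]]
    start = [p_start_tail, 1.0 - p_start_tail]

    # Forward pass, rescaled at every step to avoid numerical underflow.
    forward = []
    for t, emit in enumerate(emissions):
        if t == 0:
            cur = [start[s] * emit[s] for s in states]
        else:
            prev = forward[-1]
            cur = [emit[s] * sum(prev[r] * trans[r][s] for r in states)
                   for s in states]
        total = sum(cur)
        forward.append([c / total for c in cur])

    # Backward pass, also rescaled.
    backward = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        nxt_emit = emissions[t + 1]
        cur = [sum(trans[s][r] * nxt_emit[r] * backward[t + 1][r] for r in states)
               for s in states]
        total = sum(cur)
        backward[t] = [c / total for c in cur]

    # Combine both passes and normalise per position.
    posteriors = []
    for f, b in zip(forward, backward):
        tail, other = f[0] * b[0], f[1] * b[1]
        posteriors.append(tail / (tail + other))
    return posteriors


if __name__ == "__main__":
    # The third protein is ambiguous on its own (0.5 vs 0.5).
    genome = [(0.9, 0.1), (0.9, 0.1), (0.5, 0.5), (0.9, 0.1), (0.9, 0.1)]
    print(posterior_tail_probability(genome))
```

The ambiguous middle protein is pulled toward the tail state by its neighbours, which illustrates the modularity argument in the abstract: position within a run of tail-related proteins carries information beyond the properties of each protein in isolation.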
- …