2,067 research outputs found

Optimizing Performance of Hadoop with Parameter Tuning

Author: Cheng Chen
Guang-Rui Li
Si-Yu Liu
Xiang Chen
Yi Liang
Publication venue: 'EDP Sciences'
Publication date: 01/01/2017

Parameter tuning is an effective way to greatly improve Hadoop performance, but identifying the optimal parameter configuration usually takes too much time because there are so many parameters. Users often adjust too many parameters blindly and are sometimes unsure which ones should be changed first. Classifying the parameters according to the application makes optimization easier. In this paper, we introduce a method for classifying these parameters so that users can optimize performance more quickly and effectively for different applications.

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

A Deep Cascade Model for Multi-Document Reading Comprehension

Author: Bi Bin
Chen Haiqing
Si Luo
Wang Rui
Wang Wei
Wu Chen
Xia Jiangnan
Yan Ming
Zhang Ji
Zhao Zhongzhou
Publication venue: 'Association for the Advancement of Artificial Intelligence (AAAI)'
Publication date: 27/11/2018

A fundamental trade-off between effectiveness and efficiency needs to be balanced when designing an online question answering system. Effectiveness comes from sophisticated functions such as extractive machine reading comprehension (MRC), while efficiency is obtained from improvements in preliminary retrieval components such as candidate document selection and paragraph ranking. Given the complexity of the real-world multi-document MRC scenario, it is difficult to jointly optimize both in an end-to-end system. To address this problem, we develop a novel deep cascade learning model, which progresses from document-level and paragraph-level ranking of candidate texts to more precise answer extraction with machine reading comprehension. Specifically, irrelevant documents and paragraphs are first filtered out with simple functions for efficiency. We then jointly train three modules on the remaining texts to better locate the answer: document extraction, paragraph extraction, and answer extraction. Experimental results show that the proposed method outperforms the previous state-of-the-art methods on two large-scale multi-document benchmark datasets, TriviaQA and DuReader. In addition, our online system can stably serve typical scenarios with millions of daily requests in less than 50 ms. Comment: Accepted at AAAI 201
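The cheap filtering stages described in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the word-overlap score, the function names `overlap` and `cascade_filter`, and the `doc_keep`/`para_keep` cut-offs are all assumptions made for illustration, and the jointly trained extraction modules are not represented at all.

```python
import re


def overlap(question, text):
    """Hypothetical cheap relevance score: fraction of question words found in text."""
    q = set(re.findall(r"\w+", question.lower()))
    t = set(re.findall(r"\w+", text.lower()))
    return len(q & t) / len(q) if q else 0.0


def cascade_filter(question, documents, doc_keep=2, para_keep=2):
    """Keep the best documents first, then the best paragraphs within them.

    `documents` is a list of documents, each a list of paragraph strings.
    The surviving paragraphs are what a more expensive reader would process.
    """
    # Stage 1: document-level filter with a cheap score.
    docs = sorted(
        documents,
        key=lambda d: overlap(question, " ".join(d)),
        reverse=True,
    )[:doc_keep]
    # Stage 2: paragraph-level filter over the surviving documents only.
    paragraphs = [p for d in docs for p in d]
    return sorted(
        paragraphs,
        key=lambda p: overlap(question, p),
        reverse=True,
    )[:para_keep]


docs = [
    ["Paris is the capital of France.", "It lies on the Seine."],
    ["Bananas are yellow."],
    ["The capital of Italy is Rome."],
]
kept = cascade_filter("capital of France", docs)
```

The point of the ordering is that each stage shrinks the input to the next, so the costly step only ever sees a small number of candidate paragraphs.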

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

PhageTailFinder: A tool for phage tail module detection and annotation

Author: Chen Chuangeng
Gan Rui
Ren Chunyan
Si Yu
Wu Jiqiu
Yang Han
Yu Ling
Zhang Fan
Zhou Fengxia
Publication date: 23/01/2023

Decades of overconsumption of antimicrobials in the treatment and prevention of bacterial infections have led to the increasing emergence of drug-resistant bacteria, which poses a significant challenge to public health and drives the urgent need to find alternatives to conventional antibiotics. Bacteriophages are viruses that infect specific bacterial hosts, often destroying them. Phages attach to and enter their potential hosts using their tail proteins, and the composition of the tail determines the range of bacteria that can be infected. To aid the exploitation of bacteriophages for therapeutic purposes, we developed the PhageTailFinder algorithm to predict tail-related proteins and identify the putative tail module in previously uncharacterized phages. PhageTailFinder relies on a two-state hidden Markov model (HMM) to predict the probability that a given protein is tail-related. The process takes into account the natural modularity of phage tail-related proteins, rather than simply considering amino acid properties or secondary structures for each protein in isolation. PhageTailFinder exhibited robust predictive power for phage tail proteins in novel phages due to this sequence-independent operation. The performance of the prediction model was evaluated on 13 extensively studied phages and a sample of 992 complete phages from the NCBI database. The algorithm achieved a high true-positive prediction rate (>80%) in over half (571) of the studied phages, and the ROC value was 0.877 using general models and 0.968 using corresponding morphologic models. Notably, the median ROC value for the 992 complete phages is more than 0.75 even for novel phages, indicating the high accuracy and specificity of PhageTailFinder. When applied to a dataset containing 189,680 viral genomes derived from 11,810 bulk metagenomic human stool samples, the ROC value was 0.895. In addition, tail protein clusters could be identified for further study using the density-based spatial clustering of applications with noise (DBSCAN) algorithm. The PhageTailFinder tool can be accessed either as a web server (http://www.microbiome-bigdata.com/PHISDetector/index/tools/PhageTailFinder) or as a stand-alone program on a standard desktop computer (https://github.com/HIT-ImmunologyLab/PhageTailFinder).
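To make the two-state HMM idea concrete, the following is a minimal sketch of posterior decoding (forward-backward) over proteins in genome order. It is not the PhageTailFinder implementation: the function name `tail_posteriors`, the emission likelihoods, and the transition and start probabilities are hypothetical values chosen only to show how neighbouring proteins influence each other's tail probability.

```python
def tail_posteriors(emissions, trans, start):
    """Posterior probability of the 'tail' state for each protein.

    emissions: list of (likelihood_if_non_tail, likelihood_if_tail) pairs,
               one per protein in genome order (assumed inputs).
    trans:     2x2 matrix, trans[r][s] = P(next state s | current state r).
    start:     initial distribution over (non_tail, tail).
    State 0 is non-tail and state 1 is tail.
    """
    n = len(emissions)
    if n == 0:
        return []
    states = (0, 1)

    # Forward pass, rescaled at each step to avoid numerical underflow.
    fwd = []
    prev = None
    for i, e in enumerate(emissions):
        if i == 0:
            a = [start[s] * e[s] for s in states]
        else:
            a = [e[s] * sum(prev[r] * trans[r][s] for r in states) for s in states]
        z = sum(a)
        prev = [x / z for x in a]
        fwd.append(prev)

    # Backward pass, also rescaled.
    bwd = [None] * n
    nxt = [1.0, 1.0]
    bwd[n - 1] = nxt
    for i in range(n - 2, -1, -1):
        e = emissions[i + 1]
        b = [sum(trans[s][r] * e[r] * nxt[r] for r in states) for s in states]
        z = sum(b)
        nxt = [x / z for x in b]
        bwd[i] = nxt

    # Combine both passes and normalise per position.
    post = []
    for f, b in zip(fwd, bwd):
        w0, w1 = f[0] * b[0], f[1] * b[1]
        post.append(w1 / (w0 + w1))
    return post


# Hypothetical parameters: states tend to persist, which models modularity.
TRANS = [[0.9, 0.1], [0.1, 0.9]]
START = [0.5, 0.5]

# The middle protein is ambiguous on its own; only its neighbours differ.
near_tail = tail_posteriors([(0.2, 0.8), (0.5, 0.5), (0.2, 0.8)], TRANS, START)
near_other = tail_posteriors([(0.8, 0.2), (0.5, 0.5), (0.8, 0.2)], TRANS, START)
```

With these assumed values, the ambiguous middle protein is pushed above 0.5 when flanked by tail-like neighbours and below 0.5 otherwise, which is the sense in which an HMM uses the modularity of the tail region rather than scoring each protein in isolation.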

ARTS repository - University of Groningen
