The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens
Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Results: Here, we report on the results of the third CAFA challenge, CAFA3, which featured an expanded analysis over the previous CAFA rounds, both in terms of the volume of data analyzed and the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. Conclusion: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.
Efficient Storage Design in Log-Structured Merge (LSM) Tree Databases
In this cloud era, data is being generated rapidly from billions of network users, mobile devices, social networks, sensors, and many other devices and applications. Compared to traditional relational databases, which were optimized for read-heavy workloads, many modern NoSQL database systems choose log-structured merge (LSM) architectures to support high write throughput, including AsterixDB, Bigtable, Cassandra, Dynamo, HBase, LevelDB, and RocksDB. My research interests focus on the architectural design and optimization of the storage engines of such LSM systems. Specifically, my thesis targets three aspects: merge policies, spatial data, and partitioning.
First, a merge policy, also known as a compaction strategy, is a critical component of an LSM system. It defines how data is organized on disk and strongly affects the system's read and write performance as well as its space utilization. Five state-of-the-art merge policies from existing LSM systems (Bigtable, Constant, Exploring, Tiered, and Leveled), together with two recently proposed policies (Binomial and MinLatency), are selected for comparison and evaluation of write, read, and transient space amplification. We build and experimentally compare all these policies on the same platform. The experimental results show that the two newly proposed policies outperform the other strategies, as they offer a better trade-off between write and read amplification.
Second, most of the existing LSM systems are optimized only for single-dimensional data; that is, they lack support for spatial indexes for spatial queries. To support spatial indexes, an LSM system must either index spatial data by mapping the spatial keys into single-dimensional keys or provide native support for a secondary LSM R-tree index. Using an OpenStreetMap dataset and a synthetic dataset, we experimentally compare LSM R-tree indexes with four different merge policies: Concurrent, Binomial, Tiered, and Leveled (with three partitioning algorithms). We discuss our observations and recommendations with respect to the merge policy, comparator, and partitioning in the Leveled policy.
Third, the incremental merge style of the Leveled policy makes it possible to break a big merge into multiple small sub-merges via partitioning. For certain workloads, such as sequential insertions, the Leveled policy supports trivial moves, where a whole partition is moved to the next level without any processing. Such features are missing from stack-based merge policies, such as Tiered, which often have many time-consuming large merges and have no effective support for trivial moves to minimize disk I/O. We propose a novel global-range partitioning algorithm for stack-based merge policies to 1) improve the parallelism of merges and thereby the overall write throughput; 2) increase opportunities for trivial moves; and 3) enable a hybrid of stack-based and leveled merge policies.
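The trivial-move idea described above can be illustrated with a minimal sketch. It assumes that a partition is summarized by an inclusive key range and that a move is trivial when that range overlaps no partition in the next level. The names `Partition`, `overlaps`, and `can_trivially_move` are hypothetical and chosen for illustration only; they are not taken from the thesis or from any particular LSM system.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Partition:
    """Hypothetical summary of an on-disk partition: inclusive key range."""
    min_key: int
    max_key: int


def overlaps(a: Partition, b: Partition) -> bool:
    """Two inclusive key ranges overlap if neither ends before the other starts."""
    return a.min_key <= b.max_key and b.min_key <= a.max_key


def can_trivially_move(p: Partition, next_level: List[Partition]) -> bool:
    """Assumed rule: a partition can be moved down without rewriting
    when its key range overlaps no partition in the next level."""
    return not any(overlaps(p, q) for q in next_level)


# Sequential insertions produce key ranges beyond everything already stored,
# so the new partition overlaps nothing and no merge I/O is needed.
next_level = [Partition(0, 99), Partition(100, 199)]
sequential = Partition(200, 250)
random_keys = Partition(150, 220)

print(can_trivially_move(sequential, next_level))
print(can_trivially_move(random_keys, next_level))
```

Under this assumption, the sequential partition qualifies for a trivial move, while the partition with overlapping keys would have to be merged with the overlapping partition in the next level.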
Crystal Structures and Binary Molten Solid–Liquid Equilibria of tert-Butylmethylphenol Isomers
Single crystals of 2-tert-butyl-5-methylphenol anhydrate and 2-tert-butyl-5-methylphenol quarterhydrate were prepared and presented for the first time in this work. The structures were characterized by single-crystal X-ray diffraction and DSC analysis. The solid–liquid equilibrium (SLE) for 2-tert-butyl-4-methylphenol with 2-tert-butyl-5-methylphenol anhydrate or 2-tert-butyl-5-methylphenol quarterhydrate was studied by the cooling–heating recycling method using a synthetic visual technique at atmospheric pressure (101.6 ± 1.2 kPa). The experimental SLE data for the two binary systems were reported, and both systems showed simple eutectic behavior. The SLE data were further correlated by Wilson and NRTL (nonrandom two-liquid) models, and the optimally fitted parameters of the two systems were presented. Computational studies on geometric optimization and energy calculation were performed using density functional theory, and the lower energy configuration of 2-tert-butyl-5-methylphenol quarterhydrate could explain the spontaneous incorporation of water in the anhydrous form. These novel data provide valuable information in designing and optimizing the melt crystallization process of tert-butylmethylphenol isomers.