Human immunodeficiency virus type 1 (HIV-1) sequences are accumulating in the literature at a rapid pace. For this ever-expanding resource to be maximally useful, it is critical that researchers maintain a high level of quality assurance in experimental design, in the conduct of experiments, and in data analysis. Here we present detailed analyses of problematic sets of HIV-1 sequences in the database that contain anomalies suggestive of sample mislabeling or contamination. We examine these data in the context of currently available HIV-1 sequence information to illustrate how potentially flawed data can be identified. Indicators of potential problems are (i) sequences that are nearly identical to one another although they are reported to derive from unlinked individuals, and that are markedly distinct from other sequences from the putative source, and (ii) sequences that are nearly identical to those of laboratory strains. We also outline preliminary laboratory and computational analyses that researchers can perform to identify problematic data and thus help ensure the integrity of sequence databases.

The amount of human immunodeficiency virus type 1 (HIV-1) nucleotide sequence data submitted to the GenBank database is a fair indicator of the amount of HIV-1 sequence data generated over time. On the basis o