makesamplingmatrix

Abstract

This script traverses every file in a given directory, parses each one as FASTA, and records the names of all taxa present in each file. It then writes a tab-delimited file containing a matrix in which the rows represent the taxa and the columns represent the FASTA files. The intended use is a directory of FASTA files, each corresponding to a single locus and containing homologous sequences of that locus for different taxa. The script records a 1 in the matrix if a taxon is present in a locus file and a 0 if it is not. Key point: the script does not distinguish FASTA files from other file types and will attempt to parse any file in the directory, so you should remove all other files before running it. It creates (or overwrites!) a file called 'sampling_matrix.txt' in the passed directory. This file may be opened in any conventional spreadsheet or text-editor application and should be in the proper format for use in the Decisivator application. This script requires BioPython to be installed.
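The behaviour described above can be illustrated with a minimal sketch. It is not the script's actual source: the function names `read_taxa` and `build_sampling_matrix` are hypothetical, FASTA headers are parsed by hand rather than with BioPython so that the example has no dependencies, the taxon name is assumed to be the first whitespace-delimited token of each header line, and the "taxon" label in the header row is an assumption (the exact layout Decisivator expects is not specified here). The sketch also skips an existing 'sampling_matrix.txt' when listing files, which the original script may or may not do.

```python
import os


def read_taxa(path):
    """Return the set of taxon names found in one file.

    Assumption: the taxon name is the first token after '>' on each
    FASTA header line. The real script uses BioPython for parsing.
    """
    taxa = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    taxa.add(fields[0])
    return taxa


def build_sampling_matrix(directory, out_name="sampling_matrix.txt"):
    """Write a taxa-by-locus presence/absence matrix and return its rows."""
    # Every regular file is treated as a locus file; nothing checks that it
    # really is FASTA. The output file is skipped (an assumption, see above).
    loci = sorted(
        name for name in os.listdir(directory)
        if name != out_name and os.path.isfile(os.path.join(directory, name))
    )
    presence = {
        locus: read_taxa(os.path.join(directory, locus)) for locus in loci
    }
    taxa = sorted(set().union(*presence.values()))

    # Header row: a label cell (hypothetical) followed by one column per file.
    rows = [["taxon"] + loci]
    for taxon in taxa:
        cells = ["1" if taxon in presence[locus] else "0" for locus in loci]
        rows.append([taxon] + cells)

    # Opening in "w" mode overwrites any existing sampling_matrix.txt.
    with open(os.path.join(directory, out_name), "w") as out:
        for row in rows:
            out.write("\t".join(row) + "\n")
    return rows
```

Calling `build_sampling_matrix("path/to/loci")` on a directory that holds only locus files would write 'sampling_matrix.txt' there, with one row per taxon and one column per file.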
