<p>Sequence-based phylogenetic analyses of 47 gene families identified in an analysis of conserved synteny around somatostatin receptor gene-bearing chromosome regions. For each gene family, amino acid sequence predictions were retrieved from the Ensembl genome browser (http://www.ensembl.org) and used to create sequence alignments and phylogenetic trees. Gene families were defined based on Ensembl protein family predictions. Database identifiers, location data, genome assembly information and annotation notes for all identified protein families and sequences are included in 'Supplemental Table 2.xlsx' and 'Supplemental Table 3.xlsx' (Excel spreadsheets).</p>
<p>File information: </p>
<p>Gene families are identified by unique abbreviations based on approved HUGO Gene Nomenclature Committee (HGNC) gene symbols, or known aliases from the NCBI Entrez Gene database. For each gene family, an alignment file '...align.fasta', a neighbor joining tree '...NJ_rooted.phb' and a phylogenetic maximum likelihood tree '...PhyML_rooted.phb' are included.</p>
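<p>As a rough illustration of the naming scheme above, the following Python sketch groups file names by the three suffixes and reports any prefix that lacks one of the three expected files. The leading part of each file name is elided in the description, so it is treated here as an opaque prefix; the function names are hypothetical and not part of the file set.</p>

```python
from collections import defaultdict

# Suffixes named in the file description. The part before each suffix is
# elided in the text ('...'), so it is handled as an opaque prefix.
SUFFIXES = {
    "alignment": "align.fasta",
    "nj_tree": "NJ_rooted.phb",
    "phyml_tree": "PhyML_rooted.phb",
}


def group_by_prefix(filenames):
    """Map each file name prefix to the kinds of files found for it."""
    groups = defaultdict(dict)
    for name in filenames:
        for kind, suffix in SUFFIXES.items():
            if name.endswith(suffix):
                prefix = name[: -len(suffix)]
                groups[prefix][kind] = name
                break  # a file name matches at most one suffix
    return dict(groups)


def incomplete(groups):
    """Return the prefixes that lack one of the three expected files."""
    return sorted(p for p, files in groups.items() if set(files) != set(SUFFIXES))
```

<p>The file names could be collected with, for example, os.listdir on the unpacked directory and passed to group_by_prefix; files that match none of the suffixes are ignored.</p>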
<p>Alignments are included in FASTA format with the extension '.fasta'. This file format can be opened by most sequence analysis applications as well as text editors. Alignments were created using the ClustalWS sequence alignment program with standard settings (Gonnet weight matrix, gap opening penalty 10.0 and gap extension penalty 0.20) through the JABAWS 2 tool in Jalview 2.7 (http://www.jalview.org/).</p>
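<p>Because FASTA is plain text, an alignment can also be inspected without dedicated software. The sketch below is a minimal reader written for illustration only (it is not the tool used to produce the files): it parses FASTA text into a dictionary and checks that all rows share one length, as expected for an alignment. The function names are hypothetical.</p>

```python
def read_fasta(text):
    """Parse FASTA text into a dict of {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            if header in records:
                raise ValueError(f"duplicate header: {header}")
            records[header] = []
        elif header is None:
            raise ValueError("sequence data before first header")
        else:
            # Sequences may be wrapped over several lines; collect the parts.
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def alignment_length(records):
    """Return the shared row length, or raise if the rows differ."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"rows do not share one length: {sorted(lengths)}")
    return lengths.pop()
```

<p>Reading a file's contents with open(path).read() and passing the text to read_fasta would give one entry per aligned sequence, with gap characters preserved.</p>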
<p>Phylogenetic tree files are included in Phylip/Newick format with the extension '.phb'. This file format can be opened by freely available phylogenetic tree viewers such as FigTree (http://tree.bio.ed.ac.uk/software/figtree/) and TreeView (http://darwin.zoology.gla.ac.uk/~rpage/treeviewx/). The phylogenetic analyses were carried out based on the included alignments using bootstrap-supported neighbor joining (NJ) as well as phylogenetic maximum likelihood (PhyML) methods. Phylogenetic trees are rooted with identified <em>Drosophila melanogaster</em> (fruit fly) sequences, or with identified <em>Ciona intestinalis</em> or <em>Ciona savignyi</em> (tunicates), <em>Branchiostoma floridae</em> (Florida lancelet, amphioxus), or <em>Caenorhabditis elegans</em> (nematode) sequences if no fruit fly sequence could be found.</p>
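<p>Newick files are also plain text, so the sequences present in a tree can be listed with a few lines of Python. The sketch below is a simplified illustration, assuming unquoted leaf labels and a tree with at least one pair of parentheses; it is not a full Newick parser, and the function name is hypothetical.</p>

```python
import re

# A leaf label directly follows '(' or ','. Labels that follow ')' are
# internal node labels (for example bootstrap values) and are skipped.
_LEAF = re.compile(r"[(,]\s*([^(),:;\s][^(),:;]*)")


def leaf_names(newick):
    """Return the leaf labels of a Newick string in order of appearance."""
    return [m.group(1).strip() for m in _LEAF.finditer(newick)]
```

<p>Comparing the labels returned for a tree against the headers of the matching alignment is a quick way to confirm that both files cover the same set of sequences.</p>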
The NJ trees are supported by non-parametric bootstrap analyses with 1000 replicates, applied through ClustalX 2.0 (http://www.clustal.org/clustal2/) with standard settings. The PhyML trees are supported by non-parametric bootstrap analyses with 100 replicates made using the PhyML 3.0 algorithm (http://www.atgc-montpellier.fr/phyml/) with the following settings: amino acid frequencies (equilibrium frequencies), proportion of invariable sites (with optimised p-invar) and gamma-shape parameters were estimated from the datasets; the number of substitution rate categories was set to 8; BIONJ was chosen to create the starting tree and the nearest neighbor interchange (NNI) tree improvement method was used to estimate the best topology; both tree topology and branch length optimization were chosen. The LG model of amino acid substitution, which is standard for PhyML 3.0, was chosen.
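For readers who want to approximate the PhyML settings above from the command line, the Python sketch below assembles an argument list for the PhyML 3.0 executable. It is one possible reading of the description, not a record of how the analyses were actually run: the executable name 'phyml', the mapping of each setting to a flag, and the function name are assumptions, and PhyML expects PHYLIP-format input, so the FASTA alignments would need to be converted first. BIONJ is the default starting tree when no user tree is supplied, so no flag is given for it.

```python
def phyml_args(alignment_path, bootstrap=100):
    """Build a PhyML 3.0 argument list approximating the described settings."""
    return [
        "phyml",                 # assumed executable name
        "-i", alignment_path,    # input alignment (PHYLIP format)
        "-d", "aa",              # amino acid data
        "-m", "LG",              # LG substitution model
        "-f", "e",               # frequencies estimated from the data (assumed reading)
        "-v", "e",               # estimate proportion of invariable sites
        "-c", "8",               # eight substitution rate categories
        "-a", "e",               # estimate gamma shape parameter
        "-s", "NNI",             # nearest neighbor interchange tree search
        "-o", "tlr",             # optimise topology, branch lengths and rate parameters
        "-b", str(bootstrap),    # non-parametric bootstrap replicates
    ]
```

The list can be passed to subprocess.run once PhyML is installed; the flags should be checked against the manual of the installed version before use.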
Species abbreviations are applied as follows:
<em>Homo sapiens</em> (Hsa, human), <em>Mus musculus</em> (Mmu, mouse), <em>Canis familiaris</em> (Cfa, dog), <em>Monodelphis domestica</em> (Mdo, grey short-tailed opossum), <em>Macropus eugenii</em> (Meu, tammar wallaby), <em>Ornithorhynchus anatinus</em> (Oan, platypus), <em>Gallus gallus</em> (Gga, chicken), <em>Taeniopygia guttata</em> (Tgu, zebra finch), <em>Meleagris gallopavo</em> (Mga, turkey), <em>Anolis carolinensis</em> (Aca, Carolina anole lizard), <em>Silurana (Xenopus) tropicalis</em> (Xtr, Western clawed frog), <em>Danio rerio</em> (Dre, zebrafish), <em>Oryzias latipes</em> (Ola, medaka), <em>Gasterosteus aculeatus</em> (Gac, three-spined stickleback), <em>Tetraodon nigroviridis</em> (Tni, green spotted pufferfish), <em>Takifugu rubripes</em> (Tru, fugu), <em>Ciona intestinalis</em> (Cin, tunicate), <em>Ciona savignyi</em> (Csa, tunicate), <em>Branchiostoma floridae</em> (Bfl, amphioxus), <em>Caenorhabditis elegans</em> (Cel, nematode) and <em>Drosophila melanogaster</em> (Dme, fruit fly).
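The abbreviations above can be kept as a lookup table when processing sequence names programmatically. The Python sketch below lists the 21 abbreviations and the five outgroup species used for rooting. How the abbreviation is embedded in each sequence name is not specified here, so the helper assumes it appears as a separate token delimited by non-alphanumeric characters; that assumption and the function name are hypothetical.

```python
import re

SPECIES = {
    "Hsa": "Homo sapiens", "Mmu": "Mus musculus", "Cfa": "Canis familiaris",
    "Mdo": "Monodelphis domestica", "Meu": "Macropus eugenii",
    "Oan": "Ornithorhynchus anatinus", "Gga": "Gallus gallus",
    "Tgu": "Taeniopygia guttata", "Mga": "Meleagris gallopavo",
    "Aca": "Anolis carolinensis", "Xtr": "Silurana (Xenopus) tropicalis",
    "Dre": "Danio rerio", "Ola": "Oryzias latipes",
    "Gac": "Gasterosteus aculeatus", "Tni": "Tetraodon nigroviridis",
    "Tru": "Takifugu rubripes", "Cin": "Ciona intestinalis",
    "Csa": "Ciona savignyi", "Bfl": "Branchiostoma floridae",
    "Cel": "Caenorhabditis elegans", "Dme": "Drosophila melanogaster",
}

# Species whose sequences are used to root the trees.
OUTGROUPS = {"Dme", "Cin", "Csa", "Bfl", "Cel"}


def species_code(label):
    """Return the first species abbreviation found as a token in label, else None.

    Assumes the abbreviation is delimited by non-alphanumeric characters.
    Matching is case-sensitive, so the gene family symbol 'GGA' is not
    mistaken for the chicken abbreviation 'Gga'.
    """
    for token in re.split(r"[^A-Za-z0-9]+", label):
        if token in SPECIES:
            return token
    return None
```

A label whose code is in OUTGROUPS would then be a candidate root sequence for its tree.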
The following gene families are included in this file set:
ABHD12: Abhydrolase domain containing 12
CFL: Cofilin and destrin (actin depolymerizing factor)
FLRT: Fibronectin leucine rich transmembrane protein
FOXA: Forkhead box A
ISM: Isthmin homolog
JAG: Jagged
NIN: Ninein (GSK3B interacting protein)
NKX2: NK2 homeobox 1 and 4
PAX: Paired box 1 and 9
PYG: Glycogen phosphorylase; brain, liver and muscle variants
RALGAPA: Ral GTPase activating protein, alpha subunit
RIN: Ras and Rab interactor
SEC23: Sec23 homologs A and B
SLC24A: Solute carrier family 24 members 3 and 4
SNX: Sorting nexin 5, 6 and 32
SPTLC: Serine palmitoyltransferase, long chain base subunit 2 and 3
VSX: Visual system homeobox
ADAP: ArfGAP with dual PH domains
ATP2A: ATPase, Ca++ transporting, cardiac muscle, fast twitch
C1QTNF: C1q and tumor necrosis factor related protein
CABP: Calcium binding protein 1, 3, 4 and 5
CACNA1: Calcium channel, voltage dependent, T type alpha subunit
CREBBP: CREB binding protein
CYTH: Cytohesin
FAM20: Family with sequence similarity 20
FNG: Fringe homolog
FSCN: Fascin homolog 1 and 2, actin-bundling protein
GLPR: Glucagon, glucagon-like and gastric inhibitory polypeptide receptors
GGA: Golgi-associated, gamma adaptin ear containing, ARF-binding protein
GRIN2: Glutamate receptor, ionotropic, N-methyl D-aspartate 2
KCNJ: Potassium inwardly-rectifying channel, subfamily J member 2, 4, 12 and 14
KCTD: Potassium channel tetramerisation domain containing 2, 5 and 17
METRN: Meteorin, glial cell differentiation regulator
NDE: nudE nuclear distribution gene E homolog
RAB11FIP: RAB11 family interacting protein 3 and 4 (class II)
RADIL: Ras association and DIL domains/Ras interacting protein
RHBDF: Rhomboid 5 homolog
RHOT: Ras homolog gene family, member T1 and T2
RPH3A: Rabphilin 3A homolog/double C2-like domains, alpha
SDK: Sidekick cell adhesion molecule
SOX: Sex-determining region Y-box 8, 9 and 10
TEX2: Testis expressed 2
TNRC6: Trinucleotide repeat containing 6
TOM1: Target of myb1
TTYH: Tweety homolog
USP: Ubiquitin specific peptidase 31 and 43
WFIKKN: WAP, follistatin/kazal, immunoglobulin, kunitz and netrin domain containing