
    IntOGen - Pipeline

Requirements:

IntOGen depends on **Python 3.4** or above and some Python libraries. If you don't have Python 3.4 installed, the easiest way to install the whole software stack is the well-known [Anaconda Python distribution](http://continuum.io/downloads#34).

**Perl 5.10** or above (with the DBI module installed) must also be available on the PATH to run the VEP scripts.

By default MutsigCV is disabled. To enable it, first download and install [Matlab Runtime](http://es.mathworks.com/products/compiler/mcr/) and [MutsigCV](https://www.broadinstitute.org/cancer/cga/mutsig), then edit the IntOGen configuration file, which by default is at `/.intogen/system.conf` (parameters: `mutsig_enabled`, `mutsig_path` and `matlab_mcr`).

Installation:

To install or update to the latest stable version of IntOGen, run:

    pip install intogen pandas==0.17

After this the `intogen` script will be available on your path. If this is the first time you install IntOGen, run the setup to download all the data dependencies. The setup downloads 3.6 GB of data, which needs 9 GB of free space once uncompressed.

    intogen --setup

**TIP**: By default the IntOGen configuration files are in `/.intogen`. To change this folder, define the system environment variable **INTOGEN_HOME** using the `export` command.
Also, all the datasets are downloaded by default to `/.bgdata`. To change this folder, define the system environment variable **BGDATA_LOCAL**.

Run an example:

Download and extract some sample VCF files:

    wget https://bitbucket.org/intogen/intogen-pipeline/downloads/intogen-samples.tar.gz
    tar xvzf intogen-samples.tar.gz

Run IntOGen using the default task configuration:

    intogen -i sample1.vcf -i sample2.vcf -i sample3.vcf -i sample4.vcf

Browse the results in the `output` folder.

Custom configuration:

The default task configuration values are in `/.intogen/task.conf`. To run the pipeline with different parameters, either change the default values or create a `.smconfig` file for each project.

A `.smconfig` file is a copy of `/.intogen/task.conf` with two added parameters, `id` and `files`. The `id` is the name of the project, and `files` is a comma-separated list of all the files (MAF, VCF or tab format) that contain samples for that project.
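
The `.smconfig` layout described above (an `id` line, a comma-separated `files` line, then a copy of the task configuration) can be sketched as a small shell helper. This is only an illustration under stated assumptions: `make_smconfig` is a hypothetical function, not part of IntOGen, and the demo uses a stand-in `task.conf` with placeholder content so that it runs without IntOGen installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of IntOGen): writes "<id>.smconfig" from a
# project id, a task configuration file to copy, and one or more sample files.
make_smconfig() {
    local id="$1" task_conf="$2"
    shift 2
    # Join the remaining arguments (the sample files) with commas.
    local files
    files="$(IFS=,; echo "$*")"
    {
        printf 'id = %s\n' "$id"
        printf 'files = %s\n\n' "$files"
        cat "$task_conf"
    } > "${id}.smconfig"
}

# Demo in a temporary folder with a stand-in task.conf.
workdir="$(mktemp -d)"
cd "$workdir"
printf 'example_option = 1\n' > task.conf   # placeholder, not a real IntOGen option
make_smconfig allsamples task.conf sample1.vcf sample2.vcf sample3.vcf sample4.vcf
cat allsamples.smconfig
```

In real use the second argument would point at the task configuration file of your IntOGen installation instead of the stand-in.
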

You can create a `.smconfig` file like this:

    echo -e "id = allsamples\nfiles = sample1.vcf,sample2.vcf,sample3.vcf,sample4.vcf\n" > allsamples.smconfig
    cat /.intogen/task.conf >> allsamples.smconfig

To run it again, delete or move the previous output and run using the `.smconfig` file as input:

    rm -rf output
    intogen -i allsamples.smconfig

To run multiple projects at once, create multiple `.smconfig` files in one folder and give that folder as input.

Analyses somatic mutations in thousands of tumor genomes to identify cancer driver genes.