This page gives a complete command list for running ACE+ via Docker. The session shown here covers training ACE+ models, building a personalized genome, and running the ACE+ models on that personalized genome. This series of commands has been confirmed to run without error in Docker version 17.03.1-ce, build c6d412e.
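
Because the session depends on Docker and wget, it may help to confirm that both are available before starting. The following is a minimal sketch, not part of ACE+ itself: the helper name `missing_tools` is hypothetical, and the daemon check assumes that `docker info` exits with a non-zero status when the daemon cannot be reached.

```shell
#!/bin/sh
# Hypothetical helper (not part of ACE+): print the name of each
# requested tool that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# The session below needs docker and wget.
missing=$(missing_tools docker wget)
if [ -n "$missing" ]; then
    echo "Missing prerequisites: $missing" >&2
else
    # Assumption: docker info fails when the daemon is not running.
    docker info >/dev/null 2>&1 \
        || echo "Docker is installed, but the daemon is not reachable" >&2
    # Show the installed version for comparison with the tested one.
    docker --version
fi
```

If the version printed differs from 17.03.1-ce, the commands may still work, but that has not been confirmed on this page.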

The output format for ACE+, and utilities for converting and parsing ACE+ output, are described on the original ACE web page. ACE+ is described on a separate page.

As a prerequisite, you must have Docker installed, and have access to a UNIX terminal window and the ability to download remote files via wget. Once you have the Docker daemon running, the following commands should work on any system.

# This command downloads the 3GB ACE+ image for Docker:
docker pull genezilla/aceplus

# This command makes a convenient environment variable, $DOCKER, which will
# simplify the following commands. Please change "/home/bmajoros" to your
# working directory:
export DOCKER="docker run -w /root -v /home/bmajoros:/root -it genezilla/aceplus"

# This command downloads the human genome in 2bit format:
wget

# This command extracts chromosome lengths from the 2bit genome file:
$DOCKER twoBitInfo hg19.2bit hg19.lengths

# This command downloads a set of annotations for chromosome 22:
wget

# This command downloads a VCF file of genetic variants for chr22:
wget

# These commands re-zip the VCF file using bgzip and run tabix to index the
# VCF file for efficient extraction of variants within genes:
gunzip chr22.vcf.gz
$DOCKER bgzip chr22.vcf
$DOCKER tabix chr22.vcf.gz

# These commands download a set of training genes:
wget
gunzip iso0-100.fasta.gz
wget
gunzip iso0-100.gff.gz

# This command downloads a substitution matrix for alignment:
wget

# This command downloads a list of individuals for which we wish to construct
# personal genomes:
wget

# This command downloads a sample configuration file:
wget

# These commands create a personalized genome for individual HG00096:
mkdir genomes
$DOCKER aceplus.config chr22.gff genomes

# This command extracts positive training features:
$DOCKER iso0-100.gff iso0-100.fasta GT,GC,AT AG,AC ATG TGA,TAA,TAG

# This command extracts negative training features:
$DOCKER

# These commands create model files for start and stop codons:
$DOCKER WMM start-codons.fasta non-start-codons.fasta start-codons-12bp 1 ATG 6 3 12 0.95 0 30 0 0
$DOCKER WMM start-codons.fasta non-start-codons.fasta start-codons-3bp 1 ATG 0 3 3 0.95 0 30 0 0
$DOCKER WMM stop-codons.fasta non-stop-codons.fasta stop-codons-3bp 1 ATG 0 3 3 0.95 0 30 0 0

# These commands train donor and acceptor splice-site models:
$DOCKER donors.fasta non-donors.fasta 6 12 0.5 GT,GC,AT GT > donors.model
$DOCKER acceptors.fasta non-acceptors.fasta 20 2 0.5 AG,AC AG > acceptors.model

# This command extracts training features for the exon definition model:
$DOCKER internal-exons.fasta introns.fasta > hex-counts.txt

# These commands subset the training data to make training faster. This is
# for illustration only: in practice you should use at least 10,000 exons and
# 10,000 introns, in which case the logistic regression will take around
# 12 hours to run.
head -n 1000 hex-counts.txt > tmp.1
tail -n 1000 hex-counts.txt > tmp.2
mv hex-counts.txt hex-counts.txt.bak
cat tmp.1 tmp.2 > hex-counts.txt

# This command trains the exon definition hexamer weights:
$DOCKER logistic-regression.R hex-counts.txt 0.5 betas.txt

# These commands install the hexamer weights into exon and intron model files:
$DOCKER betas.txt EXON exon.model
$DOCKER betas.txt INTRON intron.model

# This command downloads an intergenic model that can be used for any organism
# (it does not affect ACE+ predictions, but is still needed):
wget
mv intergenic.model intergenic0-43.binmod

# This command runs ACE+, producing its output into aceplus.essex:
$DOCKER aceplus.config genomes/ref-1.fasta genomes/HG00096-1.fasta genomes/local.gff 0 aceplus.essex
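
Once the session has finished, a quick check that the expected files exist and are non-empty can catch a step that failed silently. The sketch below is an illustration only: the helper name `require_nonempty` is hypothetical, and the file names are the ones written by the commands above, assuming they were run from the current directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of ACE+): return success only if every
# named file exists and has a size greater than zero.
require_nonempty() {
    rc=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            rc=1
        fi
    done
    return $rc
}

# Model files produced during training, then the final ACE+ output.
require_nonempty donors.model acceptors.model exon.model intron.model \
    intergenic0-43.binmod \
    || echo "One or more model files were not produced" >&2
require_nonempty aceplus.essex \
    || echo "The ACE+ run did not produce aceplus.essex" >&2
```

A non-empty file does not guarantee that a step succeeded, so this is a first-pass check rather than a validation of the models or predictions.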