MPIBLAST |
---|
Description: Parallel implementation of NCBI BLAST |
SHARCNET Package information: see MPIBLAST software page in web portal |
h1. Example 1 - DROSOPH
Copy the sample problem files into a directory under /work. The commands below use the 1.6.0 paths; substitute 1.5.0 if your cluster runs that version:
mkdir /work/$USER/testmpiblast1; cd /work/$USER/testmpiblast1
cp /opt/sharcnet/mpiblast/1.6.0/examples/drosoph.in drosoph.in
gunzip -c /opt/sharcnet/mpiblast/1.6.0/examples/drosoph.aa.gz > drosoph.aa
gunzip -c /opt/sharcnet/mpiblast/1.6.0/examples/drosoph.nt.gz > drosoph.nt
Create a hidden configuration file, _.ncbirc_, that defines a shared storage location visible to all nodes and a local storage directory on each compute node:
cd /work/$USER/testmpiblast1
echo "[mpiBLAST]" > .ncbirc
echo "Shared=/scratch/$USER/testmpiblast1" >> .ncbirc
echo "Local=/tmp" >> .ncbirc
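The three `echo` commands above can also be written as a single here-document. The sketch below wraps that in a hypothetical helper, `write_ncbirc`, which is not part of mpiBLAST or SHARCNET; it only writes the same three lines into `.ncbirc` in the directory you pass, with the local directory defaulting to /tmp as in the original commands.

```shell
#!/bin/bash
# write_ncbirc DIR SHARED_DIR [LOCAL_DIR]
# Hypothetical helper (not an mpiBLAST command): writes DIR/.ncbirc with the
# same [mpiBLAST], Shared= and Local= lines as the echo commands above.
write_ncbirc() {
    local dir="$1" shared="$2" local_dir="${3:-/tmp}"
    cat > "$dir/.ncbirc" <<EOF
[mpiBLAST]
Shared=$shared
Local=$local_dir
EOF
}

# Usage for Example 1 (uncomment on the cluster):
# write_ncbirc /work/$USER/testmpiblast1 /scratch/$USER/testmpiblast1
```

Because the file is overwritten in one step with `>`, rerunning the helper never leaves duplicate `Shared=` or `Local=` lines behind.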
Create the shared directory under /scratch where the partitioned database will be stored. Note that files under /scratch eventually expire and are deleted automatically by the system.
mkdir /scratch/$USER/testmpiblast1
From _/work/$USER/testmpiblast1_, run the following command to partition the database into N=16 fragments. When it completes, verify that the partition files were created in the shared /scratch directory. For this example, choosing N=32 doubles the execution time compared to N=16, so N should be chosen carefully based on scaling tests.
On version 1.5.0 clusters, run:
mpiformatdb.sh "-N 16 -i drosoph.nt -o T -p F"
On version 1.6.0 clusters, run:
mpiformatdb -N 16 -i drosoph.nt -o T -p F
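A quick way to do the verification step is to list the shared directory and make sure it is not empty. The sketch below is a hypothetical helper, `check_shared_dir`; it deliberately does not check for specific partition file names, since this page does not list them, and only confirms that the directory exists and contains at least one file.

```shell
#!/bin/bash
# check_shared_dir DIR
# Hypothetical helper: fails if DIR is missing or holds no regular files,
# otherwise reports the file count and shows a long listing.
check_shared_dir() {
    local dir="$1" count
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir" >&2
        return 1
    fi
    # Count regular files directly inside DIR (no recursion).
    count=$(find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' ')
    if [ "$count" -eq 0 ]; then
        echo "no files in $dir; check the mpiformatdb output" >&2
        return 1
    fi
    echo "found $count file(s) in $dir"
    ls -l "$dir"
}

# Usage for Example 1 (uncomment on the cluster):
# check_shared_dir /scratch/$USER/testmpiblast1
```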
Submit a short test job to the queue with a 15-minute time limit. If all goes well, the results are written to _drosoph.out_ and the total wall time is approximately 3 seconds.
sqsub -t -r 15m -n 16 -q mpi -o ofile%J mpiblast -d drosoph.nt -i drosoph.in -p blastn -o drosoph.out --removedb
Sample output is provided in _/opt/sharcnet/mpiblast/current/examples/ROSOPH.out_; compare your _drosoph.out_ file against it.
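The comparison can be scripted. The sketch below uses a hypothetical helper, `compare_blast_output`, that first checks the job actually produced a non-empty output file and then prints a unified `diff` against the sample. It is an assumption that the two files should match closely; some lines (for example version or timing information) may legitimately differ, so read the diff rather than treating any difference as a failure.

```shell
#!/bin/bash
# compare_blast_output OUR_FILE SAMPLE_FILE
# Hypothetical helper. Return codes:
#   0 = identical, 1 = files differ (diff printed), 2 = our output missing/empty
compare_blast_output() {
    local ours="$1" sample="$2"
    if [ ! -s "$ours" ]; then
        echo "output file $ours is missing or empty" >&2
        return 2
    fi
    if diff -u "$sample" "$ours"; then
        echo "identical to sample output"
    else
        return 1
    fi
}

# Usage (set SAMPLE to the sample output path given above):
# compare_blast_output drosoph.out "$SAMPLE" | less
```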
h1. Example 2 - BIOBREW
Copy the sample problem files into a directory under /work.
mkdir /work/$USER/testmpiblast2; cd /work/$USER/testmpiblast2
cp /opt/sharcnet/mpiblast/1.6.0/examples/il2ra.in il2ra.in
gunzip -c /opt/sharcnet/mpiblast/1.6.0/examples/Hs.seq.uniq.gz > Hs.seq.uniq
Create a hidden configuration file, _.ncbirc_, that defines a shared storage location visible to all nodes and a local storage directory on each compute node:
cd /work/$USER/testmpiblast2
echo "[mpiBLAST]" > .ncbirc
echo "Shared=/work/$USER/mpiformatdbs/testmpiblast2" >> .ncbirc
echo "Local=/tmp" >> .ncbirc
Create the shared directory under /work where the formatted database will be stored. In this example the database is kept under /work, rather than /scratch, so that it is retained long term and can be shared.
mkdir -p /work/$USER/mpiformatdbs/testmpiblast2
From _/work/$USER/testmpiblast2_, run the following command to partition the database. When it completes, verify that the database files were created in the shared /work directory. Note that doubling N to 32 in this example improves performance by only 10%, so it is not worth the extra resources.
On version 1.5.0 clusters, run:
mpiformatdb.sh "-N 16 -i Hs.seq.uniq -o T -p F"
On version 1.6.0 clusters, run:
mpiformatdb -N 16 -i Hs.seq.uniq -o T -p F
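When running scaling tests, it can help to turn two measured wall times into a speedup and a parallel efficiency. The sketch below is a hypothetical helper, `parallel_efficiency`, computing speedup = T1/T2 and efficiency = speedup × N1/N2. It assumes the larger run also requests proportionally more processes (for example `-n 32` alongside N=32); if only the fragment count changes, the efficiency figure is less meaningful.

```shell
#!/bin/bash
# parallel_efficiency N1 T1 N2 T2
# Hypothetical helper. Baseline run: N1 processes, wall time T1.
# Larger run: N2 processes, wall time T2 (same time units as T1).
#   speedup    = T1 / T2
#   efficiency = speedup * N1 / N2   (1.00 = perfect scaling)
parallel_efficiency() {
    awk -v n1="$1" -v t1="$2" -v n2="$3" -v t2="$4" \
        'BEGIN { s = t1 / t2; printf "speedup=%.2f efficiency=%.2f\n", s, s * n1 / n2 }'
}
```

Reading the 10% improvement above as the N=32 run taking about 90% of the N=16 wall time, `parallel_efficiency 16 100 32 90` prints `speedup=1.11 efficiency=0.56`, i.e. the doubled job uses its processors at only a little over half the efficiency of the N=16 run.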
Submit a short test job to the queue with a 15-minute time limit. If all goes well, the results are written to _biobrew.out_ and the total wall time is approximately 30 seconds.
sqsub -t -r 15m -n 16 -q mpi -o ofile%J mpiblast -p blastn -d Hs.seq.uniq -i il2ra.in -o biobrew.out
Sample output is provided in _/opt/sharcnet/mpiblast/1.6.0/examples/BIOBREW.out_; compare your _biobrew.out_ file against it.
h1. MPIBLAST Command Line Arguments
mpiblast.sh --help
-p [blast program name] -d [database] -i [query file]
mpiformatdb.sh --help

formatdb 2.2.15 arguments:

  -t  Title for database file [String]  Optional
  -i  Input file(s) for formatting [File In]  Optional
  -l  Logfile name: [File Out]  Optional
      default = formatdb.log
  -p  Type of file
        T - protein
        F - nucleotide
      [T/F]  Optional
      default = T
  -o  Parse options
        T - True: Parse SeqId and create indexes.
        F - False: Do not parse SeqId. Do not create indexes.
      [T/F]  Optional
      default = F
  -a  Input file is database in ASN.1 format (otherwise FASTA is expected)
        T - True,
        F - False.
      [T/F]  Optional
      default = F
  -b  ASN.1 database in binary mode
        T - binary,
        F - text mode.
      [T/F]  Optional
      default = F
  -e  Input is a Seq-entry [T/F]  Optional
      default = F
  -n  Base name for BLAST files [String]  Optional
  -v  Database volume size in millions of letters [Integer]  Optional
      default = 0
      range from 0 to <NULL>
  -s  Create indexes limited only to accessions - sparse [T/F]  Optional
      default = F
  -V  Verbose: check for non-unique string ids in the database [T/F]  Optional
      default = F
  -L  Create an alias file with this name
        use the gifile arg (below) if set to calculate db size
        use the BLAST db specified with -i (above)
      [File Out]  Optional
  -F  Gifile (file containing list of gi's) [File In]  Optional
  -B  Binary Gifile produced from the Gifile specified above [File Out]  Optional
  -N  Number of database volumes [Integer]  Optional
      default = 0
      range from 1 to 250