Revision as of 16:50, 6 June 2012
MPIBLAST
Description: Parallel implementation of NCBI BLAST
SHARCNET Package information: see MPIBLAST software page in web portal
GETTING STARTED
mpiBLAST is not loaded by default on the clusters, so load its module first:
<pre>
module load mpiblast/1.6.0
</pre>
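To confirm that the module load succeeded, you can check that the mpiBLAST executables are now on your PATH. The sketch below uses a hypothetical helper, require_tools (not part of the module), built on the POSIX `command -v` builtin. It assumes the module puts mpiblast and mpiformatdb on PATH, which is typical for environment modules but not stated on this page.

```shell
#!/bin/sh
# Hypothetical helper: report any named commands that are not on PATH.
# Returns 0 when every command is found, 1 otherwise.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, running `require_tools mpiblast mpiformatdb` right after the module load should print nothing and return 0.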
EXAMPLE 1 - DROSOPH
Copy the sample problem files (FASTA database and query input) into a directory under /work:
<pre>
mkdir -p /work/$USER/samples/mpiblast/test1
rm -f /work/$USER/samples/mpiblast/test1/*
cd /work/$USER/samples/mpiblast/test1
cp /opt/sharcnet/mpiblast/1.6.0/examples/drosoph.in drosoph.in
gunzip -c /opt/sharcnet/mpiblast/1.6.0/examples/drosoph.nt.gz > drosoph.nt
</pre>
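The `rm -f .../*` step deletes files without confirmation, so a mistyped or empty path is risky. A minimal sketch of a safer variant, using a hypothetical prepare_dir function, refuses an empty argument and creates parent directories with `mkdir -p`:

```shell
#!/bin/sh
# Hypothetical helper: create DIR (with parents) and delete the files in it.
# Refuses an empty argument, so an unset variable can never turn
# "rm -f $dir/*" into "rm -f /*".
prepare_dir() {
    dir=$1
    if [ -z "$dir" ]; then
        echo "prepare_dir: no directory given" >&2
        return 1
    fi
    mkdir -p "$dir" || return 1
    # rm -f without -r removes files only; subdirectories are left alone.
    rm -f "$dir"/* 2>/dev/null
    return 0
}
```

With this helper, `prepare_dir /work/$USER/samples/mpiblast/test1 && cd /work/$USER/samples/mpiblast/test1` covers the first three commands of the setup step.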
Create a hidden configuration file named .ncbirc in the working directory. It defines a Shared storage location visible to all nodes and a Local storage directory available on each compute node. Because the shell does not read this file, write your actual username wherever $USER appears:
<pre>
[username@orc-login1:/work/username/samples/mpiblast/test1] vi .ncbirc
[NCBI]
Data=/opt/sharcnet/mpiblast/1.6.0/data
[BLAST]
BLASTDB=/scratch/$USER/mpiblasttest1
BLASTMAT=/work/$USER/samples/mpiblast/test1
[mpiBLAST]
Shared=/scratch/$USER/mpiblasttest1
Local=/tmp
</pre>
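Rather than typing .ncbirc in vi, you could generate it from the shell, which expands $USER before the text reaches the file. The write_ncbirc function below is a hypothetical helper, not an mpiBLAST tool; its six arguments follow the order of the keys in the configuration shown above.

```shell
#!/bin/sh
# Hypothetical helper: write DIR/.ncbirc with literal paths.
# The shell expands the arguments (including $USER) before writing.
# Arguments: DIR DATA BLASTDB BLASTMAT SHARED LOCAL
write_ncbirc() {
    if [ "$#" -ne 6 ]; then
        echo "usage: write_ncbirc DIR DATA BLASTDB BLASTMAT SHARED LOCAL" >&2
        return 1
    fi
    cat > "$1/.ncbirc" <<EOF
[NCBI]
Data=$2
[BLAST]
BLASTDB=$3
BLASTMAT=$4
[mpiBLAST]
Shared=$5
Local=$6
EOF
}

# Example 1 settings:
# write_ncbirc /work/$USER/samples/mpiblast/test1 \
#     /opt/sharcnet/mpiblast/1.6.0/data \
#     /scratch/$USER/mpiblasttest1 \
#     /work/$USER/samples/mpiblast/test1 \
#     /scratch/$USER/mpiblasttest1 \
#     /tmp
```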
Partition the database into 8 fragments. mpiformatdb writes them to the Shared directory set in .ncbirc, here on the cluster's /scratch storage:
<pre>
mkdir -p /scratch/$USER/mpiblasttest1
rm -f /scratch/$USER/mpiblasttest1/*
cd /work/$USER/samples/mpiblast/test1
mpiformatdb -N 8 -i drosoph.nt -o T -p F
</pre>
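After partitioning, a quick sanity check is to count the fragments that were written. This sketch assumes mpiformatdb names fragment files BASE.NNN.ext (for example drosoph.nt.000.nsq); that naming is an assumption, so compare it against `ls` output on your system before relying on the count. count_fragments is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: count distinct fragment indices for base name BASE
# in DIR, ASSUMING fragment files are named BASE.NNN.<ext>.
count_fragments() {
    dir=$1
    base=$2
    for f in "$dir/$base".[0-9][0-9][0-9].*; do
        [ -e "$f" ] || continue      # no match: the glob stays literal
        name=${f##*/}                # strip the directory part
        rest=${name#"$base".}        # drop the "BASE." prefix
        echo "${rest%%.*}"           # keep only the NNN index
    done | sort -u | wc -l | tr -d ' '
}
```

If the naming assumption holds, `count_fragments /scratch/$USER/mpiblasttest1 drosoph.nt` should print 8.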
Submit a short job with a 15-minute time limit. If all goes well, the results are written to drosoph.out and the execution time appears in ofile%J, where %J is the job number:
<pre>
cd /work/$USER/samples/mpiblast/test1
sqsub -r 15m -n 8 -q mpi -o ofile%J mpiblast -d drosoph.nt -i drosoph.in -p blastn -o drosoph.out --use-parallel-write --use-virtual-frags
</pre>
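To avoid retyping the long submission line, the sqsub call can be wrapped in a shell function. The sketch below is hypothetical (submit_mpiblast and DRY_RUN are illustrative names); with DRY_RUN=1 it only prints the command it would run, which is a cheap way to check the arguments before queueing a job.

```shell
#!/bin/sh
# Hypothetical wrapper around the Example 1 submission line.
# Arguments: DATABASE QUERY OUTPUT NPROCS
# With DRY_RUN=1 the command is printed instead of executed.
submit_mpiblast() {
    db=$1 query=$2 out=$3 nproc=$4
    set -- sqsub -r 15m -n "$nproc" -q mpi -o 'ofile%J' \
        mpiblast -d "$db" -i "$query" -p blastn -o "$out" \
        --use-parallel-write --use-virtual-frags
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

`DRY_RUN=1 submit_mpiblast drosoph.nt drosoph.in drosoph.out 8` prints the same command as above; drop DRY_RUN to submit it.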
For comparison, reference output computed previously with BLASTN 2.2.15 [Oct-15-2006] is provided in /opt/sharcnet/mpiblast/1.6.0/examples/DROSOPH.out.
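Because the reference file was produced by BLASTN 2.2.15, its version banner will differ from a newer run. A minimal sketch of a comparison that ignores lines starting with the BLAST program banner follows (compare_reports is a hypothetical name). Other lines, such as database statistics, may also differ legitimately, so treat any remaining differences as something to inspect rather than proof of an error.

```shell
#!/bin/sh
# Hypothetical helper: diff NEW against REF, ignoring lines that begin
# with a BLAST program banner such as "BLASTN 2.2.15 [Oct-15-2006]".
# Returns 0 if the filtered reports match, 1 if they differ.
compare_reports() {
    ref=$1
    new=$2
    tmp_ref=$(mktemp) || return 2
    tmp_new=$(mktemp) || { rm -f "$tmp_ref"; return 2; }
    grep -v '^BLAST[NPX]* ' "$ref" > "$tmp_ref"
    grep -v '^BLAST[NPX]* ' "$new" > "$tmp_new"
    diff -u "$tmp_ref" "$tmp_new"
    status=$?
    rm -f "$tmp_ref" "$tmp_new"
    return "$status"
}
```

Usage: `compare_reports REFERENCE drosoph.out`, with REFERENCE set to the reference file named above.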
EXAMPLE 2 - BIOBREW
This example shows some extra options and switches that may be useful for debugging and for working with larger databases. As before, copy the sample problem files into a directory under /work:
<pre>
mkdir -p /work/$USER/samples/mpiblast/test2
rm -f /work/$USER/samples/mpiblast/test2/*
cd /work/$USER/samples/mpiblast/test2
cp /opt/sharcnet/mpiblast/1.6.0/examples/il2ra.in il2ra.in
gunzip -c /opt/sharcnet/mpiblast/1.6.0/examples/Hs.seq.uniq.gz > Hs.seq.uniq
</pre>
Create the hidden configuration file with the vi editor. As before, it defines a Shared storage location visible to all nodes and a Local storage directory available on each compute node. The Data directory is not yet populated or used in this example, so that line can be omitted. If you want the Local and Shared directories to be effectively the same, so that no fragment files are copied, replace --copy-via=mpi with --copy-via=none; the two sqsub commands below show both settings.
<pre>
[username@orc-login1:/work/username/samples/mpiblast/test2] vi .ncbirc
[NCBI]
Data=/opt/sharcnet/mpiblast/1.6.0/data
[BLAST]
BLASTDB=/work/$USER/mpiblasttest2
BLASTMAT=/work/$USER/samples/mpiblast/test2
[mpiBLAST]
Shared=/work/$USER/mpiblasttest2
Local=/tmp
</pre>
Partition the database into 16 fragments directly in /work (the Shared directory defined above):
<pre>
mkdir -p /work/$USER/mpiblasttest2
rm -f /work/$USER/mpiblasttest2/*
cd /work/$USER/samples/mpiblast/test2
mpiformatdb -N 16 -i Hs.seq.uniq -o T -p F
</pre>
Submit two short jobs with a 15-minute time limit. If all goes well, the results are written to biobrewA.out and biobrewB.out, and each job's execution time appears in its own ofile%J, where %J is the job number.
A) In this submission, fragment files are first copied from /work to the node-local /tmp before use (appropriate if /work is slow). It also shows the --time-profile option:
<pre>
cd /work/$USER/samples/mpiblast/test2
rm -f oTime*
sqsub -r 15m -n 18 -q mpi -o ofile%J mpiblast --use-parallel-write --copy-via=mpi -d Hs.seq.uniq -i il2ra.in -p blastn -o biobrewA.out --time-profile=oTime
</pre>
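Example 2 pairs 16 fragments with 18 processes. One way to read that choice, stated here as an assumption, is one worker process per fragment plus two coordinating processes (a scheduler and a writer). Example 1 instead runs 8 processes on 8 fragments with --use-virtual-frags, so this is a rule of thumb rather than a requirement. np_for_frags is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: MPI process count for a fragment count, ASSUMING
# one worker per fragment plus two coordinating processes.
np_for_frags() {
    frags=$1
    case $frags in
        ''|*[!0-9]*)
            echo "np_for_frags: not a number: $frags" >&2
            return 1
            ;;
    esac
    echo $((frags + 2))
}
```

Under that assumption, `sqsub -n $(np_for_frags 16) ...` reproduces the 18 processes used above.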
B) In this submission, fragment files are used directly from /work (file copying is set to none). It also shows the --debug option:
<pre>
cd /work/$USER/samples/mpiblast/test2
rm -f oLog*
sqsub -r 15m -n 18 -q mpi -o ofile%J mpiblast --use-parallel-write --copy-via=none -d Hs.seq.uniq -i il2ra.in -p blastn -o biobrewB.out --debug=oLog
</pre>
Finally, compare your newly generated biobrewA.out and biobrewB.out with /opt/sharcnet/mpiblast/1.6.0/examples/BIOBREW.out, computed previously with BLASTN 2.2.15 [Oct-15-2006], to verify the results. Submit a ticket if there are any problems.
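Before comparing, it helps to confirm that both jobs actually produced output; a missing or empty file suggests the job failed or hit its time limit, and the matching ofile%J is the place to look. check_outputs is a hypothetical helper built on the POSIX `test -s` operator.

```shell
#!/bin/sh
# Hypothetical helper: confirm each named file exists and is non-empty.
# Returns 0 if all files pass, 1 if any is missing or empty.
check_outputs() {
    ok=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            ok=1
        fi
    done
    return "$ok"
}
```

Usage: `check_outputs biobrewA.out biobrewB.out`.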
MPIBLAST BINARIES (command line arguments)
<pre>
[roberpj@orc-login1:/opt/sharcnet/mpiblast/1.6.0/bin] ./mpiblast -help
mpiBLAST requires the following options: -d [database] -i [query file] -p [blast program name]
</pre>
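Since mpiblast requires -d, -i and -p, a pre-flight check in a job script can catch a missing option before the job waits in the queue. This sketch (has_required_opts is a hypothetical name) only looks for the option flags themselves and does not validate their values.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeed only if the argument list
# contains the -d, -i and -p flags that mpiblast reports as required.
has_required_opts() {
    seen_d=0 seen_i=0 seen_p=0
    for arg in "$@"; do
        case $arg in
            -d) seen_d=1 ;;
            -i) seen_i=1 ;;
            -p) seen_p=1 ;;
        esac
    done
    [ "$seen_d" = 1 ] && [ "$seen_i" = 1 ] && [ "$seen_p" = 1 ]
}
```

For example, `has_required_opts -d drosoph.nt -i drosoph.in -p blastn -o drosoph.out` succeeds, while dropping `-p blastn` makes it fail.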
<pre>
[roberpj@orc-login1:/opt/sharcnet/mpiblast/1.6.0/bin] ./mpiformatdb --help
Executing: formatdb -

formatdb 2.2.20   arguments:

  -t  Title for database file [String]  Optional
  -i  Input file(s) for formatting [File In]  Optional
  -l  Logfile name: [File Out]  Optional
    default = formatdb.log
  -p  Type of file
         T - protein
         F - nucleotide [T/F]  Optional
    default = T
  -o  Parse options
         T - True: Parse SeqId and create indexes.
         F - False: Do not parse SeqId. Do not create indexes.
 [T/F]  Optional
    default = F
  -a  Input file is database in ASN.1 format (otherwise FASTA is expected)
         T - True,
         F - False.
 [T/F]  Optional
    default = F
  -b  ASN.1 database in binary mode
         T - binary,
         F - text mode.
 [T/F]  Optional
    default = F
  -e  Input is a Seq-entry [T/F]  Optional
    default = F
  -n  Base name for BLAST files [String]  Optional
  -v  Database volume size in millions of letters [Integer]  Optional
    default = 4000
  -s  Create indexes limited only to accessions - sparse [T/F]  Optional
    default = F
  -V  Verbose: check for non-unique string ids in the database [T/F]  Optional
    default = F
  -L  Create an alias file with this name
        use the gifile arg (below) if set to calculate db size
        use the BLAST db specified with -i (above) [File Out]  Optional
  -F  Gifile (file containing list of gi's) [File In]  Optional
  -B  Binary Gifile produced from the Gifile specified above [File Out]  Optional
  -T  Taxid file to set the taxonomy ids in ASN.1 deflines [File In]  Optional
  -N  Number of database volumes [Integer]  Optional
    default = 0
</pre>