Difference between revisions of "MPIBLAST"

Revision as of 16:50, 6 June 2012

MPIBLAST
Description: Parallel implementation of NCBI BLAST
SHARCNET Package information: see MPIBLAST software page in web portal
Full list of SHARCNET supported software

GETTING STARTED

mpiBLAST is not loaded by default on the clusters, so first load the module:

module load mpiblast/1.6.0
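You can check that the module has put the mpiBLAST programs on your PATH with a small shell function like the one below. `check_tools` is a hypothetical helper for illustration only. It is not part of mpiBLAST or the SHARCNET module system.

```shell
# check_tools CMD...: report where each command resolves, or that it is missing.
# Returns non-zero if any command cannot be found on PATH.
check_tools() {
  local cmd loc missing=0
  for cmd in "$@"; do
    if loc=$(command -v "$cmd"); then
      echo "$cmd: $loc"
    else
      echo "$cmd: not found (was the module loaded?)"
      missing=1
    fi
  done
  return "$missing"
}
```

After `module load mpiblast/1.6.0`, running `check_tools mpiblast mpiformatdb` should print a path for each program. A "not found" line suggests the module was not loaded in the current shell.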

EXAMPLE1 - DROSOPH

Copy the sample problem files (FASTA database and query input) into a directory under work:

mkdir -p /work/$USER/samples/mpiblast/test1; rm -f /work/$USER/samples/mpiblast/test1/*; cd /work/$USER/samples/mpiblast/test1
cp /opt/sharcnet/mpiblast/1.6.0/examples/drosoph.in drosoph.in
gunzip -c /opt/sharcnet/mpiblast/1.6.0/examples/drosoph.nt.gz > drosoph.nt

Create a hidden configuration file named .ncbirc that defines a Shared storage location between nodes and a Local storage directory available on each compute node, as follows:

[username@orc-login1:/work/username/samples/mpiblast/test1] vi .ncbirc

[NCBI]
Data=/opt/sharcnet/mpiblast/1.6.0/data
[BLAST]
BLASTDB=/scratch/$USER/mpiblasttest1
BLASTMAT=/work/$USER/samples/mpiblast/test1
[mpiBLAST]
Shared=/scratch/$USER/mpiblasttest1
Local=/tmp
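If you prefer not to use vi, you can write the same file non-interactively with a shell here-document, run from /work/$USER/samples/mpiblast/test1. This is an alternative sketch, not the method this page prescribes. Because the EOF delimiter is unquoted, the shell replaces $USER with your actual username. The resulting file therefore contains literal paths rather than the string $USER.

```shell
# Run this from /work/$USER/samples/mpiblast/test1.
# The delimiter EOF is unquoted, so the shell expands $USER before writing.
cat > .ncbirc <<EOF
[NCBI]
Data=/opt/sharcnet/mpiblast/1.6.0/data
[BLAST]
BLASTDB=/scratch/$USER/mpiblasttest1
BLASTMAT=/work/$USER/samples/mpiblast/test1
[mpiBLAST]
Shared=/scratch/$USER/mpiblasttest1
Local=/tmp
EOF

# Print the file so the expanded paths can be checked.
cat .ncbirc
```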

Partition the database into 8 fragments in the cluster's local scratch storage (the Shared directory set in .ncbirc):

mkdir -p /scratch/$USER/mpiblasttest1; rm -f /scratch/$USER/mpiblasttest1/*
cd /work/$USER/samples/mpiblast/test1; mpiformatdb -N 8 -i drosoph.nt -o T -p F -n /scratch/$USER/mpiblasttest1
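Before submitting, you can confirm that the fragments were actually written to the Shared directory. The sketch below assumes fragment files are named `<database>.NNN.nsq` (for example `drosoph.nt.000.nsq`). If the names differ, check the `ls` output of your Shared directory and adjust the pattern. `count_fragments` is a hypothetical helper.

```shell
# count_fragments DIR DB: print how many fragment sequence files for database
# DB exist in DIR.
# Assumption: nucleotide fragments are named DB.NNN.nsq, e.g. drosoph.nt.000.nsq
count_fragments() {
  local dir=$1 db=$2 n=0 f
  for f in "$dir/$db".[0-9][0-9][0-9].nsq; do
    # An unmatched glob is left as the literal pattern, so skip non-existent paths.
    if [ -e "$f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}
```

For this example, `count_fragments /scratch/$USER/mpiblasttest1 drosoph.nt` would be expected to print 8.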

Submit a short job with a 15-minute time limit. If all goes well, the results are written to drosoph.out and the execution time appears in ofile%J, where %J is the job number:

cd /work/$USER/samples/mpiblast/test1
sqsub -r 15m -n 8 -q mpi -o ofile%J mpiblast -d drosoph.nt -i drosoph.in -p blastn -o drosoph.out --use-parallel-write --use-virtual-frags

Sample output computed previously with BLASTN 2.2.15 [Oct-15-2006] is included in /opt/sharcnet/mpiblast/1.6.0/examples/ROSOPH.out for comparison with your drosoph.out.
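A plain `diff` against the reference will at least flag the version banner, because the reference was produced with BLASTN 2.2.15. The sketch below filters out lines starting with "BLASTN " before comparing. Other header or footer lines, such as database dates or statistics, may also differ legitimately between versions. The sketch does not filter those, so inspect any remaining diff output by eye. `compare_blast_output` is a hypothetical helper.

```shell
# compare_blast_output NEW REF: diff two BLAST text reports while ignoring the
# program banner line (e.g. "BLASTN 2.2.15 [Oct-15-2006]"), which differs
# between BLAST versions. Returns 0 if the remaining lines are identical.
compare_blast_output() {
  local new_f ref_f status
  new_f=$(mktemp)
  ref_f=$(mktemp)
  # grep -v exits 1 when it selects no lines; do not treat that as fatal.
  grep -v '^BLASTN ' "$1" > "$new_f" || true
  grep -v '^BLASTN ' "$2" > "$ref_f" || true
  if diff "$new_f" "$ref_f"; then
    echo "results match (version banner ignored)"
    status=0
  else
    status=1
  fi
  rm -f "$new_f" "$ref_f"
  return "$status"
}
```

Usage: `compare_blast_output drosoph.out /opt/sharcnet/mpiblast/1.6.0/examples/ROSOPH.out`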

EXAMPLE2 - BIOBREW

This example shows some extra options and switches that may be useful for debugging and for dealing with larger databases. As before, copy the sample problem files into a directory under work:

mkdir -p /work/$USER/samples/mpiblast/test2; rm -f /work/$USER/samples/mpiblast/test2/*; cd /work/$USER/samples/mpiblast/test2
cp /opt/sharcnet/mpiblast/1.6.0/examples/il2ra.in il2ra.in
gunzip -c /opt/sharcnet/mpiblast/1.6.0/examples/Hs.seq.uniq.gz > Hs.seq.uniq

Using the vi editor, create a hidden configuration file that defines a Shared storage location between nodes and a Local storage directory available on each compute node, as follows. The Data directory is not yet populated or used in this example and can therefore be omitted. If you want the Local and Shared directories to be the same (so that no file copying takes place), replace --copy-via=mpi with --copy-via=none, as illustrated by the two sqsub commands below.

[username@orc-login1:/work/username/samples/mpiblast/test2] vi .ncbirc

[NCBI]
Data=/opt/sharcnet/mpiblast/1.6.0/data
[BLAST]
BLASTDB=/work/$USER/mpiblasttest2
BLASTMAT=/work/$USER/samples/mpiblast/test2
[mpiBLAST]
Shared=/work/$USER/mpiblasttest2
Local=/tmp

Partition the database into 16 fragments directly in the work directory (the Shared directory set in .ncbirc):

mkdir -p /work/$USER/mpiblasttest2; rm -f /work/$USER/mpiblasttest2/*
cd /work/$USER/samples/mpiblast/test2; mpiformatdb -N 16 -i Hs.seq.uniq -o T -p F

Submit two short jobs, each with a 15-minute time limit. If all goes well, the results are written to biobrewA.out and biobrewB.out, and the execution time appears in the corresponding ofile%J files, where %J is the job number.

A) In this submission the fragment files are first copied from work to the local /tmp directory before being used (appropriate if work is slow). This example also shows the use of the time-profile option:

cd /work/$USER/samples/mpiblast/test2; rm -f oTime*
sqsub -r 15m -n 18 -q mpi -o ofile%J mpiblast --use-parallel-write --copy-via=mpi -d Hs.seq.uniq -i il2ra.in -p blastn -o biobrewA.out --time-profile=oTime

B) In this submission the fragment files are used directly from work (file copying is set to none). This example also shows the use of the debug option:

cd /work/$USER/samples/mpiblast/test2; rm -f oLog*
sqsub -r 15m -n 18 -q mpi -o ofile%J mpiblast --use-parallel-write --copy-via=none -d Hs.seq.uniq -i il2ra.in -p blastn -o biobrewB.out --debug=oLog

Finally, compare /opt/sharcnet/mpiblast/1.6.0/examples/BIOBREW.out, computed previously with BLASTN 2.2.15 [Oct-15-2006], with your newly generated biobrewA.out and biobrewB.out output files to verify the results, and submit a ticket if there are any problems.
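The two jobs search the same query against the same database and differ only in how fragment files are accessed (--copy-via=mpi versus --copy-via=none). Their reports are therefore expected to agree with each other. The sketch below checks this with `cmp`, assuming that identical inputs give byte-identical reports. If they do not, a `diff` of the two files shows where they diverge. `check_same_results` is a hypothetical helper.

```shell
# check_same_results FILE_A FILE_B: report whether two output files are
# byte-identical. Returns 1 if they differ.
check_same_results() {
  if cmp -s "$1" "$2"; then
    echo "identical: $1 $2"
  else
    echo "DIFFERENT: $1 $2 (inspect with: diff $1 $2)"
    return 1
  fi
}
```

Usage: `check_same_results biobrewA.out biobrewB.out`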

MPIBLAST BINARIES (command line arguments)

[roberpj@orc-login1:/opt/sharcnet/mpiblast/1.6.0/bin] ./mpiblast -help
mpiBLAST requires the following options: -d [database] -i [query file] -p [blast program name]

[roberpj@orc-login1:/opt/sharcnet/mpiblast/1.6.0/bin] ./mpiformatdb --help
Executing: formatdb - 

formatdb 2.2.20   arguments:

  -t  Title for database file [String]  Optional
  -i  Input file(s) for formatting [File In]  Optional
  -l  Logfile name: [File Out]  Optional
    default = formatdb.log
  -p  Type of file
         T - protein   
         F - nucleotide [T/F]  Optional
    default = T
  -o  Parse options
         T - True: Parse SeqId and create indexes.
         F - False: Do not parse SeqId. Do not create indexes.
 [T/F]  Optional
    default = F
  -a  Input file is database in ASN.1 format (otherwise FASTA is expected)
         T - True, 
         F - False.
 [T/F]  Optional
    default = F
  -b  ASN.1 database in binary mode
         T - binary, 
         F - text mode.
 [T/F]  Optional
    default = F
  -e  Input is a Seq-entry [T/F]  Optional
    default = F
  -n  Base name for BLAST files [String]  Optional
  -v  Database volume size in millions of letters [Integer]  Optional
    default = 4000
  -s  Create indexes limited only to accessions - sparse [T/F]  Optional
    default = F
  -V  Verbose: check for non-unique string ids in the database [T/F]  Optional
    default = F
  -L  Create an alias file with this name
        use the gifile arg (below) if set to calculate db size
        use the BLAST db specified with -i (above) [File Out]  Optional
  -F  Gifile (file containing list of gi's) [File In]  Optional
  -B  Binary Gifile produced from the Gifile specified above [File Out]  Optional
  -T  Taxid file to set the taxonomy ids in ASN.1 deflines [File In]  Optional
  -N  Number of database volumes [Integer]  Optional
    default = 0
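Note that -p defaults to T (protein), so nucleotide databases such as drosoph.nt and Hs.seq.uniq need -p F, as in the examples above. If you are unsure what a FASTA file contains, the rough heuristic below prints F when every sequence line uses only nucleotide letters (A, C, G, T, U, N) and T otherwise. It is a hypothetical helper, not part of mpiBLAST or formatdb. IUPAC ambiguity codes other than N, or stray characters such as Windows line endings, will make a nucleotide file look like protein.

```shell
# guess_seq_type FASTA: print F if all non-header lines contain only
# nucleotide letters (ACGTUN, either case) and whitespace, otherwise T.
# F/T follow the meaning of formatdb's -p flag (T = protein, F = nucleotide).
guess_seq_type() {
  if grep -v '^>' "$1" | grep -q '[^ACGTUNacgtun[:space:]]'; then
    echo T
  else
    echo F
  fi
}
```

Usage: `mpiformatdb -N 8 -i drosoph.nt -o T -p "$(guess_seq_type drosoph.nt)"`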

