MPIBLAST
Description: Parallel implementation of NCBI BLAST
SHARCNET Package information: see MPIBLAST software page in web portal
Full list of SHARCNET supported software
GETTING STARTED
Mpiblast is not loaded by default on the clusters, so load the module before submitting any jobs:
module load mpiblast/1.6.0
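To confirm the module loaded correctly, a quick sanity check such as the one below can help (a minimal sketch; it only uses the standard which command on the login node):

# After loading the module, both binaries should resolve on your PATH:
which mpiblast mpiformatdb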
EXAMPLE1 - DROSOPH
Copy the sample problem files (FASTA database and input) into a directory under work:
mkdir -p /work/$USER/samples/mpiblast/test1; rm /work/$USER/samples/mpiblast/test1/*; cd /work/$USER/samples/mpiblast/test1
cp /opt/sharcnet/mpiblast/1.6.0/examples/drosoph.in drosoph.in
gunzip -c /opt/sharcnet/mpiblast/1.6.0/examples/drosoph.nt.gz > drosoph.nt
Create a hidden configuration file to define a Shared storage location visible to all nodes and a Local storage directory available on each compute node, where $USER should be replaced with your username, as shown here:
[username@orc-login1:/work/roberpj/samples/mpiblast/test1] vi .ncbirc
[BLAST]
BLASTDB=/scratch/$USER/mpiblasttest1
BLASTMAT=/work/$USER/samples/mpiblast/test1
[mpiBLAST]
Shared=/scratch/$USER/mpiblasttest1
Local=/tmp
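If you prefer not to use vi, the same file can also be written non-interactively; the sketch below assumes a bash shell and lets the shell expand $USER, so the resulting file contains your actual username:

cd /work/$USER/samples/mpiblast/test1
# Write .ncbirc in one step; $USER expands to your username inside the here-document:
cat > .ncbirc <<EOF
[BLAST]
BLASTDB=/scratch/$USER/mpiblasttest1
BLASTMAT=/work/$USER/samples/mpiblast/test1
[mpiBLAST]
Shared=/scratch/$USER/mpiblasttest1
Local=/tmp
EOF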
Partition the database into 8 fragments, written to a cluster-local scratch storage location:
mkdir /scratch/$USER/mpiblasttest1; rm -f /scratch/$USER/mpiblasttest1/*; cd /work/$USER/samples/mpiblast/test1
mpiformatdb -N 8 -i drosoph.nt -o T -p F -n /scratch/$USER/mpiblasttest1
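Before submitting the job it is worth confirming that the fragments were actually written to the shared scratch location; a minimal sketch (exact fragment file names depend on the mpiformatdb version):

# The shared directory should now contain the formatted database fragment files:
ls -l /scratch/$USER/mpiblasttest1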
Submit a short job with a 15m time limit on 10 cores (8 workers plus 2 extra processes that mpiBLAST uses for scheduling and output). If all goes well, output results will be written to drosoph.out and the execution time will appear in ofile%J, where %J is the job number:
cd /work/$USER/samples/mpiblast/test1
sqsub -r 15m -n 10 -q mpi -o ofile%J mpiblast -d drosoph.nt -i drosoph.in -p blastn -o drosoph.out --use-parallel-write --use-virtual-frags
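The job can be monitored with the usual SHARCNET queue commands, and the output inspected once it completes; a minimal sketch (assuming sqjobs is available on the cluster):

sqjobs                  # check the state of the submitted job
cat ofile*              # execution time summary written by the job
less drosoph.out        # BLAST hits for the drosoph.in query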
Sample output results computed previously with BLASTN 2.2.15 [Oct-15-2006] are included in /opt/sharcnet/mpiblast/1.6.0/examples/DROSOPH.out for comparison with your newly generated drosoph.out file.
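One simple way to compare is with diff; the sketch below assumes you are still in the test1 directory, and some differences in version banners and timing lines are expected since the reference output was produced with an older BLASTN build:

cd /work/$USER/samples/mpiblast/test1
# Version and timing lines will differ; the hit tables themselves should match closely.
diff /opt/sharcnet/mpiblast/1.6.0/examples/DROSOPH.out drosoph.out | less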
EXAMPLE2 - BIOBREW
This example is provided to show some extra options and switches that may be useful for debugging and for dealing with larger databases. As before, copy the sample problem files into a directory under work:
mkdir /work/$USER/samples/mpiblast/test2; rm /work/$USER/samples/mpiblast/test2/*; cd /work/$USER/samples/mpiblast/test2
cp /opt/sharcnet/mpiblast/1.6.0/examples/il2ra.in il2ra.in
gunzip -c /opt/sharcnet/mpiblast/1.6.0/examples/Hs.seq.uniq.gz > Hs.seq.uniq
Create a hidden configuration file with the vi editor to define a Shared storage location visible to all nodes and a Local storage directory available on each compute node, as follows; $USER should be replaced with your username. The Data directory is not populated or used in this example and can therefore be omitted. If you want the Local and Shared directories to be the same, replace --copy-via=mpi with --copy-via=none, as demonstrated in the sqsub commands below.
[username@orc-login1:/work/roberpj/samples/mpiblast/test2] vi .ncbirc
[NCBI]
Data=/opt/sharcnet/mpiblast/1.6.0/data
[BLAST]
BLASTDB=/work/$USER/mpiblasttest2
BLASTMAT=/work/$USER/samples/mpiblast/test2
[mpiBLAST]
Shared=/work/$USER/mpiblasttest2
Local=/tmp
Partition the database into 16 fragments directly in the work directory:
mkdir -p /work/$USER/mpiblasttest2; rm -f /work/$USER/mpiblasttest2/*; cd /work/$USER/samples/mpiblast/test2
mpiformatdb -N 16 -i Hs.seq.uniq -o T -p F
Submit a couple of short jobs with a 15m time limit. If all goes well, output results will be written to biobrew.out and the execution time will appear in the corresponding ofile%J, where %J is the job number.
A) In this job submission, the fragment files are first copied from work to local /tmp before being used (appropriate if work is slow). Usage of the --time-profile option is also shown in this example:
cd /work/$USER/samples/mpiblast/test2; rm -f oTime*
sqsub -r 15m -n 18 -q mpi -o ofile%J mpiblast --use-parallel-write --copy-via=mpi -d Hs.seq.uniq -i il2ra.in -p blastn -o biobrew.out --time-profile=oTime
B) In this job submission, the fragment files are used in place on work. Usage of the --debug option is also shown in this example:
cd /work/$USER/samples/mpiblast/test2; rm -f oLog*
sqsub -r 15m -n 18 -q mpi -o ofile%J mpiblast --use-parallel-write --copy-via=none -d Hs.seq.uniq -i il2ra.in -p blastn -o biobrew.out --debug=oLog
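After both jobs finish, the extra files produced by the --time-profile and --debug options can be inspected alongside the usual output; a minimal sketch (the exact per-process suffixes appended to oTime and oLog may vary):

cd /work/$USER/samples/mpiblast/test2
ls oTime* oLog* ofile*   # profiling, debug and sqsub output files
less biobrew.out         # BLAST hits for the il2ra.in query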
Finally, compare /opt/sharcnet/mpiblast/1.6.0/examples/BIOBREW.out, computed previously with BLASTN 2.2.15 [Oct-15-2006], with your newly generated biobrew.out output file to verify the results, and submit a ticket if there are any problems!
MPIBLAST BINARIES (command line arguments)
[roberpj@orc-login1:/opt/sharcnet/mpiblast/1.6.0/bin] ./mpiblast -help
mpiBLAST requires the following options: -d [database] -i [query file] -p [blast program name]
[roberpj@orc-login1:/opt/sharcnet/mpiblast/1.6.0/bin] ./mpiformatdb --help
Executing: formatdb -
formatdb 2.2.20   arguments:

  -t  Title for database file [String]  Optional
  -i  Input file(s) for formatting [File In]  Optional
  -l  Logfile name: [File Out]  Optional
      default = formatdb.log
  -p  Type of file
        T - protein
        F - nucleotide [T/F]  Optional
      default = T
  -o  Parse options
        T - True: Parse SeqId and create indexes.
        F - False: Do not parse SeqId. Do not create indexes. [T/F]  Optional
      default = F
  -a  Input file is database in ASN.1 format (otherwise FASTA is expected)
        T - True, F - False. [T/F]  Optional
      default = F
  -b  ASN.1 database in binary mode
        T - binary, F - text mode. [T/F]  Optional
      default = F
  -e  Input is a Seq-entry [T/F]  Optional
      default = F
  -n  Base name for BLAST files [String]  Optional
  -v  Database volume size in millions of letters [Integer]  Optional
      default = 4000
  -s  Create indexes limited only to accessions - sparse [T/F]  Optional
      default = F
  -V  Verbose: check for non-unique string ids in the database [T/F]  Optional
      default = F
  -L  Create an alias file with this name
      use the gifile arg (below) if set to calculate db size
      use the BLAST db specified with -i (above) [File Out]  Optional
  -F  Gifile (file containing list of gi's) [File In]  Optional
  -B  Binary Gifile produced from the Gifile specified above [File Out]  Optional
  -T  Taxid file to set the taxonomy ids in ASN.1 deflines [File In]  Optional
  -N  Number of database volumes [Integer]  Optional
      default = 0
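As an illustration of how several of the flags above combine, the command below reworks the Example 1 partitioning step (a sketch only; the title and log file name are arbitrary choices) by adding a database title with -t and an explicit log file with -l:

cd /work/$USER/samples/mpiblast/test1
# Same partitioning as Example 1, plus a title (-t) and a named log file (-l):
mpiformatdb -N 8 -i drosoph.nt -p F -o T -t "Drosophila nucleotide db" -l drosoph_formatdb.log -n /scratch/$USER/mpiblasttest1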