Web Services at the EBI  information


Using WU-Blast


General information
Inputs
Outputs
Help

General Information

Type Pairwise_Local_Aligning
Authority URL http://blast.wustl.edu/
Documentation Richard Christen
Contact e-mail christen@unice.fr
Description  Washington University BLAST (WU BLAST) version 2.0 is a powerful software package for gene and protein identification, using sensitive, selective and rapid similarity searches of protein and nucleotide sequence databases. It can be accessed at EBI via web services, as described here.
This page gives further explanations on how to use this web service, with some Python examples (SOAP.py package required).
Synchronous and asynchronous use are both possible.
Programming language: Python (demo code).

Inputs

This web service takes two named arguments, params and content, as in server.runNCBIBlast(params=blast_params, content=blast_data)

params

A Python dictionary, i.e. a set of key: value pairs.
The following params are presently allowed (far fewer than the command-line arguments). Those I have not yet been able to use are shown in grey.

param name   type      allowed values                           command line equivalent  default value  mandatory
program      string    blastn, tblastn, tblastx                                                         X
database     string    see below                                                                        X
matrix       string    default, identity.4.2, identity.4.4      matrix=
exp          float     any positive float                       E=
echofilter   boolean                                            echofilter
filter       string    none, dust                               filter=
numal        integer   any integer >=0                          B=
scores       integer   any integer >=0                          V=
sensitivity  string    any string >=7                           W=                       11
sort         string    pvalue, count, highscores, totalscores   sort_by_pvalue           pvalue
                                                                sort_by_count
                                                                sort_by_highscore
                                                                sort_by_totalscore
                                                                sort_by_subjectlength
stats        string    sump, poisson, kap                       sump poissonp kap        sump
strand       string                                             dbtop dbbottom
                                                                (top, bottom) ???
outformat    string                                             mformat=
topcomboN    integer   a positive integer                       topcomboN=               1
async        boolean   0,1
email        string    a valid email

content

content name   type     allowed values                        mandatory
type           string   sequence                              X
content        string   a valid nucleic acid FASTA sequence   X
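
The params table above can be turned into a small pre-flight check that catches missing or mistyped entries before a job is submitted. This is only a sketch: the names MANDATORY, ALLOWED_VALUES and check_params are hypothetical and not part of the web service, and the allowed values are simply copied from the table as it stands.

```python
# Hypothetical pre-flight check; these names are illustrative, not part of the service.
MANDATORY = ("program", "database")

# Enumerated values copied from the params table.
ALLOWED_VALUES = {
    "program": {"blastn", "tblastn", "tblastx"},
    "matrix": {"default", "identity.4.2", "identity.4.4"},
    "filter": {"none", "dust"},
    "stats": {"sump", "poisson", "kap"},
    "async": {0, 1},
}


def check_params(params):
    """Return a list of problems found in a params dictionary (empty if none)."""
    problems = []
    # Both mandatory params must be present.
    for name in MANDATORY:
        if name not in params:
            problems.append("missing mandatory param: %s" % name)
    # Params with an enumerated list of values must use one of them.
    for name, value in params.items():
        if name in ALLOWED_VALUES and value not in ALLOWED_VALUES[name]:
            problems.append("value %r not allowed for %s" % (value, name))
    # Numeric constraints from the table.
    if "exp" in params and not params["exp"] > 0:
        problems.append("exp must be a positive float")
    for name in ("numal", "scores"):
        if name in params and params[name] < 0:
            problems.append("%s must be >= 0" % name)
    return problems
```

Params that the table lists without enumerated values are left unchecked, so a dictionary is never rejected on the basis of something the table does not state.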

Python example (see below for SOAP implementation).

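# Search parameters; every key is a quoted string.
blast_params = {'program': 'blastn',
                'database': 'embl',
                'exp': 1e-50,
                'scores': 100,
                'matrix': 'identity.4.2',
                'filter': 'dust',
                'align': align_view,   # align_view is set elsewhere in the demo code
                'numal': 100,
                'echofilter': 'no',
                'sort': 'totalscore',
                'topcomboN': 1,
                'email': 'christen@unice.fr',
                'async': 0}            # 0 = synchronous job
# The query is passed as a list holding one dictionary.
blast_data = [{'type': 'sequence', 'content': self.sequence}]
# Submit the job, then fetch its output with the returned job id.
jobid = self.server.runNCBIBlast(params=blast_params, content=blast_data)
result = self.server.poll(jobid, 'tooloutput')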
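
For comparison with a local WU-BLAST run, the "command line equivalent" column of the params table can be applied to a dictionary such as blast_params. The sketch below is an illustration under assumptions: the function name to_command_line_options is hypothetical, only the params whose equivalent is written as NAME= in the table are translated, and the result is a list of option strings, not a complete command.

```python
# Mapping copied from the "command line equivalent" column of the params table.
# Only params written there as NAME= are covered.
COMMAND_LINE_NAMES = {
    "matrix": "matrix",
    "exp": "E",
    "filter": "filter",
    "numal": "B",
    "scores": "V",
    "sensitivity": "W",
    "outformat": "mformat",
    "topcomboN": "topcomboN",
}


def to_command_line_options(params):
    """Translate web-service params into NAME=value strings (hypothetical helper)."""
    options = []
    for name, value in params.items():
        if name in COMMAND_LINE_NAMES:
            options.append("%s=%s" % (COMMAND_LINE_NAMES[name], value))
    return options


if __name__ == "__main__":
    print(to_command_line_options({"exp": 1e-50, "scores": 100, "numal": 100}))
```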

Demo Python code (working examples for fetch, ncbi-blast and wu-blast; simply run the code, demo sequences are embedded in it).
This is pure Python code, with a graphical interface in Tk.

Outputs

Retrieving the output requires a call to the method poll, as in:
server.poll(jobid,'tooloutput')
See the Python example above (jobid is returned by self.server.runNCBIBlast when done).
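
The submission and the call to poll can be wrapped in one helper. This is a minimal sketch, assuming server is a SOAP proxy exposing runNCBIBlast and poll as used on this page; the helper name run_search and the stand-in class FakeServer are hypothetical and exist only so that the sketch runs without network access.

```python
def run_search(server, params, sequence):
    """Submit one sequence and return the tool output of the resulting job."""
    content = [{"type": "sequence", "content": sequence}]
    jobid = server.runNCBIBlast(params=params, content=content)
    return server.poll(jobid, "tooloutput")


class FakeServer:
    """Stand-in for the SOAP proxy (hypothetical); records calls instead of running BLAST."""

    def __init__(self):
        self.calls = []

    def runNCBIBlast(self, params, content):
        self.calls.append(("runNCBIBlast", params, content))
        return "job-1"  # made-up job id

    def poll(self, jobid, result_type):
        self.calls.append(("poll", jobid, result_type))
        return "output of %s" % jobid


if __name__ == "__main__":
    demo = FakeServer()
    print(run_search(demo, {"program": "blastn", "database": "embl"}, ">q\nACGT"))
```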

Help runNCBIBlast

Programs
blastn
Nucleotide-Nucleotide BLAST
This service compares a nucleotide sequence to a sequence database and calculates the statistical significance of matches using the Basic Local Alignment Search Tool (BLAST). BLAST finds regions of local similarity between sequences. This tool is useful to find regions of sequence similarity, which will yield functional and evolutionary clues about the structure and function of your novel sequence.
blastx
Nucleotide translated query vs Protein database
Compares a nucleotide query sequence translated in all reading frames against a protein sequence database. Allows you to find potential translation products of an unknown nucleotide sequence (both strands).
tblastx
Nucleotide Translated query vs. Nucleotide translated database
This service compares the six-frame translations of a nucleotide sequence to the six-frame translations of a nucleotide sequence database and calculates the statistical significance of matches using the Basic Local Alignment Search Tool (BLAST). BLAST finds regions of local similarity between sequences. This tool is useful to find regions of sequence similarity, which will yield functional and evolutionary clues about the structure and function of your novel sequence.

blastp
Protein query vs Protein database
tblastn
Protein query vs. Nucleotide translated database
This service compares an amino acid sequence to the six-frame translations of a nucleotide sequence database and calculates the statistical significance of matches using the Basic Local Alignment Search Tool (BLAST). BLAST finds regions of local similarity between sequences. This tool is useful to find regions of sequence similarity, which will yield functional and evolutionary clues about the structure and function of your novel sequence.

Parameters:
(those that can presently be used through the web service are shown in red)
database: the database against which the sequence is compared.
exp (expected_threshold): statistical significance threshold for reporting database sequence matches. The default value is 10, meaning that 10 matches are expected to be found merely by chance. Lower expected thresholds are more stringent, leading to fewer chance matches being reported. Increasing the expected threshold shows less stringent matches and is recommended when you are performing searches with short sequences as a short query is more likely to occur by chance in the database than a longer one, so even a perfect match (no gaps) can have low statistical significance and may not be reported. Increasing the Expected threshold allows you to look farther down in the hit list and see matches that would normally be discarded because of low statistical significance. Generally a value of up to 1000 is enough to see results.
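
Why short queries need a higher threshold can be made concrete with the standard Karlin-Altschul estimate for a single match, E = K * m * n * exp(-lambda * S), where m is the query length, n the database size and S the raw score. The sketch below is illustrative only: the constants K and LAMBDA, the match score of 5 and the database size of 10**9 bases are assumed example values, not settings read from the EBI service.

```python
import math

# Assumed example constants, chosen only for illustration.
K = 0.176
LAMBDA = 0.192
MATCH_SCORE = 5  # assumed score per matching base


def expected_hits(query_length, database_size, raw_score):
    """Karlin-Altschul estimate of chance matches scoring at least raw_score."""
    return K * query_length * database_size * math.exp(-LAMBDA * raw_score)


def perfect_match_expect(query_length, database_size):
    """E-value of an ungapped perfect match covering the whole query."""
    return expected_hits(query_length, database_size, MATCH_SCORE * query_length)


if __name__ == "__main__":
    for length in (20, 50):
        print(length, perfect_match_expect(length, 10**9))
```

With these assumed numbers, a perfect 20-base match has an E-value above the default threshold of 10, so it is only reported once exp is raised, while a perfect 50-base match lies far below the threshold.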
filter: if set to true, will allow you to mask out various segments of the query sequence for regions which are non-specific for sequence similarity searches. Filtering can eliminate statistically significant but biologically uninteresting reports from the output, for example hits against common acidic-, basic- or proline-rich regions, leaving the more biologically interesting regions of the query sequence available for specific matching against database sequences. Filtering is only applied to the query sequence, not to database sequences. The SEG program is used for filtering low complexity regions in amino acid sequences from your protein query sequence and was written by Wootton, J.C., and Federhen, S.
dropoff: amount a score must drop before extension of word hits is halted.
opengap: score taken away for the initiation of the gap in sequence or in structure. To make the match more significant you can try to make the gap penalty larger. It will decrease the number of gaps and if you have good alignment without many gaps, its Z-score will be higher.
extendgap: gap extension penalty that is added to the standard gap open penalty for each base or residue in the gap. This is how long gaps are penalised. If you don't like long gaps, just increase the extension gap penalty. Usually you will expect a few long gaps rather than many short gaps, so the gap extension penalty should be lower than the gap penalty. An exception is where one or both sequences are single reads with possible sequencing errors in which case you would expect many single base gaps. You can get this result by setting the gap open penalty to zero (or very low) and using the gap extension penalty to control gap scoring.
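
The trade-off between the two penalties is easy to check numerically. The sketch below assumes the form described above, where a gap costs opengap plus extendgap for each position in it; the function names and the penalty values are illustrative, not defaults of the service.

```python
def gap_cost(length, opengap, extendgap):
    """Penalty for one gap: the open penalty plus the extension penalty per position."""
    return opengap + extendgap * length


def total_gap_cost(gap_lengths, opengap, extendgap):
    """Sum the penalties of several independent gaps."""
    return sum(gap_cost(n, opengap, extendgap) for n in gap_lengths)


# Ten gapped positions, either as one long gap or as ten single-base gaps.
one_long = [10]
many_short = [1] * 10

# High open penalty, low extension penalty: the single long gap is much cheaper.
print(total_gap_cost(one_long, opengap=10, extendgap=1),
      total_gap_cost(many_short, opengap=10, extendgap=1))  # prints 20 110

# Zero open penalty: both layouts cost the same, so single-base gaps are no longer discouraged.
print(total_gap_cost(one_long, opengap=0, extendgap=2),
      total_gap_cost(many_short, opengap=0, extendgap=2))  # prints 20 20
```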
matrix: comparison matrix used when searching the database.
gapalign: perform optimised alignments within regions involving gaps. If set to true, the program will perform an alignment using gaps. Otherwise, if it is set to false, it will report only individual HSPs (High-Scoring Segment Pairs) where two sequences match each other, and thus will not produce alignments with gaps.
scores: number of database sequences to show one-line descriptions for.
numal (alignments): number of database sequences to show alignments for. Setting this option to a given number sets the maximum number of alignments reported in the output file.
topcomboN: topcombo processing causes consistent sets of HSPs to be reported, such that any given HSP is allowed to be a member of just one set. Often, one wishes to see just the best set of consistent HSPs without any other "contaminants" in the output; this is obtained with topcomboN=1.
stats: selects which type of statistics is used when assessing the significance of aligned pairs. The options listed in the params table are sump, poisson and kap.
N.B. when Poisson statistics are used, some HSPs may be reported that were not involved in achieving statistical significance.
The default is to use sump statistics.
sort: sorts the scores in the score list of the output file. The sort options listed in the params table are pvalue, count, highscores and totalscores.
sensitivity
An increase in sensitivity will lengthen the search (longer execution times and more memory required), but increase the specificity of the results. A decrease will significantly speed up the search but decrease the sensitivity of the results. The default setting runs faster than high-sensitivity searches. To perform very quick searches, select low or very low sensitivity.

async
The program can be run as a synchronous job or an asynchronous job.

Synchronous job: The results/errors are returned as soon as the job is finished.
Asynchronous job: Use this if you want to retrieve the results at a later time or if you think it can take a long time to execute.
The results are stored for up to 24 hours.
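
For asynchronous use, the job id has to be kept so that the output can be fetched later, within the 24-hour window. A minimal sketch, assuming the id is returned by the submission call when async is 1 and that poll is used as in the Python example; the class PendingJob and its fields are hypothetical bookkeeping, not part of the web service.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(hours=24)  # results are stored for up to 24 hours


class PendingJob:
    """Remember a submitted job id and when it was submitted (hypothetical helper)."""

    def __init__(self, jobid, submitted_at):
        self.jobid = jobid
        self.submitted_at = submitted_at

    def is_expired(self, now):
        """True once the retention period has passed."""
        return now - self.submitted_at > RETENTION

    def fetch(self, server, now):
        """Poll the server for the output, unless the results should already be gone."""
        if self.is_expired(now):
            raise RuntimeError("results for %s are older than 24 hours" % self.jobid)
        return server.poll(self.jobid, "tooloutput")
```

A job would be recorded as PendingJob(jobid, datetime.now()) right after submission, and fetch called with the current time when the results are wanted.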


The complete list of EBI web services.

Last updated 04/04/2007. Richard Christen