Primer Validator

Hamid R. Shahbazkia, Richard Christen

Brief description.

Primer Validator consists first of a C program that evaluates pairs of primers (oligomers) by comparing them to a set of sequences in fasta format.

Primer Validator looks for sequences that contain both primers (if required, allowing up to k differences) and reports the list of primer pairs found, the domains amplified and the identity of the sequences extracted. A Python script then generates summary views of the results.
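To illustrate what finding a primer "at k differences" can mean, here is a minimal Python 3 sketch. It is not the authors' C implementation: it assumes that a difference is a substitution only (no insertions or deletions), that primers use IUPAC degenerate codes, and that X behaves as a wildcard. The function names (mismatches, find_primer, extract_domain) are hypothetical, and both primers are assumed to be written as they appear on the strand being scanned.

```python
# Sketch only: substitution-only matching with IUPAC codes.
# This is an illustration, not the algorithm used by the C program.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
    "X": "ACGT",  # assumption: X is treated as a wildcard, like N
}


def mismatches(primer, window):
    """Count positions where the sequence base is not allowed by the primer code."""
    return sum(1 for p, b in zip(primer, window) if b not in IUPAC.get(p, ""))


def find_primer(primer, seq, k=0):
    """Return 0-based start positions where primer matches seq with at most k mismatches."""
    n = len(primer)
    return [i for i in range(len(seq) - n + 1)
            if mismatches(primer, seq[i:i + n]) <= k]


def extract_domain(left, right, seq, k=0):
    """Return the region between the first left hit and the next right hit, or None."""
    right_hits = find_primer(right, seq, k)
    for i in find_primer(left, seq, k):
        start = i + len(left)
        for j in right_hits:
            if j >= start:
                return seq[start:j]
    return None
```

For example, extract_domain("ACGT", "CCC", "TTACGTAAAGGGCCCTT") returns the region lying between the two primer sites.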

Everything is wrapped in a graphical interface (Python & Tk) that runs on every OS (Unix/Linux, Mac, Windows). Binaries are provided for each OS (64-bit) and have been tested on each OS (the Windows binaries were compiled under Cygwin; the dll is provided).

Primer Validator is freely available at http://bioinfo.unice.fr/primervalidator/, with online help.

See http://bioinfo.unice.fr/454 for a demonstration of results: thousands of primers tested against hundreds of thousands of sequences.

IMPORTANT NOTE 1: on May 26th I discovered a problem on Windows, where users often put white spaces in directory names. I fixed that, but make sure that the primer file, the fasta file and the output file are all located in the same directory as primervalidator. Alternatively, instead of using the button "RUN primervalidator", make a .bat file that launches primervalidator.

IMPORTANT NOTE 2: do not confuse "primervalidator", the program that validates primers, with "graphic_primervalidator", which launches the graphical interface that lets you run "primervalidator" itself and produce a summary presentation of the results.

Download

Linux:
Simply compile the C program with gcc (under Cygwin for MS Windows). Install Python 2.x on Windows if necessary; Python is already present on Linux and Mac OS X.
Run the graphical interface using: "python graphic_primer_validator-1.0.py" (developed with Python 2.5).

MS Windows.
Simply download these files, unzip them and save them in a single directory. There is nothing to install: simply download and unzip.
Save the C program files in a single directory, together with the graphical interface file.
For institutes with rules against downloading .exe files, the files are named nnn_exe: after download, rename them to nnn.exe. Since there is no installation, this should work.

Mac.
Download the test/example files.
  1. a test file of fasta sequences: this is a big fasta file; wait until its name no longer ends in .part before unzipping it. It should download as "test_bacteria.fasta"; if not, rename it.
  2. a test file of primer sets: this is a text file with a single pair of primers; wait until its name no longer ends in .part before using it. It should download as "primers.txt"; if not, rename it.
If you experience problems downloading these files, see the detailed instructions below, or download the zip.

Usage

C program

You can run this program either from the graphical interface or from the command line (the graphical interface is much easier to use, unless you want to run batch analyses). Depending on your OS:
./primervalidatormac.app p1 p2 p3 p4 sequence.fasta outputfile
./primervalidatorlinuc.app p1 p2 p3 p4 sequence.fasta outputfile
./primervalidator.exe p1 p2 p3 p4 sequence.fasta outputfile
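For batch analyses, the command lines above can be generated from a small script. The Python 3 sketch below only assembles the argument list in the order shown (binary, p1 to p4, fasta file, output file); the four parameters are passed through unchanged because their meaning is not described here. The helper names (build_command, run_batch), the ".out" suffix and the dry_run switch are hypothetical choices made for this illustration.

```python
import subprocess
from pathlib import Path


def build_command(binary, params, fasta, output):
    """Build the argument list: binary p1 p2 p3 p4 sequence.fasta outputfile."""
    if len(params) != 4:
        raise ValueError("expected exactly four parameters (p1..p4)")
    return [str(binary), *map(str, params), str(fasta), str(output)]


def run_batch(binary, params, fasta_files, out_dir, dry_run=True):
    """Build one command per fasta file; run them only when dry_run is False."""
    commands = []
    for fasta in fasta_files:
        # Hypothetical naming scheme: <fasta name without extension>.out
        output = Path(out_dir) / (Path(fasta).stem + ".out")
        cmd = build_command(binary, params, fasta, output)
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises if the program exits with an error
    return commands
```

Passing the arguments as a list, rather than as one command string, means that file or directory names containing white spaces are handed to the program as single arguments.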
Important notes:
Fasta file of sequences:

Output formats

C program

>BF1_BR1 ******** query seq ********
Start of the analysis of a new pair of primers, identified by BF1_BR1

AACGTTTGACATCCCTAGTATGGTTACCAGAGATGGTTTCCTTCAGTTCGGCTGGCTAGGTGAC ** F 801 884 ** >AB000106|Bacteria|Proteobacteria|Alphaproteobacteria|Sphingomonadales|Sphingomonadaceae|Sphingomonas **  ** CAACGCGXAGAACCTTACC CGACAGCCATGCANCACCT **
A domain was extracted :
NOTES :
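A record such as the example above can be split into its fields for further processing. The Python 3 sketch below is based only on that single example line: it assumes that fields are separated by "**", that the second field holds a one-letter flag followed by two integer positions, and that the header carries the accession and the taxonomy separated by "|". The function name parse_hit and the dictionary keys are hypothetical, and the meaning given to each field is an assumption.

```python
# Example record copied from the output shown above.
SAMPLE = (
    "AACGTTTGACATCCCTAGTATGGTTACCAGAGATGGTTTCCTTCAGTTCGGCTGGCTAGGTGAC"
    " ** F 801 884 ** >AB000106|Bacteria|Proteobacteria|Alphaproteobacteria"
    "|Sphingomonadales|Sphingomonadaceae|Sphingomonas **  **"
    " CAACGCGXAGAACCTTACC CGACAGCCATGCANCACCT **"
)


def parse_hit(line):
    """Split one '**'-separated record into a dictionary (layout assumed from the example)."""
    parts = [p.strip() for p in line.split("**")]
    if len(parts) < 5:
        raise ValueError("unexpected record layout")
    flag, left, right = parts[1].split()
    accession, *lineage = parts[2].lstrip(">").split("|")
    return {
        "domain": parts[0],                     # extracted sequence
        "flag": flag,                           # 'F' in the example
        "positions": (int(left), int(right)),   # the two integers of the second field
        "accession": accession,
        "lineage": lineage,                     # taxonomy ranks from the header
        "primers": parts[4].split(),            # the two primer variants that matched
    }
```

Calling parse_hit(SAMPLE)["lineage"][1] gives the phylum of the example record, Proteobacteria.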

Python program

*1    analysed primers : 228 :
        name of the analysed pair of primers
*2    nbr extracted tags at k dif :  20052, min len=73, max len = 727, mean len=250
        number of tags extracted at k differences, length of the shortest tag, length of the longest tag, mean length of the tags extracted
*3    nbr exact extracted tags :  16788, min len=73, max len = 727, mean len=250
        number of tags extracted, using primers as provided (no further degeneracy added)
*4    min len = 73  for >U32596|Bacteria|Firmicutes|Clostridia|Halanaerobiales....
        tag of minimal length, extracted from this sequence
*5    max len = 727  for >AY594276|Bacteria|Firmicutes|Lac....
        tag of maximal length, extracted from this sequence
*6    mean left pos in sequences 470
        taking every extraction into consideration, domains start around that position
*7    mean right pos in sequences 738
        taking every extraction into consideration, domains end around that position
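The length figures reported in lines *2 to *5 are simple summary statistics over the extracted tags. As an illustration, the Python 3 sketch below computes them from a list of (header, tag) pairs; the function name summarize_lengths and the input layout are hypothetical, and this is not the code of the Python script itself.

```python
def summarize_lengths(tags):
    """tags: list of (header, tag_sequence) pairs. Return min/max/mean tag length."""
    if not tags:
        raise ValueError("no tags extracted")
    shortest = min(tags, key=lambda t: len(t[1]))
    longest = max(tags, key=lambda t: len(t[1]))
    mean_len = sum(len(seq) for _, seq in tags) / len(tags)
    return {
        "count": len(tags),
        "min_len": len(shortest[1]),
        "min_header": shortest[0],   # sequence that gave the shortest tag
        "max_len": len(longest[1]),
        "max_header": longest[0],    # sequence that gave the longest tag
        "mean_len": mean_len,
    }
```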

First primers (should be the reverse one)
printing only primer more abundant than 0.5 %
CAGCMGCCGCGGTAATWC    18251    91.0 %
CAGCMGCCGCGGTAAXWC    1059    5.3 %
CAGCMGCXGCGGTAATWC    178    0.9 %
CAGCMGCCGCGGTXATWC    127    0.6 %
CTACCNGGGTATCTAAT    119    0.6 %
CAGCMGXCGCGGTAATWC    119    0.6 %

Each line gives the primer sequence found, the number of times it was found, and the percentage found.
Explanation:
        CAGCMGCCGCGGTAATWC was found in 18251 sequences, i.e. in 91 % of the sequences
        CAGCMGXCGCGGTAATWC, with one degeneracy added, was found 119 more times, i.e. in 0.6 % more sequences
        CTACCNGGGTATCTAAT is the other primer; some sequences in the fasta file are inverted/complemented
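The table of primer variants can be reproduced with a simple tally. The figures in the listing appear consistent with percentages taken over the 20052 tags extracted at k differences (18251 / 20052 is about 91.0 %); the Python 3 sketch below assumes that denominator and keeps only variants above a threshold of 0.5 %. The function name abundant_variants and its parameters are hypothetical.

```python
from collections import Counter


def abundant_variants(matched_primers, threshold_pct=0.5):
    """Count each matched primer variant and keep those above threshold_pct percent.

    matched_primers: one entry per extracted tag, giving the variant that matched.
    Returns (variant, count, percent) rows, most abundant first.
    """
    counts = Counter(matched_primers)
    total = sum(counts.values())  # assumption: percentages are relative to all tags
    rows = []
    for variant, n in counts.most_common():
        pct = 100.0 * n / total
        if pct > threshold_pct:
            rows.append((variant, n, round(pct, 1)))
    return rows
```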
........
taxonomy at rank 2
Archaea    Crenarchaeota    116    124    93.5 %
Archaea    Euryarchaeota    536    571    93.9 %
Archaea    Korarchaeota    1    1    100.0 %
Bacteria    Acidobacteria    14    14    100.0 %
Bacteria    Actinobacteria    4663    5168    90.2 %
Bacteria    Aquificae    34    39    87.2 %

For each phylum: number of sequences extracted, total number of sequences in the clade, % extracted (the last two numbers are provided only if a fasta file of sequences is available).
NOTE: these numbers are for occurrences of tags, when both F and R primers could be found
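The per-clade percentages are the number of extracted tags divided by the number of sequences in the clade; for instance 116 / 124 gives 93.5 % for Crenarchaeota. A minimal Python 3 sketch of such a tally follows. It assumes headers of the form ">accession|rank1|rank2|..." and counts occurrences of tags, as stated in the note; the function names (rank_key, coverage_by_rank) are hypothetical.

```python
from collections import Counter


def rank_key(header, rank=2):
    """Return the first `rank` taxonomy fields of a header, skipping the accession."""
    fields = header.lstrip(">").split("|")
    return tuple(fields[1:1 + rank])


def coverage_by_rank(extracted_headers, all_headers, rank=2):
    """Return rows (clade fields..., n_extracted, n_total, percent), sorted by clade."""
    found = Counter(rank_key(h, rank) for h in extracted_headers)
    total = Counter(rank_key(h, rank) for h in all_headers)
    rows = []
    for clade in sorted(total):
        n, t = found.get(clade, 0), total[clade]
        rows.append((*clade, n, t, round(100.0 * n / t, 1)))
    return rows
```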

Instructions to use the graphical interface


Running primervalidator
Using the example provided should take a few seconds on your laptop. Once the console closes

Detailed download instructions (some Windows users may need these explanations).

If you had trouble downloading the two test files described in the Download section (the fasta file "test_bacteria.fasta" and the primer file "primers.txt"):
Download. Windows users: right-click on each link and select "save the link as".



Additional Files

Fasta files of sequences with the taxonomy included in the headers, for example:
>A61579|Bacteria|Thermotogae|Thermotogales|Thermotogaceae|Thermopallium
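Headers in this style are easy to read back in a script. The Python 3 sketch below iterates over a fasta file and splits each header into an accession and a list of taxonomy ranks, assuming "|" as the separator as in the example above. The function names (read_fasta, split_header) are hypothetical.

```python
def read_fasta(handle):
    """Yield (header, sequence) pairs from an open fasta file or any iterable of lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def split_header(header):
    """Split '>accession|rank1|rank2|...' into (accession, [rank1, rank2, ...])."""
    accession, *lineage = header.lstrip(">").split("|")
    return accession, lineage
```

With a real file, this would be used as `with open("test_bacteria.fasta") as fh: for header, seq in read_fasta(fh): ...`.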
Test files to check that everything is working.
Windows users: right-click on each link and select "save the link as".