Primer Validator
Hamid R. Shahbazkia,
Richard Christen
Brief description.
Primer Validator consists first of a C program that evaluates couples of primers (oligomers) by comparing them to a set of sequences in fasta format. It looks for sequences that contain both primers (if required, with up to k differences) and reports the list of couples found, the domains amplified and the identity of the sequences extracted. A Python script then generates synthetic views of the results.
Everything is wrapped within a graphical interface (Python & Tk) that runs on every OS (Unix/Linux, Mac, Windows). Binaries are provided for each OS (64-bit) and have been tested on each OS (Windows binaries compiled under Cygwin, dll provided).
Primer Validator is freely available at http://bioinfo.unice.fr/primervalidator/, with online help.
See http://bioinfo.unice.fr/454 for a demonstration of results: thousands of primers tested against hundreds of thousands of sequences.
IMPORTANT NOTE 1: on May 26th I discovered a problem on Windows: Windows users love to use white spaces in directory names. I fixed that, but make sure that the primer file, the fasta file and the output file are all located in the same directory as primervalidator. Alternatively, instead of using the button "RUN primervalidator", make a .bat file that launches primervalidator.
IMPORTANT NOTE 2: do not confuse "primervalidator", the program that validates primers, with "graphic_primervalidator", which launches the graphical interface used to run "primervalidator" itself and to produce a synthetic presentation of the results.
Download
Linux.
Simply compile the C program with gcc (Cygwin for MS Windows). Install Python 2.x for Windows if necessary; Python is already present on Linux and Mac OS X.
Run the graphical interface using: "python graphic_primer_validator-1.0.py" (developed with Python 2.5).
MS Windows.
Simply download these files and unzip them into a single directory. There is nothing to install.
Save the C program files in a single directory, where you should also save the graphical interface file.
Because of download rules, rename files such as nnn_exe to nnn.exe after download (for institutes with rules against downloading .exe files). Since there is no installation, this should work.
Mac.
Download the test/example files.
- a test file of fasta sequences: this is a big fasta file; wait until its name no longer ends in .part before unzipping it. It should download as "test_bacteria.fasta"; if not, rename it.
- a test file of primer sets: this is a text file with a single pair of primers; wait until its name no longer ends in .part before using it. It should download as "primers.txt"; if not, rename it.
If you experience problems downloading these files, see the detailed instructions below, or download the zip.
Usage
C program
You can run this program either from the graphical interface or from the command line (the graphical interface is much easier to use, unless you want to run batch analyses). Depending upon your OS:
./primervalidatormac.app p1 p2 p3 p4 sequence.fasta outputfile
./primervalidatorlinux.app p1 p2 p3 p4 sequence.fasta outputfile
./primervalidator.exe p1 p2 p3 p4 sequence.fasta outputfile
- primervalidatornnnn: choose the proper binary for your OS.
- p1: = 0 use the one-letter code for redundancies from the IUB table (example: M = AC), = 1 treat every degenerate nucleotide as an N (slightly faster).
- p2: an integer (0 to 15) allowing up to p2 differences between a primer and a sequence (on top of the existing degeneracies already included in the primer).
- p3: an integer (0 to 15) allowing up to p3 differences between a primer and a sequence (on top of the existing degeneracies already included in the primer).
- p4: = 0 search only in the provided fasta sequences, = 1 search also in the inverted-complemented sequences.
- sequence.fasta: the fasta file of sequences to be searched.
- outputfile: the file to store results in.
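For batch analyses, the command line above can be driven from a short script. The following is only a sketch written for Python 3: the helper names build_command and run_batch and the output file names results_kN.txt are hypothetical, and p1 = 0 is assumed to select the IUB one-letter codes. The usage line lists no primer-file argument, so none is passed here.

```python
import subprocess


def build_command(binary, p1, p2, p3, p4, fasta_path, output_path):
    """Return the argument list for one run, in the order p1 p2 p3 p4 fasta output."""
    if p1 not in (0, 1) or p4 not in (0, 1):
        raise ValueError("p1 and p4 must be 0 or 1")
    for k in (p2, p3):
        if not 0 <= k <= 15:
            raise ValueError("p2 and p3 must be between 0 and 15")
    return [binary, str(p1), str(p2), str(p3), str(p4), fasta_path, output_path]


def run_batch(binary, fasta_path, mismatch_levels, runner=subprocess.call):
    """Run the binary once per mismatch level, searching both strands (p4 = 1).

    The output names 'results_kN.txt' are an illustrative convention only.
    Returns a dict mapping each output file to the runner's return value.
    """
    results = {}
    for k in mismatch_levels:
        output_path = "results_k%d.txt" % k
        cmd = build_command(binary, 0, k, k, 1, fasta_path, output_path)
        results[output_path] = runner(cmd)
    return results
```

For example, run_batch("./primervalidator.exe", "sequence.fasta", [0, 1, 2]) would launch three runs, one per mismatch level. Passing an argument list rather than a single string avoids shell quoting problems with white spaces in paths.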
Important notes:
- The file of primers is in the form of a fasta file:
- >name_of_couple
- F_primer_sequence
- R_primer_sequence
- No space allowed in name_of_couple!
- The last line of the file should end with a "\n", i.e. a newline.
- The fasta file of sequences should have each sequence on a single line, not wrapped at 60-80 nucleotides per line (use the function provided by the graphical interface if necessary).
- Not tested with files whose end-of-line convention differs from that of your system (Unix or Windows).
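The primer file rules above can be checked before a run. This is a minimal sketch for Python 3, not part of Primer Validator: the function name parse_primer_file is hypothetical, and restricting primers to the standard IUB one-letter codes is an assumption.

```python
# Standard IUB/IUPAC nucleotide codes (assumed to be the accepted alphabet).
IUB_LETTERS = set("ACGTRYSWKMBDHVN")


def parse_primer_file(text):
    """Parse '>name / F primer / R primer' records.

    Returns a list of (name, forward, reverse) tuples, or raises ValueError
    when one of the rules listed above is broken.
    """
    if not text.endswith("\n"):
        raise ValueError("the last line must end with a newline")
    lines = text.splitlines()
    if len(lines) % 3 != 0:
        raise ValueError("each couple needs exactly three lines")
    couples = []
    for i in range(0, len(lines), 3):
        header, forward, reverse = lines[i:i + 3]
        if not header.startswith(">") or len(header) == 1:
            raise ValueError("line %d: expected >name_of_couple" % (i + 1))
        name = header[1:]
        if any(ch.isspace() for ch in name):
            raise ValueError("no space allowed in name: %r" % name)
        for primer in (forward, reverse):
            if not primer or set(primer.upper()) - IUB_LETTERS:
                raise ValueError("invalid primer sequence: %r" % primer)
        couples.append((name, forward, reverse))
    return couples
```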
Fasta file of sequences:
- No space in sequence identifiers either.
- Sequences should be on a single line (use the graphical interface button dedicated to transforming your fasta files).
- Please use a file format with the end-of-line convention of your system (Unix or Windows).
- It is possible to provide sequence identifiers in the form:
- >A61579|Bacteria|Thermotogae|Thermotogales|Thermotogaceae|Thermopallium
- The taxonomy provided in this way is then used to analyse the specificity of primers.
- If there is no taxonomy in the sequence headers, the program will still run, provided there are no spaces in headers.
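If you prefer to prepare the fasta file outside the graphical interface, the two requirements above (one line per sequence, taxonomy separated by "|") can be sketched as follows in Python 3. The function names unwrap_fasta and split_header are hypothetical; this is not the code behind the interface button.

```python
def unwrap_fasta(lines):
    """Yield (header, sequence) pairs with each sequence joined on one line."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_header(header):
    """Split '>id|rank1|rank2|...' into (id, [rank1, rank2, ...])."""
    if " " in header:
        raise ValueError("no space allowed in sequence identifiers")
    fields = header.lstrip(">").split("|")
    return fields[0], fields[1:]
```

Writing each pair back as header, newline, sequence, newline gives a file with every sequence on a single line. A header without taxonomy simply yields an empty list of ranks.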
Output formats
C program
>BF1_BR1 ******** query seq ********
Start of analysis of a new couple of primers, identified by BF1_BR1
AACGTTTGACATCCCTAGTATGGTTACCAGAGATGGTTTCCTTCAGTTCGGCTGGCTAGGTGAC
** F 801 884 **
>AB000106|Bacteria|Proteobacteria|Alphaproteobacteria|Sphingomonadales|Sphingomonadaceae|Sphingomonas
** ** CAACGCGXAGAACCTTACC CGACAGCCATGCANCACCT **
A domain was extracted:
- AACGTTTG....CTAGGTGAC : sequence extracted
- F 801 884 : found between positions 801 and 884 on the F strand (R if on the other strand)
- >AB000106|Bacteria|Pr.. : identifier of this sequence
- CAACGCGXAGAACCTTACC : F primer
- CGACAGCCATGCANCACCT : R primer
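For custom post-processing, the records shown above can be read back with a small parser. This Python 3 sketch assumes the layout of the example only: one header line per couple containing "query seq", then four lines per extracted domain (sequence, strand and positions, identifier, primers as found). The function name parse_output and the dictionary keys are hypothetical.

```python
import re

POSITION_RE = re.compile(r"^\*\*\s+([FR])\s+(\d+)\s+(\d+)\s+\*\*$")
PRIMERS_RE = re.compile(r"^\*\*\s+\*\*\s+(\S+)\s+(\S+)\s+\*\*$")


def parse_output(lines):
    """Return one dict per extracted domain found in the C program output."""
    rows = [line.strip() for line in lines if line.strip()]
    hits, couple, i = [], None, 0
    while i < len(rows):
        row = rows[i]
        if row.startswith(">") and "query seq" in row:
            couple = row[1:].split()[0]  # name of the couple being analysed
            i += 1
            continue
        if i + 3 < len(rows):
            pos = POSITION_RE.match(rows[i + 1])
            primers = PRIMERS_RE.match(rows[i + 3])
            if pos and primers and rows[i + 2].startswith(">"):
                hits.append({
                    "couple": couple,
                    "sequence": row,
                    "strand": pos.group(1),
                    "start": int(pos.group(2)),
                    "end": int(pos.group(3)),
                    "identifier": rows[i + 2][1:],
                    "f_primer": primers.group(1),
                    "r_primer": primers.group(2),
                })
                i += 4
                continue
        i += 1  # skip any line that does not start a complete record
    return hits
```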
NOTES:
- CAACGCGXAGAACCTTACC : X means that a degeneracy is required at this position for amplification
- CGACAGCCATGCANCACCT : N means that the original primer already had a degeneracy at this position.
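The idea of matching a degenerate primer with up to k extra differences, and marking each extra difference with X, can be illustrated as follows. This is a naive Python 3 sketch of the principle, not the algorithm of the C program: the function names annotate and find_sites are hypothetical, and matching positions simply keep the primer's own letter.

```python
# IUB one-letter codes and the bases each one stands for.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def annotate(primer, site, max_diff, all_as_n=False):
    """Compare a primer with a site of the same length.

    Returns the primer with X at every position needing an extra degeneracy,
    or None when more than max_diff such positions are required.
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    out, diffs = [], 0
    for code, base in zip(primer.upper(), site.upper()):
        allowed = IUB[code]
        if all_as_n and len(allowed) > 1:
            allowed = IUB["N"]  # treat every degenerate code as N
        if base in allowed:
            out.append(code)
        else:
            diffs += 1
            out.append("X")
    return "".join(out) if diffs <= max_diff else None


def find_sites(primer, sequence, max_diff, all_as_n=False):
    """Return (1-based start, annotated primer) for every acceptable site."""
    size = len(primer)
    hits = []
    for start in range(len(sequence) - size + 1):
        found = annotate(primer, sequence[start:start + size], max_diff, all_as_n)
        if found is not None:
            hits.append((start + 1, found))
    return hits
```

Scanning every window like this is simple but slow on hundreds of thousands of sequences, which is one reason the actual search is done in C.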
Python program
*1 analysed primers : 228 :
name of the analysed couple of primers
*2 nbr extracted tags at k dif : 20052, min len=73, max len = 727, mean len=250
number of tags extracted at k differences, length of the shortest tag, length of the longest tag, mean length of the tags extracted
*3 nbr exact extracted tags : 16788, min len=73, max len = 727, mean len=250
number of tags extracted using the primers as provided (no further degeneracy added)
*4 min len = 73 for >U32596|Bacteria|Firmicutes|Clostridia|Halanaerobiales....
the tag of minimal length was extracted from this sequence
*5 max len = 727 for >AY594276|Bacteria|Firmicutes|Lac....
the tag of maximal length was extracted from this sequence
*6 mean left pos in sequences 470
taking every extraction into consideration, domains start around that position
*7 mean right pos in sequences 738
taking every extraction into consideration, domains end around that position
First primers (should be the reverse one)
printing only primer more abundant than 0.5 %
CAGCMGCCGCGGTAATWC 18251 91.0 %
CAGCMGCCGCGGTAAXWC 1059 5.3 %
CAGCMGCXGCGGTAATWC 178 0.9 %
CAGCMGCCGCGGTXATWC 127 0.6 %
CTACCNGGGTATCTAAT 119 0.6 %
CAGCMGXCGCGGTAATWC 119 0.6 %
For each primer variant: which sequence was found, how many times, and the % found.
Explanation:
- CAGCMGCCGCGGTAATWC was found in 18251 sequences, i.e. in 91 % of the sequences
- CAGCMGXCGCGGTAATWC, with one degeneracy added, was found 119 more times, i.e. in 0.6 % more sequences
- CTACCNGGGTATCTAAT is the other primer: some sequences in the fasta file are inverted/complemented
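The percentages in this example are consistent with dividing each count by the 20052 tags extracted at k differences (18251 / 20052 is about 91.0 %). A minimal Python 3 sketch of such a table, with the 0.5 % threshold, is given here; the function name variant_table is hypothetical and this is not the code of the Python script itself.

```python
def variant_table(counts, total_tags, min_percent=0.5):
    """Return (variant, count, percent) rows, most abundant first.

    counts maps each primer variant (as found in the sequences) to its number
    of occurrences; only variants above min_percent of total_tags are kept.
    """
    rows = []
    for variant, n in sorted(counts.items(), key=lambda item: -item[1]):
        percent = 100.0 * n / total_tags
        if percent > min_percent:
            rows.append((variant, n, round(percent, 1)))
    return rows
```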
........
taxonomy at rank 2
Archaea Crenarchaeota 116 124 93.5 %
Archaea Euryarchaeota 536 571 93.9 %
Archaea Korarchaeota 1 1 100.0 %
Bacteria Acidobacteria 14 14 100.0 %
Bacteria Actinobacteria 4663 5168 90.2 %
Bacteria Aquificae 34 39 87.2 %
For each phylum: number of sequences extracted, total number of sequences in the clade, % extracted (the last two numbers are provided only if a fasta file of sequences is available).
NOTE: these numbers are for occurrences of tags, i.e. when both F and R primers could be found.
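Each line of the table is a simple ratio, for example 116 / 124 is about 93.5 % for Archaea Crenarchaeota. The Python 3 sketch below shows how such a table could be derived from "|"-separated headers; the function names clade_key and coverage_by_rank are hypothetical, and headers with fewer ranks than requested are simply ignored (an assumption).

```python
from collections import Counter


def clade_key(header, rank):
    """Return the first `rank` taxonomic fields of a header, or None if too short."""
    fields = header.lstrip(">").split("|")[1:rank + 1]
    return tuple(fields) if len(fields) == rank else None


def coverage_by_rank(extracted_headers, all_headers, rank=2):
    """Return (clade, extracted, total, percent) rows sorted by clade name."""
    def count(headers):
        keys = (clade_key(h, rank) for h in headers)
        return Counter(k for k in keys if k is not None)

    extracted = count(extracted_headers)
    totals = count(all_headers)
    rows = []
    for clade in sorted(totals):
        n = extracted.get(clade, 0)
        rows.append((clade, n, totals[clade], round(100.0 * n / totals[clade], 1)))
    return rows
```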
Instructions to use the graphical interface

Running primervalidator
- Select the exe code you downloaded i.e. primervalidator.exe
- Select a file of primers
- First line in the form >definition_of_couple (no white space)
- Second line: the F primer, no space; degenerate positions in the form N, H, ... are possible
- Third line: the R primer, no space; degenerate positions in the form N, H, ... are possible
- Select a fasta file of sequences
- If necessary use "transform fasta file into sequences on a single line" to reformat your file properly
- Select the number of mismatches allowed for each primer (on top of already degenerate positions).
- Select whether you want to search both strands
- RUN primer validator => a console appears on screen
Using the example provided should take a few seconds on your laptop, after which the console closes.
- reselect primervalidator output: to reanalyse a previous result, or if you closed the graphical interface
- minimum number of tags: if your file of primers contains many couples you want to test, only analyse couples that extracted more than 1000 (in this example) tags
- taxo rank to extract: provided you use a fasta file including taxonomic information, make a synthetic presentation for the first 4 ranks (see example above for rank 2)
- minimal % primers: if different mutations are found in primers, print only the primer variants present in more than 0.5 % of cases
- Select an output file
- Do synthesis
Download detailed instructions (some Windows users may need these explanations).
If you had trouble downloading the two test files described above ("test_bacteria.fasta" and "primers.txt"):
Windows users: right-click on each link and select "save the link as".

Additional Files
Fasta files of sequences with taxonomy included in headers, example:
>A61579|Bacteria|Thermotogae|Thermotogales|Thermotogaceae|Thermopallium
Test files to check that everything is working.
Windows users: right-click on each link and select "save the link as".
