Files for analyses of SSU sequences of eukaryotic origins
IMPORTANT NOTICE: the .xls files are NOT ALL real Excel files; some are
tab-delimited text files meant for programs such as Excel. Make sure you
configure your program to use tabs (\t) as field delimiters.
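If you prefer a script to a spreadsheet program, here is a minimal Python sketch of the same idea. It is only an illustration: the function names and the sample content are made up, and the signature check relies on the fact that binary .xls files start with the OLE2 compound-file bytes.

```python
import csv
import io

# Signature found at the start of a binary .xls (OLE2 compound file).
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"

def looks_like_binary_xls(first_bytes):
    """True if the given leading bytes match the binary .xls signature."""
    return first_bytes.startswith(OLE2_MAGIC)

def read_tabulated(handle):
    """Return every row of a tab-delimited text file as a list of strings."""
    return list(csv.reader(handle, delimiter="\t"))

# Hypothetical content, for illustration only (not taken from the real files).
example = io.StringIO("assignment\tsample_A\tsample_B\nFungi\t12\t3\n")
rows = read_tabulated(example)
print(rows)
```

For a file on disk, read its first 8 bytes in binary mode and pass them to looks_like_binary_xls; if the result is False, reopen the file with open(path, newline="") and pass the handle to read_tabulated.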
Unzip these folders. NOTE: below, I have
separated out the files most useful for biologists. You will
have less data on your hard disk, without missing any important information!
fasta and bio_xls files
Zipped files of tags (metazoa removed). Format of fasta:
>FSGHFSFJSD_n
TGCTAGCTAGCTAGCTAGCTAGCAT
FSGHFSFJSD: tag identifier
n: number of times the tag occurred in the sample
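The header convention can be read back with a few lines of Python. This is a sketch under assumptions: it supposes the usual fasta layout (header line, then one or more sequence lines) and that the count is whatever follows the last underscore. The count 3 and the second record are invented for the example.

```python
import io

def parse_tags(handle):
    """Yield (tag_id, count, sequence) for each record of a tag fasta file."""
    tag_id, count, chunks = None, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if tag_id is not None:
                yield tag_id, count, "".join(chunks)
            # Keep only the first word of the header, then split on the LAST "_".
            name, _, n = line[1:].split()[0].rpartition("_")
            tag_id, count, chunks = name, int(n), []
        else:
            chunks.append(line)
    if tag_id is not None:
        yield tag_id, count, "".join(chunks)

# Hypothetical records, for illustration only.
example = io.StringIO(
    ">FSGHFSFJSD_3\nTGCTAGCTAGCTAGCTAGCTAGCAT\n>ABC_DEF_12\nACGT\nTTGA\n"
)
records = list(parse_tags(example))
print(records)
```

Splitting on the last underscore keeps identifiers that themselves contain an underscore intact.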
Excel files containing comparisons of samples in a format compatible with DNA chip analysis programs.
In columns: the various samples
In lines: the "gene", species, genus,...phylum class to which tags are assigned
Contents: expression level, in two different sub-tables:
One with how many tag occurrences in each sample, for each assignment
One with how many tag dereplicates in each sample, for each assignment (I have some doubts about the usefulness of these numbers)
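To make the difference between the two sub-tables concrete, here is a small Python sketch. It rests on my reading of the terms, which is an assumption: "occurrences" is taken as the sum of the n values of the tags, and "dereplicates" as the number of distinct tags. The record layout and the sample data are hypothetical.

```python
from collections import defaultdict

def build_tables(records):
    """Build the two sub-tables from (sample, assignment, tag_id, n) records.

    Assumes each tag appears at most once per sample (already dereplicated).
    """
    occurrences = defaultdict(lambda: defaultdict(int))
    dereplicates = defaultdict(lambda: defaultdict(int))
    for sample, assignment, tag_id, n in records:
        occurrences[assignment][sample] += n    # total times tags were seen
        dereplicates[assignment][sample] += 1   # number of distinct tags
    # Convert to plain dicts so missing keys are not created silently later.
    as_plain = lambda table: {a: dict(s) for a, s in table.items()}
    return as_plain(occurrences), as_plain(dereplicates)

# Hypothetical records, for illustration only.
records = [
    ("S1", "Fungi", "t1", 5),
    ("S1", "Fungi", "t2", 1),
    ("S2", "Fungi", "t1", 2),
    ("S2", "-", "t3", 4),
]
occ, der = build_tables(records)
print(occ)
print(der)
```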
Each form of the file is named according to the kind of assignment done.
There are different files for assignments at different % of similarity.
"-" ==> Could not be assigned to a public sequence at that level of similarity
"unassigned" ==> Could not be assigned to a public sequence with good taxonomy,
but was assigned to some kind of clone sequence.
Note that for species and genus, "unassigned" will be absent.
For species, assignment is to an accession number instead of "unassigned" (as
tags in different samples can be assigned the same accession number,
but obviously with some loss of generality).
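When processing the tables in a script, the two special labels can be told apart from real assignments as sketched below in Python. The function name and the returned wording are made up for the example, and the labels are assumed to be spelled exactly "-" and "unassigned".

```python
def assignment_status(label):
    """Classify an assignment label from the comparison tables."""
    label = label.strip()
    if label == "-":
        return "no public sequence at this similarity level"
    if label == "unassigned":
        return "clone sequence only, no good taxonomy"
    return "assigned"

# Hypothetical labels, for illustration only.
statuses = [assignment_status(x) for x in ["-", "unassigned", "Fungi"]]
print(statuses)
```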
New use. Open the program. Select a working directory and a project name.
Create a project.
It will contain a list of subdirectories you can use with the present version of the program.
It contains the following directories:
Bio: was supposed to contain data for the biologists.
blast_out: the various results of the blasts. Not included in this distribution.
dereplication
embl
fasta: contains all fasta files
metadata
percent: contains every file parsed at various % of similarity. Please use
the data contained in the folders 60,65,...85. Any files outside these
folders are outdated.
results: contains some xls files
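To pick up only the per-similarity folders under percent and skip any outdated loose files, something like the Python sketch below can be used. It does not assume which similarity levels exist; it simply keeps every sub-folder whose name is a whole number. The function name and the demo layout are hypothetical.

```python
import tempfile
from pathlib import Path

def similarity_folders(percent_dir):
    """Return sub-folders of percent_dir named with a whole number, sorted numerically."""
    root = Path(percent_dir)
    folders = [p for p in root.iterdir() if p.is_dir() and p.name.isdigit()]
    return sorted(folders, key=lambda p: int(p.name))

# Demo on a throwaway directory; the layout is invented for illustration.
with tempfile.TemporaryDirectory() as tmp:
    percent = Path(tmp) / "percent"
    for name in ["85", "60", "65", "misc"]:
        (percent / name).mkdir(parents=True)
    (percent / "old.xls").write_text("outdated\n")  # loose file, ignored
    names = [p.name for p in similarity_folders(percent)]
print(names)
```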
Copy the contents of the unzipped folders into the corresponding folders created.
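The copy can of course be done by hand. If you would rather script it, here is a minimal Python sketch. It assumes that each unzipped folder has the same name as the project folder it belongs to, which you should check first; the function name and the demo paths are invented. It needs Python 3.8 or later for the dirs_exist_ok option.

```python
import shutil
import tempfile
from pathlib import Path

def copy_into_project(unzipped_root, project_root):
    """Copy each unzipped folder into the project folder of the same name.

    Folders with no matching project folder are skipped. Returns the names copied.
    """
    copied = []
    for source in sorted(Path(unzipped_root).iterdir()):
        target = Path(project_root) / source.name
        if source.is_dir() and target.is_dir():
            shutil.copytree(source, target, dirs_exist_ok=True)
            copied.append(source.name)
    return copied

# Demo on a throwaway directory; the layout is invented for illustration.
with tempfile.TemporaryDirectory() as tmp:
    unzipped, project = Path(tmp) / "unzipped", Path(tmp) / "project"
    (unzipped / "fasta").mkdir(parents=True)
    (unzipped / "fasta" / "a.fasta").write_text(">T_1\nACGT\n")
    (unzipped / "extra").mkdir()
    (project / "fasta").mkdir(parents=True)
    (project / "results").mkdir()
    copied = copy_into_project(unzipped, project)
    landed = (project / "fasta" / "a.fasta").exists()
print(copied, landed)
```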
I do not provide all results, since they presently amount to more than 40 GB (big outputs of blasts).