Basic output flags

These are output flags that do not require the input of any fasta or EnTAP files.

--statistics

Statistics will be run on the gene table and printed to statistics.txt. This command is performed by task_scripts/classic_stats.pl. If a prefix is used, the statistics file will be named accordingly.

These are all the potential statistics in the reported format:

Number of genes:
Number of monoexonic genes:
Number of multiexonic genes:

Number of positive strand genes:
Monoexonic:
Multiexonic:

Number of negative strand genes:
Monoexonic:
Multiexonic:

Average overall gene size:
Median overall gene size:
Average overall CDS size:
Median overall CDS size:
Average overall exon size:
Median overall exon size:

Average size of monoexonic genes:
Median size of monoexonic genes:
Largest monoexonic gene:
Smallest monoexonic gene:

Average size of multiexonic genes:
Median size of multiexonic genes:
Largest multiexonic gene:
Smallest multiexonic gene:

Average size of multiexonic CDS:
Median size of multiexonic CDS:
Largest multiexonic CDS:
Smallest multiexonic CDS:

Average size of multiexonic exons:
Median size of multiexonic exons:
Average size of multiexonic introns:
Median size of multiexonic introns:

Average number of exons per multiexonic gene:
Median number of exons per multiexonic gene:
Largest multiexonic exon:
Smallest multiexonic exon:
Most exons in one gene:

Average number of introns per multiexonic gene:
Median number of introns per multiexonic gene:
Largest intron:
Smallest intron:

The following columns do not involve codons:
Number of complete models:
Number of 5’ only incomplete models:
Number of 3’ only incomplete models:
Number of 5’ and 3’ incomplete models:

If your gene set contains only monoexonic genes, a shorter version of the statistics is printed that contains only the categories in which monoexonic genes are evaluated.
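For downstream use, the report can be read back into a script. The following is a minimal sketch, assuming each line of statistics.txt has the form "Label:" followed by whitespace and a value; the exact separator is an assumption, and the numbers in the sample are invented for illustration. Because the labels "Monoexonic:" and "Multiexonic:" appear under both strands, the hypothetical helper parse_statistics returns ordered pairs rather than a dictionary, so repeated labels are not overwritten.

```python
def parse_statistics(text):
    """Return (label, value) pairs from 'Label: value' lines, in file order."""
    pairs = []
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if not sep:
            continue  # blank line, or a line without a colon
        value = value.strip()
        if not value:
            continue  # heading-style line with nothing after the colon
        pairs.append((label.strip(), value))
    return pairs


# Invented sample values; the real file is written by --statistics.
sample = """Number of genes:\t120
Number of monoexonic genes:\t20
Number of multiexonic genes:\t100

Number of positive strand genes:\t70
Monoexonic:\t12
Multiexonic:\t58
"""

pairs = parse_statistics(sample)
for label, value in pairs:
    print(f"{label} = {value}")
```

Values are kept as strings so that integer counts and decimal averages can be converted by the caller as needed.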

--statistics-at-every-step

A statistical analysis of the gene table is run following every filtering step. This information is in the same format as regular --statistics but prints to the log following the information line for each flag. To ensure statistics.txt is created at the end, also include --statistics in your command.

--create-simple-gtf

Identical to --create-gtf, but lacks start and stop codon information. This option is significantly faster.

--create-gff3

An Ensembl v3 gff3 file will be created that contains mRNA, exon, and intron information. ID, Name, and Parent information will be shown.
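In the general GFF3 format, ID, Name, and Parent are stored as semicolon-separated key=value pairs in the ninth tab-delimited column. The sketch below shows how those attributes could be read from one line. The sample line, its source column, and its identifiers are invented for illustration and are not the naming scheme this tool produces; parse_gff3_attributes is a hypothetical helper.

```python
def parse_gff3_attributes(column9):
    """Split a GFF3 attribute column into a {key: value} dictionary."""
    attrs = {}
    for field in column9.strip().split(";"):
        if not field:
            continue  # tolerate a trailing semicolon
        key, _, value = field.partition("=")
        attrs[key] = value
    return attrs


# Invented example line with the nine standard GFF3 columns.
line = ("chr1\texample\texon\t100\t250\t.\t+\t.\t"
        "ID=gene1.exon1;Name=gene1.exon1;Parent=gene1.mRNA")

columns = line.split("\t")
attrs = parse_gff3_attributes(columns[8])
print(columns[2], attrs["ID"], attrs["Parent"])
```

The Parent value is what links an exon or intron line back to its mRNA line, so grouping features by attrs["Parent"] reconstructs each model.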