Fasta required output flags

To create outputs that require a fasta, you must provide a fasta input. See how to here. If a valid fasta is provided, the following flags become available:

--get-fasta

The nucleotide fasta sequence is printed to genes_without_introns.fasta. Genes are always printed from start to stop, and the intron sequences are not included. The header for each sequence is the fifth column of the gene line in the gene table.

This command is performed by task_scripts/get_fasta_without_introns.pl. If a prefix is specified, the output fasta will be named accordingly.
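The idea of printing an intron-free gene from start to stop can be illustrated with a minimal Python sketch. This is not the code in task_scripts/get_fasta_without_introns.pl; the function names, the 1-based inclusive exon coordinates, and the 60-character line width are all assumptions made for illustration.

```python
# Illustrative sketch only; names and coordinate conventions are assumptions,
# not taken from the gFACs scripts.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide string."""
    return seq.translate(_COMPLEMENT)[::-1]


def spliced_sequence(scaffold, exons, strand):
    """Join exon slices (1-based, inclusive) and orient the result start to stop."""
    joined = "".join(scaffold[start - 1:end] for start, end in sorted(exons))
    # Genes on the negative strand are flipped so they also read start to stop.
    return reverse_complement(joined) if strand == "-" else joined


def fasta_record(header, seq, width=60):
    """Format one fasta entry, wrapping the sequence at `width` characters."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


# Two exons separated by a lowercase intron on a toy scaffold.
scaffold = "ccATGAAAgtaagTTTTAAcc"
gene = spliced_sequence(scaffold, [(3, 8), (14, 19)], "+")
print(fasta_record("gene1", gene), end="")
```

In this toy case the intron (gtaag) is dropped and the two exons are joined into ATGAAATTTTAA.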

--get-protein-fasta

A protein fasta of the genes is created, called genes_without_introns.fasta.faa. Introns are never included. All proteins are printed in the N-terminus to C-terminus orientation (so M would be first); genes on the negative strand are reverse complemented before translation so that the correct amino acids are chosen. Stop codons are depicted as *. The header for each sequence is the fifth column of the gene line in the gene table.

This command is performed by task_scripts/get_protein_fasta.pl. If a prefix is specified, the output fasta will be named accordingly.
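As a rough illustration of the translation step, the sketch below translates a start-to-stop coding sequence with the standard genetic code and writes stop codons as *. It is not the code in task_scripts/get_protein_fasta.pl. The function name, the use of X for codons containing ambiguous bases, and the dropping of a trailing incomplete codon are assumptions.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), laid out in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a start-to-stop coding sequence; stop codons become '*'.

    Assumptions for this sketch: unknown codons become 'X' and a trailing
    partial codon is ignored.
    """
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        protein.append(CODON_TABLE.get(cds[i:i + 3], "X"))
    return "".join(protein)


print(translate("ATGAAATTTTAA"))  # prints MKF*
```

For a gene on the negative strand, the input to such a function would be the reverse-complemented sequence, so that the first codon read is the start codon.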

--create-gtf

A gtf file called out.gtf is created. If a prefix is specified, the gtf file name will include it. This step is done with two scripts, task_scripts/add_start_stop_to_gene_table.pl and task_scripts/gtf_creator.pl.

Because GTF files (as a general rule) require start and stop codon information, the locations of the start and stop codons (if found) are added to the gene table and the final gtf. CDS scores that correspond to an exon are retrieved from the original input file if found, and the “exon” attribute is changed back to “CDS”. Introns currently remain.

The source column of the gtf is set to gFACs. This is not meant to steal the credit; it may simply be helpful to know where the information came from, particularly after filtering and rearranging.
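To make the start and stop codon entries concrete, here is a minimal Python sketch of how their coordinates could be derived from CDS segments and written as tab-separated GTF lines with gFACs in the source column. It is not the code in task_scripts/add_start_stop_to_gene_table.pl or task_scripts/gtf_creator.pl. It assumes 1-based inclusive coordinates, that the CDS segments include the stop codon, and that neither codon is split across two segments; the function names and the transcript_id naming are hypothetical.

```python
# Illustrative sketch only; not the gFACs implementation.


def codon_positions(cds_segments, strand):
    """Return (start_codon, stop_codon) as 1-based inclusive (start, end) pairs.

    Assumes the stop codon is inside the CDS and no codon spans two segments.
    """
    ordered = sorted(cds_segments)
    low = (ordered[0][0], ordered[0][0] + 2)     # first three bases on the scaffold
    high = (ordered[-1][1] - 2, ordered[-1][1])  # last three bases on the scaffold
    # On the negative strand the gene starts at the high-coordinate end.
    return (low, high) if strand == "+" else (high, low)


def gtf_line(seqname, feature, start, end, strand, gene_id,
             source="gFACs", score=".", frame="."):
    """Build one nine-column, tab-separated GTF line."""
    # The transcript_id format below is a placeholder chosen for this example.
    attributes = f'gene_id "{gene_id}"; transcript_id "{gene_id}.1";'
    fields = [seqname, source, feature, str(start), str(end),
              score, strand, frame, attributes]
    return "\t".join(fields)


start_codon, stop_codon = codon_positions([(3, 8), (14, 19)], "+")
print(gtf_line("scaffold_1", "start_codon", *start_codon, "+", "gene1"))
print(gtf_line("scaffold_1", "stop_codon", *stop_codon, "+", "gene1"))
```

A real implementation would also need to handle codons that are split by an intron and genes where no start or stop codon is found, which this sketch deliberately leaves out.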