Saturday 28 April 2012

Prokka - rapid prokaryotic annotation

Prokka is a software tool I have written to annotate bacterial, archaeal and viral genomes. It is based on years of experience annotating bacterial genomes, both automatically and via manual curation. 

Its main design considerations were to be:
  • fast
    • supports multi-threading
    • hierarchical search databases
  • simple to use
    • no compulsory parameters
    • bundled databases
  • clean
    • standards-compliant output files
    • pipeline-friendly interface
  • thorough
    • finds tRNA, rRNA, CDS, sig_peptide, tandem repeats, ncRNA
    • includes /gene and /EC_number where possible, not just /product
    • traceable annotation sources via /inference tags
  • useful
    • produces files close to ready for submission to GenBank
    • complete log file
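
To illustrate the "hierarchical search databases" idea from the list above, here is a minimal sketch in Python. It is not Prokka's actual code (Prokka is a Perl script that calls external tools). It assumes the hierarchy means "try a small, trusted database first and only fall back to larger ones on a miss", and the `annotate` function, the tier names and the toy sequences are all hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

# A tier is a (name, lookup) pair. Both are hypothetical stand-ins for a
# real sequence search against a real database.
Tier = Tuple[str, Callable[[str], Optional[str]]]


def annotate(protein: str, tiers: Sequence[Tier]) -> Tuple[str, str]:
    """Return (product, source) from the first tier that yields a hit."""
    for name, lookup in tiers:
        product = lookup(protein)
        if product is not None:
            # Stop at the first hit so later, larger tiers are never searched.
            return product, name
    # No tier matched: fall back to a generic label.
    return "hypothetical protein", "none"


# Toy data: a small curated tier, then a broader and less specific one.
curated = {"MKT": "DNA polymerase III"}
broad = {"MKT": "polymerase-like protein", "MAA": "ABC transporter"}

# dict.get returns None on a miss, which is what annotate() expects.
tiers = [("curated", curated.get), ("broad", broad.get)]

for seq in ("MKT", "MAA", "XXX"):
    product, source = annotate(seq, tiers)
    print(f"{seq}\t{product}\t{source}")
```

Stopping at the first hit is what would make this kind of search fast, and returning the tier name alongside the product is one simple way to keep the annotation source traceable.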


The first release is a monolithic but easy-to-follow Perl script. It uses only core Perl modules, but has quite a few external tool dependencies, some of which I can't bundle due to licence restrictions. Eventually I hope to have a public web-server version, and a version in the Galaxy Toolshed.

It currently takes about 10 minutes on a quad-core Intel i7 to annotate a typical 4 Mbp genome.

You can download it from here and read the manual here.