Given an input CSV or TSV annotation file from
https://pseudomonas.com, separates and cleans the data, returning a tidy
tibble with the following columns: "locus_tag", "gene_name", and
"product_name". Setting the extra_cols argument to TRUE will add the
columns "start", "end", and "strand". Enabling fill_names will populate
missing gene names with the corresponding locus tag.
References
Download annotation files from https://pseudomonas.com
Examples
link <- system.file(
"extdata/Pseudomonas_aeruginosa_PAO1_107.csv.gz",
package = "tRavis"
)
tr_anno_cleaner(input_file = link)
#> # A tibble: 5,713 × 3
#>    locus_tag gene_name product_name
#>    <chr>     <chr>     <chr>
#>  1 PA0001    dnaA      chromosomal replication initiator protein DnaA
#>  2 PA0002    dnaN      DNA polymerase III, beta chain
#>  3 PA0003    recF      RecF protein
#>  4 PA0004    gyrB      DNA gyrase subunit B
#>  5 PA0005    lptA      lysophosphatidic acid acyltransferase, LptA
#>  6 PA0006    NA        conserved hypothetical protein
#>  7 PA0007    NA        hypothetical protein
#>  8 PA0008    glyS      glycyl-tRNA synthetase beta chain
#>  9 PA0009    glyQ      glycyl-tRNA synthetase alpha chain
#> 10 PA0010    tag       DNA-3-methyladenine glycosidase I
#> # ℹ 5,703 more rows
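
The optional arguments described above can be illustrated with a short sketch. The first part uses base R on two rows copied from the output above to show what filling missing gene names with the locus tag looks like; the ifelse() step is an assumption about the effect of fill_names, not the package's actual implementation. The second part calls tr_anno_cleaner() with both options, assuming fill_names takes TRUE in the same way as extra_cols, and only runs if tRavis is installed.

```r
# Two rows copied from the example output above; PA0006 has no gene name.
anno <- data.frame(
  locus_tag = c("PA0005", "PA0006"),
  gene_name = c("lptA", NA),
  product_name = c(
    "lysophosphatidic acid acyltransferase, LptA",
    "conserved hypothetical protein"
  ),
  stringsAsFactors = FALSE
)

# Assumed effect of fill_names: where gene_name is missing, use the locus tag.
filled <- anno
filled$gene_name <- ifelse(is.na(anno$gene_name), anno$locus_tag, anno$gene_name)
filled$gene_name

# With tRavis installed, request the extra columns and filled names together.
# fill_names = TRUE is an assumption based on the description above.
if (requireNamespace("tRavis", quietly = TRUE)) {
  link <- system.file(
    "extdata/Pseudomonas_aeruginosa_PAO1_107.csv.gz",
    package = "tRavis"
  )
  tRavis::tr_anno_cleaner(
    input_file = link,
    extra_cols = TRUE,
    fill_names = TRUE
  )
}
```

Per the description, the result of the second call should contain "start", "end", and "strand" in addition to the three default columns, with no missing values left in "gene_name".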