Given an input CSV or TSV annotation file from
https://pseudomonas.com, separates and cleans the data, returning a tidy
tibble with the following columns: "locus_tag", "gene_name", and
"product_name". Setting the extra_cols argument to TRUE will add the
columns "start", "end", and "strand". Enabling fill_names will populate
missing gene names with the corresponding locus tag.
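As a rough illustration of what fill_names does, the substitution can be sketched in base R on a toy table. This is not the function's actual implementation, only an assumed equivalent; the object name `anno` is hypothetical, and the rows are taken from the example output on this page.

```r
# Toy annotation table with two missing gene names (hypothetical object).
anno <- data.frame(
  locus_tag = c("PA0005", "PA0006", "PA0007"),
  gene_name = c("lptA", NA, NA),
  stringsAsFactors = FALSE
)

# Where gene_name is missing, fall back to the locus tag from the same row.
anno$gene_name <- ifelse(
  is.na(anno$gene_name),
  anno$locus_tag,
  anno$gene_name
)

print(anno)
```

After this step, every row has a usable identifier in "gene_name", which is convenient when the column is later used for labels or joins.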
References
Download annotation files from https://pseudomonas.com
Examples
tr_anno_cleaner(
  input_file = paste0(
    "https://pseudomonas.com/downloads/pseudomonas/pgd_r_22_1/",
    "Pseudomonas_aeruginosa_PAO1_107/Pseudomonas_aeruginosa_PAO1_107.csv.gz"
  )
)
#> # A tibble: 5,713 × 3
#>    locus_tag gene_name product_name
#>    <chr>     <chr>     <chr>
#>  1 PA0001    dnaA      chromosomal replication initiator protein DnaA
#>  2 PA0002    dnaN      DNA polymerase III, beta chain
#>  3 PA0003    recF      RecF protein
#>  4 PA0004    gyrB      DNA gyrase subunit B
#>  5 PA0005    lptA      lysophosphatidic acid acyltransferase, LptA
#>  6 PA0006    NA        conserved hypothetical protein
#>  7 PA0007    NA        hypothetical protein
#>  8 PA0008    glyS      glycyl-tRNA synthetase beta chain
#>  9 PA0009    glyQ      glycyl-tRNA synthetase alpha chain
#> 10 PA0010    tag       DNA-3-methyladenine glycosidase I
#> # ℹ 5,703 more rows
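
A call with both optional arguments enabled might look like the sketch below. The argument names extra_cols and fill_names come from the description above; passing fill_names = TRUE assumes it is a logical flag like extra_cols. The call needs the package that provides tr_anno_cleaner to be attached and network access to the download URL, and the variable names `pao1_url` and `pao1_anno` are hypothetical.

```r
# Same PAO1 annotation file as in the example above.
pao1_url <- paste0(
  "https://pseudomonas.com/downloads/pseudomonas/pgd_r_22_1/",
  "Pseudomonas_aeruginosa_PAO1_107/Pseudomonas_aeruginosa_PAO1_107.csv.gz"
)

# Request the "start", "end", and "strand" columns, and replace missing
# gene names with locus tags (assumes fill_names accepts TRUE/FALSE).
pao1_anno <- tr_anno_cleaner(
  input_file = pao1_url,
  extra_cols = TRUE,
  fill_names = TRUE
)
```

Per the description, the result should carry six columns instead of three, with no missing values left in "gene_name".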