Performs read coverage pattern-matching and summarizes the results in a list. The first list item is a table summarizing the pattern-matching results. The second list item is the 'cleaned' version of the summary table, with all 'noPattern' classifications removed. The third list item contains the pattern-match information needed for visualization with `plotProActiveResults()`. The fourth list item is a table of all contigs that were filtered out prior to pattern-matching. The fifth list item contains the arguments used during pattern-matching (windowSize, mode, chunkSize, chunkContigs). If the user provides a gffTSV file, the list has one more item at the end: a table of ORFs found within the detected gaps and elevations in read coverage.
Usage
ProActive(
pileup,
mode,
gffTSV,
windowSize = 1000,
chunkContigs = FALSE,
minSize = 10000,
maxSize = Inf,
minContigLength = 30000,
chunkSize = 1e+05,
IncludeNoPatterns = FALSE,
verbose = TRUE,
saveFilesTo
)
Arguments
- pileup
A .txt file containing mapped sequencing read coverages averaged over 100 bp windows/bins.
- mode
Either "genome" or "metagenome".
- gffTSV
Optional. A .gff file (TSV) containing gene predictions associated with the .fasta file used to generate the pileup.
- windowSize
The number of base pairs to average read coverage values over. Must be one of 100, 200, 500, or 1000. Default is 1000.
- chunkContigs
TRUE or FALSE. If TRUE and `mode` = "metagenome", contigs longer than `chunkSize` will be 'chunked' into smaller subsets and pattern-matching will be performed on each subset. Default is FALSE.
- minSize
The minimum size (in bp) of elevation or gap patterns. Default is 10000.
- maxSize
The maximum size (in bp) of elevation or gap patterns. Default is Inf (i.e. no maximum).
- minContigLength
The minimum contig/chunk size (in bp) to perform pattern-matching on. Default is 30000.
- chunkSize
If `mode` = "genome", or if `mode` = "metagenome" and `chunkContigs` = TRUE, the genome or contigs, respectively, are chunked into smaller subsets for pattern-matching. `chunkSize` determines the size (in bp) of each 'chunk'. Default is 100000.
- IncludeNoPatterns
TRUE or FALSE. If TRUE, the noPattern pattern-matches will be included in the ProActive PatternMatches output list. Set this to TRUE if you would like to visualize the noPattern pattern-matches with `plotProActiveResults()`. Default is FALSE.
- verbose
TRUE or FALSE. Print progress messages to console. Default is TRUE.
- saveFilesTo
Optional. A path to the directory you wish to save output to. A folder will be made within the provided directory to store results.
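As a rough illustration of how `windowSize` and `chunkSize` could act on a pileup, the sketch below averages 100 bp coverage bins into larger windows and splits a contig into chunk ranges. The helpers `rebin_coverage()` and `chunk_ranges()` are hypothetical and are not part of ProActive; the package's internal implementation may differ.

```r
# Hypothetical helper (an assumption, not ProActive code):
# average coverage recorded in 100 bp bins over larger windows.
rebin_coverage <- function(coverage, windowSize = 1000, binSize = 100) {
  # Only the documented window sizes are accepted
  stopifnot(windowSize %in% c(100, 200, 500, 1000))
  binsPerWindow <- windowSize / binSize
  # Assign each bin to a window index, then average within each window
  windowIndex <- ceiling(seq_along(coverage) / binsPerWindow)
  as.numeric(tapply(coverage, windowIndex, mean))
}

# Hypothetical helper (an assumption, not ProActive code):
# split a contig of a given length (in bp) into chunks of at most chunkSize.
chunk_ranges <- function(contigLength, chunkSize = 1e5) {
  starts <- seq(1, contigLength, by = chunkSize)
  ends <- pmin(starts + chunkSize - 1, contigLength)
  data.frame(start = starts, end = ends)
}

# Toy data: 20 bins of 100 bp (2000 bp total)
coverage <- c(rep(10, 10), rep(50, 10))
rebinned <- rebin_coverage(coverage, windowSize = 1000)

# A 250000 bp contig split with the default chunkSize of 100000
chunks <- chunk_ranges(250000, chunkSize = 1e5)
print(rebinned)
print(chunks)
```

In this sketch the final chunk is shorter than `chunkSize`. Whether such a chunk is analyzed would then depend on a length filter like `minContigLength`.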
Examples
metagenome_results <- ProActive(
pileup = sampleMetagenomePileup,
mode = "metagenome",
gffTSV = sampleMetagenomegffTSV
)
#> Preparing input file for pattern-matching...
#> Starting pattern-matching...
#> A quarter of the way done with pattern-matching
#> Half of the way done with pattern-matching
#> Almost done with pattern-matching!
#> Summarizing pattern-matching results
#> Finding gene predictions in elevated or gapped regions of read coverage...
#> Finalizing output
#> Execution time: 2.27secs
#> 0 contigs were filtered out based on low read coverage
#> 0 contigs were filtered out based on length (< minContigLength)
#>
#> Elevation       Gap NoPattern
#>         3         3         1
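The classification counts printed at the end of the output can be thought of as a tabulation of a summary table. The sketch below uses a mock table with hypothetical column names (`refName`, `classification`), not ProActive's actual output, to show that tabulation and how a 'cleaned' table without NoPattern rows could be derived.

```r
# Mock summary table. The column names and contig names are hypothetical
# and only mimic the counts shown in the example output.
summaryTable <- data.frame(
  refName = paste0("contig_", 1:7),
  classification = c("Elevation", "Gap", "Elevation", "NoPattern",
                     "Gap", "Gap", "Elevation")
)

# Count how many contigs fall into each classification
classCounts <- table(summaryTable$classification)
print(classCounts)

# 'Cleaned' version: drop rows classified as NoPattern
cleanedSummary <- summaryTable[summaryTable$classification != "NoPattern", ]
print(cleanedSummary)
```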