Classify contigs as Prophage-like, Sloping, HighCovNoPattern, and NoPattern
Source: R/TrIdentClassifier.R
TrIdentClassifier.Rd
Performs all pattern-matching and returns the results as a list of five items. The first item is a table of summary information for all contigs that passed through pattern-matching (i.e., were not filtered out). The second item is a table of summary information for all contigs that were classified via pattern-matching. The third item contains the pattern-match information associated with each contig in the second table. The fourth item is a table of the contigs that were filtered out prior to pattern-matching. The fifth item is the windowSize used for the search.
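Because the five items are described by position, one way to keep downstream code readable is to attach labels once and then refer to items by name. The sketch below is only an illustration: `labelTrIdentOutput` and the labels it assigns are hypothetical names invented here, not part of TrIdent, and the placeholder list merely mimics the five-item layout described above.

```r
# Hypothetical helper, NOT part of TrIdent: attaches made-up labels to the
# five positional items of a TrIdentClassifier() result.
labelTrIdentOutput <- function(results) {
  stopifnot(is.list(results), length(results) == 5)
  list(
    allContigs        = results[[1]], # contigs that passed through pattern-matching
    classifiedContigs = results[[2]], # contigs classified via pattern-matching
    patternMatches    = results[[3]], # pattern-match information per classified contig
    filteredOut       = results[[4]], # contigs filtered out before pattern-matching
    windowSize        = results[[5]]  # windowSize used for the search
  )
}

# Placeholder with the same five-item layout; the contents are dummies.
placeholder <- list(data.frame(), data.frame(), list(), data.frame(), 1000)
labelled <- labelTrIdentOutput(placeholder)
names(labelled)
```

With a real result, `labelTrIdentOutput(TrIdent_results)$filteredOut` would return the fourth item, assuming the list layout matches the description above.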
Usage
TrIdentClassifier(
VLPpileup,
WCpileup,
windowSize = 1000,
minBlockSize = 10000,
maxBlockSize = Inf,
minContigLength = 30000,
minSlope = 0.001,
suggFiltThresh = FALSE,
verbose = TRUE,
SaveFilesTo
)
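The limits documented under Arguments (allowed windowSize values, the minimum minBlockSize, and the minimum minContigLength) can be checked before starting a long run. The following is a minimal sketch of such a pre-check; `checkTrIdentArgs` is a hypothetical helper written for this example, and TrIdentClassifier may perform its own validation differently.

```r
# Hypothetical helper, NOT part of TrIdent: collects violations of the limits
# documented for TrIdentClassifier() arguments.
checkTrIdentArgs <- function(windowSize = 1000,
                             minBlockSize = 10000,
                             minContigLength = 30000) {
  problems <- character(0)
  # windowSize: only 100, 200, 500, or 1000 are documented as valid
  if (!windowSize %in% c(100, 200, 500, 1000)) {
    problems <- c(problems, "windowSize must be 100, 200, 500, or 1000")
  }
  # minBlockSize: documented minimum of 1000 bp
  if (minBlockSize < 1000) {
    problems <- c(problems, "minBlockSize must be at least 1000")
  }
  # minContigLength: documented minimum of 25000 bp
  if (minContigLength < 25000) {
    problems <- c(problems, "minContigLength must be at least 25000")
  }
  problems
}

# The defaults satisfy every documented limit, so no problems are returned.
checkTrIdentArgs()

# A window size outside the documented options is reported.
checkTrIdentArgs(windowSize = 300)
```

Returning a character vector rather than stopping at the first violation lets a caller report every problem at once.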
Arguments
- VLPpileup
VLP-fraction pileup file generated by mapping sequencing reads from a sample's ultra-purified VLP-fraction to the sample's whole-community metagenome assembly. The pileup file MUST have the following format:
* V1: Contig accession
* V2: Mapped read coverage values averaged over 100 bp windows
* V3: Starting position (bp) of each 100 bp window. Restarts from 0 at the start of each new contig.
* V4: Starting position (bp) of each 100 bp window. Does NOT restart at the start of each new contig.
- WCpileup
A whole-community pileup file generated by mapping sequencing reads from a sample's whole community to the sample's whole-community metagenome assembly. The pileup file MUST have the following format:
* V1: Contig accession
* V2: Mapped read coverage values averaged over 100 bp windows
* V3: Starting position (bp) of each 100 bp window. Restarts from 0 at the start of each new contig.
* V4: Starting position (bp) of each 100 bp window. Does NOT restart at the start of each new contig.
- windowSize
The number of basepairs to average read coverage values over. Options are 100, 200, 500, 1000 ONLY. Default is 1000.
- minBlockSize
The minimum size (in bp) of the Prophage-like block pattern. Default is 10000. Must be at least 1000.
- maxBlockSize
The maximum size (in bp) of the Prophage-like block pattern. Default is Inf (no maximum).
- minContigLength
The minimum contig size (in bp) to perform pattern-matching on. Must be at least 25000. Default is 30000.
- minSlope
The minimum slope value to test for sloping patterns. Default is 0.001 (i.e., a minimum change of 10x read coverage over 100,000 bp).
- suggFiltThresh
TRUE or FALSE. Suggest a filtering threshold for TrIdent classifications based on the normalized pattern-match scores. Default is FALSE.
- verbose
TRUE or FALSE. Print progress messages to console. Default is TRUE.
- SaveFilesTo
Optional. Path to the directory where output should be saved. A folder will be created within the provided directory to store results.
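Since both pileup inputs MUST follow the four-column format described for VLPpileup and WCpileup, it can help to check a pileup before passing it in. The sketch below is one possible check in base R; `checkPileupFormat` and the toy data frame are hypothetical and made up for illustration, and the check assumes that rows belonging to the same contig are stored contiguously.

```r
# Hypothetical helper, NOT part of TrIdent: checks a pileup data frame against
# the documented four-column layout (V1 to V4) with 100 bp windows.
checkPileupFormat <- function(pileup, binSize = 100) {
  if (!is.data.frame(pileup) || ncol(pileup) != 4) {
    return("pileup must be a data frame with four columns (V1 to V4)")
  }
  contig      <- as.character(pileup[[1]]) # V1: contig accession
  coverage    <- pileup[[2]]               # V2: mean coverage per window
  posInContig <- pileup[[3]]               # V3: window start, restarts per contig
  posGlobal   <- pileup[[4]]               # V4: window start, never restarts
  if (!is.numeric(coverage) || !is.numeric(posInContig) || !is.numeric(posGlobal)) {
    return("V2, V3, and V4 must be numeric")
  }
  problems <- character(0)
  # V3 should be 0 on the first row of every contig
  firstRow <- !duplicated(contig)
  if (any(posInContig[firstRow] != 0)) {
    problems <- c(problems, "V3 must restart from 0 at the start of each contig")
  }
  # Within a contig, consecutive V3 values should differ by one window
  sameContig <- contig[-1] == contig[-length(contig)]
  if (any(diff(posInContig)[sameContig] != binSize)) {
    problems <- c(problems, "V3 must advance by one window within a contig")
  }
  # V4 should never decrease, because it does not restart between contigs
  if (any(diff(posGlobal) < 0)) {
    problems <- c(problems, "V4 must not restart at a new contig")
  }
  problems
}

# Toy pileup invented for this example: two contigs with three windows each.
toy <- data.frame(
  V1 = rep(c("contig_1", "contig_2"), each = 3),
  V2 = c(5, 7, 6, 20, 22, 19),
  V3 = c(0, 100, 200, 0, 100, 200),
  V4 = c(0, 100, 200, 300, 400, 500)
)
checkPileupFormat(toy)
```

An empty character vector means no problem was found by these particular checks; it does not guarantee that TrIdentClassifier will accept the file.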
Examples
data("VLPFractionSamplePileup")
data("WholeCommunitySamplePileup")
TrIdent_results <- TrIdentClassifier(
VLPpileup = VLPFractionSamplePileup,
WCpileup = WholeCommunitySamplePileup
)
#> Reformatting pileup files
#> Starting pattern-matching...
#> A quarter of the way done with pattern-matching
#> Half of the way done with pattern-matching
#> Almost done with pattern-matching!
#> Determining sizes (bp) of pattern matches
#> Identifying highly active/abundant or heterogenously integrated
#> Prophage-like elements
#> Finalizing output
#> Execution time: 14.64secs
#> 1 contigs were filtered out based on low read coverage
#> 0 contigs were filtered out based on length
#>
#> HighCovNoPattern        NoPattern    Prophage-like          Sloping
#>                1                1                4                3
#> 3 of the prophage-like classifications are highly active or abundant
#> 1 of the prophage-like classifications are mixed, i.e. heterogenously
#> integrated into their bacterial host population