
Performs all the pattern-matching and summarizes the results into a list with five items:
* First: a table of summary information for all contigs that went through pattern-matching (i.e., were not filtered out).
* Second: a table of summary information for all contigs that were classified via pattern-matching.
* Third: the pattern-match information associated with each contig in the second table.
* Fourth: a table of the contigs that were filtered out before pattern-matching.
* Fifth: the windowSize used for the search.

Usage

TrIdentClassifier(
  VLPpileup,
  WCpileup,
  windowSize = 1000,
  minBlockSize = 10000,
  maxBlockSize = Inf,
  minContigLength = 30000,
  minSlope = 0.001,
  suggFiltThresh = FALSE,
  verbose = TRUE,
  SaveFilesTo
)

Arguments

VLPpileup

A VLP-fraction pileup file generated by mapping sequencing reads from a sample's ultra-purified VLP-fraction to the sample's whole-community metagenome assembly. The pileup file MUST have the following format:
* V1: Contig accession
* V2: Mapped read coverage values averaged over 100 bp windows
* V3: Starting position (bp) of each 100 bp window. Restarts from 0 at the start of each new contig.
* V4: Starting position (bp) of each 100 bp window. Does NOT restart at the start of each new contig.

WCpileup

A whole-community pileup file generated by mapping sequencing reads from a sample's whole-community to the sample's whole-community metagenome assembly. The pileup file MUST have the following format:
* V1: Contig accession
* V2: Mapped read coverage values averaged over 100 bp windows
* V3: Starting position (bp) of each 100 bp window. Restarts from 0 at the start of each new contig.
* V4: Starting position (bp) of each 100 bp window. Does NOT restart at the start of each new contig.
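As an illustration of the required pileup layout, the sketch below builds a small toy pileup by hand and checks it with `check_pileup_format()`, a hypothetical helper that is not part of TrIdent. It assumes V4 advances by exactly 100 bp from each row to the next across the whole file; the coverage values are made up.

```r
# Toy pileup in the documented V1-V4 layout (made-up values, not real reads).
# "contig_1" has three 100 bp windows and "contig_2" has two.
pileup <- data.frame(
  V1 = c("contig_1", "contig_1", "contig_1", "contig_2", "contig_2"),
  V2 = c(12.5, 13.0, 11.8, 40.2, 39.7),  # mean coverage per 100 bp window
  V3 = c(0, 100, 200, 0, 100),           # restarts at 0 for each contig
  V4 = c(0, 100, 200, 300, 400)          # running position, never restarts
)

# Hypothetical check (not a TrIdent function). Assumes V4 steps by 100 bp
# across the whole file.
check_pileup_format <- function(p) {
  stopifnot(identical(names(p)[1:4], c("V1", "V2", "V3", "V4")))
  # V3 must start at 0 and step by 100 within every contig
  v3_ok <- all(tapply(p$V3, p$V1, function(x) {
    x[1] == 0 && all(diff(x) == 100)
  }))
  # V4 must step by 100 across the whole file, ignoring contig boundaries
  v4_ok <- all(diff(p$V4) == 100)
  v3_ok && v4_ok
}

check_pileup_format(pileup)
#> [1] TRUE
```

A file whose V3 column does not restart at 0 for each contig (or whose V4 column does restart) fails this check.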

windowSize

The number of base pairs over which to average read coverage values. Options are 100, 200, 500, or 1000 ONLY. Default is 1000.
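A minimal sketch of what averaging over a larger windowSize means, assuming consecutive 100 bp coverage values from a single contig are grouped into blocks of windowSize / 100 and averaged. `rebin_coverage()` is a hypothetical helper, not TrIdent's internal code, and keeping a trailing partial block is an assumption.

```r
# Hypothetical helper: re-average one contig's 100 bp coverage values into
# coarser windows. Not TrIdent's internal implementation.
rebin_coverage <- function(cov100, windowSize = 1000) {
  stopifnot(windowSize %in% c(100, 200, 500, 1000))
  k <- windowSize / 100                    # 100 bp windows per new window
  group <- ceiling(seq_along(cov100) / k)  # block labels 1, 1, ..., 2, 2, ...
  # Mean of each block; a trailing partial block is averaged as-is (assumption)
  as.numeric(tapply(cov100, group, mean))
}

cov100 <- c(10, 12, 14, 16, 18, 20, 22, 24, 26, 28)  # ten 100 bp windows
rebin_coverage(cov100, windowSize = 200)
#> [1] 11 15 19 23 27
rebin_coverage(cov100, windowSize = 1000)
#> [1] 19
```

Larger windows smooth out local coverage noise at the cost of positional resolution.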

minBlockSize

The minimum size (in bp) of the Prophage-like block pattern. Default is 10000. Must be at least 1000.

maxBlockSize

The maximum size (in bp) of the Prophage-like block pattern. Default is Inf (no maximum).

minContigLength

The minimum contig size (in bp) to perform pattern-matching on. Must be at least 25000. Default is 30000.
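The documented limits on windowSize, minBlockSize, and minContigLength can be checked before calling TrIdentClassifier(). The hypothetical helper below, `check_trident_args()`, only mirrors those stated constraints; TrIdentClassifier() may validate its arguments differently.

```r
# Hypothetical pre-flight check (not a TrIdent function) mirroring the
# constraints stated in the argument descriptions.
check_trident_args <- function(windowSize = 1000, minBlockSize = 10000,
                               minContigLength = 30000) {
  problems <- character(0)
  if (!windowSize %in% c(100, 200, 500, 1000))
    problems <- c(problems, "windowSize must be 100, 200, 500 or 1000")
  if (minBlockSize < 1000)
    problems <- c(problems, "minBlockSize must be at least 1000")
  if (minContigLength < 25000)
    problems <- c(problems, "minContigLength must be at least 25000")
  problems  # empty character vector when every check passes
}

check_trident_args()  # the documented defaults pass
#> character(0)
check_trident_args(minBlockSize = 500, minContigLength = 20000)  # two problems
```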

minSlope

The minimum slope value to test for sloping patterns. Default is 0.001 (i.e., a minimum change of 10x read coverage over 100,000 bp).

suggFiltThresh

TRUE or FALSE. Suggest a filtering threshold for TrIdent classifications based on the normalized pattern-match scores. Default is FALSE.

verbose

TRUE or FALSE. Print progress messages to console. Default is TRUE.

SaveFilesTo

Optional. A path to the directory where output should be saved. A folder will be created within this directory to store the results.

Value

A large list containing the five objects described above.

Examples

data("VLPFractionSamplePileup")
data("WholeCommunitySamplePileup")

TrIdent_results <- TrIdentClassifier(
  VLPpileup = VLPFractionSamplePileup,
  WCpileup = WholeCommunitySamplePileup
)
#> Reformatting pileup files
#> Starting pattern-matching...
#> A quarter of the way done with pattern-matching
#> Half of the way done with pattern-matching
#> Almost done with pattern-matching!
#> Determining sizes (bp) of pattern matches
#> Identifying highly active/abundant or heterogenously integrated
#>       Prophage-like elements
#> Finalizing output
#> Execution time: 14.64secs
#> 1 contigs were filtered out based on low read coverage
#> 0 contigs were filtered out based on length
#> 
#> HighCovNoPattern        NoPattern    Prophage-like          Sloping 
#>                1                1                4                3 
#> 3 of the prophage-like classifications are highly active or abundant
#> 1 of the prophage-like classifications are mixed, i.e. heterogenously
#>         integrated into their bacterial host population