TrIdent is a reference-independent bioinformatics tool that automates the analysis of transductomics data by detecting, classifying, and characterizing potential transducing events. Transductomics is a DNA sequencing-based method for the detection and characterization of transduction events. Developed by Kleiner et al. (2020), transductomics relies on mapping reads from the virome (the virus-like particle, or VLP, fraction) of a sample to contigs assembled from the metagenome (whole community) of the same sample. Reads from bacterial DNA carried by viruses and other VLPs map back to their bacterial contigs of origin, creating read coverage patterns indicative of potential ongoing transduction.
Reference: Kleiner, M., Bushnell, B., Sanderson, K.E. et al. Transductomics: sequencing-based detection and analysis of transduced DNA in pure cultures and microbial communities. Microbiome 8, 158 (2020). https://doi.org/10.1186/s40168-020-00935-5
TrIdent consists of three main functions to automatically detect, classify, and characterize potential transducing events:
TrIdent_Classifier(): Classifies contigs as ‘Prophage-like’, ‘Sloping’, ‘HighCoverageNoPattern’, or ‘InsufficientCoverage’
Plot_TrIdentPatternMatches(): Plots the results of TrIdent_Classifier()
SpecializedTransduction_ID(): Detects potential specialized transduction on contigs classified as Prophage-like
Running TrIdent in default mode is the easiest option, but this tutorial also shows how to use various arguments to modify TrIdent’s results.
The datasets used in this tutorial, ‘VLPFraction_sampledata’ and ‘WholeCommunity_sampledata’, were generated from a conventional mouse fecal metagenome. The homogenized feces represent the whole community. The VLP-fraction of the fecal sample was separated and purified via CsCl density gradient ultracentrifugation. Both the whole-community and VLP-fraction were sequenced with Illumina (paired-end mode, 150 bp reads), after which the metagenome was assembled from the whole-community reads. The whole-community and VLP-fraction raw reads were mapped to the metagenome contigs using BBMap.
The resulting .BAM files were sorted and indexed using the Samtools sort and index functions, respectively. Finally, two pileup files were generated to summarize the respective read coverages across each contig. The contigs were pre-filtered to remove contigs shorter than 30 kbp. Note: Transductomics has specific sequencing requirements! Sample preparation and sequencing procedures are detailed in Kleiner et al. (2020). A subset of 10 contigs from the mouse fecal metagenome was selected for the sample dataset used in this tutorial.
The two pileup files were generated using BBMap with ambiguous=random, qtrim=lr, minid=0.97, and binsize=100. The binsize/windowsize must be 100!
We recommend using the commands below (with your own sorted .bam files) to generate the pileup files needed for TrIdent:
pileup.sh in=VLPFraction_ReadMappingSorted.bam out=VLPFraction.pileupcovstats bincov=VLPFraction.bincov100 binsize=100 stdev=t
pileup.sh in=WholeCommunity_ReadMappingSorted.bam out=WholeCommunity.pileupcovstats bincov=WholeCommunity.bincov100 binsize=100 stdev=t
Load the package into your library. If you don’t already have TrIdent, you can find installation instructions here
library(TrIdent)
Import your pileup files:
Note: The ‘VLPFraction_sampledata’ and ‘WholeCommunity_sampledata’ pileup files needed for the tutorial come preloaded with the TrIdent package. There is no need to load or import these files.
Here is what the raw pileup file should look like: (note that the information in the first column may be formatted differently depending on your contig accession format)
## V1 V2 V3 V4
## NODE_4 length_493049_cov_5.62057_ID_9556231 2.62 100 1938832
## NODE_4 length_493049_cov_5.62057_ID_9556231 6.94 200 1938932
## NODE_4 length_493049_cov_5.62057_ID_9556231 6.39 300 1939032
## NODE_4 length_493049_cov_5.62057_ID_9556231 5.98 400 1939132
## NODE_4 length_493049_cov_5.62057_ID_9556231 8.12 500 1939232
## NODE_4 length_493049_cov_5.62057_ID_9556231 5.02 600 1939332
The ‘VLPFraction.bincov100’ and ‘WholeCommunity.bincov100’ files generated by BBMap’s pileup.sh in the example above can be used directly as input to TrIdent. TrIdent has built-in data cleaning and reformatting for files output specifically by BBMap’s pileup.sh. If you do not use BBMap’s pileup.sh to generate the pileup files, then you are responsible for data cleaning and reformatting. Your pileup files must be in the following format (same column names, same column classes, etc.) and you must use the cleanup=FALSE argument for TrIdent_Classifier(), Plot_TrIdentPatternMatches(), and SpecializedTransduction_ID().
## ref_name coverage position
## NODE_4 2.62 100
## NODE_4 6.94 200
## NODE_4 6.39 300
## NODE_4 5.98 400
## NODE_4 8.12 500
## NODE_4 5.02 600
The CleanVLPFraction_sampledata dataframe comes preloaded with TrIdent and provides an example of a cleaned and reformatted pileup file. Please use this dataframe as an example if you are doing your own data cleaning and reformatting.
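If your coverage table comes from a tool other than pileup.sh, the reformatting might look something like the sketch below. The input data frame `my_cov` and its column names (`contig`, `bin_end`, `mean_cov`) are hypothetical; only the target column names (`ref_name`, `coverage`, `position`) come from the format shown above. The column classes used here are an assumption, so compare your result against CleanVLPFraction_sampledata (for example with `str()`) before running TrIdent.

```r
# Hypothetical coverage table from a non-BBMap tool (100 bp bins).
my_cov <- data.frame(
  contig   = c("NODE_4", "NODE_4", "NODE_4"),
  bin_end  = c(100, 200, 300),
  mean_cov = c(2.62, 6.94, 6.39),
  stringsAsFactors = FALSE
)

# Rename and reorder the columns to match the layout shown above.
# The classes chosen here (character, numeric, numeric) are an assumption.
clean_cov <- data.frame(
  ref_name = as.character(my_cov$contig),
  coverage = as.numeric(my_cov$mean_cov),
  position = as.numeric(my_cov$bin_end),
  stringsAsFactors = FALSE
)

# Sanity checks: expected column names and a constant 100 bp step per contig.
stopifnot(identical(names(clean_cov), c("ref_name", "coverage", "position")))
bin_steps <- unlist(lapply(split(clean_cov$position, clean_cov$ref_name), diff))
stopifnot(all(bin_steps == 100))
```

The two `stopifnot()` checks are cheap to keep in a preprocessing script: they catch a wrong bin size or a misnamed column before any pattern-matching starts.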
TrIdent_Classifier():
TrIdent_Classifier() is the main function that TrIdent relies on. This function cleans and reformats your input data, filters contigs based on length and read coverage, performs pattern-matching to classify contigs, identifies active/highly abundant and heterogeneously integrated prophage-like elements, determines which contigs have high VLP-fraction:whole-community read coverage ratios, identifies start and stop positions and sizes of pattern matches, calculates slopes for Sloping pattern matches, generates a pattern-match quality score, and outputs all information in a neat summary table.
TrIdent_Classifier() features:
1. Contig filtering:
Contigs are filtered out based on short length or low read coverage. TrIdent filters out contigs that do not have at least 10x coverage on a total of 5,000 bp across the whole contig. With 100 bp bins, a contig whose 50th greatest coverage value is less than 10 has no set of bins totaling 5,000 bp with read coverage of at least 10. The low read coverage filtering is done this way to avoid filtering out long contigs with short Prophage-like patterns that might be removed if filtering were based on averages or medians. Additionally, contigs shorter than 30 kbp are filtered out by default; this can be changed with the mincontiglength parameter. Contigs shorter than 30 kbp may be of poor quality and are not long enough to show clear transduction patterns. If you would like to speed up TrIdent’s processing time, consider pre-filtering your assembly for contigs longer than 30 kbp!
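The coverage rule above can be illustrated with a few lines of base R. This is a minimal sketch, not TrIdent's internal code: the function name `has_sufficient_coverage()` is hypothetical, and it assumes the check is applied to 100 bp bins so that 50 bins correspond to 5,000 bp.

```r
# Returns TRUE when a contig has at least `min_bins` 100 bp bins with
# coverage >= `min_cov` (50 bins x 100 bp = 5,000 bp). Hypothetical helper.
has_sufficient_coverage <- function(coverage, min_cov = 10, min_bins = 50) {
  if (length(coverage) < min_bins) return(FALSE)
  nth_greatest <- sort(coverage, decreasing = TRUE)[min_bins]
  nth_greatest >= min_cov
}

# Long contig, mostly low coverage, with a 6 kbp high-coverage block.
contig_with_block <- c(rep(1, 400), rep(40, 60), rep(1, 400))
# Contig with uniformly low coverage.
contig_low <- rep(3, 860)

has_sufficient_coverage(contig_with_block)  # TRUE
has_sufficient_coverage(contig_low)         # FALSE
mean(contig_with_block) >= 10               # FALSE: a mean-based filter would drop it
```

The last line shows why the 50th-greatest-value rule is used: the contig with a short block has a mean coverage below 10 and would be lost under a mean-based filter, even though the block itself is well covered.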
2. Pattern-matching:
Contigs that are not filtered out proceed to pattern-matching where they are matched with a variety of patterns representing transduction events. Patterns are ‘built’ and the x and y-axis values are scaled specific to the characteristics of each contig to ensure the pattern-matching is data agnostic. After a pattern is built, it is translated across the contig being assessed, and the mean absolute difference in coverage (match-score) between the contig and the pattern is calculated at each translation. Theoretically, if a pattern is a perfect match to the coverages on a contig, then taking the mean absolute difference in y-axis values will result in a 0. Obviously, no pattern will be a perfect match to a contig, but the closer to 0 the match-score is, the better that pattern matches the read coverage pattern on the contig. The contig is classified based on the pattern that achieves the lowest match-score.
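To make the match-score idea concrete, here is a toy sketch in base R. It is only an illustration of "translate a pattern and take the mean absolute difference"; the function `match_score()`, the fixed block width, and the toy coverage vector are all assumptions, and TrIdent's own patterns vary in height and width as well.

```r
# Mean absolute difference between observed coverage and a pattern.
match_score <- function(coverage, pattern) mean(abs(coverage - pattern))

# Toy contig: 100 windows, baseline coverage 5, block of 50 in windows 41-60.
coverage <- rep(5, 100)
coverage[41:60] <- 50

# Build a block pattern of fixed width and translate it across the contig,
# recording the match score at each translation.
block_width <- 20
starts <- 1:(length(coverage) - block_width + 1)
scores <- sapply(starts, function(s) {
  pattern <- rep(min(coverage), length(coverage))
  pattern[s:(s + block_width - 1)] <- max(coverage)
  match_score(coverage, pattern)
})

best_start <- starts[which.min(scores)]
best_start   # 41
min(scores)  # 0
```

Because the toy data are noise-free, the best translation reproduces the coverage exactly and the score is 0; with real data the minimum is greater than 0 and is compared across pattern classes.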
There are four pattern variations in the Sloping class. The sloping pattern is representative of large transfers of bacterial DNA, which take place during generalized, lateral, and gene transfer agent transduction. Other unknown mechanisms of DNA transfer may also be responsible for sloping patterns. The sloping read coverage is due to the decreasing frequency of DNA packaging moving away from the packaging initiation sites. All tested patterns are adapted by the software to the length of the contig being assessed. The peak of the slope is set to start slightly above the contig’s maximum coverage value and the base of the slope at the contig’s minimum coverage value. Different slopes are tested by both increasing the minimum value and decreasing the maximum value until a minimum slope of 0.00015 (a change in read coverage of 15 over 100,000 bp) is reached. Generalized, lateral, and gene transfer agent transduction events can span many kbp of DNA, and a single contig typically does not capture the entire event. Depending on which part of the transducing event is captured by the contig, the slope can be very steep or close to 0. Patterns 1 and 2 represent contigs that capture a Sloping transducing event somewhere in the middle of the pattern. Patterns 3 and 4 represent contigs that capture the packaging initiation of a Sloping transduction event. Patterns 3 and 4 are translated across the contig in addition to having their slopes changed, while only the slopes are changed on patterns 1 and 2.
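The slope arithmetic can be checked with a short sketch. The helper `pattern_slope()`, the step size of 4 coverage units, and the toy contig are hypothetical choices made for illustration; for simplicity the peak starts at the contig's maximum rather than slightly above it. Only the minimum slope of 0.00015 comes from the text.

```r
# Slope of a straight-line pattern, in coverage units per bp.
pattern_slope <- function(cov_start, cov_end, length_bp) {
  (cov_end - cov_start) / length_bp
}

min_slope <- 0.00015

# The worked example from the text: a change of 15 over 100,000 bp.
pattern_slope(15, 0, 100000)   # -0.00015

# Candidate patterns for a 100 kbp contig with coverage between 2 and 42:
# flatten the line step by step by lowering the peak and raising the base.
contig_length <- 100000
cov_max <- 42
cov_min <- 2
steps <- 0:4
candidates <- data.frame(
  peak = cov_max - 4 * steps,
  base = cov_min + 4 * steps
)
candidates$slope <- pattern_slope(candidates$peak, candidates$base, contig_length)

# Keep only candidates at least as steep as the minimum slope.
candidates <- candidates[abs(candidates$slope) >= min_slope, ]
candidates
```

In this toy run the last candidate (peak 26, base 18) has a slope of only 0.00008 in absolute value and is discarded, leaving four candidate slopes to test.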
There are three patterns in the Prophage-like class. The block pattern is representative of reads from integrated genetic elements, like prophages or phage-inducible chromosomal islands (PICIs), mapping back to their respective integration sites in the host bacterium’s chromosome. The block patterns are built based on the length of the contig being assessed. The top of the block starts at the contig’s maximum coverage value while the base starts at the contig’s minimum coverage value. The block width starts close to the length of the contig; however, this can be changed with the user-defined variable maxblocksize. While most prophages tend to be in the 30-60 kbp range, some are much larger. By default, we do not set an upper limit for the Prophage-like pattern because the upper limit of prophage sizes is not known. The block heights are decreased, followed by the block widths, to ensure that a variety of block height/width combinations are tested. By default, the block widths never get smaller than 10 kbp, as it can be difficult to distinguish between prophage-like elements and other mobile genetic elements, like transposons, in that size range. However, the minimum block width is a user-defined variable (minblocksize) and can be changed if smaller mobile genetic elements are of interest. Pattern 1 represents a prophage-like element that is entirely on the contig, while patterns 2 and 3 represent a prophage-like element that trails off the right or left side of the contig, respectively. Each pattern variation is translated across the contig being assessed.
There is one pattern in the 'No pattern' class. We use the 'No pattern' pattern to generate a baseline match score, which we compare to the match scores generated with the Sloping or Prophage-like patterns. A contig without a Prophage-like or Sloping pattern is likely to have fairly even read coverage across the contig (i.e. no pattern), which the ‘No pattern’ pattern tries to replicate. If a contig has a Prophage-like or Sloping pattern, its match score to the 'No pattern' pattern will likely be higher (i.e. worse) than the match score to its true pattern match. The pattern is built to the length of the contig being assessed and is a horizontal line at the mean read coverage of the contig.
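A small toy comparison shows how the 'No pattern' baseline behaves on a contig that does contain a block. The sketch assumes the match score is the mean absolute difference described in the pattern-matching section; `match_score()` and the toy coverage vector are illustrative, not TrIdent internals.

```r
# Mean absolute difference between observed coverage and a pattern.
match_score <- function(coverage, pattern) mean(abs(coverage - pattern))

# Toy contig with a clear block: baseline 5, block of 50 in windows 41-60.
coverage <- rep(5, 100)
coverage[41:60] <- 50

# 'No pattern' baseline: a horizontal line at the contig's mean coverage.
no_pattern <- rep(mean(coverage), length(coverage))

# A block pattern that matches the elevated region.
block_pattern <- rep(min(coverage), length(coverage))
block_pattern[41:60] <- max(coverage)

baseline_score <- match_score(coverage, no_pattern)   # 14.4
block_score <- match_score(coverage, block_pattern)   # 0
```

Since lower scores indicate better matches, the block pattern wins here and the contig would be classified by the block rather than by the flat baseline.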
3. Identifying highly active/abundant OR heterogeneously present prophage-like elements:
A prophage or prophage-like genetic element that is actively replicating or exists in high abundance will generate more reads than its respective host bacterium. This may create a region of elevated read coverage at the element's insertion site that can be visualized in the whole-community fraction contigs. Conversely, if a prophage-like genetic element is only integrated into a portion of the host bacterial population, the read coverage at the insertion site will drop in comparison to the read coverage neighboring the insertion site. Since TrIdent locates prophage-like elements as part of its pattern-matching functionality, we can use these genetic 'coordinates' to see if the associated prophage-like region has elevated or decreased read coverage in the whole-community fraction. Contigs where the prophage-like:non-prophage-like mean read coverage ratio is greater than 1.3 are labeled highly active/abundant, whereas contigs with a ratio less than 0.5 are labeled as not homogeneously integrated into the host population ('mixed'). If the non-prophage-like region is less than 20,000 bp, the contig is labeled ‘CBD’ (Can’t Be Determined), as it is difficult to determine whether the prophage-like region is truly elevated or depressed when there is so little non-prophage-like region to compare to.
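The decision rule described above can be written out as a short function. This is a sketch under stated assumptions: `classify_elevation()` is a hypothetical helper, coverage is assumed to be in 1,000 bp windows, and returning `NA` for the ratio in the CBD case is a choice made here for illustration. The thresholds (1.3, 0.5, 20,000 bp) and the labels are the ones given in the text and the summary table.

```r
# Classify a prophage-like region from whole-community coverage.
# start_idx/stop_idx are the window indices of the prophage-like match.
classify_elevation <- function(wc_cov, start_idx, stop_idx, window_bp = 1000) {
  idx <- seq_along(wc_cov)
  inside <- idx >= start_idx & idx <= stop_idx
  outside_bp <- sum(!inside) * window_bp
  # Too little flanking sequence to compare against.
  if (outside_bp < 20000) return(list(label = "CBD", ratio = NA_real_))
  ratio <- mean(wc_cov[inside]) / mean(wc_cov[!inside])
  label <- if (ratio > 1.3) "YES" else if (ratio < 0.5) "MIXED" else "NO"
  list(label = label, ratio = ratio)
}

wc_cov <- rep(10, 100)          # 100 kbp contig in 1,000 bp windows
wc_cov[41:60] <- 15             # elevated coverage over the prophage-like region
classify_elevation(wc_cov, 41, 60)$label   # "YES" (ratio 1.5)

wc_cov[41:60] <- 4              # depressed coverage
classify_elevation(wc_cov, 41, 60)$label   # "MIXED" (ratio 0.4)

classify_elevation(rep(10, 30), 6, 25)$label  # "CBD": only 10 kbp outside
```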
4. Identifying contigs with high VLP-fraction:whole-community read coverages:
Contigs that are classified as having 'No pattern' via pattern-matching are assessed to see if they have high VLP-fraction:whole-community median read coverage ratios. A contig with no pattern match that has an unusually high amount of bacterial DNA in the VLP-fraction may represent the ‘tail’ of a sloping pattern formed by a Sloping event, unknown transduction pathways, or contamination from other sources, e.g. very small, very dense cells that were co-purified with VLPs. As mentioned in the pattern-matching section for Sloping transduction events above, depending on which part of the sloping read coverage pattern is captured by a contig, the slope can vary from being very steep to almost non-existent. Typically the ‘tails’ of sloping events have little to no slope, but still represent transduction events. To differentiate contigs classified as having 'No pattern' that represent real transduction events from those that represent no transduction, we use the median read coverage ratio between the VLP-fraction and whole-community metagenome. The idea is that contigs with a high amount of VLP-fraction read coverage relative to the whole-community metagenome read coverage may represent real transfer events rather than just contaminating bacterial DNA. If the VLP-fraction has a median read coverage greater than 50% of the median read coverage in the whole-community metagenome, then the contig is classified as having high VLP-fraction read coverage but no distinct read coverage pattern (HighCoverageNoPattern). It is up to the user to decide whether to include or exclude these classifications in their assessment.
NOTE: The HighCoverageNoPattern phenomenon is strongly affected by how many reads are sequenced for the whole-community versus the VLP-fraction. For example, if you sequence many more reads for the VLP-fraction than for the whole-community, the ratio may be less meaningful, as it increases automatically with more reads sequenced for the VLP-fraction. As such, one needs to be careful in the interpretation of these ratios.
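The ratio and its sensitivity to sequencing depth can be illustrated with toy numbers. The helper `vlp_wc_ratio()` and the coverage vectors are made up for this sketch; only the 50% threshold comes from the text.

```r
# Ratio of median VLP-fraction coverage to median whole-community coverage.
vlp_wc_ratio <- function(vlp_cov, wc_cov) median(vlp_cov) / median(wc_cov)

wc_cov <- c(20, 22, 19, 21, 20, 18)   # median 20
vlp_hi <- c(14, 15, 13, 16, 15, 14)   # median 14.5 -> ratio 0.725
vlp_lo <- c(2, 3, 2, 1, 2, 3)         # median 2    -> ratio 0.1

vlp_wc_ratio(vlp_hi, wc_cov) > 0.5   # TRUE: would be flagged HighCoverageNoPattern
vlp_wc_ratio(vlp_lo, wc_cov) > 0.5   # FALSE

# Sequencing the VLP-fraction twice as deeply doubles the ratio:
vlp_wc_ratio(2 * vlp_lo, wc_cov)     # 0.2
```

The last call mirrors the caution in the note above: the ratio scales directly with VLP-fraction sequencing depth, so the same biological sample can land on either side of the threshold depending on how many reads were sequenced per fraction.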
TrIdent_Classifier() user parameters:
VLP_pileup
WC_pileup
windowsize
minblocksize
maxblocksize
mincontiglength
SaveFilesTo
cleanup
VLP_pileup
A dataframe containing contig names, coverages averaged over 100 bp windows, and contig positions associated with mapping VLP-fraction reads to whole-community contigs.
WC_pileup
A dataframe containing contig names, coverages averaged over 100 bp windows, and contig positions associated with mapping whole-community reads to whole-community contigs.
windowsize
TrIdent resizes the bins or ‘windows’ used by pileup.sh to improve processing time and reduce noise in the data. Resizing is done by averaging the read coverages across the specified windowsize. TrIdent_Classifier() resizes windows to 1000 bp by default. Depending on the dataset, the user may want to select a different windowsize. Users can choose between windowsizes of 200, 500, 1000, or 2000 ONLY. We recommend increasing the windowsize to 2000 if processing speed is important or if the data is noisy (i.e. VLP-fraction contaminated with external bacterial DNA). We recommend decreasing the windowsize if your data is very clean and/or small and you are interested in increasing the resolution of read coverage patterns for the initial classification of contigs. Note that decreasing the windowsize will increase TrIdent’s processing time!
Increasing/decreasing the windowsize may alter the results of TrIdent_Classifier() slightly. Prophage-like classifications tend to stay the same when windowsize is changed, but Sloping and HighCoverageNoPattern classifications may switch classes with each other. This is due to how averaging the read coverages affects the sloping pattern.
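Window resizing by averaging can be sketched in base R as follows. `resize_windows()` is a hypothetical helper, and labeling each new window by its end position is an assumption made here; TrIdent's own implementation may differ in both respects.

```r
# Average 100 bp bins into larger windows. Each new window is labeled by
# its end position (an assumption for this sketch).
resize_windows <- function(coverage, position, windowsize = 1000) {
  window_end <- ceiling(position / windowsize) * windowsize
  out <- aggregate(coverage, by = list(position = window_end), FUN = mean)
  names(out) <- c("position", "coverage")
  out
}

# 2,000 bp of coverage in 100 bp bins: first kbp at 10x, second kbp at 30x.
position <- seq(100, 2000, by = 100)
coverage <- c(rep(10, 10), rep(30, 10))

resize_windows(coverage, position, windowsize = 1000)  # 2 windows: 10 and 30
resize_windows(coverage, position, windowsize = 500)   # 4 windows: 10, 10, 30, 30
```

Larger windows mean fewer points per contig, which is where the speed-up comes from, at the cost of smoothing away features narrower than the window.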
minblocksize
The minimum size of prophage-like patterns. The default is 10,000 bp.
maxblocksize
The maximum size of prophage-like patterns. The default is undefined/NA (i.e. no max size).
Be aware that changing the minblocksize and maxblocksize will not necessarily remove contigs with prophage-like patterns smaller/larger than the defined parameters from the resulting classifications. Contigs with prophage-like patterns larger/smaller than the maxblocksize and minblocksize may still be classified; however, the classifications and associated pattern matches may be poor. For example, if a contig has a clear prophage-like pattern that’s ~50,000 bp but the user sets maxblocksize=40000, TrIdent will likely still classify the contig as prophage-like, as the maximum block-like pattern of 40,000 bp will still achieve a lower pattern-match score than any of the Sloping or 'No pattern' patterns.
mincontiglength
The minimum contig length processed for pattern-matching. Contigs shorter than the mincontiglength will be filtered out prior to pattern-matching. The default is 30,000 bp.
SaveFilesTo
A file path to which TrIdent saves outputs. This is useful if using TrIdent in a command-line environment. By default, files are NOT saved to an output folder.
cleanup
If TRUE, TrIdent will clean and reformat the input pileup files to ensure they are in the correct format for pattern-matching. If FALSE, users are responsible for putting their input files into the correct format (specified above). TRUE by default.
To run TrIdent_Classifier() in default mode, run the following:
TrIdent_results <- TrIdent_Classifier(VLP_pileup=VLPFraction_sampledata, WC_pileup=WholeCommunity_sampledata)
## Starting pattern-matching...
## A quarter of the way done with pattern_matching
## Half of the way done with pattern_matching
## Almost done with pattern_matching!
## Identifying potential transducing events
## Determining sizes (bp) of potential transduction events
## Identifying highly active/abundant or heterogenously integrated prophage-like elements
## Finalizing output
## Execuion time: 23.9757790565491
## 1 contigs were filtered out based on low read coverage
## 0 contigs were filtered out based on length
##
## Sloping HighCoverageNoPattern InsufficientCoverage Prophage-like
## 2 2 1 4
## 2 of the prophage-like classifications are highly active or abundant
## 1 of the prophage-like classifications are 'mixed', i.e. heterogenously integrated into their bacterial host population
##
TrIdent_Classifier() outputs a histogram containing the distribution of normalized pattern-match scores for your dataset. The normalized pattern-match score is the pattern-match score for a specific classification divided by the contig's average read coverage. This normalized match score is an indicator of a pattern match's quality, i.e. how well the TrIdent pattern fits the associated contig's read coverage profile. Smaller match scores, and normalized match scores, indicate better pattern matches. The histogram can be used to filter the resulting TrIdent classifications by quality of pattern matches. A suggested filtering threshold is marked on the plot with a vertical line; however, the suggested threshold tends to be stringent, and quality pattern matches may be filtered out if the upper threshold boundary is not explored. For this reason, we encourage users to initially test a filtering value slightly greater than the suggested threshold.
For the test dataset, the resulting histogram looks like this:
However, for a 'real' dataset, the histogram looks like more of a normal distribution:
The output of TrIdent_Classifier() is a list containing five objects. Save the desired list item to a new variable using its associated name:
TrIdent_summary_table <- TrIdent_results$Full_summary_table
ref_name | classifications | NormMatchScore | match_size | start_pos | stop_pos | active_prophage | elevation_ratio | slope |
---|---|---|---|---|---|---|---|---|
NODE_25 | InsufficientCoverage | 0.20711667 | NA | NA | NA | NA | NA | NA |
NODE_44 | Prophage-like | 0.19117212 | 65000 | 151000 | 216000 | YES | 1.3657 | NA |
NODE_62 | Prophage-like | 0.14338316 | 171000 | 63000 | 234000 | YES | 1.4700 | NA |
NODE_125 | Prophage-like | 0.23903635 | 31000 | 153000 | 184000 | NO | 1.1175 | NA |
NODE_238 | Sloping | 0.14225027 | 146000 | 1000 | 147000 | NA | NA | -0.2060 |
NODE_251 | Sloping | 0.11734790 | 144000 | 1000 | 145000 | NA | NA | 0.0524 |
NODE_368 | Prophage-like | 0.16050720 | 28000 | 27000 | 55000 | MIXED | 0.3986 | NA |
NODE_560 | HighCoverageNoPattern | 0.07094737 | NA | NA | NA | NA | NA | NA |
NODE_1088 | HighCoverageNoPattern | 0.08964437 | NA | NA | NA | NA | NA | NA |
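If you want to explore a match-score threshold directly on the summary table, a base R subset is enough. The sketch below rebuilds part of the table above by hand so that it runs on its own; in practice you would use `TrIdent_results$Full_summary_table`. The threshold of 0.2 is hypothetical and should be chosen from your own histogram.

```r
# Subset of the summary table shown above (values copied from the tutorial output).
summary_table <- data.frame(
  ref_name = c("NODE_44", "NODE_62", "NODE_125", "NODE_238", "NODE_251", "NODE_368"),
  classifications = c("Prophage-like", "Prophage-like", "Prophage-like",
                      "Sloping", "Sloping", "Prophage-like"),
  NormMatchScore = c(0.19117212, 0.14338316, 0.23903635,
                     0.14225027, 0.11734790, 0.16050720),
  stringsAsFactors = FALSE
)

# Hypothetical threshold; pick yours from the histogram.
threshold <- 0.2

# Keep classifications whose normalized match score is at or below the threshold.
filtered <- summary_table[summary_table$NormMatchScore <= threshold, ]
filtered$ref_name
```

With this threshold only NODE_125 (normalized match score 0.239) is removed, which makes it easy to inspect that contig's plot and decide whether the threshold is too strict.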
filteredout_contigs <- TrIdent_results$FilteredOutContig_table
filteredout_contigs | reason |
---|---|
NODE_4 | Low VLP-fraction read cov |
Plot_TrIdentPatternMatches():
Plot_TrIdentPatternMatches() will output a list of read coverage plots of all contigs predicted as either Sloping, Prophage-like, or HighCoverageNoPattern, along with their respective pattern matches.
Plot_TrIdentPatternMatches() user parameters:
VLP_pileup
WC_pileup
transductionclassifications
MatchScoreFilter
SaveFilesTo
cleanup
VLP_pileup
A dataframe containing contig names, coverages averaged over 100 bp windows, and contig positions associated with mapping VLP-fraction reads to whole-community contigs.
WC_pileup
A dataframe containing contig names, coverages averaged over 100 bp windows, and contig positions associated with mapping whole-community reads to whole-community contigs.
transductionclassifications
The output of TrIdent_Classifier().
MatchScoreFilter
Used to filter the TrIdent_Classifier() classifications by the quality of their respective pattern matches. Choose the filtering threshold using the histogram of normalized pattern-match scores output by TrIdent_Classifier(). There is no filter set by default.
SaveFilesTo
A file path to which TrIdent saves outputs. Each plot is saved as an individual png file named by the associated contig reference name. This is useful if using TrIdent in a command-line environment. By default, files are NOT saved to an output folder.
cleanup
If TRUE, TrIdent will clean and reformat the input pileup files to ensure they are in the correct format for pattern-matching. If FALSE, users are responsible for putting their input files into the correct format (specified above). TRUE by default.
To plot the TrIdent_Classifier() results in default mode, run the following:
TrIdent_patternmatches <- Plot_TrIdentPatternMatches(VLP_pileup=VLPFraction_sampledata, WC_pileup=WholeCommunity_sampledata, transductionclassifications=TrIdent_results)
View either all plots at once or one plot at a time. All of the output plots can be saved as individual ggplot objects for further manipulation by the user. Each plot is named by its respective contig accession. View all plots:
TrIdent_patternmatches
View one plot:
TrIdent_patternmatches$NODE_62
SpecializedTransduction_ID():
Specialized transduction occurs when a prophage-like element excises improperly from the host bacterium’s chromosome and accidentally packages a small portion of bacterial DNA directly outside the prophage-like region. SpecializedTransduction_ID() searches contigs classified as Prophage-like for dense read coverage outside the borders of the Prophage-like pattern that could represent specialized transduction. Because specialized transduction tends to be fairly short (several kbp) compared to Sloping transduction (tens to hundreds of kbp), averaging over a 1000 bp distance (i.e. using windowsize=1000) can ‘blur’ specialized transduction patterns depending on their size. This is why specialized transduction is not identified in TrIdent_Classifier(). Instead, we use the locations of prophage-like elements identified with TrIdent_Classifier() to guide our search for specialized transduction in SpecializedTransduction_ID().
SpecializedTransduction_ID() does not resize the windows of the input pileup files, in order to preserve the resolution of potential specialized transduction patterns. Because of this, we cannot use the locations of the Prophage-like pattern matches to determine the exact border locations of prophage-like elements. The locations generated with TrIdent_Classifier() using windowsize=1000 (or one of the other options) will not perfectly translate back to a windowsize of 100. Instead, we use the locations of the Prophage-like pattern matches to ‘zoom in’ on the region of a contig where an associated Prophage-like pattern match is located. SpecializedTransduction_ID() then searches the contig, starting from the left and moving inward, for the first coverage value that is at least 20% of the maximum coverage value in the defined region. This represents the left ‘border’. The search is repeated starting from the right side of the contig and moving inward, and the first coverage value that is at least 20% of the maximum value represents the right ‘border’. For contigs with a Prophage-like match that trails off the side of the contig, only the border that falls on the contig is searched for.
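The border search described above can be sketched in a few lines. `find_borders()` is a hypothetical helper operating on an already zoomed-in region in 100 bp bins; it handles only the case where both borders fall on the contig, and the toy coverage values are made up.

```r
# Find the left and right borders of a prophage-like region: the first
# coverage value, scanning inward from each side, that reaches 20% of the
# region's maximum coverage.
find_borders <- function(coverage, position, frac = 0.2) {
  hits <- which(coverage >= frac * max(coverage))
  c(left = position[min(hits)], right = position[max(hits)])
}

# Zoomed-in region in 100 bp bins: low flanks around an elevated block.
position <- seq(100, 3000, by = 100)
coverage <- c(rep(1, 8), 3, 12, rep(50, 12), 11, 2, rep(1, 6))

find_borders(coverage, position)
# left = 1000, right = 2300
```

Taking the first and last index that clear the 20% threshold is equivalent to scanning inward from each end, and it keeps the low-coverage flanks (here 1 to 3x) outside the borders so they can be examined for specialized transduction.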
Once the prophage-like borders are identified, SpecializedTransduction_ID() starts from the borders and searches outwards for dense read coverage that meets the ‘requirements’ for specialized transduction as defined by the arguments in SpecializedTransduction_ID().
SpecializedTransduction_ID() uses two arguments to define specialized transduction:
noreadcov
spectranslength
SpecializedTransduction_ID() first makes sure that any coverage it detects outside the borders is not disrupted by a defined region of no read coverage (noreadcov). The default value for noreadcov is 500 bp. Secondly, SpecializedTransduction_ID() ensures that any read coverage it detects outside of the prophage/PICI borders meets a minimum length requirement (spectranslength). The default value for spectranslength is 2000 bp. So by default, SpecializedTransduction_ID() will search for coverage immediately outside the left and right prophage/PICI boundaries that is at least 2000 bp long and is not interrupted at any point by more than 500 bp of no read coverage. If these requirements are met, SpecializedTransduction_ID() will mark the contig as having specialized transduction. We suggest using the default values for initial usage of SpecializedTransduction_ID() and only changing the noreadcov and spectranslength arguments when adapting the specialized transduction search for your specific dataset.
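How the two arguments interact can be illustrated with a toy outward scan. This is a sketch, not the package's implementation: both helper functions are hypothetical, coverage is assumed to be in 100 bp bins ordered from the border outward, and counting tolerated gaps toward the total length is an assumption made here.

```r
# Length (bp) of read coverage extending outward from a border, stopping at
# the first gap of zero coverage longer than `noreadcov` bp.
coverage_run_length <- function(outward_cov, noreadcov = 500, bin_bp = 100) {
  max_gap_bins <- noreadcov / bin_bp
  gap <- 0
  last_covered <- 0
  for (i in seq_along(outward_cov)) {
    if (outward_cov[i] > 0) {
      gap <- 0
      last_covered <- i
    } else {
      gap <- gap + 1
      if (gap > max_gap_bins) break
    }
  }
  last_covered * bin_bp
}

# TRUE when the covered stretch meets the minimum length requirement.
has_spec_transduction <- function(outward_cov, noreadcov = 500,
                                  spectranslength = 2000) {
  coverage_run_length(outward_cov, noreadcov) >= spectranslength
}

# 2,700 bp of coverage containing a tolerated 300 bp gap, then nothing.
right_flank <- c(rep(4, 10), rep(0, 3), rep(2, 14), rep(0, 20))
coverage_run_length(right_flank)     # 2700
has_spec_transduction(right_flank)   # TRUE

# Only 800 bp of coverage before a long gap.
short_flank <- c(rep(4, 8), rep(0, 20), rep(3, 30))
has_spec_transduction(short_flank)   # FALSE
```

Raising `noreadcov` lets the scan bridge longer gaps, while raising `spectranslength` demands a longer covered stretch before a contig is flagged, so the two arguments trade sensitivity against specificity in opposite directions.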
Search all contigs classified as Prophage-like for specialized transduction:
Specialized_transduction <- SpecializedTransduction_ID(VLP_pileup=VLPFraction_sampledata, transductionclassifications=TrIdent_results, noreadcov=500, spectranslength=2000, cleanup=TRUE)
## 2 contigs have potential specialized transduction
When you search all contigs, the output of SpecializedTransduction_ID() will be a list. The first object contains a summary table for the specialized transduction search results:
SpecializedTransduction_summary_table <- Specialized_transduction$Summary_table
ref_name | Specialized_transduction | Left | Right | Length_left | Length_right |
---|---|---|---|---|---|
NODE_44 | yes | yes | no | 2700 | NA |
NODE_62 | yes | yes | no | 45300 | NA |
NODE_125 | no | no | no | NA | NA |
NODE_368 | no | no | no | NA | NA |
The second object in the output list contains another list with the resulting log10 read coverage plots for all contigs classified as Prophage-like. The coverages are put in log scale to help users visualize specialized transduction patterns, as they are sometimes too low in frequency to be seen with raw coverages alone. Additionally, the plots are ‘zoomed in’ on the Prophage-like pattern to further aid with specialized transduction visualization. The borders of the prophage/PICI as identified by SpecializedTransduction_ID() are marked on each plot with a black vertical line. If SpecializedTransduction_ID() identifies potential specialized transduction, it will color the plot green, whereas if it does not identify specialized transduction, it will color the plot blue. The end of specialized transduction as determined by SpecializedTransduction_ID() will be marked with a red vertical line. Each plot is named by the associated contig accession and can be saved as a ggplot object for further manipulation by the user.
View all the plots:
Specialized_transduction$Plots
View a specific plot:
Specialized_transduction$Plots$NODE_44
If desired, the user can also search a single contig for specialized transduction by specifying the contig’s reference name with the specificcontig parameter:
Specialized_transduction_NODE44 <- SpecializedTransduction_ID(VLP_pileup=VLPFraction_sampledata, transductionclassifications=TrIdent_results, specificcontig="NODE_44", noreadcov=500, spectranslength=2000, cleanup=TRUE)
If you’d like to combine the summary tables produced by TrIdent_Classifier() and SpecializedTransduction_ID(), try the following code:
Final_TrIdentSummaryTable <- merge(TrIdent_summary_table, SpecializedTransduction_summary_table, by="ref_name", all.x=TRUE)
ref_name | classifications | NormMatchScore | match_size | start_pos | stop_pos | active_prophage | elevation_ratio | slope | Specialized_transduction | Left | Right | Length_left | Length_right |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NODE_1088 | HighCoverageNoPattern | 0.08964437 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
NODE_125 | Prophage-like | 0.23903635 | 31000 | 153000 | 184000 | NO | 1.1175 | NA | no | no | no | NA | NA |
NODE_238 | Sloping | 0.14225027 | 146000 | 1000 | 147000 | NA | NA | -0.2060 | NA | NA | NA | NA | NA |
NODE_25 | InsufficientCoverage | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
NODE_251 | Sloping | 0.11734790 | 144000 | 1000 | 145000 | NA | NA | 0.0524 | NA | NA | NA | NA | NA |
NODE_368 | Prophage-like | 0.16050720 | 28000 | 27000 | 55000 | MIXED | 0.3986 | NA | no | no | no | NA | NA |
NODE_44 | Prophage-like | 0.19117212 | 65000 | 151000 | 216000 | YES | 1.3657 | NA | yes | yes | no | 2700 | NA |
NODE_560 | HighCoverageNoPattern | 0.07094737 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
NODE_62 | Prophage-like | 0.14338316 | 171000 | 63000 | 234000 | YES | 1.4700 | NA | yes | yes | no | 45300 | NA |