MiXCR parameters (non-barcoded data)
name | parameters | comments |
Align | -f, -g, --noMerge, -p = kaligner2, --species = hsa, -OreadsLayout = Collinear, -OvParameters.geneFeatureToAlign = VTranscript, -OallowPartialAlignments = true |
Assemble | -f, -OassemblingFeatures = FR1Begin:FR4Begin | Since sequences are cropped at the end of CDR3, the FR4 region is not present in the final sequences. We selected this value because running MiXCR with the seemingly more appropriate value FR1Begin:FR4End results in unstable behavior and often produces an empty repertoire. |
Export clones | -f, --no-spaces, -sequence, -count, -readIds |
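As an illustration only, the MiXCR settings listed above could be assembled into command lines as in the following sketch. The subcommand names (align, assemble, exportClones), the order of the file arguments, the `-Okey=value` spelling, and all file names are assumptions that are not stated in the table; the exact syntax of the feature range may also differ between MiXCR versions. Nothing is executed; the commands are only printed.

```python
import shlex


def mixcr_commands(reads, prefix):
    """Return the three MiXCR invocations as argument lists.

    `reads` and `prefix` are hypothetical file names. The intermediate
    file extensions (.vdjca, .clns) are assumed, not taken from the table.
    """
    vdjca = f"{prefix}.vdjca"
    clns = f"{prefix}.clns"
    txt = f"{prefix}.txt"

    align = [
        "mixcr", "align", "-f", "-g", "--noMerge",
        "-p", "kaligner2",
        "--species", "hsa",
        "-OreadsLayout=Collinear",
        "-OvParameters.geneFeatureToAlign=VTranscript",
        "-OallowPartialAlignments=true",
        reads, vdjca,
    ]
    assemble = [
        "mixcr", "assemble", "-f",
        "-OassemblingFeatures=FR1Begin:FR4Begin",
        vdjca, clns,
    ]
    export = [
        "mixcr", "exportClones", "-f", "--no-spaces",
        "-sequence", "-count", "-readIds",
        clns, txt,
    ]
    return [align, assemble, export]


# Print the commands instead of running them.
for cmd in mixcr_commands("reads.fastq", "sample"):
    print(shlex.join(cmd))
```

Keeping the commands as argument lists makes it straightforward to pass them to `subprocess.run` once the paths and the installed MiXCR version have been checked.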
pRESTO parameters (non-barcoded data)
name | parameters | comments |
CollapseSeq | Default parameters | Although this stage can use information about primers, we do not use it since we want to conduct primer-independent benchmarking. This stage can also fix unspecified nucleotides (“N”s), but we do not use this feature either, since this is addressed at the preliminary alignment step. |
SplitSeq | Default parameters | This stage uses a threshold parameter (--num=X) that is analogous to the one in IgReC (discussed in Section 2.2 of the main text). In our experiments, this parameter is not fixed; estimating its optimal value is part of the benchmarking. |
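Because the SplitSeq threshold is varied rather than fixed, one way to picture the benchmarking is as a sweep over candidate values of X. The sketch below only enumerates the corresponding command lines and does not run them. The `group` subcommand, the `DUPCOUNT` field, the `--outname` option, the candidate thresholds, and the file names are assumptions for illustration and are not taken from the table.

```python
def splitseq_sweep(collapsed_file, thresholds):
    """Build one SplitSeq command per candidate threshold X (not executed).

    Returns a dict mapping each threshold to its argument list.
    """
    commands = {}
    for x in thresholds:
        commands[x] = [
            "SplitSeq.py", "group",
            "-s", collapsed_file,
            "-f", "DUPCOUNT",               # assumed abundance annotation from CollapseSeq
            "--num", str(x),                # the threshold X from the table
            "--outname", f"split_num{x}",   # hypothetical output prefix
        ]
    return commands


# Hypothetical input file and candidate thresholds.
for x, cmd in splitseq_sweep("collapsed.fasta", [1, 2, 3, 5, 10]).items():
    print(x, " ".join(cmd))
```

Each resulting repertoire would then be scored with the benchmarking metrics, and the best-scoring threshold reported.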
pRESTO parameters (barcoded data)
name | parameters |
ClusterSets | Default parameters |
BuildConsensus | --prcons 0.6 --maxerror 0.1 --maxgap 0.5 |
CollapseSeq | --uf PRCONS --cf CONSCOUNT --act sum |
Table A1. Benchmarking parameters of MiXCR (top) and pRESTO (middle) on non-barcoded datasets and pRESTO (bottom) on barcoded datasets.
For all tools, we unified the read merging, alignment and filtering by using the IgReC preprocessing.
After this preprocessing, all input libraries contain Ig-relevant reads that are cropped at the start of the corresponding V gene.
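For the barcoded datasets, the three pRESTO stages of Table A1 (bottom) could be chained roughly as sketched below. Only the flags listed in the table are used. Options that the tools may additionally require (for example a subcommand for ClusterSets in newer pRESTO versions, or barcode and primer field names for BuildConsensus) are not given in the table and are therefore omitted. The intermediate file names are hypothetical, and the commands are printed rather than executed.

```python
def presto_barcoded_commands(reads, prefix):
    """Return the barcoded pRESTO pipeline as a list of argument lists.

    `reads` and `prefix` are hypothetical; the intermediate file names are
    placeholders and may not match the names pRESTO actually writes.
    """
    clustered = f"{prefix}_cluster.fastq"     # hypothetical ClusterSets output
    consensus = f"{prefix}_consensus.fastq"   # hypothetical BuildConsensus output

    return [
        # Default parameters, as stated in the table.
        ["ClusterSets.py", "-s", reads],
        ["BuildConsensus.py", "-s", clustered,
         "--prcons", "0.6", "--maxerror", "0.1", "--maxgap", "0.5"],
        ["CollapseSeq.py", "-s", consensus,
         "--uf", "PRCONS", "--cf", "CONSCOUNT", "--act", "sum"],
    ]


for cmd in presto_barcoded_commands("barcoded_reads.fastq", "sample"):
    print(" ".join(cmd))
```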