Next-generation sequencing (NGS) enables the assessment of variants in numerous genes in a single assay. However, in low-frequency variant analysis (allele frequency below 1%), true variants are difficult to distinguish from background noise introduced during sample preparation, target enrichment and sequencing. By tagging both ends of each double-stranded DNA molecule and pairing the tags with an error-correction algorithm, molecular identifiers reduce the error rate and improve accuracy in low-frequency mutation analysis.
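The error-correction idea can be illustrated with a minimal Python sketch. It is not the vendor's algorithm: the input layout (`molecule_id`, `strand`, `sequence`), the `min_fraction` threshold and the function names are assumptions made for illustration, and reads are assumed to be pre-aligned, equal in length and oriented to the same reference strand.

```python
from collections import Counter, defaultdict


def consensus(seqs, min_fraction=0.6):
    """Per-position majority vote over equal-length reads.

    Positions where the top base falls below min_fraction become 'N'.
    """
    out = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count / len(column) >= min_fraction else "N")
    return "".join(out)


def duplex_consensus(reads):
    """Build duplex consensus sequences from tagged reads.

    reads: iterable of (molecule_id, strand, sequence), strand is 'top' or 'bottom'.
    Returns {molecule_id: duplex consensus}; molecules seen on one strand only
    are skipped because they have no duplex support.
    """
    families = defaultdict(lambda: {"top": [], "bottom": []})
    for molecule_id, strand, seq in reads:
        families[molecule_id][strand].append(seq)

    result = {}
    for molecule_id, fam in families.items():
        if not fam["top"] or not fam["bottom"]:
            continue
        top = consensus(fam["top"])        # single strand consensus, top
        bottom = consensus(fam["bottom"])  # single strand consensus, bottom
        # Keep a base only where both strand consensuses agree.
        result[molecule_id] = "".join(
            a if a == b else "N" for a, b in zip(top, bottom)
        )
    return result
```

A base change that appears on only one strand of a molecule (for example, damage or an early PCR error) is masked as `N` in the duplex consensus, whereas a true variant is present on both strands and survives.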
Library Preparation | Blocking | Target capture | Sequencing
NanOnCT Panel | Custom panels | MGISEQ
Variants at 1% (0.5%) and 0.3% (0.15%) were mimicked by spiking plasma cfDNA from one healthy donor into that of a second, unrelated donor at proportions of 1% and 0.3%. Mixtures of 10 ng and 25 ng cfDNA were used for BMI library preparation. Targeted SNPs were enriched with in-house panels and sequenced on DNBSEQ-G400 (PE100) (Fig. 1). Bi-molecular Identifiers (BMIs) allow the error-correcting algorithms to filter out false positive calls.
Fig 1. Low-frequency variant model.
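The arithmetic behind the spike-in model can be sketched as follows. Two assumptions here are not stated in the source: that the values in parentheses (0.5%, 0.15%) correspond to SNPs heterozygous in the spiked-in donor and absent from the background donor, and that one haploid human genome weighs roughly 3.3 pg. The function names are hypothetical.

```python
HAPLOID_GENOME_PG = 3.3  # assumed approximate mass of one haploid human genome (pg)


def expected_af(spike_fraction, donor_genotype):
    """Expected allele frequency of a donor-specific SNP in the mixture.

    Assumes the background donor is homozygous reference at the site.
    donor_genotype: 'hom' (two alternate alleles) or 'het' (one alternate allele).
    """
    alt_alleles = {"hom": 2, "het": 1}[donor_genotype]
    return spike_fraction * alt_alleles / 2


def expected_variant_copies(input_ng, allele_fraction):
    """Approximate number of variant-bearing haploid copies in the input."""
    genome_copies = input_ng * 1000 / HAPLOID_GENOME_PG  # ng -> pg -> copies
    return genome_copies * allele_fraction


for spike in (0.01, 0.003):
    for genotype in ("hom", "het"):
        af = expected_af(spike, genotype)
        copies = expected_variant_copies(10, af)
        print(f"spike={spike:.3f} {genotype}: AF={af:.4f}, copies in 10 ng={copies:.1f}")
```

Under these assumptions, 10 ng of cfDNA holds about 3,000 haploid genome copies, so a 0.3% variant is represented by roughly 9 molecules and a 0.15% variant by about 4 to 5. This is why few supporting consensus reads are expected at the lowest frequencies.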
Table 1. Consensus read filtering settings and requirements
Algorithms | Description
No BMI | Without bi-molecular identifier
SSCS | Single strand consensus sequence
DCS211 | Duplex consensus sequence with both top and bottom reads ≥ 1
DCS633 | Duplex consensus sequence with both top and bottom reads ≥ 3
DCS211 (≥ 2) | ≥ 2 DCS211 reads supporting the variant
DCS633 (≥ 2) | ≥ 2 DCS633 reads supporting the variant
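The duplex settings in Table 1 can be expressed as a small Python sketch. It implements only the per-strand thresholds given in the table; the class, function names and input layout are hypothetical and do not come from the vendor's pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DuplexFilter:
    name: str
    min_top: int     # minimum reads in the top-strand family
    min_bottom: int  # minimum reads in the bottom-strand family


DCS211 = DuplexFilter("DCS211", min_top=1, min_bottom=1)
DCS633 = DuplexFilter("DCS633", min_top=3, min_bottom=3)


def passes(filter_, top_reads, bottom_reads):
    """True if a duplex family meets the per-strand read requirements."""
    return top_reads >= filter_.min_top and bottom_reads >= filter_.min_bottom


def call_variant(families, filter_, min_supporting=1):
    """Decide whether a variant is called.

    families: list of (top_reads, bottom_reads), one entry per duplex family
    carrying the variant. min_supporting=2 corresponds to the '(≥ 2)' rows.
    """
    supporting = sum(1 for top, bottom in families if passes(filter_, top, bottom))
    return supporting >= min_supporting


# Hypothetical variant supported by three families.
families = [(1, 1), (4, 3), (2, 0)]
print(call_variant(families, DCS211))                    # DCS211
print(call_variant(families, DCS211, min_supporting=2))  # DCS211 (≥ 2)
print(call_variant(families, DCS633))                    # DCS633
print(call_variant(families, DCS633, min_supporting=2))  # DCS633 (≥ 2)
```

Stricter settings trade sensitivity for specificity: each step up in family size or in the number of supporting consensus reads removes more errors but also discards true variants that were sampled by only a few molecules.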
Fig 2. A. Sensitivity and B. positive predictive value for 1% and 0.5% variants; C. sensitivity and D. positive predictive value for 0.3% and 0.15% variants.
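For reference, the two metrics in Fig 2 follow the standard definitions: sensitivity = TP / (TP + FN) and positive predictive value = TP / (TP + FP). A minimal Python sketch, in which the function name and the example sites are hypothetical and not taken from this study:

```python
def evaluate(called, truth):
    """Return (sensitivity, ppv) for a set of called sites against a truth set."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # expected variants that were called
    fp = len(called - truth)   # calls not in the truth set
    fn = len(truth - called)   # expected variants that were missed
    sensitivity = tp / (tp + fn) if truth else float("nan")
    ppv = tp / (tp + fp) if called else float("nan")
    return sensitivity, ppv


# Hypothetical example: four expected SNP sites, three recovered, one false call.
truth = {("chr1", 100), ("chr2", 200), ("chr3", 300), ("chr4", 400)}
called = {("chr1", 100), ("chr2", 200), ("chr3", 300), ("chr5", 500)}
sens, ppv = evaluate(called, truth)
print(f"sensitivity={sens:.2f}, PPV={ppv:.2f}")
```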
Table 2. Analysis of low-frequency mutations by incorporating NanOnCT Panel v1.0 and molecular identifiers
Variants | MGISEQ-G400, SSCS | MGISEQ-G400, DCS211
EGFR_L858R | 0.75% | 0.82%
EGFR_T790M | 1.29% | 1.38%
EGFR_delE746_A750 | 0.93% | 1.04%
PIK3CA_E545K | 0.92% | 0.8%
KRAS_G12D | 0.57% | 0.71%
KRAS_A146T | 1.01% | 0.67%
NRAS_Q61K | 1.31% | 0.89%
EGFR_insV769_D770 | 1.37% | 1.3%
The DNA libraries were prepared from a 1% AF standard (GeneWell, GW-OCTM009) using the NadPrep cfDNA Library Preparation Kit. The enriched libraries were sequenced on DNBSEQ-G400 (PE100) to an average raw depth of ~30,000×.
SSCS: Single strand consensus sequence;
DCS211: Duplex consensus sequence.
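As a quick check against the nominal 1% AF of the input, the values in Table 2 can be summarised with a few lines of Python. The figures are copied from the table; the summary itself is an illustration and not part of the original analysis.

```python
from statistics import mean

# Observed allele frequencies (%) from Table 2: variant -> (SSCS, DCS211)
table2 = {
    "EGFR_L858R": (0.75, 0.82),
    "EGFR_T790M": (1.29, 1.38),
    "EGFR_delE746_A750": (0.93, 1.04),
    "PIK3CA_E545K": (0.92, 0.80),
    "KRAS_G12D": (0.57, 0.71),
    "KRAS_A146T": (1.01, 0.67),
    "NRAS_Q61K": (1.31, 0.89),
    "EGFR_insV769_D770": (1.37, 1.30),
}

sscs = [v[0] for v in table2.values()]
dcs211 = [v[1] for v in table2.values()]

print(f"SSCS:   mean={mean(sscs):.2f}%, range={min(sscs)}-{max(sscs)}%")
print(f"DCS211: mean={mean(dcs211):.2f}%, range={min(dcs211)}-{max(dcs211)}%")
```

Both methods detect all eight variants, with mean observed frequencies close to the nominal 1% (about 1.02% for SSCS and 0.95% for DCS211).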
Fig 3. Coverage of a cfDNA library with BMI. The DNA library was prepared from 10 ng plasma cfDNA using NadPrep cfDNA library kits for MGI with BMI adapters. The enriched library was sequenced on DNBSEQ-G400 (PE100) to an average raw depth of ~30,000×.
SSCS (< 3): Single strand consensus sequence, family size < 3;
SSCS (≥ 3): Single strand consensus sequence, family size ≥ 3;
DCS211: Duplex consensus sequences.