Torchlex: a method for real-time demultiplexing of barcoded Oxford Nanopore reads

Multiplexing (barcoding) is an efficient and cost-effective way to distribute high-throughput DNA sequencing capacity over multiple DNA samples. A unique barcode is first associated with each input DNA sample. The barcoded DNA libraries are then combined on the same flow cell and sequenced in parallel. The resulting reads must be demultiplexed, that is, grouped according to their attached barcode, so that further analysis can be performed per DNA sample. Current state-of-the-art ONT barcode demultiplexing tools (such as guppy) that operate directly on the DNA base-calls are computationally expensive, and their throughput is significantly lower than that of existing basecalling methods. As a result, they cannot be applied in real time to the stream of base-called DNA reads generated by the ONT device, which significantly limits real-time monitoring of, and decision-making about, the quality and quantity of reads per DNA sample.
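To make the grouping step concrete, the following is a minimal toy sketch of demultiplexing by barcode. It is not the algorithm used by Torchlex or guppy, neither of which is described here. It assumes that each read starts with its barcode and that a read is assigned to the barcode with the fewest mismatches, up to a hypothetical tolerance `max_mismatches`. The barcode sequences and sample names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical barcodes for illustration only; not real ONT barcode sequences.
BARCODES = {
    "sample_A": "ACGTACGT",
    "sample_B": "TGCATGCA",
    "sample_C": "GGAACCTT",
}


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(read: str, max_mismatches: int = 1) -> str:
    """Return the sample whose barcode best matches the read prefix.

    Reads with no barcode within `max_mismatches` are 'unclassified'.
    """
    best_sample, best_dist = "unclassified", max_mismatches + 1
    for sample, barcode in BARCODES.items():
        prefix = read[:len(barcode)]
        if len(prefix) < len(barcode):
            continue  # read is too short to contain this barcode
        dist = hamming(prefix, barcode)
        if dist < best_dist:
            best_sample, best_dist = sample, dist
    return best_sample


def demultiplex(reads):
    """Group reads by their assigned sample."""
    groups = defaultdict(list)
    for read in reads:
        groups[assign_barcode(read)].append(read)
    return dict(groups)


reads = [
    "ACGTACGTTTTTGGGG",  # exact match to sample_A
    "TGCATGGATTTTCCCC",  # sample_B barcode with one mismatch
    "CCCCCCCCAAAATTTT",  # no recognisable barcode
]
groups = demultiplex(reads)
```

Here `groups` maps each sample name (or `unclassified`) to its list of reads. Real nanopore reads contain insertions and deletions as well as substitutions, so production tools need more tolerant matching than this fixed-position comparison.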

We developed Torchlex (as part of Phivea, a platform for real-time detection of genetic disorders), a method for real-time demultiplexing of barcoded Oxford Nanopore reads. The proposed method significantly reduces the computational complexity of demultiplexing while preserving classification quality relative to competing methods. We compared its computational efficiency and predictive performance with the state-of-the-art demultiplexing method guppy on a next-generation sequencing run using 6 different DNA samples. The experimental validation was performed on 1,184,898 base-called DNA reads (sequence length: 900-1200) with a Phred quality score higher than 8 as the ground truth.

In terms of computational efficiency, the proposed method demultiplexed the base-called DNA reads an order of magnitude faster than guppy. The calculated throughput of Torchlex was ~1520 reads/s, while the calculated throughput of guppy was only ~138 reads/s. Torchlex also significantly reduced the proportion of unclassified reads (6.7%) in comparison to guppy (24%). In terms of classification performance, both methods showed very similar results. The precision and recall of Torchlex were 97.7% and 81.4% respectively, while guppy showed a precision of 97.8% and a recall of 81.3%. All experiments were performed on a single reference hardware configuration (Intel i7 10th generation, 8 cores, 32 GB RAM, no CUDA) using 10 parallel threads.
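As a quick check on the "order of magnitude" claim, the reported throughputs can be turned into a speedup factor and an approximate wall-clock time for the full validation set. This is a back-of-the-envelope sketch that assumes each tool sustains its reported throughput over all 1,184,898 reads, which the reported figures do not state explicitly.

```python
# Figures reported for the validation run.
TOTAL_READS = 1_184_898
TORCHLEX_READS_PER_S = 1520  # approximate
GUPPY_READS_PER_S = 138      # approximate

# Relative speedup of Torchlex over guppy.
speedup = TORCHLEX_READS_PER_S / GUPPY_READS_PER_S

# Estimated time to demultiplex the whole run, assuming constant throughput.
torchlex_minutes = TOTAL_READS / TORCHLEX_READS_PER_S / 60
guppy_minutes = TOTAL_READS / GUPPY_READS_PER_S / 60

print(f"speedup: {speedup:.1f}x")
print(f"Torchlex: {torchlex_minutes:.0f} min, guppy: {guppy_minutes:.0f} min")
```

Under that assumption the speedup is roughly 11x, corresponding to about 13 minutes for Torchlex versus about 143 minutes for guppy on this data set.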

Our solution demonstrated significantly better computational efficiency than the competing state-of-the-art methods. Its throughput exceeds what is required for real-time monitoring and analysis per DNA sample. Our technology was validated on single barcoding, but the results can be directly extrapolated to combinatorial barcoding, which can further reduce the analysis costs per DNA sample.
