All I want for Christmas is a terabase of Nanopore data

Since November we have run 11 PromethION flow cells, generating >900 Gbp of data. It has been a journey of highs and lows, so we want to share what we have learned along the way.

The total yield by read length of the 10 flow cells we have currently processed.

Our first lesson: use those flow cells ASAP! Despite our lab’s earlier success with a three-month-old PromethION flow cell, three flow cells from our first shipment died after five weeks of storage in the fridge, and two had problems with leaking priming solution. After Oxford Nanopore sent a replacement box we were back on track to meet our end-of-year sequencing goal.

A close up of one of our two leaky flow cells.

The second lesson: flow cell QC can be a bit hit and miss. We found that estimates of the number of active pores varied by up to 3000 between repeated QCs, even when they were run within 20 minutes of each other. For example, one flow cell had just over 5000 active pores on arrival, but when QC’d before sequencing had 7795 (YES!), and then following the first MUX scan had 5106 active pores (NO!) at the start of sequencing. Our final flow cell, which is currently running, had a count of 5613 on arrival, 4339 two weeks later, 5030 in the check an hour before the run, and 7513 active pores at the start of sequencing! We concluded that a best of three (or five) approach to QC is appropriate.
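To make the "best of three (or five)" idea concrete, here is a minimal Python sketch that summarises repeated active-pore counts for one flow cell. The function name `summarise_qc` is our own invention for illustration, and reading "best of" as "take the highest count" is an assumption; the median is reported alongside it as a more conservative figure. The input is the four counts quoted above for our final flow cell.

```python
from statistics import median


def summarise_qc(counts):
    """Summarise repeated active-pore counts for a single flow cell.

    `counts` is a list of active-pore numbers from repeated checks.
    "best" is simply the maximum (our reading of "best of three").
    """
    return {
        "n_checks": len(counts),
        "best": max(counts),
        "median": median(counts),
        "spread": max(counts) - min(counts),  # highest minus lowest count
    }


# Counts quoted in this post for the final flow cell:
# on arrival, two weeks later, an hour before the run, start of sequencing.
final_flow_cell = [5613, 4339, 5030, 7513]
summary = summarise_qc(final_flow_cell)
print(summary)
```

The spread between the lowest and highest count is a quick way to see how noisy the checks were for a given flow cell.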

Our quality DNA was size selected using the Nanopore Community SPRI protocol (adapted from the Schalamun and Schwessinger protocol), which noticeably improved the DNA profile by removing most fragments <1.5 kb. We produced 11 barcoded pools, containing three samples each, using the one-pot ligation and native barcoding protocol adapted for the SQK-LSK109 PromethION library preparation kit.

TapeStation electropherogram comparisons of extracted DNA before (orange) and after (blue) size selection. Input concentrations were slightly different, but the SPRI protocol resulted in a clear improvement in the DNA profile for all samples.

The number of active pores at the start of sequencing was a good indicator of how much data we would get. Our best flow cell, with 8370 active pores, generated a massive 144 Gbp of data!

One software hiccup caused a run that was in progress to stop when we started the QC for another flow cell. We are not sure what caused this, and it hasn’t happened since.

Martin Hjorth Andersen and Caitlin Singleton sequencing the first of 11 pools.

After basecalling using Albacore and demultiplexing using Porechop, we learned our third lesson: not all barcodes are created equal. While looking at our first eight flow cells we found that two barcodes consistently performed poorly compared to the others in their pools. Barcode 3 and Barcode 7 accounted for only 3-5% and 13-19% of the data yield, respectively, on three different flow cells. We had attempted to produce equimolar pools, aiming for an ideal yield of 33% of the data for each barcoded sample. Consequently, we needed to do a few additional runs to catch up on some samples (to meet our aim of >20 Gbp per sample).
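The check we did on barcode balance can be sketched in a few lines of Python. This is only an illustration: the function names, the 50%-of-expected cut-off used to flag a barcode, and the per-barcode yields in `example_pool` are all hypothetical and are not our real numbers. Only the 20 Gbp per-sample target and the three-sample pool size come from the text above.

```python
def barcode_shares(yields_gbp):
    """Return each barcode's fraction of the pool's total yield."""
    total = sum(yields_gbp.values())
    return {bc: y / total for bc, y in yields_gbp.items()}


def underperformers(yields_gbp, tolerance=0.5):
    """Flag barcodes with less than `tolerance` times their equimolar share.

    For a pool of three, the equimolar share is 1/3 of the data.
    The 0.5 default is an arbitrary, hypothetical cut-off.
    """
    expected = 1 / len(yields_gbp)
    shares = barcode_shares(yields_gbp)
    return sorted(bc for bc, s in shares.items() if s < expected * tolerance)


def top_up_needed(yields_gbp, target_gbp=20.0):
    """Return the extra Gbp each sample needs to reach the per-sample target."""
    return {bc: max(0.0, target_gbp - y) for bc, y in yields_gbp.items()}


# Hypothetical yields (Gbp) for one pool of three barcoded samples.
example_pool = {"barcode01": 48.0, "barcode02": 48.0, "barcode03": 4.0}

shares = barcode_shares(example_pool)
flagged = underperformers(example_pool)
top_up = top_up_needed(example_pool)
print(shares, flagged, top_up)
```

In this made-up pool the third barcode gets 4% of the data, so it is flagged and would need another 16 Gbp from a catch-up run to reach the target.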

The data distribution for the pools of three barcodes for the first eight runs. Barcode 3 was particularly bad in our samples, with Barcode 7 also having possible problems.

Across all the runs so far we have a data yield of between 57.55 Gbp and 144.54 Gbp per flow cell, with a mean read length of 5145 to 6616 bp and a median of 3527 to 5586 bp. We have one more pool currently sequencing (fingers crossed for 95 Gbp!), then the machine will have a well-deserved holiday.

Total data yield over time (left) and mean quality score over time (right) for the 10 flow cells examined so far. Each colour represents a flow cell. The brown line with the sharp bend indicates the flow cell that stopped running due to the QC of the next. Run date is indicated in the key.

Run overview figures were created using MinIONQC (Lanfear et al., 2018).

References:

Lanfear R, Schalamun M, Kainer D, Wang W, Schwessinger B (2018). MinIONQC: fast and simple quality control for MinION sequencing data. Bioinformatics, bty654. https://doi.org/10.1093/bioinformatics/bty654


Caitlin Singleton

Postdoc in the Per H Nielsen Lab at Aalborg University. Guest blogger of the Albertsen Lab. Microbe and sequencing enthusiast.

Latest posts by Caitlin Singleton:

All I want for Christmas is a terabase of Nanopore data - December 21, 2018

2 Comments

Iris Gracia

March 22, 2019 at 12:55 pm

Hello!

Thank you for sharing your experience! It provides valuable information!
I wanted to ask you: how do you store the flow cell between the QC on arrival and the one done just before starting the sequencing?

Thank you very much!

Best,
Iris
Caitlin

March 25, 2019 at 7:35 am

Hello Iris,

They were stored flat in the fridge at 4 °C, in the same packets and boxes that they arrived in. Good luck with your sequencing project! 🙂

Kind regards,
Caitlin
