Following our experiences with DNA sequencing on the MinION since 2014, as part of the MinION (early) Access Programme (MAP) and the developer programme, we applied for a spot on the PromethION Early Access Programme (PEAP) back in May 2015. The MinION was the mind-blowing DNA sequencer that lets you do long-read DNA sequencing (with no fixed limit on read length) by plugging it into a laptop! It was an absolutely amazing piece of tech, but the initial throughput was not enough for our aim of retrieving complete (and closed) genomes from all the abundant organisms in complex samples such as wastewater treatment systems. The PromethION promised to solve this lack of throughput by having 48 times as many flowcells, each with 6 times as many pores.
While we were waiting for the PromethION, we used the MinION frequently. Our first try at a metagenome sample was a simple two-species culture, where we used the long reads to scaffold the Nitrospira genome and thus helped show that all the genes needed for complete nitrification were present in a single organism (comammox). At the time, we could scaffold the Illumina-based assembly with some nanopore reads, but since then ONT has improved its technology tremendously, and people have started to get data in the ~5 Gbp range from a single flowcell.
Hence, a back-of-the-envelope calculation says that, without any further improvements, the PromethION would now be able to generate:
[5 Gbp per flowcell] * [6 x number of pores] * [48 flowcells] = 1440 Gbp (in just ~48 hrs)
In other words, that is equivalent to 288,000X coverage of a 5 Mbp microbial genome (1,440,000 Mbp / 5 Mbp). If we want to retrieve genomes of organisms at 0.1% abundance, that would still amount to 288X coverage! While we expected improvements in throughput, we never foresaw that they would come this quickly, and then suddenly the day came when our PromethION configuration unit arrived. The unit was delivered by ONT in a small van, and we had a nice little unboxing experience. The nanopore hype has finally reached the entire department, which has started dreaming about applications for long-read sequencing.
— Rasmus Kirkegaard (@kirk3gaard) January 12, 2017
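The throughput and coverage estimate above can be reproduced with a few lines of Python. This is only a sketch of the same back-of-the-envelope arithmetic: the function and variable names are made up for illustration, and it assumes that yield scales linearly with the number of pores and flowcells and that reads are sampled in proportion to each organism's abundance.

```python
# Back-of-the-envelope PromethION yield, using the figures quoted in the post.
# Assumption: yield scales linearly with pores and flowcells.
GBP_PER_MINION_FLOWCELL = 5   # ~5 Gbp reported from a single flowcell
PORE_FACTOR = 6               # 6 times as many pores per flowcell
FLOWCELLS = 48                # 48 flowcells
GENOME_SIZE_MBP = 5           # microbial genome size used in the post


def total_yield_gbp(gbp_per_flowcell, pore_factor, flowcells):
    """Expected total yield in Gbp for one full run."""
    return gbp_per_flowcell * pore_factor * flowcells


def coverage(yield_gbp, genome_size_mbp, relative_abundance=1.0):
    """Fold coverage of one genome, assuming reads are sampled
    in proportion to the organism's relative abundance."""
    yield_mbp = yield_gbp * 1000
    return yield_mbp * relative_abundance / genome_size_mbp


total = total_yield_gbp(GBP_PER_MINION_FLOWCELL, PORE_FACTOR, FLOWCELLS)
print(f"Total yield: {total} Gbp")
print(f"Coverage of a pure culture: {coverage(total, GENOME_SIZE_MBP):.0f}X")
print(f"Coverage at 0.1% abundance: {coverage(total, GENOME_SIZE_MBP, 0.001):.0f}X")
```

Changing `relative_abundance` makes it easy to see how rare an organism can be before its coverage drops below whatever threshold you consider enough for assembly.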
As the PromethION is expected to produce massive amounts of data in very little time, fast data transport and storage become another challenge. Even storing the data from a single MinION is causing trouble for people.
Oh whoops. 500 gig ssd filled in less than 12 hours by 'disappointing' @nanopore run.
— mattloose (@mattloose) January 19, 2017
ONT therefore ships a PromethION configuration unit to test whether the local infrastructure is ready before shipping the actual PromethION. The accompanying manual states that the maximum expected signal data output would be 80 GB/hr per flowcell. The spec sheet for the NAS server that ONT suggests for moving data off the PromethION while sequencing is running includes 2 fibre connections and 12 x 6 TB SSDs, to support the internal buffer of 24 TB SSD storage on the PromethION. This amount of enterprise-quality SSD storage does not come cheap, and it only covers a machine for temporary storage, not the bioinformatic computations that follow. Compute costs should not be neglected when considering whether to buy a PromethION. As prices for computer equipment tend to drop fast, postponing any unnecessary upgrades could save you a lot of money or give you much more compute power for the same amount. We therefore planned to buy a “cheap” storage server (for now) with the specs below to hopefully meet the needs of the configuration unit and pass the test.
- 2 x Intel Xeon E5-2650 v4 (12 cores each)
- 768 GB DDR4 RAM (2400 MHz)
- 2 x 400 GB SSD (for the OS)
- 16 x 8 TB NL-SAS (12 Gbps)
- 2 x 10 Gbit SFP+ fibre ports
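To get a feel for what these numbers mean in practice, here is a rough sketch of the worst-case data rates. It assumes, perhaps pessimistically, that all 48 flowcells run at the maximum 80 GB/hr at the same time, uses decimal units (1 TB = 1000 GB), and counts raw disk capacity without any RAID or filesystem overhead. The names are illustrative only.

```python
# Worst-case signal data rates, from the figures quoted in the post.
# Assumptions: all flowcells at maximum rate simultaneously,
# decimal units (1 TB = 1000 GB), raw disk capacity with no RAID overhead.
MAX_GB_PER_HOUR_PER_FLOWCELL = 80
FLOWCELLS = 48
BUFFER_TB = 24               # internal SSD buffer on the PromethION
SERVER_DISKS = 16            # our planned storage server
DISK_TB = 8
LINKS = 2
LINK_GBIT_PER_S = 10
RUN_HOURS = 48


def gb_per_hour_to_gbit_per_s(gb_per_hour):
    """Convert GB/hr to Gbit/s (8 bits per byte, 3600 s per hour)."""
    return gb_per_hour * 8 / 3600


def hours_to_fill(capacity_tb, rate_gb_per_hour):
    """Hours until a store of the given size is full at a constant rate."""
    return capacity_tb * 1000 / rate_gb_per_hour


peak_rate = MAX_GB_PER_HOUR_PER_FLOWCELL * FLOWCELLS   # GB/hr
server_tb = SERVER_DISKS * DISK_TB                     # raw TB

print(f"Peak rate: {peak_rate} GB/hr "
      f"({gb_per_hour_to_gbit_per_s(peak_rate):.2f} Gbit/s)")
print(f"Network capacity: {LINKS * LINK_GBIT_PER_S} Gbit/s")
print(f"Internal buffer full after {hours_to_fill(BUFFER_TB, peak_rate):.2f} h")
print(f"Storage server ({server_tb} TB raw) full after "
      f"{hours_to_fill(server_tb, peak_rate):.1f} h")
print(f"Signal data from one {RUN_HOURS} h run: "
      f"{peak_rate * RUN_HOURS / 1000:.2f} TB")
```

Under these assumptions the two 10 Gbit links have headroom for the sustained peak rate, but a single full-rate 48-hour run would produce more raw signal data than the server can hold.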
We plan to upgrade our entire compute facility when we get a better overview of the true needs for running the sequencing and bioinformatics. With PromethION-level output of signal data, we do not expect to be able to store the raw data files, or upload them to the read archives, in the long term. Instead, we hope to obtain fastq or fasta files as early as possible and discard the raw signals. Re-sequencing samples may well end up being a lot cheaper than storing raw signal data.
Currently, we are working with our IT support department to get everything connected and hope to be able to share a “hello world” from the PromethION soon!