NCM17 – Day 2: Into the unknown

Day 2 at NCM17 started with Oxford Nanopore CEO Gordon Sanghera taking the stage, once again stating his dream that sequencing should be available to anybody, anytime, anywhere. There are currently around 12,500 mainframe DNA sequencers around the globe, a number he believes ONT will surpass in the not too distant future. The availability of portable sequencers to the masses will change the way we sequence DNA (the sequencing singularity?), not to mention raise all the ethical questions involved in sequencing everyone's genome – a topic that was also covered in a panel discussion. It will undoubtedly improve our understanding of diversity on our planet.

Arwyn Edwards lives Gordon's motto in his work as he, in his own words, sequences in strange places. One example is a field trip to an old coal mine in Wales, where risks were high due to methane gas. Although all equipment was plugged in above ground to avoid sparks in the explosive environment, a battery-powered centrifuge still went up in a blue flame. Nevertheless, the team managed to get some reads out of the run, even though the input material was a mere 3×6 ng DNA! Other field trips have taken him to Greenland to investigate microbial darkening and to Svalbard to monitor microbial populations in the melting Arctic ice. He has also been testing the new lyophilised reagents and found the results consistent with other kits and with Illumina sequencing. It is important to note, however, that all methods have biases, all databases are incomplete and all extraction methods have problems. We need to address these issues before we can go out and sequence metagenomes in the field.

Continuing this subject was Mads Albertsen, my PI, who had a clear message about what we need – more genomes! There are, however, some problems in recovering genomes from metagenomes: microdiversity within samples and separation of the genomes (binning). His approach is to use differential coverage, so binning can be performed by abundance. This can sometimes be done with Illumina reads alone, but hybrid assemblies with long Nanopore reads are simplifying it. Mads believes it won't be long before we can beat evolution in speed and sequence everything! If you want to do differential coverage binning on your own metagenomes, check out our mmgenome R package – the basic idea is sketched below.
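
As a quick illustration of the principle behind differential coverage binning (not the mmgenome workflow itself, just a toy sketch in Python), the snippet below plots contigs from one assembly by their coverage in two samples. Contigs from the same genome tend to cluster, because a genome's abundance differs between samples but is shared by all of its contigs within a sample. The input file and column names are hypothetical placeholders for a coverage table you would compute by mapping each sample's reads back to the assembly.

import pandas as pd
import matplotlib.pyplot as plt

# Assumed tab-separated table with one row per contig:
#   contig  length  cov_sample1  cov_sample2
cov = pd.read_csv("coverage.tsv", sep="\t")

# Small pseudo-count so contigs absent from one sample still plot on a log scale.
x = cov["cov_sample1"] + 0.1
y = cov["cov_sample2"] + 0.1

fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(x, y, s=cov["length"] / 5000, alpha=0.4)  # point size scales with contig length
ax.set_xscale("log")
ax.set_yscale("log")
ax.set_xlabel("Coverage in sample 1")
ax.set_ylabel("Coverage in sample 2")
ax.set_title("Differential coverage - clusters are candidate genome bins")
fig.savefig("differential_coverage.png", dpi=200)

In mmgenome this kind of coverage plot is combined with GC content, taxonomy and paired-end connections to extract the bins interactively.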

It has been a great community meeting (my first!) with plenty of activity in the demonstration zone and at the flow cell loading workshop.

NCM17 – Day 1: New tech and exciting applications

All the gear

We start out with blazing punk rock (things are never too boring at Oxford Nanopore meetings) and plenty of promises from Oxford Nanopore CEO Gordon Sanghera. He gave us updates on various products, such as PromethION flow cells, which have been harder to produce than expected. However, we and others have now sequenced data on the PromethION! Yes, not the 50 Gbp that Oxford Nanopore claims to achieve in-house, but we are starting to be more optimistic than we have ever been! The VolTRAX should be working now, and a second version was announced! The second version will have a fluorescence detector built in for quantification, and it will be possible to load the prepped sample directly onto a flow cell. It will be released in the first half of 2018.

VolTRAX v. 2

The much-anticipated 128-channel Flongle will be available in the second half of 2018. We are looking forward to fast and cheap sequencing in the field and the laboratory – although a MinION flowcell is relatively cheap (approx. 500 USD), it is still way too expensive for regular use in many applications where we do not need gigabases of data. Gordon Sanghera highlighted that they are also moving the Flongle into the diagnostics field. The sky seems to be the limit for small, cheap and generic diagnostic kits.

Oxford Nanopore has realised that data yield on flow cells depends largely on extraction and library preparation techniques and has started focusing more on this. Automation of everything from extraction to library prep would be ideal, but no updates were given on Zumbador other than the fact that a lot of work remains to be done. Along these lines, lyophilised reagents have been sent to a selected few for initial trials, and removing the cold chain will be a large step towards enabling sequencing anywhere. Furthermore, the new mobile FPGA basecaller was shown, and it will be up to the community to name the unit in a competition – use the #NanoporeAnyone tag! The SmidgION and the mobile basecaller were displayed in the demonstration area and look very fancy:

The talks

Nick Loman was up first and gave us an outlook on what we will be able to achieve with sequencing in the future. Based on the sequencing of Ebola in West Africa, he showed a video of how the disease had spread and infected different regions. By using real-time genome sequencing we will be able to track diseases as they develop, and although outbreaks will happen, we should be able to avoid epidemics.

This sequencing singularity, as he termed it, is, however, dependent on sequencing genomes instead of genes, as is standard today. Long reads will make this possible, as they can often span repeat regions in the genomes. Long reads also mean that much less compute power is needed. An example was given with an E. coli sample, where miniasm produced an assembly in 1.5 seconds – a single contig built from just 8 reads (now that Nick and his team have lost the world record for the longest sequenced read, they still claim the record for assembling E. coli from the fewest reads).
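
For reference, here is a minimal sketch of that kind of long-read-only assembly with minimap2 and miniasm, wrapped in Python for convenience. The read file name is a placeholder, and this is the standard documented usage rather than Nick's exact command line.

import subprocess

reads = "ecoli_nanopore_reads.fastq"  # hypothetical input file

# All-vs-all read overlaps with minimap2 (ava-ont preset for Nanopore reads).
with open("overlaps.paf", "w") as paf:
    subprocess.run(["minimap2", "-x", "ava-ont", "-t", "8", reads, reads],
                   stdout=paf, check=True)

# Overlap-layout assembly with miniasm; the output is a GFA assembly graph.
with open("assembly.gfa", "w") as gfa:
    subprocess.run(["miniasm", "-f", reads, "overlaps.paf"],
                   stdout=gfa, check=True)

# Count the resulting contigs ("S" segment lines in the GFA).
with open("assembly.gfa") as gfa:
    n_contigs = sum(1 for line in gfa if line.startswith("S\t"))
print(f"miniasm produced {n_contigs} contig(s)")

Note that miniasm does no polishing, so the contigs keep the raw read error rate – consensus tools such as racon or nanopolish are typically run afterwards.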

Sergey Koren continued on a related subject, talking about how we may finally be able to close the human genome. He showed very convincing results on integrating Hi-C with long-read Nanopore data – see their new tool SALSA and the related paper here. Combining Nanopore and Illumina reads in hybrid assemblies is the preferred way to get both the desired accuracy and read length, which was also the theme of Ryan Wick's talk. He brought up some shortcomings of assemblers and argued that all genomes are to some extent metagenomes because of within-sample variation. What should assemblers do when encountering these differences? There is a need for more variation-aware assemblers and more graph-friendly tools.
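
As a concrete example of such a hybrid assembly, the sketch below shows a typical Unicycler run (Ryan Wick's hybrid assembler, which also appears in the prologue post further down), again wrapped in Python. The read file names are placeholders; see the Unicycler documentation for the full set of options.

import subprocess

subprocess.run([
    "unicycler",
    "-1", "illumina_R1.fastq.gz",     # paired-end Illumina reads (placeholder files)
    "-2", "illumina_R2.fastq.gz",
    "-l", "nanopore_reads.fastq.gz",  # long Nanopore reads used to bridge repeats
    "-o", "hybrid_assembly",          # output directory with assembly.fasta and assembly.gfa
    "--threads", "8",
], check=True)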

The diversity of life on Earth has always been tricky to study. Steven Salzberg addressed this in his talk about closing large genomes. Even with long reads you need bioinformagic to assemble the wheat genome, which is 16 Gbp long and full of repetitive transposon regions. The fact that it has taken 8-9 months of sequencing, 1 trillion bases on Illumina and 545 billion bases on PacBio is a good indicator of how difficult it still is to close these genomes. And it is still not finished, but a lot closer than before!

The evening speaker, Aaron Pomerantz, gave us a great presentation about his trip to the Ecuadorian rainforest with a mobile lab. The development of portable sequencing has the potential to give us deeper insights into the diversity of remote areas. Often samples have to be shipped halfway around the globe to be sequenced, adding further expense and time to the workflow.

Nanopore Community Meeting 2017: Prologue

Oxford Nanopore is hosting their annual community meeting in New York City this week. I'll be there, thanks to finishing in the top 5 of their "one more flowcell" competition – thanks to everyone who voted!

Although not as "big" as the London Calling conference they also host, the line-up of speakers is impressive and promises talks on everything from megabase reads and gigabase yields to the newest assembly algorithms, field applications and maybe even some product updates? I'm excited! In this blog post, I will give an overview of what I expect to hear at the meeting.

The first plenary speaker is Nick Loman, who, until recently, held the bragging rights to the longest read on the MinION. The record is now with Kinghorn Genomics, but still sits below one megabase.

Although megabase reads may not be practical or strictly necessary for most applications, the importance of read length is bound to be brought up by Loman, who will also speak about the future of sequencing – the sequencing singularity, as he terms it. I am also looking forward to hearing Ryan Wick speak, as he will undoubtedly talk about his work on Unicycler, which we recently described as our preferred hybrid genome assembler. Wick will give us an overview of what needs to improve in order to obtain perfect assemblies (in general, you should check out his awesome analyses and tools hosted on GitHub – see e.g. his comparison of current basecalling algorithms for Oxford Nanopore data).

The algorithm developers are also very well represented, with Sergey Koren, developer of Canu, giving an update on assembling the human genome with ultra-long reads, and Steven Salzberg presenting an update on their hybrid assembly method for large genomes (MaSuRCA), which is bound to be interesting (you should also check out his blog).

The current record holder in data yield, Baptiste Mayjonade, has a talk on day 1. It will be interesting to hear updates on yields from both researchers and Oxford Nanopore, as a lot of us are still struggling to reach 10 Gbp. Perhaps with the right methods we could all compete with Baptiste's 15.7 Gbp on a MinION flowcell? We could also hope for some updates on PromethION flowcell yields, although these are still under heavy development (we have been running our own PromethION and are now waiting for the next flowcell shipment – a blog post should be in the works…).

The last speaker of day 1 is Aaron Pomerantz, who has been sequencing barcoded DNA in a rainforest in Ecuador. There is also Arwyn Edwards on day 2, who has been sequencing in extreme environments. As I am working with on-site DNA sequencing myself, I am very interested in hearing more about these projects. The general trend lately seems to be increased portability, which will undoubtedly spark a wide array of new exciting projects and give insights into the ecology of the Earth.

Another session of interest to me is the "metagenomics, microbiomes and microbiology" session on day 2, where my PI, Mads Albertsen, is giving a talk on metagenomics in the long-read era. I am currently moving into the metagenomics field and hope to gain some insight in this session.

With some Oxford Nanopore talks scattered throughout the two days, I hope to get some updates on the newly released tech. I'm particularly interested in the "flongle" (the small and dirt-cheap flowcell), which I have heard might be ready soon. Recently, we have seen tweets about the first version of the all-in-one Zumbador, new lyophilised reagents being tested and the first version of the chemistry for the VolTRAX sample prep device.

I am sure I will be wiser by the end of next week. I will be giving some daily digests here, so stay tuned. If you are going to NCM17 yourself, please come talk to me by poster 18 on Thursday 10.50-11.15 AM and 3.25-3.50 PM, and I can give you an overview of how we recently sequenced activated sludge samples on-site at a wastewater treatment plant (and on the way home).

Populating the tree-of-life


Hi everybody and welcome to my first blog post at Albertsen Lab. As a newly started PhD student, I have engaged myself with the simple, yet Herculean task of populating the tree-of-life. As most people are aware, microorganisms are more or less inescapably present everywhere in the world – no matter how hostile an environment you encounter, it will probably accommodate some sort of small living organism. As it was once elegantly put: "If you don't like bacteria, you're on the wrong planet". Or in this case, if you don't like bacteria, you're in the wrong blog post. Recently, a study was published by Kenneth J. Locey & Jay T. Lennon (2016) trying to illuminate just how omnipresent microorganisms are. This was done by applying scaling laws to predict the global microbial diversity. Main conclusion: our planet hosts up to 10¹² microbial species. That is 1 trillion species! Of course, one trillion microbial species is still a number that is dwarfed by the approximately 10³⁰ bacterial and archaeal cells living on Earth. Since incredibly large numbers tend to be hard to grasp, the article kindly supplied some illustrative examples: the ~10³⁰ bacterial and archaeal cells span more orders of magnitude than the 22 that separate the mass of a Prochlorococcus cell from that of a blue whale, and more than the 26 orders of magnitude you get from measuring Earth's surface area at a spatial grain the size of a bacterium.

From the perspective of my research, it is naturally only the so-called microbial dark matter (microbes not yet discovered) that is of interest. Fortunately, only a vanishingly small fraction of microorganisms has been discovered to date. For some reason, it is currently a prerequisite to have a bacterium growing in pure culture before naming it. Alas, if you take a quick look at the homepage of DSMZ (one of the largest bioresource centres worldwide, located in Germany), you will find that they boast a collection of around 31,000 cultures representing some 10,000 species and 2,000 genera in total. Only a tiny bit short of one trillion. On a side note, I guess DSMZ will eventually face some kind of capacity problem if we insist on requiring a pure culture of each new species before an 'official' name can be granted. Luckily, nowadays bacterial species can also be catalogued by their DNA sequence. Currently, one of the most widespread methods for identifying bacteria is 16S rRNA amplicon sequencing. Large databases such as SILVA and the Ribosomal Database Project (RDP) use huge amounts of digital ink to keep up with the ever-increasing influx of new sequences. The current version 128 of SILVA (released in September 2016) includes over 6,300,000 SSU/LSU sequences, whereas RDP Release 11 contains 3,356,809 16S rRNA sequences (also from September 2016). Although this is definitely a lot more than 10,000 pure cultures, it is still ridiculously few compared to the total estimated microbial diversity of the Earth. Hence, as I begin my PhD study, the estimates from Kenneth J. Locey & Jay T. Lennon suggest that potentially 99.999% of all microbial taxa remain undiscovered!

This may sound like my research is going to be a cakewalk. After all, I should be able to go to the lab and find novel microorganisms literally by accident. However, I have decided to drastically confine my explorations of microbial dark matter to a specific environment – although traveling around the world chasing novel bacteria would be pretty cool. So far, my primary focus has been sampling microbial biomass from drinking water. Looking for novelty in a resource where one of the main goals is to keep living things out may seem a bit weird, but there is more than one good reason for this choice. 1) I worked with drinking water during my master's thesis, so I already have experience with sampling and extraction procedures as well as a few connections with people working in this field. 2) Recently, articles such as Antonia Bruno et al. 2017, Karthik Anantharaman et al. 2016 and Birgit Luef et al. 2015 have illustrated that drinking water may contain a large portion of microbial novelty. 3) Unless you are living in a third world country, gaining access to water is not very complicated. But just to clarify, do not mistake "easy access" for "easy sampling", as sampling microorganisms from drinking water can be a king-sized pain in the ass.

A drastically under-represented visualization of the microbial diversity in drinking water.

As you might have guessed, I am not going to dedicate much time to trying to make bacteria living in large water reservoirs deep below the surface grow on plates in the lab. For identification, I am instead going to one-up the conventional 16S rRNA amplicon sequencing method. A new technique developed by some of the people in the Albertsen Lab has shown very promising results, generating full-length 16S rRNA gene sequences. Hopefully, I will have the opportunity to address it in a bit more detail in a later blog post. However, the first hurdle is simply collecting sufficient biomass for the subsequent extraction. As the protocol states, the input material is ~800 ng RNA. If you – like many of my colleagues at Albertsen Lab – have been working with wastewater, 800 ng RNA is no big deal. Getting the same amount from drinking water is a big deal. Drinking water typically has bacterial concentrations in the range of 10³–10⁵ cells/ml, which is why collecting adequate amounts of biomass is difficult. I naively started out using the same sampling setup that I used in my master's thesis (where 1 ng/µl DNA would be plenty for further analysis). It basically consisted of a vacuum pump and a disposable funnel, and after spending too many hours pouring several liters of water into a puny 250 ml funnel, I ended up with negligible amounts of RNA. This is the point where you break every single pipette in the lab in despair and tell your supervisor that the current sampling method is not feasible. Instead, the sampling task has been partly outsourced to people with more specialized equipment. So far I have worked with samples based on 100+ liters of water yielding more than 800 ng of RNA, but the workflow still needs some optimization.
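
To put some rough numbers on the problem, here is a back-of-the-envelope estimate of how much water has to be filtered to reach the ~800 ng RNA input. The per-cell RNA content and the recovery are my own ballpark assumptions rather than measured values, so treat the result as order-of-magnitude only – but it lands in the same region as the 100+ liters we end up filtering in practice.

# Rough estimate of the filtration volume needed for the ~800 ng RNA input.
cells_per_ml = 1e4      # mid-range for drinking water (10^3-10^5 cells/ml)
rna_per_cell_fg = 5.0   # assumed fg RNA per small, slow-growing cell
recovery = 0.2          # assumed combined filtration + extraction efficiency
target_ng = 800.0

rna_per_liter_ng = cells_per_ml * 1000 * rna_per_cell_fg * 1e-6  # fg -> ng
liters_needed = target_ng / (rna_per_liter_ng * recovery)
print(f"~{rna_per_liter_ng:.0f} ng RNA per liter before losses")
print(f"~{liters_needed:.0f} liters needed to reach {target_ng:.0f} ng RNA")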

Another aspect complicating the sampling step is the really obvious fact that bacteria are really small (I am perfectly aware that all bacteria can, with very good reason, be categorized as "small"); here my statement refers to the aforementioned article from Birgit Luef et al. 2015, "Diverse uncultivated ultra-small bacterial cells in groundwater". The article highlights findings demonstrating how a wide range of bacteria can pass through a 0.2 µm pore-size membrane filter. FYI, filtration with a 0.2 µm filter is also commonly referred to as "sterile filtration". One could argue it is a poorly chosen term for a filtration type that apparently allows numerous types of bacteria to pass the membrane unhindered. Also, sterile filtration is by far the most used sampling method in papers concerning 16S rRNA amplicon sequencing of drinking water. During my master's thesis, I also used 0.2 µm filters for water samples; however, in the pursuit of novelty, the filter pore size has now been reduced to 0.1 µm. This should, as far as I am concerned, capture all microbes inhabiting drinking water (although it is hard not to imagine at least one out of a trillion species slipping through).

Hopefully, I will start to generate full-length 16S rRNA sequences from novel bacteria in the near future and maybe share some interesting findings here on albertsenlab.org.

Fast, easy and robust DNA extraction for on-site bacterial identification using MinION

My name is Peter Rendbæk, and I'm currently a master's student in the Albertsen Lab. The overarching aim of my master's project is to serve as a pre-test for several of the new big projects in the group, which focus on applying on-line bacterial identification for process control at wastewater treatment plants. Hence, for the last couple of months I have been working on the project "Developing methods for on-site DNA sequencing using the Oxford Nanopore MinION platform". The MinION has improved a lot since its release three years ago, and it can now be used for rapid determination of bacterial community composition.

The potential of this fast and mobile DNA sequencing is mind-blowing. However, given that the technology is here now (!), there has been relatively little focus on portable, fast, easy and robust DNA extraction. Hence, I've spent the last months trying to develop a fast, cheap, mobile, robust and easy-to-use DNA extraction method.

There is a significant amount of bias connected with DNA extraction, but the bias associated with wastewater treatment samples has been investigated in depth. However, the "optimized method" is not suited for on-site DNA extraction. There are three principal steps in DNA extraction – cell lysis, debris removal and DNA isolation – which I will cover below, discussing how I simplified each step.

In general, complex samples require bead beating for cell lysis and homogenization. The problem is that our in-house bead beating is done with a big tabletop instrument weighing 17 kg, which makes it hard to transport. However, I came across a blog post from Loman Labs about sample preparation and DNA extraction in the field for Nanopore sequencing. In the blog post, the possibility of a portable bead beater was outlined, using a remodelled power tool. I thought this was interesting, so I went out and bought an oscillating multi-tool cutter and tried it with lots of duct tape…

The amazing part was that it worked! The problem was that the samples would get "beaten" differently depending on how you taped the sample to the power tool, which could give rise to large variations in the observed microbial community.

I solved this by 3D printing an adapter for the power tool that fits the bead-beating tube (finally, a good excuse to use a 3D printer!). I used SolidWorks to design the adapter and collaborated with our local Department of Mechanical and Manufacturing Engineering (m-tech) to 3D print it. You can make your own by simply downloading my design from Thingiverse (it did take a few iterations to make it durable enough, and I still use a little duct tape…).


After the bead beating, cell debris removal is done by centrifugation. Our "standard" protocol recommends centrifugation at 14,000 ×g for 10 minutes at 4 °C. However, in our minds that seemed a little excessive, and it requires a huge, non-transportable centrifuge… There are plenty of small, easy-to-transport and easy-to-use centrifuges available if we do not have to spin at 14,000 ×g at 4 °C. There is even the possibility of 3D printing a hand-powered centrifuge, but I did not follow that path, as it seems a bit dangerous… After several tests, we discovered that a simple tabletop centrifuge could do the job perfectly well at 2,000 ×g for 1 min at room temperature, if combined with the DNA isolation described below.
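
If you want to check whether a given small centrifuge can actually deliver 2,000 ×g, the standard conversion between rpm and relative centrifugal force (RCF = 1.118 × 10⁻⁵ × r × rpm², with the rotor radius r in cm) is handy. Here is a small helper with a hypothetical rotor radius – check your own rotor before trusting the numbers.

def rcf(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    return 1.118e-5 * radius_cm * rpm ** 2

def rpm_for_rcf(target_g: float, radius_cm: float) -> float:
    """Speed (rpm) needed to reach a target x g for a given rotor radius."""
    return (target_g / (1.118e-5 * radius_cm)) ** 0.5

radius = 6.0  # cm, hypothetical mini-centrifuge rotor radius
print(f"{rpm_for_rcf(2000, radius):.0f} rpm gives ~2,000 x g at r = {radius} cm")
print(f"{rpm_for_rcf(14000, radius):.0f} rpm would be needed for 14,000 x g")

With a small rotor, reaching 2,000 ×g takes only a few thousand rpm, whereas 14,000 ×g would need well over ten thousand – which is exactly why relaxing the centrifugation step matters for portability.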

The last step is DNA isolation. I tried several different methods, but we got the idea to simply use Agencourt AMPure XP beads, which are routinely used in e.g. PCR purification (we diluted the AMPure XP beads 1:10 to save some money, and it seems to work just as well). And… it works!

So, now you have an overview of the method I developed. The most amazing part is that it works! It takes 10-15 minutes from the sample is taken until you have DNA ready for use, compared to 60+ minutes for our "standard" protocol. Furthermore, it requires only inexpensive equipment that can be carried in a small suitcase. Just to prove that this approach is fast, I filmed myself doing the DNA extraction with a GoPro camera, as you can see below.

The next step is to test the MinION in the lab: how fast can we identify bacteria, and is the extracted DNA compatible with the downstream library preparation, which we hope to do on our new and shiny VolTRAX (which is now moving liquids!)?