Article recap: method biases in microbial community profiling of drinking water with 16S rRNA gene amplicon sequencing

In recent years, 16S rRNA gene amplicon sequencing has been widely adopted for analyzing microbial communities in drinking water, which has naturally led to numerous publications on the drinking water microbiome. However, microbial analysis based on 16S rRNA gene sequencing is associated with inherent biases, and despite the popularity of the method within drinking water research, no comprehensive study had examined the impact of these biases in a low-biomass environment like drinking water, at least until now.

On the 7th of September 2018, we published a paper in Frontiers in Microbiology addressing exactly this issue. The paper is part of the research topic Drinking Water Microbiome, and it can be found in its entirety right here. This blog post is meant as a TL;DR version of the original article, highlighting only the main findings.

You only see what you sequence, and only sequence what you can extract and amplify

In our study, we investigated the impact of DNA extraction and primer choice on the observed microbial community, as these steps in particular have been identified as highly critical in sequencing-based analyses (Albertsen et al., 2015; Vierheilig et al., 2015). The impact of DNA extraction was investigated with a straightforward comparison of two DNA extraction kits commonly used for drinking water samples (PowerWater DNA Isolation Kit vs. FastDNA SPIN Kit for Soil). Ten replicate drinking water samples were extracted, five with each kit, and the resulting data are visualized in Figure 1.

Figure 1A lists only the 25 most abundant OTUs in the samples. It shows that the main differences between the two kits in the abundance of the core community are associated with the Saccharibacteria OTUs. However, focusing only on the most abundant bacteria does not accurately reflect the true difference in performance between the two DNA extraction kits. DNA concentrations from the PowerWater kit were significantly higher than those from the FastDNA kit. Likewise, the number of observed OTUs was much higher for the PowerWater kit (approx. 1200–1400 OTUs) than for the FastDNA kit (approx. 600–800 OTUs). This difference is better visualized in Figures 1B and 1C, where a much clearer distinction between the two extraction kits emerges. The non-metric multidimensional scaling (NMDS) ordination in Figure 1B reveals a stark contrast between the kits: the PowerWater replicates form a distinct cluster, whereas the FastDNA replicates are scattered around them. Figure 1C further emphasizes this trend by illustrating beta-diversity in a sample-to-sample manner, and again, a clear separation between the two extraction kits can be observed.
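The summary statistics behind Figure 1A (relative read abundance and the 25 most abundant OTUs) and the observed-OTU counts can be sketched in a few lines of Python. This is a minimal sketch, not the pipeline used in the paper: it assumes each sample is a plain dict mapping hypothetical OTU IDs to raw read counts, and the toy numbers are invented for illustration.

```python
from collections import Counter


def relative_abundance(counts):
    """Convert raw read counts per OTU into relative read abundance (%)."""
    total = sum(counts.values())
    return {otu: 100.0 * n / total for otu, n in counts.items()}


def observed_otus(counts):
    """Richness: the number of OTUs with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


def top_otus(samples, n=25):
    """Return the n OTUs with the highest summed relative abundance across samples."""
    summed = Counter()
    for counts in samples.values():
        summed.update(relative_abundance(counts))
    return [otu for otu, _ in summed.most_common(n)]


# Hypothetical toy data: sample name -> {OTU ID: read count}
samples = {
    "PW_1": {"OTU_1": 50, "OTU_2": 30, "OTU_3": 20},
    "FD_1": {"OTU_1": 80, "OTU_2": 20, "OTU_3": 0},
}

print(top_otus(samples, n=2))                                   # ['OTU_1', 'OTU_2']
print({name: observed_otus(c) for name, c in samples.items()})  # {'PW_1': 3, 'FD_1': 2}
```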



Figure 1 (A) Heatmap of 2 L drinking water (DW) samples extracted with two different kits. Each column represents a sample, grouped by extraction kit. The rows list the 25 most abundant OTUs across the samples, each with its phylum classification. The numbers state the relative read abundance. (B) Ordination by non-metric multidimensional scaling based on Bray–Curtis dissimilarity. Each sample is represented as a dot, colored by extraction method. (C) Sample-by-sample comparison of the 10 replicates (FD, FastDNA; PW, PowerWater). The similarity between any two samples is displayed on a scale from 0 to 1, based on Bray–Curtis measures.
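Bray–Curtis dissimilarity, which underlies both the ordination in Figure 1B and the sample-by-sample comparison in Figure 1C, is the sum of absolute abundance differences divided by the sum of all abundances in the two samples; the similarity shown in Figure 1C is one minus this value. The sketch below uses invented relative abundances (as fractions) and hypothetical sample names, and also computes the mean pairwise dissimilarity within each kit's replicates, a simple way to put a number on a tight PowerWater cluster versus scattered FastDNA replicates. It is an illustration, not the code used for the figures.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray–Curtis dissimilarity between two {OTU: abundance} dicts (0 = identical, 1 = no shared OTUs)."""
    otus = set(a) | set(b)
    num = sum(abs(a.get(o, 0.0) - b.get(o, 0.0)) for o in otus)
    den = sum(a.get(o, 0.0) + b.get(o, 0.0) for o in otus)
    return num / den if den else 0.0


def mean_within_group(samples, names):
    """Mean pairwise Bray–Curtis dissimilarity among the replicates listed in `names`."""
    pairs = list(combinations(names, 2))
    return sum(bray_curtis(samples[x], samples[y]) for x, y in pairs) / len(pairs)


# Hypothetical relative abundances (fractions summing to 1 per sample)
samples = {
    "PW_1": {"OTU_A": 0.5, "OTU_B": 0.3, "OTU_C": 0.2},
    "PW_2": {"OTU_A": 0.5, "OTU_B": 0.2, "OTU_C": 0.3},
    "FD_1": {"OTU_A": 0.9, "OTU_B": 0.1},
    "FD_2": {"OTU_A": 0.2, "OTU_B": 0.8},
}

pw = mean_within_group(samples, ["PW_1", "PW_2"])
fd = mean_within_group(samples, ["FD_1", "FD_2"])
print(f"PowerWater: {pw:.2f}, FastDNA: {fd:.2f}")  # PowerWater: 0.10, FastDNA: 0.70
print(f"Similarity PW_1 vs PW_2: {1 - bray_curtis(samples['PW_1'], samples['PW_2']):.2f}")  # 0.90
```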


Comparison of three commonly used primer-sets

To allow direct comparison between primer-sets, DNA was extracted with the PowerWater kit from three 2 L biological replicate samples of drinking water. Each replicate DNA sample was PCR amplified using three different primer-sets targeting the V1-3, V3-4, and V4 variable regions of the 16S rRNA gene. All three regions have been targeted in previous drinking water studies. The data are illustrated in the heatmap in Figure 2. Note that replicate A from the V3-4 primer-set failed to generate reads during sequencing and is omitted from Figure 2.

FIGURE 2. Heatmap of the primer-set comparison. Each column represents a sample, denoted by its replicate and grouped by the targeted variable region of the 16S rRNA gene. The rows list the 20 most abundant phyla across the samples, each assigned to its kingdom. The numbers state the relative read abundance.

The data in Figure 2 reveal obvious disparities between the three primer-sets. In addition to large variation in relative abundances, some of the primer-sets proved unable to detect entire phyla. Although only three primer-sets were tested, similar disparities should be expected regardless of which variable regions are targeted. These results also clearly underline the futility of comparing datasets generated with different primers in the PCR step.
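Detecting that a primer-set misses entire phyla amounts to collapsing OTU counts to phylum level and comparing the sets of phyla detected by each primer-set. The sketch below assumes a hypothetical OTU-to-phylum lookup table and invented read counts with placeholder phylum names; it does not reproduce the actual results in Figure 2.

```python
def phylum_profile(otu_counts, taxonomy):
    """Collapse OTU read counts to phylum level and return relative abundance (%)."""
    totals = {}
    for otu, n in otu_counts.items():
        phylum = taxonomy.get(otu, "Unclassified")
        totals[phylum] = totals.get(phylum, 0) + n
    grand_total = sum(totals.values())
    return {p: 100.0 * n / grand_total for p, n in totals.items()}


def missed_phyla(profiles):
    """For each primer-set, list phyla detected by another primer-set but not by this one."""
    detected = {name: {p for p, v in prof.items() if v > 0} for name, prof in profiles.items()}
    all_detected = set().union(*detected.values())
    return {name: sorted(all_detected - found) for name, found in detected.items()}


# Hypothetical OTU -> phylum assignments and per-primer-set read counts
taxonomy = {"OTU_1": "Phylum_A", "OTU_2": "Phylum_A", "OTU_3": "Phylum_B", "OTU_4": "Phylum_C"}
reads = {
    "V1-3": {"OTU_1": 60, "OTU_2": 20, "OTU_4": 20},
    "V4": {"OTU_1": 40, "OTU_3": 30, "OTU_4": 30},
}

profiles = {primer: phylum_profile(counts, taxonomy) for primer, counts in reads.items()}
print(profiles["V1-3"])        # {'Phylum_A': 80.0, 'Phylum_C': 20.0}
print(missed_phyla(profiles))  # {'V1-3': ['Phylum_B'], 'V4': []}
```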

How low can you go?

The applicability of 16S rRNA gene amplicon sequencing in drinking water also depends greatly on the method's ability to detect low-abundance microorganisms. Many articles have pointed out the potential of 16S rRNA gene amplicon sequencing to detect ecologically relevant OTUs or pathogens. However, to our knowledge, no attempts had been made to estimate a detection limit under conditions specific to drinking water microbiome research.

We designed a detection limit experiment based on 21 autoclaved and DEPC-treated drinking water samples of 1 L each. The samples contained only E. coli cells, at concentrations ranging from ∼10⁶ to ∼10¹ cells/ml. This covers the interval of 10³ to 10⁵ cells/ml typically associated with drinking water (Pinto et al., 2012). The data are presented in Figure 3.
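Because each sample was 1 L (1000 ml), the per-ml concentrations translate directly into the total number of cells available for extraction. The sketch below assumes a tenfold dilution series between the stated end points (the intermediate steps are an assumption) and flags which levels fall within the typical drinking water range of 10³ to 10⁵ cells/ml.

```python
SAMPLE_VOLUME_ML = 1000            # each sample was 1 L
TYPICAL_DW_RANGE = (10**3, 10**5)  # cells/ml typically associated with drinking water


def cells_per_sample(cells_per_ml, volume_ml=SAMPLE_VOLUME_ML):
    """Total number of cells in one sample of the given volume."""
    return cells_per_ml * volume_ml


def in_typical_range(cells_per_ml, bounds=TYPICAL_DW_RANGE):
    """True if the concentration lies within the typical drinking water interval."""
    low, high = bounds
    return low <= cells_per_ml <= high


# Assumed tenfold series from ~10^6 down to ~10^1 cells/ml
for exponent in range(6, 0, -1):
    conc = 10**exponent
    print(f"~1e{exponent} cells/ml -> ~{cells_per_sample(conc):.0e} cells per sample, "
          f"typical DW range: {in_typical_range(conc)}")
```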

FIGURE 3. Heatmap of the detection limit experiment. Each column represents a sample, grouped by bacterial concentration. The rows list the 25 most abundant OTUs across the samples, each assigned to its genus or the closest possible taxonomic rank. The numbers state the relative read abundance.

Since the samples contained only E. coli cells, any other OTUs detected must be considered contamination from the workflow. Nevertheless, the experiment demonstrates that the workflow was rather effective for capturing the core microbial community in samples with bacterial concentrations in the normal drinking water range (10³ to 10⁵ cells/ml). For almost all of these samples, contamination was of no concern. Only replicate B of the ∼10⁵ cells/ml sample had notable read abundances of non-E. coli OTUs, combined with the lowest E. coli read abundance of all samples (88%). However, the contaminating OTUs in this replicate did not overlap with those observed in the ∼10¹ and ∼10² cells/ml samples.
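The reasoning in this experiment, that every non-E. coli read is contamination, translates into two simple checks: the share of reads not assigned to E. coli, and whether the contaminating OTUs of different samples overlap. A minimal sketch follows; the 88% value mirrors the text above, while the other abundances, the OTU names, and the optional `min_abund` threshold are hypothetical.

```python
TARGET = "E. coli"  # the only organism deliberately added to the samples


def contamination_fraction(rel_abund, target=TARGET):
    """Percentage of reads not assigned to the target organism."""
    return 100.0 - rel_abund.get(target, 0.0)


def contaminant_otus(rel_abund, target=TARGET, min_abund=0.0):
    """Set of non-target OTUs above a (hypothetical) relative-abundance threshold in %."""
    return {otu for otu, v in rel_abund.items() if otu != target and v > min_abund}


# Hypothetical relative abundances (%) for two samples
replicate_b = {"E. coli": 88.0, "OTU_x": 7.0, "OTU_y": 5.0}
low_biomass = {"E. coli": 40.0, "OTU_z": 35.0, "OTU_w": 25.0}

print(contamination_fraction(replicate_b))  # 12.0
shared = contaminant_otus(replicate_b) & contaminant_otus(low_biomass)
print(shared or "no shared contaminant OTUs")
```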

However, in samples with less biomass than the normal range for drinking water, a broad range of contaminating OTUs began to appear. Indeed, a plethora of contaminating genera have been documented to originate from extraction kits and lab reagents commonly used for 16S rRNA gene sequencing (Salter et al., 2014). Hence, working with low-biomass samples requires particular care to minimize contamination, as well as the inclusion of appropriate control samples in the workflow to evaluate the reliability of the results.
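A basic way to put negative controls to use is to flag any OTU in a sample that also shows up in an extraction or PCR blank. The sketch below does just that, with hypothetical sample names, control names, and counts. It is deliberately crude: an OTU can appear in a blank through cross-contamination from real samples, so dedicated tools such as the decontam R package rely on frequency- or prevalence-based statistics rather than a simple presence check.

```python
def flag_control_otus(samples, controls, min_reads=1):
    """Per sample, list OTUs that also occur (>= min_reads) in any negative control."""
    in_controls = set()
    for counts in controls.values():
        in_controls |= {otu for otu, n in counts.items() if n >= min_reads}
    return {
        name: sorted(otu for otu, n in counts.items() if n > 0 and otu in in_controls)
        for name, counts in samples.items()
    }


# Hypothetical read counts for two samples and one extraction blank
samples = {
    "S1": {"OTU_1": 500, "OTU_9": 12},
    "S2": {"OTU_1": 300, "OTU_2": 40},
}
controls = {"extraction_blank": {"OTU_9": 30, "OTU_2": 0}}

print(flag_control_otus(samples, controls))  # {'S1': ['OTU_9'], 'S2': []}
```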


The field of 16S rRNA gene sequencing analysis of drinking water needs methodological standardization if results are to be compared across studies. We recommend the PowerWater DNA Isolation Kit for DNA extraction of bulk drinking water samples, and PCR amplification of the V3-4 or V4 variable region of the 16S rRNA gene. Finally, biological replicates and negative controls should always be included to assess data variability and contamination.
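Biological replicates make it possible to put a number on variability, for example as the coefficient of variation (standard deviation divided by mean) of an OTU's relative abundance across replicates. A minimal sketch with invented replicate values follows; using the CV as the summary statistic is an assumption, not an approach prescribed in the paper.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean; needs at least two replicates."""
    if len(values) < 2:
        raise ValueError("need at least two replicates")
    m = mean(values)
    return stdev(values) / m if m else float("nan")


# Hypothetical relative abundances (%) of one OTU in three biological replicates
replicates = [30.0, 34.0, 26.0]
print(f"CV = {coefficient_of_variation(replicates):.3f}")  # CV = 0.133
```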

The original article is available at:


Albertsen, M., Karst, S. M., Ziegler, A. S., Kirkegaard, R. H., and Nielsen, P. H. (2015). Back to basics – The influence of DNA extraction and primer choice on phylogenetic analysis of activated sludge communities. PLoS One. 10:e0132783. doi: 10.1371/journal.pone.0132783

Pinto, A. J., Xi, C., and Raskin, L. (2012). Bacterial community structure in the drinking water microbiome is governed by filtration processes. Environ. Sci. Technol. 46, 8851–8859. doi: 10.1021/es302042t

Salter, S. J., Cox, M. J., Turek, E. M., Calus, S. T., Cookson, W. O., Moffatt, M. F., et al. (2014). Reagent and laboratory contamination can critically impact sequence-based microbiome analyses. BMC Biol. 12:87. doi: 10.1186/s12915-014-0087-z

Vierheilig, J., Savio, D., Ley, R. E., Mach, R. L., Farnleitner, A. H., and Reischer, G. H. (2015). Potential applications of next generation DNA sequencing of 16S rRNA gene amplicons in microbial water quality monitoring. Water Sci. Technol. 72, 1962–1972. doi: 10.2166/wst.2015.407


Jakob Brandt

PhD student at the Albertsen Lab.
Posted in Drinking Water.
