Populating the tree-of-life

 

Hi everybody and welcome to my first blog post at Albertsen Lab. As a newly started PhD student, I have engaged myself with the simple, yet Herculean task of populating the tree-of-life. As most people are aware of, microorganisms are more or less inescapably present in all places of the world — no matter how hostile environments you will encounter, they will probably accommodate some sort of small living organisms. As it once elegantly was put “If you don’t like bacteria, you’re on the wrong planet”. Or in this case, if you don’t like bacteria, you’re in the wrong blog post. Recently, a study was published by Kenneth J. Locey & Jay T. Lennon (2016) trying to illuminate just how omnipresent microorganisms are. This was carried out by applying scaling laws to predict the global microbial diversity. Main conclusion: our planet hosts up to 1012 microbial species. That is 1 trillion species! Of course, one trillion microbial species is still a number that is dwarfed by the approximately 1030 bacterial and archaeal cells living on Earth. Since incredibly large numbers tend to be hard to grasp, the article kindly supplied some illustrative examples. For example, the ~1030 bacterial and archaeal cells exceed the 22 orders of magnitude that separate the mass of a Prochlorococcus cell from a blue whale. It also exceeds the 26 orders of magnitude that result from measuring Earth’s surface area at a spatial grain equivalent to bacteria.

From a perspective concerning my research, it is naturally only the so-called microbial dark matter (microbes not yet discovered) that is of interest. Fortunately, it is only an infinitesimal number of microorganisms that have been discovered to date. For some reason, it is currently a prerequisite to have a bacterium growing in a pure culture before naming it. Alas, if you take a quick look at DSMZ’s homepage (one of the largest bioresource centers worldwide located in Germany), you will find that they boast a collection containing around 31,000 cultures representing some 10,000 species and 2,000 genera in total. Only a tiny bit short of one trillion. On a side note, I guess DSMZ eventually will face some kind of capacity-related problems if we insist on requiring each new species as a pure culture before an ‘official’ name can be granted. Luckily, nowadays bacterial species can also be cataloged by its DNA sequence. Currently, one of the most wide-spread methods for identifying bacteria is the 16S rRNA amplicon sequencing. Large databases such as SILVA and the Ribosomal Database Project (RDP) use huge amounts of digital ink to keep the ever-increasing influx of new sequences up to date. The current version 128 of SILVA (released in September 2016) includes over 6,300,000 SSU/LSU sequences, whereas RDP Release 11 contains 3,356,809 16S rRNA sequences (also released in September 2016). Although this is definitely a lot more than 10,000 pure cultures, it is still ridiculously less than the total estimated microbial diversity of the Earth. Hence, as I begin my PhD study, estimations from Kenneth J. Locey & Jay T. Lennon point to the fact that potentially 99.999% of all microbial taxa remain undiscovered!

This may sound like my research is going to be a cakewalk. After all, I should be able to go to the lab and find novel microorganisms literally by accident. However, I have decided to drastically confine my explorations of microbial dark matter to a specific environment. Although traveling around the world chasing novel bacteria would be pretty cool. So far, my primary focus has been sampling of microbial biomass from drinking water. You may think looking for novelty in a resource where one of the main goals is to restrict living things in is a bit weird. However, there is more than one good reason for this choice. 1) I worked with drinking water during my master’s thesis. Thus I already have experience with sampling and extraction procedures as well as a few connections with people working in this field. 2) Recently, articles such as Antonia Bruno et al. 2017, Karthik Anantharaman et al. 2016 and Birgit Luef et al. 2015 have illustrated that drinking water may potentially contain a large portion of microbial novelty. 3) Unless you are living in a third world country, gaining access to water is not very complicated. But just to clarify, do not mistake “easy access” with “easy sampling”, as the sampling of microorganisms from drinking water can be a king-sized pain in the ass.

A drastically under-represented visualization of the microbial diversity in drinking water.

As you might have guessed, I am not going to dedicate much time to trying to make bacteria living in large water reservoirs deep below the surface grow on plates in the lab. For identification, I am instead going to one-up the conventional 16S rRNA amplicon sequencing method. A new technique developed by some of the people in the Albertsen Lab have shown very promising results, generating full-length 16S rRNA gene sequences. Hopefully, I will have the opportunity to address it a bit more in detail in a later blog post. However, the first hurdle is simply collecting sufficient amounts of biomass for subsequent extraction. As it states in the protocol, the input material is ~ 800 ng RNA. If you — like many of my colleges at Albertsen Lab — have been working with wastewater, 800 ng RNA is no big deal. Getting the same amount from drinking water is a big deal. Drinking water typically has bacterial concentrations in the range of 103 – 105 cells/ml, which is why collecting adequate amounts of biomass is difficult. I naively started out using the same sampling setup that I used in my master’s thesis (where 1 ng/µl DNA would be plenty for further analysis). It basically consisted of a vacuum pump and a disposable funnel, and after spending too many hours pouring several liters of water into a puny 250 ml funnel, I ended up with negligible amounts of RNA. This is the point where you break every single pipette in the lab in despair tell your supervisor that the current sampling method is not feasible. Instead, the sampling task has been partly outsourced to people with more specialized equipment. So I have worked with samples based on 100+ liters of water yielding RNA amounts of more than 800 ng, but the workflow still needs some optimization.

Another aspect complicating the sampling step is the really obvious fact that bacteria are really small (I am perfectly aware of the fact that all bacteria with very good reason can be categorized as “small”), therefore my statement refers to the aforementioned article from Birgit Luef et al. 2015Diverse uncultivated ultra-small bacterial cells in groundwater”. The article highlights findings that demonstrate how a wide range of bacteria can pass through a 0.2 µm-pore-sized membrane filter.  FYI, filtration with a 0.2 µm filter is also commonly referred to as “sterile filtration”. One could argue it is a poorly chosen term for a filtration type that apparently allows numerous types of bacteria to pass the membrane unhindered. Also, sterile filtration is by-far the most used sampling method in papers concerning 16S rRNA amplicon sequencing of drinking water.  During my master’s thesis, I also utilized 0.2 µm filters for water samples, however, in the pursuit of novelty, the filter-size has been reduced to 0.1 µm. This should, as far as I am concerned, capture all microbes inhabiting drinking water (although it is hard not to imagine at least one out of a trillion species slipping through).

Hopefully, I will start to generate full-length 16S rRNA sequences from novel bacteria in the near future and maybe share some interesting findings here on albertsenlab.org.

Can you beat our Nanopore read error correction? We hope so!

Tagging of individual molecules has been used as an effective consensus error-correction strategy for Illumina data (Kivioja et al 2011, Burke et al 2016Zhang et al 2016) and the principle is similar to the circular consensus sequencing strategy used to generate consensus reads with error rate of < 1 % on the PacBio (Travers et al 2010, Schloss et al 2016, Singer et al 2016) and the Oxford Nanopore platforms (Li et al 2016). You simply sequence the same molecule several times and compare the reads to generate a consensus with a better accuracy than the individual reads. As far as we know a tag-based consensus error correction strategy has not been attempted for long reads before, probably because the raw error rate complicates identification of the tags. However, we see several benefits of the tag-based strategy in the long run, which is why we decided to pursue it.

My colleague @SorenKarst tested tag-based error correction on the Nanopore MinION in connection with our work on generating primer free, full length 16S/18S sequences from  environmental rRNA (see our bioRxiv paper: Karst et al 2016). The main approach used in the paper is based on Illumina sequencing inspired by Burke et al 2016, but moving to nanopore sequencing in the future would make the approach considerably easier.  His approach was relatively “simple”; individual cDNA molecules were uniquely tagged at both ends with a 10 bp random sequence, then diluted to a few thousand molecules, amplified by PCR to generate 1000’s of copies of each molecule, which were prepared for 2D sequencing on the Nanopore MinION. The resulting sequence reads were binned based on the unique tags, which  indicated they originated from the same parent molecule, and a consensus was generated from each read bin. The approach was tested on a simple mock community with three reference organisms (E. Coli MG 1655, B. Subtilis str. 168, and P. aeruginosa PAO1), which allowed us to calculate error rates.

For locating the unique tags we used cutadapt with loose settings to locate flanking adaptor sequences and extract the tag sequences. The tags were clustered and filtered based on abundance to remove false tags. As tags and adaptors contain errors, it can be a challenge to cluster the tags correctly without merging groups that do not belong together. Afterwards the filtered tags were used to extract and bin sequence reads using a custom perl script ;). For each bin we used the CANU correction tool followed by USEARCH consensus calling. By this very naive approach we were able to improve the median sequence similarity from 90% to 99%.

We think this is a good start, but we are sure that someone in the nanopore community will be able to come up with a better solution to improve the error rate even further. The data is freely available and a short description of the sequence read composition is provided below. We are looking forward to hear your inputs!

Ps. If you come up with a solution that beats our “quick and dirty” one and post it here or on twitter, I will make sure to mention you in my slides at ASM ;).

 

Data and script availability:

The nanopore reads are available as fastq at: 2D.fq or fasta: 2Dr.fa and fast5: PRJEB20906

The 16S rRNA gene reference sequences: mockrRNAall.fasta

Scripts:

Our approach in a shell script
#############################################################################
# 									    #
# Shell script for generating error corrected FL16S and getting error rates #
#									    #
# Use at your own RISK!							    #
#############################################################################

####################
#     Variables    #
####################
ID_adapt=0.1;
ID_cluster=0.8;
LINKtoCANU=/space/users/rkirke08/Desktop/canu/canu-1.3/Linux-amd64/bin;
# Update path to include poretools installation
# export PATH=$PATH:/space/users/rkirke08/.local/bin
####################
# End of variables #
####################
###############################################
# Depends on the following files and software #
###############################################
# folder with fast5 files "data/pass/"
# file with reference 16S sequences "mockrRNAall.fasta"
# perl script "F16S.cluster.split.pl"
# poretools
# cutadapt
# usearch8.1
# CANU
###############################################
# End of dependencies			      #
###############################################

# Extract fastq files
# poretools fastq --type 2D data/pass/ > data/2D.fq


# Rename headers (Some tools do not accept the long poretools headers)
#awk '{print (NR%4 == 1) ? "@" ++i : $0}' data/2D.fq | sed -n '1~4s/^@/>/p;2~4p' > 2Dr.fa

# Find adapters
cutadapt -g AAAGATGAAGAT -e $ID_adapt -O 12 -m 1300 --untrimmed-output un1.fa -o a1.fa 2Dr.fa
cutadapt -a ATGGATGAGTCT -e $ID_adapt -O 12 -m 1300 --discard-untrimmed -o a1_a2.fa a1.fa

usearch8.1 -fastx_revcomp un1.fa -label_suffix _RC -fastaout un1_rc.fa
cutadapt -g AAAGATGAAGAT -e $ID_adapt -O 12 -m 1300 --discard-untrimmed -o ua1.fa un1_rc.fa
cutadapt -a ATGGATGAGTCT -e $ID_adapt -O 12 -m 1300 --discard-untrimmed -o ua1_a2.fa ua1.fa

cat a1_a2.fa ua1_a2.fa > c.fa

# Extract barcodes
cut -c1-12 c.fa > i1.fa
rev c.fa | cut -c1-12 | rev > i2.fa

paste i1.fa i2.fa -d "" | cut -f1-2 -d ">" > i1i2.fa


# Cluster barcodes
usearch8.1 -cluster_fast i1i2.fa -id $ID_cluster -centroids nr.fa -uc res.uc -sizeout

# Extract raw sequences
perl F16S.cluster.split.pl -c res.uc -i c.fa -m 3 -f 50 -r 40
# Count number of files in directory
find clusters -type f | wc -l

FILES=clusters/*.fa
for OTU in $FILES

do
  wc -l $OTU >> lines.txt	
done

FILES=clusters/*.fa
for OTU in $FILES

do
  OTUNO=$(echo $OTU | cut -f2 -d\/);
  # Rename header
  sed "s/>/>$OTUNO/" clusters/$OTUNO > clusters/newHeaders_$OTUNO

  # Correct reads using CANU
  $LINKtoCANU/canu -correct -p OTU_$OTUNO -d clusters/OTU_$OTUNO genomeSize=1.5k -nanopore-raw  $OTU

  # Unsip corrected reads
  gunzip clusters/OTU_$OTUNO/OTU_$OTUNO.correctedReads.fasta.gz

  sed -i "s/>/>$OTUNO/" clusters/OTU_$OTUNO/OTU_$OTUNO.correctedReads.fasta

  # Call consensus using Usearch
  usearch8.1 -cluster_fast clusters/OTU_$OTUNO/OTU_$OTUNO.correctedReads.fasta -id 0.9 -centroids clusters/OTU_$OTUNO/nr_cor_$OTUNO.fa -uc clusters/OTU_$OTUNO/res_$OTUNO.uc -sizeout -consout clusters/OTU_$OTUNO/Ucons_CANUcor_$OTUNO.fa

  sed -i "s/>/>$OTUNO/" clusters/OTU_$OTUNO/Ucons_CANUcor_$OTUNO.fa

  # Map reads to references to estimate error rate
  # Raw reads
  # Map FL16S back to references
  usearch8.1 -usearch_global clusters/newHeaders_$OTUNO -db mockrRNAall.fasta -strand both -id 0.60 -top_hit_only -maxaccepts 10 -query_cov 0.5 -userout clusters/map_raw_$OTUNO.txt -userfields query+target+id+ql+tl+alnlen

  # Usearch consensus corrected sequence
  # Map FL16S back to references
  usearch8.1 -usearch_global clusters/OTU_$OTUNO/Ucons_CANUcor_$OTUNO.fa -db mockrRNAall.fasta -strand both -id 0.60 -top_hit_only -maxaccepts 10 -query_cov 0.5 -userout clusters/map_cor_Ucons_$OTUNO.txt -userfields query+target+id+ql+tl+alnlen

  cat clusters/map_raw_$OTUNO.txt >> myfile.txt
  cat clusters/map_cor_Ucons_$OTUNO.txt >> myfile.txt

  # Collect corrected sequences
  cat clusters/OTU_$OTUNO/Ucons_CANUcor_$OTUNO.fa >> final_corrected.fa
done

Requirements:

cutadapt

USEARCH

a perl script F16S-cluster.split.pl

cDNA molecule composition

The cDNA molecules are tagged by attaching adaptors to each end of the molecule. The adaptor contains a priming site red, the unique 10 bp tag sequence (blue) and a flanking sequence (black). Note that the design is complicated as we simply modified it from our approach to get Thousands of primer-free, high-quality, full-length SSU rRNA sequences from all domains of life.

AAAGATGAAGATNNNNNNNNNNCGTACTAGACTTGCCTGTCGCTCTATCTTCTTTTTTTTTTTTTTTTTTTT<—- fragment of SSU cDNA molecule—->GGGCAATATCAGCACCAACAGAAATAGATCGCNNNNNNNNNNATGGATGAGTCT

The number of T’s before the fragment will vary between molecules because it is a result of the polyA tailing described in the paper. The black parts of the sequence are generally not needed for the purpose of Nanopore sequencing but are present in the molecule because they were needed for the illumina sequencing.

Example Nanopore sequence read:

>18125
GATCTGGCTTCGTTCGGTTACGTATTGCTGGGGGCAAAGATGAAGATGTTCGTTATTCGTACTAGACTTGCCTGTCGCTCTATCTTCTTTTTGGTCAAGCCTCACGAGCAATTAGTACTGGTTAACTCAACGCCTCACAACGCTTACACACCCAGCCTATCAACGTCGTAGTCTCCGACGGCCCTTCAGGGGAATCAAGTTCCAGTGAGATCTCATCTTGAGGCAAGTTTCCGCTTAGATGCTTTCAGCGGTTATCTTTTCCGAACATGGCTACCCGGCAATGCCACTGGCGTGACAACCGGAACACCAGAGTTCGTCCACCCGGTCCTTCCGTACTAGGAGCAGCCCCTCTCAAATTCAAACGTCCACGGCCAGATGGGGACCGAACTGTCTCACGACGTTCTAAGCCCAGCTCGCGTACCACTTTAAATGGCGAACAGCCATACCCTTAGACCGGCTTCAGCCCCAGGATGTGATGAGCCGACATCGAGGTGCCAAACACCGCCGTCGATAAACTCTTGGGCGGTATCAGCCTGTTATCCCGGAGTACCTTTTATCCGTTGAGCGATGGCCTTCCATACAGAACCACCGGATCTTCAAGACCTACTTTCGTACCTGCTCGACGTGTCTGCTCTGATCAAGCGCTTTTGCCTTTATATTCTCTGCGACCGATTTCCGACCGGTCTGAGCGCACCTTCGTGGTACTCCTCCGTTACTCTTTTAGGAGGAGACCGCCCCAGTCAAACTGCCCACCATACACTGTCCTCGATCCGGATTACGGACCAGAGTTAGAACCTCAAGCATGCCAGGATGGTGATTTCAGGATGGCTCCACGCGAACTGGCGTCCACGCTTCAAAGCCTCCCACCTAATCCTACACAGCAGGCTCAGTCCAGTGCCGCTACAGTAAAGGTTCACGGGGTCTTTCCGTCGCCGCGGATACACTGCATCTTCACAGCGATTTCAATTTCACTGAGTCTCGGGTGGAGACAGCGCCGCCATCGTTACGCCACTCGTGCAGGTCGGAACTTACCCGACAAGGAATTTCGCTACCTTGGACCGTTATCGTTACGGCCGCCGTTTACCGGGGCTAGATCAGGCTTCGCGCCCCATCAATACTTCCGGCACCGGGAGGCGTCACACTTATACGCCGTCCACTTTCGTGTTTTGCAGAGTGCTGTGTTTTTAATAAACAGTCGCAGCGGCCTGGTATCTTCGACCAGCCAGAGCTTACGGAGTAAATCCTTCACCCTAGCCGGCGCACCTTCTCCCGAAGTTACGGTGCCATTTGCCTAGTTCCTTCACCCGAGTTCTCAAGCGCCTTGGTATTCTCTACCCGACCACCTGTGTCGGTTTGGGTGCAGTTCCTGGTGCCTGAAGCTTAGAAGCTTTTGGAAGCATGGCATCAACCACTTCGTCGTCTAAAAGACGACTCGTCATCAACTCTCGGCCTTGAAACCCCGGATTTACCTAAGATTTCAGCCTACCACCTTAAACTTGGGGGCAATATCAGCACCAACAGAAACTCTCTATACCATGGACAATGGATGAGTCTGGGTGGAACGTTCTGTTTATGTTTCTGAAA

Example cluster with same random sequence:

>18125
GATCTGGCTTCGTTCGGTTACGTATTGCTGGGGGCAAAGATGAAGATGTTCGTTATTCGTACTAGACTTGCCTGTCGCTCTATCTTCTTTTTGGTCAAGCCTCACGAGCAATTAGTACTGGTTAACTCAACGCCTCACAACGCTTACACACCCAGCCTATCAACGTCGTAGTCTCCGACGGCCCTTCAGGGGAATCAAGTTCCAGTGAGATCTCATCTTGAGGCAAGTTTCCGCTTAGATGCTTTCAGCGGTTATCTTTTCCGAACATGGCTACCCGGCAATGCCACTGGCGTGACAACCGGAACACCAGAGTTCGTCCACCCGGTCCTTCCGTACTAGGAGCAGCCCCTCTCAAATTCAAACGTCCACGGCCAGATGGGGACCGAACTGTCTCACGACGTTCTAAGCCCAGCTCGCGTACCACTTTAAATGGCGAACAGCCATACCCTTAGACCGGCTTCAGCCCCAGGATGTGATGAGCCGACATCGAGGTGCCAAACACCGCCGTCGATAAACTCTTGGGCGGTATCAGCCTGTTATCCCGGAGTACCTTTTATCCGTTGAGCGATGGCCTTCCATACAGAACCACCGGATCTTCAAGACCTACTTTCGTACCTGCTCGACGTGTCTGCTCTGATCAAGCGCTTTTGCCTTTATATTCTCTGCGACCGATTTCCGACCGGTCTGAGCGCACCTTCGTGGTACTCCTCCGTTACTCTTTTAGGAGGAGACCGCCCCAGTCAAACTGCCCACCATACACTGTCCTCGATCCGGATTACGGACCAGAGTTAGAACCTCAAGCATGCCAGGATGGTGATTTCAGGATGGCTCCACGCGAACTGGCGTCCACGCTTCAAAGCCTCCCACCTAATCCTACACAGCAGGCTCAGTCCAGTGCCGCTACAGTAAAGGTTCACGGGGTCTTTCCGTCGCCGCGGATACACTGCATCTTCACAGCGATTTCAATTTCACTGAGTCTCGGGTGGAGACAGCGCCGCCATCGTTACGCCACTCGTGCAGGTCGGAACTTACCCGACAAGGAATTTCGCTACCTTGGACCGTTATCGTTACGGCCGCCGTTTACCGGGGCTAGATCAGGCTTCGCGCCCCATCAATACTTCCGGCACCGGGAGGCGTCACACTTATACGCCGTCCACTTTCGTGTTTTGCAGAGTGCTGTGTTTTTAATAAACAGTCGCAGCGGCCTGGTATCTTCGACCAGCCAGAGCTTACGGAGTAAATCCTTCACCCTAGCCGGCGCACCTTCTCCCGAAGTTACGGTGCCATTTGCCTAGTTCCTTCACCCGAGTTCTCAAGCGCCTTGGTATTCTCTACCCGACCACCTGTGTCGGTTTGGGTGCAGTTCCTGGTGCCTGAAGCTTAGAAGCTTTTGGAAGCATGGCATCAACCACTTCGTCGTCTAAAAGACGACTCGTCATCAACTCTCGGCCTTGAAACCCCGGATTTACCTAAGATTTCAGCCTACCACCTTAAACTTGGGGGCAATATCAGCACCAACAGAAACTCTCTATACCATGGACAATGGATGAGTCTGGGTGGAACGTTCTGTTTATGTTTCTGAAA
>42262
AGCGTTCAGATTACGTATTGCTAGGGGGCAAAGATGAAGATGTTCGTTATTCTTTAGACTTGCCTGTCGCTCTATCTTCTCTTTTTGGTCAAGCCTCACGGGCAATTAGTACTGGTTAGCTCAACGCCTCACAACGCTTACAGCCTATCAACGTCATAATTCTTCTGACGGCCCTTCAGAATCAAGTTCCCAGTGAGATCTCATCTTGAGCAAGTTTCCCACCGTCTTTCAGCGGTTATCTTTTCGAACCTGCTTCCAGCAATACCACTGGCGTGACAACCGGAACACCAGAGGTTCGTCCACTCCGGTCCTCTCCGTACTAGGGCAGCCCTCTCAAATCTCAGAACGTCCACGGCAGATAGGACCGAACTGTCTCACGACGTTCTAAGCAGCTCGCGTACCACTTTAAATGGCGAACAGCCATGCAGGACCGGCTTCGGCCCCAGGATGTGATGAGCCGGCATCGGGTGCCAAACACCGCCGTCGATATAAACTCGGGCATTGACCTGTTATCCCCGGGTACCTTTTTATCGTTGAGCGATGGCCCTTCCATACCAGAACCACCGGATCACTACAGACCTACTTTCGTACCTGCTCGCTGTCTGTCGCGGCCAAGCGCTTTTGCTATGCTCTGCGACCGATTTCCGACCGGTCTGGGCGCACCTTCGTACTCCGTTGCCTCTTTTGGAGACCGCTGATCAAACTGCCCACCATACACTGTCCTCGATCCGGATTACCAGAGTTTAGAACTCAATGCCAGGGTGGTATTTCAAGGATGGCTCCACGCGAACTGGCGTCCACGCTTCAAAGCCTCCACCTATCCTACAAGCAGGCTCAAAGTCCAGTACAACTACAGTAGGTTCACGGGGTCTTTCCGTCTAGCCGCGGATACCTGCATCTTCAGCGTTTCAATTTCACTGAGTCTCAGGTGGAGACAGCGCCGCCATCGTTACGCCATTCGTGCAGGTCGGAACTTGCCGACAAGGAATTTTGCACCTTGGGACCATTCGTTACGCCGTTTACCGGGGCTGATCAAGAGCTTGCTTGCGCTAACCCCATCAATTAATTTTTCCGGCACCGGGGAGGCGTCACACCTACGTCCCACTGCGTGTTTGCAGAGTGCTGTGTTTAATAAGTCGCAGCAGCTCAGTATCTTCGACCAGCCAGAGCTTACGGAGTAAATCTTCACCTAGCCGGCGACCTTCTCCCGGAAGTTACGGTGCCATTTGCCTAGTTCCTCCGCACCCGAAAGCCCTTCGCGCCTTGGTATTCTCTACCCGACCTGTGTCGGTTTGGGGCACGGTTCCTGGCCTGAAGCAGAAGCTTTTCTTGGAAGCCTGGCATCAACCACTTCGTCATCTAAAAGACGACTCGTCATCAGCTCTCGGCCTTGAAACCGGATTTACCTAAGATTTCAGCCTACCACCTTAAACTTGGGGGCAATATCAGCACCAACAGAAACTCTCTATACCATGGACAATGGATGAGTCTGGTGGAGACGTTCTGTTTATGTTTCTATC
>50101
CCCGGTTACGTATTGCTAGGGGCAAAGATGAAGATGTTCGTTATTCGTACTAGACTTGCCTGTCGCTCTATCTTCTTTTTGGTCAAGCCTGCGGGCAATTATACTGGATAGCTCAACGCCTCACAACGCATACACCCAGCTTCTATCAACGTCGTAGTCTTCGACGGCCCTTCAGGAATCAAGTTCCCAGTGAGATCTCATCTTGAGGCAAGTTTCCCGCTTAGATGCTTTCAGCGGTTATCTTTTCCGAACATAGCTACCCGGCAATGCCACTGGCGTGACAACCGGAACACCAGAGGTTCGTCCACTCCGGTCCTCTCGTACTAGGAGCAGCCTCTCAAATCAAACGTCCACGGCAGATATAGGGACCGAACTGTCTCACGACGTTTCTAAACCCAGCTCGCGTACCACTTTAAATGGCGAACAGCCACCCTTGGGACCGGCTTCAGCCCCAGGATGTGATGAGCCGACATCGGGAACAAACACCGCCGTCGATATAAACTCTTGGGCGGTATCAGCCTGTTATCCCCGGAGTACCTTTTATCCGTTGAGCGATGGCCCTTCCATACAGAACCACCGGATCACTAAGACCTACTTTCGTACCTGCTCGACGTGTCTGTCTCGCAGTCAAGCGCGCTTTTGCTTTATACTCTGCGACCGATTTCCGACCGGTCTGAGCGCACCTTCGTACTCTCCGTTACTCTTTAGGAGACCGCCCCAGTCAAACTGCCCACCATACACTGTCCCTATCGATCCGGATTACGGACAGAGTTAGAACCTCAAGCATGCCAGGGTGGTATTTCAAGGATGGCTCCACGCGAACTGGCGTCCACGCTTCAAAGCCTCCACCTATCCTACACAAGCAGGCTCAAAGTCCAGTGCAAAGCTACAGTAAGGTTCACGGGTCTTTCCGTCTAGCCGCGGATACACTGCATCTCCACAGCGATTTCACCTCACTGAGTCTCTCGGGTGGAGACAGCGCCGCCATCGTTACGCCATTCGTGCAGGTCGGAACTTACCGACAAGGAATTTCGCTACCTTAGACCGTTATCGTTACGGCCGCCGTTTACCGGGGCTTCGATCAAGAGCTTCGCTTGCGCTAACCCCATCAATTAACCTTCGGCACCGGGGAGGCGTCACACCCTATACGTCCACTTTCGTGTTTGCAGAGTGCTGTGGCTTTTAATAAACAGTCGCAGCGGCCTGGTATCTTTTCGACCAGCCAGAGCTTACGGAGTAAATCCTTCACCTTAGCCGGCGCACCTTCTCCCGAAGTTACGGTGCCATTTGCTAGTTCCTTCACCCGAGTTCTCTCAAGCGCCTTGGTATTCTCTACCCGACCACCTGTGTCGGTTTGGGGTACGGTTCTGGTTACCTGAAGCTTAGAAGCTTTTCTTGGAAGCATGGCATCAACCACTTCGTCGTCTAAAGACGACTCGTCATCAGCTCTCGGCCTTGAAACCCCGGATTTACCTAAAGATTTCAGCCTACCACCTTAAACTTAGGGGGCAATATCAGCACCAACAGAAACTCTCTATACCATGGACAATGGATGAGTCTGGGTGGAAGTTCTGTTTATGTTTCTTGAGC

Continue reading