Anna Syme

Click name ↑ to return to homepage

Jellyfish - count kmers to estimate genome size

Quick example

jellyfish count -m 21 -s 100M -t 10 -C <(zcat R1.fq.gz) <(zcat R2.fq.gz)
jellyfish histo -t 10 --high=1000000 mer_counts.jf > reads.histo
# then load histogram in genomescope with k = m, read length = 150, max kmer cov = 1000000

Theory

Theory:

Kmer counting tutorial: https://bioinformatics.uconn.edu/genome-size-estimation-tutorial/#

Jellyfish

Note: Aug 2019: conda install seems to work only with: conda install jellyfish=2.2.3

Steps:

jellyfish count -m 21 -s 100M -t 10 -C reads.fasta

To use gzipped files: and paired-end reads:

jellyfish count -m 21 -s 100M -t 10 -C <zcat R1.fq.gz) <(zcat R2.fq.gz)

Plot the histogram:

jellyfish histo -t 10 mer_counts.jf > reads.histo

Plot the histogram with an x axis of one million instead of default 10,000:

jellyfish histo -t 10 --high=1000000 mer_counts.jf > reads.histo