Wild pepper (Capsicum rhomboideum)
The wild pepper is a distant relative of the famous chili pepper. Despite its relative's spicy reputation, the wild pepper scores 0 on the Scoville Heat Units (SHU) scale and is not pungent. Read more about wild peppers on Wikipedia.
C. rhomboideum flowers, [CC BY 2.0], photo by Robert Jarret
Chromosome-length genome assembly
Download the Capsicum_rhomboideum.peregrine.haplotigs.purged.3d-dna.jbatted_HiC.pilonPolished.fasta.gz file containing the chromosome-length (2n=26) assembly of the wild pepper genome. All modifications with respect to the draft (see below) are annotated in the Capsicum_rhomboideum.peregrine.haplotigs.purged.3d-dna.jbatted_HiC.pilonPolished.assembly file. Some basic stats associated with the new reference, Capsicum_rhomboideum.peregrine.haplotigs.purged.3d-dna.jbatted_HiC.pilonPolished, are listed below. The full data release can be explored here.
Contig length (bp) | Number of contigs | Contig N50 (bp) | Longest contig (bp) |
---|---|---|---|
1,744,900,923 | 1,650 | 3,221,460 | 14,850,509 |

Scaffold length (bp) | Number of scaffolds | Scaffold N50 (bp) | Longest scaffold (bp) |
---|---|---|---|
1,745,386,887 | 577 | 140,886,271 | 163,073,281 |
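For readers who want to recompute statistics like those in the tables from the downloaded FASTA, the following is a minimal sketch, not the procedure used to produce the numbers above. The helper names (`fasta_lengths`, `n50`, `summarize`) are illustrative. It reports per-sequence (scaffold-level) statistics only; contig-level statistics would additionally require splitting each scaffold at its runs of `N`, which is not shown here.

```python
import gzip
from typing import Dict, Iterable, Iterator


def fasta_lengths(path: str) -> Iterator[int]:
    """Yield the length of each sequence in a FASTA file (plain or .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    length = None  # None until the first header line is seen
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                if length is not None:
                    yield length
                length = 0
            elif length is not None:
                length += len(line.strip())
    if length is not None:
        yield length


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for value in ordered:
        running += value
        if running >= half:
            return value
    return 0  # empty input


def summarize(lengths: Iterable[int]) -> Dict[str, int]:
    """Total length, sequence count, N50, and longest sequence."""
    values = list(lengths)
    return {
        "total": sum(values),
        "count": len(values),
        "n50": n50(values),
        "longest": max(values, default=0),
    }


# Example call (assumes the file has been downloaded to the working directory):
# summarize(fasta_lengths(
#     "Capsicum_rhomboideum.peregrine.haplotigs.purged.3d-dna."
#     "jbatted_HiC.pilonPolished.fasta.gz"))
```

The N50 here follows the common definition: sort sequences from longest to shortest and report the length at which the running total first reaches half of the assembly size.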
Draft
The chromosome-length genome assembly is based on the draft assembly Capsicum_rhomboideum.peregrine.haplotigs.purged, credited below.
The draft genome assembly was generated by Zhenzhen Yang (DNA Zoo, ShanghaiTech) in collaboration with Robert Jarret (USDA), primarily using Pacific Biosciences Continuous Long Reads (CLR). The reads were first corrected using Canu (Nurk et al., 2020) and then assembled using Peregrine (Peregrine Assembler and SHIMMER Genome Assembly Toolkit Copyright (c) 2019- by Jason, Chen-Shan, Chin). Contigs were purged using purge haplotigs (Roach et al., 2018) to generate a clean haploid assembly. 3D assembly was performed using the 3D-DNA pipeline (Dudchenko et al., Science, 2017). The genome was reviewed using Juicebox Assembly Tools (Dudchenko et al., bioRxiv, 2018). Finally, the assembly was polished with Pilon (Walker et al., 2014) to improve its per-base accuracy. We thank the ShanghaiTech High Performance Computing Platform for computational support of this assembly.
Method
3D assembly was performed using the 3D-DNA pipeline (Dudchenko et al., Science, 2017). The genome was reviewed using Juicebox Assembly Tools (Dudchenko et al., bioRxiv, 2018). See Methods for more information.
Hi-C sample
The leaf sample for in situ Hi-C preparation was obtained from Robert Jarret.
Hi-C Contact maps
Hi-C data was aligned to the draft reference using Juicer (Durand, Shamim et al., Cell Systems, 2016), and contact maps visualizing the alignments with respect to the draft and the new reference were built using 3D-DNA (Dudchenko et al., Science, 2017). The contact maps can be explored below via the Juicebox.js interactive tool (Robinson et al., Cell Systems, 2018). To explore the assembly in greater detail, please download the .hic and .assembly files from the data release folder and use Juicebox Assembly Tools (Dudchenko et al., bioRxiv, 2018).
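As a rough illustration of what the .assembly file records, the sketch below parses one into its two parts. It assumes the layout commonly produced by 3D-DNA: header lines of the form `>name index length`, one per contig or contig fragment, followed by one line per scaffold listing signed fragment indices, where a negative sign means reverse orientation. The function names and the toy input are hypothetical, and the computed lengths ignore any gap padding between fragments.

```python
from typing import Dict, List, Tuple

Fragments = Dict[int, Tuple[str, int]]  # index -> (name, length in bp)


def parse_assembly(text: str) -> Tuple[Fragments, List[List[int]]]:
    """Split .assembly text into fragment records and scaffold orderings."""
    fragments: Fragments = {}
    scaffolds: List[List[int]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumed header layout: ">name index length"
            name, index, length = line[1:].rsplit(" ", 2)
            fragments[int(index)] = (name, int(length))
        else:
            # One scaffold per line; the sign encodes orientation
            scaffolds.append([int(token) for token in line.split()])
    return fragments, scaffolds


def scaffold_lengths(fragments: Fragments, scaffolds: List[List[int]]) -> List[int]:
    """Sum fragment lengths per scaffold (gap padding is not included)."""
    return [sum(fragments[abs(i)][1] for i in scaffold) for scaffold in scaffolds]


# Toy input, invented for illustration only
TOY = """>ctg1 1 1000
>ctg2 2 2000
>ctg3 3 500
1 -2
3
"""

frags, scafs = parse_assembly(TOY)
lengths = scaffold_lengths(frags, scafs)
```

In the toy input, the first scaffold joins `ctg1` in forward orientation with `ctg2` reversed, and `ctg3` remains a scaffold of its own.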
References
If you use this genome assembly in your research, please check that the conditions of use associated with the draft permit it, and acknowledge the following work.
The draft genome assembly was generated by Zhenzhen Yang (DNA Zoo, ShanghaiTech) in collaboration with Robert Jarret (USDA), primarily using Pacific Biosciences Continuous Long Reads (CLR). The reads were first corrected using Canu (Nurk et al., 2020) and then assembled using Peregrine (Peregrine Assembler and SHIMMER Genome Assembly Toolkit Copyright (c) 2019- by Jason, Chen-Shan, Chin). Contigs were purged using purge haplotigs (Roach et al., 2018) to generate a clean haploid assembly. 3D assembly was performed using the 3D-DNA pipeline (Dudchenko et al., Science, 2017). The genome was reviewed using Juicebox Assembly Tools (Dudchenko et al., bioRxiv, 2018). Finally, the assembly was polished with Pilon (Walker et al., 2014) to improve its per-base accuracy. We thank the ShanghaiTech High Performance Computing Platform for computational support of this assembly.
Dudchenko, O., Batra, S.S., Omer, A.D., Nyquist, S.K., Hoeger, M., Durand, N.C., Shamim, M.S., Machol, I., Lander, E.S., Aiden, A.P., Aiden, E.L., 2017. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95. https://doi.org/10.1126/science.aal3327.
Dudchenko, O., Shamim, M.S., Batra, S., Durand, N.C., Musial, N.T., Mostofa, R., Pham, M., Hilaire, B.G.S., Yao, W., Stamenova, E., Hoeger, M., Nyquist, S.K., Korchina, V., Pletch, K., Flanagan, J.P., Tomaszewicz, A., McAloose, D., Estrada, C.P., Novak, B.J., Omer, A.D., Aiden, E.L., 2018. The Juicebox Assembly Tools module facilitates de novo assembly of mammalian genomes with chromosome-length scaffolds for under $1000. bioRxiv 254797. https://doi.org/10.1101/254797.
Disclaimer
This is a work in progress. If you notice any discrepancies in the map or have data that confirms or contradicts the suggested reference, please email us at thednazoo@gmail.com or leave a comment on the Forum.