A chromosome-level, haplotype-phased genome assembly for Vanilla planifolia highlights that partial endoreplication challenges accurate whole genome assembly

Piet, Q., Droc, G., Marande, W., Sarah, G., Bocs, S., Klopp, C., Bourge, M., Siljak-Yakovlev, S., Bouchez, O., Lopez-Roques, C., Lepers-Andrzejewski, S., Bourgois, L., Zucca, J., Dron, M., Besse, P., Grisoni, M., Jourda, C., Charron, C., 2022. A chromosome-level, haplotype-phased genome assembly for Vanilla planifolia highlights that partial endoreplication challenges accurate whole genome assembly. Plant Communications. https://doi.org/10.1016/j.xplc.2022.100330

Abstract

Vanilla planifolia, the species cultivated to produce one of the world’s most popular flavors, is highly prone to partial genome endoreplication (PE), which leads to highly unbalanced DNA content in cells. We report here the first molecular evidence of PE at chromosome scale through the assembly and annotation of an accurate haplotype-phased genome of V. planifolia. Cytogenetic data demonstrated that the diploid genome size is 4.09 Gb, with 16 chromosome pairs, although aneuploid cells are frequently observed. Using PacBio HiFi and optical mapping, we assembled and phased a diploid genome of 3.4 Gb with a scaffold N50 of 1.2 Mb and 59,128 predicted protein-coding genes. The atypical k-mer frequencies and the uneven sequencing depth observed agreed with our expectation of unbalanced genome representation. Sixty-seven percent of the genes were scattered over only 30% of the genome, putatively linking gene-rich regions and the endoreplication phenomenon. Conversely, low-coverage (non-endoreplicated) regions were rich in repeated elements but also contained 33% of the annotated genes. Furthermore, this assembly showed distinct haplotype-specific patterns of sequencing depth variation, suggesting a complex molecular regulation of endoreplication along the chromosomes. This high-quality anchored assembly represented 83% of the estimated V. planifolia genome. It provides a significant step towards the elucidation of this complex genome. To support post-genomics efforts, we developed the Vanilla Genome Hub, a user-friendly integrated web portal that allows centralized access to high-throughput genomic and other omics data, and interoperable use of bioinformatics tools.

Published: 9 May 2022