ICCFGG program 2022

POSTER ABSTRACTS

#51 Use of linkage disequilibrium to assess large-scale assembly structure of canine reference genomes

Reuben M. Buckley and Elaine A. Ostrander reuben.buckley@nih.gov

Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA

The accessibility and affordability of long-read sequencing technologies have ended the era of sequential improvement of reference genome assemblies. Researchers now face the difficult task of choosing the optimal reference assembly for their analyses. This choice is non-trivial, because sequencing and assembly errors hinder the discovery of trait and disease genes. Although NCBI already reports various assembly quality metrics, such as contig N50 and BUSCO scores, none of them reflects whether the order of contigs in a reference genome is representative of the population. Here, we use genome-wide linkage disequilibrium (LD) to identify non-representative marker ordering among five long-read canine assemblies. A dataset of 333 free-breeding dogs genotyped on the 170K Illumina CanineHD genotyping array was used to calculate r² for 11.25 billion marker pairs. These markers were then mapped from CanFam3.1 to ROS_Cfam_1.0, UMICH_Zoey_3.1, UNSW_CanFamBas_1.0, UU_Cfam_GSD_1.0, and Dog10K_Boxer_Tasha. In each long-read assembly, we identified changes in marker order relative to CanFam3.1 and used them to determine whether the updated marker placement was consistent with LD. Although some changes in marker order were improvements, marker placement in the long-read assemblies was more frequently inconsistent with LD than consistent with it. In some cases, over 100 reordered markers were inconsistent with LD. Despite vast improvements in overall assembly quality, particularly in gene annotation and the resolution of promoter sequences, these results highlight difficulties inherent in the assembly process. However, careful genome-wide evaluation using orthogonal data can flag regions where an individual reference genome may not be representative of the broader population under analysis.
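The abstract does not state how r² was estimated or how a reordered marker was scored as consistent or inconsistent with LD. As a minimal sketch only, assuming unphased genotypes coded as allele dosages (0, 1, 2) and r² approximated by the squared Pearson correlation of dosages, the Python below computes pairwise r² and applies a hypothetical rule: a moved marker is called consistent with LD if its mean r² with its neighbours in the new assembly order is at least its mean r² with its neighbours in the old (CanFam3.1) order. The function names (`genotype_r2`, `mean_neighbour_r2`, `ld_supports_new_position`), the window size `k`, and the scoring rule are illustrative assumptions, not the authors' method.

```python
import numpy as np


def genotype_r2(g1, g2):
    """Squared Pearson correlation between two markers' allele dosages (0/1/2).

    Samples missing either genotype (NaN) are dropped. Returns NaN when fewer
    than two samples remain or either marker is monomorphic in those samples.
    """
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    keep = ~(np.isnan(g1) | np.isnan(g2))
    x, y = g1[keep], g2[keep]
    if x.size < 2 or x.std() == 0 or y.std() == 0:
        return float("nan")
    r = np.corrcoef(x, y)[0, 1]
    return float(r * r)


def mean_neighbour_r2(genos, order, idx, k=5):
    """Mean r2 between marker `idx` and up to k markers on each side of it in `order`.

    `genos` is a (markers x samples) dosage matrix; `order` lists marker
    indices in their order along one assembly (hypothetical input format).
    """
    pos = order.index(idx)
    neighbours = order[max(0, pos - k):pos] + order[pos + 1:pos + 1 + k]
    vals = [genotype_r2(genos[idx], genos[j]) for j in neighbours]
    if not vals or all(np.isnan(v) for v in vals):
        return float("nan")
    return float(np.nanmean(vals))


def ld_supports_new_position(genos, old_order, new_order, idx, k=5):
    """Hypothetical rule: a moved marker is consistent with LD if its mean r2
    with its new neighbours is at least its mean r2 with its old neighbours.
    A NaN on either side yields False.
    """
    new_ld = mean_neighbour_r2(genos, new_order, idx, k)
    old_ld = mean_neighbour_r2(genos, old_order, idx, k)
    return bool(new_ld >= old_ld)


# Toy example: two uncorrelated LD blocks of three markers each.
a = [0, 0, 2, 2]  # dosages shared by block A (markers 0, 1, 2)
b = [0, 2, 0, 2]  # dosages shared by block B (markers 3, 4, 5)
genos = np.array([a, a, a, b, b, b])
old_order = [0, 1, 2, 3, 4, 5]  # e.g. order in the older assembly
new_order = [0, 1, 3, 4, 2, 5]  # marker 2 moved into block B
print(ld_supports_new_position(genos, old_order, new_order, idx=2, k=1))  # prints False
```

In the toy example, marker 2 sits in perfect LD with block A, so moving it into block B lowers its mean neighbour r² from 0.5 to 0 and the rule flags the new placement as inconsistent with LD. A real analysis would also need to handle marker mapping between assemblies and choose the window size with care.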
