ICCFGG program 2022


#40 Characterizing canine non-reference L1 diversity using clustered long reads.
Matthew S. Blacksmith, John V. Moran, and Jeffrey M. Kidd
blacksmi@umich.edu
University of Michigan Medical School, Ann Arbor, MI, USA

Long INterspersed Element-1 (L1) retrotransposons utilize a “copy and paste” mechanism to disperse throughout the genome. L1-derived sequences comprise ~19% of canine genomic DNA. Despite their abundance, canine L1s remain understudied, in part because Illumina reads do not span full-length L1 insertions. The advent of long-read canine genome sequencing has helped overcome the limitations of Illumina sequencing; however, existing long-read L1 detection methods fail to identify some L1 insertions, because available alignment tools frequently map non-reference L1s to reference L1s, generating alternative supplemental L1 sequence alignments that most existing tools ignore. Here, we developed the Alternative long-Read Alignment Clustering to Identify non-reference L1 insertions (ARACIL) pipeline to identify non-reference L1s in PacBio reads assigned to alternative alignment loci. Briefly, reads mapping to alternative alignment loci are extracted and assembled using the Flye, Canu, and Wtdbg2 assemblers. Assembled contigs are then mapped to the reference sequences, and the resultant L1s are assessed for structural hallmarks, including the presence of target site duplications, 3’ poly(A) tails, and 3’ sequence transductions. ARACIL confirmed L1 insertions within 50 bp of 687/704 (97.6%) previously discovered non-reference insertions in a Great Dane genome assembly. Additionally, we identified 2,808 new L1 insertions that are more than 50 bp away from known non-reference insertions. ARACIL will be applied to an additional four breed dogs, one wolf, and one dingo to further characterize the contribution of L1 to canine genome diversity.
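The 50 bp comparison against previously discovered insertions can be illustrated with a small sketch. This is not the ARACIL code: the function names (`index_by_chrom`, `has_neighbor`, `split_calls`), the `(chromosome, position)` tuple representation, and the toy coordinates are all assumptions made for illustration. It only shows one plausible way to label a known insertion as confirmed (a call lies within the window) and a call as new (no known insertion lies within the window).

```python
from bisect import bisect_left
from collections import defaultdict


def index_by_chrom(sites):
    """Group (chrom, pos) tuples into sorted position lists per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, pos in sites:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()
    return by_chrom


def has_neighbor(index, chrom, pos, window=50):
    """Return True if any indexed site on chrom lies within `window` bp of pos."""
    positions = index.get(chrom, [])
    # First indexed position that is >= pos - window.
    i = bisect_left(positions, pos - window)
    return i < len(positions) and positions[i] <= pos + window


def split_calls(calls, known, window=50):
    """Hypothetical helper: return (confirmed known sites, novel calls)."""
    known_idx = index_by_chrom(known)
    call_idx = index_by_chrom(calls)
    confirmed = [k for k in known if has_neighbor(call_idx, k[0], k[1], window)]
    novel = [c for c in calls if not has_neighbor(known_idx, c[0], c[1], window)]
    return confirmed, novel


# Toy coordinates, invented for illustration only.
known = [("chr1", 1000), ("chr1", 5000), ("chr2", 300)]
calls = [("chr1", 1030), ("chr1", 9000), ("chr2", 351)]
confirmed, novel = split_calls(calls, known)
```

With these toy inputs, only the known site at chr1:1000 has a call within 50 bp, and the calls at chr1:9000 and chr2:351 are more than 50 bp from every known site. The reported confirmation rate follows the same arithmetic: 687 of 704 known insertions is 97.6% when rounded to one decimal place.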

