BACKGROUND: Retroposed processed gene transcripts are an important source of material for new gene formation on evolutionary time scales. Most prior work on gene retrocopy discovery involved comparing copies that are present in reference genome assemblies to their source genes. Here, we explore gene retrocopy insertion polymorphisms (GRIPs) that are present in the germlines of individual humans, mice, and chimpanzees, and we identify novel gene retrocopy insertions in cancerous somatic tissues that are absent from patient-matched non-cancer genomes.
RESULTS: Through analysis of whole-genome sequence data, we find evidence for 48 GRIPs in the genomes of one or more humans sequenced as part of the 1000 Genomes Project and The Cancer Genome Atlas, but not present in the human reference assembly. Similarly, we find evidence for 755 GRIPs at distinct locations present in one or more of 17 inbred mouse strains but not present in the mouse reference assembly, and 19 GRIPs across a cohort of 10 chimpanzee genomes not present in the chimpanzee reference genome assembly. Many of these insertions are new members of existing gene families whose source genes are highly and widely expressed, and the majority have detectable hallmarks of processed gene retrocopy formation. We estimate the rate of novel gene retrocopy insertions in humans and chimpanzees at roughly one new gene retrocopy insertion for every 6,000 individuals.
CONCLUSIONS: Gene retrocopy polymorphisms are a widespread phenomenon. We present a multi-species analysis of these events and provide a method for their ascertainment.