Impact of reference population and marker density on accuracy of population imputation

https://doi.org/10.17221/148/2019-CJAS
Citation: Kranjčevičová A., Kašná E., Brzáková M., Přibyl J., Vostrý L. (2019): Impact of reference population and marker density on accuracy of population imputation. Czech J. Anim. Sci., 64: 405-410.

This study determined the effect of reference population size and the proportion of missing single nucleotide polymorphisms (SNPs) on imputation accuracy, using the population imputation method implemented in the FImpute software. The dataset was taken from the database of the Holstein Cattle Breeders Association of the Czech Republic and contains 1000 animals genotyped with the Illumina BovineSNP50 v.2 BeadChip. Two datasets were created: the first contained the original genotypes, including the missing SNPs; the second contained the same genotypes modified to avoid missing data. In each dataset, animals were randomly selected for the reference population (10, 25, 50 and 75%), and SNPs were randomly selected for deletion (15, 30, 55, 70 and 95%) in the animals not used as the reference population. Imputation accuracy was then measured by two parameters: the correlation between original and imputed SNPs and the percentage of correctly imputed SNPs. Because animals and SNPs were selected at random, the whole process, including imputation, was repeated 100 times, and accuracy was reported as the average over all repetitions. Imputation accuracy was influenced by both factors, the size of the reference population and the number of missing SNPs. If the reference population is sufficiently large, the imputation accuracy is higher even when a large number of SNPs are missing.
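
The evaluation loop described above (random reference split, random SNP deletion, two accuracy measures, averaging over repetitions) can be sketched in Python with NumPy. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the genotypes are synthetic 0/1/2 codes, the same SNPs are assumed to be deleted in every non-reference animal, and `impute_placeholder` is a naive most-frequent-genotype stand-in for FImpute, which is a separate program and is not called here.

```python
import numpy as np


def split_and_mask(genotypes, ref_fraction, mask_fraction, rng):
    """Randomly choose reference animals and the SNPs to delete in the others."""
    n_animals, n_snps = genotypes.shape
    n_ref = int(round(ref_fraction * n_animals))
    order = rng.permutation(n_animals)
    ref_idx, target_idx = order[:n_ref], order[n_ref:]
    n_masked = int(round(mask_fraction * n_snps))
    # Assumption: the same SNP columns are deleted in all target animals.
    masked_snps = rng.choice(n_snps, size=n_masked, replace=False)
    return ref_idx, target_idx, masked_snps


def impute_placeholder(genotypes, ref_idx, target_idx, masked_snps):
    """Stand-in imputer (NOT FImpute): most frequent reference genotype per SNP."""
    ref = genotypes[np.ix_(ref_idx, masked_snps)]
    counts = np.stack([(ref == g).sum(axis=0) for g in (0, 1, 2)])
    mode = counts.argmax(axis=0)
    return np.tile(mode, (len(target_idx), 1))


def accuracy(true, imputed):
    """Return (correlation, percentage of correctly imputed genotypes)."""
    pct_correct = 100.0 * np.mean(true == imputed)
    t = true.ravel().astype(float)
    i = imputed.ravel().astype(float)
    if t.std() == 0 or i.std() == 0:
        corr = float("nan")  # correlation is undefined for constant input
    else:
        corr = float(np.corrcoef(t, i)[0, 1])
    return corr, pct_correct


def run_replicates(genotypes, ref_fraction, mask_fraction, n_reps, seed=0):
    """Average both accuracy measures over n_reps random repetitions."""
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(n_reps):
        ref_idx, target_idx, masked = split_and_mask(
            genotypes, ref_fraction, mask_fraction, rng
        )
        true = genotypes[np.ix_(target_idx, masked)]
        imputed = impute_placeholder(genotypes, ref_idx, target_idx, masked)
        results.append(accuracy(true, imputed))
    mean_corr, mean_pct = np.nanmean(np.array(results), axis=0)
    return float(mean_corr), float(mean_pct)


if __name__ == "__main__":
    # Synthetic genotypes: 200 animals x 500 SNPs with per-SNP allele frequencies.
    rng = np.random.default_rng(42)
    freqs = rng.uniform(0.05, 0.95, size=500)
    geno = rng.binomial(2, freqs, size=(200, 500))
    for ref_fraction in (0.10, 0.25, 0.50, 0.75):
        corr, pct = run_replicates(geno, ref_fraction, 0.30, n_reps=100)
        print(f"reference {ref_fraction:.0%}: r = {corr:.3f}, correct = {pct:.1f}%")
```

Because the placeholder ignores haplotype and family information, its accuracy values say nothing about FImpute; the sketch only shows how the two accuracy measures and the averaging over repetitions fit together.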


© 2019 Czech Academy of Agricultural Sciences