The assignment success for 22 horse breeds registered in the Czech Republic: The machine learning perspective

Štohlová Putnová L., Štohl R. (2021): The assignment success for 22 horse breeds registered in the Czech Republic: The machine learning perspective. Czech J. Anim. Sci., 66: 1–12


This paper demonstrates the reliability of assignment testing for identifying the breed of origin of a horse from molecular marker data, using a collection of real population data covering 22 horse breeds registered in the Czech Republic, including native breeds and genetic resources. Across the 17 microsatellites used, the mean number of alleles per locus is 10.4, and the allele count at individual loci ranges from five (HTG07) to 17 (ASB17). The loci ASB02, ASB23, HMS03, HTG10, and VHL20 exhibit the highest gene diversity and observed heterozygosity (both above 80%); the mean values are 0.77 and 0.73, respectively. A moderate total inbreeding coefficient (5.2%) is estimated across all loci and breeds. The levels of apparent breed differentiation range from zero between the Czech Warmblood and Slovak Warmblood to 0.15 between the Shetland Pony and Standardbred. The phylogenetic relationships among the breeds are shown in a NeighbourNet dendrogram constructed from Reynolds’ genetic distances, which clearly separates the Coldblood draught, Hot/Warmblood, and Pony horses. Our results show that the Bayesian approach (the Rannala and Mountain technique) provides the highest predictive power (83.6%) among the GeneClass tools, and that the Bayes Net algorithm performs best (78.4%) among the options in the WEKA machine learning workbench under five-fold cross-validation. Because the algorithms can be trained on large real reference data sets, machine learning offers another viable approach to horse ancestry testing. Such computational tools could also form the basis of a new web server for the identification of horse breeds.
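To make the per-locus statistics above concrete, the following minimal Python sketch computes the number of alleles, an unbiased gene diversity (expected heterozygosity), and the observed heterozygosity for a single microsatellite locus, together with a simple inbreeding coefficient F = 1 − Ho/He. The function names, the genotype encoding as pairs of allele sizes, and the toy genotypes are assumptions made for illustration only; the estimators actually used in the paper may differ.

```python
from collections import Counter


def locus_diversity(genotypes):
    """Summarise one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual
    (hypothetical encoding; allele labels can be any hashable values).
    Returns (number of alleles, unbiased gene diversity He, observed heterozygosity Ho).
    """
    n = len(genotypes)
    total = 2 * n  # number of gene copies sampled
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    sum_sq = sum((c / total) ** 2 for c in counts.values())
    # Small-sample correction 2n / (2n - 1) applied to 1 - sum(p_i^2)
    he = (total / (total - 1)) * (1.0 - sum_sq)
    # Share of individuals carrying two different alleles
    ho = sum(1 for a, b in genotypes if a != b) / n
    return len(counts), he, ho


def inbreeding_coefficient(ho, he):
    """Heterozygote deficit relative to expectation: F = 1 - Ho / He."""
    return 1.0 - ho / he


# Toy data: four individuals typed at one locus (made-up allele labels)
toy_locus = [(1, 2), (1, 1), (2, 3), (1, 3)]
n_alleles, he, ho = locus_diversity(toy_locus)
print(n_alleles, round(he, 3), ho)

# Rough check using the mean values quoted in the abstract
print(round(inbreeding_coefficient(0.73, 0.77), 3))
```

Applied to the mean values quoted in the abstract, 1 − 0.73/0.77 is approximately 0.052, which is in line with the reported total inbreeding coefficient of 5.2%, although the published estimate may have been obtained with a different estimator.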

Baudouin L, Lebrun P. An operational Bayesian approach for the identification of sexually reproduced cross-fertilized populations using molecular markers. Acta Hortic. 2001;546:81-93.
Bjornstad G, Roed KH. Evaluation of factors affecting individual assignment precision using microsatellite data from horse breeds and simulated breed crosses. Anim Genet. 2002 Aug;33(4):264-70.
Cavalli-Sforza LL, Edwards AWF. Phylogenetic analysis: Models and estimation procedures. Am J Hum Genet. 1967 May;19(3 Pt 1):233-57.
Cornuet JM, Piry S, Luikart G, Estoup A, Solignac M. New methods employing multilocus genotypes to select or exclude populations as origins of individuals. Genetics. 1999 Dec;153(4):1989-2000.
Fages A, Hanghoj K, Khan N, Gaunitz C, Seguin-Orlando A, Leonardi M, McCrory Constantz C, Gamba C, Al-Rasheid KAS, Albizuri S, Alfarhan AH, Allentoft M, Alquraishi S, Anthony D, Baimukhanov N, Barrett JH, Bayarsaikhan J, Benecke N, Bernaldez-Sanchez E, Berrocal-Rangel L, Biglari F, Boessenkool S, Boldgiv B, Brem G, Brown D, Burger J, Crubezy E, Daugnora L, Davoudi H, de Barros Damgaard P, Chorro y de Villa-Ceballos MA, Deschler-Erb S, Detry C, Dill N, do Mar Oom M, Dohr A, Ellingvag S, Erdenebaatar D, Fathi H, Felkel S, Fernandez-Rodriguez C, Garcia-Vinas E, Germonpre M, Granado JD, Hallsson JH, Hemmer H, Hofreiter M, Kasparov A, Khasanov M, Khazaeli R, Kosintsev P, Kristiansen K, Kubatbek T, Kuderna L, Kuznetsov P, Laleh H, Leonard JA, Lhuillier J, von Lettow-Vorbeck CL, Logvin A, Lougas L, Ludwig A, Luis C, Arruda AM, Marques-Bonet T, Silva RM, Merz V, Mijiddorj E, Miller BK, Monchalov O, Mohaseb FA, Morales A, Nieto-Espinet A, Nistelberger H, Onar V, Palsdottir AH, Pitulko V, Pitskhelauri K, Pruvost M, Sikanjic PR, Papesa AR, Roslyakova N, Sardari A, Sauer E, Schafberg R, Scheu A, Schibler J, Schlumbaum A, Serrand N, Serres-Armero A, Shapiro B, Seno SS, Shevnina I, Shidrang S, Southon J, Star B, Sykes N, Taheri K, Taylor W, Teegen WR, Trbojevic Vukicevic T, Trixl S, Tumen D, Undrakhbold S, Usmanova E, Vahdati A, Valenzuela-Lamas S, Viegas C, Wallner B, Weinstock J, Zaibert V, Clavel B, Lepetz S, Mashkour M, Helgason A, Stefansson K, Barrey E, Willerslev E, Outram AK, Librado P, Orlando L. Tracking five millennia of horse management with extensive ancient genome time series. Cell. 2019 May 30;177(6):1419-35.
Fan B, Chen YZ, Moran C, Zhao SH, Liu B, Zhu MJ, Xiong TA, Li K. Individual-breed assignment analysis in swine populations by using microsatellite markers. Asian-Aust J Anim Sci. 2005 Dec 2;18(11):1529-34.
Funk SM, Guedaoura S, Juras R, Raziq A, Landolsi F, Luis C, Martinez AM, Musa Mayaki A, Mujica F, Oom MD, Ouragh L. Major inconsistencies of inferred population genetic structure estimated in a large set of domestic horse breeds using microsatellites. Ecol Evol. 2020 May;10(10):4261-79.
Goldstein DB, Ruiz Linares A, Cavalli-Sforza LL, Feldman MW. Genetic absolute dating based on microsatellites and the origin of modern humans. PNAS. 1995 Jul 18;92(15):6723-7.
Iquebal MA, Ansari MS, Dixit SP, Verma NK, Aggarwal RA, Jayakumar S, Rai A, Kumar D. Locus minimization in breed prediction using artificial neural network approach. Anim Genet. 2014 Dec;45(6):898-902.
Jakobsson M, Rosenberg NA. CLUMPP: A cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics. 2007 Jul 15;23(14):1801-6.
Koskinen M. Individual assignment using microsatellite DNA reveals unambiguous breed identification in the domestic dog. Anim Genet. 2003 Aug;34(4):297-301.
Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA. Circos: An information aesthetic for comparative genomics. Genome Res. 2009 Sep 1;19(9):1639-45.
Librado P, Fages A, Gaunitz C, Leonardi M, Wagner S, Khan N, Hanghoj K, Alquraishi SA, Alfarhan AH, Al-Rasheid KA, Der Sarkissian C, Schubert M, Orlando L. The evolutionary origin and genetic makeup of domestic horses. Genetics. 2016 Oct 1;204(2):423-34.
Morota G, Ventura RV, Silva FF, Koyama M, Fernando SC. Machine learning and data mining advance predictive big data analysis in precision animal agriculture. J Anim Sci. 2018 Apr;96(4):1540-50.
Nei M. Genetic distance between populations. Am Nat. 1972 May-Jun;106(949):283-91.
Nei M. The theory and estimation of genetic distance. In: Morton NE, editor. Genetic structure of populations. Honolulu: University of Hawaii Press; 1973. p. 45-51.
Nei M, Tajima F, Tateno Y. Accuracy of estimated phylogenetic trees from molecular data. J Mol Evol. 1983 Mar;19(2):153-70.
Orlando L, Librado P. Origin and evolution of deleterious mutations in horses. Genes. 2019 Aug 28;10(9): [16 p.].
Paetkau D, Calvert W, Stirling I, Strobeck C. Microsatellite analysis of population structure in Canadian polar bears. Mol Ecol. 1995 Jun;4(3):347-54.
Perez-Enciso M. Animal breeding learning from machine learning. J Anim Breed Genet. 2017 Apr;134(2):85-6.
Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000 Jun 1;155(2):945-59.
Putnova L, Stohl R. Comparing assignment-based approaches to breed identification within a large set of horses. J Appl Genet. 2019 Apr 8;60(2):187-98.
Putnova L, Stohl R, Vrtkova I. Using nuclear microsatellite data to trace the gene flow and population structure in Czech horses. Czech J Anim Sci. 2019 Feb;64(2):67-77.
Rannala B, Mountain JL. Detecting immigration by using multilocus genotypes. PNAS. 1997 Aug 19;94(17):9197-201.
Talle SB, Fimland E, Syrstad O, Meuwissen T, Klungland H. Comparison of individual assignment methods and factors affecting assignment success in cattle breeds using microsatellites. Acta Agric Scand. 2005 Aug;55(2-3):74-9.
Van de Goor LH, van Haeringen WA, Lenstra JA. Population studies of 17 equine STR for forensic and phylogenetic analysis. Anim Genet. 2011 Dec;42(6):627-33.

© 2021 Czech Academy of Agricultural Sciences