Journal of Animal Reproduction and Biotechnology 2024; 39(2): 138-144
Published online June 30, 2024
https://doi.org/10.12750/JARB.39.2.138
Copyright © The Korean Society of Animal Reproduction and Biotechnology.
Gwang Hyeon Lee1,2,# , Jae Don Oh1,3,4,# and Hong Sik Kong1,2,3,4,*
1Department of Biotechnology, Hankyong National University, Anseong 17579, Korea
2Hankyong and Genetics, Anseong 17579, Korea
3Gyeonggi Regional Research Center, Hankyong National University, Anseong 17579, Korea
4Genomic Information Center, Hankyong National University, Anseong 17579, Korea
Correspondence to: Hong Sik Kong
E-mail: kebinkhs@hknu.ac.kr
#These authors contributed equally to this work.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Background: As the number of households raising companion dogs increases, the pet genetic analysis market also continues to grow. However, most studies have focused on specific purposes or native breeds. This study aimed to collect genomic data through single nucleotide polymorphism (SNP) chip analysis of companion dogs in South Korea and perform genetic diversity analysis and SNP annotation.
Methods: We collected samples from 95 dogs belonging to 26 breeds, including mixed breeds, in South Korea. The SNP genotypes were obtained for each sample using an Axiom™ Canine HD Array. Quality control (QC) was performed to enhance the accuracy of the analysis. A genetic diversity analysis was performed for each SNP.
Results: After initial filtering and QC, 686,074 SNPs remained; excluding SNPs that showed no genetic diversity left 621,672 SNPs for analysis. The mean minor allele frequency, polymorphism information content, expected heterozygosity, and observed heterozygosity were 0.220, 0.244, 0.301, and 0.261, respectively. SNP annotation indicated that most variants had an uncertain or minimal impact on gene function. However, approximately 16,000 SNPs were predicted either to significantly alter gene function or, as non-synonymous SNPs (nsSNPs) located in exons, to change the translated amino acid.
Conclusions: This study obtained data on SNP genetic diversity and functional SNPs in companion dogs raised in South Korea. The results suggest that establishing an SNP set for individual identification could enable a gene-based registration system. Furthermore, identifying and researching nsSNPs related to behavior and diseases could improve dog care and prevent abandonment.
Keywords: annotation, companion dog, genetic diversity, nsSNP, SNP chip
As the first domesticated animals, dogs have maintained a close relationship with humans from the past to the present and remain our closest companions in daily life (Perri et al., 2021). Today, the concept of the dog has evolved from a pet into a companion animal that lives with humans and provides emotional support. As of 2022, 5.52 million households (25.7%) have pets, and approximately 71.4% of them own dogs (Heo et al., 2023; Hwang and Lee, 2023). The pet-related market is growing globally. The domestic pet-related market in South Korea was worth 2.92 trillion won in 2021 and is expected to reach 4.12 trillion won by 2027 (KREI, 2024). As the pet industry develops, the pet genetic analysis market is also growing. Single-nucleotide polymorphism (SNP) chips are useful tools for genetic analysis. Earlier, only 10-20 SNPs could be analyzed; now, 10,000-2,000,000 SNPs can be analyzed simultaneously, which significantly increases the accuracy of predictions (Ostrander et al., 2017). High-density SNP chips carry SNPs distributed evenly across the entire genome, enabling analyses such as genome-wide association studies (GWAS) to identify candidate genes and quantitative trait locus (QTL) mapping. In South Korea, high-density SNP chips are currently used to estimate genetic ability, improve livestock production, and identify candidate genes associated with improvement (Kim et al., 2021; Kim et al., 2022). SNP chip analyses are also used in genetic research on companion dogs. Representative examples include SNP-based genetic tests to accurately determine dog breeds, bio-healthcare research for the early diagnosis and prevention of common genetic diseases in dogs, and genetic analyses to predict dog behavior and personality. However, most of these studies were conducted abroad or were limited to special-purpose dogs or certain native breeds within South Korea.
The results of such studies are expected to differ depending on the breeds of companion dogs raised in typical households, which highlights the need for research focused on these dogs. Therefore, this study aimed to conduct SNP chip analysis of companion dogs raised in typical households in South Korea to collect genomic information. We performed a genetic diversity analysis and annotated each SNP to characterize the structure of this genomic information. The data collected in this study are expected to serve as a foundation for advancements in the pet genetic analysis industry.
In this study, DNA samples were collected from oral epithelial cells using swabs from 95 dogs across 26 breeds, including mixed-breed dogs, raised in South Korea (Table 1). The samples were gathered at veterinary clinics. DNA extraction was performed using the AccuPrep® Genomic DNA Extraction Kit (BIONEER, Korea), following the manufacturer’s protocol. The swab tip carrying the collected oral epithelial cells was cut off and placed in a 1.5 mL tube. Then, 200 µL of TL buffer, 20 µL of proteinase K, and 10 µL of RNase were added, and the mixture was incubated at 60℃ for 1 h. After the swab was removed, 200 µL of GB buffer was added and the mixture was vortexed. Next, 400 µL of 99% ethanol was added and mixed by pipetting. The mixture was then transferred to a collection tube and washed. Finally, DNA was eluted with 100 µL of EA buffer. SNP genotyping was performed for each individual using the Axiom™ Canine HD Array (Applied Biosystems™, USA), and a total of 730,754 SNPs were obtained using the Axiom Analysis Suite Software (Applied Biosystems™, USA).
Table 1. Breeds and sample counts of the companion dogs used for analysis
Breed | Count | Breed | Count |
---|---|---|---|
Maltese | 20 | Cocker Spaniel | 1 |
Mixed dog | 20 | Italian Greyhound | 1 |
Poodle | 10 | Jindo Dog | 1 |
Shih Tzu | 6 | Long haired Dachshund | 1 |
Bichon Frise | 5 | Miniature Pinscher | 1 |
Pomeranian | 5 | Miniature Schnauzer | 1 |
Yorkshire Terrier | 4 | Pointer | 1 |
Chihuahua | 3 | Schnauzer | 1 |
French Bulldog | 3 | Siberian Husky | 1 |
Dachshund | 2 | Spitz | 1 |
Golden Retriever | 2 | Standard Poodle | 1 |
Border Collie | 1 | Welsh Corgi | 1 |
Chow Chow | 1 | Wheaten Terrier | 1 |
Before performing QC for analysis, we used Python code to remove non-autosomal information from the genomic data, SNPs with chromosome and position information of 0, and In/Dels present on autosomes, to initially select SNPs. Based on the initially selected SNPs, we used Perl code to create ped and map files and performed QC using the PLINK 1.9 program (Purcell et al., 2007; Chang et al., 2015). QC criteria included removing SNPs with a sample call rate < 90%, SNP call rate < 90%, and Hardy-Weinberg equilibrium (HWE)
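The pre-QC marker selection described above can be illustrated with a minimal Python sketch. It is not the authors' script: the column names (`probe_id`, `chrom`, `pos`, `ref`, `alt`), the toy marker IDs, and the rule that an In/Del is any allele other than a single nucleotide are all assumptions made for illustration.

```python
import csv
import io

AUTOSOMES = {str(i) for i in range(1, 39)}  # dog autosomes 1..38
BASES = {"A", "C", "G", "T"}

def keep_snp(row):
    """Return True if a marker passes the three pre-QC filters."""
    chrom, pos = row["chrom"], int(row["pos"])
    if chrom not in AUTOSOMES:   # drops X, Y, MT and chromosome 0
        return False
    if pos == 0:                 # drops markers without a mapped position
        return False
    # Assumed In/Del rule: every allele must be a single nucleotide
    return row["ref"] in BASES and row["alt"] in BASES

# Toy annotation table (hypothetical marker IDs and column names)
toy = """probe_id,chrom,pos,ref,alt
AX-1,1,12345,A,G
AX-2,X,500,C,T
AX-3,0,0,A,C
AX-4,5,777,-,AT
AX-5,38,999,G,T
"""
kept = [r["probe_id"] for r in csv.DictReader(io.StringIO(toy)) if keep_snp(r)]
print(kept)  # ['AX-1', 'AX-5']
```

Only the two autosomal single-nucleotide markers with a known position survive; the same predicate could be applied row by row to the full array annotation before the ped and map files are written.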
To analyze the genetic diversity of the SNP data after QC, we calculated the polymorphic information content (PIC), observed heterozygosity (Ho), expected heterozygosity (He), and minor allele frequency (MAF) of each SNP using the R package snpReady. To perform SNP annotation, we created a Variant Call Format (VCF) file, which stores variant information and is compatible with SnpEff (version 4.3t), using the PLINK 1.9 program (Purcell et al., 2007; Chang et al., 2015). Annotation was performed using SnpEff (version 4.3t) and Canis lupus familiaris genome annotation data, CanFam3.1.86, to identify genes associated with each SNP. SNPs that did not exhibit genetic diversity were excluded from the annotation. Annotation was based on the chromosome number and SNP position of the Axiom™ Canine HD Array (Applied Biosystems™, USA) used in this study. The annotations included information on the associated genes, non-synonymous SNPs (nsSNPs), introns, untranslated regions (UTRs), and changes in the coding amino acids owing to SNP variations. The annotated VCF file was processed using SnpSift (version 4.3t) to extract the required data.
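For readers unfamiliar with the four per-SNP statistics, the sketch below computes them for one biallelic SNP using the standard textbook formulas (He = 2pq, PIC = 1 − (p² + q²) − 2p²q²). It is written in Python for illustration only; the study used the R package snpReady, and it is an assumption that snpReady applies exactly these estimators. The 0/1/2 genotype coding and the function name are likewise hypothetical.

```python
def snp_diversity(genotypes):
    """Diversity statistics for one biallelic SNP.

    genotypes: iterable of 0, 1 or 2 (copies of allele B); None = missing call.
    """
    calls = [g for g in genotypes if g is not None]
    n = len(calls)
    if n == 0:
        raise ValueError("no called genotypes")
    p = sum(calls) / (2 * n)                 # frequency of allele B
    q = 1 - p                                # frequency of allele A
    maf = min(p, q)                          # minor allele frequency
    he = 2 * p * q                           # expected heterozygosity
    ho = sum(1 for g in calls if g == 1) / n # observed heterozygosity
    pic = 1 - (p**2 + q**2) - 2 * p**2 * q**2
    return {"MAF": maf, "He": he, "Ho": ho, "PIC": pic}

# Four called dogs (AA, AB, AB, BB) and one missing call
stats = snp_diversity([0, 1, 1, 2, None])
print(stats)  # {'MAF': 0.5, 'He': 0.5, 'Ho': 0.5, 'PIC': 0.375}
```

Under these formulas a SNP with MAF = 0 has He = PIC = 0; treating such SNPs as the ones that "did not exhibit genetic diversity" is an interpretation, not something the text states explicitly.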
From the 730,754 SNPs obtained through the Axiom™ Canine HD Array (Applied Biosystems™, USA) analysis, we removed non-autosomal SNPs, SNPs with chromosome and position numbers of 0, and In/Dels located on autosomes. This resulted in the initial selection of 691,678 SNPs. We then applied the QC criteria to the selected SNPs and removed those that did not meet the standards, resulting in a final set of 686,074 SNPs used for the study. The changes in the number of SNPs and in the distances between SNPs after QC are summarized in Table 2. The proportion of SNPs removed by QC was highest on chromosome 19 (174 of 16,533 SNPs, 1.05%) and lowest on chromosome 23 (94 of 16,739 SNPs, 0.56%). On average, 0.82% of the SNPs on each chromosome were removed, and the mean distance between adjacent SNPs increased from 3.134 kb to 3.160 kb.
Table 2. Number of SNPs and distance between SNPs before and after quality control
Chromosome no. | No. of SNPs before QC | No. of SNPs after QC | Removal frequency | Mean SNP interval before QC (kb) | Mean SNP interval after QC (kb) |
---|---|---|---|---|---|
1 | 35,715 | 35,432 | 0.79% | 3.435 | 3.462 |
2 | 24,571 | 24,364 | 0.84% | 3.476 | 3.506 |
3 | 28,801 | 28,593 | 0.72% | 3.190 | 3.213 |
4 | 27,514 | 27,311 | 0.74% | 3.208 | 3.231 |
5 | 28,191 | 27,988 | 0.72% | 3.154 | 3.177 |
6 | 23,329 | 23,151 | 0.76% | 3.324 | 3.350 |
7 | 25,125 | 24,966 | 0.63% | 3.222 | 3.243 |
8 | 21,626 | 21,447 | 0.83% | 3.437 | 3.465 |
9 | 17,789 | 17,613 | 0.99% | 3.433 | 3.467 |
10 | 20,569 | 20,416 | 0.74% | 3.369 | 3.395 |
11 | 21,017 | 20,845 | 0.82% | 3.538 | 3.568 |
12 | 23,193 | 22,999 | 0.84% | 3.126 | 3.152 |
13 | 20,295 | 20,111 | 0.91% | 3.116 | 3.145 |
14 | 18,582 | 18,407 | 0.94% | 3.280 | 3.311 |
15 | 19,003 | 18,855 | 0.78% | 3.378 | 3.405 |
16 | 18,208 | 18,062 | 0.80% | 3.271 | 3.298 |
17 | 20,475 | 20,320 | 0.76% | 3.135 | 3.159 |
18 | 16,869 | 16,692 | 1.05% | 3.307 | 3.342 |
19 | 16,533 | 16,359 | 1.05% | 3.250 | 3.285 |
20 | 18,071 | 17,914 | 0.87% | 3.216 | 3.244 |
21 | 15,679 | 15,558 | 0.77% | 3.244 | 3.269 |
22 | 18,843 | 18,668 | 0.93% | 3.257 | 3.288 |
23 | 16,739 | 16,645 | 0.56% | 3.124 | 3.142 |
24 | 15,529 | 15,422 | 0.69% | 3.071 | 3.092 |
25 | 16,242 | 16,116 | 0.78% | 3.178 | 3.203 |
26 | 12,423 | 12,322 | 0.81% | 3.136 | 3.162 |
27 | 15,376 | 15,238 | 0.90% | 2.980 | 3.007 |
28 | 14,034 | 13,935 | 0.71% | 2.933 | 2.953 |
29 | 14,024 | 13,899 | 0.89% | 2.982 | 3.008 |
30 | 13,087 | 12,995 | 0.70% | 3.073 | 3.094 |
31 | 13,417 | 13,284 | 0.99% | 2.973 | 3.002 |
32 | 13,166 | 13,044 | 0.93% | 2.946 | 2.974 |
33 | 10,929 | 10,836 | 0.85% | 2.871 | 2.896 |
34 | 13,498 | 13,378 | 0.89% | 3.120 | 3.148 |
35 | 10,568 | 10,496 | 0.68% | 2.509 | 2.527 |
36 | 11,343 | 11,250 | 0.82% | 2.716 | 2.738 |
37 | 11,242 | 11,160 | 0.73% | 2.747 | 2.767 |
38 | 10,063 | 9,983 | 0.79% | 2.376 | 2.395 |
Total | 691,678 | 686,074 | 0.82% | 3.134 | 3.160 |
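The two derived columns of Table 2 can be reproduced with a few lines of Python. The removal frequency follows directly from the before and after counts; for the mean interval, the sketch assumes it is the span between the first and last SNP on a chromosome divided by the number of gaps, which is not stated in the text. The function names are hypothetical.

```python
def removal_frequency(n_before, n_after):
    """Percentage of SNPs removed by QC."""
    return 100 * (n_before - n_after) / n_before

def mean_interval_kb(positions_bp):
    """Mean distance (kb) between adjacent SNPs on one chromosome (assumed definition)."""
    pos = sorted(positions_bp)
    if len(pos) < 2:
        raise ValueError("need at least two SNPs")
    return (pos[-1] - pos[0]) / (len(pos) - 1) / 1000

print(round(removal_frequency(16533, 16359), 2))  # chromosome 19 -> 1.05
print(round(removal_frequency(16739, 16645), 2))  # chromosome 23 -> 0.56
print(mean_interval_kb([1000, 4000, 10000]))      # toy positions -> 4.5
```

Because removing SNPs leaves the chromosome span almost unchanged while reducing the number of gaps, the mean interval rises slightly after QC, as seen in every row of the table.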
The genetic diversity of the SNPs was assessed by calculating the MAF, PIC, He, and Ho values; the results are presented in Table 3. After excluding 64,402 SNPs that did not exhibit diversity, the genetic diversity of the remaining 621,672 SNPs was assessed. The per-chromosome mean MAF ranged from 0.211 to 0.233, with an overall mean of 0.220. The lowest value was observed on chromosome 37 (0.211) and the highest on chromosome 34 (0.233). We classified SNPs into MAF intervals of 0.05 and examined their distribution (Fig. 1). On average, each interval contained approximately 62,167 SNPs. The interval from 0.05 to 0.1 contained the most SNPs (89,153), whereas the interval from 0.35 to 0.4 contained the fewest (48,118). The PIC values ranged from 0.237 to 0.254, with a mean of 0.244; the lowest value was again observed on chromosome 37 (0.237) and the highest on chromosome 34 (0.254). He ranged from 0.292 to 0.315 and Ho from 0.247 to 0.273. The overall mean He (0.301) was higher than the overall mean Ho (0.261).
Table 3. Information on the genetic diversity of SNPs by chromosome
Chromosome no. | No. | MAF | PIC | He | Ho |
---|---|---|---|---|---|
1 | 31,937 | 0.217 | 0.241 | 0.298 | 0.259 |
2 | 21,843 | 0.216 | 0.241 | 0.297 | 0.253 |
3 | 25,933 | 0.220 | 0.244 | 0.301 | 0.266 |
4 | 24,733 | 0.220 | 0.243 | 0.301 | 0.257 |
5 | 25,475 | 0.219 | 0.243 | 0.301 | 0.259 |
6 | 20,828 | 0.219 | 0.243 | 0.300 | 0.256 |
7 | 22,763 | 0.219 | 0.243 | 0.300 | 0.258 |
8 | 19,208 | 0.223 | 0.246 | 0.305 | 0.266 |
9 | 15,875 | 0.219 | 0.243 | 0.300 | 0.261 |
10 | 18,085 | 0.217 | 0.241 | 0.298 | 0.259 |
11 | 18,716 | 0.217 | 0.240 | 0.297 | 0.261 |
12 | 20,783 | 0.220 | 0.244 | 0.302 | 0.262 |
13 | 18,350 | 0.219 | 0.243 | 0.300 | 0.265 |
14 | 16,690 | 0.219 | 0.243 | 0.300 | 0.263 |
15 | 16,878 | 0.219 | 0.241 | 0.299 | 0.256 |
16 | 16,390 | 0.220 | 0.243 | 0.300 | 0.261 |
17 | 18,304 | 0.219 | 0.242 | 0.300 | 0.258 |
18 | 14,985 | 0.219 | 0.243 | 0.300 | 0.247 |
19 | 14,761 | 0.227 | 0.247 | 0.307 | 0.263 |
20 | 16,080 | 0.214 | 0.238 | 0.294 | 0.253 |
21 | 14,181 | 0.224 | 0.247 | 0.306 | 0.272 |
22 | 16,767 | 0.219 | 0.243 | 0.300 | 0.268 |
23 | 15,171 | 0.220 | 0.244 | 0.302 | 0.258 |
24 | 14,097 | 0.223 | 0.246 | 0.304 | 0.267 |
25 | 14,675 | 0.220 | 0.244 | 0.301 | 0.263 |
26 | 11,329 | 0.225 | 0.247 | 0.306 | 0.264 |
27 | 13,944 | 0.220 | 0.245 | 0.302 | 0.261 |
28 | 12,594 | 0.220 | 0.245 | 0.303 | 0.258 |
29 | 12,785 | 0.227 | 0.249 | 0.309 | 0.269 |
30 | 11,689 | 0.220 | 0.243 | 0.301 | 0.258 |
31 | 12,188 | 0.225 | 0.247 | 0.307 | 0.273 |
32 | 11,998 | 0.223 | 0.245 | 0.304 | 0.255 |
33 | 9,870 | 0.221 | 0.244 | 0.302 | 0.267 |
34 | 12,269 | 0.233 | 0.254 | 0.315 | 0.270 |
35 | 9,881 | 0.226 | 0.249 | 0.309 | 0.263 |
36 | 10,204 | 0.219 | 0.242 | 0.299 | 0.266 |
37 | 10,183 | 0.211 | 0.237 | 0.292 | 0.254 |
38 | 9,230 | 0.220 | 0.243 | 0.300 | 0.255 |
Total | 621,672 | 0.220 | 0.244 | 0.301 | 0.261 |
GD, genetic diversity; MAF, minor allele frequency; PIC, polymorphic information content; He, expected heterozygosity; Ho, observed heterozygosity.
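The classification of SNPs into MAF intervals of 0.05 can be sketched as follows. How the study assigned values lying exactly on an interval boundary is not stated; this Python sketch assumes half-open intervals [lower, upper) with 0.5 placed in the last interval, and the function name is hypothetical.

```python
from collections import Counter

def maf_bin(maf, width=0.05):
    """Label the MAF interval a polymorphic SNP falls into (assumed [lo, hi) bins)."""
    if not 0 < maf <= 0.5:
        raise ValueError("MAF of a polymorphic SNP must lie in (0, 0.5]")
    n_bins = round(0.5 / width)
    # The small epsilon guards against floating-point error at bin edges
    idx = min(int(maf / width + 1e-9), n_bins - 1)
    return f"{idx * width:.2f}-{(idx + 1) * width:.2f}"

toy_mafs = [0.07, 0.08, 0.22, 0.38, 0.5]
counts = Counter(maf_bin(m) for m in toy_mafs)
print(counts["0.05-0.10"])  # 2

# Mean number of SNPs per interval reported in the text
print(621_672 / 10)  # 62167.2
```

With ten intervals, the 621,672 polymorphic SNPs average 62,167 per interval, which is the figure quoted in the text.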
Annotations were added for the 621,672 SNPs categorized based on sequence ontology (SO) terms and putative impacts (HIGH, MODERATE, LOW, and MODIFIER), as summarized in Table 4. Most of the SNPs (95.40%) were classified as “MODIFIER” in terms of putative impact, indicating an uncertain or minimal impact on gene function. The next most prevalent putative impact observed was “MODERATE,” accounting for 2.50% of the total. Additionally, “HIGH” putative impact, indicating a significant impact on gene function, was observed in 0.15%, whereas “LOW” putative impact, indicating a low impact on gene function, was observed in 1.95%.
Table 4. Number of SNPs classified by sequence ontology term after annotation
Sequence ontology term | Putative impact | No. |
---|---|---|
Stop_gained | HIGH | 746 |
Splice_acceptor_variant | HIGH | 77 |
Splice_donor_variant | HIGH | 64 |
Stop_lost | HIGH | 26 |
Start_lost | HIGH | 16 |
Missense_variant | MODERATE | 15,572 |
Synonymous_variant | LOW | 9,787 |
5_Prime_UTR_premature_start_codon_gain_variant | LOW | 358 |
Splice_region_variant | LOW | 1,941 |
Stop_retained_variant | LOW | 7 |
Initiator_codon_variant | LOW | 15 |
Intron_variant | MODIFIER | 217,002 |
Intergenic_region | MODIFIER | 355,721 |
5_Prime_UTR_variant | MODIFIER | 2,290 |
3_Prime_UTR_variant | MODIFIER | 16,301 |
Non_coding_transcript_exon_variant | MODIFIER | 1,749 |
Total | | 621,672 |
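The impact percentages quoted before Table 4 follow from the per-term counts. The Python sketch below simply re-aggregates the counts copied from the table; the variable names are illustrative and no annotation tool is invoked.

```python
from collections import defaultdict

# (SO term, putative impact, SNP count) copied from Table 4
TABLE4 = [
    ("stop_gained", "HIGH", 746),
    ("splice_acceptor_variant", "HIGH", 77),
    ("splice_donor_variant", "HIGH", 64),
    ("stop_lost", "HIGH", 26),
    ("start_lost", "HIGH", 16),
    ("missense_variant", "MODERATE", 15_572),
    ("synonymous_variant", "LOW", 9_787),
    ("5_prime_UTR_premature_start_codon_gain_variant", "LOW", 358),
    ("splice_region_variant", "LOW", 1_941),
    ("stop_retained_variant", "LOW", 7),
    ("initiator_codon_variant", "LOW", 15),
    ("intron_variant", "MODIFIER", 217_002),
    ("intergenic_region", "MODIFIER", 355_721),
    ("5_prime_UTR_variant", "MODIFIER", 2_290),
    ("3_prime_UTR_variant", "MODIFIER", 16_301),
    ("non_coding_transcript_exon_variant", "MODIFIER", 1_749),
]

totals = defaultdict(int)
for _term, impact, count in TABLE4:
    totals[impact] += count

grand_total = sum(totals.values())
shares = {impact: round(100 * n / grand_total, 2) for impact, n in totals.items()}
print(grand_total)  # 621672
print(shares)       # {'HIGH': 0.15, 'MODERATE': 2.5, 'LOW': 1.95, 'MODIFIER': 95.4}
```

The HIGH and MODERATE classes together contain 929 + 15,572 = 16,501 SNPs, which corresponds to the "over 16,000" functional SNPs mentioned elsewhere in the article.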
The “MODERATE” putative impact category consists of the missense_variant SO term: variants located in a gene’s exon that change the encoded amino acid, i.e., nsSNPs. The numbers of nsSNPs and of their associated genes are tabulated by chromosome in Table 5. Each chromosome carried 410 nsSNPs on average; chromosome 1 had the most (949) and chromosome 29 the fewest (155). Chromosome 1 also had the highest number of genes associated with nsSNPs (507), whereas chromosome 36 had the lowest (70).
Table 5. Number of non-synonymous SNPs and associated genes by chromosome
Chromosome no. | No. of nsSNPs | No. of associated genes |
---|---|---|
1 | 949 | 507 |
2 | 524 | 301 |
3 | 459 | 227 |
4 | 460 | 213 |
5 | 736 | 394 |
6 | 611 | 351 |
7 | 626 | 294 |
8 | 421 | 225 |
9 | 753 | 425 |
10 | 405 | 248 |
11 | 424 | 216 |
12 | 559 | 265 |
13 | 316 | 160 |
14 | 300 | 139 |
15 | 361 | 184 |
16 | 385 | 191 |
17 | 475 | 237 |
18 | 597 | 328 |
19 | 193 | 99 |
20 | 670 | 388 |
21 | 491 | 251 |
22 | 184 | 103 |
23 | 282 | 140 |
24 | 352 | 206 |
25 | 359 | 153 |
26 | 359 | 192 |
27 | 482 | 234 |
28 | 345 | 171 |
29 | 155 | 91 |
30 | 375 | 180 |
31 | 216 | 109 |
32 | 256 | 114 |
33 | 314 | 128 |
34 | 198 | 106 |
35 | 195 | 101 |
36 | 280 | 70 |
37 | 233 | 117 |
38 | 272 | 126 |
Total | 15,572 | 7,984 |
The SNP counts and numbers of associated genes for the SO terms stop_gained, stop_lost, and start_lost, which have a “HIGH” putative impact and affect start and stop codons, are summarized by chromosome in Table 6. The highest number of SNPs annotated as stop_gained was observed on chromosome 9 (40 SNPs), whereas chromosomes 14 and 31 had the lowest counts, with seven SNPs each. Chromosome 9 also had the highest number of associated genes (37), whereas chromosomes 23 and 31 had the lowest (6 each). For the stop_lost and start_lost SO terms, at most two SNPs per chromosome were observed, and some chromosomes carried no such SNPs.
Table 6. SNPs associated with start and stop codons and their related genes
Chromosome no. | Stop_gained SNP no. | Stop_gained gene no. | Stop_lost SNP no. | Stop_lost gene no. | Start_lost SNP no. | Start_lost gene no. |
---|---|---|---|---|---|---|
1 | 30 | 29 | 1 | 1 | - | - |
2 | 24 | 23 | 1 | 1 | - | - |
3 | 29 | 27 | - | - | 1 | 1 |
4 | 21 | 21 | - | - | - | - |
5 | 33 | 32 | - | - | - | - |
6 | 33 | 31 | 2 | 2 | - | - |
7 | 22 | 22 | 1 | 1 | - | - |
8 | 23 | 20 | - | - | - | - |
9 | 40 | 37 | 1 | 1 | 2 | 2 |
10 | 28 | 28 | 2 | 2 | 1 | 1 |
11 | 22 | 22 | 2 | 2 | - | - |
12 | 27 | 25 | - | - | 2 | 2 |
13 | 20 | 19 | 1 | 1 | 1 | 1 |
14 | 7 | 7 | - | - | - | - |
15 | 20 | 19 | 1 | 1 | 1 | 1 |
16 | 19 | 18 | 1 | 1 | - | - |
17 | 36 | 34 | - | - | 1 | 1 |
18 | 25 | 23 | 2 | 2 | 1 | 1 |
19 | 8 | 8 | 1 | 1 | 1 | 1 |
20 | 29 | 27 | 1 | 1 | - | - |
21 | 21 | 20 | 2 | 2 | - | - |
22 | 14 | 13 | - | - | - | - |
23 | 8 | 6 | 1 | 1 | - | - |
24 | 24 | 23 | 2 | 2 | 1 | 1 |
25 | 24 | 23 | - | - | - | - |
26 | 13 | 13 | 1 | 1 | 1 | 1 |
27 | 20 | 20 | 1 | 1 | - | - |
28 | 14 | 12 | - | - | - | - |
29 | 11 | 11 | - | - | - | - |
30 | 13 | 12 | - | - | 1 | 1 |
31 | 7 | 6 | 1 | 1 | 1 | 1 |
32 | 13 | 12 | 1 | 1 | - | - |
33 | 12 | 12 | - | - | - | - |
34 | 9 | 9 | - | - | 1 | 1 |
35 | 14 | 14 | - | - | - | - |
36 | 12 | 8 | - | - | - | - |
37 | 9 | 9 | - | - | - | - |
38 | 12 | 12 | - | - | - | - |
Total | 746 | 707 | 26 | 26 | 16 | 16 |
We conducted SNP chip analysis on 95 dogs from 26 breeds raised in South Korea to secure genomic data, and performed QC to ensure the accuracy of the analysis. The 686,074 SNPs that passed QC were then used to assess genetic diversity and, through annotation, the functional impact of the SNPs. Genetic diversity was calculated for each SNP; among the chromosomes, chromosome 34 exhibited the highest genetic diversity and chromosome 37 the lowest. PIC values are classified as follows: a marker with a PIC of 0.5 or above is considered highly informative, 0.25 to 0.5 moderately informative, and below 0.25 slightly informative (Botstein et al., 1980). The mean PIC value calculated for all SNPs was 0.244, and SNPs with PIC values of 0.25 or higher accounted for 53.80% of all SNPs. These results could help construct an SNP marker set for the identification of companion dogs. SNPs that did not exhibit genetic diversity were removed, and the remaining 621,672 SNPs were annotated with associated genes and SNP effects. Most SNPs appear to have either undetermined or minimal effects on gene function. However, over 16,000 SNPs were predicted either to significantly alter gene function or, as nsSNPs located in exons, to change the translated amino acid. Such SNPs could alter protein structure and, if harmful, may lead to disease. Therefore, further studies on the deleterious effects of nsSNPs are required.
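The PIC thresholds attributed to Botstein et al. (1980) can be expressed as a small classifier, which is one way a candidate marker set for individual identification might be pre-screened. This is an illustrative Python sketch; the function names and the toy PIC values are hypothetical and are not taken from the study data.

```python
def classify_pic(pic):
    """Marker informativeness class using the thresholds quoted in the text."""
    if pic >= 0.5:
        return "high"
    if pic >= 0.25:
        return "intermediate"
    return "low"

def share_at_least(pics, threshold=0.25):
    """Percentage of markers whose PIC reaches the threshold."""
    return 100 * sum(p >= threshold for p in pics) / len(pics)

toy_pics = [0.10, 0.26, 0.375, 0.24]  # hypothetical per-SNP PIC values
print([classify_pic(p) for p in toy_pics])  # ['low', 'intermediate', 'intermediate', 'low']
print(share_at_least(toy_pics))             # 50.0
```

For a biallelic SNP the PIC cannot exceed 0.375 (reached when both alleles have frequency 0.5), so individual SNPs can fall only into the low or intermediate class; discriminating power for identification therefore has to come from combining many SNPs.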
As the number of households raising companion dogs continues to increase, the issue of stray dogs is also becoming more prevalent (Ko et al., 2020). Government spending on the management of stray animals reportedly increased from 10.44 billion won in 2014 to 26.7 billion won in 2020 (Yoo and Bae, 2022). As a measure against stray animals, the government has implemented a pet registration system using both external and internal microchips. However, external chips are prone to loss, and pet owners are often reluctant to have internal chips implanted in their animals. Common reasons for abandoning animals include behavioral issues such as barking, biting, and aggression, as well as odor and the financial burden of disease. In this study, we gathered information on SNP genetic diversity and SNP data related to genetic functions in domestically raised companion dogs. Based on these results, the establishment of an SNP set for individual identification could enable a gene-based registration system. Furthermore, the discovery and functional analysis of nsSNPs associated with behavior and diseases are expected to improve care for companion animals, potentially reducing the likelihood of abandonment.
Conceptualization, H.S.K.; methodology, G.H.L., J.D.O., H.S.K.; investigation, G.H.L., J.D.O.; writing - original draft preparation, G.H.L., J.D.O.; writing - review and editing, G.H.L., J.D.O., H.S.K.; supervision, H.S.K.; project administration, H.S.K.; funding acquisition, H.S.K.
No potential conflict of interest relevant to this article was reported.
Journal of Animal Reproduction and Biotechnology 2024; 39(2): 138-144
Published online June 30, 2024 https://doi.org/10.12750/JARB.39.2.138
Copyright © The Korean Society of Animal Reproduction and Biotechnology.
Gwang Hyeon Lee1,2,# , Jae Don Oh1,3,4,# and Hong Sik Kong1,2,3,4,*
1Department of Biotechnology, Hankyong National University, Anseong 17579, Korea
2Hankyong and Genetics, Anseong 17579, Korea
3Gyeonggi Regional Research Center, Hankyong National University, Anseong 17579, Korea
4Genomic Information Center, Hankyong National University, Anseong 17579, Korea
Correspondence to:Hong Sik Kong
E-mail: kebinkhs@hknu.ac.kr
#These authors contributed equally to this work.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Background: As the number of households raising companion dogs increases, the pet genetic analysis market also continues to grow. However, most studies have focused on specific purposes or native breeds. This study aimed to collect genomic data through single nucleotide polymorphism (SNP) chip analysis of companion dogs in South Korea and perform genetic diversity analysis and SNP annotation.
Methods: We collected samples from 95 dogs belonging to 26 breeds, including mixed breeds, in South Korea. The SNP genotypes were obtained for each sample using an Axiom™ Canine HD Array. Quality control (QC) was performed to enhance the accuracy of the analysis. A genetic diversity analysis was performed for each SNP.
Results: QC initially selected SNPs, and after excluding non-diverse ones, 621,672 SNPs were identified. Genetic diversity analysis revealed minor allele frequencies, polymorphism information content, expected heterozygosity, and observed heterozygosity values of 0.220, 0.244, 0.301, and 0.261, respectively. The SNP annotation indicated that most variations had an uncertain or minimal impact on gene function. However, approximately 16,000 non-synonymous SNPs (nsSNPs) have been found to significantly alter gene function or affect exons by changing translated amino acids.
Conclusions: This study obtained data on SNP genetic diversity and functional SNPs in companion dogs raised in South Korea. The results suggest that establishing an SNP set for individual identification could enable a gene-based registration system. Furthermore, identifying and researching nsSNPs related to behavior and diseases could improve dog care and prevent abandonment.
Keywords: annotation, companion dog, genetic diversity, nsSNP, SNP chip
As the first domesticated animals, dogs have maintained a close relationship with humans from the past to the present, remaining our closest companions in daily life (Perri et al., 2021). Today, this concept has evolved from pets to companion animals that provide emotional support and live with humans. As of 2022, 5.52 million households (25.7%) have pets, and approximately 71.4% of them own dogs (Heo et al., 2023; Hwang and Lee, 2023). The pet-related market is growing globally. The domestic pet-related market size in South Korea was 2.92 trillion won in 2021, and is expected to increase to 4.12 trillion won by 2027 (KREI, 2024). With the development of the pet industry, the pet genetic analysis market is growing. Single-nucleotide polymorphism (SNP) analysis chips are useful tools for genetic analysis. Earlier, only 10-20 SNPs could be analyzed; however, now 10,000-2,000,000 SNPs can be analyzed simultaneously, thereby significantly increasing the accuracy of predictions (Ostrander et al., 2017). High-density SNP chips have SNPs evenly distributed across the entire genome, enabling analyses, such as Genome-Wide Association Studies, to identify associated candidate genes and Quantitative Trait Locus mapping. Currently, in South Korea, high-density SNP chips are used to estimate genetic ability, improve livestock production, and identify candidate genes associated with improvement (Kim et al., 2021; Kim et al., 2022). Such SNP chip analyses are also used in genetic research on companion dogs. The following are some examples of representative genetic analyses: SNP-based genetic tests to accurately determine dog breeds, bio-healthcare research for the early diagnosis and prevention of common genetic diseases in dogs, and genetic analysis studies to predict dog behavior and personality. However, most of these studies were conducted abroad or limited to special-purpose dogs or certain native breeds within South Korea. 
Such studies are expected to differ based on the breeds of companion dogs raised in typical households, highlighting the need for research that focuses on companion dogs raised in ordinary households. Therefore, this study aimed to conduct SNP chip analysis of companion dogs raised in typical households in South Korea to collect genomic information. We have performed a genetic diversity analysis and annotation of each SNP to conduct a structural analysis of the genomic information. The data collected in this study is expected to serve as a foundation for advancements in the pet genetic analysis industry.
In this study, DNA samples were collected from oral epithelial cells using swabs from 95 dogs across 26 breeds, including mixed-breed dogs, raised in South Korea (Table 1). These samples were gathered at veterinary clinics. DNA extraction was performed using the AccuPrep® Genomic DNA Extraction Kit (BIONEER, Korea), following the manufacturer’s protocol. The swab with collected oral epithelial cells was cut into a 1.5 mL tube. Then, 200 µL of TL buffer, 20 µL of proteinase K, and 10 µL of RNase were added, and the mixture was incubated at 60℃ for 1 h. After removing the swab, 200 µL of GB buffer was added and then vortexed. Then, 400 µL of 99% ethanol was added and mixed using a pipette. Subsequently, the mixture was dispensed into a collection tube, followed by a washing process. Finally, DNA was extracted using 100 µL of EA buffer. SNP genotyping was performed for each individual using the AxiomTM Canine HD Array (Applied BiosystemsTM, USA), and a total of 730,754 SNPs were obtained using the Axiom Analysis Suite Software (Applied BiosystemsTM, USA).
Table 1. Sample table used for analysis with companion dogs.
Breed | Count | Breed | Count |
---|---|---|---|
Maltese | 20 | Cocker Spaniel | 1 |
Mixed dog | 20 | Italian Greyhound | 1 |
Poodle | 10 | Jindo Dog | 1 |
Shih Tzu | 6 | Long haired Dachshund | 1 |
Bichon Frise | 5 | Miniature Pinscher | 1 |
Pomeranian | 5 | Miniature Schnauzer | 1 |
Yorkshire Terrier | 4 | Pointer | 1 |
Chihuahua | 3 | Schnauzer | 1 |
French Bulldog | 3 | Siberian Husky | 1 |
Dachshund | 2 | Spitz | 1 |
Golden Retriever | 2 | Standard Poodle | 1 |
Border Collie | 1 | Welsh Corgi | 1 |
Chow Chow | 1 | Wheaten Terrier | 1 |
Before performing QC for analysis, we used Python code to remove non-autosomal information from the genomic data, SNPs with chromosome and position information of 0, and In/Dels present on autosomes, to initially select SNPs. Based on the initially selected SNPs, we used Perl code to create ped and map files and performed QC using the PLINK 1.9 program (Purcell et al., 2007; Chang et al., 2015). QC criteria included removing SNPs with a sample call rate < 90%, SNP call rate < 90%, and Hardy-Weinberg equilibrium (HWE)
To analyze the genetic diversity of the SNP data after QC, we calculated the polymorphic information content (PIC), observed heterozygosity (Ho), expected heterozygosity (He), and minor allele frequency (MAF) of each SNP using the R package snpReady. To perform SNP annotation, we created a Variant Call Format (VCF) file, which stores variant information and is compatible with SnpEff (version 4.3t), using the PLINK 1.9 program (Purcell et al., 2007; Chang et al., 2015). Annotation was performed using SnpEff (version 4.3t) and Canis lupus familiaris genome annotation data, CanFam3.1.86, to identify genes associated with each SNP. SNPs that did not exhibit genetic diversity were excluded from the annotation. Annotation was based on the chromosome number and SNP position of the Axiom Canine HD Array (Applied BiosystemsTM, USA) used in this study. The annotations included information on the associated genes, non-synonymous SNPs (nsSNPs), introns, untranslated regions (UTRs), and changes in the coding amino acids owing to SNP variations. The annotated VCF file was processed using SnpSift (version 4.3t) to extract the required data.
From the 730,754 SNPs obtained through the AxiomTM Canine HD Array (Applied BiosystemsTM, USA) analysis, we removed non-autosomal SNPs, SNPs with chromosome and position numbers of 0, and In/Dels located on autosomes. This resulted in the initial selection of 691,678 SNPs. We then applied QC criteria to the selected SNPs and removed those that did not meet the standards, resulting in a final set of 686,074 SNPs used for the study. The changes in the number of SNPs and the distances between SNPs according to the QC results are summarized in Table 2. The chromosome with the most removed SNPs after QC was chromosome 19, whereas the chromosome with the fewest removed SNPs was chromosome 23. Overall, a mean of 0.82% of the SNPs was removed, and the average distance between all SNPs increased from 3.134 kb to 3.160 kb.
Table 2. Number of SNPs and distance between SNPs before and after quality control.
Chromosome no. | No. of SNPs before QC | No. of SNPs after QC | Removal frequency | Mean SNP interval before QC (kb) | Mean SNP interval after QC (kb) |
---|---|---|---|---|---|
1 | 35,715 | 35,432 | 0.79% | 3.435 | 3.462 |
2 | 24,571 | 24,364 | 0.84% | 3.476 | 3.506 |
3 | 28,801 | 28,593 | 0.72% | 3.190 | 3.213 |
4 | 27,514 | 27,311 | 0.74% | 3.208 | 3.231 |
5 | 28,191 | 27,988 | 0.72% | 3.154 | 3.177 |
6 | 23,329 | 23,151 | 0.76% | 3.324 | 3.350 |
7 | 25,125 | 24,966 | 0.63% | 3.222 | 3.243 |
8 | 21,626 | 21,447 | 0.83% | 3.437 | 3.465 |
9 | 17,789 | 17,613 | 0.99% | 3.433 | 3.467 |
10 | 20,569 | 20,416 | 0.74% | 3.369 | 3.395 |
11 | 21,017 | 20,845 | 0.82% | 3.538 | 3.568 |
12 | 23,193 | 22,999 | 0.84% | 3.126 | 3.152 |
13 | 20,295 | 20,111 | 0.91% | 3.116 | 3.145 |
14 | 18,582 | 18,407 | 0.94% | 3.280 | 3.311 |
15 | 19,003 | 18,855 | 0.78% | 3.378 | 3.405 |
16 | 18,208 | 18,062 | 0.80% | 3.271 | 3.298 |
17 | 20,475 | 20,320 | 0.76% | 3.135 | 3.159 |
18 | 16,869 | 16,692 | 1.05% | 3.307 | 3.342 |
19 | 16,533 | 16,359 | 1.05% | 3.250 | 3.285 |
20 | 18,071 | 17,914 | 0.87% | 3.216 | 3.244 |
21 | 15,679 | 15,558 | 0.77% | 3.244 | 3.269 |
22 | 18,843 | 18,668 | 0.93% | 3.257 | 3.288 |
23 | 16,739 | 16,645 | 0.56% | 3.124 | 3.142 |
24 | 15,529 | 15,422 | 0.69% | 3.071 | 3.092 |
25 | 16,242 | 16,116 | 0.78% | 3.178 | 3.203 |
26 | 12,423 | 12,322 | 0.81% | 3.136 | 3.162 |
27 | 15,376 | 15,238 | 0.90% | 2.980 | 3.007 |
28 | 14,034 | 13,935 | 0.71% | 2.933 | 2.953 |
29 | 14,024 | 13,899 | 0.89% | 2.982 | 3.008 |
30 | 13,087 | 12,995 | 0.70% | 3.073 | 3.094 |
31 | 13,417 | 13,284 | 0.99% | 2.973 | 3.002 |
32 | 13,166 | 13,044 | 0.93% | 2.946 | 2.974 |
33 | 10,929 | 10,836 | 0.85% | 2.871 | 2.896 |
34 | 13,498 | 13,378 | 0.89% | 3.120 | 3.148 |
35 | 10,568 | 10,496 | 0.68% | 2.509 | 2.527 |
36 | 11,343 | 11,250 | 0.82% | 2.716 | 2.738 |
37 | 11,242 | 11,160 | 0.73% | 2.747 | 2.767 |
38 | 10,063 | 9,983 | 0.79% | 2.376 | 2.395 |
Total | 691,678 | 686,074 | 0.82% | 3.134 | 3.160 |
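The removal frequencies in Table 2 can be checked with a short calculation. The sketch below uses counts copied from the table for three chromosomes and for the Total row; the function name `removal_rate` is a hypothetical helper.

```python
def removal_rate(before: int, after: int) -> float:
    """Percentage of SNPs removed by QC."""
    return 100.0 * (before - after) / before


# (before QC, after QC) counts taken from Table 2
counts = {"1": (35715, 35432), "19": (16533, 16359), "23": (16739, 16645)}
rates = {chrom: round(removal_rate(b, a), 2) for chrom, (b, a) in counts.items()}

# Pooled rate over all autosomes, from the Total row
pooled = round(removal_rate(691678, 686074), 2)
```

The per-chromosome values reproduce the table (0.79%, 1.05%, and 0.56%). The pooled rate computed from the Total row is 0.81%, slightly below the 0.82% reported, which is consistent with the reported figure being the unweighted mean of the 38 per-chromosome rates.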
The genetic diversity of the SNPs was assessed by calculating the MAF, PIC, He, and Ho values, and the results are presented in Table 3. After excluding 64,402 SNPs that did not exhibit diversity, the genetic diversity of the remaining 621,672 SNPs was assessed. The chromosome-level mean MAF ranged from 0.211 to 0.233, with an overall mean of 0.220. The lowest value was observed on chromosome 37 (0.211), whereas the highest value was observed on chromosome 34 (0.233). We classified SNPs into MAF intervals of 0.05 and examined their distribution (Fig. 1). On average, each interval contained approximately 62,167 SNPs. The highest number of SNPs, 89,153, was observed in the MAF range of 0.05 to 0.1, whereas the lowest, 48,118 SNPs, was observed in the range of 0.35 to 0.4. The PIC values ranged from 0.237 to 0.254, with a mean of 0.244. The lowest PIC value was observed on chromosome 37 (0.237), whereas the highest value was observed on chromosome 34 (0.254). He ranged from 0.292 to 0.315 and was higher than Ho, which ranged from 0.247 to 0.273. The overall mean values of He and Ho were 0.301 and 0.261, respectively.
Table 3. Information on the genetic diversity of SNPs by chromosome.
Chromosome no. | No. | MAF | PIC | He | Ho |
---|---|---|---|---|---|
1 | 31,937 | 0.217 | 0.241 | 0.298 | 0.259 |
2 | 21,843 | 0.216 | 0.241 | 0.297 | 0.253 |
3 | 25,933 | 0.220 | 0.244 | 0.301 | 0.266 |
4 | 24,733 | 0.220 | 0.243 | 0.301 | 0.257 |
5 | 25,475 | 0.219 | 0.243 | 0.301 | 0.259 |
6 | 20,828 | 0.219 | 0.243 | 0.300 | 0.256 |
7 | 22,763 | 0.219 | 0.243 | 0.300 | 0.258 |
8 | 19,208 | 0.223 | 0.246 | 0.305 | 0.266 |
9 | 15,875 | 0.219 | 0.243 | 0.300 | 0.261 |
10 | 18,085 | 0.217 | 0.241 | 0.298 | 0.259 |
11 | 18,716 | 0.217 | 0.240 | 0.297 | 0.261 |
12 | 20,783 | 0.220 | 0.244 | 0.302 | 0.262 |
13 | 18,350 | 0.219 | 0.243 | 0.300 | 0.265 |
14 | 16,690 | 0.219 | 0.243 | 0.300 | 0.263 |
15 | 16,878 | 0.219 | 0.241 | 0.299 | 0.256 |
16 | 16,390 | 0.220 | 0.243 | 0.300 | 0.261 |
17 | 18,304 | 0.219 | 0.242 | 0.300 | 0.258 |
18 | 14,985 | 0.219 | 0.243 | 0.300 | 0.247 |
19 | 14,761 | 0.227 | 0.247 | 0.307 | 0.263 |
20 | 16,080 | 0.214 | 0.238 | 0.294 | 0.253 |
21 | 14,181 | 0.224 | 0.247 | 0.306 | 0.272 |
22 | 16,767 | 0.219 | 0.243 | 0.300 | 0.268 |
23 | 15,171 | 0.220 | 0.244 | 0.302 | 0.258 |
24 | 14,097 | 0.223 | 0.246 | 0.304 | 0.267 |
25 | 14,675 | 0.220 | 0.244 | 0.301 | 0.263 |
26 | 11,329 | 0.225 | 0.247 | 0.306 | 0.264 |
27 | 13,944 | 0.220 | 0.245 | 0.302 | 0.261 |
28 | 12,594 | 0.220 | 0.245 | 0.303 | 0.258 |
29 | 12,785 | 0.227 | 0.249 | 0.309 | 0.269 |
30 | 11,689 | 0.220 | 0.243 | 0.301 | 0.258 |
31 | 12,188 | 0.225 | 0.247 | 0.307 | 0.273 |
32 | 11,998 | 0.223 | 0.245 | 0.304 | 0.255 |
33 | 9,870 | 0.221 | 0.244 | 0.302 | 0.267 |
34 | 12,269 | 0.233 | 0.254 | 0.315 | 0.270 |
35 | 9,881 | 0.226 | 0.249 | 0.309 | 0.263 |
36 | 10,204 | 0.219 | 0.242 | 0.299 | 0.266 |
37 | 10,183 | 0.211 | 0.237 | 0.292 | 0.254 |
38 | 9,230 | 0.220 | 0.243 | 0.300 | 0.255 |
Total | 621,672 | 0.220 | 0.244 | 0.301 | 0.261 |
GD, genetic diversity; MAF, minor allele frequency; PIC, polymorphic information content; He, expected heterozygosity; Ho, observed heterozygosity.
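The classification of SNPs into MAF intervals of 0.05 (Fig. 1) could be implemented along the following lines. The article does not state how values lying exactly on an interval boundary were assigned, so the left-closed intervals used here (with 0.5 included in the last interval) are an assumption, and `maf_bin` is a hypothetical helper.

```python
from bisect import bisect_right
from collections import Counter

# Interior interval boundaries: 0.05, 0.10, ..., 0.45
EDGES = [i / 20 for i in range(1, 10)]


def maf_bin(maf: float) -> str:
    """Label of the 0.05-wide interval containing a MAF value.

    Assumption: intervals are left-closed, and 0.5 falls in the last one.
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("MAF must lie between 0 and 0.5")
    i = bisect_right(EDGES, maf)
    return f"{i * 0.05:.2f}-{(i + 1) * 0.05:.2f}"


# Hypothetical MAF values, not data from the study
mafs = [0.01, 0.07, 0.09, 0.22, 0.38, 0.50]
histogram = Counter(maf_bin(m) for m in mafs)
```

With ten intervals and 621,672 SNPs, an even spread would place about 62,167 SNPs in each interval, which is the average quoted in the text.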
Annotations were added for the 621,672 SNPs categorized based on sequence ontology (SO) terms and putative impacts (HIGH, MODERATE, LOW, and MODIFIER), as summarized in Table 4. Most of the SNPs (95.40%) were classified as “MODIFIER” in terms of putative impact, indicating an uncertain or minimal impact on gene function. The next most prevalent putative impact observed was “MODERATE,” accounting for 2.50% of the total. Additionally, “HIGH” putative impact, indicating a significant impact on gene function, was observed in 0.15%, whereas “LOW” putative impact, indicating a low impact on gene function, was observed in 1.95%.
Table 4. Number of SNPs classified by sequence ontology term after annotation.
Sequence ontology term | Putative impact | No. |
---|---|---|
Stop_gained | HIGH | 746 |
Splice_acceptor_variant | HIGH | 77 |
Splice_donor_variant | HIGH | 64 |
Stop_lost | HIGH | 26 |
Start_lost | HIGH | 16 |
Missense_variant | MODERATE | 15,572 |
Synonymous_variant | LOW | 9,787 |
5_Prime_UTR_premature_start_codon_gain_variant | LOW | 358 |
Splice_region_variant | LOW | 1,941 |
Stop_retained_variant | LOW | 7 |
Initiator_codon_variant | LOW | 15 |
Intron_variant | MODIFIER | 217,002 |
Intergenic_region | MODIFIER | 355,721 |
5_Prime_UTR_variant | MODIFIER | 2,290 |
3_Prime_UTR_variant | MODIFIER | 16,301 |
Non_coding_transcript_exon_variant | MODIFIER | 1,749 |
Total | | 621,672 |
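The putative-impact percentages quoted in the text follow directly from the counts in Table 4. The sketch below aggregates the table rows by impact category; the counts are copied from the table, and the variable names are illustrative only.

```python
from collections import defaultdict

# (sequence ontology term, putative impact, number of SNPs) from Table 4
TABLE4 = [
    ("stop_gained", "HIGH", 746),
    ("splice_acceptor_variant", "HIGH", 77),
    ("splice_donor_variant", "HIGH", 64),
    ("stop_lost", "HIGH", 26),
    ("start_lost", "HIGH", 16),
    ("missense_variant", "MODERATE", 15572),
    ("synonymous_variant", "LOW", 9787),
    ("5_prime_UTR_premature_start_codon_gain_variant", "LOW", 358),
    ("splice_region_variant", "LOW", 1941),
    ("stop_retained_variant", "LOW", 7),
    ("initiator_codon_variant", "LOW", 15),
    ("intron_variant", "MODIFIER", 217002),
    ("intergenic_region", "MODIFIER", 355721),
    ("5_prime_UTR_variant", "MODIFIER", 2290),
    ("3_prime_UTR_variant", "MODIFIER", 16301),
    ("non_coding_transcript_exon_variant", "MODIFIER", 1749),
]

totals = defaultdict(int)
for _term, impact, n in TABLE4:
    totals[impact] += n

grand_total = sum(totals.values())
shares = {impact: round(100.0 * n / grand_total, 2) for impact, n in totals.items()}
```

The category totals sum to 621,672, and the resulting shares are 0.15% (HIGH), 2.50% (MODERATE), 1.95% (LOW), and 95.40% (MODIFIER), matching the values reported above.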
The “MODERATE” putative impact category consists of the missense_variant SO term, in which the variant lies in an exon of the gene and changes the encoded amino acid; these variants correspond to nsSNPs. The numbers of nsSNPs and of genes associated with them were tabulated by chromosome and are presented in Table 5. The mean number of nsSNPs per chromosome was 410, with chromosome 1 having the most (949) and chromosome 29 the fewest (155). Chromosome 1 also had the highest number of genes associated with nsSNPs (507), whereas chromosome 36 had the lowest (70).
Table 5. Number of non-synonymous SNPs and associated genes by chromosome.
Chromosome no. | No. of nsSNP | No. of associated gene |
---|---|---|
1 | 949 | 507 |
2 | 524 | 301 |
3 | 459 | 227 |
4 | 460 | 213 |
5 | 736 | 394 |
6 | 611 | 351 |
7 | 626 | 294 |
8 | 421 | 225 |
9 | 753 | 425 |
10 | 405 | 248 |
11 | 424 | 216 |
12 | 559 | 265 |
13 | 316 | 160 |
14 | 300 | 139 |
15 | 361 | 184 |
16 | 385 | 191 |
17 | 475 | 237 |
18 | 597 | 328 |
19 | 193 | 99 |
20 | 670 | 388 |
21 | 491 | 251 |
22 | 184 | 103 |
23 | 282 | 140 |
24 | 352 | 206 |
25 | 359 | 153 |
26 | 359 | 192 |
27 | 482 | 234 |
28 | 345 | 171 |
29 | 155 | 91 |
30 | 375 | 180 |
31 | 216 | 109 |
32 | 256 | 114 |
33 | 314 | 128 |
34 | 198 | 106 |
35 | 195 | 101 |
36 | 280 | 70 |
37 | 233 | 117 |
38 | 272 | 126 |
Total | 15,572 | 7,984 |
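A per-chromosome tally like Table 5 could be derived from the annotated VCF roughly as follows. The authors extracted fields with SnpSift; this Python sketch instead reads the `ANN` INFO field directly, relying on the SnpEff field order (allele, annotation, impact, gene name, ...). Using only the first annotation of each SNP is an assumption, and the gene names and VCF lines in the example are invented for illustration.

```python
from collections import defaultdict


def first_annotation(info: str):
    """Return (annotation, gene_name) of the first ANN entry, or None."""
    for field in info.split(";"):
        if field.startswith("ANN="):
            parts = field[4:].split(",")[0].split("|")
            if len(parts) > 3:
                return parts[1], parts[3]
    return None


def tally_missense(vcf_lines):
    """Count missense SNPs and distinct associated genes per chromosome."""
    snp_count = defaultdict(int)
    gene_sets = defaultdict(set)
    for line in vcf_lines:
        if line.startswith("#"):          # skip VCF header lines
            continue
        cols = line.rstrip("\n").split("\t")
        chrom, info = cols[0], cols[7]    # CHROM and INFO columns
        ann = first_annotation(info)
        # Combined effects are joined with "&" in the annotation field.
        if ann and "missense_variant" in ann[0].split("&"):
            snp_count[chrom] += 1
            gene_sets[chrom].add(ann[1])
    return dict(snp_count), {c: len(g) for c, g in gene_sets.items()}


# Invented example records (GENE_A etc. are placeholders, not study data)
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\ts1\tA\tG\t.\t.\tANN=G|missense_variant|MODERATE|GENE_A|ID_A|transcript",
    "1\t200\ts2\tC\tT\t.\t.\tANN=T|missense_variant|MODERATE|GENE_A|ID_A|transcript",
    "1\t300\ts3\tC\tT\t.\t.\tANN=T|intron_variant|MODIFIER|GENE_B|ID_B|transcript",
    "2\t50\ts4\tG\tA\t.\t.\tANN=A|missense_variant&splice_region_variant|MODERATE|GENE_C|ID_C|transcript",
]
ns_snps, ns_genes = tally_missense(example)
```

Because several nsSNPs can fall in the same gene, the gene count per chromosome is lower than the nsSNP count, as seen throughout Table 5.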
The SNP counts and the numbers of associated genes for the SO terms stop_gained, stop_lost, and start_lost, which have a “HIGH” putative impact and affect the start and stop codons, are summarized by chromosome in Table 6. The highest number of SNPs annotated with the stop_gained SO term was observed on chromosome 9 (40 SNPs), whereas chromosomes 14 and 31 had the lowest counts, each with seven SNPs. Chromosome 9 also had the highest number of associated genes (37), whereas chromosomes 23 and 31 had the lowest number (6). For SNPs annotated with the stop_lost and start_lost SO terms, a maximum of two SNPs per chromosome was observed, and on some chromosomes no such SNPs were found.
Table 6. SNPs associated with start and stop codons and their related genes.
Chromosome no. | Stop_gained SNP no. | Stop_gained gene no. | Stop_lost SNP no. | Stop_lost gene no. | Start_lost SNP no. | Start_lost gene no. |
---|---|---|---|---|---|---|
1 | 30 | 29 | 1 | 1 | - | - | ||
2 | 24 | 23 | 1 | 1 | - | - | ||
3 | 29 | 27 | - | - | 1 | 1 | ||
4 | 21 | 21 | - | - | - | - | ||
5 | 33 | 32 | - | - | - | - | ||
6 | 33 | 31 | 2 | 2 | - | - | ||
7 | 22 | 22 | 1 | 1 | - | - | ||
8 | 23 | 20 | - | - | - | - | ||
9 | 40 | 37 | 1 | 1 | 2 | 2 | ||
10 | 28 | 28 | 2 | 2 | 1 | 1 | ||
11 | 22 | 22 | 2 | 2 | - | - | ||
12 | 27 | 25 | - | - | 2 | 2 | ||
13 | 20 | 19 | 1 | 1 | 1 | 1 | ||
14 | 7 | 7 | - | - | - | - | ||
15 | 20 | 19 | 1 | 1 | 1 | 1 | ||
16 | 19 | 18 | 1 | 1 | - | - | ||
17 | 36 | 34 | - | - | 1 | 1 | ||
18 | 25 | 23 | 2 | 2 | 1 | 1 | ||
19 | 8 | 8 | 1 | 1 | 1 | 1 | ||
20 | 29 | 27 | 1 | 1 | - | - | ||
21 | 21 | 20 | 2 | 2 | - | - | ||
22 | 14 | 13 | - | - | - | - | ||
23 | 8 | 6 | 1 | 1 | - | - | ||
24 | 24 | 23 | 2 | 2 | 1 | 1 | ||
25 | 24 | 23 | - | - | - | - | ||
26 | 13 | 13 | 1 | 1 | 1 | 1 | ||
27 | 20 | 20 | 1 | 1 | - | - | ||
28 | 14 | 12 | - | - | - | - | ||
29 | 11 | 11 | - | - | - | - | ||
30 | 13 | 12 | - | - | 1 | 1 | ||
31 | 7 | 6 | 1 | 1 | 1 | 1 | ||
32 | 13 | 12 | 1 | 1 | - | - | ||
33 | 12 | 12 | - | - | - | - | ||
34 | 9 | 9 | - | - | 1 | 1 | ||
35 | 14 | 14 | - | - | - | - | ||
36 | 12 | 8 | - | - | - | - | ||
37 | 9 | 9 | - | - | - | - | ||
38 | 12 | 12 | - | - | - | - | ||
Total | 746 | 707 | 26 | 26 | 16 | 16 |
We conducted SNP chip analysis on 95 dogs from 26 breeds raised in South Korea to secure genomic data, and performed QC to ensure the accuracy of the analysis. The genetic diversity and functional impact of the SNPs were then assessed, starting from the 686,074 SNPs that passed QC. Genetic diversity was calculated for each SNP, which revealed that chromosome 34 exhibited the highest genetic diversity, whereas chromosome 37 showed the lowest diversity among the chromosomes. PIC values are classified as follows: a marker with a PIC of 0.5 or above is considered highly informative, one with a PIC of 0.25 to 0.5 is considered moderately informative, and one with a PIC below 0.25 is considered only slightly informative (Botstein et al., 1980). The mean PIC value calculated for all SNPs was 0.244, and SNPs with PIC values of 0.25 or higher accounted for 53.80% of all the SNPs. The results of this study could help construct an SNP marker set for the identification of companion dogs. SNPs that did not exhibit genetic diversity were removed, and 621,672 SNPs were annotated with associated genes and SNP effects. Most SNPs appear to have either undetermined or minimal effects on gene function. However, over 16,000 nsSNPs were located in exons and may significantly alter gene function by changing the translated amino acids. These SNPs could potentially alter protein structure and, if harmful, may lead to disease. Therefore, further studies on the deleterious effects of nsSNPs are required.
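The PIC thresholds of Botstein et al. (1980) cited above can be written as a small classifier. This is only a sketch: the labels and the helper names `pic_class` and `share_at_least` are hypothetical, and the values in the example are chromosome-level means from Table 3 rather than per-SNP data.

```python
def pic_class(pic: float) -> str:
    """Informativeness class of a marker by the Botstein et al. thresholds."""
    if pic >= 0.5:
        return "high"
    if pic >= 0.25:
        return "intermediate"
    return "low"


def share_at_least(values, threshold: float = 0.25) -> float:
    """Percentage of values greater than or equal to the threshold."""
    return 100.0 * sum(v >= threshold for v in values) / len(values)


# Chromosome-level mean PIC values from Table 3
classes = {
    "chr34": pic_class(0.254),    # highest chromosome mean
    "chr37": pic_class(0.237),    # lowest chromosome mean
    "overall": pic_class(0.244),
}
```

Note that for a biallelic SNP the PIC cannot exceed 0.375 (reached when both alleles have a frequency of 0.5), so individual SNPs can fall only in the low or intermediate class; the 53.80% quoted above is the share of SNPs in the intermediate class.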
As the number of households raising companion dogs continues to increase, the issue of stray dogs is also becoming more prevalent (Ko et al., 2020). The amount spent by the government on managing stray animals was reported to have increased from 10.44 billion won in 2014 to 26.7 billion won in 2020 (Yoo and Bae, 2022). As a measure against stray animals, the government has implemented a pet registration system using both external and internal microchips. However, external chips are prone to loss, and pet owners are often reluctant to have internal chips implanted in the body. Common reasons for abandoning animals include behavioral issues such as barking, biting, and aggression, as well as odor and the financial burden of diseases. In this study, we gathered information on SNP genetic diversity and SNP data related to genetic functions in domestically raised companion dogs. Based on the results of the present study, the establishment of an SNP set for individual identification could enable a gene-based registration system. Furthermore, the discovery and functional analysis of nsSNPs associated with behavior and diseases are expected to improve care for companion animals, potentially reducing the likelihood of abandonment.
None.
Conceptualization, H.S.K.; methodology, G.H.L., J.D.O., H.S.K.; investigation, G.H.L., J.D.O.; writing - original draft preparation, G.H.L., J.D.O.; writing - review and editing, G.H.L., J.D.O., H.S.K.; supervision, H.S.K.; project administration, H.S.K.; funding acquisition, H.S.K.
None.
Not applicable.
Not applicable.
Not applicable.
Not applicable.
No potential conflict of interest relevant to this article was reported.
Table 1. Sample table used for analysis with companion dogs.
Breed | Count | Breed | Count |
---|---|---|---|
Maltese | 20 | Cocker Spaniel | 1 |
Mixed dog | 20 | Italian Greyhound | 1 |
Poodle | 10 | Jindo Dog | 1 |
Shih Tzu | 6 | Long haired Dachshund | 1 |
Bichon Frise | 5 | Miniature Pinscher | 1 |
Pomeranian | 5 | Miniature Schnauzer | 1 |
Yorkshire Terrier | 4 | Pointer | 1 |
Chihuahua | 3 | Schnauzer | 1 |
French Bulldog | 3 | Siberian Husky | 1 |
Dachshund | 2 | Spitz | 1 |
Golden Retriever | 2 | Standard Poodle | 1 |
Border Collie | 1 | Welsh Corgi | 1 |
Chow Chow | 1 | Wheaten Terrier | 1 |
pISSN: 2671-4639
eISSN: 2671-4663