Genetic Determinants of Susceptibility and Resistance to SARS-CoV-2
Maureen P. Martin
Mary Carrington
INTRODUCTION
The clinical manifestations after infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) vary greatly across individuals, ranging from mild disease occurring in the majority of cases to severe or critical disease in 15% to 20%, with an overall fatality rate of 2% to 5% based on data generated prior to the advent of vaccination against the virus.1,2 As is the case for all infectious diseases, there are many variables that differentially impact the severity of disease after infection, and the risk of infection is also likely to be complex. Although age is the strongest predictor of disease severity,3 other factors including comorbidities (asthma, cancer, cerebrovascular disease, etc.), physical inactivity, male sex, socioeconomic background, viral strain, and host genetic factors have all been associated with disease severity and/or infection status,4,5 and vaccination generally diminishes the impact of these variables (https://www.cdc.gov/coronavirus/2019-ncov/vaccines/vaccine-benefits.html). In this chapter, we concentrate on host genetic variation associated with susceptibility to SARS-CoV-2 infection and with its outcome.
Impact of Host Genetics on Viral Diseases
Human genetic variation has been associated with a large variety of viral disease outcomes and, in some cases, with resistance to infection. Polymorphisms in immune response genes (both innate and adaptive) that affect the differential functional capabilities of the molecules they encode represent the largest group of host genetic factors found to be associated with viral disease phenotypes. Most of these genetic variants are associated with disease outcome, such as severity of disease after infection, but variants or mutations in genes involved in viral entry into host cells can affect both viral infection and disease pathogenesis. Two clear examples of genetic mutations that cause resistance to viral infection include a nonsense mutation in the α(1,2)fucosyltransferase gene (FUT2), which encodes a molecule that is required for the generation of the receptor for Norwalk virus infection,6 and a 32-base-pair deletion in the CCR5 gene (CCR5Δ32), which encodes the coreceptor for human immunodeficiency virus-1 (HIV-1) expressed on T cells and some antigen-presenting cells.7,8 Both of these mutations confer complete or near-complete protection from infection with the respective viruses among homozygotes for the mutations. Identifying genetic variants that provide virtually complete resistance to infection is the holy grail of genetic association studies of viral disease, as such variants point directly to drug targets for preventing infection.
Many associations have been identified between genetic polymorphisms and outcome after viral infection, originally by candidate gene approaches and subsequently by genome-wide association studies (GWAS). A comprehensive account of genetic associations with viral disease is reviewed elsewhere,9 but a few examples in which clearly validated associations (including by GWAS) have been reported involve outcome after HIV-1, hepatitis C virus (HCV), and hepatitis B virus (HBV) infection. Untreated individuals with HIV-1 vary dramatically in disease outcome as measured by rate of progression to acquired immunodeficiency syndrome (AIDS), AIDS-defining illness, and viral load levels over time. Multiple variants within or near the CCR5 gene, some of which impact expression levels of CCR5 on the cell surface,10 including heterozygosity for CCR5Δ32, were identified by candidate gene approaches,10-16 and some were confirmed later in GWAS of HIV-1-infected subjects.17 The most significant variant within the CCR5 locus, which was identified initially by GWAS,18 was subsequently shown to affect CCR5 expression through long noncoding RNA (lncRNA) regulation.19
Variation within the human leukocyte antigen (HLA) loci has been shown to have clear associations in several viral diseases. The primary role of variation at the HLA class I loci, particularly HLA-B alleles, in modulating HIV-1 disease progression and viral control was reported pre-GWAS20-24 and was subsequently confirmed, generally with greater granularity, by GWAS.17,18,25-27 Of note, most variants throughout the genome that originally showed significant effects using a candidate gene approach were not confirmed by genome-wide studies of HIV-1-infected subjects, underscoring the benefit and accuracy of agnostic approaches to the identification of host genetic effects on human disease.
Infection with HCV persists for years in 75% to 80% of untreated individuals, whereas the remainder spontaneously clear the virus within several months after infection.28 Among those with persistent infection, liver cirrhosis, hepatocellular carcinoma, or other complications can occur, generally years after initial infection.29 Variation in the HLA class II genes has long been associated with natural clearance of HCV,30-35 and HLA-DQB1 has been pinpointed by GWAS as a primary genetic locus associated with viral clearance versus persistence.36-40 The strongest host genetic association with HCV clearance, however, lies within the IL28B locus, which was first reported in GWAS of response to therapy in chronically HCV-infected subjects41 and subsequently shown to be associated with clearance/persistence of HCV in natural history cohorts.42 The effects of IL28B variation on response to therapy and natural history of HCV infection were validated in multiple subsequent studies.43-49 The IL28B gene, recently renamed IFNL3, encodes a type III interferon (IFN) that elicits the expression of antiviral genes.50
Chronic HBV infection occurs in nearly 95% of exposed infants, but in only 5% of those exposed in adulthood,51,52 contrasting with SARS-CoV-2 infection where children, and perhaps babies, have a lower fatality rate, are less likely to have severe disease, and are more likely to have fewer symptoms relative to adults.3,53,54 GWAS comparing chronically HBV-infected Japanese subjects, who were presumably infected perinatally, to controls without chronic HBV infection showed that variants in the HLA class II genes DPB1 and DPA1 are associated with the risk of developing chronic infection,55 and this locus was the most significant genome-wide. A study of African and European Americans, most of whom are thought to have been infected as adults, confirmed the DPB1 effect in a comparison of those with chronic infection versus those who cleared the virus.56 The influence of the immune response against HCV and HBV has been studied extensively,57 and it is not surprising that HLA represents a major host genetic factor in determining differential outcome to infection with these and other viruses.
Dengue virus can cause an acute systemic infection, potentially leading to life-threatening dengue shock syndrome (DSS), particularly in children.58 DSS involves increased vascular permeability with risk of severe bleeding and death. Variation at the MICB locus, which, like the HLA class I and II genes, maps to the major histocompatibility complex (MHC) on chromosome 6 and encodes an inducible ligand for the activating NKG2D receptor on natural killer (NK) cells and CD8+ T cells,59 and variants within the phospholipase C, ε-1 (PLCE1) locus on chromosome 10 both showed genome-wide significance in susceptibility to DSS among infected children from Vietnam.60 Targeted genotyping of these same single-nucleotide polymorphisms (SNPs) in subjects with less severe clinical disease of dengue suggested an effect of MICB and PLCE1 on a broad array of dengue phenotypes.61 These data, generated by a thorough genome-wide approach, warrant further validation and delineation of the functional basis for the genetic associations observed. An extensive array of host genetic effects in many other viral diseases have been identified and the vast majority of these are unique to the given viral disease,9 but polymorphism at the HLA loci may be a common denominator, just as it is across autoimmune diseases. Although HLA polymorphism was not identified in the original GWAS of SARS-CoV-2 disease outcome, more recent studies focusing on this region are now uncovering HLA associations (see “HLA” in “Targeted Candidate Gene Analyses” section).
Very few viral diseases have had the wide breadth of host genetic examination that has been displayed for SARS-CoV-2, perhaps matched only by HIV-1. The number of samples in many of the cohorts that have been collected and studied for genetic associations in an exceedingly short period of time is unmatched. This attention from a wide swath of the genetics community has pinpointed and confirmed, many times over in some cases, loci that affect COVID-19 phenotype and, perhaps with less certainty, SARS-CoV-2 infection. These studies are outlined in the following sections.
EXPERIMENTAL APPROACH TO IDENTIFYING HOST GENETIC VARIANTS THAT INFLUENCE SARS-COV-2 INFECTION AND DISEASE PATHOGENESIS
Recognition of the SARS-CoV-2 pandemic in early 2020 led to the immediate organization of consortia and study cohorts aiming to delineate host genetic factors that influence disease pathogenesis with the ultimate goal of developing predictive diagnostics and novel drug targets. The first such report involved samples collected cross-sectionally from multiple hospitals in four Italian and Spanish cities at the pandemic epicenter,62 which was rapidly followed by much larger studies involving data from rich population-based sources including the UK Biobank, 23andMe, and AncestryDNA.63-65 Collaborative efforts across laboratories involving large numbers of scientists have been organized, most notably the COVID-19 Host Genetics Initiative (https://www.covid19hg.org).66 These and other data collections have provided the foundation for identification of host genetic variants involved in outcome to SARS-CoV-2 exposure. The unified and community-based drive to identify genetic loci accounting for SARS-CoV-2 infection and disease is evident from the rather extensive overlap in samples used across studies at this point in the pandemic.
COVID-19 severity has been the most common outcome studied to date, and certain loci have been validated across multiple studies, strongly supporting their involvement in disease pathogenesis. As expected, a number of discrepancies in genetic loci have also been observed across studies, perhaps due in part to distinctions in clinical comparison groupings, differences in clinical definitions across studies, and, in some cases, minimal statistical power. Large-scale studies have resolved the validity of some genetic variants associated with disease severity that were initially inconsistent across smaller studies. The effect of genetic variation on risk of infection has been studied fairly extensively, as well, but is hampered in some cases by the use of controls that are population based, unscreened for the virus, and lacking information regarding exposure to the virus. Thus, whether any common (>1% allele frequency) genetic variant confers virtually complete protection against infection is unknown. An effort is currently underway to identify a cohort of highly exposed subjects with proof of no current or previous infection, which is likely to improve the chances of identifying potential loci that render virtually complete protection against infection.67 Such attempts to design well-powered cohorts with strict criteria in order to address specific clinical manifestations will strengthen the validity of the genetic studies, particularly with regard to risk of infection.
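The cost of unscreened population controls can be illustrated with a small calculation. The sketch below is a simplified illustration rather than a method from any of the cited studies: it assumes that a fraction of the control group has in fact been infected and carries the allele at the same frequency as the cases. The allele frequencies and the function names are hypothetical.

```python
def odds(p):
    """Convert an allele frequency (0 < p < 1) into odds."""
    return p / (1.0 - p)


def observed_odds_ratio(freq_cases, freq_uninfected, misclassified_fraction):
    """Allelic odds ratio seen when some 'controls' are really infected.

    Assumption: misclassified controls carry the allele at the case
    frequency, so the control group is a mixture of the two populations.
    """
    freq_controls = ((1.0 - misclassified_fraction) * freq_uninfected
                     + misclassified_fraction * freq_cases)
    return odds(freq_cases) / odds(freq_controls)


if __name__ == "__main__":
    # Hypothetical allele frequencies: 14% in cases, 9% in truly uninfected.
    for fraction in (0.0, 0.1, 0.2, 0.4):
        ratio = observed_odds_ratio(0.14, 0.09, fraction)
        print(f"{fraction:.0%} of controls misclassified: OR = {ratio:.2f}")
```

Under these assumptions the observed odds ratio moves toward 1 as the misclassified fraction grows, which is one way to see why a cohort of highly exposed, verifiably uninfected subjects should have more power to detect protective variants.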
Table 6.1 lists the various outcomes to SARS-CoV-2 infection (or risk of infection itself) and the corresponding citations that have been published as of December 2022. GWAS have been the first-line approach to identifying host genetic variation that distinguishes SARS-CoV-2-infected individuals based on severity of disease and comparisons of infected subjects versus population controls as a means to estimate risk of infection. Many subsequent (and generally smaller) studies have targeted the loci initially identified by GWAS, and these have been useful in validating the GWAS findings. A small number of associations were first identified by a candidate approach for ABO blood groups,68,69 and this locus was subsequently confirmed in GWAS. Certain regions of the genome that are particularly complex because of extreme polymorphism, duplications/deletions, and somatic recombination events, such as the immunoglobulin locus, T-cell receptor locus, and leukocyte receptor complex, are poorly covered by GWAS. SNPs and derived alleles at the HLA class I and II loci can be thoroughly interrogated based on data from GWAS chips, but many characteristics of these genes such as their expression levels, dependence on the peptide loading complex, and strength of interactions with innate immune receptors cannot be identified by SNPs or individual alleles derived from GWAS and require more complex analyses involving continuous variables. Thus, an immunologically important subset of genes and gene families will require more in-depth genetic and analytical interrogation to determine their impact on this disease.
TABLE 6.1 Major genetic determinants of COVID-19 outcome identified by genome-wide studies
WHOLE-GENOME ANALYSIS ON RISK OF COVID-19 PHENOTYPES
A striking number of genome-wide studies (mostly GWAS) on COVID-19 cohorts have been published since 2020 (Table 6.1), indicating the contribution of multiple genetic variants to the heterogeneity in disease phenotype based on various definitions of disease severity. Several studies have also compared SARS-CoV-2-infected subjects to population controls, some of whom may have tested negative for the virus depending on the study. The majority of studies were performed on subjects prior to widespread vaccination, but reports describing genetic associations with response to vaccination are starting to emerge.
Disease Severity
The first published GWAS examined 1,610 patients with severe COVID-19 (hospitalization with respiratory failure and confirmed SARS-CoV-2 infection) and 2,205 population controls of mostly unknown infection status from Italy and Spain.62 Two genome-wide significant loci mapping to chromosome 9q34.2 and 3p21.31 were identified: the first occurring at the ABO blood group locus, a locus that had already been identified by a targeted candidate approach with risk for infection,68,69 and the latter encompassing several genes including SLC6A20, LZTFL1, FYCO1, CXCR6, XCR1, and CCR9. Shortly thereafter, the UK-based GenOMICC (Genetics of Mortality in Critical Care) study of 2,244 critically ill patients and >8,000 ancestry-matched population controls identified the same chromosome 3 locus marking LZTFL1/CXCR6/FYCO1/CCR3, in addition to HLA-G, OAS1-3, CCHCR1, NOTCH4, DPP9, TYK2, and IFNAR2 as lead candidates.70 A meta-analysis restricted to individuals of European descent from GenOMICC, the COVID-19 Host Genetics Initiative (HGI) (2,415 cases; 477,741 population controls), and 23andMe (1,128 cases and 679,531 population controls) validated the associations with the chromosome 3 locus, OAS1-3, TYK2, DPP9, CCHCR1, and IFNAR2.70 This meta-analysis heralded the ballooning of sample sizes analyzed in genome-wide studies with the ongoing pandemic and collaborative efforts across the genetics community.
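To make the notion of a "genome-wide significant" case-control association concrete, the following sketch tests a single SNP with a 2x2 allelic chi-square test. It is a deliberate simplification: published GWAS typically fit regression models with covariates such as age, sex, and ancestry components, and the allele counts used here are hypothetical (chosen only to match the 1,610 cases and 2,205 controls of the first study, not taken from it). The threshold of 5e-8 is the conventional genome-wide cutoff and is an assumption, since the text does not state it.

```python
import math

# Conventional genome-wide significance threshold (assumed, not from the text).
GENOME_WIDE_ALPHA = 5e-8


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """2x2 allelic chi-square test (1 df) with odds ratio and 95% CI."""
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return {
        "chi2": chi2,
        "p_value": p_value,
        "odds_ratio": odds_ratio,
        "ci95": (ci_low, ci_high),
        "genome_wide_significant": p_value < GENOME_WIDE_ALPHA,
    }


if __name__ == "__main__":
    # Hypothetical counts: 1,610 cases (3,220 alleles), 2,205 controls (4,410 alleles).
    result = allelic_association(case_alt=451, case_ref=2769,
                                 ctrl_alt=397, ctrl_ref=4013)
    for key, value in result.items():
        print(key, value)
```

The stringent threshold reflects the very large number of variants tested in a single scan; a variant that looks convincing in isolation may still fall short of it.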
The largest COVID-19 consortium, the HGI, involves an international effort that was established early in the pandemic,66 growing in size and strength to the present incorporation of 60 studies from 25 countries. The initial HGI report,71,72 from March of 2021, and a subsequently published update73,74 ultimately described results from 125,584 cases stratified by three COVID-19-related phenotypes: critically ill (confirmed SARS-CoV-2 infection requiring hospitalization followed by respiratory support and/or death; N = 9,376), moderate or severe (hospitalization because of symptoms associated with infection; N = 25,027), and all cases with reported SARS-CoV-2 infection regardless of symptoms (N = 125,584; for a description of results regarding risk of infection, see the next section). Each of these groups was compared to 1.7 to 2.8 million population controls.74 Sixteen genome-wide significant loci associated with disease severity were identified, including replications of LZTFL1, OAS, DPP9, TYK2, and IFNAR2. The association with the OAS1/OAS2/OAS3 gene cluster,63,70,72,74-76 located on chromosome 12q24.13, is of interest because OAS proteins are IFN-inducible antiviral effector molecules that activate the latent form of ribonuclease L (RNaseL),77,78 leading to degradation of viral RNA, inhibition of viral replication, and death of the infected cell.78 Expression level of OAS1 in particular has been implicated in the underlying functional mechanism for the genetic association75,79 (see “Functional Studies Supporting Genetic Associations with COVID-19” section). Of interest, the protective OAS haplotype first identified in the GenOMICC consortium70 has been suggested to be derived from Neanderthals.80
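Consortium-scale results of this kind are usually obtained by combining per-study summary statistics rather than pooling individual genotypes. The sketch below shows a generic fixed-effect, inverse-variance meta-analysis of per-study log odds ratios for one variant. It is an illustration under stated assumptions, not the HGI pipeline: the function name and the per-study estimates are hypothetical.

```python
import math


def fixed_effect_meta(log_odds_ratios, standard_errors):
    """Inverse-variance weighted fixed-effect meta-analysis for one variant.

    Returns the pooled log odds ratio, its standard error, and a
    two-sided p-value from the normal approximation.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_odds_ratios)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, p_value


if __name__ == "__main__":
    # Hypothetical per-study estimates for a single SNP.
    betas = [0.42, 0.35, 0.51]
    ses = [0.12, 0.08, 0.20]
    pooled, pooled_se, p_value = fixed_effect_meta(betas, ses)
    print(f"pooled OR = {math.exp(pooled):.2f}, SE = {pooled_se:.3f}, p = {p_value:.2e}")
```

Because each study is weighted by the inverse of its variance, large studies dominate the pooled estimate, and the pooled standard error shrinks as studies are added. This is why growing sample sizes can resolve signals that were inconsistent in smaller cohorts.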
A large 23andMe study of disease severity based on pneumonia (N = 1,286), severe respiratory symptoms (N = 1,447), need for respiratory support (N = 636), or hospitalization (N = 25,027) identified multiple unique loci, including the well-documented SLC6A20/LZTFL1 region.64 The latter was also identified in a smaller GWAS based on individuals with severe disease (N = 1,244) and hospitalization (N = 3,260).65 A recent GWAS from the Spanish Coalition to unlock Research on Host Genetics on COVID-19 (SCOURGE) consortium, which included 11,939 SARS-CoV-2-positive cases who have not been used in any other GWAS dataset, emphasized genetic associations in males versus females for disease severity using a scale from 0 to 4 (asymptomatic, mild, moderate, severe, critical, respectively).76 The previously identified loci within 3p21.31 (SLC6A20, LZTFL1) and 21q22.11 (IFNAR2) were replicated when stratifying by hospitalization or the disease severity score in the global analysis, which included males and females, and in males only. Further meta-analyses indicated associations with other novel loci, some of which were sex specific (Table 6.1). A follow-up meta-analysis of the severe COVID-19 consortium, which compared 3,255 patients with COVID-19 with respiratory failure with 12,488 population controls from Italy, Spain, Norway, Germany, and Austria, identified the previous associations with the chromosome 3 locus, ABO, DPP9, KANSL1, and TYK2, but also a novel locus near the NAPSA gene on chromosome 19.81 This region was also shown to be associated with risk of infection in the HGI update.74 Taken together, GWAS have clearly underscored the importance of the locus containing SLC6A20/LZTFL1 on chromosome 3 along with several other loci in risk of disease severity (see Table 6.1 and Figure 6.1).
FIGURE 6.1 Genetic loci that are consistently associated with COVID-19 and the significant variants identified by GWAS. Genome-wide significant loci that were replicated in multiple studies and the accompanying genetic variants that are associated with disease severity (upper section) and risk of infection (lower section).62,64,65,70,72,74,76,81,82,85 P values of SNPs are shown on the Y-axis. GWAS, genome-wide association studies; SNP, single-nucleotide polymorphism.
Finally, a single whole-genome sequencing (WGS) study involving 7,491 subjects with critical illness compared to 48,400 population controls identified several novel loci,82 including FUT2 of the ABH histo-blood group family, a locus that confers complete protection from norovirus infection among those who are homozygous for an inactivating mutation of FUT2.6 The variant identified in the SARS-CoV-2 study was distinct from that which confers protection against norovirus infection.
Many loci identified by genome-wide approaches are unique to a single study and require validation. SFTPD74 on chromosome 10 (involved in the control of lung inflammation83) and MUC5B74 on chromosome 11 (associated with idiopathic pulmonary fibrosis [IPF]84) are two examples of interest because they were previously associated with lung disease, and therefore, the effects of variants in these genes on COVID-19 are biologically plausible. The most robust associations are those that have been validated across studies, which include variants in TYK2 (chromosome 19p13.2),64,70,72,74,81,82 IFNAR2 (chromosome 21q22.1),63,64,70,72,74,76,81,82,85 DPP9 (chromosome 19p13.3),63,64,70,72,74,81,82,85 and ABO (chromosome 9),62,81,86 in addition to 3p21.31, as discussed earlier (see Table 6.1 and Figure 6.1).
Resistance to Infection
Resistance to infection with SARS-CoV-2 has been estimated by comparing those who have tested positive for the virus versus population controls who, for the most part, are untested. Several studies with very large sample sizes have been reported,63-65,72,74,76,85 the largest of which was conducted by the HGI. The ABO locus was associated with risk of infection in all but one of the studies, consistent with two original targeted observational studies showing a protective effect for blood group O and a susceptibility effect for blood group A against infection.68,69 There was no genome-wide significant effect of the ABO locus in the SCOURGE study,76 but associations with 3p21.31 (SLC6A20, LZTFL1) and 21q22.11 (IFNAR2) were observed for risk of SARS-CoV-2 infection, just as variants in both of these genes were for disease outcome in the same report. A rare ACE2 variant protective against infection was observed, originally by the updated HGI report,73,74 and the same rare SNP (0.2%-2%) was identified in another large study of >52,000 subjects with COVID-19 when compared to >704,000 population controls with negative or unknown record of SARS-CoV-2 infection.85 The authors of the latter study showed that the rare variant, which was associated with protection against infection, was also associated with decreased ACE2 expression,85 potentially explaining the genetic association. A third infection susceptibility locus was identified within 3p21.31 near the SLC6A20 gene, which appears to be independent of the variant near the same set of genes that is associated with disease severity.63,65,72,74 The HGI GWAS implicates SNPs within the SLC6A20 gene as the likely locus for infection susceptibility.72,74 In this case, the risk haplotypes (as opposed to the protective haplotypes of the OAS locus) within the SLC6A20 gene region were suggested to have been inherited from Neanderthals.87
Genome-Wide Association Studies in Diverse Populations With COVID-19
The fraction of non-European subjects included in the large GWAS varies, but HGI represents perhaps one of the most diverse study populations, where 25 countries have participated. Roughly 25% of COVID-19 cases involved subjects of non-European ancestry, and several loci were observed to have a significant heterogeneous effect across the 60 study groups included in HGI.74 Given their large sample collection, the HGI was able to test whether this heterogeneity across study groups was due to differences in effect sizes across ethnic groups. However, only a single locus, FOXP4, showed a significant difference in effect across the various ethnicities. These data suggest that, overall, the most significant genetic effects identified in well-powered GWAS may extend across distinct ancestral populations. Nevertheless, the predominant use of samples from individuals of European descent poses a limitation to identifying the full spectrum of human genetic factors that impact risk of SARS-CoV-2 infection and disease outcome in regions of the world where COVID-19 has had a severe impact. A small number of studies have been performed in non-European populations specifically, with limited sample sizes,86,88-94 but novel loci associated with susceptibility to infection and disease severity have been implicated, raising the possibility of ancestry-specific susceptibility loci. The first COVID-19 GWAS in a Chinese population, derived from Wuhan, identified three significant signals that were associated with disease severity, including FOXP4-AS1 on chr6p21.1, ABO on chr9q34.2, and a novel locus on chr19q13.11, MEF2B.86 A variant on chromosome 5q35 that is prevalent in East Asians but rare in Europeans was identified as a risk allele for severe disease in a Japanese cohort,92 particularly in individuals below 65 years of age.
This variant is close to the DOCK2 gene, which encodes a molecule involved in chemokine signaling, production of type I IFN, and lymphocyte migration.95,96 RNAseq analysis showed decreased expression of DOCK2 in younger patients with the risk allele, and immunohistochemistry of lung tissue from individuals with severe COVID-19 pneumonia displayed decreased levels of DOCK2 expression (presumably regardless of DOCK2 genotype) as compared to controls without COVID-19 or pneumonia.92 Thus, the risk variant may contribute to severe COVID-19 by further suppressing the expression of DOCK2. Other loci that have been found to be associated with disease severity include variants near DSTYK and RBBP5 on chromosome 1 in a Brazilian cohort (BRACOVID study)94 and two loci on chromosome 11 (11q23.3, 11q14.2) in a Chinese cohort.90 This latter study identified significant expression quantitative trait loci (eQTL) associations for REXO2, C11orf71, NNMT, and CADM1 at the 11q23.3 locus and CTSC at the 11q14.2 locus.90 Overall, these data highlight the need for a thorough examination of large cohorts from diverse populations to study the outcomes of SARS-CoV-2 exposure and infection.
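Whether a variant's effect differs across study groups or ancestries, as discussed above for the HGI data, is commonly assessed with Cochran's Q statistic and the derived I² measure. The sketch below is a generic illustration, not the specific procedure used by the HGI (which the text does not describe); the function name and the per-ancestry estimates are hypothetical.

```python
def heterogeneity(effects, standard_errors):
    """Cochran's Q and I^2 for per-group effect estimates of one variant.

    Q sums the weighted squared deviations of each group's effect from the
    inverse-variance pooled effect; under homogeneity it follows a
    chi-square distribution with k - 1 degrees of freedom. I^2 is the
    share of total variation attributed to heterogeneity, floored at 0.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * b for w, b in zip(weights, effects)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, effects))
    df = len(effects) - 1
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i_squared


if __name__ == "__main__":
    # Hypothetical log odds ratios for one SNP in three ancestry groups.
    betas = [0.10, 0.12, 0.45]
    ses = [0.05, 0.06, 0.09]
    q, df, i_squared = heterogeneity(betas, ses)
    print(f"Q = {q:.2f} on {df} df, I^2 = {i_squared:.0%}")
```

A large Q relative to its degrees of freedom suggests that a single shared effect size does not describe all groups. With only a few small non-European cohorts, such a test has limited power, which is one more argument for larger and more diverse samples.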


