Structure and Function of SARS-CoV-2 Spike Protein
Bing Chen
Jeremy Luban
SARS-COV-2 AND CORONAVIRUS PHYLOGENY
The coronavirus disease 2019 (COVID-19) pandemic is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).1 Coronaviruses (CoVs) are enveloped positive-strand RNA viruses, including those that caused the previous outbreaks of severe acute respiratory syndrome (SARS-CoV) and Middle East respiratory syndrome (MERS-CoV),2 the current pandemic (SARS-CoV-2), and four endemic common cold human CoVs (HCoV-229E, HCoV-NL63, HCoV-OC43, and HCoV-HKU1).3,4
CoVs are classified into four genera: Alphacoronavirus, Betacoronavirus, Gammacoronavirus, and Deltacoronavirus (Figure 4.1). Alpha-CoV and Beta-CoV only infect mammals; Gamma-CoV and Delta-CoV primarily infect birds.3,5 The Beta-CoV genus contains most pathogenic human CoVs and is further categorized into four clades (A, B, C, and D). Although the mutation rate of CoVs is relatively low because of the proofreading activity of their replication machinery,6 the global range of SARS-CoV-2 and the consequently vast number of replication events occurring every day make emergence of new variants inevitable. Transmissibility and immune evasion are two independent selective forces that drive SARS-CoV-2 evolution, leading to multiple waves of COVID-19 cases dominated by infection with variants of concern (VOCs), such as Alpha, Beta, Gamma, Delta, as well as Omicron and its subvariants (Figure 4.1).7 Each new VOC has rapidly replaced the previously most contagious variant because of its heightened transmissibility and immune evasion.
FIGURE 4.1 Coronavirus genetic diversity. Taxonomy of CoVs.3,5 Seven human CoVs, including SARS-CoV-2 and its variants, are highlighted in red. BatCoV, bat coronavirus; BuCoV, bulbul coronavirus; HCoV, human coronavirus; MERS, Middle East respiratory syndrome; SARS, severe acute respiratory syndrome.
CORONAVIRUS GENOME STRUCTURE AND PROTEIN PRODUCTION
CoVs form enveloped and spherical virions of 100 to 160 nm in diameter with characteristic crown-like morphology, encapsulating a single-stranded, positive-sense RNA genome of 26 to 32 kb in size (Figure 4.2A and B).8 When CoVs enter susceptible target cells, the genomic RNA is released into the cytoplasm where it acts as messenger RNA encoding proteins for the viral replication machinery. The first two-thirds of the viral genome encode two polyprotein chains—pp1a and pp1ab, which are processed into 16 nonstructural proteins (nsp1-16) required for viral transcription and replication. Translation of pp1ab requires a ribosomal frameshift stimulated by an RNA structure that shifts the ribosomes into the −1 reading frame.9,10 The last third of the genome encodes structural proteins, including spike (S), envelope (E), membrane (M), and nucleocapsid (N), as well as some accessory proteins that are species-specific and dispensable for viral replication. These structural and accessory proteins are translated from roughly nine different subgenomic RNAs that are generated during genome replication (Figure 4.2A).11
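The effect of a −1 ribosomal frameshift on the translated product can be illustrated with a toy example. The following is a minimal Python sketch, not a model of the SARS-CoV-2 genome: the RNA string `TOY_RNA`, the slip position, and the function name `translate` are all hypothetical and chosen only so that the in-frame reading stops early while the shifted reading continues.

```python
# Standard genetic code, built from the conventional U, C, A, G ordering.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(rna, slip_at=None):
    """Translate rna from index 0 until a stop codon or the end.

    If slip_at is given, the reader steps back one nucleotide (a -1
    frameshift) the first time it is about to read the codon that
    starts at that index.
    """
    protein = []
    pos = 0
    slipped = False
    while pos + 3 <= len(rna):
        if slip_at is not None and not slipped and pos == slip_at:
            pos -= 1  # re-read the last nucleotide of the previous codon
            slipped = True
        amino_acid = CODON_TABLE[rna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
        pos += 3
    return "".join(protein)


# Hypothetical toy RNA (not a real viral sequence).
TOY_RNA = "AUGGCUUUAAACGGGUAAGCUGGUGA"

print(translate(TOY_RNA))              # in-frame product
print(translate(TOY_RNA, slip_at=12))  # product after a -1 shift at index 12
```

In this toy case the unshifted reading ends at an early stop codon, whereas the shifted reading bypasses it and yields a longer product. This is analogous to how pp1ab extends beyond pp1a, although the real frameshift depends on an RNA structure that this sketch does not model.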
CORONAVIRUS ENTRY
Like other enveloped viruses, a CoV enters a host cell after fusion of its envelope lipid bilayer with the cell membrane. Although membrane fusion is energetically favorable, there are high kinetic barriers when two membranes approach each other, mainly because of repulsive hydration forces.12,13 For viral membrane fusion, free energy to overcome these kinetic barriers comes from refolding of virus-encoded fusion proteins from a high-energy, metastable prefusion conformational state to a low-energy, stable postfusion state.14, 15 and 16 The fusion protein of SARS-CoV-2 is its spike (S) protein, which is a type I, heavily glycosylated membrane protein with a transmembrane (TM) segment embedded in the viral membrane, and another membrane-interacting region, known as the “fusion peptide” (FP), which can insert into the target cell membrane.17 Similar to other class I viral fusion proteins, including human immunodeficiency virus 1 (HIV-1) envelope glycoprotein, influenza A virus hemagglutinin, and Ebola virus glycoprotein,14,15 the S protein is synthesized as a single-chain precursor that is trimerized and subsequently cleaved at a polybasic amino acid sequence into two fragments: the receptor-binding fragment S1 and the fusion fragment S2.18 The cleavage occurs in the trans-Golgi network of the virion-producing cell and is catalyzed by furin, a member of the subtilisin-like proprotein convertase family. To initiate the next round of infection, S protein binds to the angiotensin-converting enzyme 2 (ACE2) receptor on the surface of a new host cell for attachment. As depicted in Figure 4.3, the protein is further cleaved at a second site in S2 (S2′ site) by a host serine protease TMPRSS2 (transmembrane serine protease 2) for cell-surface entry or by an endosomal cysteine protease cathepsin L after endocytosis for endosomal entry.17,19,20 It then undergoes large conformational changes to insert the FP into the target membrane and then refold into the stable postfusion structure, which induces fusion of the two membranes,21,22 and release of the viral genomic RNA into the cytosol.
SARS-COV-2 SPIKE PROTEIN
SARS-CoV-2 spike protein is a type I membrane protein that has 1,273 amino acid residues for the original Wuhan-Hu-1 strain. It forms a trimer that is anchored to the viral membrane by its TM segment with its large ectodomain decorating the virion surface (Figure 4.4). After the cleavage at the S1/S2 boundary, the S1 fragment contains N-terminal domain (NTD), receptor-binding domain (RBD), and C-terminal domains (CTD1 and CTD2), and the S2 fragment includes FP, heptad repeat 1 (HR1), central helix (CH), connector domain (CD), heptad repeat 2 (HR2), TM segment, and the cytoplasmic tail (CT). The S protein is heavily glycosylated, with each protomer containing 22 N-linked glycosylation sites.23,24
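Candidate N-linked glycosylation sites are conventionally located by scanning a sequence for the N-X-S/T sequon, where X is any residue except proline. The following is a minimal Python sketch of such a scan. The peptide `TOY_PEPTIDE` is made up for illustration and is not the spike sequence, and the function name `find_sequons` is hypothetical. A sequon scan only flags candidate sites; it says nothing about whether a site is actually glycosylated.

```python
import re

# N-X-S/T sequon with X != P. The lookahead lets overlapping sequons
# (for example "NNTS") all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of the Asn in each N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence)]


# Hypothetical toy peptide (not taken from the S protein).
TOY_PEPTIDE = "MNATKNPSLNGTQ"
print(find_sequons(TOY_PEPTIDE))  # N-P-S at position 6 is skipped

# Bookkeeping from the text: 22 sites per protomer, three protomers per trimer.
SITES_PER_PROTOMER = 22
PROTOMERS_PER_TRIMER = 3
print(SITES_PER_PROTOMER * PROTOMERS_PER_TRIMER)
```

With 22 sites per protomer, a trimer carries 66 N-linked glycosylation sites in total, which helps explain why the glycan shield covers so much of the spike surface.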
The main function of the SARS-CoV-2 S protein is to catalyze the membrane fusion reaction, but it is also a major surface antigen that induces neutralizing antibody (nAb) responses in the host. It is therefore an important target for developing diagnostics, therapeutics, and vaccines against the virus and has been under intensive study since the beginning of the pandemic. In particular, remarkable progress in the structural biology of SARS-CoV-2 S protein has been made very rapidly,25 advancing our knowledge of its structure and function considerably. Built upon the previous construct designs of other CoVs,26, 27, 28 and 29 the structures of S protein fragments derived from the original Wuhan-Hu-1 strain, including the S ectodomain stabilized in its prefusion conformation,30,31 RBD-ACE2 complexes,32, 33, 34 and 35 and segments of S2 in the postfusion state,36 were reported within the first several months of the pandemic. Soon after, structures of detergent-solubilized, full-length S proteins in both prefusion and postfusion conformations, determined by cryo-electron microscopy (cryo-EM),37,38 as well as those of the intact S trimers on the surface of virion, studied by cryo-electron tomography (cryo-ET),39, 40, 41 and 42 were also reported.
Overall Structure of S Trimer
Using chemically inactivated authentic SARS-CoV-2 preparations, cryo-ET studies show that there are approximately 25 spikes on average, randomly distributed on each virion, with approximately 97% in the prefusion conformation and 3% in the postfusion conformation (Figure 4.5).39,40 The majority of the prefusion spike trimers adopt either the closed, three-RBD-down conformation that represents a receptor-inaccessible state, or the one-RBD-up conformation that represents a receptor-accessible state (Figure 4.5A-C).30,31 Prefusion trimers can break the 3-fold symmetry and tilt around their membrane-proximal stalk toward the membrane by a wide range of angles, suggesting that part of the stalk forms a flexible hinge to allow tilting in all directions (Figure 4.5A).39, 40 and 41 The postfusion spikes show a distinct rod-like shape with N-linked glycans projecting from the side, and they are much more rigid than the prefusion spikes with their long axis perpendicular to the viral membrane (Figure 4.5D).
Although three-dimensional reconstructions by cryo-ET are limited in resolution, cryo-EM single-particle analysis of purified SARS-CoV-2 spike protein preparations has revealed atomic details (Figure 4.6).30,31,37,43,44 In the prefusion structures of both the stabilized soluble trimer and the full-length trimer (Figure 4.6A-D), S1 shows a “V” shape with the NTD at one arm and the RBD, CTD1, and CTD2 at the other. These four domains wrap around the central helical bundle formed by the prefusion S2, in which the N-terminal end of HR1 projects toward the viral membrane. The RBDs from the three protomers form the apex of the S trimer, sampling either the “up” or “down” conformation (Figure 4.6A-D). The three NTDs are located at the periphery of the trimer, with each making contacts with the RBD from the neighboring protomer. The CTD1 and CTD2 pack underneath the RBD against S2 and between the two adjacent NTDs, suggesting that they may modulate the structural rearrangements of the RBD required for membrane fusion. The overall structures of all the known SARS-CoV-2 variants, exemplified by one of the full-length Omicron BA.1 S trimer conformations shown in Figure 4.6E, are very similar to that of the original strain, with mutations largely decorating the surface of the trimer.45, 46, 47 and 48
FIGURE 4.6 Cryo-EM structures of the purified spike trimers in the prefusion and postfusion conformations. (A) and (B) Cryo-EM structures of the stabilized soluble S ectodomain trimer in the closed, three RBD-down prefusion conformation (A) and in the one RBD-up conformation (B). Various structural components are in the color scheme shown in Figure 4.2. (C) and (D) Cryo-EM structures of the detergent-solubilized full-length S trimer in the closed, three RBD-down prefusion conformation (C) and in the one RBD-up conformation (D). (E) Cryo-EM structure of the Omicron BA.1 full-length S trimer in the closed, three RBD-down prefusion conformation. All BA.1 mutations, as compared to the Wuhan-Hu-1 sequence, are highlighted in sphere model. (F) Cryo-EM structure of the full-length S2 trimer in the postfusion conformation reconstituted in lipid nanodiscs. α1148-1155, an α-helix formed by residues 1148-1155; β1127-1135, a β-strand formed by residues 1127-1135; β718-729, a β-strand formed by residues 718-729 in the S1/S2-S2′ fragment; 3H, three-helix segment; CD, connector domain; CH, central helix region; cryo-EM, cryo-electron microscopy; CT, cytoplasmic tail; CTD1, C-terminal domain 1; CTD2, C-terminal domain 2; FPPR, fusion peptide proximal region; HR1, heptad repeat 1; HR2, heptad repeat 2; i-FP, the internal fusion peptide; NTD, N-terminal domain; RBD, receptor-binding domain; TM, transmembrane anchor.
The studies with the full-length S trimers have identified the FPPR (fusion peptide proximal region; residues 828-853) and 630 loop (residues 620-640) as control elements that may modulate the RBD movement, as well as the kinetics of the S structural rearrangements.37,43 The RBD-up movement apparently pushes the FPPR and 630 loop out of their original positions in the closed conformation, making them invisible in cryo-EM maps. In the three-RBD-down conformation of the G614 trimer, all three pairs of the FPPR and 630 loop are structured, whereas in the one-RBD-up conformation of the G614 trimer, only one FPPR and 630 loop pair is ordered. Mutations in certain variants cause changes in the configurations of the FPPR and 630 loop, thereby altering the S trimer stability and the RBD movement.43,45,46,48 Other cryo-EM single-particle studies using soluble S trimers indicate that either the D614G or H655Y mutation greatly increases the probability that RBDs are in the open conformation, as compared to the Wuhan-Hu-1 spike trimer.49,50
In the postfusion state, S1 dissociates as a monomer from S2, whereas the latter adopts a rigid, rod-like shape, consistent with that observed by cryo-ET with the authentic viruses.39 The core structure of the postfusion S2 is a long central three-stranded coiled-coil made up of HR1 and CH (Figure 4.6F). A β-hairpin from the CD and a segment (residues 718-729; β718-729) from the S1/S2-S2′ fragment form a three-stranded β-sheet that wraps around the CH C-terminal end. This connector β-sheet and CH form the invariant structure between the prefusion and postfusion conformations. The segment of residues 737-769 in the S1/S2-S2′ fragment folds into three consecutive short α-helices (3H) that are stabilized by two disulfide bonds and pack against the groove of the CH coiled-coil. In the C-terminal portion of S2, a β-strand formed by residues 1127-1135 (β1127-1135) upstream of HR2 from another protomer expands the three-stranded connector β-sheet into four strands, and projects HR2 toward the TM region. Moreover, a two-turn helix formed by residues 1148-1155 (α1148-1155) wedges between two neighboring 3Hs, and a longer helix of HR2 makes up the six-helix bundle with the HR1 coiled-coil to further reinforce the very rigid postfusion structure.
The membrane-interacting segments in S2 are important for membrane fusion but missing in most postfusion structures except for one determined in the context of membrane (Figure 4.6F),44 in which nine membrane-spanning helices (three per protomer) are identified in the lipid bilayer. The region immediately upstream of HR1, previously known as the internal fusion peptide (i-FP) from the studies of SARS-CoV,51, 52 and 53 forms a continuous α-helix, which extends the HR1 coiled-coil well into the lipid bilayer, possibly accounting for the rigidity of the postfusion structure including the TM region. This first membrane-spanning helix is followed by a sharp U-turn within the membrane, and a second helix that spans through the lipid bilayer once again. There are a total of six TM helices formed by the i-FP from three protomers that pack tightly together to make up a blunted cone shape (Figure 4.6F). The third membrane-spanning helix is the TM segment, which is tilted relative to the plane of the membrane and gently wraps around the blunted cone. Part of the following CT is structured and embedded horizontally in the cytosolic headgroup region of the lipid bilayer, and three copies of the CT region make up a triangle that caps the tip of the TM cone (Figure 4.6F).
N-Terminal Domain
The NTD projects away from the 3-fold axis of the spike trimer at the periphery, and it is mainly formed by four stacked β-sheets and a number of connecting flexible loops, decorated by several N-linked glycans (Figure 4.7A). It is not known whether the NTD plays any functional role in membrane fusion by SARS-CoV-2, but the NTDs from other CoVs have been shown to recognize sugars upon initial attachment or specific protein receptors,54,55 or facilitate the prefusion-to-postfusion transition of the S protein.56,57 Nevertheless, the NTD in the SARS-CoV-2 S trimer is targeted by some potent nAbs,58, 59 and 60 indicating that it may be functionally important or at least located in the vicinity of other functionally critical regions, such as the RBD. Interestingly, when the structures of the S trimers from SARS-CoV-2 and its variants in the closed prefusion conformation are superposed on the invariant S2 region, the most prominent differences are in the NTD.45, 46, 47 and 48 Each variant often contains a different set of point mutations, deletions, or even insertions in its NTD and these mutations reconfigure the N-terminal segment and all the surface-exposed loops (Figure 4.7A), which form important parts of the neutralizing epitopes.58 Thus, these structural changes, unique in each variant, drastically alter the antigenic surface of the domain, accounting for the loss of binding and neutralization by NTD-directed nAbs.43,45,46,48 The high level of tolerance to different mutations in the NTD is consistent with the hypothesis that this domain is functionally important because of its location rather than its structure.
Receptor-Binding Domain
There are two subdomains in the RBD: a core structure formed by a five-stranded antiparallel β-sheet sandwiched by short connecting α-helices on both sides, and an extended loop, named the receptor-binding motif (RBM), which packs against one edge of the core structure and makes all the contacts with ACE2 (Figure 4.7B).26,32,34 In the closed prefusion state, the RBD in the down conformation packs against the central helical bundle of S2 and two other RBDs of the S trimer, while leaning on the CTD1 from the same polypeptide chain and the NTD from a neighboring protomer. This configuration partially occludes the RBM, making it inaccessible to the receptor ACE2. When the RBD moves up and fully exposes the RBM in the up conformation of the S trimer, the adjacent CTD1 and NTD shift away to accommodate the RBD movement.
The overall structure of the RBD has changed little among SARS-CoV-2 and its variants, except for a small shift of a short helix (residues 365-371) in various Omicron subvariants (Figure 4.7B), probably reflecting the need for the virus to keep the RBD structure intact in order to maintain its ability to engage the receptor and its fitness. Even in Omicron, with approximately 8% of all RBD residues mutated but without any deletions or insertions, almost all mutations are surface-exposed and modulate binding to ACE2 or nAbs. For example, N501Y has been shown to enhance ACE2 affinity by making additional contacts.61 Q493R and Q498R create new salt bridges with residues from ACE2.62, 63, 64 and 65 K417N and E484A (or some forms) may reduce the ACE2 affinity because of loss of ionic interactions with the receptor.45,66 S477N, T478K, and E484A in the tip of the RBM, possibly together with K417N, can probably account for loss of binding and neutralization by antibodies that target the RBM. G339D, N440K, G446S, Q493R, G496S, Q498R, and Y505H, all aligned along the exposed surface of the S trimer in the RBD-down conformation, are probably responsible for resistance to the antibodies that target this epitopic site. Mutations near the so-called “cryptic site,”67 which is occluded in the RBD-down conformation, are rare, explaining why this group of antibodies, often with lower neutralization potency, retains binding to the new variants.
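The substitution labels used above follow the convention of wild-type residue, position, then mutant residue. The following is a minimal Python sketch that parses such labels and reports the change in formal side-chain charge, which is one simple way to see why some substitutions can gain or lose ionic interactions. The function names and the charge table are illustrative assumptions; in particular, histidine is treated as neutral here, which is a simplification.

```python
import re

# Formal side-chain charge near neutral pH. Histidine is treated as
# neutral, which is a simplifying assumption of this sketch.
CHARGE = {"K": +1, "R": +1, "D": -1, "E": -1}

MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label):
    """Split a label such as 'N501Y' into (wild type, position, mutant)."""
    match = MUTATION.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def charge_change(label):
    """Return mutant charge minus wild-type charge for a substitution."""
    wild_type, _, mutant = parse_mutation(label)
    return CHARGE.get(mutant, 0) - CHARGE.get(wild_type, 0)


for label in ["N501Y", "Q493R", "Q498R", "K417N", "E484A"]:
    print(label, parse_mutation(label), charge_change(label))
```

A charge change is only bookkeeping: Q493R and Q498R each add a positive charge and K417N removes one, in line with the salt-bridge gains and losses described above, but the effect on ACE2 affinity still depends on the structural context of each residue.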
C-Terminal Domains
CTDs are formed primarily by β-structures, which are largely not affected by mutations in the variants (Figure 4.7C). The RBD can be considered an insertion between two antiparallel β-strands in the CTD1 and the latter an insertion between two antiparallel β-strands in the CTD2. A continuous strand (residues 306-330) runs through both the CTD1 and CTD2, connecting the NTD and the RBD on its two ends. The CTD1, packed at the bottom side of the RBD, needs to rotate outward with the RBD in the moving-up transition. The FPPR, abutting the opposite side of CTD1 from the RBD, may help clamp down the RBD and stabilize the closed conformation of the S trimer.37 Therefore, the CTD1 appears to be a structural relay between RBD and FPPR that can sense the displacement on either side. The latter is directly connected to the i-FP.
