Molecular Biology of SARS-CoV-2





Roberto Patarca

William A. Haseltine




INTRODUCTION

Coronaviruses are the largest group within the Nidovirales order that includes eleven families1 and pathogens of veterinary and medical importance. Of ancient origin,2,3 the pleomorphic, lipid-enveloped coronaviruses are characterized by club-like spikes that project from their surface with a crown (corona in Latin)-like appearance under the electron microscope, a feature from which their name is derived. Coronaviruses have an unusually large, complex, nonsegmented, polycistronic, positive-sense, single-stranded RNA genome, the largest among RNA viruses, and a unique replication strategy.

Coronaviruses cause a variety of diseases in mammals and birds, ranging from enteritis in cows and pigs and upper respiratory disease in chickens to human respiratory infections that are mainly mild and confined to the upper airways but can be lethal, particularly among immunocompromised individuals. Human coronaviruses (hCoVs) were first described in the 1960s, and only two, hCoV-229E and hCoV-OC43, were known before 2002;4,5 in 2004 to 2005, two additional ones, hCoV-NL63 and hCoV-HKU1, were discovered in clinical specimens.6,7 These four coronaviruses seasonally cause mainly mild upper respiratory disease, are responsible for approximately 5% to 30% of endemic common colds in humans globally, and are thus regarded as relatively harmless respiratory pathogens.8, 9 and 10 However, in the past two decades, hCoVs have emerged with much greater morbidity, mortality, and pathogenicity, without a seasonal incidence.11,12

The severe acute respiratory syndrome (SARS)-coronavirus (referred to here as SARS-CoV-1 for clarity) emerged in 2002 in Guangdong province, China, and its subsequent global spread was associated with 8,096 cases and 774 deaths (nearly 10% mortality), mainly secondary to severe lower respiratory disease.11,13, 14 and 15 A decade later, in 2012, another zoonotic coronavirus was identified as the causative agent of Middle East respiratory syndrome (MERS) and named MERS-CoV; by August 2016, MERS-CoV had infected 1,791 patients, with a mortality rate of 35.6%.16 Outbreaks of SARS-CoV-1 and MERS-CoV were contained with isolation and contact tracing approaches. Almost a decade later, in late 2019, infections with the 2019 novel coronavirus, named SARS-CoV-2,17 were first reported in Wuhan, China,18, 19, 20 and 21 leading to the coronavirus disease 2019 (COVID-19) pandemic with significant global health challenges. The virus has a higher human-to-human transmission rate than SARS-CoV-1 and MERS-CoV,22,23 which facilitates rapid global spread, and shares their risk factors for mortality.24
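The "nearly 10% mortality" figure quoted for SARS-CoV-1 follows directly from the case and death counts given above. The short Python sketch below reproduces that arithmetic; the function name `case_fatality_rate` is an illustrative choice, not a term used in this chapter, and the calculation is the simple ratio of reported deaths to reported cases.

```python
def case_fatality_rate(deaths: int, cases: int) -> float:
    """Crude case fatality rate: reported deaths divided by reported cases."""
    if cases <= 0:
        raise ValueError("cases must be a positive number")
    return deaths / cases


# Counts quoted in the text for the 2002-2003 SARS-CoV-1 outbreak.
sars1_cfr = case_fatality_rate(deaths=774, cases=8096)
print(f"SARS-CoV-1 crude case fatality rate: {sars1_cfr:.1%}")  # prints 9.6%
```

Such a crude ratio depends entirely on how completely cases and deaths were reported, which is why the text qualifies the pandemic totals that follow.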

As of February 2023, there were well over 754 million confirmed positive cases and in excess of 6.83 million reported deaths globally (World Health Organization [WHO]; www.who.int). However, the full impact of the pandemic has been underestimated, driven by limitations in availability and access to diagnostic and treatment facilities, use of home testing, limited or no tracking of cases and contacts, effects of the pandemic on overall vaccination rates, disease burden, morbidity and mortality, and suboptimal and variable reporting.25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35 and 36 Other barriers include inadequate follow-up of affected individuals as well as definition and tally of COVID-19-related or excess mortality.37,38 The global numbers of infected people and deaths therefore range widely and are likely in the order of billions and tens of millions,39 respectively, with the pandemic affecting 70% to 80% of the worldwide population. Age is the strongest epidemiologic predictor of COVID-19 deaths, with the risk of death doubling every 5 years of age from childhood onward.40,41 Likewise, mortality rates with SARS-CoV-1 and MERS-CoV among people older than 65 years exceed 50%. Men are also at greater risk of death than women.42
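To make the age relationship concrete, a risk that doubles every 5 years of age can be written as an exponential in age. The sketch below is an idealized illustration only: the reference age of 20 years and the function name are assumptions introduced here, and real mortality curves will deviate from a perfect doubling rule.

```python
def relative_mortality_risk(age: float,
                            reference_age: float = 20.0,
                            doubling_years: float = 5.0) -> float:
    """Risk of death relative to a person of reference_age, assuming the
    risk doubles every doubling_years (idealized; reference_age is an
    arbitrary illustrative baseline, not a value from the chapter)."""
    return 2.0 ** ((age - reference_age) / doubling_years)


for age in (20, 25, 40, 60, 80):
    print(age, relative_mortality_risk(age))
```

Under this simplification, eight doubling periods separate a 20-year-old from a 60-year-old, which corresponds to a 256-fold difference in risk.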

The emergence of a highly pathogenic hCoV in China in 2019 confirmed the long-held opinion that coronaviruses are important emerging and reemerging pathogens.43 As exemplified by Omicron and subsequent variants, novel variants of SARS-CoV-2 continue to emerge with altered properties of antigenic escape and potentially antigenic shift on top of high transmissibility in this still unfolding pandemic.44 Coronaviruses currently infecting humans are believed to have emerged repeatedly from zoonotic sources over the past 1,000 years, and the alpha- and betacoronaviruses to which they belong are estimated to have separated from the gamma- and deltacoronaviruses 300 million years ago. These extended periods have allowed coronaviruses infecting humans the opportunity to evolve and adapt to their host, rendering it particularly challenging to design preventive and therapeutic interventions that will be effective and durable. This is apparent with the yearly seasonal pattern of coronaviruses causing the common cold in humans, and recently with the lingering SARS-CoV-2-driven pandemic. Understanding the molecular biology of SARS-CoV-2 is therefore of critical importance to inform development of the needed antiviral arsenal to keep it at bay.

In this chapter, we review the molecular biology of SARS-CoV-2, highlighting insights gained and lessons learned from it and other coronaviruses. For each viral life cycle stage as well as gene and protein, we discuss how this knowledge is providing insight into how the virus evolves in its infectivity, virulence, pathogenicity, and evasion of the host’s immune system while informing development of approaches to diagnose, prevent, and treat infections by SARS-CoV-2 and other viruses including important emerging pathogens. In Chapters 7 and 16, we take a deeper dive into these topics while illustrating how research on SARS-CoV-2 pathogenesis and therapies is uncovering and providing granularity on signaling and metabolic pathways that are relevant to infectious and noninfectious diseases.


TAXONOMY, ZOONOTIC ORIGIN, AND EVOLUTION OF SARS-COV-2 AND OTHER HUMAN CORONAVIRUSES

Coronaviruses are classified into four distinct genera (alpha, beta, gamma, and delta),45,46 which are estimated to have separated 300 million years ago.2,3 The latter separation time is consistent with that estimated for the separation between mammals and birds47 and with the fact that alpha- and betacoronaviruses circulate in mammalian hosts, whereas gamma- and deltacoronaviruses mainly infect birds.3

SARS-CoV-2 is an enveloped, positive-sense, single-stranded RNA virus with a linear, nonsegmented genome of the order Nidovirales, family Coronaviridae, subfamily Orthocoronavirinae, genus Betacoronavirus, and subgenus Sarbecovirus.1,48 The natural hosts of betacoronaviruses are rodents and bats.

SARS-CoV-2 is one of seven coronaviruses thus far known to infect humans.49 Among the other six, two belong to the Alphacoronavirus genus: the Duvinacovirus hCoV-229E and the Setracovirus hCoV-NL63; and four to the Betacoronavirus genus: the Sarbecovirus SARS-CoV-1; the Merbecovirus MERS-CoV; and the Embecoviruses hCoV-OC43 and hCoV-HKU1. HCoV-229E, hCoV-NL63, hCoV-OC43, and hCoV-HKU1 usually cause mild respiratory disease, whereas SARS-CoV-2 and SARS-CoV-1 as well as MERS-CoV cause severe respiratory diseases in afflicted individuals,16 and SARS-CoV-2 can have multiorgan effects.

Coronaviruses have a long history of cross-species transmission, and the seven that infect humans are suspected to have zoonotic origins,8,50 repeatedly emerging during the past 1,000 years.2 The evolutionary histories of seasonal hCoVs are highly complex, owing to frequent recombination between and within them, and uncertain, because of the undersampling of nonhuman viruses.9

In multigene and complete genome analysis of seasonal hCoVs,9 the recombination rate was highest for hCoV-229E and hCoV-OC43, whereas substitutions per recombination event were highest in hCoV-NL63 and hCoV-HKU1. HCoV-HKU1 had the earliest common ancestor (1809-1899) but fell into two distinct clades (genotypes A and B), possibly representing two independent transmission events from murine-origin coronaviruses that appear to be a single introduction because of large gaps in the sampling of coronaviruses in animals. In fact, genotype B was genetically more diverse than all the other seasonal hCoVs.9 The most recent common ancestor of hCoV-HKU1 extant lineages is estimated to have existed in the 1950s.51

HCoV-OC43 is thought to have shared a common ancestor with bovine coronavirus around 120 years ago.52 However, depending on the gene studied, hCoV-OC43 may have ungulate, canine, or rabbit ancestors,9 and overall, shared amino acid substitutions in multiple proteins are present along the nonhuman to seasonal hCoVs host-jump branches.9 Among human betacoronaviruses, hCoV-OC43 and hCoV-HKU1 share the most ancestry with rodent-borne coronaviruses, indicating that rodents are most likely the reservoirs and, more importantly, that rodent coronaviruses could plausibly spill over and infect humans.53

With often wide confidence intervals, the emergence of hCoV-NL63 and hCoV-229E occurred around 500 to 800 and 200 years ago, respectively.54,55 HCoV-229E may have origins in a bat, camel, or an unsampled intermediate host.9 Coronaviruses closely related to hCoV-229E were isolated from African hipposiderid bats56 and captive alpacas suffering from an acute respiratory syndrome.57,58

For SARS-CoV-1 and MERS-CoV, molecular dating studies estimated that they diverged from bat coronaviruses in the past three decades.59,60 Dromedary camels in Saudi Arabia harbor three different hCoV species, including a dominant MERS-CoV lineage that was responsible for the outbreaks in the Middle East and South Korea during 2015.10 SARS-CoV-2 and SARS-CoV-1 as well as MERS-CoV have bats as key reservoirs61, 62, 63 and 64 and possibly intermediate hosts.16,65, 66, 67, 68, 69 and 70 Bat coronaviruses related to human ones belong to the Sarbecovirus, Nobecovirus, and Hibecovirus subgenera of betacoronaviruses.71, 72 and 73

The evolutionary origin of SARS-CoV-2 is unknown despite reports of SARS-CoV-2-related viruses in Asian Rhinolophus bats,74, 75, 76 and 77 including the closest virus from R. affinis, RaTG13,20 and pangolins.78, 79 and 80 A laboratory leak theory linking SARS-CoV-2 to the Mojiang mine incident in 2012 during which six miners fell sick and three died was put forth81 and dismissed82; however, the origin of SARS-CoV-2 remains a matter of debate (reviewed in reference83). An argument has been put forth that there is no determined origin to viruses and simply an evolutionary and selective process in which chance and the environment of a living organism play a key role, with the evolutionary process that gave rise to SARS-CoV-2 continuing with regular emergence of novel variants more adapted than the previous ones.84

Different progenitors probably contributed to the mosaic genome of SARS-CoV-2. SARS-CoV-2 progenitor bat viruses genetically close to SARS-CoV-2 and able to enter human cells through a human angiotensin-converting enzyme 2 (hACE2) pathway have been identified in cave bats living in the limestone karst terrain in northern Laos, in the Indochinese peninsula.85 The receptor-binding domains (RBDs) of these viruses differ from that of SARS-CoV-2 by only one or two residues at the interface with ACE2, bind more efficiently to the hACE2 protein than that of the SARS-CoV-2 strain isolated in Wuhan from early human cases, and mediate hACE2-dependent entry and replication in human cells, which is inhibited by antibodies that neutralize SARS-CoV-2.85 However, none of the bat coronaviruses characterized thus far contains a furin cleavage site in the spike protein that is associated with an increased pathogenicity in humans.85,86

Continued surveillance of coronaviruses across nonhuman hosts would be useful to address the complex evolution of coronaviruses and their frequent host switches.9 Since its appearance in humans, SARS-CoV-2 has evolved through sporadic mutations and recombination events,87 some of which correspond to gains in fitness allowing the virus to spread more widely or to escape neutralizing antibodies.88 In addition to coronavirus spread from animals to humans (spillover; zoonosis),89 there are numerous records of cross-species transmission of coronaviruses among nonhuman animals90,91 and of humans acting as a transmission source to wild or domestic animal species (spillback; reverse zoonosis).67,92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102 and 103

SARS-CoV-2 appears to have a wide vertebrate host range. This host-species promiscuity or generalist property has led to infections in many other animals, as well as establishment of the virus in wild species and spillback to humans.104 The virus has been detected in companion animals such as cats and dogs living with infected owners105; has caused large outbreaks with lethal disease in farmed minks (Neovison vison)106,107 and pet hamsters (Mesocricetus auratus)108; and spread among wild white-tailed deer (Odocoileus virginianus).66,67,109 In North America, there is phylogenomic evidence of continued transmission of SARS-CoV-2 from humans to white-tailed deer and of a highly divergent lineage of SARS-CoV-2 in white-tailed deer (B.1.641) with shared ancestry with mink-derived virus and an epidemiologically linked human infection (spillover).66 The results of a comprehensive cross-sectional study assessing the prevalence, genetic diversity, and evolution of SARS-CoV-2 in white-tailed deer in the State of New York are consistent with occurrence of multiple spillover events (human to deer) of the Alpha and Delta lineages with subsequent deer-to-deer transmission and adaptation of the viruses long after their broad circulation in humans, thereby serving as a wildlife reservoir.67 Including aforementioned examples, SARS-CoV-2 can infect and replicate efficiently in cats, tigers, lions, gorillas, captive ferrets, and hamsters, with transmission to conspecifics.100,110, 111, 112, 113, 114, 115 and 116 In contrast, the virus replicates poorly in dogs, and experimental infection attempts in pigs, chickens, and ducks have been unsuccessful, suggesting that these species may not be susceptible.116

Interestingly, SARS-CoV-2 poorly infects the bats and bat cells tested so far.117 For instance, the SARS-CoV-2 RBD binds to R. macrotis ACE2 with a lower affinity than to human ACE2.118 Moreover, throughout the pandemic, SARS-CoV-2 evolved into several variants such as Delta and Omicron, which exhibited evolutionary adaptation to binding to human ACE2 and less to its bat homolog. However, results of short-term molecular dynamics simulations are consistent with persistent cross-species SARS-CoV-2 variant infectivity not only between human and bats but also beyond because of the evolutionary distance between them.119 Moreover, a recent MERS-CoV-related virus, a Merbecovirus, was shown to be able to use ACE2.120 NeoCoV, the closest relative of MERS-CoV in bats, binds to bat ACE2 receptors and is restricted from binding to the human ACE2 receptor by residues 337 to 342 in the latter; however, viruses carrying a mutation (T510F) in the RBD of NeoCoV spike efficiently enter cells expressing human ACE2.120 These results emphasize the need for surveillance for potential spillovers as a zoonotic threat.104 Of note, MERS-CoV prevalence in camel populations in Africa and the Middle East is extremely high. Moreover, MERS-CoV and SARS-CoV-2 coexist in the Middle East, especially in Saudi Arabia and the United Arab Emirates, where sporadic coinfection has been reported. Coinfection, because of reverse spillover of SARS-CoV-2 to camels or in double-infected humans, could lead to recombination between the two viruses, rendering either SARS-CoV-2 more lethal or MERS-CoV more transmissible.121


OVERVIEW OF SARS-COV-2 VIRION STRUCTURE, GENOME, AND LIFE CYCLE

Coronaviruses are roughly spherical, moderately pleiomorphic, and derive their name from the crown-like (corona in Latin) appearance of the 17- to 20-nm surface projections of the spike (S) glycoproteins or peplomers under the electron microscope.122, 123, 124 and 125 Scanning electron and atomic force micrographs of SARS-CoV-1 virions emerging from infected Vero cells show knobby, rosette-like viral particles.126

The host cell–derived viral membrane has two other viral structural proteins, the membrane (M) and the envelope (E) proteins, which bind to a fourth structural protein, the nucleocapsid (N) protein. These four structural proteins are multifunctional and important for coronavirus infectivity. M, which is the most abundant protein in the viral envelope, adapts a region of membrane for virus assembly and captures other structural proteins at the budding site forming a lattice into which the S and E proteins are incorporated. The N protein chaperones and protects the viral RNA genome, assembles it into a ribonucleoprotein complex, and packages it after its replication in the cytoplasm, which it also regulates. Spikes consisting of three copies of the S glycoprotein promote receptor-binding and membrane fusion with host targets. The small membrane protein E, despite being present in substoichiometric amounts and making up only a small percentage of the viral envelope, has multiple functions in virion assembly, morphogenesis, and virus-host interaction including acting as an enhancer of budding.46,127,128

The internal component of the coronavirus virion is obscure in electron micrographs of whole virions, giving the virion a “punched-in” spherical appearance. Imaging of virions that have burst spontaneously, expelling their contents, or that have been treated with nonionic detergents has led to the attribution of another distinguishing characteristic of the positive-strand RNA coronaviruses, namely that they possess helically symmetric nucleocapsids. The latter configuration is typical of negative-strand RNA viruses, in contrast to the majority of positive-strand RNA animal viruses other than coronaviruses, which have icosahedral ribonucleoprotein capsids.128

Packed in the helical nucleocapsid, coronavirus genomes are exceptionally large, in fact the largest among RNA viruses at up to 32 kb in some cases,129 and have a 5′-terminal cap structure and a polyadenylated 3′-end.130 Coronaviruses belong to the order Nidovirales, which derives its name from the Latin word “nidus,” meaning nest, because its members have nested genomes that, in addition to the full-length genomic RNA, produce nested subgenomic RNAs that specifically encode structural and accessory proteins.
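The idea of a nested set can be illustrated with a small Python sketch in which each subgenomic RNA extends from one gene through to the 3′-end of the genome, so that every shorter RNA is contained within the longer ones. This is a simplified model under stated assumptions: only the four structural genes are listed, their 5′-to-3′ order (S, E, M, N) is assumed here rather than taken from this section, accessory ORFs are omitted, and the function name is hypothetical.

```python
# Assumed 5'-to-3' order of the structural genes in the distal third of the
# genome; accessory ORFs are left out to keep the illustration small.
STRUCTURAL_GENE_ORDER = ["S", "E", "M", "N"]


def nested_subgenomic_rnas(gene_order):
    """Map each gene to the genes carried by the subgenomic RNA that starts
    at that gene and runs through to the 3'-end (a nested set)."""
    return {gene: gene_order[i:] for i, gene in enumerate(gene_order)}


for first_gene, content in nested_subgenomic_rnas(STRUCTURAL_GENE_ORDER).items():
    print(f"sgRNA starting at {first_gene}: carries {' + '.join(content)}")
```

In this model the RNA starting at N is contained in the one starting at M, which is in turn contained in those starting at E and S, which is the nesting that gives the order its name.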

As exemplified by the SARS-CoV-2 genome in Figure 2.1, genomic organization in coronaviruses is highly conserved, with two large overlapping open reading frames (ORF1a and ORF1b) at the proximal end, constituting two-thirds of the genome and encoding a replicase polyprotein, including an RNA-dependent RNA polymerase.46 ORF1a as well as ORF1ab are translated from a large 5′-genomic end–derived messenger RNA (mRNA). The ORF1ab product results from ribosomal frameshifting during translation, facilitated by a pseudoknot structure between ORF1a and ORF1b. Sixteen proteins are produced from ORF1ab as a result of autocleavage by the proteases included in the ORF1a protein. These proteins exhibit various functions and include the RNA-dependent RNA polymerase, a helicase, proteases, and proteins that regulate cellular functions (Figures 2.1 and 2.2).
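How a frameshift lets the ribosome read past the end of one ORF into an overlapping one can be shown with a toy example. The Python sketch below reads an RNA codon by codon, either staying in frame until a stop codon or slipping back one nucleotide (a -1 shift) immediately after a slippery heptamer. The toy sequence, the function names, and the choice of UUUAAAC as the slippery motif are assumptions made for illustration; the sketch ignores the pseudoknot and the fact that only a fraction of ribosomes shift.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def read_codons(rna, start):
    """Read triplets from `start` until a stop codon or the end of the RNA."""
    codons = []
    for i in range(start, len(rna) - 2, 3):
        codon = rna[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return codons


def read_with_minus1_shift(rna, start, slippery="UUUAAAC"):
    """Read in frame through the slippery heptamer, then step back one
    nucleotide and continue in the -1 frame. Falls back to ordinary reading
    if the motif is absent or does not end on a codon boundary."""
    site = rna.find(slippery, start)
    shift_point = site + len(slippery)
    if site == -1 or (shift_point - start) % 3 != 0:
        return read_codons(rna, start)
    before = read_codons(rna[:shift_point], start)
    after = read_codons(rna, shift_point - 1)  # last heptamer base is re-read
    return before + after


# Hypothetical toy RNA: start codon, slippery motif, then an in-frame stop.
toy_rna = "AUGGC" + "UUUAAAC" + "GGGUUUGCGGUGUAAGUGCAGCC"

print("in frame  :", read_codons(toy_rna, 0))
print("-1 shifted:", read_with_minus1_shift(toy_rna, 0))
```

Without the shift, reading stops at the in-frame UAA; with the shift, that stop is bypassed and translation continues into the overlapping frame, which mirrors how ORF1a is extended into ORF1ab.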

The distal third of the coronaviral genome includes several additional ORFs that encode both structural and accessory proteins and differ in number as well as sequence even among coronaviruses that are closely related or belong to the same lineage.2 The structural proteins encoded by the distal third of the genome include the N, M, E, and S proteins,131 and in SARS-CoV-2 and SARS-CoV-1, the accessory ORFs are very similar, which has provided insight into the evolution that led to the emergence of SARS-CoV-2 and COVID-19.132 In addition to those provided in Figure 2.1, details on the function of accessory proteins, as well as additional ones on nonstructural and structural proteins, are provided in Chapter 7. ORF3b-d are not covered because of the paucity of data on their presence, structure, and function.

Nidovirus genomes appear to have reached different points on an expansion trajectory dominated by consecutive increases in size of ORF1b, ORF1a, and 3′-ORFs, with a unidirectional hierarchical relation between these genome regions, which are distinguished by their expression mechanism and cooperate bidirectionally on a functional level in the virus life cycle, in which they predominantly control genome replication, genome expression, and virus dissemination, respectively.133 In fact, coronavirus genome expansion allowed the acquisition and maintenance of genes in the distal third of the genome encoding diverse accessory proteins that often contribute to adaptation to specific hosts and immune evasion.

Similar to other RNA viruses, SARS-CoV-2 must enter a target/host cell; reprogram it to ensure its replication; exit the host cell; and repeat this cycle for exponential growth.134 The SARS-CoV-2 life cycle within the host involves attachment, penetration, translation of early proteins, biosynthesis, assembly, release, and transmission, which will be covered in the following sections. SARS-CoV-2, as do other coronaviruses, engages with a host cell-surface receptor and, through direct membrane fusion or endocytosis, transfers its genome into the host cell cytoplasm. The positive-sense RNA genome is translated by the host translation machinery to make polyproteins that are cotranslationally cleaved by proteases encoded in the polyprotein to generate components of the RNA-dependent RNA polymerase complex. RNA-dependent RNA polymerase then uses the genome as a template to generate negative-sense subgenomic and genome-length RNAs, which are in turn used as templates for synthesis of positive-sense full-length progeny genomes and subgenomic RNAs. Transcription and replication take place in convoluted membranes (CMs) adjacent to double-membrane vesicles (DMVs) that are both derived from rough endoplasmic reticulum. The subgenomic RNAs are translated into structural and accessory proteins. The positive-sense genomic RNA is bound by the nucleocapsid (N) protein and buds into the endoplasmic reticulum-Golgi intermediate compartment (ERGIC), which is studded with structural proteins S, E, and M. The enveloped virion is then exported from the cell by exocytosis.


VIRAL ATTACHMENT AND PENETRATION

To achieve the first step of attachment and penetration in the viral life cycle, the viral spike (S) protein must engage in interactions with specific molecules, such as membrane receptors, glycans, and glycolipids, on the crowded host cell surface. This complex task is aided by the specificity and strength of the interaction between viral S and cellular molecules and by conformational changes in S.135 Enveloped viruses, such as SARS-CoV-2, either fuse directly at the plasma membrane and are released into the cytosol there, or follow the endocytic pathway and enter the cytoplasm through early or late endosomes depending on their signals to trigger and support fusion136; how these two pathway choices operate for SARS-CoV-2 and its variants remains to be clarified.135 Enveloped viruses that enter cells by fusing in endosomes traverse the endocytic pathway until they reach an endosome that has all of the environmental conditions (pH, proteases, ions, intracellular receptors, and lipid composition) to, if needed, prime and in all cases trigger and support membrane fusion.136 The endosomes also aid in shielding the virus from host cell defense attack.


Spike (S) Protein Structure, Priming by Host Proteases, and Fusion

The S glycoprotein is the major determinant of virulence of coronaviruses by mediating their binding to specific host receptors.137 Engagement by the SARS-CoV-2 S glycoprotein of cellular receptors and priming of S by host cell proteases allows viral attachment, membrane fusion, and entry into host cells as well as syncytia formation among them.20,138 The expression profile and structural features of these cellular receptors are the main cell, tissue, and host range determinants.

SARS-CoV-2 S consists of 1,273 amino acids, is the largest prototypical class I viral fusion protein known,139,140 and forms a homotrimer (Figure 2.2).141, 142 and 143 As in other coronaviruses, the individual monomers are composed of two functionally distinct subunits, S1 and S2,144,145 and three overall domains: head, central stalk, and cytoplasmic tail. The S1 region contains an N-terminal domain (NTD) and the RBD (also referred to as C-terminal domain or CTD), which interacts with cellular receptors, the main one being ACE2, permitting entry and infection of host cells.138,146,147 The S2 region includes the fusion peptide within a membranotropic-interacting region, which helps merge viral and host membranes138,146, 147 and 148 and is similar to those of the fusion proteins of Ebola virus149 and human and simian immunodeficiency viruses150, 151, 152 and 153; two 4,3-hydrophobic heptad repeats (HR1 and HR2) responsible for the formation of a six-helix bundle; and the transmembrane region.154,155

SARS-CoV-2 resembles other coronaviruses in that its entry depends on diverse host cell proteases. Host cell protease expression mainly dictates which viral entry pathways are preferred and could explain why some drugs, such as hydroxychloroquine, targeting one but not both pathways are ineffective at reducing the SARS-CoV-2 burden in patients.156

Located between S1 and S2 of SARS-CoV-2 is a unique furin cleavage site, which is cleaved by the transmembrane trypsin-like serine protease 2 (TMPRSS2)138,143,157 for S protein priming as a necessary step for infection.158,159 The TMPRSS2-mediated cell surface entry route is considered dominant for SARS-CoV-2, and its blockade is sufficient to inhibit viral infection.138 Furthermore, ACE2 and TMPRSS2 are largely coexpressed by the main cellular targets of SARS-CoV-2 in vivo, such as epithelial cells within the lower and upper airway, the nasal passage, and the gut.160,161

The methyltransferase phosphorylated CTD interacting factor 1 (PCIF1) plays a major role in facilitating infection of primary human lung epithelial cells and cell lines by SARS-CoV-2, variants of concern, and other coronaviruses. PCIF1 promotes infection by sustaining expression of the coronavirus receptors ACE2 and TMPRSS2 via N6,2′-O-dimethyladenosine (m6Am)-dependent mRNA stabilization. m6Am is an abundant RNA modification located adjacent to the 5′-end of the mRNA N7-methylguanosine (m7G) cap structure, and PCIF1 catalyzes m6A methylation on 2′-O-methylated A at the 5′-ends of mRNAs. In PCIF1-depleted cells, both ACE2/TMPRSS2 expression and viral infection are rescued by reexpression of wild-type, but not catalytically inactive, PCIF1. These findings suggest a role for PCIF1 and cap m6Am in regulating SARS-CoV-2 susceptibility and identify a potential therapeutic target for prevention of infection.162 In contrast, PCIF1 inhibits human immunodeficiency virus (HIV) infection by enhancing stability of the transcription factor ETS1 (ETS Proto-Oncogene 1) that binds HIV promoter to regulate viral transcription, and as a countermeasure, the HIV viral protein R (Vpr) interacts with PCIF1 and induces PCIF1 ubiquitination and degradation.163

SARS-CoV-2 with the furin site deleted replicates in hamsters and in transgenic mice expressing hACE2 but leads to less severe disease and protects from rechallenge with wild-type SARS-CoV-2.86 SARS-CoV-2 variants of concern such as Alpha, Delta, and Omicron possess mutations within the S1/S2 furin cleavage site that affect furin cleavage efficiency and S fusogenic activity.164, 165 and 166 Although the multibasic furin cleavage site of S is likely to contribute to expanding the host cell tropism of SARS-CoV-2,143,146,167 its role in enhanced infectivity and transmissibility of the virus is yet to be confirmed.

The entry pathway selection of SARS-CoV-2 depends on the levels of proteases, such as TMPRSS2, displayed at the plasma membrane. Whereas a high level of TMPRSS2 expression leads to S priming directly at the cell membrane, followed by rapid fusion and release at the plasma membrane in a pH-independent manner, low levels of TMPRSS2 or its absence lead to endosomal uptake and sorting into endolysosomes, where S priming occurs via proteolytic cleavage by the acid-activated lysosomal cysteine endopeptidase cathepsin L (CTSL)168,169 following clathrin-mediated endocytosis.138,170, 171 and 172 Overexpression of TMPRSS2 in non-TMPRSS2-expressing cells abolishes the dependence of infection on the cathepsin L pathway and restores sensitivity to TMPRSS2 inhibitors.173 Nonetheless, the endolysosomal compartments also play critical roles in SARS-CoV-2 replication, including genome replication, late steps of viral replication, and viral particle trafficking from the endoplasmic reticulum and the endoplasmic reticulum-Golgi to lysosomes, from where the particles are released by lysosomal exocytosis.174
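The dependence of entry route on TMPRSS2 availability amounts to a simple decision rule, sketched below in Python. The numeric expression scale, the 0.5 threshold, and all names are hypothetical placeholders chosen for illustration; the source describes only "high" versus "low or absent" TMPRSS2, not a quantitative cutoff.

```python
from enum import Enum


class EntryRoute(Enum):
    PLASMA_MEMBRANE = "fusion at the plasma membrane"
    ENDOLYSOSOMAL = "fusion after endosomal uptake"


def select_entry_route(tmprss2_level: float, threshold: float = 0.5) -> EntryRoute:
    """Pick the entry route from a relative TMPRSS2 level in [0, 1].
    The scale and threshold are illustrative assumptions."""
    if tmprss2_level >= threshold:
        return EntryRoute.PLASMA_MEMBRANE
    return EntryRoute.ENDOLYSOSOMAL


def route_properties(route: EntryRoute) -> dict:
    """Summarize the priming protease and pH dependence described in the text."""
    if route is EntryRoute.PLASMA_MEMBRANE:
        return {"priming_protease": "TMPRSS2", "ph_dependent": False}
    return {"priming_protease": "cathepsin L", "ph_dependent": True}


for level in (0.9, 0.1, 0.0):
    route = select_entry_route(level)
    print(level, route.value, route_properties(route))
```

Framed this way, raising TMPRSS2 in a cell that lacks it switches the route and the protease on which infection depends, consistent with the overexpression result described above.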

In contrast to host proteases acting as mediators of viral host cell entry, host interferon-induced transmembrane proteins (IFITMs) 1, 2, and 3 act as restriction factors of viral cell entry by inhibiting virus-cell fusion at hemifusion or pore formation stages175 by modifying the rigidity and/or curvature of the membranes in which they reside.175, 176, 177 and 178 IFITM1 is more active at inhibiting S-mediated fusion because it is mostly found at the cell membrane, whereas IFITM2 and IFITM3 accumulate in the endolysosomal compartment. IFITMs inhibit entry of SARS-CoV-1, hCoV-229E, and MERS-CoV, but promote infection by hCoV-OC43.179, 180, 181, 182, 183 and 184 IFITMs, as well as other interferon-stimulated genes (ISGs), including LY6E and cholesterol 25-hydroxylase (CH25H), impair SARS-CoV-2 replication by blocking the fusion of virions.185,186 However, TMPRSS2 thwarts the antiviral effect of IFITMs. Therefore, the pathologic effects of SARS-CoV-2 are modulated by cellular proteins that either inhibit or facilitate syncytia formation.187

An additional proteolytic cleavage site located in the S2 (S2′) appears to be utilized for release of the fusion peptide, allowing for host cell penetration and fusion.188 In a widely accepted membrane fusion model, heptad repeat 1 undergoes a “jack-knife” transition to insert the fusion peptide into the target membrane and heptad repeat 2 folds back to bring the fusion peptide and transmembrane segments close together,189 in turn causing the two membranes to fuse into a single lipid bilayer.190

The exact location of the fusion peptide of coronaviruses remains under study.191 For SARS-CoV-1 S2, three membranotropic-interacting regions have been suggested as putative fusion peptides, including a glycosylated segment upstream of the S2′ cleavage site named the N-terminal fusion peptide192,193; the segment immediately distal to the S2′ cleavage site, widely accepted as the bona fide fusion peptide because of its high degree of conservation among coronaviruses and its stronger calcium-dependent membrane ordering activity potentially constituting a target for developing universal coronavirus vaccines194,195; and the segment immediately proximal to heptad repeat 1 also known as the internal fusion peptide.192,196,197

An alternative fusion model was put forth in 2023 based on a cryo-electron microscopy study of SARS-CoV-2 postfusion S2,198 which showed that only the internal fusion peptide inserts into the membrane, probably directly coupled with the refolding of the adjacent heptad repeat 1 into the coiled-coil. Primary sequence conservation may not be required for a fusion peptide to correctly insert into the membrane. For instance, a four-residue deletion at the tip of the internal fusion peptide in hCoVs such as hCoV-OC43 and hCoV-229E is not likely to alter the overall shape of the fusion wedge.190

A fourth hydrophobic region, the pre-transmembrane domain, adjacent to the transmembrane domain, is also important in SARS-CoV-1 fusion in concert with the internal fusion peptide.190,193,199 The pre-transmembrane domain is enriched in aromatic amino acids (hence also known as the aromatic domain), is highly conserved among coronaviruses, and lies in an identical location to that of the aromatic domains of HIV and Ebola virus surface proteins.193

Specific molecular factors and cellular signaling are essential to trigger and contribute to the configurational transitions of S2 during fusion. These factors can be pH, divalent ions, cellular proteases, and other cellular or membrane proteins and their modifications by, for instance, glycosylation and phosphorylation.138,194,200 However, the essential molecular and cellular factors for the membrane fusion of SARS-CoV-2 are yet to be confirmed. It has been argued that S2′ cleavage is not needed and that direct evidence for the cleavage is not very strong. Also, there is a lack of clarity regarding the fusion pathway, that is, whether the virus fuses at the plasma membrane or requires a low pH environment in endosomes.135 Models used thus far do not reproduce all conditions present at the cellular membrane and tissue level and cannot capture the exact structural changes that occur in vivo.

Another consequence of expression of the viral S gene is that the S glycoprotein at the surface of infected cells, even in the absence of any other viral protein, can fuse with ACE2-positive neighboring cells to form syncytia.187 Cell-cell fusion allows viruses to infect neighboring cells without the need to produce free virus and contributes to tissue damage by creating virus-infected syncytia. TMPRSS2 also underlies SARS-CoV-2 S-mediated cell-cell fusion.201,202

In the absence of TMPRSS2, SARS-CoV-2 S can use the matrix metalloproteinases (MMPs), MMP-2 and -9, to induce cell-cell fusion.203 In cells expressing high levels of MMP-2/9 such as HT1080 cells, infection and syncytia formation induced by native SARS-CoV-2 Alpha were significantly reduced by MMP inhibitors. In lentiviral pseudotypes and virus-like particles (VLPs) harboring SARS-CoV-2 S of wild-type D614G or variants of concern, the various S glycoproteins differentially used the MMP pathway and preferential usage correlated with the extent of S1/S2 processing and syncytia formation. Increased serum levels of MMPs such as MMP-9 have been documented in patients with severe COVID-19.204 Therefore, in the context of hyperinflammation and dysregulated immune responses, MMPs could play a role in facilitating SARS-CoV-2 viral entry and syncytia formation, expanding tropism to TMPRSS2-negative cells, and exacerbating COVID-19.203

Syncytia have been observed in cell cultures and in tissues from individuals infected with SARS-CoV-2, SARS-CoV-1, or MERS-CoV,138,205, 206, 207, 208 and 209 and may originate from direct infection of target cells or from the indirect immune-mediated fusion of myeloid cells. Fused pneumocytes expressing SARS-CoV-2 RNA and S proteins were observed in postmortem lung tissues of patients with COVID-19, indicating that productive infection leads to syncytia formation, at least in critical cases.207


Glycosylation of S and Cell Entry, Immune Evasion, and Pathogenesis

Coronavirus S proteins are extensively glycosylated, with 66 to 87 N-linked glycosylation sites per trimeric spike.210 SARS-CoV-2 S is approximately 67% similar143 and shares all 18 N-glycosylated sites with SARS-CoV-1 S.143,211, 212 and 213 Both SARS-CoV-1 and MERS-CoV S include 69 and SARS-CoV-2 S includes 66 N-linked glycan sequons per trimeric spike, with those at N165 and N234 modulating the conformational dynamics of the SARS-CoV-2 spike’s RBD.138,147,212,214,215
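The per-trimer sequon counts above follow from the per-protomer counts, because the spike is a homotrimer. The following is a minimal sketch of how N-linked sequons (N-X-S/T, where X is any residue except proline) might be located and tallied. The peptide `toy` is a hypothetical sequence invented for illustration and is not a real spike sequence; the function names are likewise assumptions.

```python
import re

# N-linked sequon: Asn, then any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons (e.g. "NNTS") be counted separately.
SEQUON = re.compile(r"N(?=[^P][ST])")


def sequon_positions(protomer_seq):
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protomer_seq)]


def sequons_per_trimer(protomer_seq):
    """A homotrimeric spike carries three copies of each protomer sequon."""
    return 3 * len(sequon_positions(protomer_seq))


# Hypothetical toy peptide, NOT a real spike sequence.
toy = "MNVTAANPSGNGTQNNTS"
positions = sequon_positions(toy)      # "NPS" is skipped because X is proline
toy_trimer_total = sequons_per_trimer(toy)

# Per-protomer counts implied by the per-trimer figures quoted in the text.
per_protomer = {"SARS-CoV-2": 66 // 3, "SARS-CoV-1": 69 // 3, "MERS-CoV": 69 // 3}
```

A sequon marks only a potential glycosylation site; whether a given site is occupied, and by which glycan, has to be determined experimentally.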

Protein glycosylation is crucial for infection by SARS-CoVs, and their surface proteins as well as the viral class I fusion proteins of influenza viruses, HIV-1, dengue, Lassa, Zika, and Ebola viruses have evolved to be highly glycosylated.216,217 As obligate parasites, viruses exploit host cell machinery to glycosylate their own proteins during replication. These modifications often mask immunogenic protein epitopes from the host humoral immune system by occluding them with host-derived glycans.218, 219 and 220 Host cell–derived glycans facilitate diverse structural and functional roles during the viral life cycle, ranging from immune evasion by glycan shielding to enhancement of immune cell infection. SARS-CoV-2, influenza, and HIV rely on expression of specific oligosaccharides to evade detection by the host immune system.221,222 Additionally, other viruses such as Hendra, SARS-CoV-1, influenza, hepatitis, and West Nile rely on N-linked glycosylation for crucial functions such as entry into host cells, proteolytic processing, and protein trafficking.216,217 A recent analysis of the SARS-CoV-2 spike protein using large-scale molecular dynamics simulations found that the glycosylated spike has a higher barrier to opening and also energetically favors the down state over the up state. The S protein is characterized by down and up conformational states, which transiently interconvert via a hinge-like motion exposing the receptor-binding motif (RBM), which is composed of RBD residues S438 to Q506. The RBM is buried in the inter-protomer interface of the down S protein; therefore, binding to ACE2 relies on the stochastic interconversion between the down and up states. Analysis of the S protein opening pathway revealed that glycans at N165 and N122 interfere with hydrogen bonds between the RBD and the NTD in the up state, whereas glycans at N165 and N343 could stabilize both the down and up states. Epitope exposure for several known antibodies changes along the opening path, and the epitope for one of the antibodies tested is continuously exposed, explaining its high efficacy.223

Because glycosylation patterns in class I viral fusion proteins influence individual glycan composition and immunologic pressure across the protein surface,210 they could be used as therapeutic targets. For instance, the glycoprotein complex of Lassa virus, the etiologic agent of hemorrhagic Lassa fever, uses matriglycan on α-dystroglycan as its major and most efficient cell surface receptor for viral entry.224, 225, 226, 227, 228 and 229 One Lassa virus glycoprotein complex trimer bears 33 N-linked glycans that together form a denser shield than counterparts of other human viruses; the only exception is HIV-1, which has a heavier glycan shield, with more glycans relative to the number of amino acids.210 Neutralizing antibodies against Lassa virus are rare; however, three broadly protective antibodies that target the Lassa virus glycoprotein complex were isolated from survivors of multiple Lassa virus infections and found to either circumvent or exploit specific glycans comprising the extensive glycan shield of the glycoprotein complex.230 These findings guided engineering of a next-generation glycoprotein complex antigen suitable for future neutralizing antibody and vaccine discovery and informed therapeutic approaches against Lassa fever, AIDS, and COVID-19.230


Phosphorylation as a Potential Regulator of S Expression and Furin Cleavage

Except for their short C-terminus, the S proteins of coronaviruses face the endoplasmic reticulum or Golgi lumen during viral replication231 and are absent from the cytoplasm and nucleus, where most kinase signaling occurs. Nevertheless, protein phosphorylation occurs not only on cytoplasmic and nuclear proteins, but also on secreted proteins in the endoplasmic reticulum and Golgi lumen, as well as in the extracellular space.232, 233, 234 and 235 For instance, phosphorylation regulates fibroblast growth factor 23 targeting by furin.236

At the boundary of S1/S2 subunits, SARS-CoV-2 S contains the 680SPRRAR↓SV687 insertion with an RxxR cleavage motif for furin-like enzymes,50,143 which is absent in SARS-CoV-1. Although an analogous cleavage motif is present at S1/S2 of the MERS-CoV S,237,238 a crucial third arginine (R) residue comprising an RRxR motif is required for furin recognition in vitro, whereas the general RxxR motif in common with MERS-CoV is insufficient for cleavage.239 SARS-CoV-2 S contains 10 in vivo phosphorylated sites.159 The two serines (S680 and S686) at the edges of the SPRRAR↓SV insert can be efficiently phosphorylated by the two largest subfamilies of mammalian kinases, namely the proline-directed kinases (SP target motif) and basophilic protein kinases (RxxS target motif).240,241 Phosphorylation of these serine residues precludes cleavage by furin at the site.239
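The motif relationships described above can be checked directly against the 680SPRRARSV687 insert. The sketch below uses simple regular expressions as an illustrative stand-in for the consensus motifs named in the text; the dictionary keys and the function name are assumptions introduced here, and real furin or kinase recognition depends on structural context that a pattern match does not capture.

```python
import re

# Insert at the S1/S2 boundary as given in the text (residues 680-687).
INSERT = "SPRRARSV"
INSERT_START = 680  # residue number of the first letter

# Lookahead wrappers allow overlapping matches to be reported.
MOTIFS = {
    "furin_minimal_RxxR": r"(?=(R..R))",
    "furin_RRxR": r"(?=(RR.R))",
    "proline_directed_SP": r"(?=(SP))",
    "basophilic_RxxS": r"(?=(R..S))",
}


def find_motifs(seq, start):
    """Map each motif name to a list of (first_residue_number, matched_text)."""
    hits = {}
    for name, pattern in MOTIFS.items():
        hits[name] = [(m.start() + start, m.group(1))
                      for m in re.finditer(pattern, seq)]
    return hits


hits = find_motifs(INSERT, INSERT_START)
# The SP match begins at S680; the RxxS match (R683-S686) ends at S686,
# the residue immediately after the cleavage point that follows R685.
```

In this sketch the same four residues (R682 to R685) satisfy both the minimal RxxR and the stricter RRxR pattern, while the two kinase motifs place their target serines at either edge of the insert.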

Phosphorylation of SARS-CoV-2 S may also affect its expression levels and cellular trafficking. A study in clinically relevant immortalized cell line models and 2D human colon organoids showed that the glycogen synthase kinase-3 (GSK3) beta inhibitor T-1686568 reduced viral load and protein translation of SARS-CoV-2 (including Delta and Omicron variants), with viral titer reduction matching lower levels of SARS-CoV-2 S.242


Phospho-regulation of secreted proteins remains poorly understood and further studies may uncover novel regulatory mechanisms for SARS-CoV-2.239


Angiotensin-Converting Enzyme 2 Receptor as Cellular Receptor for S

There is no phylogenetic congruence between hCoVs and their diverse cellular receptor usage. The densely glycosylated SARS-CoV-1 and SARS-CoV-2 S proteins prominently exploit ACE2,20,62,138,143,146,148,243 as does that of hCoV-NL63, and anti-ACE2 antibody inhibits SARS-CoV-1 replication242; hCoV-229E uses aminopeptidase N,244 whereas MERS-CoV uses dipeptidyl peptidase 4 (DPP4 or CD26).2,245

The RBM of SARS-CoV-2 S binds to the N-terminal extracellular catalytic ectodomain, also known as the peptidase domain of ACE2, resulting in a SARS-CoV-2/ACE2 complex.147 Although SARS-CoV-1 and SARS-CoV-2 S share approximately 76% amino acid identity, the RBD of SARS-CoV-2 S binds to ACE2 with higher affinity than that of SARS-CoV-1 S,148,246,247 driven by key amino acid substitutions in the CTD147; however, affinity results vary based on the detection method. SARS-CoV-2 variants also differ in their binding capacity to ACE2. For instance, in a computational study, the Omicron variant had a greater affinity for ACE2 compared with the Delta variant because of a greater number of mutations in the RBD.248


Angiotensin-Converting Enzyme 2 as Determinant of Host, Tissue, and Cell Ranges

ACE2 binding is an ancestral trait of sarbecovirus RBDs that has subsequently been lost in some clades.249,250 ACE2 binding is highly evolvable, and for many sarbecovirus RBDs, there are single-amino-acid mutations that enable binding to new ACE2 orthologs, broadening the range of sarbecoviruses that should be considered to have spillover potential and contributing to the evolution of SARS-CoV-2 variants. However, the effects of individual mutations can differ considerably between viruses, as exemplified by the N501Y mutation in S, which enhances hACE2-binding affinity of several SARS-CoV-2 variants but substantially decreases it for SARS-CoV-1.249,250

Successful engagement of receptor protein orthologs is necessary during cross-species transmission,251 and the full host range of SARS-CoV-2 remains to be determined. Viral spillover from animal reservoirs can trigger public health crises, and knowing which viruses are primed for zoonotic transmission can focus surveillance efforts and mitigation strategies for future epidemics and pandemics.

In silico simulations of the binding affinity of SARS-CoV-2 S to ACE2 suggested that mammals including primates, cattle, hamsters, cetaceans, cats, dogs, bats, pigs, ferrets, civets, and pangolins could be highly susceptible to SARS-CoV-2,252,253 whereas a range of fish, amphibian, reptile, and bird species were predicted to have a very low risk for SARS-CoV-2 infection.252 However, in silico analysis also predicted a very low risk for SARS-CoV-2 infection of several bat and pangolin species, and other contradictory results have been reported.254 Additional factors that contribute to the host range of coronaviruses, such as interactions between host proteases and S, remain poorly studied in non-model species.255

Susceptibility of cell lines to SARS-CoV-2 and SARS-CoV-1 infection largely correlates with their surface expression of ACE2. Moreover, causing disease in mice requires adaptation of SARS-CoV-2 and SARS-CoV-1 S to murine ACE2 or directed expression of human ACE2 in murine tissues, while knockout of ACE2 strongly reduces viral spread.256

The main target organs of SARS-CoV-2 in humans are considered to belong to the respiratory system, including the lungs and upper respiratory tract, but multiple organs, such as the heart, kidneys, liver, spleen, and gastrointestinal tract, can also be affected.257 ACE2 protein is expressed on the surface of lung alveolar epithelial cells and enterocytes of the small intestine, and ACE2 is present in arterial and venous endothelial cells and arterial smooth muscle cells in oral and nasal mucosa, nasopharynx, lung, stomach, small intestine, colon, skin, lymph nodes, thymus, bone marrow, spleen, liver, kidney, and brain.258 Even saliva from asymptomatic individuals with COVID-19 can be a potential source of viral transmission.259 The viral entry factors ACE2 and TMPRSS2 are widely expressed in the oral mucosa and salivary glands,260,261 and SARS-CoV-2 has been identified in the oral mucosa and salivary glands of patients with COVID-19. Moreover, SARS-CoV-2 infection and replication have been demonstrated in a human induced pluripotent stem cell–derived salivary gland organoid model.262



Structural and Functional Features of Angiotensin-Converting Enzyme 2

ACE2 is the only known mammalian homolog of the zinc-metallopeptidase ACE, a critical regulator of the renin-angiotensin system.263,264 Unlike ACE, which functions as a peptidyl dipeptidase, ACE2 acts as a carboxypeptidase, able to cleave a single C-terminal residue from a number of physiologically significant peptides.265,266 ACE2 regulates the renin-angiotensin system by converting the potent vasoconstrictor angiotensin II to the vasodilatory peptide angiotensin-1-7.267 ACE2 is an 805-amino-acid glycoprotein with an apparent molecular mass of 120 kDa.266,268 Like ACE, ACE2 is a type I transmembrane protein comprising a short C-terminal cytoplasmic tail, a hydrophobic transmembrane region, and a heavily N-glycosylated N-terminal ectodomain containing the active site.266

ACE2 has a more restricted tissue distribution than ACE, being found predominantly in the heart, kidneys, and testes,265,266 where it plays a profound role in controlling blood pressure and preventing heart failure and kidney injury269, 270 and 271; however, low levels have been detected in a variety of tissues.258 For lung diseases, the loss of ACE2 activates the renin-angiotensin system, enhancing vascular permeability and lung edema and contributing to the pathogenesis of severe lung injury.272 The entry of SARS-CoV-2 into the cells markedly downregulates ACE2 receptors, which favors the progression of inflammatory and thrombotic processes.273 Therefore, treatment by blocking ACE2 receptor to prevent SARS-CoV-2 infection may have a negative effect and poor druggability274 depending on treatment timing and other variables.


Host Molecules Other Than Angiotensin-Converting Enzyme 2 Involved in S-Mediated Entry or Pathogenesis

As is also the case for influenza A virus (IAV) and HIV-1,135 the binding affinity (expressed as the dissociation constant, KD) or avidity of SARS-CoV-2 S does not represent the overall attachment strength of intact virions on the plasma membrane because a single virion contains multiple spike proteins either of the same or different functional types, as seen for SARS-CoV-2,275 IAV,276 and HIV-1277; in addition, multiple receptors and coreceptors can be accommodated within the membrane contact area of a virion, as seen for SARS-CoV-2278 and influenza A.279,280 This means a single virion can engage in interaction with multiple receptors, coreceptors, and other membrane factors on the plasma membrane, a phenomenon termed multivalent binding of viruses, as documented for SARS-CoV-2,278,281 influenza,282,283 and norovirus-like particles.284 It is well accepted that various virions, such as those for norovirus,282 influenza A,285 and HIV-1,286 exploit binding of this type for increasing their residence time and attaining an optimal attachment on the plasma membrane.
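Why a single-site affinity understates virion attachment can be illustrated with a deliberately simple calculation. The sketch below assumes that each spike-receptor contact behaves as an independent 1:1 binding site at equilibrium; this independence assumption, the function names, and the parameter values are illustrative choices made here and are not taken from the cited studies. Real membrane-bound interactions are cooperative and depend on local receptor density rather than a bulk concentration.

```python
def fraction_bound(receptor_conc, k_d):
    """Equilibrium occupancy of one spike-receptor contact (simple 1:1 binding)."""
    return receptor_conc / (receptor_conc + k_d)


def prob_virion_attached(n_contacts, receptor_conc, k_d):
    """Probability that at least one of n independent contacts is engaged."""
    p = fraction_bound(receptor_conc, k_d)
    return 1.0 - (1.0 - p) ** n_contacts


# With the receptor concentration equal to KD, a single contact is engaged
# half of the time; five independent contacts keep the virion attached far
# more often. Units are arbitrary as long as both arguments share them.
single = prob_virion_attached(1, receptor_conc=1.0, k_d=1.0)
five = prob_virion_attached(5, receptor_conc=1.0, k_d=1.0)
```

Even under these crude assumptions, the probability of attachment rises steeply with the number of available contacts, which is consistent with multivalent binding increasing virion residence time on the plasma membrane.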

Beyond ACE2, several lines of evidence point to a broader human receptome for SARS-CoV-2 through which the S protein exploits additional receptors for infection287:



  • Although SARS-CoV-2 and SARS-CoV-1 share ACE2 as an entry receptor, their primary infection sites and clinical manifestations are significantly different.21,160,288, 289, 290 and 291


  • ACE2 is mainly expressed in enterocytes, renal tubules, gallbladder, cardiomyocytes, male reproductive cells, placental trophoblasts, ductal cells, eye, and vasculature.258 In contrast, SARS-CoV-2 has been detected in tissues with little ACE2 expression, including the liver, brain, and blood, and even the lung,289,292 where only a small subset of cells expresses ACE2.161,293 In the lungs, for instance, SARS-CoV-2 causes a high rate of cellular infection as confirmed by the pathogenesis of severe COVID-19. SARS-CoV-2 is able to infect organoids from diverse tissues, including lung, intestine, and brain (particularly choroid plexus epithelium).294, 295, 296, 297 and 298


  • Several large-scale single-cell transcriptome analyses of patients with COVID-19 revealed many virus-positive cells without ACE2 expression,299,300 suggesting that SARS-CoV-2 might infect cells in an ACE2-independent manner.


  • Intra- and extrapulmonary immune and nonimmune cells with ACE2 deficiency273,301 or absence302 are susceptible to SARS-CoV-2.


  • ACE2-independent entry has also been observed with mutated S proteins, with mutation E484D playing an important role.303,304


  • Human ACE2 is the key cell attachment and entry receptor for SARS-CoV-2, with the original SARS-CoV-2 isolates unable to use mouse ACE2. However, in vitro serial passaging of SARS-CoV-2 in cocultures of cell lines expressing human and mouse ACE2 led to the emergence of a SARS-CoV-2 strain capable of ACE2-independent infection and the evolution of mouse-adapted SARS-CoV-2.305 Mouse-adapted viruses evolved with up to five amino acid changes in S, all of which have been seen in human isolates. Mouse-adapted viruses replicated to high titers in C57BL/6J mouse lungs and nasal turbinates and caused characteristic lung histopathology. One mouse-adapted virus also evolved to replicate efficiently in several ACE2-negative cell lines across several species, including clustered regularly interspaced short palindromic repeats (CRISPR)-associated protein 9 (CRISPR/Cas9) ACE2 knockout cells. An E484D substitution is likely involved in ACE2-independent entry and has appeared in only approximately 0.003% of human isolates globally, suggesting that it provided no significant selection advantage in humans.

ACE2-independent entry reveals a SARS-CoV-2 infection mechanism that has potential implications for disease pathogenesis, evolution, tropism, and perhaps intervention development.305


Kringle Containing Transmembrane Protein 1 and Asialoglycoprotein Receptor-1

In an ACE2-negative cell line and in a mouse model, Kringle containing transmembrane protein 1 (KREMEN1) and asialoglycoprotein receptor-1 (ASGR1) were identified as S protein–binding partners that support ACE2-independent entry of SARS-CoV-2 but not SARS-CoV-1306 by binding not only the RBD but also the NTD of SARS-CoV-2 S. The pronounced differences between the NTDs in the SARS-CoV-2 and SARS-CoV-1 S proteins might at least partly account for the failure of SARS-CoV-1 to engage ASGR1 and KREMEN1 for entry. Entry via ASGR1 and KREMEN1 was generally less efficient than ACE2-dependent entry. Entry into a lung-derived cell line, HTB-182, and a liver-derived cell line, Li7, was ACE2-independent and KREMEN1-(HTB-182 cells) or ASGR1-dependent (Li7 cells) and was associated with resistance to neutralizing antibodies targeting the S protein/ACE2 interface. Thus, ASGR1 and KREMEN1 are bona fide SARS-CoV-2 receptors that might protect the virus from certain neutralizing antibodies,306 which highlights the impact of the target cell type on antibody-mediated neutralization.

Analysis of clinical samples revealed that relative expression of ACE2, ASGR1, and KREMEN1 in the respiratory epithelium was higher in SARS-CoV-2-infected than uninfected cells. Furthermore, expression of KREMEN1 in secretory cells correlated more strongly with susceptibility to infection than ACE2 expression. ASGR1- and KREMEN1-specific antibodies blocked S protein binding and entry into cell lines and reduced infection of lung organoids, suggesting a role for these factors in SARS-CoV-2 infection of the respiratory tract.306


Heparan Sulfate Proteoglycans and Sialic Acid–Containing Glycolipids

The highly sulfated cell surface glycosaminoglycans heparan sulfate and heparin act as attachment factors of SARS-CoV-2307, 308, 309 and 310 and generate a high surface negative charge. The SARS-CoV-2 RBD contains many positively charged amino acids,307,308 making it suitable for electrostatic interactions with the sulfated glycosaminoglycans. The SARS-CoV-2 S protein contains three glycosaminoglycan-binding sites, that is, 453-459 aa (YRLFRKS) within RBD, 681-686 aa (PRRARS) at the furin cleavage site of S1/S2, and 810-816 aa (SKPSKRS) of S2.308
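The electrostatic argument can be made concrete by counting basic residues in the three glycosaminoglycan-binding sites listed above. The following is a small sketch using only the peptide sequences and residue ranges quoted in the text; the variable and function names are assumptions, and histidine is deliberately excluded because it is only partly protonated at neutral pH.

```python
# Glycosaminoglycan-binding sites as listed in the text:
# name -> (first residue, last residue, sequence).
GAG_SITES = {
    "RBD": (453, 459, "YRLFRKS"),
    "S1/S2 furin site": (681, 686, "PRRARS"),
    "S2": (810, 816, "SKPSKRS"),
}

BASIC = set("KR")  # lysine and arginine only


def basic_count(seq):
    """Number of lysine and arginine residues in a peptide."""
    return sum(res in BASIC for res in seq)


summary = {}
for name, (first, last, seq) in GAG_SITES.items():
    # Sanity check: the residue range must match the peptide length.
    assert last - first + 1 == len(seq), name
    n_basic = basic_count(seq)
    summary[name] = (n_basic, round(n_basic / len(seq), 2))
```

Each site carries three basic residues within six or seven positions, which fits the proposed electrostatic attraction to negatively charged sulfate groups.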

Heparan sulfate proteoglycans307 and sialic acid–containing glycolipids311 enhance the binding of the S proteins of SARS-CoV-2 and SARS-CoV-1 to ACE2. They are considered coreceptors rather than receptors because, unlike KREMEN1 and ASGR1, they do not fulfill the central criterion for a receptor: expression of a bona fide receptor renders otherwise nonsusceptible cells susceptible to infection. Their enhancement of S binding is a potential contributing factor to the increased cellular transmission of SARS-CoV-2312 and may even regulate the cell tropism of the virus.288


Integrins αvβ3 and α5β1

Integrins are multifunctional, heterodimeric cell surface adhesion molecules. They are receptors of extracellular ligands and transduce biochemical signals into the cell through downstream effector proteins. They are internalized and enter the endo/exocytic pathway before being recycled back to the plasma membrane.313 The trafficking of these proteins is modulated by multiple context-dependent pathways, such as clathrin-mediated endocytosis,314 caveolae-mediated endocytosis,315 and clathrin-caveolae–independent endocytosis.316


Integrins play important roles in cell proliferation, migration, apoptosis, tissue repair, as well as in all processes critical to inflammation, infection, and angiogenesis. They represent a gateway of entry for many viruses that successfully infect host cells using integrin-mediated endocytic pathways.313,317 Indeed, different viruses present the Arginine-Glycine-Aspartic acid (RGD) motif allowing their interaction and internalization through integrins.318, 319 and 320 The RGD motif is the smallest peptide sequence necessary for proteins to bind integrins.

SARS-CoV-2 S contains an RGD motif at the distal tip of the protein, on the surface of the RBD, forming a bend where the direction of the peptide chain reverses148,321 and with structural features reminiscent of known integrin-binding proteins.

All SARS-CoV-2 lineages have an RGD motif (aa 403-405) in their RBD. SARS-CoV-2 gains access into primary human lung microvascular endothelial cells lacking ACE2 expression through this conserved RGD motif.322 Following its entry, SARS-CoV-2 remodels cell phenotype and promotes angiogenesis in the absence of productive viral replication. Following infection, primary human lung microvascular endothelial cells release a plethora of proinflammatory and proangiogenic molecules. This conditioned microenvironment stimulates primary human lung microvascular endothelial cells to acquire an angiogenic phenotype and to express antiviral molecules such as annexin A6 and MX1. Therefore, SARS-CoV-2-infected primary human lung microvascular endothelial cells appear to play an important role in sustaining vascular dysfunction during the early phases of infection. The construction of virus-host interactomes is instrumental to identify potential therapeutic targets for COVID-19 aimed at inhibiting primary human lung microvascular endothelial cell–sustained inflammation and angiogenesis upon SARS-CoV-2 infection.322

αvβ3 integrin is the main molecule responsible for SARS-CoV-2 infection of primary human lung microvascular endothelial cells via clathrin-dependent endocytosis.322 Pretreatment of virus with αvβ3 integrin or pretreatment of cells with a monoclonal antibody against αvβ3 integrin inhibited SARS-CoV-2 entry into primary human lung microvascular endothelial cells. Surprisingly, anti-S antibodies evoked by vaccination were neither able to impair S/integrin interaction nor to prevent SARS-CoV-2 entry into primary human lung microvascular endothelial cells. These data highlight the RGD motif in the S as a functional constraint aimed at maintaining the interaction of the viral envelope with integrins. These findings beckon the development of intervention strategies aimed at neutralizing the SARS-CoV-2 integrin-mediated infection of ACE2-negative cells.323

SARS-CoV-2 S activates the endothelial cell inflammatory phenotype in a manner dependent on integrin α5β1 signaling. Incubation of human umbilical vein endothelial cells with whole S, its RBD, or the integrin-binding RGD peptide induces nuclear translocation of nuclear factor kappa B (NF-κB) and subsequent expression of leukocyte adhesion molecules (vascular cell adhesion molecule 1 [VCAM1] and intercellular adhesion molecule 1 [ICAM1]), coagulation factors (tissue factor [TF] and factor VIII [FVIII]), proinflammatory cytokines (tumor necrosis factor [TNF]α, interleukin [IL]-1β, and IL-6), and ACE2, as well as adhesion of peripheral blood leukocytes and hyperpermeability of the endothelial cell monolayer.324 Inhibitors of integrin α5β1 activation prevent these effects. In vivo, intravenous administration of S increased expression of ICAM1, VCAM1, CD45, TNFα, IL-1β, and IL-6 in the lung, liver, kidney, and eye, and intravitreal injection of S disrupted the barrier function of retinal capillaries. The latter findings suggest a direct action of SARS-CoV-2 S on endothelial cell dysfunction. S, through its RGD motif in the RBD, might bind to integrin α5β1 in endothelial cells to activate the NF-κB target gene expression programs responsible for vascular leakage and leukocyte adhesion. Integrin α5β1 therefore may constitute a promising target for treating vascular inflammation in COVID-19.324


CD147/Basigin2/EMMPRIN

CD147, also known as basigin2 or EMMPRIN, is a transmembrane glycoprotein of the immunoglobulin superfamily,325 which participates in tumor development, Plasmodium invasion, and bacterial and virus infection.326, 327, 328, 329 and 330 CD147 plays an important role in HIV-1, hepatitis C virus, hepatitis B virus, Kaposi sarcoma–associated herpesvirus, and SARS-CoV infections.

SARS-CoV-2 S interacts with the host cell receptor CD147.274 CD147 mediates SARS-CoV-2 entry into host cells by endocytosis, and its blockage by anti-CD147 antibodies inhibits virion amplification in a dose-dependent manner, at least in cellular models.274 However, a later study showed no direct interaction between the SARS-CoV-2 S protein RBD and CD147, casting doubt on its role as a coreceptor and plausibility as a therapeutic target.331 Moreover, in SARS-CoV-1, CD147 plays a functional role in facilitating infection, and CD147-antagonistic peptide-9 has an inhibitory effect332; however, the interaction with CD147 is indirect via binding of its nucleocapsid protein to cyclophilin A (CyPA), a ligand for CD147.

In a more recent study published in 2022, SARS-CoV-2 pseudovirus infection was associated with caveolar/lipid raft– and cytoskeleton-mediated endocytosis, but independent of clathrin-mediated endocytosis and macropinocytosis.333 Knockdown of CD147 and Rab5a in Vero E6 and Huh-7 cells inhibited SARS-CoV-2 pseudovirus infection, and colocalization of S, CD147, and Rab5a was observed in pseudovirus-infected Vero E6 cells and was weakened by CD147 silencing, illustrating that SARS-CoV-2 pseudovirus entered the host cells via CD147-mediated endocytosis. Additionally, Arf6 silencing markedly inhibited pseudovirus infection in Vero E6 and Huh-7 cells, whereas little change was observed in CD147 knockout-Vero E6 cells, indicating that Arf6-mediated CD147 trafficking plays a vital role in SARS-CoV-2 entry.333

ACE2-deficient T cells can be infected with SARS-CoV-2 pseudovirus, in which CD147 overexpression facilitates virus infection.274 SARS-CoV-1 infects peripheral lymphocytes leading to their destruction,334 and SARS-CoV-2 infection of lymphocytes via CD147 may underlie lymphopenia in COVID-19 cases.335

CD147 regulates ACE2 levels, and both receptors are affected by SARS-CoV-2 infection.336 Loss or blockage of CD147 in Vero E6 and BEAS-2B cell lines by the humanized anti-CD147 antibody, Meplazumab, inhibits SARS-CoV-2 amplification and inflammation as demonstrated for variants Alpha through Delta.337 In a proof-of-concept study of 109 patients and 72 healthy blood donors, elevated levels of soluble CD147 were associated with hyperinflammation and COVID-19 severity.338

Apart from its function as a putative receptor used by SARS-CoV-2 for entry into host cells, CD147 is the main tissue inducer of MMPs. These MMPs are zinc-dependent endopeptidases responsible for cleaving the immediate components of the extracellular matrix. The degradation of different components of the extracellular matrix mediated by MMPs is an important component in tissue damage associated with COVID-19. Altered plasma concentrations of MMPs have been found in patients with COVID-19, and increased levels of MMP-2 and MMP-9 appear to be associated with an increased risk of in-hospital mortality.339


Membrane-Type Matrix Metalloproteinases

A study340 revealed that multiple members from the membrane-type MMP and disintegrin and metalloproteinase families can mediate SARS-CoV-2 entry. Inhibition of membrane-type MMPs significantly reduced SARS-CoV-2 replication in vitro and in vivo. Membrane-type MMPs can cleave SARS-CoV-2 spike and ACE2 and facilitate spike-mediated fusion. Relative to ancestral SARS-CoV-2, the Omicron BA.1 variant has more efficient membrane-type MMP usage for virus entry.


C-Type Lectin, Toll-Like, and Mannose Receptors

SARS-CoV-2 S interacts with receptors involved in innate immunity, including C-type lectin receptors (CLRs) and toll-like receptors (TLRs). Recognition of carbohydrate (N-glycan and O-glycan) moieties clustered on the S surface may drive receptor-dependent internalization, promote severe immunopathologic inflammation resulting in cytokine release syndrome, and allow for systemic spread of infection, independent of ACE2.341 Therefore, targeting TLRs, CLRs, and other receptors (Ezrin and DPP-4) that do not directly engage SARS-CoV-2 S but may contribute to augmented antiviral immunity and viral clearance has been proposed as a potential therapeutic approach against COVID-19.341


Neuropilin-1/VEGF165R

Neuropilin-1 (NRP1), also known as vascular endothelial cell growth factor 165 receptor, VEGF165R,342 binds furin-cleaved substrates, functions as an endogenous negative modulator of the TLR4-NF-κB pathway, and facilitates SARS-CoV-2 cell entry and infectivity together with the receptor for advanced glycation end products (RAGE).343, 344, 345 and 346 Cleavage of SARS-CoV-2 S generates a polybasic Arginine-Arginine-Alanine-Arginine (RRAR) carboxyl-terminal sequence on S1, which conforms to a C-end rule (CendR) motif that binds to cell surface NRP1.344,347 NRP1-mediated enhancement of SARS-CoV-2 infectivity was attributed to increased viral entry into the host cells rather than to higher viral binding to the cell membrane and was further increased when ACE2 and TMPRSS2 were present.343,344 Pathologic analysis of olfactory epithelium obtained from human COVID-19 autopsies revealed SARS-CoV-2 infected NRP1-positive cells facing the nasal cavity.343 NRP1 gene expression is also upregulated in lung tissue of patients with COVID-19 and in infected olfactory epithelial cells, which also expressed oligodendrocyte transcription factor 2 (OLIG2; mainly by olfactory neuronal progenitors).343,344 Patients with low soluble RAGE (sRAGE) levels were elderly and had lung involvement, which indicates that the RAGE pathway plays an important role in COVID-19 exacerbation.346 NRP1 also potentiates SARS-CoV-2 entry into human cardiomyocytes driven by inflammatory and oxidant signals, which accounted for increased protease activity and apoptotic markers, thus leading to cell damage and apoptosis.348

NRP1-enhanced SARS-CoV-2 infectivity in human cell cultures was inhibited by monoclonal antibody against the extracellular NRP1 b1b2 domain or a small-molecule, selective NRP1 antagonist that binds the CendR-binding b1 domain/pocket. Similarly, SARS-CoV-2 mutants with an altered furin cleavage site in S (deleted polybasic cleavage site or resistant to furin-mediated cleavage) were not dependent on NRP1 for infectivity, whereas mutations in the NRP1 b1 domain/pocket also inhibited the NRP1-S1 interactions.312,343,344

Nanoparticles coated with SARS-CoV-2 S-derived CendR peptides and administered intranasally to adult mice were taken up not only in olfactory epithelium that expresses NRP1 but also into cortical blood vessels and neurons,343 which is consistent with the known contribution of NRP1 to neurogenesis and angiogenesis.349 NRP1 is also known to play a role in promoting host cell infection by other viruses, such as the human T-cell lymphotropic virus type 1 (HTLV-1)350 and the Epstein-Barr virus (EBV),312,351 and may be involved in the neurologic complications of SARS-CoV-2 infection.351

Inhibitory compounds have been developed that specifically antagonize NRP1 to perturb CendR ligand binding.352 One such compound, EG00229, binds to NRP1 with higher affinity than the S1 CendR sequence and outcompetes it for NRP1 binding. Treatment of cells with EG00229 limits live SARS-CoV-2 infection, which opens the door for NRP1 as a therapeutic target for COVID-19.344,347


Nonimmune Receptor Glucose-Regulated Protein 78

Glucose-regulated protein 78 (GRP78), also referred to as heat shock protein A5 or immunoglobulin heavy chain–binding protein (BiP), is an essential Hsp70-type molecular chaperone of the endoplasmic reticulum involved in maintenance and protein surveillance by controlling the unfolded protein response (UPR; cellular stress response initiated by accumulation of unfolded or incorrectly folded proteins).353,354 Under normal conditions, GRP78 is localized to the lumen of the endoplasmic reticulum, where it binds and thereby keeps inactive stress sensors including activating transcription factor 6, inositol-requiring enzyme 1, and protein kinase R–like endoplasmic reticulum kinase, which are responsible for inhibiting protein synthesis, enhancing protein folding, and initiating cell death.353 Accumulation of unfolded or misfolded proteins results in release of GRP78 from these binding partners and translocation to the plasma membrane,353 where it has the ability to recognize and mediate entry of viruses via the substrate-binding domain.355 Thus, GRP78 has been investigated as a potential gateway for viral entry in COVID-19 by binding to motifs on S.353,356,357 In a study,353 four cyclic regions and their corresponding residues (region I: C336-C361, 26 residues; region II: C379-C432, 54 residues; region III: C391-C525, 135 residues; and region IV: C480-C488, 9 residues), present on the outer surface of the SARS-CoV-2 S, were selected for molecular docking assessment because they had been targets of neutralizing antibodies against SARS-CoV-1 and MERS-CoV. There was preferred binding between regions III and IV of S and the substrate-binding domain β of GRP78, with region IV as the major driving force. Therefore, GRP78 represents a potential therapeutic target in COVID-19 treatment.353


CD209L/L-SIGN and CD209/DC-SIGN

SARS-CoV-2 directly attacks the cardiovascular system.358, 359 and 360 In human endothelial cells, the SARS-CoV-2 S RBD binds to CD209L (L-SIGN) and the related protein CD209 (DC-SIGN or dendritic cell–specific intercellular adhesion molecule-3-grabbing non-integrin), mediating SARS-CoV-2 entry; conversely, blocking CD209L activity inhibited virus entry.

CD209L is prominently expressed in lung and kidney epithelia and endothelia. In human endothelial cells, CD209L mediates cell adhesion, capillary tube formation, and sprouting. In cells also expressing ACE2, CD209L was a receptor for SARS-CoV-2.361 CD209 (DC-SIGN) also facilitates cell entry of Lassa virus in human monocyte–derived immature dendritic cells; however, its role seems distinct from the function as an authentic entry receptor reported for phleboviruses, such as the Uukuniemi virus. In contrast, Lassa virus entry was remarkably slow and depended on actin, indicating the use of different endocytotic pathways.228

In terms of pathophysiology and treatment of COVID-19 with CD209L as target, the lungs from patients with COVID-19 show distinctive vascular features, consisting of severe endothelial injury associated with the presence of intracellular virus and disrupted cell membranes, as well as widespread thrombosis with microangiopathy. In one study, prevalence of alveolar capillary microthrombi in patients with COVID-19 was 9-fold that in patients with influenza, and in lungs from patients with COVID-19, the amount of new vessel growth—predominantly through a mechanism of intussusceptive angiogenesis—was almost thrice as high as that in the lungs from patients with influenza.358


Estrogen Receptor α

Interactions between S and the estrogen receptor α have been implicated in SARS-CoV-2 infection and COVID-19 pathology via modulation of estrogen receptor α signaling, transcriptional regulation of ACE2, and potentially of other genes with roles in inflammation and immunity.362 SARS-CoV-2 S exhibits a highly conserved and functional nuclear receptor coregulator (NRC) LSD-motif in the S2 subunit consistent with a role as an NRC at estrogen receptor α; this function may extend to S proteins from other coronaviruses and to other nuclear receptors.362 Estrogen receptor α signaling in alveolar macrophages, a first-line defense against various pathogens363 including SARS-CoV-2,364,365 is considered a key component of the immune response to infection.366, 367 and 368


Anti-S Neutralizing Antibodies Can Replace Angiotensin-Converting Enzyme 2 for Viral Entry

Monoclonal neutralizing antibodies against distinct epitopic regions of the RBD of the SARS-CoV-2 spike can replace ACE2 to serve as a receptor and efficiently support membrane fusion and viral infectivity. These receptor-like antibodies can function in the form of a complex of their soluble immunoglobulin G (IgG) with Fc-gamma receptor I, a chimera of their antigen-binding fragment with the transmembrane domain of ACE2 or a membrane-bound B-cell receptor, indicating that ACE2 and its specific interactions with the spike protein are dispensable for SARS-CoV-2 entry. These results suggest that antibody responses against SARS-CoV-2 may expand the viral tropism to otherwise nonpermissive cell types; they have important implications for viral transmission and pathogenesis.369


Tyrosine Protein Kinase Receptor AXL as Low-Affinity Receptor for S

AXL, a receptor tyrosine kinase of the TAM (TYRO3-AXL-MER) family,370 has been proposed as a candidate receptor for SARS-CoV-2.20 It binds directly to the NTD of the spike protein, but with relatively low affinity (KD = 882 nM).371 By contrast, AXL binds VP1u of parvovirus B19 with 8-fold higher affinity, which exceeds the binding affinity of most viruses for their cognate receptors.372 Further research is warranted on the potential role of AXL in SARS-CoV-2 infection. Inhibition of AXL in infectious diseases could have the double advantage of blocking virus entry and allowing protective macrophage activation (M1 polarization), by inhibiting M2 development.373
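
As a rough worked example of what the quoted affinities imply, the following sketch assumes that "8-fold higher affinity" corresponds to an 8-fold lower dissociation constant and that binding follows a simple one-site equilibrium model. Both are illustrative assumptions rather than statements from the cited studies, and the ligand concentration of 100 nM is hypothetical.

```latex
\begin{align*}
K_D^{\text{AXL--VP1u}} &\approx \frac{K_D^{\text{AXL--NTD}}}{8}
  = \frac{882\ \text{nM}}{8} \approx 110\ \text{nM} \\[4pt]
\theta &= \frac{[L]}{[L] + K_D}
  \quad \text{(fraction of receptor occupied at free ligand concentration } [L]\text{)} \\[4pt]
\theta_{\text{NTD}} &= \frac{100}{100 + 882} \approx 0.10,
\qquad
\theta_{\text{VP1u}} = \frac{100}{100 + 110} \approx 0.48
\end{align*}
```

Under these assumptions, at the same ligand concentration roughly one tenth of AXL would be occupied by the spike NTD, compared with about half for VP1u.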


Host Factors Involved in Viral Entry Restriction


Mucins

At the cell surface, SARS-CoV-2 virions come across an approximately 0.5- to 1.5-µm-thick layer of transmembrane proteoglycans and mucins that are rich in glycans. As documented for influenza A374,375 and HIV-1,376 glycans act as binding moieties, decoy agents, or steric barriers to the virions.307,377 In some cases, the dynamic binding between S and glycans enables virions to migrate through them and reach the plasma membrane, whereas in others a relatively strong interaction between S and specific glycans may trap virions.135

Genome-wide CRISPR knockout and activation screens in human lung epithelial cells with endogenous expression of the SARS-CoV-2 entry factors ACE2 and TMPRSS2 identified membrane-anchored mucins, a family of high-molecular-weight glycoproteins including CD44, as a prominent viral restriction network that inhibits SARS-CoV-2 infection in vitro and in murine models, possibly by creating a denser glycocalyx layer.378 In contrast, secreted, gel-forming mucins did not have a protective role against SARS-CoV-2 infection.378

Membrane-tethered mucins also sterically hinder influenza A virus (IAV) infection,375 and removal of endogenous mucins led to an increase in infection by influenza virus PR8, as well as hCoV-229E and human parainfluenza virus PIV3, whereas no effect was observed on infection with hCoV-OC43 and a reduction in infection was seen for respiratory syncytial virus (RSV).375,379

The NTD of the S proteins of SARS-CoV-2 and other sarbecoviruses binds to unidentified glycans in vitro.380 Heparin does not interfere with ACE2 binding or with proteolytic processing of SARS-CoV-2 S; however, heparin or a highly sulfated heparan sulfate oligosaccharide inhibited SARS-CoV-2 RBD binding to cells.377 Furthermore, enzymatic removal of heparan sulfate proteoglycan from physiologically relevant tissue resulted in a loss of SARS-CoV-2 RBD binding. The latter observations support a model in which heparan sulfate functions as the point of initial attachment, allowing the virus to travel through the glycocalyx by low-affinity, high-avidity interactions to reach the cell membrane, where it can engage with ACE2 for cell entry.377

The S NTDs of sarbecoviruses are highly diverse and can be phylogenetically clustered into five clades with various levels of glycan binding in vitro.380 Although glycan binding might be an ancestral trait conserved across different coronavirus families, the functional outcome during infection can vary, reflecting divergent viral evolution. For instance, although MERS-CoV attaches to sialic acids during cell entry, the NTD of SARS-CoV-2 S adheres to Calu-3 cells, a human lung cell line, via sialic acids that inhibit sarbecovirus infection. Therefore, although sarbecoviruses can interact with cell surface glycans as do other coronaviruses, their reliance on glycans for entry is different from that of other respiratory coronaviruses, suggesting sarbecoviruses and MERS-CoV have adapted to different cell types, tissues, or hosts during their divergent evolution. These findings provide important clues for further exploring the biologic functions of sarbecovirus glycan binding and add to our growing understanding of the complex forces that shape coronavirus S evolution.380


Changes in Host Cell Membrane

Membrane factors like cholesterol, lipid rafts, and membrane rigidity/deformability regulate the distribution of membrane proteins, which in turn determine their accessibility. Beyond binding of SARS-CoV-2 to its receptor(s), changes in the host cell membrane during viral processing and entry are also relevant to viral entry. To this end, some bacterial (Pseudomonas aeruginosa, Staphylococcus aureus, Neisseria gonorrhoeae) and viral (rhinovirus, Ebolavirus, measles) infections activate acid sphingomyelinase, which breaks down sphingomyelin into ceramide and phosphocholine. The formation of ceramide transforms cell membrane rafts into ceramide-enriched membrane domains with which the microbes associate.381,382 SARS-CoV-2 also activates the acid sphingomyelinase/ceramide pathway in both freshly isolated nasal epithelial cells and cell culture models and uses this pathway for infection,383,384 whereas inhibition of sphingomyelinase by antidepressants or a mucolytic prevents SARS-CoV-2 entry into epithelial cells.383, 384 and 385


Leucine-Rich Repeat–Containing Protein 15

The TLR-related cell surface receptor called leucine-rich repeat–containing protein 15 (LRRC15) was identified using whole-genome CRISPR activation as a host factor controlling cellular interactions with SARS-CoV-2 spike by binding to it at the cell surface and suppressing SARS-CoV-2 entry into host cells. LRRC15 is primarily expressed in innate immune barriers including placenta, skin, and lymphatic tissues as well as perturbed-state tissue fibroblasts. However, in patients with severe COVID-19, LRRC15 is highly expressed at the alveolar surface of lung tissue, but it is absent in lungs from individuals without COVID-19. LRRC15 expression was mutually exclusive with collagen production because LRRC15 suppresses collagen production while promoting expression of IFIT, OAS, and MX-family antiviral factors. Therefore, LRRC15, as a transmembrane binding receptor for SARS-CoV-2 spike, contributes to controlling viral load and regulating antiviral and antifibrotic transcriptional programs.386



TRANSLATION OF EARLY VIRAL PROTEINS


Role of the 5′-Untranslated Region in Initial Translation of Genomic Viral RNA

The 5′-untranslated region (UTR), a noncoding segment consisting of multiple highly conserved stem-loops (SLs) and more complex secondary structures, is functionally critical for viral translation. The 5′-UTR of SARS-CoV-2 contains five simple SL (or hairpin) structures (SL1 to SL5),387 in good agreement with bioinformatic secondary structure predictions388 and structure probing, as well as with the structures reported for other coronaviruses.389

At the proximal end of the 5′-UTR of SARS-CoV-2 is an N7-methylguanosine (m7G) cap structure. In eukaryotic cells, messenger and small nuclear RNAs transcribed by host cellular RNA polymerase II carry an m7G cap structure at their 5′-end, and absence of a cap structure on an mRNA allows its recognition as “nonself” and degradation by cellular 5′-3′ exonucleases.390 The cap structure is an essential modification that anchors factors involved in pre-mRNA splicing, nucleocytoplasmic RNA export and localization, and translation initiation, among other cellular processes.391 In translation initiation of SARS-CoV-2, the eukaryotic initiation factor (eIF)4F complex binds to the cap structure for attachment of ribosomes to mRNA.392

The ORF1a initiation (AUG) start codon is embedded in the four-way junction structure of SL5. For an efficient translation initiation, the sequences surrounding the AUG start codon have to be unfolded. The mechanism used by the virus is still unknown and an important issue to investigate is the role and the putative function of the stable SL5 structure in the translation of the SARS-CoV-2 polyprotein. Although the viral genome is capped at its 5′-end, the translation initiation mechanism used to locate the AUG start codon in SL5 remains elusive, albeit ascribed to the intervention of a helicase. To this end, the presence of the 5′ m7G cap and hairpins SL1 to SL5 suggests that a canonical cap-dependent scanning mechanism would require the eIF4A helicase.393,394 On the other hand, the fact that the AUG start codon is located in the vicinity and downstream of a four-way junction structure is reminiscent of similar structures found in the hepatitis C virus internal ribosome entry sites (IRESs), which are typically highly structured.395 In addition, the SARS-CoV-2 5′-UTR contains an unidentified/upstream ORF (uORF) that is part of a larger ORF spanning from SL1 to SL4.5 and is conserved in SARS-CoV-2 variants and could be translated by the host ribosome. The use of uORF could be another way of early translation regulation387,396,397; however, its translation remains to be proven.


Translation of ORF1a and ORF1b and −1 Programmed Ribosomal Frameshift

Overall, the immediate early proteins encoded by ORF1a are involved in ablating the host cellular innate immune response, whereas the early proteins encoded in ORF1b are involved in genome replication and RNA synthesis. These functions include generating the minus-strand replicative intermediate, new plus-strand genomic RNAs, and subgenomic RNAs, which mostly encode structural, late proteins. ORF1b is out of frame with respect to ORF1a, and all coronaviruses utilize a molecular mechanism called −1 programmed ribosomal frameshift to control the relative expression of their proteins.398,399 −1 programmed ribosomal frameshift is a mechanism in which cis-acting elements in the mRNA direct elongating ribosomes to shift the reading frame by one base in the 5′ direction. The use of a −1 programmed ribosomal frameshift mechanism for expression of a viral gene was first identified in the Rous sarcoma virus.400 A −1 programmed ribosomal frameshift mechanism was shown to be required to translate ORF1ab in a coronavirus, avian infectious bronchitis virus, 2 years later.401 In coronaviruses, −1 programmed ribosomal frameshift functions as a developmental switch, and mutations and small molecules that alter this process have deleterious effects on virus replication.402,403
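
The effect of a −1 shift on the codons that a ribosome reads can be illustrated with a minimal Python sketch. The mRNA string, the function names, and the choice of slip position are all hypothetical and serve only to show the bookkeeping: after the slip, the last nucleotide of the codon in the A site is read a second time as the first nucleotide of the next codon.

```python
def codons(mrna: str, start: int = 0) -> list[str]:
    """Split an mRNA string into complete triplets, beginning at index `start`."""
    return [mrna[i:i + 3] for i in range(start, len(mrna) - 2, 3)]


def read_with_minus1_shift(mrna: str, slip_codon_index: int) -> list[str]:
    """Read in the 0 frame up to and including codon `slip_codon_index`,
    then step back one nucleotide (5' direction) and continue in the -1 frame."""
    zero_frame = codons(mrna)
    before = zero_frame[:slip_codon_index + 1]
    # Index of the last nucleotide of the slip codon; it is read again.
    restart = 3 * (slip_codon_index + 1) - 1
    return before + codons(mrna, restart)


# Made-up mRNA used only for illustration (not a verified viral sequence).
TOY_MRNA = "AUGGCUUUAAACGGGUUUGCGGUG"

zero_frame_reading = codons(TOY_MRNA)
shifted_reading = read_with_minus1_shift(TOY_MRNA, slip_codon_index=3)

print("0 frame :", " ".join(zero_frame_reading))
print("-1 shift:", " ".join(shifted_reading))
```

In this toy case the codons downstream of the slip change from GGG UUU GCG GUG to CGG GUU UGC GGU, which is why the downstream ORF encodes an entirely different polypeptide from the one that would be made without frameshifting.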

The ORF1a/b contains a structured and highly conserved frameshift stimulation element near its center that controls a one-nucleotide shift in the translation reading frame for the ORF1a/b sequence 3′ to the frameshift stimulation element. The frameshift stimulation element and accurate frameshifting are crucial for the expression of ORF1b, which encodes five nonstructural proteins (NSPs) including an RNA-dependent RNA polymerase essential for SARS-CoV-2 genome replication.404
SARS-CoV-1 employs a structurally unique three-stemmed mRNA pseudoknot that stimulates high −1 programmed ribosomal frameshift rates and harbors a −1 programmed ribosomal frameshift attenuation element. Altering −1 programmed ribosomal frameshift activity impairs virus replication, suggesting that this activity may be therapeutically targeted. Frameshift stimulation elements in SARS-CoV-1 and SARS-CoV-2 promote similar −1 programmed ribosomal frameshift rates and silent coding mutations in the slippery sites and in all three stems of the pseudoknot strongly ablate −1 programmed ribosomal frameshift activity. The upstream attenuator hairpin activity is also functionally retained in both viruses, despite differences in the primary sequence in this region. The frameshift stimulation elements are highly conserved among SARS-CoV-1 and SARS-CoV-2 and have the same conformation in small-angle x-ray scattering analyses.405 Moreover, a small molecule that binds to the SARS-CoV-1 pseudoknot and inhibits −1 programmed ribosomal frameshift is similarly effective against −1 programmed ribosomal frameshift in SARS-CoV-2, suggesting that such frameshift inhibitors may be promising lead compounds to combat the current COVID-19 pandemic.404

The −1 programmed ribosomal frameshift signal includes three discrete parts: the “slippery site,” a linker region, and a downstream stimulatory region of mRNA secondary structure, typically an mRNA pseudoknot.388,399,406 The primary sequence of the slippery site and its placement in relation to the incoming translational reading frame are critical: it must be N NNW WWZ (codons are shown in the incoming or 0-frame), where NNN is a stretch of any three identical nucleotides, WWW is either AAA or UUU, and Z ≠ G. The linker region is less well-defined, but typically is short (1-12 nucleotides long) and is thought to be important for determining the extent of −1 programmed ribosomal frameshift in a virus-specific manner. The function of the downstream secondary structure is to induce elongating ribosomes to pause, a critical step for efficient −1 programmed ribosomal frameshift to occur.407 The generally accepted mechanism of −1 programmed ribosomal frameshift is that the mRNA secondary structure directs elongating ribosomes to pause with their A- and P-site bound aminoacyl- and peptidyl-transfer RNAs positioned over the slippery site. The sequence of the slippery site allows for re-pairing of the transfer RNAs to the −1 frame codons after they “simultaneously slip” by 1 base in the 5′ direction along the mRNA.
The subsequent resolution of the downstream mRNA secondary structure allows the ribosome to continue elongation of the nascent polypeptide in the new translational reading frame.408 The downstream stimulatory elements are most commonly H-type mRNA pseudoknots, so called because they are composed of two coaxially stacked SLs where the second stem is formed by base-pairing between sequence in the loop of the first SL and additional downstream sequence.409 The SARS-CoV-1 and SARS-CoV-2 pseudoknots are more complex because they contain a third, internal SL element.410, 411 and 412 Mutations affecting this structure decreased the rates of −1 programmed ribosomal frameshift and had deleterious effects on virus propagation, thus suggesting that it may present a target for small-molecule therapeutics for SARS-CoV-1 and SARS-CoV-2402,403,408,413 as well as other −1 programmed ribosomal frameshift-dependent viruses.414, 415, 416 and 417 In addition, the presence of a hairpin located immediately 5′ of the slippery site has been reported to regulate −1 programmed ribosomal frameshift by attenuating its activity.418
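
The slippery-site rule stated above (N NNW WWZ in the 0 frame, with NNN three identical nucleotides, WWW either AAA or UUU, and Z ≠ G) can be expressed as a short pattern search. The sketch below is a minimal illustration, not a validated frameshift-site predictor: the function name, the `orf_start` parameter, and the toy sequences are hypothetical, and the linker and downstream pseudoknot are not checked.

```python
import re

# Heptamer X XXY YYZ: three identical nucleotides, then AAA or UUU, then any
# nucleotide except G. The lookahead lets overlapping candidates be reported.
SLIPPERY_HEPTAMER = re.compile(r"(?=(([ACGU])\2\2(?:AAA|UUU)[ACU]))")


def find_slippery_sites(mrna: str, orf_start: int = 0) -> list[tuple[int, str]]:
    """Return (position, heptamer) pairs that satisfy the N NNW WWZ rule.

    `orf_start` is the index of the first nucleotide of the 0-frame ORF.
    The first nucleotide of the heptamer must be the third position of a
    0-frame codon, so that the codons read as N NNW WWZ.
    """
    hits = []
    for match in SLIPPERY_HEPTAMER.finditer(mrna):
        position = match.start()
        if (position - orf_start) % 3 == 2:
            hits.append((position, match.group(1)))
    return hits


# Made-up sequences for illustration only.
in_frame_hits = find_slippery_sites("AUGGCUUUAAACGGGUUUGCGGUG")
out_of_frame_hits = find_slippery_sites("AUGGCUUUAAACGGGUUUGCGGUG", orf_start=1)
z_is_g_hits = find_slippery_sites("AUGGGUUUG")

print(in_frame_hits)
```

The frame check matters as much as the sequence itself: the same heptamer placed in a different register relative to the incoming reading frame is not reported, and a heptamer ending in G is rejected by the pattern.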

Drugs targeting −1 programmed ribosomal frameshift in coronaviruses need not necessarily interact directly with the frameshift signal, and they might also act more indirectly, for example, by affecting the concerted interplay between the frameshift signal, ribosome, and elongation factors that govern −1 programmed ribosomal frameshift.419, 420, 421 and 422 An additional benefit of −1 programmed ribosomal frameshift as a drug target is that it is orthogonal and complementary to more standard strategies of targeting viral proteins such as the RNA-dependent RNA polymerase or viral proteases, holding out the promise for combination therapies that could be particularly effective when combining suppression of RNA-dependent RNA polymerase expression by −1 programmed ribosomal frameshift inhibition with suppression of RNA-dependent RNA polymerase activity.


Posttranslational Processing of Early ORF1a and ORF1b Viral Polyproteins

SARS-CoV-2, like many other viruses, relies on proteases to process ORF1a/b-encoded polypeptides into smaller proteins required for replication and virus production. The genome of coronaviruses, including SARS-CoV-2, encodes two proteases, a papain-like (PLpro) protease (NSP3) and the so-called main protease (Mpro), a chymotrypsin-like cysteine protease, also named 3CLpro or NSP5. The activities of the viral proteases extend to cellular proteins, with both proteases involved in suppression of the innate immune system, which will be covered in Chapter 7.


NSP5 (Main Protease [MPro], 3CLPro)

NSP5 is a key protease of coronaviruses and has a pivotal role in mediating viral replication and transcription, with no similarity to host proteases, making it an attractive, and already used, drug target for SARS-CoV-2.423,424 NSP5 is activated by autoproteolysis. Studies of SARS-CoV-1 NSP5 documented that its N-terminal autoprocessing appears to require only two “immature” monomers approaching one another to form an “intermediate” dimer structure and does not depend on the active dimer conformation existing in the mature protease.425 The octameric form of the immature NSP5, which features a three-dimensional swap of the helical domain III of the enzyme,426 may play a role in the auto-activation process.

NSP5, the main protease that cuts the SARS-CoV-2 polyprotein into functional units,427 is responsible for at least 10 cleavages, preferentially hydrolyzing the peptide bonds C-terminal to glutamine residues, which are an absolute requirement within the sequence motif (small amino acid)-X-(L/F/M)-Q↓(G/A/S)-X (where X is any amino acid; ↓ cleavage site).428 NSP5 is a homodimeric cysteine protease in which each protomer consists of three domains, namely I (residues 8-101), II (residues 102-184), and III (residues 201-303), with catalytic residues and substrate-binding sites situated between domains I and II.423 The crystal structure of SARS-CoV-2 NSP5429 reveals its structural similarity with Mpro of SARS-CoV-1, with which it shares 96.1% sequence identity.430,431 Previous studies showed that HIV-1 protease inhibitors block SARS-CoV-1 Mpro432; however, they bind differently to Mpro of SARS-CoV-2.433 One of the HIV protease inhibitors, lopinavir, was shown to inhibit Mpro of SARS-CoV-1 in vitro,434 whereas none of the HIV inhibitors were able to significantly inhibit Mpro of SARS-CoV-2 in vitro.435 Other known potent inhibitors such as α-ketoamide and N3 differentially inhibit the activity of Mpro from SARS-CoV-1 and SARS-CoV-2.424,431,436 Ritonavir-boosted nirmatrelvir has been approved for use as a SARS-CoV-2 Mpro inhibitor.
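
The cleavage motif given above, (small amino acid)-X-(L/F/M)-Q↓(G/A/S)-X, lends itself to a simple pattern scan of a polyprotein sequence. The following Python sketch is illustrative only: the text does not define which residues count as "small", so the set A, G, S, T, V used here is an assumption, and the polyprotein string and function names are made up for the example.

```python
import re

# Assumption: residues treated as "small" at the first motif position.
SMALL_RESIDUES = "AGSTV"

# (small)-X-(L/F/M)-Q | (G/A/S)-X, wrapped in a lookahead so that
# overlapping candidate sites are all reported.
CLEAVAGE_MOTIF = re.compile(rf"(?=([{SMALL_RESIDUES}].[LFM]Q[GAS].))")


def predict_cleavage_sites(protein: str) -> list[int]:
    """Return indices of the residue immediately after each predicted cut,
    i.e., the position just C-terminal to the glutamine."""
    return [match.start() + 4 for match in CLEAVAGE_MOTIF.finditer(protein)]


def cleave(protein: str) -> list[str]:
    """Split the sequence at every predicted cleavage site."""
    cuts = [0] + predict_cleavage_sites(protein) + [len(protein)]
    return [protein[a:b] for a, b in zip(cuts, cuts[1:])]


# Hypothetical toy polyprotein (one-letter amino acid codes).
TOY_POLYPROTEIN = "MKTSAVLQSGFRKMEEGVTFQSAVKRWW"

sites = predict_cleavage_sites(TOY_POLYPROTEIN)
fragments = cleave(TOY_POLYPROTEIN)
print(sites, fragments)
```

A motif match of this kind only flags candidate sites; whether a site is actually cut depends on structural context and accessibility, which a sequence pattern does not capture.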

Apr 2, 2025 | Posted by in PUBLIC HEALTH AND EPIDEMIOLOGY | Comments Off on Molecular Biology of SARS-CoV-2
