Structure, Nomenclature, and Properties of Proteins and Amino Acids

Martha H. Stipanuk, PhD ^∗

Proteins were first recognized as a distinct class of biological molecules in the eighteenth century by Antoine Fourcroy and others, evidenced by the ability of egg whites, wheat gluten, plasma albumin, and fibrin (from clotted blood) to coagulate when treated with heat or acid. The Dutch chemist Gerardus Johannes Mulder carried out elemental analysis of common proteins and found that nearly all proteins had a similar empirical formula, C₄₀₀H₆₂₀N₁₀₀O₁₂₀P₁S₁, leading him to conclude that all “albuminous” compounds might be composed mainly of a single type of compound. The name “protein” (from the Greek word proteios, meaning “primary”) was first given to this class of molecules in 1838 by Mulder’s associate, Jöns Jakob Berzelius.

We now know that proteins, often called polypeptides, are made up of amino acid residues linked by peptide bonds. Proteins are involved in essentially every process that takes place in cells, and these proteins have a remarkable diversity of functions. Proteins function as enzymes, transcription factors, binding proteins, transmembrane transporters and channels, hormones, immunoglobulins, motor proteins, receptors, structural proteins, and signaling proteins.

The human genome contains about 23,000 protein-coding genes, and proteins make up 20% to 50% of the dry mass of the adult human body, with fat being the other major component. These proteins and peptides are synthesized using amino acids as the building blocks, much as complex carbohydrates are synthesized using sugar residues as the building blocks. In addition to the important role of amino acids as precursors for protein synthesis, amino acids have important roles as intermediates in metabolism; as precursors of nonpeptide compounds, such as the neurotransmitter γ-aminobutyric acid and the coenzyme nicotinamide adenine dinucleotide; and as precursors for synthesis of several unique small peptides, such as glutathione.

In terms of diet, humans and other animals must consume protein to meet their needs for amino acids, including specific amino acids that cannot be synthesized by the organism. Protein in human diets comes from both animal sources, such as meat, milk, fish, and eggs, and plant sources, such as cereal seeds (wheat, rice, maize) and legume seeds (soybeans, peanuts). The major proteins in meat and fish include myofibrillar, sarcoplasmic, and connective tissue proteins. The major protein in egg is ovalbumin in the egg white, which is present as a source of protein for growth of the embryo, whereas the major protein in milk is casein, which is present as a source of protein for the mammalian newborn. The majority of protein in plant seeds consists of globulins or storage proteins that are synthesized during seed development and stored in protein bodies. These proteins are hydrolyzed during seed germination to provide nutrients for the developing seedling.

The Proteinogenic Amino Acids

All peptides and proteins, regardless of their origin, are constructed from amino acids that are covalently linked together, usually in a linear sequence. Twenty-one amino acids are naturally incorporated into polypeptides in mammals. Twenty of these are directly encoded by the universal genetic code. The twenty-first of the amino acid precursors for protein synthesis, selenocysteine, is incorporated into a small number of proteins by a unique cotranslational mechanism requiring special secondary structure in the messenger RNA (mRNA) (i.e., a selenocysteine insertion sequence, SECIS) that causes the UGA stop codon to encode selenocysteine (see Chapter 39). Another unusual amino acid called pyrrolysine is considered the twenty-second proteinogenic amino acid, but it is found only in some methane-producing enzymes in methanogenic archaea. The structures of the 20 common amino acids and selenocysteine are shown in Figure 5-1.

FIGURE 5-1 Structures of the common proteinogenic amino acids. The ionic species shown for each amino acid is the dominant species at pH 7.4.

Amino acids have distinctive “side chains” or “R” groups that give each amino acid size, shape, and characteristics that dictate solubility and electrochemical properties. A given R group confers novel, sometimes unique, chemical properties to an amino acid, and amino acids are often classified based on the chemical properties of their respective R groups. With such diverse building blocks, it is easy to understand how peptides and proteins can be designed for complex activities.

Chirality and Optical Rotation

As shown in Figure 5-2, each amino acid contains an amino group and a carboxylic acid group. Both of these functional moieties are bonded directly to a central carbon atom designated as the α-carbon. Except for glycine, the α-carbon for each of the amino acids has four different functional groups bonded to it: an amino group, a carboxylic acid group, hydrogen, and its R group. The α-carbon of glycine does not have an R group or side chain, so two hydrogen atoms are attached to the α-carbon.

FIGURE 5-2 General structure for the α-amino acids. Stereoisomers are shown in their L and D forms. Note the position of the α-carbon. Because carbon can form four valence bonds, the α-carbon of an amino acid can be viewed as being at the center with its four substituents located at the corners of a tetrahedron. When a carbon atom has four different substituents, two distinct spatial arrangements are possible. Fischer projections are used to depict the L and D isomers. In a Fischer projection, bonds pointing horizontally are viewed as coming out of the plane on which they are depicted, whereas those pointing vertically go below the plane. A zwitterion is also depicted, wherein the arrows designate the potential balance and interaction between the positive (+) charge of the amino group and the negative (−) charge of the carboxylate group.

The presence of four different functional groups creates a chiral center. A chiral center exists when the arrangement of substituents around an atom cannot be superimposed on its mirror image. For all amino acids (with the exception of glycine), there are two nonsuperimposable, mirror-image forms. These two forms are referred to as stereoisomers, designated as L- and D-isomers. This terminology comes from the Latin terms laevus and dexter or levo and dextro, meaning left and right, respectively. The D- and L-isomers of a given amino acid will rotate plane-polarized light in opposite directions, but amino acids are not designated D or L by the direction in which they themselves rotate light. Instead, the L and D convention for amino acid stereochemistry refers to the optical activity of the isomer of glyceraldehyde from which the amino acid can theoretically be synthesized. D-Glyceraldehyde is dextrorotatory, whereas L-glyceraldehyde is levorotatory. Thus the designation L or D in combination with the given name of an amino acid implies a specific spatial configuration around the amino acid’s α-carbon.

Proline also deserves special comment, because its R group is joined both at the α-carbon and amino group to form a 5-membered ring. Thus, the α-amino nitrogen of proline has two alkyl substituents but only one hydrogen in its unprotonated state. For this reason, proline is referred to as a secondary amino acid or imino acid. The α-carbon of proline remains a chiral center.

Although the L and D designations remain in common usage for most amino acids, another system for assigning stereochemistry, the RS system, is used most often in organic chemistry. The symbol R comes from the Latin rectus for “right,” and S comes from the Latin sinister for “left.” The RS system denotes the absolute stereochemistry of the molecule, with each stereogenic center in a molecule being assigned a prefix (R or S) according to whether its configuration is right- or left-handed. In order to make the R or S assignment, relative priorities are assigned to the four substituents on the chiral carbon according to the Cahn-Ingold-Prelog sequence rules, which rank substituents by the atomic number of the atom bonded directly to the chiral center (highest to lowest), with ties broken by moving outward along each substituent. Almost all of the chiral amino acids in proteins are S at the α-carbon. Cysteine and selenocysteine are R, because the sulfur or selenium atom raises the priority of the side chain above that of the carboxyl group, and glycine has no chiral α-carbon.

In proteins and peptides, amino acids are found almost exclusively in the L form, although D-amino acids are found in some bacterial proteins and peptides (Petsko and Ringe, 2004). The almost exclusive presence of L-amino acids in proteins indicates that reactions that involve amino acid and protein synthesis must be highly stereospecific. The metabolic pathways for amino acid synthesis create predominantly amino acids in their L forms. Moreover, the biological machinery required for protein assembly recognizes L-amino acids almost exclusively. It should be noted that D-aspartate and D-serine are produced by the mammalian brain by enzymes that catalyze the racemization of L-aspartate and L-serine, respectively, and these D-amino acids are involved in activation of the N-methyl-D-aspartate type of excitatory amino acid receptors (Wolosker et al., 2008).

The Acid and Base Characteristics of Amino Acids

In aqueous solutions, amino acids are easily ionized. The most abundant ionic species present when amino acids are dissolved in an aqueous medium at neutral pH are shown in Figure 5-1, and the pK_as for all dissociable groups are shown in Table 5-1. The acid dissociation constant K_a is used to define characteristics of titratable groups in organic acids and amines. The negative log of the dissociation constant K_a is called the pK_a of the titratable group. In a practical sense, this means that when the pH is equal to the pK_a, the associated (AH, protonated) and dissociated (A^–, unprotonated) species will be present in equal molar concentrations.

TABLE 5-1

Properties of the Amino Acids That Serve as Common Building Blocks of Proteins

	AMINO ACID	MOLECULAR MASS (g/mol)	pK_a α-COOH	pK_a α-NH₃⁺	pK_a R GROUP	HYDROPATHY INDEX (KYTE-DOOLITTLE SCALE)
Hydrophilic amino acids (charged and very polar)	Arginine	174	2.17	9.04	12.48	−4.5
	Lysine	146	2.18	8.95	10.53	−3.9
	Asparagine	132	2.04	9.82		−3.5
	Aspartate	133	2.09	9.82	3.86	−3.5
	Glutamine	146	2.17	9.13		−3.5
	Glutamate	147	2.19	9.67	4.25	−3.5
	Histidine	155	1.82	9.17	6.0	−3.2
Amino acids with intermediate hydrophobicity (Tyr and moderately/weakly polar amino acids)	Tyrosine	181	2.20	10.07	9.11	−1.3
	Tryptophan	204	2.38	9.39		−0.9
	Serine	105	2.21	9.15		−0.8
	Threonine	119	2.63	10.43		−0.7
	Glycine	75	2.34	9.60		−0.4
	Proline	115	1.99	10.6 (NH₂⁺)		−1.6
	Alanine	89	2.34	9.69		1.8
	Methionine	149	2.28	9.31		1.9
	Cysteine	121	1.71	10.78	8.33	2.5
Hydrophobic amino acids (uncharged and nonpolar)	Phenylalanine	165	1.83	9.13		2.8
	Leucine	131	2.36	9.68		3.8
	Valine	117	2.32	9.62		4.2
	Isoleucine	131	2.36	9.68		4.5

K_a = [H⁺] ([A⁻]/[AH])

log K_a = log [H⁺] + log ([A⁻]/[AH])

−log K_a = −log [H⁺] − log ([A⁻]/[AH])

pK_a = pH − log ([A⁻]/[AH])

The pK_as of carboxylic acid groups are relatively low, usually 2 to 4, so these groups are almost always negatively charged at physiological pH. Amino groups have pK_as that are relatively high, usually 9 to 11, so these groups are almost always positively charged at physiological pH. Most amino acids have neutral side chains at physiological pH and have an overall net charge of 0. However, these amino acids still have a positively charged amino group and a negatively charged carboxyl group. Thus they are not uncharged molecules, nor are they cations or anions. The name “zwitterion” or “dipolar ion” is given to such molecules that have both positive and negative charges but a net charge of zero.

In a nonhydrated state, most amino acids exist as nonvolatile crystalline solids. In this case, an internal transfer of a hydrogen ion from the −COOH group to the −NH₂ group of the amino acid leaves an ion with both a negative charge and a positive charge. The zwitterionic character of amino acids causes them to be held together by electrostatic forces, or ionic bonds, in a crystalline lattice (i.e., analogous to the crystalline lattice of sodium chloride and other salt crystals). These ionic attractions between oppositely charged ions are strong, and consequently thermal decomposition of amino acids usually requires high temperatures (e.g., above 200° C).

Ionizable groups of amino acids can be characterized by titrating a solution of the amino acid with acid or base to obtain a titration curve. The types and number of the functional groups capable of reacting with or exchanging a hydrogen ion (proton) influence the shape of this curve. Addition of base (or acid) will result in a rapid change in the pH of the solution when no group is being titrated, whereas a much slower rate of change in the pH of the solution will be observed when an ionizable group is being titrated. For example, alanine contains two titratable groups: one carboxylic acid group and one amino group. In aqueous solution at a very low or acidic pH (i.e., a high hydrogen ion concentration), both the amino group and the carboxylic acid group of alanine will be protonated, and as a result alanine will be positively charged. If base is gradually added (to decrease the concentration of hydrogen ions), the carboxylic acid group will lose its proton and alanine will become a zwitterion with one negative charge and one positive charge. If more base is added to increase the pH, eventually the positively charged amino group will lose its proton and alanine will become negatively charged.
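The charge states of alanine described above can be checked numerically. The following is a minimal sketch, assuming the two groups ionize independently and using the alanine pK_a values from Table 5-1 (2.34 for the α-carboxyl group and 9.69 for the α-amino group). The function names are illustrative only.

```python
# pK_a values for alanine as listed in Table 5-1.
ALANINE_PKA_COOH = 2.34
ALANINE_PKA_NH3 = 9.69


def fraction_deprotonated(pH, pKa):
    """Fraction of a titratable group that has lost its proton at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))


def alanine_net_charge(pH):
    """Average net charge per alanine molecule, treating the two groups independently."""
    # The carboxyl group carries a charge of -1 only when deprotonated (COO-).
    carboxyl_charge = -fraction_deprotonated(pH, ALANINE_PKA_COOH)
    # The amino group carries a charge of +1 only when protonated (NH3+).
    amino_charge = 1.0 - fraction_deprotonated(pH, ALANINE_PKA_NH3)
    return carboxyl_charge + amino_charge


if __name__ == "__main__":
    for pH in (1.0, 6.0, 12.0):
        print(f"pH {pH:4.1f}: net charge {alanine_net_charge(pH):+.3f}")
```

Under these assumptions the net charge is close to +1 at pH 1, close to 0 (the zwitterion) near pH 6, and close to −1 at pH 12, matching the sequence of species described in the text.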

The presence of a titratable group can be easily observed on a titration curve as a marked decrease in the change in pH per unit of base added; this will appear as a flattening of the curve when pH is plotted on the vertical axis and units of base are plotted on the horizontal axis. In essence, the titratable group acts as a buffer to resist changes in pH by donating protons to neutralize the base that is added. A curve obtained by the titration of histidine, which contains three titratable functional groups, is shown in Figure 5-3. On a titration curve, the pK_a can be observed as the point of inflection near the center of the “plateau.” At this inflection point the slope of the curve is at its minimum, and the curvature changes from concave down to concave up. For histidine in Figure 5-3, three pK_as can be detected: the carboxyl group has a pK_a = 1.82, the imidazole group has a pK_a = 6.0, and the α-amino group has a pK_a = 9.17.

FIGURE 5-3 Titration curve for histidine.
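The shape of a curve like the one in Figure 5-3 can be approximated from the three pK_a values alone. The sketch below is a simplified model, not a reconstruction of the figure: it assumes the three groups ionize independently and ignores dilution and the ionization of water. The function names and the finite-difference step are illustrative choices.

```python
# pK_a values for histidine as given in the text and Table 5-1.
HISTIDINE_PKAS = {"alpha-carboxyl": 1.82, "imidazole": 6.0, "alpha-amino": 9.17}


def base_equivalents(pH, pKas=HISTIDINE_PKAS):
    """Protons removed per molecule, starting from the fully protonated form."""
    return sum(1.0 / (1.0 + 10.0 ** (pKa - pH)) for pKa in pKas.values())


def base_per_pH_unit(pH, pKas=HISTIDINE_PKAS, step=1e-4):
    """Central-difference estimate of d(equivalents)/d(pH).

    A large value means much base is consumed per unit change in pH,
    which is the flat (buffered) part of the titration curve.
    """
    upper = base_equivalents(pH + step, pKas)
    lower = base_equivalents(pH - step, pKas)
    return (upper - lower) / (2.0 * step)


if __name__ == "__main__":
    for pH in (1.82, 4.0, 6.0, 7.6, 9.17):
        print(
            f"pH {pH:5.2f}: {base_equivalents(pH):.2f} equivalents of base, "
            f"buffering {base_per_pH_unit(pH):.3f} equivalents per pH unit"
        )
```

In this model, about 0.5, 1.5, and 2.5 equivalents of base have been consumed at the three pK_a values, and the buffering is far stronger at pH 6.0 than at pH 4.0, which lies between two pK_as.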

Because pK_a is a log₁₀ scale, a 1.0 unit change in pH on either side of the pK_a will be associated with a tenfold change in the ratio of the associated and dissociated species, and a 2.0 unit change in pH on either side of the pK_a will be associated with a 100-fold change in the ratio.

pH − pK_a = log ([A⁻]/[AH])

[A⁻]/[AH] = 10^(pH − pK_a)

Thus if the pK_a for an ionizable group is 6.0, the ratio of the unprotonated to the protonated species will be 0.01 (mainly protonated) at pH 4.0 and 100 (mainly unprotonated) at pH 8.0. On the titration curve, the rate of change in pH per unit of base (or acid) added increases as one moves away from the pK_a of a titratable group.
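The arithmetic in this example follows directly from the relationship [A⁻]/[AH] = 10^(pH − pK_a). A minimal sketch, with illustrative function names, reproduces the ratios quoted above and converts them to the fraction of the group that is unprotonated.

```python
def unprotonated_to_protonated_ratio(pH, pKa):
    """Return [A-]/[AH] for a titratable group at the given pH."""
    return 10.0 ** (pH - pKa)


def fraction_unprotonated(pH, pKa):
    """Return [A-] / ([A-] + [AH]) for a titratable group at the given pH."""
    ratio = unprotonated_to_protonated_ratio(pH, pKa)
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    pKa = 6.0  # the ionizable group used in the example in the text
    for pH in (4.0, 5.0, 6.0, 7.0, 8.0):
        ratio = unprotonated_to_protonated_ratio(pH, pKa)
        print(f"pH {pH:.1f}: ratio {ratio:g}, fraction unprotonated {fraction_unprotonated(pH, pKa):.3f}")
```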

When amino acids are incorporated into peptides, they lose their ability to form zwitterions because the α-carboxyl and α-amino groups are in peptide linkage with other amino acid residues. Apart from the charges on the C-terminal carboxyl group and the N-terminal amino group, it is the ionization of the R groups of the amino acid residues that determines the electrical charge of the macromolecule. Aspartate and glutamate have carboxylate groups on their side chains, whereas lysine has an ε-amino group and arginine has a basic guanidinium group; these groups are normally charged at physiological pH. In contrast, the imidazole ring of histidine (pK_a = 6.0) and the thiol group (−SH) of cysteine (pK_a = 8.3) have pK_as that are closer to neutral and undergo partial ionization within the range of physiological pH, meaning that relatively small shifts in cellular pH can change the charge of these residues. The seleno group of selenocysteine has a pK_a of about 5.2, such that selenocysteine residues are mostly ionized at physiological pH.
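To illustrate how sensitive these side chains are to small pH shifts, the sketch below applies the same relationship, [A⁻]/[AH] = 10^(pH − pK_a), to the side-chain pK_a values quoted in this paragraph. It assumes the free-solution pK_as apply unchanged, which may not hold inside a folded protein, and the comparison pH of 7.0 is an arbitrary choice for illustration.

```python
# Side-chain pK_a values quoted in the text (free amino acids in solution).
SIDE_CHAIN_PKAS = {
    "histidine imidazole": 6.0,
    "cysteine thiol": 8.3,
    "selenocysteine selenol": 5.2,
}


def fraction_deprotonated(pH, pKa):
    """Fraction of a side chain that has lost its proton at the given pH."""
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))


if __name__ == "__main__":
    for name, pKa in SIDE_CHAIN_PKAS.items():
        at_74 = fraction_deprotonated(7.4, pKa)
        at_70 = fraction_deprotonated(7.0, pKa)
        print(f"{name}: {at_74:.1%} deprotonated at pH 7.4, {at_70:.1%} at pH 7.0")
```

Under these assumptions, selenocysteine is more than 99% deprotonated at pH 7.4, cysteine is only about 11% deprotonated, and the protonated (positively charged) fraction of histidine more than doubles when the pH falls from 7.4 to 7.0.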

The ionization state of these side chains affects the physical and chemical properties of proteins and is important for their interactions with other proteins, substrates or ligands, and other macromolecules as well as for their physiological functions. Within chromatin, the basic amino acid residues in histones form ionic bonds with the acidic sugar–phosphate backbone of DNA. Acidic amino acid residues are involved in chelation of calcium ions by calcium-binding proteins. The histidine side chain and the carboxylate of acidic amino acids often serve as coordinating ligands for metals in metalloproteins. Within the native protein structure, pK_a values for ionizable groups can be substantially altered because of interactions with nearby residues or the hydrophobicity of the interior of the protein. Such alterations can be critical for the catalytic function of proteins such as enzymes (Harris and Turner, 2002).

Hydrophobicity or Hydrophilicity of Amino Acid Residues

In addition to differences in size and charge, amino acids also differ in hydrophobicity or hydrophilicity (i.e., the tendency to interact with a polar or nonpolar solvent or environment). This property of amino acid R groups can vary widely, ranging from totally nonpolar or hydrophobic (water insoluble) to polar or hydrophilic (water soluble). The hydrophobic character of amino acid residues is believed to be the major driving force in protein folding. The amino acid residues with high positive hydropathy scores (e.g., isoleucine and valine) tend to repel the aqueous environment and consequently tend to pack together in the interior of the protein to avoid contact with water. On the other hand, amino acid residues with high negative hydropathy scores (e.g., arginine and lysine) will most likely be found on the surface of the protein in contact with the aqueous environment.

Jack Kyte and Russell Doolittle (1982) proposed a hydropathy index that is now widely used to predict aspects of protein structure; this scale assigns negative numbers to the most hydrophilic side chains and positive numbers to the most hydrophobic side chains (see Table 5-1). Other scales have been developed, some of which assign quite different values to some of the amino acids. Efforts to develop better methods of predicting protein structure continue. An example of the use of a hydropathy index to predict the transmembrane segments of a protein sequence is shown in Figure 5-4. Transmembrane segments of transmembrane proteins can be predicted from the average hydrophobicity scores for small regions of the polypeptide chain (e.g., segments of 9 to 19 amino acids). Transmembrane regions of proteins, which must pass through the lipid bilayers of cell membranes, tend to have high hydropathy scores (greater than 1.6 units).
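A sliding-window hydropathy average of the kind described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the procedure used to produce Figure 5-4: the window length of 19 and the 1.6 cutoff come from the text, proline is entered as −1.6 (the value in the published Kyte-Doolittle scale), and the example sequence is invented purely for demonstration.

```python
# Kyte-Doolittle hydropathy values keyed by one-letter amino acid code.
KYTE_DOOLITTLE = {
    "R": -4.5, "K": -3.9, "N": -3.5, "D": -3.5, "Q": -3.5, "E": -3.5,
    "H": -3.2, "P": -1.6, "Y": -1.3, "W": -0.9, "S": -0.8, "T": -0.7,
    "G": -0.4, "A": 1.8, "M": 1.9, "C": 2.5, "F": 2.8, "L": 3.8,
    "V": 4.2, "I": 4.5,
}


def hydropathy_profile(sequence, window=19):
    """Average hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]


def candidate_transmembrane_windows(sequence, window=19, cutoff=1.6):
    """Start positions (0-based) of windows whose average exceeds the cutoff."""
    profile = hydropathy_profile(sequence, window)
    return [start for start, average in enumerate(profile) if average > cutoff]


if __name__ == "__main__":
    # Hypothetical sequence: hydrophilic flanks around a 19-residue hydrophobic core.
    example = "MKRSDEQNGT" + "LLIVAVLFAIGLVLLIAVF" + "KRDEQNSGTP"
    profile = hydropathy_profile(example)
    print("highest window average:", round(max(profile), 2))
    print("window starts above cutoff:", candidate_transmembrane_windows(example))
```

Windows that overlap the hydrophobic core exceed the cutoff, whereas windows dominated by the charged flanks do not. A real prediction would normally merge adjacent windows into segments and may use a different scale or window length.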