Appendix A - A Sample PDB File


The initial lines of a PDB entry describe the protein, its source and the authors who deposited the entry with the PDB, followed by some useful references and basic information about the crystallographic experiment (such as the resolution and the refinement procedure).

HEADER    PROTEINASE INHIBITOR (TRYPSIN)          27-SEP-82   4PTI      4PTI   3
COMPND    TRYPSIN INHIBITOR                                             4PTI   4
SOURCE    BOVINE (BOS $TAURUS) PANCREAS                                 4PTIE  1
AUTHOR    R.HUBER,D.KUKLA,A.RUEHLMANN,O.EPP,H.FORMANEK,J.DEISENHOFER,   4PTI   6
AUTHOR   2 W.STEIGEMANN                                                 4PTI   7
REVDAT   6   16-APR-87 4PTIE   1       SOURCE REMARK                    4PTIE  2
REVDAT   5   31-MAY-84 4PTID   1       REMARK                           4PTID  1
REVDAT   4   23-FEB-84 4PTIC   1       JRNL                             4PTIC  1
REVDAT   3   31-JAN-84 4PTIB   1       REMARK                           4PTIB  1
REVDAT   2   30-SEP-83 4PTIA   1       REVDAT                           4PTIA  1
REVDAT   1   18-JAN-83 4PTI    0                                        4PTIA  2
SPRSDE     18-JAN-83 4PTI      3PTI                                     4PTIA  3
JRNL        AUTH   M.MARQUART,J.WALTER,J.DEISENHOFER,W.BODE,R.HUBER     4PTI   8
JRNL        TITL   THE GEOMETRY OF THE REACTIVE SITE AND OF THE         4PTI   9
JRNL        TITL 2 PEPTIDE GROUPS IN TRYPSIN, TRYPSINOGEN AND ITS       4PTI  10
JRNL        TITL 3 COMPLEXES WITH INHIBITORS                            4PTI  11
JRNL        REF    ACTA CRYSTALLOGR.,SECT.B      V.  39   480 1983      4PTIC  2
JRNL        REFN   ASTM ASBSDK  DK ISSN 0108-7681                  622  4PTIC  3
REMARK   1 REFERENCE 1                                                  4PTIE  3
REMARK   1  AUTH   A.WLODAWER,J.DEISENHOFER,R.HUBER                     4PTIE  4
REMARK   1  TITL   COMPARISON OF TWO HIGHLY REFINED STRUCTURES OF       4PTIE  5
REMARK   1  TITL 2 BOVINE PANCREATIC TRYPSIN INHIBITOR                  4PTIE  6
REMARK   1  REF    J.MOL.BIOL.                   V. 193   145 1987      4PTIE  7
REMARK   1  REFN   ASTM JMOBAK  UK ISSN 0022-2836                  070  4PTIE  8
REMARK   2                                                              4PTI  56
REMARK   2 RESOLUTION. 1.5 ANGSTROMS.                                   4PTI  57
REMARK   3                                                              4PTI  58
REMARK   3 REFINEMENT. J. DEISENHOFER*S VERSION OF THE JACK AND         4PTI  59
REMARK   3  LEVITT REFINEMENT PROCEDURE COMBINING CRYSTALLOGRAPHIC AND  4PTI  60
REMARK   3  ENERGY REFINEMENT. (A.JACK,M.LEVITT, ACTA CRYSTALLOGR.,     4PTI  61
REMARK   3  A34, 931-935, 1978).  THE R-VALUE FOR REFLECTIONS WITHIN    4PTI  62
REMARK   3  THE SHELL 1.5 TO 7.0 ANGSTROMS AND WITH                     4PTI  63
REMARK   3  2*(ABS(FO)-ABS(FC))/(ABS(FO)+ABS(FC)) LESS THAN 1.2 IS      4PTI  64
REMARK   3  0.162.                                                      4PTI  65
REMARK   4                                                              4PTI  66
REMARK   4 COORDINATES FOR 60 WATER MOLECULES ARE GIVEN FOLLOWING THE   4PTI  67
REMARK   4 MAIN BODY OF THE PROTEIN.  THE NOMENCLATURE OF THE WATER     4PTI  68
REMARK   4 MOLECULES IS THAT OF THE DEPOSITORS.                         4PTI  69
REMARK   5                                                              4PTIA  4

Following the introductory material, specific information about the protein and its crystalline form is provided, including the protein sequence, secondary structure, disulfide bonds (where present) and the like. The dimensions of the unit cell, the space group and related information are also given.

SEQRES   1     58  ARG PRO ASP PHE CYS LEU GLU PRO PRO TYR THR GLY PRO  4PTI  70
SEQRES   2     58  CYS LYS ALA ARG ILE ILE ARG TYR PHE TYR ASN ALA LYS  4PTI  71
SEQRES   3     58  ALA GLY LEU CYS GLN THR PHE VAL TYR GLY GLY CYS ARG  4PTI  72
SEQRES   4     58  ALA LYS ARG ASN ASN PHE LYS SER ALA GLU ASP CYS MET  4PTI  73
SEQRES   5     58  ARG THR CYS GLY GLY ALA                              4PTI  74
FORMUL   2  HOH   *60(H2 O1)                                            4PTI  75
HELIX    1  H1 PRO      2  GLU      7  1
HELIX    2  H2 SER     47  GLY     56  1                                4PTI  76
SHEET    1  S1 2 ALA    16  ALA    25  0                                4PTI  77
SHEET    2  S1 2 GLY    28  GLY    36 -1                                4PTI  78
SSBOND   1 CYS      5    CYS     55                                     4PTI  79
SSBOND   2 CYS     14    CYS     38                                     4PTI  80
SSBOND   3 CYS     30    CYS     51                                     4PTI  81
CRYST1   43.100   22.900   48.600  90.00  90.00  90.00 P 21 21 21    4  4PTI  82
ORIGX1      1.000000  0.000000  0.000000        0.00000                 4PTI  83
ORIGX2      0.000000  1.000000  0.000000        0.00000                 4PTI  84
ORIGX3      0.000000  0.000000  1.000000        0.00000                 4PTI  85
SCALE1       .023202  0.000000  0.000000        0.00000                 4PTI  86
SCALE2      0.000000   .043668  0.000000        0.00000                 4PTI  87
SCALE3      0.000000  0.000000   .020576        0.00000                 4PTI  88
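
The CRYST1 and SCALEn lines above are closely related, and a short calculation makes the connection visible. The following is a minimal Python sketch, not part of the PDB entry itself: the values are copied by hand from the sample lines, and the function names (cell_volume, to_fractional) are hypothetical helpers chosen for illustration. It assumes the SCALEn matrix is purely diagonal, which holds for this entry because all three cell angles are 90 degrees.

```python
import math

# Values copied from the CRYST1 and SCALE1-3 lines of the sample entry.
a, b, c = 43.100, 22.900, 48.600            # cell edge lengths, in angstroms
alpha, beta, gamma = 90.00, 90.00, 90.00    # cell angles, in degrees
scale_diagonal = (0.023202, 0.043668, 0.020576)


def cell_volume(a, b, c, alpha, beta, gamma):
    """Volume of a general unit cell; reduces to a*b*c when all angles are 90."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


def to_fractional(xyz, scale_diag):
    """Convert orthogonal coordinates (angstroms) to fractional cell coordinates.

    Assumption: the scale matrix is diagonal and has no translation part,
    as in this sample entry.
    """
    return tuple(coord * s for coord, s in zip(xyz, scale_diag))


if __name__ == "__main__":
    print("cell volume (cubic angstroms):", round(cell_volume(a, b, c, alpha, beta, gamma), 1))
    # For a right-angled cell, each SCALE diagonal term is the reciprocal of an edge.
    for edge, s in zip((a, b, c), scale_diagonal):
        print(edge, round(1.0 / edge, 6), s)
    # Fractional position of the first atom in the coordinate section.
    print(to_fractional((26.465, 27.452, -2.490), scale_diagonal))
```

For cells with angles other than 90 degrees the SCALEn matrix has off-diagonal terms, so the simple element-wise product in to_fractional would no longer be enough.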

Then come the atom coordinates, whose listing takes up most of the average PDB file. Each listing begins with "ATOM" and is followed by:

The atom number
The type of atom (see Appendix B for more details)
The type of residue
The residue's sequence number in the protein
The x, y and z coordinates
The occupancy factor
The temperature factor (B factor)
And lastly, the model's identifier (4PTI in this case) and the line number in the file.

ATOM      1  N   ARG     1      26.465  27.452  -2.490  1.00 25.18      4PTI  89
ATOM      2  CA  ARG     1      25.497  26.862  -1.573  1.00 17.63      4PTI  90
ATOM      3  C   ARG     1      26.193  26.179   -.437  1.00 17.26      4PTI  91
ATOM      4  O   ARG     1      27.270  25.549   -.624  1.00 21.07      4PTI  92
ATOM      5  CB  ARG     1      24.583  25.804  -2.239  1.00 23.27      4PTI  93
ATOM      6  CG  ARG     1      25.091  24.375  -2.409  1.00 13.42      4PTI  94
ATOM      7  CD  ARG     1      24.019  23.428  -2.996  1.00 17.32      4PTI  95
ATOM      8  NE  ARG     1      23.591  24.028  -4.287  1.00 17.90      4PTI  96
ATOM      9  CZ  ARG     1      24.299  23.972  -5.389  1.00 19.71      4PTI  97
ATOM     10  NH1 ARG     1      25.432  23.261  -5.440  1.00 24.10      4PTI  98
ATOM     11  NH2 ARG     1      23.721  24.373  -6.467  1.00 14.01      4PTI  99
ATOM     12  N   PRO     2      25.667  26.396    .708  1.00 10.92      4PTI 100
ATOM     13  CA  PRO     2      26.222  25.760   1.891  1.00  9.21      4PTI 101
ATOM     14  C   PRO     2      26.207  24.242   1.830  1.00 12.15      4PTI 102
ATOM     15  O   PRO     2      25.400  23.576   1.139  1.00 14.46      4PTI 103
ATOM     16  CB  PRO     2      25.260  26.207   3.033  1.00 13.09      4PTI 104
ATOM     17  CG  PRO     2      24.512  27.428   2.493  1.00 11.42      4PTI 105
ATOM     18  CD  PRO     2      24.606  27.382    .978  1.00 11.88      4PTI 106
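
The fields listed above sit at fixed column positions, so a coordinate line can be read by slicing rather than by splitting on whitespace. The sketch below is one way to do that in Python. The AtomRecord type and the parse_atom_record function are hypothetical names introduced here for illustration, and the column ranges are the ones that fit the sample lines shown above (serial number in columns 7-11, atom name in 13-16, residue name in 18-20, residue number in 23-26, then x, y, z, occupancy and B factor).

```python
from typing import NamedTuple


class AtomRecord(NamedTuple):
    serial: int        # the atom number
    name: str          # the type of atom, e.g. "CA"
    res_name: str      # the type of residue, e.g. "ARG"
    res_seq: int       # the residue's sequence number
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float    # the temperature factor


def parse_atom_record(line: str) -> AtomRecord:
    """Parse one ATOM (or HETATM) line using fixed column positions."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not a coordinate record: " + line[:6])
    return AtomRecord(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
    )


# Third coordinate line of the sample entry; note the z value written as "-.437".
SAMPLE = "ATOM      3  C   ARG     1      26.193  26.179   -.437  1.00 17.26      4PTI  91"

if __name__ == "__main__":
    print(parse_atom_record(SAMPLE))
```

Slicing by column keeps working when two numeric fields run together without a space between them, which is where a whitespace split tends to fail on coordinate files.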

This goes on for a while, until the end of the peptide chain, which is marked by the "TER" line. If any other molecules cocrystallized with the protein (such as solvent molecules or ligands), they are listed as "heteroatoms" on "HETATM" lines near the end of the file.

TER     455      ALA    58                                              4PTI 543
HETATM  456  O   HOH   101      14.483  32.405  -3.949  1.00 16.73      4PTI 544
HETATM  457  O   HOH   102       5.350  14.061  18.456  1.00 25.35      4PTI 545
HETATM  458  O   HOH   103      18.785  30.833  -6.010  1.00 30.52      4PTI 546
HETATM  459  O   HOH   104      25.258  31.756  -3.598  1.00 34.15      4PTI 547
HETATM  460  O   HOH   105      23.626  30.718   1.059  1.00 35.13      4PTI 548
HETATM  461  O   HOH   106      16.662  21.017  18.977  1.00 29.19      4PTI 549
HETATM  462  O   HOH   107      16.177  23.996  10.526  1.00 27.91      4PTI 550
HETATM  463  O   HOH   108      18.137  26.764   7.490  1.00 31.88      4PTI 551
HETATM  464  O   HOH   109      20.608  29.238   6.548  1.00 36.56      4PTI 552
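
Since every line starts with its record type, the chain atoms, the "TER" marker and the heteroatoms can be told apart by looking at the first six characters. The following Python sketch illustrates this on a few lines copied from the sample above. The function names count_record_types and water_lines are hypothetical, and the residue name is assumed to sit in columns 18-20, as it does in the lines shown.

```python
from collections import Counter

# A few lines copied from the sample entry: one chain atom, the TER
# marker, and two water molecules listed as heteroatoms.
SAMPLE_LINES = [
    "ATOM     18  CD  PRO     2      24.606  27.382    .978  1.00 11.88      4PTI 106",
    "TER     455      ALA    58                                              4PTI 543",
    "HETATM  456  O   HOH   101      14.483  32.405  -3.949  1.00 16.73      4PTI 544",
    "HETATM  457  O   HOH   102       5.350  14.061  18.456  1.00 25.35      4PTI 545",
]


def count_record_types(lines):
    """Count lines by record type (the first six columns, stripped of padding)."""
    return Counter(line[:6].strip() for line in lines if line.strip())


def water_lines(lines):
    """Return the HETATM lines whose residue name (columns 18-20) is HOH."""
    return [
        line for line in lines
        if line.startswith("HETATM") and line[17:20] == "HOH"
    ]


if __name__ == "__main__":
    print(count_record_types(SAMPLE_LINES))
    print(len(water_lines(SAMPLE_LINES)), "water molecules in the sample lines")
```

Run over the complete file, the same counting could be compared against the 60 water molecules mentioned in REMARK 4.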