CHARMM

The CHARMM program is a general purpose molecular mechanics and molecular dynamics simulation program. The core of the program is based on the empirical potential energy function discussed earlier which is used for energy minimization, molecular dynamics simulations, vibrational analysis and thermodynamic calculations. In addition, there are interfaces to several quantum mechanical programs which allow one to do mixed QM and MM calculations. The program contains a comprehensive analysis facility which enables the user to compare structures, evaluate energies, calculate time series and correlation functions. The program can treat systems ranging in size from small individual organic molecules to large proteins and DNA molecules in solution. It has been developed in the laboratory of Professor Martin Karplus at Harvard University over the past 20 years. Today, contributors continue to improve and enhance the capabilities of the program.

Below is a list of the main features of the latest version of the CHARMM program, earlier versions may not contain all the feature listed below.

  1. Molecular Mechanics (Energy Minimization)
  2. Molecular Dynamics
    • Classical Simulations
    • Crystal Simulations
    • Specialized techniques (Umbrella Sampling, Constant Pressure)
  3. Monte Carlo simulation package
  4. Free Energy Perturbation Calculations
    • Block program
    • Perturb program
    • Thermodynamic Simulation Method (TSM)
  5. Normal Mode and Molecular vibrational analysis facility
  6. Reaction Path Determination
  7. Advanced implicit solvent models
    • Analytical Continuum Electrostatics (ACE) model
    • Effective Energy Function 1 (EEF1)
    • Generalized Born Solvation Energy (GBorn)
  8. Interfaces
    • Multi-body dynamics (MBOND)
    • Merck Molecular Force Field (MMFF)
    • Quantum and Molecular Mechanical Force Field (QM/MM)
  9. Misc
    • Analysis facility
    • Time series and correlation function calculations
    • NMR analysis facility
    • Poisson-Boltzmann Equation Solver
    • Reference Interaction Site Model (RISM)

Currently, parameter sets are available for proteins1, DNA/RNA2, and lipids3.

The CHARMM program runs on many different platforms including DEC alpha, Convex, Cray, HP, IBM RISC, IBM SP, SGI, SUN, GNU and Linux machines.

References:
  1. A. D. MacKerell Jr., J. Wiórkiewicz-Kuczera, and M. Karplus (1995). An All-Atom Empirical Energy Function for the Simulation of Nucleic Acids, J. Am. Chem. Soc. 117, 11946-11975.
  2. A. D. MacKerell, Jr., D. Bashford, M. Bellott, R. L. Dunbrack Jr., J. D. Evanseck, M. J. Field, S. Fischer, J. Gao, H. Guo, S. Ha, D. Joseph-McCarthy, L. Kuchnir, K. Kuczera, F. T. K. Lau, C. Mattos, S. Michnick, T. Ngo, D. T. Nguyen, B. Prodhom, W. E. Reiher, III, B. Roux, M. Schlenkrich, J. C. Smith, R. Stote, J. Straub, M. Watanabe, J. Wiórkiewicz-Kuczera, D. Yin, M. Karplus (1998). All-atom Empirical Potential for Molecular Modeling and Dynamics Studies of Proteins. J. Phys. Chem. 102, 3586-3616.
  3. M. Schlenkrich, J. Brickmann, A. D. MacKerell Jr., and M. Karplus (1996). An Empirical Potential Energy Function for Phospholipids: Criteria for Parameter Optimization and Applications, in Biological Membranes: A Molecular Perspective from Computation and Experiment, K. M. Merz, Jr. and B. Roux, editors (Birkhâuser), 31-81.

A complete description of commands available in the CHARMM Program is available in the CHARMM documentation supplied with the program as well as in the CHARMM paper [J. Comp. Chem., 4, 187 (1983)]. See also http://www-esbs.u-strasbg.fr/notes-de-cours/modeli/charmm/CHARMM-docs.html for the complete documentation online. Here, we describe some of the basic data structures and introduce some of the basic commands.


Data Structures

It is useful to review several fundamental data structures in the CHARMM program. These data structures include information about the molecule, its composition, its chemical connectivity, certain atomic properties, internal coordinates and parameters for the energy function and more. This information for a particular class of molecules, e.g, proteins or nucleic acids, is contained in the topology file and the parameter file. For a specific molecule, the necessary data is extracted from these files and stored in the Protein Structure Files (PSF). In the following section, a brief description of these and several other data structures is given.

Topology and Parameter Files

Residue Topology File (RTF)

The residue topology file contains definitions of the molecular building blocks (residues) which are used to build large molecules, for example, definitions of the amino acids and DNA bases. This file contains the name of each atom type in the parameter set, its mass, hydrogen bond donors and acceptors, as well an atom’s partial charge in a particular residue (amino acid). Atom types are used to define an atom in a particular bonding situation, for example an aliphatic carbon atom in an sp3 bonding situation (CT3) has different properties than a carbon atom (CPH1) found in the His ring. For each residue, the covalent structure is defined, i.e., how the atoms are connected to one another to form amino acids, DNA bases or lipid molecules.

Parameter File (PARAM)

The parameter file is associated with the RTF file as it contains all the necessary parameters for calculating the energy of the molecule(s). These include the equilibrium bond distances and angles for bond stretching, angle bending and dihedral angle terms in the potential energy function as well as the force constants and the Lennard-Jones 6-12 potential parameters. These parameters are associated with particular atom types (see above).

Example of the RTF for the Alanine residue:
RESI ALA0.00
GROUP
ATOM NNH1-0.47  !    |
ATOM HNH 0.31   ! HN-N
ATOM CACT1 0.07 !    |    HB1
ATOM HAHB 0.09  !    |    /
GROUP           !HA-CA--CB-HB2
ATOM CBCT3-0.27 !    |    \
ATOM HB1HA 0.09 !    |    HB3
ATOM HB2HA 0.09 !  O=C
ATOM HB3HA 0.09 !    |
GROUP           !
ATOM CC 0.51
ATOM OO-0.51
BONDCB CA N HN N CA
BOND C CA C +N CA HA CB HB1 CB HB2 CB HB3
DOUBLE O C
IMPR N -C CA HN C CA +N O
DONOR HN N
ACCEPTOR O C
IC -C CA *N HN    1.3551 126.4900 180.0000 115.4200 0.9996
IC -C N CA C      1.3551 126.4900 180.0000 114.4400 1.5390
IC N CA C +N      1.4592 114.4400 180.0000 116.8400 1.3558
IC +N CA *C O     1.3558 116.8400 180.0000 122.5200 1.2297
IC CA C +N +CA    1.5390 116.8400 180.0000 126.7700 1.4613
IC N C *CA CB     1.4592 114.4400 123.2300 111.0900 1.5461
IC N C *CA HA     1.4592 114.4400 -120.4500 106.3900 1.0840
IC C CA CB HB1    1.5390 111.0900 177.2500 109.6000 1.1109
IC HB1 CA *CB HB2 1.1109 109.6000 119.1300 111.0500 1.1119
IC HB1 CA *CB HB3 1.1109 109.6000 -119.5800 111.6100 1.1114
 

The following data structures are constructed for a specific molecule or molecules.

Protein Structure File (PSF)

The Protein Structure File (PSF) is the most fundamental data structure in the CHARMM program. It is generated for a specific molecule or molecules and it is a concatenation of the information contained in the RTF file. It contains the detailed composition and connectivity of the molecule(s) of interest. It gives the total number of bonds and identifies which atoms make up a particular bond. Similarly for the bond angles, where triplets of atoms are listed, and for torsion angles, where quadruplets of atoms are specified. Improper dihedral angles are also listed; these are used to maintain planarity and prevent accidental chiral inversion. The PSF can include several molecular entities, defined as segments, which can range from a single macromolecular chain to multiple chains solvated by explicit water molecules.

The PSF must be specified before any calculations can be performed on the molecule. The PSF constitutes the molecular topology but does not contain information regarding the bond lengths, angles, etc., so it is necessary to read in the parameter file to add the missing information.

Coordinate File (CRD)

The coordinate file contains the Cartesian coordinates of all atoms in the system. The cartesian coordinates of a molecule are most often obtained from x-ray crystal structures or from NMR structures. Alternatively, they can be generated from homology modeling studies or from the library values contained in the topology and parameter files along with some secondary structure information. Missing coordinates can be built within the CHARMM program using the internal coordinate facility (see below); in addition, hydrogen atoms, which are not present in x-ray crystal structures of proteins may be placed using the HBUILD module in CHARMM. Two sets of coordinates can be accommodated in the CHARMM program, which is useful for a variety of calculations. For example, it can be used to the calculate the root mean square difference (rmsd) between a structure from the simulation and the experimental structure.

Internal Coordinate Table (IC Table)

The Internal Coordinate Table contains information on the positions of the atoms relative to one another within the structure, i.e., the bond lengths, angles and dihedrals etc. for the molecule. The IC table is useful for analyzing the structure of a molecule or for building parts of the molecule that are missing. This table can be constructed by including the setup keyword in the generate statement when generating the PSF file. The table itself has a particular notation, see below.

Normal IC table entry:
Improper dihedral entry:

Each entry in the IC table (see below) lists 4 connected atoms, I, J, K and L,; for a normal IC table entry, the I-J bond length, R(IJ); the I-J-K bond angle, T(IJK); the dihedral angle I-J-K-L, PHI; the bond angle T(JKL); and the K-L bond length, R(KL) are listed. Improper dihedral angles, which are used to keep sp2 atoms planar and sp3 atoms in a tetrahedral geometry, are marked with a star. The center atom of an improper dihedral angle is marked with a star. For an improper dihedral angle entry, the I-K bond length, R(IK); the I-K-J bond angle, T(IKJ); the dihedral angle I-J-K-L , PHI; the J-K-L bond angle T(JKL); and the K-L bond length, R(KL) are listed. The atom entry "-99" indicates an undefined atom.

 

Example of an Internal Coordinate (IC) Table

N
I
J
K
L
R(I(J/K))
T(I(JK/KJ))
PHI
T(JKL)
R(KL)
1
-99
1CA
1*N
1HN
0
0
0
116.54
1.001
2
-99
1N
1CA
1C
0
0
0
109.5
1.5091
3
1N
1CA
1C
-99
1.4402
109.5
0
0
0
4
-99
1CA
1*C
1O
0
0
0
120.67
1.2326
5
1CA
1C
-99
-99
1.5091
0
0
0
0
6
1N
1C
1*CA
1CB
1.4402
109.5
-123.04
109.95
1.5504
7
1N
1C
1*CA
1HA
1.4402
109.5
116.52
106.68
1.0813
8
1N
1CA
1CB
1CG
1.4402
111.67
62.63
110.08
1.5029
9
1CG
1CA
1*CB
1HB1
1.5029
110.08
118.93
110.71
1.1117
10
1CG
1CA
1*CB
1HB2
1.5029
110.08
-120.35
110.53
1.1112
11
1CA
1CB
1CG
1CD1
1.5504
110.08
73.53
120.22
1.4051
12
1CD1
1CB
1*CG
1CD2
1.4051
120.22
-179.66
120.75
1.4054
13
1CB
1CG
1CD1
1CE1
1.5029
120.22
-179.47
120.55
1.4031
14
1CE1
1CG
1*CD1
1HD1
1.4031
120.55
-179.96
119.66
1.0805
15
1CB
1CG
1CD2
1CE2
1.5029
120.75
179.41
120.46
1.4021
16
1CE2
1CG
1*CD2
1HD2
1.4021
120.46
-179.87
119.81
1.0802
17
1CG
1CD1
1CE1
1CZ
1.4051
120.55
-0.02
119.84
1.3994
18
1CZ
1CD1
1*CE1
1HE1
1.3994
119.84
179.99
119.72
1.0793
19
1CZ
1CD2
1*CE2
1HE2
1.3974
120.02
179.88
120.43
1.0803
20
1CE1
1CE2
1*CZ
1OH
1.3994
120.09
179.99
119.45
1.4069
21
1CE1
1CZ
1OH
1HH
1.3994
120.45
1.05
107.77
0.9584

 


The CHARMM input file

Although the CHARMM program can be run interactively (see below) it is often more convenient to put the CHARMM commands in a file and submit the CHARMM job in batch mode; this makes debugging the commands easier.

All CHARMM input files begin with several lines of comments. These title lines begin with a "*" as the first character and the title is terminated by a line containing only a "*". For example, in our first CHARMM input file, example1.inp, we begin as follows:

* Example 1: First CHARMM input file
* Defines and builds a PSF file, then calculates the energy
*

There are several rules for specifying CHARMM commands

CHARMM commands:

The first word in every command line specifies the command. The command line is written in free format and is generally written to the standard output. Most commands have several options, these follow the command word in the form of specific keywords. In most cases only the first four letters of the keyword are required. For clarity we will capitalize the first four characters of each command.

Example: To open and read in the topology (top_all22_prot.inp) and the parameter files (par_all22_prot.inp), the CHARMM commands are

OPEN READ FORMatted UNIT 27 NAME top_all22_prot.inp
READ RTF CARD UNIT 27
OPEN READ FORMatted UNIT 28 NAME par_all22_prot.inp
READ PARA CARD UNIT 28

The next step to setting up a CHARMM calculation is the generation of the PSF file for your particular system. First, the sequence must be specified, this can either be done by typing in the sequence or by reading it in from the coordinate file. Once the sequence has been specified, the PSF is built using the GENErate command. For example, 

! Read Sequence for Polyalanine ! This is a comment line
READ SEQU CARD                  ! The read sequence command
* Poly-alanine sequence         ! Title line
*                               ! End of title line
6                               ! Number of residues
ALA ALA ALA ALA ALA ALA         ! Specify the residues
GENErate ALA6 SETUp             ! Command to generate the psf

In the example above, ALA6 is the name associated with the sequence of six alanine residues and the SETUp keyword specifies that the internal coordinate table should be constructed. By default, N and C terminal groups are added during the generate statement. We will look at these and other commands in the exercise section.

 

Stream files

It is often convenient in CHARMM input files to include portions of the main command file in CHARMM stream files, and call them when necessary. This helps make the input files clear and straightforward to understand. The command to read in a stream file is STREam. A stream files starts with a title and can include as many CHARMM command lines as needed. It ends with the command RETUrn.

For example, in your master command file, the following command would be included if you want read in a stream file which contains the commands do an energy minimization

STREam mini.str

In your local directory, you would have a file called mini.str which would include the following commands,

* Energy minimization protocol
*
MINI SD NSTEP 200 NPRInt 10
MINI ABNR NSTEP 500 NPRINT 50
RETUrn
 

Comments

Comments can be entered on new lines anywhere in the command file or at the end of any command line. A comment is indicated by an exclamation mark (!) at its beginning.


Running the CHARMM program

Interactive mode

The CHARMM program may be run interactively by first launching the CHARMM program and then entering the command lines one by one, pressing the <RETURN> key at the end of each line. CHARMM has extensive error checking, so if a command is mistyped, an error message is be provided. Often, if the error is serious enough, the CHARMM session will be terminated. You will then have to run CHARMM again and retype everything up to that final mistake. This can be a very inefficient way to run CHARMM, so it is reccommended that input files are written and submitted to CHARMM in batch mode. To terminate an interactive session, hold the CTRL key down and type c. This action will terminate CHARMM. To run CHARMM interactively simply type

charmm

The full path should be typed if the CHARMM executable is not in your local directory. You will then get a message indicating the version of CHARMM you are running, the current time and date and your name. You then type in the title lines, terminated with a lone *; you can then continue typing in different commands. The output will be directed to the screen.

Batch mode

The alternative to running in interactive mode, is batch mode. In this case, all the CHARMM commands are put in a file, for instance example1.inp. and this file is directed into CHARMM by typing

charmm < example1.inp > example1.out

The output which previously came to your screen is now directed to the file example1.out. The results can be viewed by editing this output file using an editor of your choice.