I. Vacuum Molecular Dynamics Simulations of echistatin, an RGD containing snake toxinIn recent years there has been a significant interest in proteins that contain the sequence Arg-Gly-Asp (RGD). This tripeptide sequence was identified as a fundamental unit involved in recognition by a number of cell surface proteins (Ruoslahti and Pierschbacher 1986) of a variety of glycoproteins which play an important role in cell adhesion. This sequence has been discovered in proteins extracted from the venom of various vipers (Adler, Lazarus et al. 1991; Cooke, Carter et al. 1991; Gould, Polokoft et al. 1990; Saudek, Atkinson et al. 1991) and leeches (Krezel, Wagner et al. 1994) as well as in virus coat proteins (Grimes, Basak et al. 1995; Logan, Abu-Ghazaleh et al. 1993). NMR structural studies indicate that the sequence has a propensity to form a reverse turn but it can also exist in extended conformations. However, in most of the NMR studies, the detailed structure has not been fully determined, in part because the loop involving RGD appears to be flexible (Copié, Tomita et al. 1998). Thus, the available data are not conclusive as to the conformational requirements for cell adhesion activity of RGD. This makes it important to supplement the experimental data by molecular dynamics simulations (Honig, Hudson et al. 1971; Kessler, Griesinger et al. 1988). The exercise presented here employs unconstrained molecular dynamics simulations to probe the structure and conformational dynamics of the 49 residue snake toxin, echistatin, which contains the RGD sequence in a protruding loop.
Adler, M., Lazarus, R. A., Dennis, M. S. and Wagner, G. (1991). Solution structure of kristin, a potent platelet aggregation inhibitor and GP IIb-IIIa antagonist. Science. 253: 445-448. Cooke, A. M., Carter, B. G., Martin, D. M. A., Murray-Rust, P. and Weir, M. P. (1991). Nuclear magnetic resonance studies of the snake toxin echistatin 1H resonance assignements and secondary structure. European Journal of Biochemistry. 202: 323-328. Copié, V., Tomita, Y., Akiyama, S. K., Aota, S.-i., Yamada, K. M., Venable, R. M., Pastor, R. W., Krueger, S. and Torchia, D. A. (1998). Solution Structure and Dynamics of Linked Cell Attachment Modules of Mouse Fibronectin Containing the RGD and Synergy Regions: Comparison with the Human Fibronectin Crystal Structure. J. Mol. Biol. 277: 663-682. Gould, R. M., Polokoft, M. A., Friedman, P. A., Huang, T. F., Holt, J. T., Cook, J. J. and Niewiarowski, S. (1990). Proc. Soc. Exp. Biol. Med. 195: 168-171. Grimes, J., Basak, A. K., Roy, P. and Stuart, D. (1995). The crystal structure of bluetongue virus VP7. Nature. 373: 167-170. Honig, B., Hudson, B., Sykes, B. D. and Karplus, M. (1971). Ring Orientation in b-Ionone and Retinals. Proc. Natl. Acad. Sci. USA. 68: 1289-1293. Kessler, H., Griesinger, C., Lautz, J., Müller, A., van Gunsteren, W. F. and Berendsen, H. (1988). J. Am. Chem. Soc. 110: 3393-3396. Krezel, A. M., Wagner, G., Seymour-Ulmer, J. and Lazarus, R. A. (1994). Structure of the RGD Protein Decorsin: Conserved Motif and Distinct Function in Leech Proteins That Affect Blood Clotting. Science. 264: 1944-1947. Logan, D., Abu-Ghazaleh, R., Blakemore, W., Curry, S., Jackson, T., King, A., Lea, S., Lewis, R., Newman, J., Parry, N., Rowland, s. D., Stuart, D. and Fry, E. (1993). Structure of a major immunogenic site on foot-and-mouth disease virus. Nature. 362: 566-568. Ruoslahti, E. and Pierschbacher, M. D. (1986). Arg-Gly-Asp: a versatile cell recognition signal. Cell. 44: 517-518. Saudek, V., Atkinson, R. A. and Pelton, J. T. (1991). Three-dimensional structure of echstatin, the smallest active RGD protein. Biochemistry. 30: 7369-7372.
Procedure In this exercise, we will go through the different steps of setting up and running a molecular dynamics simulation of echistatin using the CHARMM program; in addition to the tutorial given below, some additional exercises are provided. Below are the different steps, some of which will be discussed in detail and others presented as exercises:
Dynamics in vacuum: Dynamics in water, periodic box: Dynamics in water, solvation sphere: Dynamics in water, local solvation:
All the exercises presented here are easily adapted to other molecular dynamics simulation programs. While the detailed commands may change, the general approach taken to set up and execute a molecular dynamics simulation is the same. In this way, these tutorials can still serve as a guide. Create a new sub-directory exercise_md and change to that directory. Copy the following files from your local CHARMM program directory to your new directory: top_all22_prot.inp par_all22_prot.inp These two files are located in the toppar directory of the standard CHARMM distribution Download the following input files from this course: Type ls to see the content of the directory. In this exercise, you are asked to follow detailed instructions and edit CHARMM template files; only slight modifications are required, but this will give you a flavor of all the steps involved in setting up a molecular dynamics simulation. The first step in any CHARMM calculation is to generate the PSF, load the initial coordinates, and, if necessary, to build missing parts of the protein and to place missing hydrogen atoms. You will execute some of these steps using the build.inp CHARMM input file that you already downloaded. Start by opening the file named build.inp using whatever editor you like. This file holds the appropriate CHARMM commands for reading in a coordinate file in PDB format and generating the necessary CHARMM data structures. The initial coordinates of the echistatin protein were obtained from the Protein Databank (entry code 2ech).
If the atomic coordinates being used were determined from an x-ray crystal structure, it will be necessary to first generate the hydrogen atoms. The command for this (HBUILD) is given in the input file, build.inp. Since the PDB structure we are using was determined by NMR spectroscopy, the hydrogen atoms are already present, however, it may be necessary to change the names of several hydrogen atoms (see above). For the purpose of this exercise, the atom name changes discussed above have already been implemented. One of the exercises given below asks you to introduce these changes and others in the PDB file for the protein barnase. The original protein databank structure for this protein (entry code: 2ech) contains 24 NMR structures of the protein; for this exercise, the 6th structure was taken at random. After the PSF has been built and the coordinate file confirmed to be complete, the coordinates of protein are printed to the output file and the .psf and the full coordinate set are written to disk.
To execute this command file using the CHARMM program, type the following command at the UNIX prompt; the job should take less than a minute to run. charmm < build.inp > build.out If you do an ls in your directory, you will see the additional files corresponding to the .psf file and the .pdb file, which are the PSF and coordinate files, respectively. You can see the output here --> build.out
Answer: Supplementary Exercise 2: From the Protein databank, retrieve the coordinate file for barnase (PDB code: 1bni) and generate the PSF for a single molecule; the PDB file contains the coordinates for 4 molecules. Answer:
2.1) Energy minimization The next input file (minmize.inp) sets up an energy calculation.
As in the previous input file, the topology and parameter files are first read into CHARMM; following this, the PSF and the coordinate file generated previously are read and the cutoff model, e.g. the treatment of the non-bonded interactions, is defined. For the cutoff model, we use a 15Å cutoff with a SWITCH truncation function on both the van der Waals and the electrostatic interactions; a 0.5Å switching region is employed. Notice that the cutoff list is generated with a 15Å cutoff, yet the switching function terminates the interaction at 14.5Å. Can you think of why this is? (Answer given in version 1). A constant dielectric function with epsilon equal to one is being used. Following specification of the cutoff model, the energy is calculated and a steepest descent algorithm is used for 500 steps to minimize the energy by slightly changing the structure. The energies are printed out every 50 steps and finally, the energy is recalculated. The energy minimized coordinates are then written to a new file named 2ech-mini.pdb. The input file is terminated with the stop command. To execute the job type the following at the UNIX prompt , charmm < minimize.inp > minimize.out & The output will be written to the minimize.out file. When the job is done look at the output file and locate the first energy evaluation, the minimization energies and the second energy evaluation. What happens to the various energy terms during the energy minimization?
Notice that the CHARMM program, in addition to printing out the total energy (ENERgy), also gives you a breakdown of the different energy terms; the internal energies such as the bond energy (BONDs), the angle (ANGLes) and dihedral (DIHEdrals) energy terms are listed, as well as, the nonbonded energy terms, such as the van de Waals energy (VDWaals) and the electrostatics energy (ELEC). In addition, the output gives you the change in energy from the last evaluation and the gradient of the energy (GRMS), the smaller this number, the closer you are to the local energy minimized structure.
The next step in the procedure is to heat the system to the desired temperature. Download the CHARMM input file md_heat.inp. As always, prior to any calculation, the topology file, the parameter file and the PSF are read into CHARMM; for this calculation, the energy minimized coordinates from the previous job are used and the same cutoff model as was used for the energy minimization is specified. To run the molecular dynamics, the CHARMM DYNAmics command is used. Since molecular dynamics simulations can often require a significant amount of CPU time, the simulations are often broken up into smaller pieces. The restart file is written at the end of a segment and it allows one to continue the simulation at another time. In preparation for writing the restart file at the end of the heating segment, the file opened with write permission. To run the molecular dynamics simulation, the VERLET integration algorithm is used.
The different keywords specify how the simulation (heating up in this case) is executed:
Run this CHARMM input file for heating the system. This job took 2.05 CPU minutes on a Pentium II 400MHz machine running LINUX. After heating the system to 300K, equilibration is done to ensure that the system is in a stable state. Download the file md_equil.inp. Here, the system is equilibrated for 3 ps at 300 K. The trajectory started in the previous heating step is continued (RESTarting from the end of the previous segment). In this case, the goal is to keep the system at the target temperature. During this part of the simulation, the temperature is checked periodically and if it falls outside the window of ±10K (in this example) of the desired temperature, the velocities are scaled in such a way as to bring the temperature back around the target temperature.
All files pertaining to the dynamics must be opened prior to calling the DYNAmics command. The "restart" file from the previous job (2ech_heat.res from md_heat.inp) is opened for reading (UNIT 11; keyword IUNRead) and the file to which the restart file will be written at the end of the current segment is opened for writing (UNIT 12; keyword IUNWrite). Many of the keywords in the DYNAmics command are the same as in the heating run, except that there is no heating of the system; the temperature is checked (ICHECW 1) every 30 time steps (IEQFrq) to see if it has moved outside the desired window of ±10K (TWINDL -10.0 TWINDH 10.0). If so, the velocities are scaled (IASORS 0 IASVEL 1 ISCVEL 0) to bring the temperature back within range of the target temperature. The coordinates are written out at the end of the equilibration phase (2ech_equil.crd). Analysis: After executing this CHARMM input file, extract the energies that were printed in the output files. You can use the UNIX grep command to direct the results to a file. grep 'DYNA>' md_heat.out > md_heat.data grep 'DYNA>' md_equil.out > md_equil.data The columns in these files contain the following information DYNA> Steps Time E_total E_kinetic E_potential Temperature Plot the total energy, kinetic energy, potential energy as a function of time. As you can see, the total energy fluctuates much less than the other quantities; can you explain why? {answer} Use a molecular graphics viewer such as SWISS-PDB viewer, MOLMOL or VMD to compare the initial structure (2ech.pdb), and the final conformation (2ech_equil.crd). Are there any differences between the two structures? Now that the system is properly equilibrated, the next step is the production run. The CHARMM command file md_run.inp provides the commands necessary to continue the simulation. For the type of calculation we are doing, the velocity scaling is turned off during the production phase and the system is allowed to propagate in time without any further intervention. If the system is properly equilibrated, the energy and temperature should remain stable. If a steady drift in the energy is observed, this could mean that the system was not fully equilibrated.
Since it is during the production run that the data for analysis is generated. Since all analysis in the CHARMM program is done after the simulation, the dynamics trajectory file must be saved; in the previous jobs only the restart file and the output file were saved for analysis. In the present example, the dynamics trajectory is saved to the file 2ech_run.cor. This file is opened as "unit" 14 (keyword NSAVC) and will eventually include a series of conformations written every 100 integration steps, or every 0.1 ps (keyword IUNCRD). The energies are printed every 100 steps and their averages every 200 steps (NPRInt and IPRFrq). Run the md_run.inp script file as before, charmm < md_run.inp > md_run.out & Before continuing the production phase of the simulation, it is important to assess the stability of the system. One measure of the stability is the fluctuation of the total energy. The simulations done here should conserve the total energy of the system and the fluctuations should be very small. Often, the ratio between the average RMS fluctuation of the total energy and the average total energy is calculated; if this less than 10-4 for the type of simulation done here, the energy presumed to be stable. An awk script (energy.awk) is provided with which you can use to calculate the mean total energy and the fluctuations. To execute it type, awk -f energy.awk <grep_output> What is the quality of the simulation based of the fluctuations of the total energy? Supplementary Exercise 3.: Using the input file md_run.inp as a template, write a CHARMM input file for the continuation of the production phase molecular dynamics.
|
* analysis of simulation * ! Read in Topology and Parameter files open unit 1 card read name top_all22_prot.inp read RTF card unit 1 close unit 1 open unit 1 card read name par_all22_prot.inp read PARA card unit 1 close unit 1 ! Read in PSF and Coordinate file open read formatted unit 27 name 2ech.psf read psf card unit 27 open read formatted unit 27 name 2ech.pdb read coor pdb unit 27 coor copy comp !++++++++++++++++++++++++++++++++++++++++++++++++ ! Calculate average structure ! Calculate RMSD from the experimental structure ! Calculate RGYR of the experimental structure ! Calculate RGYR of the average structure OPEN UNIT 14 READ UNFORMatted NAME 2ech_run.cor coor dyna firstu 14 nunit 1 coor orie rms sele type n .or. type ca .or. type c end open write formatted unit 27 name 2ech-ave.pdb write coor pdb unit 27 * average structure from MD * coor rgyr comp coor rgyr !++++++++++++++++++++++++++++++++++++++++++++++++ ! Calculate dihedral angle time series ! OPEN UNIT 25 READ UNFORMatted NAME 2ech_run.cor OPEN UNIT 30 WRITE FORMatted NAME 2ech.anal correl maxseries 9 maxtimesteps 7000 maxatom 70 enter phi2 dihe prot 24 C prot 25 N prot 25 CA prot 25 C enter psi2 dihe prot 25 N prot 25 CA prot 25 C prot 26 N enter phi3 dihe prot 25 C prot 26 N prot 26 CA prot 26 C enter psi3 dihe prot 26 N prot 26 CA prot 26 C prot 27 N trajectory firstunit 25 nunits 1 ! begin 100 skip 100 show all edit phi2 veccode 4 show all phi2 write phi2 unit 30 card !+++++++++++++++++++++++++++++++++++++++++++++++ stop |
Analyzing a molecular dynamics trajectories often involves calculating the time dependent behavior of different properties, for example, certain distances, angles and torsions. In the CHARMM input file, md_anal.inp, the time series of the four backbone dihedral angles of the RGD sequence (f24 and y24 around the Ca of residue 24 (Gly) and f25, y25 around the Ca of residue 25 (Asp)) are calculated using the CORRel facility. Look at the input file above to see the specific commands. Execute this CHARMM job as done earlier. The calculation will produce a new file named 2ech.anal which contains five columns as follows:
Time-Step Phi2 Psi2 Phi3 Psi3
These data are the time series for the four dihedral angles. Plot the time series of the four dihedral angles as a function of time. Are there any significant conformational changes?
Supplementary Exercise 3
In the above example, the calculations were done using a cutoff model where there was no attempt to include any solvent effect, either explicitly with water molecules or implicitly using a dielectric shielding constant. The simplest way to include solvent effects is to include a dielectric screening function. To see how this can effect your results, redo the above simulation after editing the input files for minimization and the different steps of dynamics 2ech_min.inp, 2ech_heat.inp, 2ech_equil.inp and 2ech_run.inp changing the value of EPSilon from 1.0 to 80.0. What qualitatively different behavior do you observe?
Do the same as above, but change the cutoff function for electrostatic interactions from SWITch to SHIFt in all the input files. What differences in behavior do you observe?
Answer:
How can you make use of CHARMM stream files so that when changing the definition of the cutoff model, you only have to modify a single file instead of all the command files for your simulation?