- A composite model for DNA torsion dynamics
- Dynamics of Real Structure in Fresh, Damaged and Reinforced States in Comparison with Shake Table an
- PP-waves with torsion and metric-affine gravity
- Crystal Structure and Luminescence with Relative Principles Calculation of a New One-dimen
- Dynamic structure factor of the Ising model with purely relaxational dynamics
- Ranking MOLGEN Structure Proposals by 13 C NMR Chemical Shift Prediction with ANALYZE
- Modeling The Structure And Dynamics of Dwarf Spheroidal Galaxies with Dark Matter And Tides
- Canonical structure of 3D gravity with torsion
- A model for gelation with explicit solvent effects Structure and dynamics
- Calculation of NMR-relaxation parameters for flexible molecules from molecular dynamics sim

J. Mol. Biol. (1997) 273, 283298

Torsion Angle Dynamics for NMR Structure Calculation with the New Program DYANA

P. Guntert, C. Mumenthaler and K. Wuthrich* ? ?

? Institut fur Molekularbiologie ? und Biophysik, Eidgenossische Technische Hochschule? Honggerberg, CH-8093 ? Zurich, Switzerland The new program DYANA (DYnamics Algorithm for Nmr Applications) for ef?cient calculation of three-dimensional protein and nucleic acid structures from distance constraints and torsion angle constraints collected by nuclear magnetic resonance (NMR) experiments performs simulated annealing by molecular dynamics in torsion angle space and uses a fast recursive algorithm to integrate the equations of motions. Torsion angle dynamics can be more ef?cient than molecular dynamics in Cartesian coordinate space because of the reduced number of degrees of freedom and the concomitant absence of high-frequency bond and angle vibrations, which allows for the use of longer time-steps and/or higher temperatures in the structure calculation. It also represents a signi?cant advance over the variable target function method in torsion angle space with the REDAC strategy used by the predecessor program DIANA. DYANA computation times per accepted conformer in the ``bundle'' used to represent the NMR structure compare favorably with those of other presently available structure calculation algorithms, and are of the order of 160 seconds for a protein of 165 amino acid residues when using a DEC Alpha 8400 5/300 computer. Test calculations starting from conformers with random torsion angle values further showed that DYANA is capable of ef?cient calculation of high-quality protein structures with up to 400 amino acid residues, and of nucleic acid structures.

# 1997 Academic Press Limited

*Corresponding author

Keywords: NMR structure determination; molecular dynamics in torsion angle space; simulated annealing; protein structure; nucleic acid structure

Introduction

Here, we present a novel implementation of torsion angle dynamics (TAD; Bae & Haug, 1987; ? Guntert et al., 1996; Jain et al., 1993; Katz et al.,

Present address: C. Mumenthaler, Boston Consulting ? Group AG, Zollikerstr. 226, CH-8008 Zurich, Switzerland. Abbreviations used: ADB, activation domain of porcine procarboxypeptidase B; Antp(C39S), Antp(C39S) homeodomain; APK, RNA pseudoknot APK; BPTI, basic pancreatic trypsin inhibitor; CPU, central processing unit; CypA, human cyclophilin A; DIANA, distance geometry algorithm for NMR applications; DYANA, dynamics algorithm for NMR applications; INCLAN, interactive command language; NMR, nuclear magnetic resonance; NOE, nuclear Overhauser effect; NOESY, NOE spectroscopy; PDB, Protein Data Bank; PGK, 3phosphoglycerate kinase; PrP(121-231), mouse prion protein domain PrP(121-231); REDAC, use of redundant dihedral angle constraints; rmsd, root-mean-square deviation; TAD, torsion angle dynamics.

00222836/97/41028316 $25.00/0/mb971284

1979; Kneller & Hinsen, 1994; Mathiowetz et al., 1994; Mazur & Abagyan, 1989; Mazur et al., 1991; ? Rice & Brunger, 1994; Stein et al., 1997) in the program DYANA for the calculation of three-dimensional protein and nucleic acid structures from sets of experimental distance constraints and torsion angle constraints collected by NMR spectroscopy ? (Wuthrich, 1986). In common with theoretical approaches to the protein folding problem, NMR structure calculation must afford adequate sampling of conformation space. In light of the fact that the energy landscape of a protein is complex and studded with many local minima (e.g. see Wolynes et al., 1995), new approaches for structure calculation from NMR data can therefore be of potential interest also with regard to gaining novel insights for studies on in vivo protein folding. Although we concentrate here on the aspect of NMR structure calculation, a detailed presentation of the algorithms underlying the high ef?ciency of DYANA for sampling of the conformation space of

# 1997 Academic Press Limited

284 biological macromolecules is justi?ed in view of potential alternative applications. Currently, most NMR structures of proteins are calculated either by simulated annealing with molecular dynamics in Cartesian space ? ? (Brunger, 1992; Brunger & Nilges, 1993), starting from conformers generated by metric matrix dis? tance geometry (Havel & Wuthrich, 1984; Nilges et al., 1988) or from extended conformers (Nilges et al., 1991), or by conjugate gradient minimization of a variable target function in ? torsion angle space (Braun & Go, 1985; Guntert ? et al., 1991a) using redundant dihedral angle ? ? constraints (REDAC; Guntert & Wuthrich, 1991). For larger molecules all these approaches tend to use long computation times, in particular when only relatively poor input data are available. The calculation of each conformer can be time-consuming, and the percentage of structures that reach small residual constraint violations can be so low that an excessively large number of computations need to be started. Usually, a low yield of converged structure calculations is due to the fact that the objective function to be minimized has many local minima, into which a minimizer may be trapped as long as it takes exclusively downhill steps. Although simulated annealing using molecular dynamics has in principle the possibility to ``escape'' from unfavorable local minima, Cartesian space molecular dynamics often fails to calculate successfully the structures of large proteins or of nucleic acids, unless a wellde?ned starting conformation is available (Stein et al., 1997). When compared with other structure calculation ? algorithms based on simulated annealing (Brunger, 1992), the principal difference of TAD is that it works with internal coordinates rather than Cartesian coordinates, which it has in common with the variable target function approach of the predeces? sor program to DYANA, DIANA (Guntert et al., 1991a). Thereby the number of degrees of freedom is decreased about tenfold, since the covalent structure parameters (bond lengths, bond angles, chiralities and planarities) are kept ?xed at their optimal values during the calculation. The strong potentials required to preserve the covalent structure in conventional Cartesian space molecular dynamics and the concomitant high-frequency motions are absent in torsion angle dynamics. This results in a simpler potential energy function and the use of longer time-steps for the numerical integration of the equations of motion, and hence in higher ef?ciency of the structure calculation. An additional, practical advantage of working in torsion angle space is that this approach excludes inadvertent changes of chirality during the course of a structure calculation, which has apparently resulted in errors in structures calculated with Cartesian space molecular dynamics (Schultze & Feigon, 1997). To fully exploit the potential advantages of TAD for macromolecular structure calculation, its implementation

Torsion Angle Dynamics and NMR Structures

has to take account of the fact that the Lagrange equations of motion with torsion angles as degrees of freedom are more complex than Newton's equations in Cartesian coordinates. A ``naive'' implementation of torsion angle dynamics (Mazur & Abagyan, 1989; Mazur et al., 1991) would entail the solution of a system of n linear equations in each time-step (n being the number of degrees of freedom) and thus require a computational effort proportional to n3, which would inevitably render the algorithm unattractive for larger systems. DYANA gets around this potential drawback by using a fast recursive implementation of the equations of motion that was originally developed for spacecraft dynamics and robotics (Jain et al., 1993), which entails a computational effort proportional to n. ? As the successor of the program DIANA (Guntert et al., 1991a), DYANA contains, in addition to the new features described here, all functions of the program DIANA and its supporting programs. This includes in particular protein structure calculation with the variable target function method (Braun & Go, 1985) supported by the REDAC strategy ? ? ? (Guntert & Wuthrich, 1991), conversion of NOESY cross-peak volumes into upper distance bounds, and determination of stereospeci?c assignments ? (Guntert et al., 1989, 1991a,b). Two separate supporting programs of DIANA, CALIBA and GLOMSA, and an automatic assignment method for NOESY spectra, NOAH (Mumenthaler & Braun, 1995; Mumenthaler et al., 1997), have been incorporated directly into DYANA. The program package is available from the authors for use on a variety of Unix systems. For further details, please consult the DYANA home page at http:/ /www.mol.biol.ethz.ch/ dyana/.

Theory and Algorithms

For TAD calculations with DYANA the molecule is represented as a tree structure consisting of a base rigid body that is ?xed in space and n rigid bodies, which are connected by n rotatable bonds (Abe et al., 1983). The degrees of freedom are exclusively torsion angles, i.e. rotations about single bonds. Each rigid body is made up of one or several mass points (atoms) for which the relative positions are invariable. The tree structure starts from a ``base'', typically at the N terminus of the polypeptide chain, and terminates with ``leaves'' at the ends of the side-chains and at the C terminus. The rigid bodies are numbered from 0 to n. The base has the number 0. Each other rigid body, with a number k 5 1, has a single nearest neighbor in the direction toward the base, which has a number p(k) < k. The torsion angle between the rigid bodies p(k) and k is denoted by yk. The conformation of the molecule is uniquely speci?ed by the values of all torsion angles, y ? (y1, . . . , yn). All three-dimensional vectors are in an inertial frame of reference that is ?xed in

Torsion Angle Dynamics and NMR Structures

285 value of fc(d, b) is always given by equation (3). To ensure that target function values are directly comparable with those from DIANA, equation (2) was used for all calculations with proteins in this work. Equation (3) was used for work with nucleic acids. The torques about the rotatable bonds, i.e. the negative gradients of the potential energy with respect to torsion angles, ?rEpot, are calculated by the fast recursive algorithm of Abe et al. (1983). Kinetic energy For all rigid bodies with k ? 1, . . . , n (see above), ~ the angular velocity vector, ok, and the linear vel? ~ r ocity of the reference point, vk?~k, are calculated recursively (Jain et al., 1993): ~ ~ ok ? op?k? ? ~k yk e ? ~ ~ ~ nk ? np?k? ? ?~k ? ~p?k? ? ? op?k? r r ?5? ?6?

space. For each rotatable bond, ~k denotes a unit e vector in the direction of the bond, and ~k is the r position vector of its end point, which is subsequently used as the ``reference point'' of the rigid body k in the inertial reference frame. Potential energy For TAD in DYANA the DIANA target function ? (Guntert et al., 1991a), V, has the role of the potential energy, Epot, i.e. Epot ? w0V, with an overall ? weighting factor w0 ? 10 kJ mol?1 A?2 ? ? 10?10 m). The target function, V, with (1 A V 5 0, is de?ned such that V ? 0 if and only if all experimental distance constraints and torsion angle constraints are ful?lled and all non-bonded atom pairs satisfy a check for the absence of steric overlap. We then have that V(y) < V(yH ) whenever the conformation satis?es the constraints more closely than the conformation yH . The exact de?nition of the DYANA target function is: ? ? wc fc ?dab Y bab ? V?

c?uYlYv

? wd

2 ?

kPId

?aYb?PIc

3 1 ?k 2 ?2 1? k 2 ?k

?1?

For the rigid body k, the vector from its reference ~ point to the center of mass is denoted by Yk, the total mass by mk, and the inertia tensor (see below) by Ik. The kinetic energy, Ekin, is then given by: Ekin ?

n 1? ~ ~ ~k ~ ~ ?mk n2 ? ok ? I k ok ? 2~ k ? ?ok ? mk Yk ?? n 2 k?1

Upper and lower bounds, bab, on distances between two atoms a and b, dab, and constraints on individual torsion angles, yk, in the form of allowed intervals, [ymin, ymax], are considered. Iu, k k Il and Iv are the sets of atom pairs (a, b) with upper, lower or van der Waals distance bounds, respectively, and Id is the set of restrained torsion angles. wu, wl, wv and wd are weighting factors for the different types of constraints. ?k ? p ? (ymax ? ymin)/2 denotes the half-width of k k the forbidden range of torsion angle values, and ?k is the size of the torsion angle constraint violation. The function fc(d, b), which measures the contribution of a violated distance constraint to the target function, may have any of the three forms given in equations (2) to (4). Equation (2): 2 2 d ? b2 fc ?dY b? ? ?2? 2b corresponds to the same de?nition that was used ? in DIANA (Guntert et al., 1991a). Equation (3): fc ?dY b? ? ?d ? b?2 is a simple square potential. Equation (4): Ps??????????????????????????? Q d?b 2 2 2R 1? ? 1S fc ?dY b? ? 2b b bb ?3?

?7?

Inertia tensors The inertia tensor of the rigid body k with respect to its reference point, Ik, is a symmetric 3 ? 3 matrix with elements given by (Arnold, 1978): ? ?Ik ?ij ? ma ?~2 dij ? yai yaj ? ya ?8?

a

?4?

The sum runs over all atoms, a, with mass ma, in ~ the rigid body k. ya is the vector from the reference point to the atom a, and dij is the Kronecker symbol. In DYANA, the center of mass vectors and the inertia tensors are calculated only once by summing over all atoms, with each rigid body in a standard orientation. The resulting standard values ~k are stored as Y(0) and I(0), respectively. If these k quantities are required later for a different conformation, for which the orientation of the rigid body k can be generated from its standard orientation by a rotation, Rk, they can be computed by the following transformations (a superscript T denotes the transposed matrix): ~ ~ Yk ? Rk Y?0? k I k ? Rk I ?0? RT k k ?9? ?10?

is a function with a linear asymptote for large constraint violations, where b is a dimensionless parameter that weighs large violations relative to small ones. For small constraint violations the

Since the shape of a rigid body enters the

286 equations of motion only via the inertia tensor and the center of mass vector, it is not essential to derive these quantities from the masses and relative positions of the individual atoms that constitute the rigid body, as in equation (8). Therefore the ef?ciency of the TAD algorithm can be improved by different choices of inertia tensors and center of mass vectors. By treating, with the use of equations (11) and (12): ~ ~ Yk ? 0 2 Ik ? mk r2 13 5 ?11? ?12?

Torsion Angle Dynamics and NMR Structures

in order to obtain y with a computational effort that is proportional to n. The algorithm of Jain et al. (1993) is initialized by calculating for all rigid bodies, k ? 1, . . . , n, the six-dimensional vectors ak, ek and zk: 4 5 ~ e ? ?ok ? ~k ?yk ?15? ak ? ~ ~ n op?k? ? ?~k ? np?k? ? ~k e ek ? ~ 0 4 ! ?16? 5 ?17?

all rigid bodies, k ? 1, . . . , n, as solid spheres of mass mk and radius r centered at their reference points, ~k, evaluation of equation (7) is simpli?ed r and the transformation of equations (9) and (10) is ~ avoided entirely. In equations (11) and (12), 0 denotes the zero vector and 13 is the 3 ? 3 unit matrix. Further improvement is obtained by setting the ???masses of the rigid bodies, mk, equal to p 10 nk m0, where nk denotes the number of atoms in the rigid body k (pseudoatoms are not counted), and m0 ? 1.66 ? 10?27 kg is the atomic mass unit. In this way, fast motions of light rigid bodies, for example hydroxyl protons, are slowed, thereby permitting longer integration time-steps. For all TAD calculations in this work, equations (11) and (12) were used with this expression for mk, and ? r ? 5.0 A. Note that the use of equations (11) and (12) does not imply an approximation of the van der Waals interaction: the steric repulsion is still calculated for each individual atom pair. Torsional accelerations The equations of motion for a classical mechanical system with generalized coordinates are the Lagrange equations (Arnold, 1978): d dL dL ?0 ?k ? 1Y F F F Y n? ?13? ? ? dt dyk dyk with L ? Ekin ? Epot. They lead to equations of motion of the form: ? M?y?y ? C?yY y? ? 0 ?14?

~ ~ ok ? I k ok zk ? ~ k ?ok ? o2 mk Yk ~ ~ ?ok ? mk Y ~ k

Furthermore, the 6 ? 6 matrices Pk and fk need to be calculated: 4 5 ~ Ik mk A?Yk ? ?18? Pk ? ~ m k 13 ?mk A?Yk ? fk ? 13 03 A?~k ? ~p?k? ? r r 13 ! ?19?

~ 03 is the 3 ? 3 zero matrix, and A(x) denotes the antisymmetric 3 ? 3 matrix associated with the ~~ ~ ~ ~ cross product, i.e. A(x)y ? x ? y for all vectors y. Next, a number of auxiliary quantities is calculated by executing a recursive loop over all rigid bodies in the backward direction, k ? n, n ? 1, . . . , 1: Dk ? ek ? Pk ek Gk ? Pk ek aDk ek ? ?ek ? ?zk ? Pk ak ? ? dV dyk

Pp?k? 2 Pp?k? ? fk ?Pk ? Gk eT Pk ?fT k k zp?k? 2 zp?k? ? fk ?zk ? Pk ak ? Gk ek ? ?20?

Using torsion angles as degrees of freedom, the ? n ? n mass matrix M(y) and C(y, y) have been derived explicitly by Mazur & Abagyan (1989) and Mazur et al. (1991). However, to integrate the equations of motion, equation (14) would have to be solved in each time-step for the torsional accel erations, y. This requires the solution of a system of n linear equations and hence entails a computational effort proportional to n3, which would become prohibitively expensive for larger systems. Therefore, in DYANA the fast recursive algorithm of Jain et al. (1993) is implemented to compute the torsional accelerations, which makes explicit use of the aforementioned tree structure of the molecule

Dk and ek are scalars, Gk is a six-dimensional vector, and 2 means: assign the result of the expression on the right-hand side to the variable on the left-hand side. Finally, the torsional accelerations are obtained by executing another recursive loop over all rigid bodies in the forward direction, k ? 1, . . . , n: ak ? fT ap?k? k yk ? ek aDk ? Gk ? ak ak 2 a k ? e k y k ? a k The auxiliary quantities ak are six-dimensional vectors, with a0 being equal to the zero vector. ?21?

Torsion Angle Dynamics and NMR Structures

287

Integration of the equations of motion T?t? ? The integration scheme for the equations of motion in TAD (Mathiowetz et al., 1994) is a variant of the leap-frog algorithm used in Cartesian dynamics (Allen & Tildesley, 1987). The temperature is controlled by weak coupling to an external bath (Berendsen et al., 1984). A time step, t 3 t ? ?t, that follows a preceding time step, t ? ?tH 3 t, consists of the following parts. (1) On the basis of the torsional positions, y(t), calculate the Cartesian coordinates of all atoms ? (Guntert, 1993). (2) Using the Cartesian coordinates, calculate the potential energy function of equation (1), Epot(t) ? Epot(y(t)), and its gradient rEpot(t). (3) Determine the time-step, ?t ? le?tH , using the time-step scaling factor: s????????????????????????????3 2 eref ? e?t? ?22? le ? min lm?x Y 1 ? e te?t? le is based on the reference value for the relative accuracy of energy conservation, eref, and on the relative change of the total energy, E ? Ekin ? Epot, in the preceding time-step, e(t), as given by: E?t? ? E?t ? ?tH ? ?23? e?t? ? E?t? lmax is the maximally allowed value of the scale ing factor. lmax ? 1.025 was used for all calcue lations in this work. The time constant, t41, is a user-de?ned parameter that is measured in units of the time-step. A value of t ? 20 was used for all calculations in this work. To calculate e(t) in equation (23), E(t) is evaluated before velocity scaling is applied (step 4 below), whereas for E(t ? ?tH ) the value after velocity scaling in the preceding time-step is used. Thus, the measurement of the accuracy of energy conservation in equation (23) is not affected by the scaling of velocities. An exact algorithm would yield E(t) ? E(t ? ?tH ) and consequently e(t ? 0). (4) Adapt the temperature by scaling of the tor? sional velocities, i.e. replace y(t ? ?tH /2) by ? ? ? lTy(t ? ?tH /2) and ye(t) by lTye(t) (see equation ? (27) below for the de?nition of ye(t)). The velocity scaling factor, lT, is given by (Berendsen et al., 1984): s?????????????????????????????? T ref ? T?t? ?24? lT ? 1 ? tT?t? where T ref is the reference value of the temperature. The instantaneous temperature, T(t), is given by:

2Ekin ?t? nkf

?25?

where n denotes the number of torsion angles and kB ? 1.38 ? 10?23 J K?1 is the Boltzmann constant. The kinetic energy, Ekin(t) ? Ekin(y(t), ? ye(t)), is taken from equation (7). Temperature control by coupling to an external heat bath (and time-step control) can be turned off by setting t ? I. (5) Calculate the torsional accelerations, ? y(t) ? y(y(t), ye(t)), using equations (15) to (21). (6) Calculate the new velocities at half time-step: ?t ? ?t ? ? y?t ? ?ta2? ? y?t ? ?tH a2? ? y?t? 2

H

?26?

(7) Calculate the new estimated velocities at full time step: ?t ? ? y?t ? ?ta2? ye ?t ? ?t? ? 1 ? ?t ? ?tH ? ?t ? y?t ? ?tH a2? ?t ? ?tH ?27?

(8) Calculate the new torsional positions: ? y?t ? ?t? ? y?t? ? ?t y?t ? ?ta2? ?28?

The algorithm is initialized by setting t ? 0, ? ? ?tH ? ?t and ye(0) ? y(??t/2). Furthermore, the ? initial torsional velocities, y(??t/2), are chosen randomly from a normal distribution with zero mean value and a standard deviation that ensures that the initial temperature has a prede?ned value, T(0). Once the time step t 3 t ? ?t is completed by going through the operations (1) to (8), t is replaced by t ? ?t, and ?tH by ?t. Unlike the situation in Cartesian dynamics, where the accelerations are a function of the positions only, the torsional accelerations depend also on the velocities. These, however, are only known at half time-steps, whereas the positions and accelerations are required at full time-steps. In equation (27) the presently used algorithm employs linear extrapolation to obtain an estimate of the velocity ? after the full time-step, ye(t), which is used in the next integration step to calculate the torsional accelerations (step (5)). They are therefore beset with an additional error that could be eliminated by iterating the steps (5) to (7). In general, no signi?cant improvement can be observed after one ? iteration (Mathiowetz et al., 1994). Using ye(t) ? instead of y(t) introduces an additional error of order O(?t3) into the velocity calculation of the leap-frog algorithm in equation (26). However, the intrinsic accuracy of the velocity step is also ? O(?t3). Thus, using the estimated velocities, ye(t), does not change the order of accuracy of the integration algorithm. Each iteration of steps (5) to (7) requires the calculation of the torsional accelerations which, in general, takes as long as the calculation of the target function and its gradient. It is

288 therefore more ef?cient to slightly decrease the time-step, ?t, than to perform steps (5) to (7) twice in each integration step. (The situation would be different if a full physical force ?eld were used, since then the calculation of torsional accelerations would require only a negligible fraction of the total computation time.) Since in structure calculations with TAD the time-steps are made as long as possible, a safeguard against occasional strong violations of energy conservation was introduced. If the total energy, E, has changed by more than 10% in a single time-step, this time-step is rejected and replaced by two time-steps of half length. Energy conservation Energy conservation is a key feature of proper functioning of any MD algorithm (Allen & Tildesley, 1987). To verify that TAD was properly implemented in DYANA and to determine the admissible size range for the integration time-step, ?t, we performed a series of TAD runs under conditions where the total energy, E, should be conserved. The experimental NMR data set for cyclophilin A (Ottiger et al., 1997) was used with t ? I in equations (22) and (24), i.e. without coupling to an external heat bath and without automatic time-step control. The accuracy of energy conservation was monitored by the standard devip???????????????????????? ation, s?E? ? h?E ? hEi?2 i, and the rms change of the total energy between successive integration p????????????? steps, h?E2 i, where h...i denotes the average over all steps of a TAD run (Beeman, 1976; van Gunsteren & Berendsen, 1977). The latter parameter, which is closely related to e(t) in equation (23), probes the local error in one time-step of the integration algorithm, whereas the standard deviation is sensitive both to local errors and to slow drifts of the total energy over many time-steps and is thus dependent on the length of TAD run. The results (Figure 1) show that long time-steps of up to 100 fs are tolerated by the algorithm, that s(E) is proportional to ?t2, as expected for Verlet-type integration algorithms (Allen & Tildesley, 1987), and p????????????? that h?E2 i is proportional to ?t3. It shows also that the use of ``exact'' torsional accelerations leads to smaller violations of energy conservation than calculations with accelerations based on approximate velocities obtained by linear extrapolation from preceding time-steps. However, the resulting slight improvement does not warrant the additional computational effort needed to obtain ``exact'' accelerations, in particular since their use does not permit an increase of the maximally allowed length of the time-step. All calculations in the remainder of this work have therefore been performed with accelerations based on approximate velocities. For practical applications, the most relevant result from the test calculations of Figure 1 is that long time-steps, about 100, 30 and 7 fs at low, medium and high temperatures, respectively, can

Torsion Angle Dynamics and NMR Structures

Figure 1. Plots versus integration time-step length, ?t of p???????????????????????? (a) the standard deviation, s?E? ? h?E ? hEi?2 iY and (b) the rms change ofp????????????? energy between successive the total integration steps, h?E2 iY for TAD calculations using the experimental NMR data set for cyclophilin A. The values given are relative to the average total energy, hEi. Circles and continuous lines correspond, from top to bottom, to initial temperatures of 10,600, 390, and 1.18 K, respectively, and were obtained using torsional accelerations calculated on the basis of linearly extrapolated torsional velocities. Squares and dotted lines resulted from identical calculations with exact torsional accelerations. The duration of each TAD run was 0.9 ps, and no coupling to an external heat bath was applied.

be used in TAD calculations with DYANA. The concomitant fast exploration of conformation space provides the basis for ef?cient structure calculation protocols. These time-steps should not be compared directly with the steps of 1 to 2 fs duration used in conventional Cartesian space MD (Allen & Tildesley, 1987), since DYANA uses more uniform masses and a much simpler potential energy function than standard molecular dynamics programs (Brooks et al., 1983; Cornell et al., 1995; van Gunsteren et al., 1996). Computation times with the standard DYANA simulated annealing protocol CPU times per starting conformer for protein structure calculations with the standard DYANA

Torsion Angle Dynamics and NMR Structures Table 1. CPU times (seconds) for DYANA structure calculations of the proteins BPTI and CypA on different computers

Computer NEC SX-4 DEC Alpha 8400 5/300 SGI Indigo2 R10000 (175 MHz) IBM RS/6000-590 Cray J-90 Convex Exemplar SPP1600 Hewlett-Packard 735 BPTI 13 20 23 35 44 44 47 CypA 36 86 127 141 141 177 209

289 linear increase of the overall computation time on scalar computers is related to the evaluation of the target function of equation (1). Since an NMR structure calculation always involves the computation of a group of conformers ? (Wuthrich, 1986), it is highly ef?cient to run calculations of multiple conformers in parallel. In DYANA, parallel computing is implemented on the level of the command language, INCLAN, and achieves nearly ideal speedup. Using a dedicated Cray J-90 computer with eight processors, the elapsed time for the calculation of 100 BPTI conformers on n ? 1, . . . , 8 processors was closely proportional to 1/n (Figure 3), which is the theoretically achievable optimum.

CPU times are for the calculation of one conformer, using the experimental NMR data sets (Berndt et al., 1992; Ottiger et al., 1997) and the standard DYANA simulated annealing protocol (see Methods).

simulated annealing protocol (see Methods) on a variety of different computer systems are in the range of 13 to 47 seconds for the 58 residue protein BPTI, and of 36 to 209 seconds for the 165 residue protein cyclophilin A (Table 1). The dependence of the calculation times on the protein size is illustrated in Figure 2 for a scalar computer (DEC Alpha 8400 5/300) and for a vector supercomputer (NEC SX-4). CPU time as a function of protein size increases linearly on the vector machine and slightly faster than linear on the scalar machine. This moderate increase of computational effort with protein size results from the ef?cient algorithm used to calculate the torsional accelerations (Jain et al., 1993), which scales linearly with the number of torsion angles. It allows handling of macromolecules of any size for which data sets can be collected with present state-of-the-art NMR techniques. Note that the calculation of torsional accelerations in equations (15) to (21) scales linearly with protein size, and that the steeper than

Results and Discussion

Structure calculations with DYANA using TAD Structure calculations were performed with experimental and simulated NMR data sets of six different proteins and an RNA 32-mer (Table 2). The experimental data sets include a small globular protein (BPTI), an all-a-helical protein (Antp(C39S)), a predominantly b-sheet protein (ADB), a medium-size protein with a moderately dense network of constraints (PrP(121-231)), and a larger b-barrel protein with an outstandingly complete set of constraints (CypA). The simulated NMR data sets include a 394 residue protein (PGK) and an RNA molecule (APK), which were selected to illustrate the application of DYANA with protein sizes at the limit of present-day NMR technology, and with nucleic acids.

Figure 2. Dependence on protein size of the CPU time for the calculation of one conformer using the standard DYANA simulated annealing protocol (see Methods) with 4000 TAD steps. CPU times were measured on DEC Alpha 8400 5/300 computers (circles and continuous line) and on a NEC SX-4 computer (squares and dotted line).

Figure 3. Elapsed time for the calculation of 100 BPTI conformers as a function of the number of processors used. Calculations were run on a dedicated Cray J-90 computer with eight processors. Dots represent measured elapsed times. The dotted line indicates the theoretically achievable optimum. The standard DYANA simulated annealing protocol (see Methods) was used with the experimental NMR data set from Berndt et al. (1992).

290

Torsion Angle Dynamics and NMR Structures Table 2. Experimental and simulated NMR data sets used for the DYANA structure calculations discussed in the text

Torsion angles 241 351 379 521 722 1778 318 Upper distance bounds 651 894 795 1508 4093 14,161 929 Torsion angle constraints 115 171 168 256 371 1117 284

Molecule BPTI Antp(C39S)a ADBa PrP(121-231)a CypAa PGKb APK (RNA)b

a b a

Residues 58 68 81 113 165 394 32

Reference Berndt et al. (1992) ? Guntert et al. 1991b) Vendrell et al. (1991) Riek et al. (1996) Ottiger et al. (1997) Davies et al. (1994) Kang et al. (1996)

For these proteins, the previously published sets of experimental NMR constraints were used. Conformational constraints were simulated from the atomic coordinates of the crystal or NMR structure (see Methods).

Figure 4 presents key characteristics of structure calculations with the standard simulated annealing protocol of DYANA (see Methods), using CypA as an illustration. One ?fth of the TAD steps are performed at the initial high reference temperature of 9600 K, and during the remainder of the TAD calculation the reference temperature slowly

Figure 4. Plots versus the number of TAD steps for a structure calculation using the standard DYANA simulated annealing protocol (see Methods) with the experimental NMR data set for CypA. (a) Temperature of the heat bath to which the system is weakly coupled. (b) Integration time-step length, ?t. (c) The rmsptorsion ????????????? angle change between successive TAD steps, h?f2 iY The plot (b) shows data representing averages over both 50 TAD steps and the 80 conformers. The plot (c) shows data representing averages over both 50 TAD steps and all rotatable torsion angles of the 80 independently calculated conformers.

approaches 0 K (Figure 4(a)). The length of the integration time-step, ?t, is adapted so as to attain a user-de?ned accuracy of energy conservation, with short ?t values at the outset and an increase to above 100 fs toward the end of the calculation (Figure 4(b)). The rms torsion angle change between successive integration steps is up to 4 during the high-temperature phase of the annealing schedule and then decreases linearly to small values (Figure 4(c)). To assess the in?uence on the quality of the resulting structures of the parameter that affects the computation time most strongly, we performed structure calculations with different numbers of TAD steps for each of the data sets in Table 2. As expected, increasing the number of TAD steps generally leads to a reduction of the residual target function values for a given data set, and provided that enough TAD steps were performed the percentage of conformers with ``acceptable'' low target function values approached 90 to 100% in all proteins except CypA (Figure 5). However, for most data sets, little further improvement resulted from increasing the number of TAD steps above 6000. In the other limit, for most data sets a decrease of the number of TAD steps to 2000 resulted in signi?cant deterioration of the quality of the resulting structures (Figure 5). The dependence of the quality of the structures on the number of TAD steps is most pronounced if the calculation is either based on incomplete input sets of constraints, such as for PrP(121-231), or when there is intrinsic scarcity of constraints, such as for APK (RNA). The size of the protein appears not to be a decisive factor, since the target function distributions for the 394 residue protein PGK are quite similar to those for the small protein BPTI. The relevant measure for the overall ef?ciency of a structure calculation program is the computation time per accepted conformer, i.e. the total computation time for a group of starting conformers divided by the number of acceptable conformers. Here, conformers with target function values below the cutoffs indicated by horizontal broken lines in Figure 5 are considered to be acceptable (see Methods), and in Figure 6 the

Torsion Angle Dynamics and NMR Structures

291 resulting computation times per accepted conformer are plotted as a function of the number of TAD steps. Optimal ef?ciency for all data sets is attained at about 4000 TAD steps with the given acceptance criteria (see Methods). Possible increase of the percentage of acceptable conformers by going to higher numbers of TAD steps is always overcompensated by the concomitant rise of the computation time per conformer, which is proportional to the number of TAD steps. On the basis of the data in Figure 6, individual optimization of this parameter for different systems seems unnecessary if good input data sets are available. Therefore, 4000 TAD steps are used in the standard DYANA simulated annealing protocol (Table 3). The numbers in Table 3 show that a group of 20 acceptable conformers can be obtained in about eight minutes for a small protein (BPTI), in less than one hour for a mediumsize protein (CypA), and in less than three hours for a large protein (PGK). These data are for the use of a single processor and can of course be further reduced if multi-processor parallel computing is available (Figure 3). The de?nition used here for acceptable conformers (see Methods) excludes conformers with large residual constraint violations (Table 4). The aver? age maximal violations (Table 4) are below 0.5 A ? for upper distance bounds, below 0.4 A for steric lower distance bounds, and below 7 for torsion angle constraints. The sums of all violations and hence the violation per constraint are very small. Since these results were obtained in torsion angle space, the program had, in contrast to calculations in Cartesian space, no possibility to reduce constraint violations by introducing distortions of bond lengths and bond angles. Finally, the rmsd values for the well-de?ned parts of the polypeptide backbone are indicative of high-quality structures for all the proteins investigated (Table 3). Efficiency of structure calculations with TAD: comparison with DIANA/REDAC Since DYANA includes all the functions of the predecessor program DIANA as well as the novel TAD approach, it enabled a direct comparison of the TAD routine with the DIANA/REDAC method ? ? (Guntert & Wuthrich, 1991). The previously pub-

Figure 5. Final target function values of structure calculations for the data sets of Table 2 using the standard DYANA simulated annealing protocol with different numbers of TAD steps, N: N ? 2000, red; N ? 3000, green; N ? 4000, black; N ? 6000, blue; N ? 8000,

cyan; N ? 10,000, magenta; N ? 12,000, red, dotted; N ? 16,000, green, dotted; N ? 20,000, black, dotted. For each value of N the results obtained with groups of calculations using randomized starting conformers are sorted by increasing residual target function values. The horizontal axis is calibrated by the percentage of all starting conformers; to obtain at least 40 acceptable conformers in each case, different numbers of starting conformers were calculated for the different curves. The horizontal broken line indicates the target function cutoff value, Vcut, for acceptable conformers (see Methods).

292

Torsion Angle Dynamics and NMR Structures

and yields structures of comparable or higher quality{. TAD turns out to compare particularly favorably with DIANA/REDAC in the calculation of the predominantly b-sheet protein ADB, and for the larger proteins PrP(121-231) and CypA. Although for BPTI and Antp(C39S) all 20 accepted REDAC conformers would also be acceptable following the criterion used for TAD conformers (see Methods), only four, one and six out of the 20 accepted REDAC conformers for ADB, PrP(121-231) and CypA, respectively, would be accepted by the same criterion. In parallel with the lower residual target function values, the groups of TAD conformers have slightly lower rmsd values for the wellde?ned parts of the polypeptide backbone than the bundles of REDAC conformers for all proteins except CypA (Table 3).

Figure 6. Plots of the CPU times per acceptable conformer as a function of the number of TAD steps in structure calculations with the standard DYANA simulated annealing protocol for the data sets of Table 2, measured on DEC Alpha 8400 5/300 computers: BPTI, black; Antp(C39S), red; ADB, green; PrP(121-231), blue; CypA, magenta; PGK, black, dotted; APK (RNA), cyan. The CPU times for the largest protein, PGK, are scaled down by a factor of 3.

Structure calculations for larger proteins The structure calculations with a simulated NMR data set for the 394 residue protein PGK were included to investigate the potential of the program DYANA for use with larger molecules, for which experimental NMR data sets may become available in the future. The results in Tables 3 and 4, and in Figures 5 and 6 show that with this data set DYANA can be used with the same simulated annealing schedule as for smaller proteins, and that the yield of acceptable conformers is similar to that for the smaller molecules. A comparison of the PGK structure calculated using DYANA with the X-ray structure from which the simulated NMR data set was derived (Davies et al., 1994) is afforded by Figure 7. The two structures coincide closely, with an rmsd between the mean DYANA conformer and the X-ray structure calculated for all backbone atoms N, Ca and CH of ? 1.05 A (Figure 7(a)). This is actually signi?cantly smaller than the corresponding average of the rmsd values between the individual accepted DYANA conformers and their mean (Table 3). Figure 7(b) reveals that the spread among the DYANA conformers is predominantly caused by slightly different relative orientations of the two domains. Within each domain the precision is much higher, with average backbone rmsd values ? to the mean of 0.30 A for residues 1 to 170 and ? 0.36 A for residues 183 to 381. The larger variations observed for the complete protein are directly related to the scarcity of constraints that characterize the orientation of the two domains relative to each other and do not indicate a limitation of the DYANA program. Structure calculations for RNA NMR structure determination of DNA or RNA molecules is more dif?cult than for proteins because the network of NOE distance constraints in nucleic acids is less dense than in proteins ? (Wuthrich, 1986). For instance, the number of distance constraints per degree of freedom is 2.7 times

lished structure calculations for BPTI, Antp(C39S), ? ? and ADB (Guntert & Wuthrich, 1991), for PrP(121231) (Riek et al., 1996) and for CypA (Ottiger et al., 1997) were repeated using DYANA, after the REDAC strategy had been implemented as an INCLAN macro. The same parameters and acceptance criteria were used as in the original calculations. Table 3 shows that TAD is between two and ten times faster than the REDAC strategy, depending primarily on the size and topology of the structure,

{ Stein et al. (1997) recently reported on the ef?ciency of TAD and Cartesian space molecular dynamics protocols implemented in the program XPLOR. According to this reference the CPU time for the calculation of BPTI on a Hewlett-Packard 735 computer with the TAD protocol in XPLOR is 921 s (Table 6 of Stein et al., 1997). This compares with a CPU time of 47 seconds when using the TAD protocol of DYANA for BPTI on the same computer (Table 1). Although small contributions to this difference of more than an order of magnitude may be due to the fact that slightly different input data sets were used, the high performance of DYANA can be traced primarily to differences in the implementation of TAD and the force calculation: DYANA uses the TAD algorithm of Jain et al. (1993), whereas XPLOR employs the algorithm of Bae & Haug (1987) and follows a fourth-order Runge-Kutta scheme for integration of the equations of motion (Rice & ? Brunger, 1994). The latter requires four force evaluations per time-step, whereas a single force evaluation per time-step is performed in the leap-frog algorithm used by DYANA. Overall, both methods have similar convergence rates and yield structures of comparable quality (Tables 3 and 4; Tables 5 to 7 of Stein et al., 1997).

Torsion Angle Dynamics and NMR Structures

293

Table 3. Results of TAD structure calculations and comparison with corresponding calculations performed with REDAC in DYANA

TAD Accepted conformers (%) 88 75 88 47 58 63 45 CPU time/ accepted conformer (s)b 23 36 37 101 159 521 72 Target function ? (A2) 0.33 0.94 0.60 2.19 2.19 3.99 0.32 rmsd ? (A)c 0.37 0.38 0.92 1.11 0.68 1.62 3.08 CPU time/ accepted conformer (s)b 56 74 162 960 1627 REDACa Target function ? (A2) 0.36 1.00 2.06 5.19 4.59 rmsd ? (A)c 0.47 0.49 1.18 1.28 0.54

Molecule BPTI Antp(C39S) ADB PrP(121-231) CypA PGK APK (RNA)

a REDAC calculations with DYANA were performed with the experimental data sets using the same parameters and acceptance criteria as for the DIANA calculations described in the original publications. b CPU times were measured on DEC Alpha 8400 5/300 computers. c rmsd values are given for the backbone atoms N, Ca, CH of the following residues: BPTI, 3 to 55; Antp(C39S), 8 to 60; ADB, 11 to 76; PrP(121-231), 124 to 168 and 173 to 227; CypA, PGK and APK (RNA), all residues.

lower in the simulated data set for the pseudoknot RNA APK than for the protein PGK (Table 2), even though the same criteria were applied to derive NOE distance constraints from the published structures. For this reason, structure calculation programs based on Cartesian space molecular dynamics and metric-matrix distance geometry often fail to ?nd nucleic acid conformers that satisfy the experimental data (Stein et al., 1997) unless ad hoc assumptions about the three-dimensional structure are made, usually by starting the calculations from standard A or B-type duplex conformations. Stein et al. (1997) showed for a 12 basepair DNA duplex that torsion angle dynamics was successful in ?nding structures that satisfy the experimental constraints in a situation where metric-matrix distance geometry and Cartesian space molecular dynamics did not lead to acceptable results. Here, we used simulated data for a 32 nucleotide pseudoknot RNA (Kang et al., 1996), to investigate whether TAD as implemented in DYANA is able to calculate RNA structures starting from conformers with random torsion angle values. In these calculations the sugar rings remained ?exible, which was achieved by ``cutting'' the bond between C4H and O4H and imposing distance constraints to ?x

the length of the C4H O4H bond and the corresponding bond angles. The results of the structure calculations (Figure 5; Tables 3 and 4) show that the standard DYANA simulated annealing protocol for proteins (see Methods) is also adequate for nucleic acid structure calculations, indicating that with the TAD approach in DYANA there is no fundamental difference between the two classes of molecules from the point of view of structure calculation. In particular, the DYANA calculations do not need to start from well-de?ned structures, which could introduce a bias into the ?nal result of the structure calculation.

Conclusions

The results of this work show that the implementation of torsion angle dynamics in the program DYANA is a powerful method for the calculation of protein and nucleic acid structures from NMR data. It represents a signi?cant advance over the previously used implementations of the variable target function method (Braun & Go, 1985) ? ? ? with the REDAC strategy (Guntert & Wuthrich, ? 1991) in DIANA (Guntert et al., 1991a,b), the currently widely used simulated annealing by Carte? sian space molecular dynamics (Brunger, 1992)

Table 4. Constraint violations of structures calculated using TAD in DYANA

Molecule BPTI Antp(C39S) ADB PrP(121-231) CypA PGK APK (RNA) Upper distance limits ? ? Average (A) Max. (A) 0.0040 0.0050 0.0025 0.0062 0.0028 0.0009 0.0014 0.19 0.28 0.25 0.41 0.44 0.47 0.22 Steric distance limits ? ? Sum (A) Max. (A) 1.2 2.5 1.9 5.2 5.1 11.0 0.8 0.09 0.17 0.20 0.25 0.23 0.39 0.13 Torsion angle constraints Average ( ) 0.04 0.11 0.04 0.10 0.03 0.01 0.03 Max. ( ) 1.6 4.5 2.7 5.3 3.7 6.9 1.4

Average, Sum and Max. denote the average violation, the sum of all violations, and the maximal violation per conformer, respectively. Values are averaged over all accepted conformers. The average violation equals the sum of violations divided by the number of constraints and was for practical reasons not evaluated for the steric distance limits.

294

Torsion Angle Dynamics and NMR Structures

Figure 7. Superpositions of the X-ray structure of PGK (Davies et al., 1994; red) and the structure calculated with the standard DYANA simulated annealing protocol using a simulated NMR data set derived from the crystal coordinates (yellow). (a) Stereo view of a superposition for minimal rmsd of all backbone atoms N, Ca and CH of the X-ray structure and the mean of the ten DYANA conformers with the lowest residual target function values. (b) Mono views of superpositions for minimal backbone rmsd of the X-ray structure and the ten DYANA conformers with lowest ?nal target function values; left: superposition for best ?t of residues 1 to 170; right: superposition for best ?t of residues 183 to 381. These Figures were generated with the program MOLMOL (Koradi et al., 1996).

Torsion Angle Dynamics and NMR Structures

295 Parallelization A structure calculation that consists of the independent generation of multiple conformers from the same input data has a high degree of inherent parallelism. For shared-memory parallel computers, almost optimal parallelization is achieved by DYANA on the level of the command language INCLAN, without any modi?cations in the specialized core parts of the program. In INCLAN, parallel execution of a loop is accomplished by creating several copies of the program in its current state, using the Unix system call fork. These copies are identical except for the value of the loop counter, and run in parallel. Except for one instance of the program, the ``main process'', all copies terminate after the execution of the last statement in the loop body. Standard DYANA simulated annealing protocol The DYANA standard simulated annealing protocol used for all structure calculations in this work consists of the following steps. (1) Create a start conformation by selecting all torsion angles as independent, uniformly distributed random variables. (2) Exclude all hydrogen atoms from the check for steric overlap, and increase the repulsive core radii of those heavy atoms that are covalently ? bound to hydrogen atoms by 0.15 A with ? respect to their standard values (Guntert et al., 1991a). Set the weighting factor for userde?ned upper and lower distance bounds to 1, for steric lower distance bounds to 0.5, and for ? torsion angle constraints to 5 A2. (3) Perform 100 conjugate gradient minimization steps at level 3, i.e. including only distance constraints between atoms up to three residues apart along the sequence, followed by further 100 minimization steps including all constraints. Optionally, for large systems, 100 minimization steps can be performed at each of the intermediate levels 1, 3, 10, 20, 50, 100, 150, 200, 300, 600, etc. (4) Perform N/5 TAD steps at a constant high reference temperature, Thigh, where N is the user-de?ned total number of TAD steps. The initial time-step, ?t ? 2 fs, is adapted such as to achieve a relative accuracy of energy conservation of eref ? 0.005. (5) Perform 4N/5 TAD steps with reference values for the temperature and the relative accuracy of energy conservation of: T ref ?s? ? ?1 ? s?4 Thigh eref ?s? ? 0X005 ? 0X02s ?29?

and, as judged by comparison with the available literature data (Table 6 of Stein et al., 1997), other presently available implementations of torsion angle dynamics algorithms for structure calculation of biological macromolecules from NMR data. With DYANA, the calculation of three-dimensional structures will not be a bottleneck for NMR structure determination of proteins up to 400 residues. Considering its high ef?ciency for exploring the conformation space of biological macromolecules, future applications of DYANA might well include problems in molecular modeling, tertiary structure prediction and protein folding.

Methods

The program DYANA DYANA is written in FORTRAN-77 and has been implemented and optimized for a variety of serial, parallel and vector computers. It uses a new userinterface based on the interactive command language INCLAN (see below). Data ?le formats are compatible with the programs XEASY (Bartels et al., ? 1995), GARANT (Bartels et al., 1997), DIANA (Guntert et al., 1991a), MOLMOL (Koradi et al., 1996), OPAL ? ? (Luginbuhl et al., 1996), XPLOR (Brunger, 1992), and with the Brookhaven Protein Data Bank (Bernstein et al., 1977). Furthermore, DYANA has a mechanism for the transparent use of foreign notations for atoms and angles, and new library entries for nonstandard residues can readily be created with the molecular graphics program MOLMOL. The program package includes standard residue libraries ? for the ECEPP/2 (Momany et al., 1975; Nemethy et al., 1983) and AMBER (Cornell et al., 1995) force ?elds. INCLAN The interactive command language INCLAN (P.G., unpublished), which is used also by the ? NMR data processing program PROSA (Guntert et al., 1992), the automatic assignment program GARANT (Bartels et al., 1997), and the molecular ? dynamics program OPAL (Luginbuhl et al., 1996), provides DYANA with a highly ?exible user-interface. INCLAN is an interpreted programming language that combines features of Unix shells with those of conventional programming languages such as FORTRAN or C, and produces high-quality two-dimensional graphics. INCLAN gives the user the possibility to create macros, new commands that are built from existing ones with the help of conditional statements, loops, jumps, variables, functions, FORTRAN-77 mathematical, logical and character expressions, etc. Many standard commands of DYANA are implemented as macros, and can therefore be modi?ed by the user without changing the source code of the program.

The parameter s varies linearly from 0 in the ?rst time-step to 1 in the last time-step. (6) Include all atoms into the check for steric overlap, reset the repulsive core radii to their stan-

296 dard values, increase the weighting factor for steric constraints to 2.0, and perform 100 conjugate gradient minimization steps with inclusion of all constraints. (7) Perform 200 TAD steps at zero reference temperature and eref ? 0.0001. Initial velocities are those from the last time-step of step (5) above, and the length of the initial time-step is set to one fourth of the last time-step. (8) Perform 1000 conjugate gradient minimization steps, including all constraints. Throughout the calculation the list of the van der Waals lower distance bounds is updated every 50 TAD steps, or, during minimization, whenever a torsion angle has changed by more than 10 since the last update, or after 100 minimization steps. Unless otherwise noted, the default values for the initial temperature, Thigh ? 9600 K, and the number of torsion angle dynamics steps, N ? 4000, were used. In DYANA the standard simulated annealing protocol is implemented as an INCLAN macro and can be modi?ed or replaced by the user without changing the source code of the program. Input data sets for structure calculations For the present evaluation of the performance of DYANA, we mainly used the experimental NMR input data sets of NOE upper distance limits and torsion angle constraints for BPTI (Berndt et al., ? 1992), Antp(C39S) (Guntert et al., 1991b), ADB (Vendrell et al., 1991), PrP(121-231) (Riek et al., 1996), and CypA (Ottiger et al., 1997). The data sets for BPTI, Antp(C39S) and ADB are identical with ? ? those used previously by Guntert & Wuthrich (1991) to evaluate the performance of the REDAC strategy. In addition, experimental NMR input data sets for the pheromone Er-2 from Euplotes raikovi (Ottiger et al., 1994), the killer toxin from Williopsis mrakii (Antuch et al., 1996) and the pathogenesis-related protein P14a from tomato ? (Fernandez et al., 1997) were used to measure the data in Figure 2. For structure calculations with the 394 residue protein 3-phosphoglycerate kinase (PGK), we created a simulated set of NMR conformational constraints on the basis of the high-resolution X-ray crystal structure (Davies et al., 1994; PDB code 1PHP) as follows: Hydrogen atoms were attached to the crystal structure with the program MOLMOL (Koradi et al., 1996). NOE distance constraints were generated for all proton-proton pairs (excluding ? OH and SH) that are closer than 4.5 A, whereby the distance bound was set to the actual distance ? plus 0.5 A. Constraints with methyl groups were referred to pseudo atoms, and the appropriate cor? rection was added to the distance limit (Wuthrich et al., 1983). Constraints were then processed by DYANA in order to account for the absence of stereospeci?c assignments (these were included only for the isopropyl methyl groups of Val and

Torsion Angle Dynamics and NMR Structures

? Leu) and to remove irrelevant constraints (Guntert et al., 1991a): 14,161 relevant upper distance limits and 1117 f, c and w1 torsion angle constraints were thus generated, where the torsion angle constraints were taken to be ?60 about the value in the crystal structure. A second simulated NMR data set was used for structure calculations with a 32 nucleotide pseudoknot RNA, APK. Constraints were derived from the NMR solution structure (Kang et al., 1996; PDB code 1KAJ) in the same way as for PGK, except that angle constraints were created for all torsion angles, and that the allowed range for torsion angles in sugar rings was reduced to ?5 about the value in the reference structure. The resulting data set consisted of 769 relevant upper distance limits and 284 torsion angle constraints. In addition, the following ?ve upper and ?ve lower distance limits were used to ``close'' ? ? the sugar rings: 1.39 A 4 d(C4H , O4H ) 4 1.41 A, ? ? ? 2.38 A 4 d(C4H , C1H ) 4 2.40 A, 2.39 A 4 d(C5H , ? ? ? O4H ) 4 2.39 A, 2.11 A 4 d(H4H , O4H ) 4 2.12 A, and ? 4 d(C3H , O4H ) 4 2.28 A. ? 2.19 A Structure calculations For each of the input data sets in Table 2, structure calculations were performed using the standard simulated annealing protocol with N ? 2000, 3000, 4000, 6000, 8000, 10,000, 12,000, 16,000 or 20,000 TAD steps. The default parameters were used, except that additional relaxation was included for the large protein PGK, as outlined in step (3) of the standard simulated annealing protocol, and that equation (3) was used in place of the default term of equation (2) to calculate the target function for the RNA. Calculations were performed on DEC Alpha 8400 5/300 computers, using up to eight processors in parallel. For each system and each value of N, at least 40 acceptable (see the next section) conformers were computed. The exact number of accepted conformers may vary between 40 and 47, since the calculations were started in parallel in groups of up to eight conformers. For comparison, structure calculations for the experimental data sets of Table 2 were also accomplished with DYANA using the REDAC strategy ? ? (Guntert & Wuthrich, 1991), whereby the par? ? ameters were set as given by Guntert & Wuthrich (1991) for BPTI, Antp(C39S) and ADB, by Riek et al. (1996) for PrP(121-231), and by Ottiger et al. (1997) for CypA. Acceptance criteria for conformers Two factors determine the overall ef?ciency of an NMR structure calculation: the time needed to calculate one conformer, and the ``success rate'', i.e. the percentage of conformers that reach small residual constraint violations as manifested by low residual target function values. Criteria for identi?cation of acceptable conformers are necessarily somewhat arbitrary. For a given molecule and con-

Torsion Angle Dynamics and NMR Structures

297

Data Bank: a computer-based archival ?le for macromolecular structures. J. Mol. Biol. 112, 535 542. Berendsen, H. J. C., Postma, J. P. M., van Gunsteren, W. F., DiNola, A. & Haak, J. R. (1984). Molecular dynamics with coupling to an external bath. J. Chem. Phys. 81, 3684 3690. ? ? Berndt, K. D., Guntert, P., Orbons, L. P. M. & Wuthrich, K. (1992). Determination of a high-quality NMR solution structure of the bovine pancreatic trypsin inhibitor (BPTI) and comparison with three crystal structures. J. Mol. Biol. 227, 757 775. Braun, W. & Go, N. (1985). Calculation of protein con? formations by proton-proton distance constraints. A new ef?cient algorithm. J. Mol. Biol. 186, 611 626. Brooks, B. R., Bruccoleri, R. E., Olafson, B. D., States, D. J., Swaminathan, S. & Karplus, M. (1983). CHARMM: a program for macromolecular energy minimization and dynamics calculations. J. Comput. Chem. 4, 187217. ? Brunger, A. T. (1992). X-PLOR, Version 3.1. A System for X-ray Crystallography and NMR, Yale University Press, New Haven, USA. ? Brunger, A. T. & Nilges, M. (1993). Computational challenges for macromolecular structure determination by X-ray crystallography and solution NMRspectroscopy. Quart. Rev. Biophys. 26, 49 125. Cornell, W. D., Cieplak, P., Bayly, C. I., Gould, I. R., Merz, K. M., Jr, Ferguson, D. M., Spellmeyer, D. C., Fox, T., Caldwell, J. W. & Kollman, P. A. (1995). A second generation force ?eld for the simulation of proteins, nucleic acids, and organic molecules. J. Am. Chem. Soc. 117, 5179 5197. Davies, G. J., Gamblin, S. J., Littlechild, J. A., Dauter, Z., Wilson, K. S. & Watson, H. C. (1994). Structure of the ADP complex of the 3-phosphoglycerate kinase ? from Bacillus sterothermophilus at 1.65 A. Acta Crystallog. sect. D, 50, 202 209. ? ? Fernandez, C., Szyperski, T., Bruyere, T., Ramage, P., ? ? Mossinger, E. & Wuthrich, K. (1997). NMR solution structure of the pathogenesis-related protein P14a. J. Mol. Biol. 266, 576 593. ? ? Guntert, P. (1993). Neue Rechenverfahren fur die Proteinstrukturbestimmung mit Hilfe der magnetischen ? Kernspinresonanz, PhD thesis, 10135 ETH, Zurich. ? ? Guntert, P. & Wuthrich, K. (1991). Improved ef?ciency of protein structure calculations from NMR data using the program DIANA with redundant dihedral angle constraints. J. Biomol. NMR, 1, 446456. ? ? Guntert, P., Braun, W., Billeter, M. & Wuthrich, K. (1989). Automated stereospeci?c 1H NMR assignments and their impact on the precision of protein structure determinations in solution. J. Am. Chem. Soc. 111, 3997 4004. ? ? Guntert, P., Braun, W. & Wuthrich, K. (1991a). Ef?cient computation of three-dimensional protein structures in solution from nuclear magnetic resonance data using the program DIANA and the supporting programs CALIBA, HABAS and GLOMSA. J. Mol. Biol. 217, 517 530. ? ? Guntert, P., Qian, Y. Q., Otting, G., Muller, M., Gehring, ? W. J. & Wuthrich, K. (1991b). Structure determination of the Antp(C39 3 S) homeodomain from nuclear magnetic resonance data in solution using a novel strategy for the structure calculation with the programs DIANA, CALIBA, HABAS and GLOMSA. J. Mol. Biol. 217, 531 540. ? ? ? Guntert, P., Dotsch, V., Wider, G. & Wuthrich, K. (1992). Processing of multi-dimensional NMR data

straint data set, the residual target function value can be seen as resulting in part from inconsistencies in the data set manifested in the target function value at the global minimum, and in part from contributions due to local minima. The former depends only on the data set, whereas the latter generally depends also on the algorithm used. Along these lines, we de?ne an acceptable conformer as one with a target function value, V, below a cutoff, Vcut ? Vmin ? kNr, where Vmin is an approximation of the global minimum value of the target function, Nr denotes the number of residues in the ? ? molecule, and k ? 0.02 A2 for proteins and 0.04 A2 for nucleic acids is the tolerated contribution per residue from local minima. Vmin is the lowest residual target function value found for the system in a calculation of 40 to 47 acceptable conformers, using the standard simulated annealing protocol, except that N ? 20,000. The resulting cutoff values, ? ? Vcut, were: 1.25 A2 for BPTI, 1.93 A2 for Antp(C39S), ? ? ? 1.77 A2 for ADB, 3.02 A2 for PrP(121-231), 4.02 A2 ? ? for CypA, 9.03 A2 for PGK, and 1.30 A2 for APK.

Acknowledgments

Financial support by the Schweizerischer Nationalfonds (project 31.32033.91), the use of the C4 and Cray ? J-90 clusters of the ETH Zurich, and the use of the NEC SX-4 supercomputer of the Centro Svizzero di Calcolo Scienti?co are gratefully acknowledged. We thank Mrs R. Hug for the careful processing of the typescript.

References

Abe, H., Braun, W., Noguti, T. & Go, N. (1983). Rapid ? calculation of ?rst and second derivatives of conformational energy with respect to dihedral angles in proteins. General recurrent equations. Comput. Chem. 8, 239 247. Allen, M. P. & Tildesley, D. J. (1987). Computer Simulation of Liquids, Clarendon Press, Oxford. ? ? Antuch, W., Guntert, P. & Wuthrich, K. (1996). bAncestral bg-crystallin precursor structure in a yeast killer toxin. Nature Struct. Biol. 3, 662 665. Arnold, V. I. (1978). Mathematical Methods of Classical Mechanics, Springer, New York. Bae, D. S. & Haug, E. J. (1987). A recursive formulation for constrained mechanical system dynamics: Part I. Open loop systems. Mech. Struct. Mech. 15, 359 382. ? Bartels, C., Xia, T. H., Billeter, M., Guntert, P. & ? Wuthrich, K. (1995). The program XEASY for computer-supported NMR spectral analysis of biological macromolecules. J. Biomol. NMR, 5, 1 10. ? ? Bartels, C., Guntert, P., Billeter, M. & Wuthrich, K. (1997). GARANT: a general algorithm for resonance assignment of multidimensional nuclear magnetic resonance spectra. J. Comput. Chem. 18, 139 149. Beeman, D. (1976). Some multistep methods for use in molecular dynamics calculations. J. Comput. Phys. 20, 130 139. Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F., Jr, Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). The Protein

298

with the new software Prosa. J. Biomol. NMR, 2, 619 629. ? ? Guntert, P., Mumenthaler, C. & Wuthrich, K. (1996). DYANA: Torsion angle dynamics for structure calculation and automatic assignment of NOESY spectra. Keystone/Colorado, USA, August 18 23, 1996, Abstracts XVIIth International Conference on Magnetic Resonance in Biological Systems. ? Havel, T. F. & Wuthrich, K. (1984). A distance geometry program for determining the structures of small proteins and other macromolecules from nuclear magnetic resonance measurements of intramolecular 1 H-1H proximities in solution. Bull. Math. Biol. 46, 673 698. Jain, A., Vaidehi, N. & Rodriguez, G. (1993). A fast recursive algorithm for molecular dynamics simulation. J. Comput. Phys. 106, 258268. Kang, H., Hines, J. V. & Tinoco, I., Jr (1996). Conformation of a non-frameshifting RNA pseudoknot from mouse mammary tumor virus. J. Mol. Biol. 259, 135 147. Katz, H., Walter, R. & Somorjay, R. L. (1979). Rotational dynamics of large molecules. Comput. Chem. 3, 2532. Kneller, G. R. & Hinsen, K. (1994). Generalized Euler equations for linked rigid bodies. Phys. Rev. ser. E, 50, 1559 1564. ? Koradi, R., Billeter, M. & Wuthrich, K. (1996). MOLMOL: a program for display and analysis of macromolecular structures. J. Mol. Graph. 14, 51 55. ? ? ? Luginbuhl, P., Guntert, P., Billeter, M. & Wuthrich, K. (1996). The new program OPAL for molecular dynamics simulation of biological macromolecules. J. Biomol. NMR, 8, 136 146. Mathiowetz, A. M., Jain, A., Karasawa, N. & Goddard, W. A., III (1994). Protein simulations using techniques suitable for large systems: the cell multipole method for nonbond interactions and the NewtonEuler inverse mass operator method for internal coordinate dynamics. Proteins: Struct. Funct. Genet. 20, 227 247. Mazur, A. K. & Abagyan, R. A. (1989). New methodology for computer-aided modelling of biomolecular structure and dynamics. (I) Non-cyclic structures. J. Biomol. Struct. Dynam. 4, 815 832. Mazur, A. K., Dorofeev, V. E. & Abagyan, R. A. (1991). Derivation and testing of explicit equations of motion for polymers described by internal coordinates. J. Comput. Phys. 92, 261 272. Momany, F. A., McGuire, R. F., Burgess, A. W. & Scheraga, H. A. (1975). Energy parameters in polypeptides. VII. Geometric parameters, partial atomic charges, nonbonded interactions, hydrogen bond interactions, and intrinsic torsional potentials for the naturally occurring amino acids. J. Phys. Chem. 79, 2361 2381. Mumenthaler, C. & Braun, W. (1995). Automated assignment of simulated and experimental NOESY spectra of proteins by feedback ?ltering and self-correcting distance geometry. J. Mol. Biol. 254, 465 480. ? ? Mumenthaler, C., Guntert, P., Braun, W. & Wuthrich, K. (1997). Automated procedure for combined assignment of NOESY spectra and three-dimensional protein structure determination. J. Biomol. NMR, in the press.

Torsion Angle Dynamics and NMR Structures ? Nemethy, G., Pottle, M. S. & Scheraga, H. A. (1983). Energy parameters in polypeptides. 9. Updating of geometrical parameters, nonbonded interactions, and hydrogen bond interactions for the naturally occurring amino acids. J. Phys. Chem. 87, 1883 1887. Nilges, M., Clore, G. M. & Gronenborn, A. M. (1988). Determination of three-dimensional structures of proteins from interproton distance data by hybrid distance geometry-dynamical simulated annealing calculations. FEBS Letters, 229, 317 324. ? Nilges, M., Kuszewski, J. & Brunger, A. T. (1991). Sampling properties of simulated annealing and distance geometry. In Computational Aspects of the Study of Biological Macromolecules by Nuclear Magnetic Resonance Spectroscopy (Hoch, J. C., Poulsen, F. M. & Red?eld, C., eds), pp. 451 455, Plenum Press, New York. ? Ottiger, M., Szyperski, T., Luginbuhl, P., Ortenzi, C., ? Luporini, P., Bradshaw, R. A. & Wuthrich, K. (1994). The NMR solution structure of the pheromone Er-2 from the ciliated protozoan Euplotes raikovi. Protein Sci. 3, 1515 1526. ? ? Ottiger, M., Zerbe, O., Guntert, P. & Wuthrich, K. (1997). The NMR solution conformation of unligated human Cyclophilin A. J. Mol. Biol. 272, 64 81. Riek, R., Hornemann, S., Wider, G., Billeter, M., ? Glockshuber, R. & Wuthrich, K. (1996). NMR structure of the mouse prion protein domain PrP(121 231). Nature, 382, 180182. ? Rice, L. M. & Brunger, A. T. (1994). Torsion angle dynamics: reduced variable conformational sampling enhances crystallographic structure re?nement. Proteins: Struct. Funct. Genet. 19, 277 290. Schultze, P. & Feigon, J. (1997). Chirality errors in nucleic acid structures. Nature, 387, 668. ? Stein, E. G., Rice, L. M. & Brunger, A. T. (1997). Torsion-angle molecular dynamics as a new ef?cient tool for NMR structure calculation. J. Magn. Reson. 124, 154 164. van Gunsteren, W. F. & Berendsen, H. J. C. (1977). Algorithms for macromolecular dynamics and constraint dynamics. Mol. Phys. 34, 1311 1327. van Gunsteren, W. F., Billeter, S. R., Eising, A. A., ? ? Hunenberger, P. H., Kruger, P., Mark, A. E., Scott, W. R. P. & Tironi, I. G. (1996). Biomolecular Simulation: The GROMOS96 Manual and User Guide, vdf ? Hochschulverlag, Zurich, Switzerland. ? Vendrell, J., Billeter, M., Wider, G., Aviles, F. X. & ? Wuthrich, K. (1991). The NMR structure of the activation domain isolated from porcine procarboxypeptidase B. EMBO J. 10, 11 15. Wolynes, P., Onuchic, J. N. & Thirumalai, D. (1995). Navigating the folding routes. Science, 267, 1619 1620. ? Wuthrich, K. (1986). NMR of Proteins and Nucleic Acids, Wiley, New York. ? Wuthrich, K., Billeter, M. & Braun, W. (1983). Pseudostructures for the 20 common amino acids for use in studies of protein conformations by measurements of intramolecular proton-proton distance constraints with nuclear magnetic resonance. J. Mol. Biol. 169, 949 961.

Edited by P. E. Wright (Received 15 April 1997; received in revised form 11 July 1997; accepted 14 July 1997)