13. Predicting Protein Structure

The following
content is provided under a Creative
Commons license. Your support will help MIT
OpenCourseWare continue to offer high quality
educational resources for free. To make a donation or
view additional materials from hundreds of MIT courses,
visit MIT OpenCourseWare at ocw.mit.edu. PROFESSOR: Welcome
back, everyone. I hope you had a good break. Hopefully you also
remember a little bit about what we did last time. So if you’ll
recall, last time we did an introduction
to protein structure. We talked a little bit
about some of the issues in predicting protein structure. Now we’re going to go
into that in more detail. And last time, we’d broken
down the structure prediction problem into a couple
of sub-problems. So there was a problem of
secondary structure prediction, which we discussed a
little bit last time. And remember that the
early algorithms developed in the ’70s get about
60% accuracy, and decades of research have only marginally improved that. But we're going to see that some of the work on domain structure recognition and predicting novel three-dimensional structures has really advanced
very dramatically in the last few years. Now, the other thing
I hope you’ll recall is that we had this dichotomy
between two approaches to the energetics of
protein structure. We had the physicist’s
approach and we had the statistician's
approach, right? Now, what were some of the key
differences between these two approaches? Anyone want to
volunteer a difference between the statistical
approach to parametrizing the energy of a structure? So we’re trying to come
up with an equation that will convert coordinates
into energy, right? And what were some of the
differences between the physics approach and the
statistical approach? Any volunteers? Yes. AUDIENCE: I think the
statistical approach didn’t change the phi and
psi angles, right? It just changed other variables. PROFESSOR: So you’re close. Right. So the statistical– or maybe
you said the right thing, actually. So the statistical
approach keeps a lot of the pieces the protein rigid,
whereas the physics approach allows all the atoms
to move independently. So one of the key
differences, then, is that in the physics
approach, two atoms that are bonded to each other
can still move apart, governed by a spring function. It's a very stiff spring, but
the atoms move independently. In the statistical
approach, we just fix the distance between them. Similarly for a tetrahedrally
coordinated atom, in the physics approach
those angles can deform. In the statistical
approach, they’re fixed. Right? So in the statistical
approach, we have more or less
fixed geometry. In the physics approach, every
atom moves independently. Anyone else remember
another key difference? Where do the energy
functions come from? Volunteers? All right. So in the physics
approach, they’re all derived as much as possible
from physical principles, you might imagine. Whereas in the
statistical approach, we’re trying to recreate what we
see in nature, even if we don’t have a good physical
grounding for it. So this is most
dramatic in trying to predict the
solvation free energies. Right? How much does it cost you if
you put a hydrophobic atom into a polar environment? Right? So in the physics
approach, you actually have to have water molecules. They have to
interact with matter. That turns out to be
really, really hard to do. In the statistical approach, we
come up with an approximation. How much solvent
accessible surface area is there on the polar
atom when it’s free? When it’s in the
protein structure? And then we scale the transfer
energies by that amount. OK, so these are then
the main differences. Gotta be careful here. So we’ve got fixed geometry
in the statistical approach. We often use discrete rotamers. Remember? The side-chain angles, in principle, can rotate freely. But only a few conformations are typically observed, so we often restrict ourselves to the most commonly observed combinations of the chi angles. And then we have the
statistical potential that depends on the
frequency at which we observe things in the database. And that could be the
frequency at which we observe particular atoms at
precise distances. It could be the fraction
of time that something’s solvent accessible versus not. And the other thing that
we talked about a little bit last time was
this thought problem. If I have a protein
sequence and I have two potential
structures, how could I use these
potential energies– whether they’re derived
from the physics approach or from the
statistical approach– how could I use these potential
energies to decide which of the two structures
is correct? So one possibility is that
I have two structures. One of them is truly the
structure and the other is not. Right? Your fiendish lab mate
knows the structure but refuses to tell you. So in that case,
what would I do? I know that one of these
structures is correct. I don’t know which one. How could I use the
potential energy function to decide which one’s correct? What’s going to be true
of the correct structure? AUDIENCE: Minimal energy. PROFESSOR: It’s going
to have lower energy. So is that sufficient? No. Right? There’s a subtlety
we have to face here. So if I just plug my protein
sequence onto one of these two structures and compute
the free energy, there’s no guarantee
that the correct one will have lower free energy. Why? What decision do I have to
make when I put a protein sequence onto a
backbone structure? Yes. AUDIENCE: How to
orient the side chain. PROFESSOR: Exactly. I need to decide how to
orient the side chains. If I orient the
side chains wrong, then I’ll have side chains
literally overlapping with each other. That’ll have incredibly
high energy, right? So there’s no
guarantee that simply having the right
structure will give you the minimal free energy
until you correctly place all the side chains. OK, but that’s the simple case. Now, that’s in the
case where you’ve got this fiendish friend who
knows the correct structure. But of course, in the general
domain recognition problem, we don’t know the
correct structure. We have homologues. So we have some
sequence, and we believe that it’s either homologous
to Protein A or to Protein B, and I want to decide
which one’s correct. So in both cases, the
structure’s wrong. It’s this question of
how wrong it is, right? So now the problem
actually becomes harder, because not only do
I need to get the right side chain conformations, but I need to get the right backbone conformation. It's going to be close to one
of these structures, perhaps, but it’s never going
to be identical. So both of these
situations are examples where we have to do some
kind of refinement of an initial
starting structure. And what we’re going to
talk about for the next part of the lecture are
alternative strategies for refining a partially
correct structure. And we’re going to look
at three strategies. The simplest one is called
energy minimization. Then we’re going to look
at molecular dynamics and simulated annealing. So energy minimization
starts with this principle that we talked about last time, and remember that it came up here, that a stable structure has to be a minimum of free energy. Right? Because if it's not, then there are forces acting on the atoms that are going to drive it away from that structure to some other structure. Now, the fact that it is a minimum of free energy does not guarantee that it is
the minimum of free energy. So it’s possible that there
are other energetic minima. Right? The protein structure,
if it’s stable, is at the very least a
local energetic minimum. It may also be the global
free energy minimum. We just don’t know
the answer to that. Now, this was a
big area of debate in the early days of the
protein structure field, whether proteins could
fold spontaneously. If they did, then it meant
that they were at least apparently global
free energy minima. Chris Anfinsen actually
won the Nobel Prize for demonstrating that
some proteins could fold independently
outside of the cell. So at least some proteins had
all the structural information implicit in their
sequence, right? And that seems to imply that
their structures are global free energy minima. But there are other
proteins, we now know, where the most commonly
observed structure is only a local
free energy minimum. And it’s got very high energetic
barriers that prevent it from actually getting to the
global free energy minimum. But regardless of
the case, if we have an initial
starting structure, we could try to find the nearest
local free energy minimum, and perhaps that is
the stable structure. So in our context,
we were talking about packing the side chains
on the surface of the protein that we believe might
be the right structure. So imagine that this
is the true structure and we’ve got the
side chain, and it’s making the dashed green lines
represent hydrogen bonds. It’s making a series
of hydrogen bonds from this nitrogen
and this oxygen to pieces of the
rest of the protein. Now, we get the crude
backbone structure. We pop in our side chains. We don’t necessarily– in
fact, we almost never– will choose randomly to
have the right confirmation to pick up all these
hydrogen bonds. So we’ll start off with
some structure that looks like this,
where it’s rotated, so that instead of seeing both
the nitrogen and the oxygen, you can only see the profile. And so the question is
whether we can get from one to the other by following the energy downhill. So that's the question. How would we go
about doing this? Well, we have this
function that tells us the potential energy for every
XYZ coordinate of the atom. That’s what we talked
about last time, and you can go back
and look at your notes for those two approaches. So how could we minimize
this free energy minimum? Well, it’s no different
from other functions that we want to minimize, right? We take the first derivative. We look for places where the
first derivative is zero. The one difference is that we
can’t write out analytically what this function
looks like and choose directions and locations in
space that are the minima. So we’re going to have
to take an approach that has a series of perturbations to
a structure that try to improve the free energy systematically. The simplest understanding is
this gradient descent approach, which says that I have some
initial coordinates that I choose and I take a
step in the direction of the first derivative
of the function. So what does that look like? So here are two possibilities. I’ve got this function. If I start off at x equals
2, then minus some epsilon, some small value, times the first derivative is going to point
me to the left. And I’m going to take
steps to the left until this function, f prime,
the first derivative, is zero. Then I’m going to stop moving. So I move from my initial
coordinate a little bit each time to the left
until I get to the minimum. And similarly, if I
start off on the right, I’ll move a little bit
further to the right each time until the
first derivative is zero. So that looks pretty good. It can take a lot
of steps, though. And it’s not actually guaranteed
to have great convergence properties. Because of the number of
steps you might have to take, it might take quite a long time. So that’s the first
derivative, in a simple one-dimensional case. We’re dealing with a
multi-dimensional vector, so instead of doing
the first derivative we use the gradient,
which is a set of partial first derivatives. And I think one thing that’s
useful to point out here is that, of course, the force
is negative of the gradient of the potential energy. So when we do
gradient descent, you can think of it from
a physical perspective as always moving in the
direction of the force. So I have some structure. It’s not the true
native structure, but I take incremental steps
in the direction of the force and I move towards
some local minimum. And we've done this in the case of a continuous energy, but you can actually also do this for discrete ones.
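To make that concrete, here is a minimal sketch in Python of the gradient descent update just described. The function f and the step size epsilon are made-up illustrations, not a real protein force field; the update x minus epsilon times f'(x) is the one-dimensional version of taking a small step along the force.

```python
# Minimal sketch (not from the lecture): plain gradient descent on a toy
# one-dimensional "energy" with two minima, at x = -1 and x = +1.

def f(x):                          # toy potential energy
    return (x**2 - 1)**2

def f_prime(x):                    # its first derivative
    return 4 * x * (x**2 - 1)

def gradient_descent(x0, epsilon=0.01, tol=1e-6, max_steps=100000):
    x = x0
    for _ in range(max_steps):
        step = epsilon * f_prime(x)    # force is minus the gradient
        x -= step                      # move in the direction of the force
        if abs(step) < tol:            # stop when the derivative is ~zero
            break
    return x

print(gradient_descent(2.0))    # starting at x = 2 settles into the minimum at x = 1
print(gradient_descent(-0.5))   # starting at x = -0.5 settles into x = -1 instead
```

Note that the two starting points end up in two different minima, which is exactly the limitation discussed next.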
Now, the critical point was that you're not
correct energetic structure. So in the case that I showed
you before where we had the side chain side-on, if you actually
do the minimization there, you actually end up with the
side chain rotated 180 degrees where it’s supposed to be. So it eliminates all
the steric clashes, but it doesn’t actually pick
up all the hydrogen bonds. So this is an example of a
local energetic minima that’s not the global energetic minima. Any questions on that? Yes. AUDIENCE: Where do all these
n-dimensional equations come from? PROFESSOR: Where
do what come from? AUDIENCE: The
n-dimensional equations. PROFESSOR: So these are the
equations for the energy in terms of every single
atom in the protein if you’re allowing the
atoms to move, or in terms of every rotatable
bond, if you’re allowing only bonds to rotate. So the question was, where do
the multi-dimensional equations come from. Other questions? OK. All right, so that’s
the simplest approach. Literally minimize the energy. But we said it has this problem
that it’s not guaranteed to find the global
free energy minimum. Another approach is
molecular dynamics. So this actually
attempts to simulate what’s going on in a
protein structure in vitro, by simulating the force on every atom and the velocity. Previously, there was
no measure of velocity. Right? All the atoms were static. We looked at what the
gradient of the energy was and we moved by some arbitrary step size in the
direction of the force. Now we’re actually
going to have velocities associated with all the atoms. They’re going to be
moving around in space. And we’ll have the
coordinate at any time t of i determined by the coordinates at the previous time, t of i minus 1, plus a velocity
times the time step. And the velocities are
going to be determined by the forces,
which are determined by the gradient of
the potential energy. Right? So we start off, always, with that potential energy function, which comes either from the physics approach or the statistical approach. That gives us forces, then velocities, eventually giving us the coordinates.
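As a rough illustration of that update rule, here is a minimal sketch of a few molecular dynamics steps on a toy two-particle system. The spring "bond", the masses, and the time step are placeholder values, and the velocity Verlet integrator used here is the standard, more careful version of the coordinates-from-velocities, velocities-from-forces update described above; real MD engines add bonded terms, electrostatics, solvent, and thermostats on top of this.

```python
# Minimal sketch (not from the lecture): velocity Verlet steps for two
# particles joined by a spring, a crude stand-in for a bonded force field term.

import numpy as np

def velocity_verlet_step(x, v, masses, force, dt):
    """Advance coordinates x and velocities v by one time step dt."""
    f_old = force(x)                                   # forces = -gradient of potential
    x_new = x + v * dt + 0.5 * (f_old / masses[:, None]) * dt**2
    f_new = force(x_new)
    v_new = v + 0.5 * ((f_old + f_new) / masses[:, None]) * dt
    return x_new, v_new

def spring_force(x, k=100.0, r0=1.5):
    d = x[1] - x[0]
    r = np.linalg.norm(d)
    f_on_0 = k * (r - r0) * d / r                      # pulls the pair toward distance r0
    return np.array([f_on_0, -f_on_0])

x = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])       # two atoms, 3D coordinates
v = np.zeros_like(x)
masses = np.array([12.0, 12.0])
for _ in range(1000):
    x, v = velocity_verlet_step(x, v, masses, spring_force, dt=0.001)
```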
So we start off with the protein. There are some serious
questions of how you equilibrate the atoms. So you start off with a
completely static structure. You want to apply forces to it. There are some
subtleties as to how you go about doing that,
but then you actually end up simulating the motion
of all the atoms. And just give you a sense
of what that looks like, I’ll show you a quick movie. So this is the simulation of the
folding of a protein structure. And the backbone is
mostly highlighted. Most of the side chains are not being shown in bold, but you can see the stick figures. And slowly it's accumulating
its three-dimensional structure. [VIDEO PLAYBACK] [LAUGHTER] [END VIDEO PLAYBACK] PROFESSOR: OK, I think
you get the idea here. Oh, it won’t let me give up. OK, here we go. OK, so these are
the equations that are governing the motion
in an example like that. Now, the advantage of
this is we’re actually simulating the protein folding. So if we do it correctly,
we should always get the right answer. Of course, that’s not
what happens in reality. Probably the biggest problem
is just computational speed. So these simulations–
even very, very short ones like the
one I showed you– so how long does it take a
protein to fold in vitro? A long folding might
take a millisecond, and for a very small
protein like that it might be orders
of magnitude faster. But to actually
compute that could take many, many, many days. So a lot of computing
resources go into this. Also, if we want to
accurately represent solvation– the interaction of
the protein with water, which is what causes the hydrophobic
collapse, as we saw– then you actually would have to have
water in those simulations. And each water molecule adds
a lot of degrees of freedom, so that increases the
computational cost, as well. So all of these things determine
the radius of convergence. How far away can you be
from the true structure and still get there? For very small
proteins like this, with a lot of
computational resources, you can get from an unfolded
protein to the folded state. We’ll see some
important advances that allow us to get around
this, but in most cases we only can do
relatively local changes. So that brings us to our third
approach for refining protein structures, which is
called simulated annealing. And the inspiration
for this name comes from metallurgy
and how to get the best atomic
structure in a metal. I don’t know if any of you have
ever done any metalworking. Anyone? Oh, OK, well one person. That’s better than most years. I have not, but I understand
that in metallurgy– and you can correct me if I’m
wrong– that by repeatedly raising and lowering
the temperature, you can get better
metal structures. Is that reasonably accurate? OK. You can talk to one of your
fellow students for more details if you're interested. So a similar idea is going to be used in this computational approach. We're going to try to find
out of some local minima by raising the
energy of the system and then changing
the temperatures, or raising and lowering it
according to some heating and cooling schedule to get
the atoms into their most probable confirmation, the
most stable conformation. And this goes back
to this idea that we started with the local minima. If we’re just doing
energy minimization, we’re not going to be able
to get from this minimum to this minimum, because
these energetic barriers are in the way. So we need to raise the
energy of the system to jump over these
energetic barriers before we can get to the
global free energy minimum. But if we just move at very
high temperature all the time, we will sample the
entire energetic space but it’s going to
take a long time. We’re going to be sampling
a lot of conformations that are low
probability, as well. So this approach
allows us to balance the need for speed and the
need to be at high temperature where we can overcome
some of these barriers. So one thing that I
want to stress here is that we’ve made a physical
analogy to this metallurgy process. We’re talking about raising
the temperature of the system and let the atoms
evolve under forces, but it’s in no way
meant to simulate what’s going on in
protein folding. So molecular dynamics
would try to say, this is what’s actually
happening to this protein as it folds in water. Simulated annealing is
using high temperature to search over the space and then low temperature. But these temperatures are much,
much higher than the protein would ever encounter, so
it’s not a simulation. It’s a search strategy. OK, so the key to
this– and I’ll tell you the full
algorithm in a second– but at various steps
in the algorithm we’re trying to make
decisions about how to move from our current
set of coordinates to some alternative
set of coordinates. Now, that new set of coordinates
we’re going to call test state. And we’re going to decide
whether the new state is more or less probable
than the current one. Right? If it’s lower in energy,
then what’s it going to be? It’s going to be
more probable, right? And so in this
algorithm, we’re always going to accept those states
that are lower in free energy than our current state. What happens when
the state is higher in free energy than
our current state? So it turns out we are going
to accept it probabilistically. Sometimes it’s going to move
up in energy and sometimes not, and that is going
to allow us to go over some of those energetic barriers and try to get to new energetic states that would not be accessible by pure minimization. So the form of this is the
Boltzmann equation, right? The probability of some
test state compared to the probability
of a reference state is going to be the ratio of
these two Boltzmann factors– one with the energy of the test state, one with the energy of the current state. So it's e to the minus the difference in energy over kT. And we'll come back to
where this temperature term comes from in a second. OK, so here’s the
full algorithm. We will either iterate for
a fixed number of steps or until convergence. We’ll see we don’t
always converge. We have some initial
conformation. Our current conformation will be state n, and we can compute its energy from those potential
energy functions that we discussed
in the last meeting. We’re going to choose a
neighboring state at random. So what does neighboring mean? So if I’m defining this in
terms of XYZ coordinates, for every atom I’ve
got a set of XYZ coordinates, and I'm going to change a few of them by a small amount. Right? If I change them all
by large amounts, I have a completely
different structure. So I’m going to make
small perturbations. And if I’m doing this
with fixed backbone angles and just rotating the
side chains, then what would a neighboring state be? Any thoughts? What would a
neighboring state be? Anyone? Change a few of the side
chain angles, right? So we don’t want to globally
change the structure. We want some continuity
between the current state and the next state. So we’re going to
choose an adjacent state in that sense, in the state space. And then here are the rules.
has an energy that’s lower than the current state,
we simply accept the new state. If not, this is where
it gets interesting. Then, we accept
that higher energy with a probability
that’s associated with the difference
in the energies. So if the difference is very, very large, there's a low probability that we'll accept. If the energy is only slightly higher, then there's a higher probability that we accept. If we reject it, we just drop back to our current state and we look for a new test state. OK?
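Here is a minimal sketch of that loop in Python. The energy function, the neighbor-generating function, and the geometric cooling schedule are placeholder choices, not those of any particular refinement package; the acceptance rule is the Boltzmann-style criterion just described.

```python
# Minimal sketch (not from the lecture): simulated annealing with the
# Metropolis acceptance rule and a simple geometric cooling schedule.

import math
import random

def simulated_annealing(state, energy, random_neighbor,
                        t_start=10.0, t_end=0.01, cooling=0.999):
    e = energy(state)
    t = t_start
    while t > t_end:
        test = random_neighbor(state)              # small perturbation of the current state
        e_test = energy(test)
        delta = e_test - e
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-delta / T).
        if delta <= 0 or random.random() < math.exp(-delta / t):
            state, e = test, e_test
        t *= cooling                               # lower the temperature gradually
    return state, e

# Toy usage: a rugged one-dimensional landscape with many local minima.
def rugged(x):
    return 0.1 * x**2 + math.sin(5 * x)

def neighbor(x):
    return x + random.uniform(-0.5, 0.5)

best_x, best_e = simulated_annealing(5.0, rugged, neighbor)
```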
Any questions on how we do this? Question, yes. AUDIENCE: How far away do we search for neighbors? PROFESSOR: That's the
art of this process, so I won't give you a straight answer. Different approaches will
use different thresholds. Any other questions? OK, so the key thing
I want you to realize, then, is there's
this distinction between the
minimization approach and simulated
annealing approach. Minimization can only
go from state one to the local free
energy minimum, whereas the simulated
annealing has the potential to go much further
afield, and potentially to get to the global
free energy minimum. But it’s not
guaranteed to find it. OK, so let’s say we
start in state one and our neighbor
state was state two. So we’d accept that with
100% probability, right? Because it’s lower in energy. Then let’s say the
neighboring state turns out to be state three.
That's higher in energy, so there's a probability
that we’ll accept it, based on the difference
between the energy of state two and state three. Similarly from state
three to state four, so we might drop
back to state two. We might go up. And then we can eventually
get over the hump this way with some probability. It's a sum of each
of those steps. OK? OK, so if this is our
function for deciding whether to accept
a new state, how does temperature
affect our decisions? What happens when the
temperature is very, very high, if you look at that equation? So it's e to the minus delta, the difference in the energy, over kT. So if T is very, very large, then what happens to that exponent? It approaches zero. So e to the minus zero is going
to be approximately 1, right? So at very high temperatures,
we almost always take the high energy state. So that’s what allows us to
climb those energetic hills. If I have a very
high temperature in my simulated
annealing, then I’m always going over
those barriers. So conversely,
what happens, then, when I set the
temperature very low? Then there’s a very,
very low probability of accepting those
changes, right? So if I have a very low
temperature– temperature approximately zero– then
I’ll never go uphill. Almost never go uphill. So we have a lot of control
over how much of the space this algorithm explores by
how we set the temperature. So this is again a little bit
of the art of simulated annealing– deciding exactly what
annealing schedule to use, what temperature
program you use. Do you start off high
and go linearly down? Do you use some other,
more complicated function to decide the temperature? We won’t go into exactly
how to choose these. [INAUDIBLE] you could
track some of these things down from the references
that are in the notes. So we have this choice. But the basic idea
is, we’re going to start at higher temperatures. We’re going to explore
most of the space. And then, as we lower
the temperature, we freeze ourselves into the
most probable conformations. Now, there's nothing that
restricts simulated annealing to protein structure. This approach is
actually quite general. It’s called the Metropolis
Hastings algorithm. It’s often used in cases where
there’s no energy whatsoever and it’s thought of purely
in probabilistic terms. So if I have some probabilistic
function– some probability of being in some state S– I
can choose a neighboring state at random. Then I can compute
an acceptance ratio, which is the probability
of being in the test state S over the probability of being in the current state. This is what we did in terms
of the Boltzmann equation, but if I have some other formulation
for the probabilities I’ll just use that. And then, just like in our
protein folding example, if this acceptance
ratio is greater than 1, we accept the new state. If it’s less than
1, then we accept it with that probability. And so this is a very
general approach. I think you might see
it in your problem sets. We certainly have done this on past exams, asking you to apply this algorithm to other probabilistic settings. So it's a very, very general way to search and sample across a probabilistic landscape.
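Written purely in terms of probabilities, one step of that procedure looks like the sketch below, assuming a symmetric proposal distribution (the general Metropolis-Hastings rule also includes a correction for asymmetric proposals). The prob and random_neighbor functions are placeholders for whatever problem you apply it to.

```python
# Minimal sketch (not from the lecture): one Metropolis acceptance step
# written with an (unnormalized) probability function instead of an energy.

import random

def metropolis_step(state, prob, random_neighbor):
    test = random_neighbor(state)
    ratio = prob(test) / prob(state)       # acceptance ratio P(test) / P(current)
    if ratio >= 1 or random.random() < ratio:
        return test                        # accept the proposed state
    return state                           # otherwise stay where we are
```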
OK, so we've seen these three separate approaches, starting with an
approximate structure and trying to get to
the correct structure. We have energy
minimization, which will move towards the local minimum conformation. So it's very fast compared to the other two, but it's restricted
to local changes. We have molecular
dynamics, which actually tries to simulate the
biological process. Computationally very intensive. And then we have
simulated annealing, which tries to shortcut the route to some of these global free energy minima by raising the temperature, spending time at this very high temperature so we can sample all the space, and then cooling down so we trap a high probability conformation. Any questions on any of
these three approaches? OK. All right, so I’m going
to go through now some of the approaches
that have already been used to try to
solve protein structures. We started off with a sequence. We’d like to figure out
what the structure is. And this field has had
a tremendous advance, because in 1995 a group
got together and came up with an objective
way of evaluating whether these
methods were working. So lots of people
have proposed methods for predicting
protein structure, and what the CASP group
did in ’95 was they said, we will collect structures
from crystallographers and NMR spectroscopists– structures that they have not yet published but that they know
they’re likely to be able to get within the
time scale of this project. We will send out those
sequences to the modelers. The modelers will attempt
to predict the structure, and then at the end
of the competition we’ll go back to the
crystallographers and the spectroscopists and
say, OK, give us a structure and now we’ll compare
the predicted answers to the real ones. So no one knows what the answer is until all the
submissions are there, and then you can see objectively
which of the approaches did the best. And one of the approaches
that has consistently done very well, which
we’ll look at in some detail, is this approach called Rosetta. So you can look at
the details online. They split this modeling
problem into two types. There are ones for
which you can come up with a reasonable
homology model. This can be very, very
low sequence homology, but there’s something in the
database of known structure whose sequence is similar to the query. And then ones where
it’s completely de novo. So how do they go about
predicting these structures? So if there’s homology, you
can imagine the first thing you want to do is align your
sequence to the sequence of the protein that
has a known structure. Now, if it’s high homology this
is not a hard problem, right? We just need to do a few tweaks. But we get to places–
what’s called the Twilight Zone, in fact– where there’s
a high probability that you’re wrong, that your sequence
alignments could be to entirely the wrong structure. And that’s where
things get interesting. So they’ve got high
sequence similarity– greater than 50%
sequence similarity– that are considered relatively easy problems. Then medium problems that are 20% to 50% sequence similarity. And then very low sequence similarity problems– less than 20% sequence similarity. OK, so you've already
seen in this course methods for doing sequence
alignment, so we don’t have to go into
that in any detail. But there are a lot of
different specific approaches for how to do those alignments. You could do anything from BLAST to highly sophisticated Markov models to try to decide what's
most similar to your protein structure. And one of the important
things that the Rosetta group found was not to rely on any single method but to try a bunch of
different alignment approaches and then follow
through with many of the different alignments. And then we get
this problem of how do you refine the models,
which is what we’ve already started to talk about. So in the general
refinement procedure, when you have a protein that’s
relatively in good shape they apply random perturbations
to the backbone torsion angles. So this is again the statistical approach, not allowing every atom to move. They're just rotating a certain number of the rotatable bonds. So we've got the phi and psi angles in the backbone, and some of the side chains. They do what's called rotamer
optimization of the side chain. So what does that mean? Remember that we
could allow the side chains to rotate
freely, but very, very few of those rotations
are frequently observed. So we’re going to choose,
as these three choices, among the best possible
rotamers, rotational isomers. And then once we've found a nearly optimal side chain conformation from those highly probable ones, then we allow more continuous optimization of the side chains.
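A cartoon version of discrete rotamer optimization is sketched below. The rotamer library and the energy function are placeholders, and the greedy sweep is only meant to illustrate the idea; real side-chain packers, including Rosetta's, use much more sophisticated combinatorial searches.

```python
# Minimal sketch (not from the lecture): greedy discrete rotamer packing.
# rotamer_library maps each position to a list of candidate side-chain
# conformations, and total_energy() scores a full assignment.

def greedy_rotamer_pack(positions, rotamer_library, total_energy, n_sweeps=5):
    # start from the first rotamer at every position
    assignment = {p: rotamer_library[p][0] for p in positions}
    for _ in range(n_sweeps):
        for p in positions:
            best = min(rotamer_library[p],
                       key=lambda r: total_energy({**assignment, p: r}))
            assignment[p] = best           # keep the lowest-energy rotamer at p
    return assignment
```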
So when you have a very, very high sequence homology template, you don't
need to do a lot of work on most of the structure. Right? Most of it’s going
to be correct. So we’re going to
focus on those places where the alignment is poor. That seems pretty intuitive. Things get a little
bit more interesting when you’ve got these medium
sequence similarity templates. So here, even your basic
alignment might not be right. So they actually proceed
with multiple alignments and carry them through
the refinement process. And then, how do you decide
which one’s the best? You use the potential
energy function. Right? So you’ve already
taken a whole bunch of starting conformations. We've taken them through this refinement procedure. You now believe that
those energies represent the probability that the
structure is correct, so you’re going to choose
which of those conformations to use based on the energy. OK, in these medium sequence
similarity templates, the refinement doesn’t do
the entire protein structure, but it focuses on
particular regions. So places where there
are gaps, insertions, and deletions in the alignment. Right? So your alignment is
uncertain, so that’s where you need to
refine the structure. Places that were loops
in the starting models, so they weren’t
highly constrained. So it’s plausible
that they’re going to be different in
the starting structure from some homologous protein
and in the final structure. And then, regions where the
sequence conservation is low. So even if there is a
reasonably good alignment, there’s some
probability that things have changed during evolution. Now, when they do a
refinement, how do they do that? In these places that
we’ve just outlined, they don’t simply randomly
perturb all of the angles. But actually, they take
a segment of the protein, and exactly how long
those segments are has changed over the course
of the Rosetta algorithm’s refinement. But say something on the order
of three to six amino acids. And you look in the
database for proteins that have known structure that
contain the same amino acid sequence. So it could be completely
unrelated protein structure, but you develop
a peptide library for all of those short
sequences for all the different
possible structures that they’ve adopted. So you know that those
are at least structures that are consistent with
that local sequence, although they
might be completely wrong for this
individual protein. So you pop in all of
those alternative possible structures. So OK, we replace
the torsion angles with those of peptides
of known structure, and then we do a local
optimization using the kinds of minimization
algorithms we just talked about to see whether
there is a structure that’s roughly compatible with
that little peptide that you took from the
database that’s also consistent with the
rest the structure. And after you’ve done that,
then you do a global refinement. Questions on that approach? OK, so does this work? One of the best competitors
in this CASP competition. So here are examples where the
native structure’s in blue. The best model they
produced was in red, and the best template– that’s
the homologous protein– is in green. And you can see that they
agree remarkably well. OK? So this is very
impressive, especially compared to some of
the other algorithms. But again, it’s
focusing on proteins where there’s at least some
decent homology to start with. If you look here at the
center of these proteins, you can see the original
structure, I believe, is blue, and their model’s in red. You can see they also get the
side chain confirmations more or less correct, which
is quite remarkable. Now, what gets
really interesting is when they work on
these proteins that have very low
sequence homologies. So we’re talking about 20%
sequence similarity or less. So quite often, you’ll actually
have globally the wrong fold at 20% sequence similarity. So what do they do here? They start by saying,
OK, we have no guarantee that our templates are
even remotely correct. So they’re going to start
with a lot of templates and they’re going to refine
all of these in parallel in hopes that some of them come
out right at the other end. And these are what they call
more aggressive refinement strategies. So before, where did we focus
our refinement energies? We focused on places that
were poorly constrained, either by evolution or
regions of the structure that weren’t well-constrained,
or places where the alignment wasn’t good. Here, they actually go after
the relatively well-defined secondary structure
elements, as well. And so they will
allow something that was a clear alpha helix
in all of the templates to change some of the structure
by taking peptides out of the database that
have other structures. OK? So you take a very,
very aggressive approach to the refinement. You rebuild the secondary
structure elements, as well as these gaps, insertions,
loops, and regions with low sequence conservation. And I think the really
remarkable thing is that this
approach also works. It doesn’t work quite as
well, but here’s a side by side comparison of a native
structure and the best model. So this is the hidden
structure that was only known to the crystallographer,
or the spectroscopist, who agreed to participate
in this CASP competition. And here is the
model they submitted blind without
knowing what it was. And you can see again
and again that there’s a pretty good global similarity
between the structures that they propose
and the actual ones. Not always. I mean, here’s an example where
the good parts are highlighted and the not-so-good
parts are shown in white so you can barely see them. [LAUGHTER] PROFESSOR: But even
so, give them that. Give them their credit. It’s a remarkably
good agreement. Now, we’ve looked at cases
where there’s very high sequence similarity, where there’s medium
sequence similarity, where there’s low sequence similarity. But the hardest category are
ones where there’s actually nothing in the structural
database that’s a detectable homologue to the
protein of interest. So how do you go
about doing that? That’s the de novo case. So in that case, they take
the following strategy. They do a Monte Carlo
search for backbone angles. So specifically, they
take short regions– and again, the exact length changes in different versions of the algorithm, but it's either three to nine
amino acids in the backbone. They find similar
peptides in the database of known structure. They take the
backbone conformations from the database. They set the angles
to match those. And then, they use those
Metropolis criteria that we looked at in
simulated annealing. Right? The relative probability
of the states, determined by the
Boltzmann energy, to decide whether
to accept or not. If it’s lower
energy, what happens? Do you accept? Do you not accept? AUDIENCE: Accept. PROFESSOR: You accept. And if it’s high energy,
how do you decide? AUDIENCE: [INAUDIBLE] PROFESSOR: [INAUDIBLE],
probability. Very good. OK, so they do a fixed number
of Monte Carlo steps– 36,000. And then they repeat
this entire process to get 2,000 final structures. OK? Because they really have
very, very low confidence in any individual one
of these structures. OK, now you’ve got
2,000 structures, but you’re allowed
to submit one. So what do you do? So they cluster them to try to see whether there are common patterns that emerge, and then they refine the clusters and they submit each cluster as a potential solution to this problem.
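The skeleton of that fragment-insertion Monte Carlo search might look like the sketch below. The fragment library, the torsion-angle representation, the energy function, and the temperature are all placeholders; the real Rosetta protocol layers in score-function ramping, centroid and full-atom stages, and other refinements.

```python
# Minimal sketch (not from the lecture): fragment-insertion Monte Carlo.
# torsions is a list of backbone angles; fragment_library[start] holds
# candidate fragments (from known structures) beginning at that position.

import math
import random

def fragment_monte_carlo(torsions, fragment_library, energy,
                         n_steps=36000, temperature=2.0):
    e = energy(torsions)
    for _ in range(n_steps):
        start = random.randrange(len(torsions) - 9)        # pick a window
        fragment = random.choice(fragment_library[start])  # known-structure fragment
        trial = torsions[:start] + fragment + torsions[start + len(fragment):]
        e_trial = energy(trial)
        delta = e_trial - e
        if delta <= 0 or random.random() < math.exp(-delta / temperature):
            torsions, e = trial, e_trial                   # Metropolis acceptance
    return torsions, e
```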
OK, questions on the Rosetta approach? Yes. AUDIENCE: Can you mention again
why the short region of three to nine amino acids,
and whether [INAUDIBLE]. PROFESSOR: So the
question is, what’s the motivation for taking
these short regions from the structural database? Ultimately, this is
a modeling choice that they made that
seems to work well. So it’s an empirical choice. But what possibly motivated
them, you might ask, right? So, the thought has been in
this field for a long time, and it’s still, I
think, unproven, that certain sequences will
have a certain propensity to certain structures. We saw this in the secondary
structure prediction algorithms, that there were
certain amino acids that occurred much more
frequently in alpha helixes. So it could be that there
are certain structures that are very likely to occur
for short peptides, and other ones that
almost never occur. And so if you had a large enough
database of protein structures, then that would be a
sensible sampling approach. Now, in practice, could you
have gotten some good answer in some other approach? We don’t know. This is what
actually worked well. So there’s no real theoretical
justification for it other than that
crude observation that there is some information
content that’s local, and then a lot of information
content that’s global. Yes? AUDIENCE: So when you’re
doing a de novo approach, is it general that you
come up with a bunch of different clusters
as your answer, whereas with the
homology approach, you are more confident
of structure answer? PROFESSOR: So the question
was, if you’re doing a de novo approach, is it
generally the case that you have lots
of individual, or clusters of structures,
whereas in homology you tend not to. And yes, that’s correct. So in the de novo,
there are frequently going to be multiple
solutions that look equally plausible to you,
whereas the homology tends to drive you to certain classes. Good questions. Any other questions? All, right so that was CASP. One was in 1995, which
seems like an eon ago. So how have things
improved over the course of the last decade or two? So there was an interesting
paper that came out recently that just looked at the
differences between CASP 10, one of the most
recent ones, and CASP 5. They’re every two years,
so that’s a decade. So how have things
improved or not over the last decade
in this challenge? So in this chart, the
y-axis is the percent of the residues
that were modeled and that were not
in the template. OK? So I’ve got some template. Some fraction of the amino acids
have no match in the template. How many of those
do I get correct? As a function of
target difficulty, they have their own definition
for target difficulty. You can look in the
actual paper to find out what is in the CASP
competition, but it’s a combination of structural
and sequence data. So let’s just take
them that they made some reasonable
choices here. They actually put
a lot of effort into coming up with a
criteria for evaluation. Every point in this diagram
represents some submitted structure. The CASP5, a decade
ago, are the triangles. CASP 9, two years
ago, were the squares, and the CASP10 are the circles. And then they have
trend lines for CASP9 and CASP10 are shown
here– these two lines. And you can see that they
do better for the easier structures and worse for
the harder structures, which is what you’d expect,
whereas CASP5 was pretty much flat across all of them
and did about as well even on on the easy structures
as these ones are doing on the hard structures. So in terms of the fraction
of the protein that they don’t have a template for that
they’re able to get correct, they’re doing much, much
better in the later CASPs than they did a decade earlier. So that’s kind of encouraging. Unfortunately, the story isn’t
always that straightforward. So this chart is, again, target
difficulty on the x-axis. The y-axis is what they call
the Global Distance Test, and it’s a model of accuracy. It’s the percent of the carbon
alpha atoms in the predictions that are close– and they have
a precise definition of close that you can look up– that are
close to the true structure. So for a perfect model, it would
be up here in the 90% to 100% range, and then random
models would be down here. You can see a lot of
them are close to random. But more important here
are the trend lines. So the trend line for
CASP10, the most recent one in this report, is black. And for CASP5, it's
this yellow one, which is not that
different from the black. So what this shows is that,
over the course of a decade, the actual prediction
accuracy overall has not improved that much. It’s a little bit shocking. So they tried in this paper to
try to figure out, why is that? I mean, the percentage of
the amino acids that you’re getting correct is going up,
but overall accuracy has not. And so they make
some claims that it could be that target difficulty
is not really a fair measure, because a lot of the proteins
that are being submitted are now actually much harder
in different sense, in that they’re not single domain
proteins initially. So in CASP5, a lot
of them were proteins that had independent structures. By the time of CASP10,
a lot of the proteins that are being
submitted are more interesting structural problems
in that their folding is contingent on interactions
with lots of other things. So maybe all the
information you need is not contained entirely in
the sequence of the peptide that you’ve been given
to test but depends more on the interactions
of it with its partners. So those were for
homology models. These are the free
modeling results. So in free modeling, there’s
no homology to look at, so they don’t have a measure of
difficulty except for length. They’re using, again,
that Global Distance Test. So up here are perfect models. Down here are nearly
random models. CASP10 is in red. CASP5, a decade
earlier, is in green. And you can see the trend
lines are very, very similar. And CASP9, which is
the dashed line here, looks almost identical to CASP5. So again, this is
not very encouraging. It says that the
accuracy of the models has not improved very
much over the last decade. And then, they do
point out that if you focus on the short structures,
then it’s kind of interesting. So in CASP5, which are the
triangles, only one of these was above 60%. CASP9, they had 5 out
of 11 were pretty good. But then you get to
CASP10 and now only three are greater than 60%. So it’s been
fluctuating quite a lot. So modeling de novo is still
a very, very hard problem. And they have a whole
bunch of theories as to why that could be. They proposed, as
I already said, that maybe the models that
they’re trying to solve have gotten harder in ways
that are not easy to assess. A lot of the proteins that
previously wouldn’t have had a homologue now already do,
because there has been a decade of structural work trying
to fill in missing domain structures. And that these targets tend
to have more irregularity. They tend to be part
of larger proteins. So again, there’s not
enough information in the sequence of
what you’re given to make the full prediction. Questions? So what we’ve seen so far
has been the Rosetta approach to solving protein structures. And it really is,
throw everything at it. Any trick that you’ve got. Let’s look into the databases. Let’s take homologous proteins. Right? So we have these high,
medium, and low levels of homology. And even when we're
doing a homologue, we don’t restrict ourselves
to that protein structure. But for certain parts,
we’ll go into the database and find the structures of
peptides of length three to nine. Pull those out of the
[? betas. ?] Plug those in. Our potential energy functions
are grab bag information, some of which has strong
physical principles, some which is just curve fitting
to make sure that we keep the hydrophobics inside
and hydrophilics outside. So we throw any information
that we have at the problem, whereas our physicist has
disdain for that approach. He says, no, no. We’re going to this
purely by the book. All of our equations are going
to have some physical grounding to them. We’re not going to start
with homology models. We’re going to try to do the
simulation that I showed you a little movie of for
every single protein we want to know the structure of. Now, why is that problem hard? It’s because these
potential energy landscapes are incredibly complex. Right? They’re very rugged. Trying to get from any current
position to any other position requires going over
many, many minima. So the reason it’s
hard to do, then, is it’s primarily a
computing power issue. There’s just not
enough computer power to solve all of these problems. So what one group, D. E. Shaw, did was they said, well, we can solve
that by just spending a lot of money, which
fortunately they had. So they designed
hardware that actually solves individual components of
the potential energy function in hardware rather
than in software. So they have a chip that
they call Anton that actually has parts of it that solve the
electrostatic function, the van der Waals function. And so in these chips,
rather than in software, you are solving the energy terms as fast as you conceivably can. And that allows you to
sample much, much more space. Run your simulations
for much, much longer in terms of real time. And they do remarkably well. So here are some pictures
from a paper of theirs– a couple of years ago
now– with the predicted and the actual structures. I don’t even remember
which color is which, but you can see it
doesn’t much matter. They get them down to
very, very high resolution. Now, what do you notice
about all these structures? AUDIENCE: They’re small. PROFESSOR: They’re small, right? So obviously there’s
a reason for that. That's what you can do in reasonable compute time, even with high-end computing that's special purpose.
in a state where they can fold any
arbitrary structure. What else do you
notice about them? Yeah, in the back. AUDIENCE: They have
very well-defined secondary structures. PROFESSOR: They have
very well-defined secondary structures. And they’re specifically
what, mostly? AUDIENCE: Alpha helixes. PROFESSOR: Alpha helixes, right. And it turns out that a lot more
information is encoded locally in an alpha helix than
in a beta sheet, which is going to be contingent on
what that piece of protein comes up against. Right? Whereas in the
alpha helix, we saw that you can get 60% accuracy
with very crude algorithms, right? So we’re going to do best
with these physics approaches when we have small proteins
that are largely alpha helical. But in later papers– well
here’s even an example. Here’s one that has a
certain amount of beta sheet. And the structures are going
to get larger with time. So it’s not an inherent problem. It’s just a question of
how fast the hardware is today versus tomorrow. OK, a third approach. So we had the
statistical approach. We have the physics approach. The third approach, that
I won’t go into detail but you can play around
was literally yourselves, is a game where
we have humans who try to identify the
right structure, just as humans do very well
in other kinds of pattern recognition problems. So you can try this video game
where you’re given structures to try to solve and say, oh,
should I make that helical? Should I rotate that side chain? So give it a try. Just Google FoldIT,
and you can find out whether you can be the best
gamers and beat the hardware. All right. So so far we’ve been
talking about solving the structures of
individual proteins. We’ve seen there is some
success in this field. It’s improved a
lot in some ways. Between CASP1 and CASP5 I think
there’s been huge improvements. Between CASP5 and CASP10, maybe
the problems have gotten harder. Maybe there have
been no improvements. We’ll leave that for
others to decide. What I’d like to look at
in the end of this lecture and the beginning
of the next lecture are problems of proteins
interacting with each other, and can we predict
those interactions? And that’ll, then, lead us
towards even larger systems and network problems. So we’re going to
break this down to three separate
prediction problems. The first of these is predicting
the effect of a point mutation on the stability
of a known complex. So in some ways, you might
think this is an easy problem. I've got two proteins. I know their structure. I know they interact. I want to predict whether
a mutation stabilizes that interaction or
makes it fall apart. That’s the first
of the problems. We can try to
predict the structure of particular complexes,
and we can then try to generalize that and try
to predict every protein that interacts with
every other protein. We’ll see how we
do on all of those. So we’ll go into one of these
competition papers, which are very good at
evaluating the field. This competition paper looked at
what I call the simple problem. So you’ve got two proteins
of known structure. The authors of the paper,
who issued the challenge, knew the answer for the effect
of every possible mutation at a whole bunch of positions
along these proteins on the– well, an approximation
to the free energy of binding. So they challenged
the competitors to try to figure out, we
give you the structure, we tell you all the
positions we’ve mutated, and you tell us whether those
mutations made the complex more stable or made the
complex less stable. Now specifically, they had two
separate protein structures. They mutated 53
positions in one. 45 positions in another. They didn’t directly measure
the free energy of binding for every possible complex,
but they used a high throughput assay. We won’t go into
the details, but it should track, more or
less, with the free energy. So things that seem to be more
stable binders here probably
about trying to solve this? So using these potential
energy functions that we’ve already
seen, you could try to plug in the mutation
into the structure. And what would you
have to do then in order to evaluate the energy? Before you evaluate the energy. So I’ve got known structure. I say, position 23 I’m mutating
from phenylalanine to alanine. I’ll say alanine
to phenylalanine. Make it a little
more interesting. OK? So I've now stuck this big side chain on. So what do I need to do before
I can evaluate the structure energy? AUDIENCE: Make sure
there’s no clashes. PROFESSOR: Make sure
no clashes, right? So I have to do one
of those methods that we already described
for optimizing the side chain confirmation, and
then I can decide, based on the free energy,
whether it’s an improvement or makes things worse. OK, so let’s see how they do. So here’s an example
of a solution. The submitter, the person who
has the algorithm for making a prediction, decides on
some cutoff in their energy function, whether
they think this is improving things or
making things worse. So they decide on the color. Each one of these
dots represents a different mutation. On the y-axis is the
actual change in binding, the observed change in binding. So things above zero
are improved binding. Below zero are worse binding. And here are the predictions
on the submitter scale. And here the submitter
said that everything in red should be worse and everything
green should be better. And you can see that
there’s some trend. They’re doing reasonably well
in predicting all these red guys as being bad, but
they’re not doing so well in the neutral ones, clearly,
and certainly not doing that well in the improved ones. Now, is this one of the better
submitters or one of the worst? You’d hope that this
is one of the worst, but in fact this is one
of the top submitters. In fact, not just
the top submitter but top submitter
looking at mutations that are right at the
interface where you’d think they’d do the best, right? So if there’s some mutation on
the backside of the protein, there’s less
structural information about what that’s going to
be doing in the complex. There could be some
surprising results. But here, these are
amino acid mutations right at the interface. So here’s an example
of the top performer. This is the graph
I just showed you, focusing only on the residues of the interface, and all sites. And here's an average group. And you can see the
average groups are really doing rather abysmally. So this blue cluster that’s
almost entirely below zero were supposed to be neutral. And these green ones were
supposed to be improved, and they’re almost
entirely below zero. This is not an encouraging story. So how do we
evaluate objectively whether they’re
really doing well? So we have some sort
of baseline measure. What is the sort
of baseline algorithm you could use to predict
whether a mutation is improving or hurting this interface? So all of their
algorithms are going to use some kind
of energy function. What have we already seen in
earlier parts of this course that we could use? Well, we could use the
substitution matrices, right? We have the BLOSUM
substitution matrix that tells us how
surprised we should be when we see an evolution,
that Amino Acid A turns into Amino Acid B.
So we could use, in this case, the BLOSUM matrix. That gives us for
each mutation a score. It ranges from minus 4 to 11. And we can rank every mutation based on the BLOSUM matrix entry for the substitution and say, OK, at some value in this range things should be getting better or getting worse.
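As a concrete version of that baseline, here is a sketch that scores substitutions with BLOSUM62, assuming Biopython's substitution_matrices module is available. The threshold of zero is an arbitrary illustrative choice.

```python
# Minimal sketch (not from the lecture): BLOSUM62 as a baseline classifier
# for whether a point mutation at an interface is likely to be deleterious.

from Bio.Align import substitution_matrices

blosum62 = substitution_matrices.load("BLOSUM62")

def blosum_score(wild_type_aa, mutant_aa):
    return blosum62[wild_type_aa, mutant_aa]     # roughly -4 to 11

def predict_deleterious(wild_type_aa, mutant_aa, threshold=0):
    # conservative substitutions score high; call low-scoring ones deleterious
    return blosum_score(wild_type_aa, mutant_aa) < threshold

print(blosum_score("F", "A"), predict_deleterious("F", "A"))   # -2.0 True
print(blosum_score("I", "V"), predict_deleterious("I", "V"))   #  3.0 False
```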
So here's an area under the curve plot where we've plotted
the false positives and true positive
rates as I change my threshold for
that BLOSUM matrix. So I compute what the
mutation BLOSUM matrix is, and then I say, OK, is a
value of 11 bad or is it good? Is a value of 10 bad or good? That’s what this
curve represents. As I vary that threshold,
how many do I get right and how many do I get wrong? If I’m doing the
decisions at random, then I’ll be getting
roughly equal true positives and false positives. They do slightly better in
than random using this matrix. Now, the best algorithm at
predicting that uses energies only does marginally better. So this is the best
algorithm at predicting. This is this baseline algorithm
using just the BLOSUM matrix. You can see that the green curve
predicting beneficial mutations is really hard. They don’t do much
better than random. And for the
deleterious mutations, they do somewhat better. So we could make these
plots for every single one of the algorithms,
but a little easier is to just compute the
area under the curve. So how much of the area? If I were doing perfectly, I
would get 100% true positives and no false positives, right? So my line would go
straight up and across and the area under the
curve would be one. And if I’m doing terribly,
I’ll get no true positives and all false positives. I’d be flatlining and
my area would be zero. So the area under
the curve, which is normalized
between zero and one, will give me a sense of how
well these algorithms are doing. So this plot– focus first
on the black dots– shows for each one of these algorithms
what the area under the curve is for beneficial and
deleterious mutations. Beneficial on the x-axis,
deleterious mutations on the y-axis. The BLOSUM matrix is here. So good algorithms should be
above that and to the right. They should have a better
area under the curve. And you can see the
perfect algorithm would have been all the way up here. None of the black dots
are even remotely close. The G21, which we’ll talk
about a little bit in a minute, is somewhat better than the
BLOSUM matrix, but not a lot. Now, I'm not going to discuss the second round in much detail, because this is a case where people weren't doing so well in the first round, so they went out and gave them some of the information about mutations at all the positions. And that really changes the nature of the problem, because then you have a tremendous amount of information about which positions are important and how much of a difference those mutations make. So we'll ignore
the second round, which I think is an overly
generous way of comparing these algorithms. OK, so what did the authors
of this paper observe? They observed that the
best algorithms were only doing marginally better
than random choice. So three times better. And that there seemed to be
a particular problem looking at mutations that
affect polar positions. One of the things that I think
was particularly interesting and quite relevant when we
think about these things in a thermodynamic context
is that the algorithms that did better– none of them
could be really considered to do really well– but the
algorithms that did better didn’t just focus on
the energetic change between forming the
native complex over here and forming this mutant
complex indicated by the star. But they also focused on
the effect of the mutation on the stability of
the mutated protein. So there’s an
equilibrium not just moving between the free
proteins and the complex, but also moving
between the free proteins that are folded and the free
proteins that are unfolded. And some of these
mutations are affecting the energy of the folded
state, and so they’re driving things to the
left, to the unfolded. And if you don’t include
that, then you actually get into trouble.
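Here is a toy two-state calculation of that effect, a sketch with invented numbers rather than anyone's actual model: if only the folded protein can bind, then a mutation that destabilizes folding shrinks the folded fraction and weakens the apparent binding, even when the interface contacts themselves are untouched.

    # Toy two-state model: only folded protein binds, so a destabilizing mutation
    # weakens apparent binding even if the interface energetics are unchanged.
    # All free energies are invented for illustration (kcal/mol, ~298 K).
    import math

    RT = 0.593  # kcal/mol at roughly 298 K

    def fraction_folded(dG_fold):
        """dG_fold = G(folded) - G(unfolded); more negative means more stable."""
        K = math.exp(-dG_fold / RT)   # [folded]/[unfolded]
        return K / (1.0 + K)

    dG_fold_wt = -2.0        # a marginally stable protein
    ddG_fold_mut = +3.0      # the mutation destabilizes folding by 3 kcal/mol

    f_wt = fraction_folded(dG_fold_wt)
    f_mut = fraction_folded(dG_fold_wt + ddG_fold_mut)

    # If the interface itself is untouched, the only change in apparent binding
    # comes from the smaller pool of folded, binding-competent protein.
    ddG_bind_apparent = -RT * math.log(f_mut / f_wt)

    print(f"folded fraction: wild type {f_wt:.3f}, mutant {f_mut:.3f}")
    print(f"apparent ddG of binding from stability alone: {ddG_bind_apparent:+.2f} kcal/mol")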
And I've put a link here to some lecture notes from a different course that
I teach where you can look up some details and more
sophisticated approaches that actually do take into account
a lot of the unfolded states. So the best approaches– the best of a bad lot– consider the effects of mutations on stability. They also model packing, electrostatics, and solvation. But the actual
algorithms that they used were a whole mishmash
of approaches. So there didn’t seem to emerge
a common pattern in what they were doing, and I thought I
would take you through one of these to see what
actually they were doing. So the best one was this
machine learning approach, G21. So this is how they
solved the problem. First of all, they dug
through the literature and found 930 cases where they
could associate a mutation with a change in energy. These had nothing to do with
the proteins under consideration. They were completely
different structures. But they were cases
where they actually had energetic information
for each mutation. Then they go through
and try to predict what the structural change
will be in the protein, using somebody else’s
algorithm, FoldX. And now, they describe
each mutant, not just with a single energy–
we have focused, for example, on PyRosetta,
which you’ll use in process– but they actually had
85 different features from a whole bunch of
different programs. So they’re taking a
pretty agnostic view. They’re saying, we don’t know
which of these energy functions is the best, so let’s let
the machine learning decide. So for every single mutation that's
posed to them as a problem, they have 85
different parameters as to whether it’s
improving things or not. And then, they had their
database of 930 mutations. For each one of those
they had 85 parameters. So those are labeled training data. They know whether things
are getting better or worse. They actually don’t even rely
on a single machine learning method. They used five different approaches. We'll discuss Bayesian nets later in this course. Most of these others we won't cover at all, but they used a lot of different computational approaches to try to decide how to go from those 85 parameters to a prediction of whether the structures improved or not.
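To give a feel for that kind of pipeline, here is a sketch in the same spirit. The features, classifiers, and data below are all stand-ins, not the G21 group's actual code: a table of labeled mutations, a feature vector per mutation, and several off-the-shelf learners combined by voting.

    # Sketch of a "let the machine learning decide" pipeline: many energy-function
    # features per mutation, labeled training data, several classifiers combined
    # by voting. Features, labels, and classifier choices are stand-ins.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)

    # Stand-in for the curated set: 930 mutations, each described by 85 features
    # pulled from different energy functions and structure programs.
    X_train = rng.normal(size=(930, 85))
    y_train = rng.integers(0, 2, size=930)   # 1 = improves binding, 0 = does not

    ensemble = VotingClassifier(
        estimators=[
            ("logistic", LogisticRegression(max_iter=1000)),
            ("forest", RandomForestClassifier(n_estimators=200, random_state=0)),
            ("svm", SVC(probability=True, random_state=0)),
        ],
        voting="soft",
    )
    ensemble.fit(X_train, y_train)

    # A new mutation posed by the challenge: same 85 features, unknown label.
    x_new = rng.normal(size=(1, 85))
    print("predicted class:", ensemble.predict(x_new)[0])
    print("estimated P(improves binding):", round(ensemble.predict_proba(x_new)[0, 1], 3))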
So this actually shows the complexity of this apparently simple problem, right? Here's a case where I have two proteins of known structure. I'm making very specific point mutations, and even so, even throwing all the best machine learning techniques at it, I only do marginally better than random. So there's clearly a
lot in protein structure that we don’t yet have
parametrized in these energy functions. So maybe some of
these other problems are actually not as
hard as we thought. Maybe instead of trying to
be very precise in terms of the energetic change
for a single mutation at an interface, we’d do
better trying to predict cruder properties, like which
two proteins interact with each other. So that’s what
we’re going to look at in the next
part of the course. We’re going to
look at whether we can use structural data to
predict which two proteins will interact. So here we’ve got a problem,
which is a docking problem. I’ve got two proteins. Say they’re of known
structure, but I’ve never seen them interact
with each other. So how do they come together? Which faces of the proteins are
interacting with each other? That’s called a docking problem. And if I wanted to try to
systematically figure out whether Protein A and Protein
B interact with each other, I would have to do a search
over all possible conformations, right? Then I could use
the energy functions to try to predict which
one has the lowest energy. But it actually would be
a computationally very inefficient way to do things. So we could imagine we
wanted to solve this problem. For each potential
partner, we could evaluate all relative
positions and orientations. Then, when they come together
we can’t just rely on that, but as we’ve seen
several times now we’re going to have to do local
conformational changes to see how they fit together for
each possible docking. And then, once we've done that, we can say, OK, which of these has the lowest energy of interaction?
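A brute-force version of that search is easy to write down, and writing it down also makes the cost obvious. This sketch (pure NumPy/SciPy, with a placeholder scoring function instead of a real energy function, and no conformational refinement) enumerates rigid rotations and translations of one partner around the other and keeps the lowest-scoring pose.

    # Brute-force rigid-body docking sketch: enumerate rotations and translations
    # of one partner around the other, score each pose, keep the best.
    # The coordinates are random and the scoring function is a crude placeholder.
    import itertools
    import numpy as np
    from scipy.spatial.transform import Rotation

    def placeholder_score(receptor_xyz, ligand_xyz):
        """Stand-in for an interaction energy: penalize clashes, reward contacts."""
        d = np.linalg.norm(receptor_xyz[:, None, :] - ligand_xyz[None, :, :], axis=-1)
        clashes = np.sum(d < 2.5)
        contacts = np.sum((d >= 2.5) & (d < 5.0))
        return 10.0 * clashes - contacts   # lower is better

    rng = np.random.default_rng(0)
    receptor = rng.random((50, 3)) * 20.0   # 50 fake "atoms"
    ligand = rng.random((30, 3)) * 20.0

    angles = np.arange(0, 360, 60)          # coarse 60-degree grid per Euler angle
    shifts = np.arange(-10, 11, 5)          # coarse 5-angstrom translation grid

    best_score, best_pose = np.inf, None
    for euler in itertools.product(angles, repeat=3):
        rotated = Rotation.from_euler("xyz", euler, degrees=True).apply(
            ligand - ligand.mean(axis=0))
        for shift in itertools.product(shifts, repeat=3):
            pose = rotated + receptor.mean(axis=0) + np.array(shift)
            score = placeholder_score(receptor, pose)
            if score < best_score:
                best_score, best_pose = score, (euler, shift)

    # Even this coarse grid is 6^3 rotations x 5^3 translations = 27,000 poses, and
    # each pose would really need local refinement and a real energy evaluation.
    print("best placeholder score:", best_score, "at", best_pose)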
So that, obviously, is going to be too computationally intensive to do on a large scale. It could work very
well if you’ve got a particular pair
of proteins that you need to study. But on a big scale, if
we wanted to predict all possible interactions,
we wouldn’t really be able to get very far. So what people typically do is
use other kinds of information to reduce the search space. And what we’ll see
in the next lecture, then, are different ways
to approach this problem. Now, one question we
should ask is, what role is structural homology
going to play? Should I expect that
any two proteins that interact with each
other– let’s say that that Protein A and
I know its interactors. So I’ve got A known to
interact with B. Right? So I know this interface. And now I have
protein C, and I’m not sure if it interacts or not. Should I expect the interface
of C, that touches A, to match the interface of B? Should these be homologous? And if not precisely
homologous, then are there properties that we can expect to be similar between them? So there are different approaches we can take. And there are
certainly cases where you have proteins that interact
with a common target that have no overall structural
similarity to each other but do have local
structural similarity. So here’s an example
of subtilisin, which is shown in light
gray, and pieces of it that interact with the
target are shown in red. So here are two proteins that
are relatively structurally homologous– they interact
at the same region. That's not too surprising. But here's a subtilisin
inhibitor that has no global structural
similarity to these two proteins, and yet its
interactions with subtilisin are quite similar. So we might expect, even if
C and B don’t look globally anything like each
other, they might have this local similarity. OK, actually I think we’d
like to turn back your exams. So maybe I’ll stop here. We’ll return the
exams in the class, and then we’ll pick up at this
point in the next lecture.
