Evolution in a nutshell: an alternative outline of evolution and some consequences concerning valuations, by Gregor Kjellström
6 Normal (Gaussian) adaptation

6.1 Normal adaptation in one parameter; high school level.

In its simplest form, the theorem of normal adaptation may be proved with fairly simple mathematics at the high school level. A more general proof is given in section 6.2, but it will already be clear here that evolution carries out a simultaneous maximization of two collective parameters: mean fitness and average information. But first some definitions:

Definition 6.1. $s(x)$ is the probability that the individual having the parameter value $x$ will be selected as a parent of new individuals in the population.

In contrast to earlier models of evolution (see Introduction), where the gene is supposed to be the unit of selection, this model relies on the premise that selection primarily takes place at the individual level. A measure of the fitness of the individual is therefore needed, because otherwise the selection of individuals would not be possible. The advantage is that $s(x)$ in principle takes into account all interactions between genes and all contemporary environmental factors that may affect it, including a stroke of lightning. The phenotype of the adult individual is seen as the result of a DNA message and some standard environment. Deviations from this standard are regarded as a change of $s(x)$. Thus, since we do not presume any particular structure or extension of $s(x)$, the environmental factors have been ignored.

A disadvantage is that $s(x)$ is a relative measure that can hardly be known for any single individual; it can only be estimated for an individual cloned in very large numbers. Nevertheless, it can be used in mathematical models to examine certain properties of the evolutionary process.
Even if $s(x)$ of a particular individual is uncertain, the behaviour of mean fitness, determined over a large set of individuals, may still be examined with fairly good reliability. Even if this approach is of little practical use, it may have some philosophical implications. It also makes it possible to apply the theory of information, which is an abstract theory of probabilities only.

Definition 6.2.

$$N(m - x) = C \exp\left( -\frac{(m - x)^2}{2\sigma^2} \right)$$

is a normal probability density function (p. d. f.) of the parameter $x$ in the population. $C$ is a constant chosen so that $\int_{-\infty}^{\infty} N(m - x)\,dx = 1$. The mean value of $x$ in $N(m - x)$ is $m$, and $\sigma$ is its standard deviation.

6.1.1 Maximization of mean fitness

Then

$$P(m) = \int s(x)\, N(m - x)\,dx$$

is an exact measure of mean fitness in the population. A limitation of this model is that it is valid only for a normally distributed parameter in an infinitely large population. But in a population with millions of individuals, the approximation may be fairly good.

Let us now find out under what circumstances $P$ becomes maximal. We thus want to find the derivative $dP(m)/dm$ and set it equal to zero. We can differentiate under the integral sign, and the derivative of the exponential is the exponential itself times the derivative of its exponent, so that $dN(m - x)/dm = \sigma^{-2}(x - m)\,N(m - x)$. We get

$$\frac{dP(m)}{dm} = \sigma^{-2} \int (x - m)\, s(x)\, N(m - x)\,dx$$

$$= \sigma^{-2} \int x\, s(x)\, N(m - x)\,dx - \sigma^{-2}\, m \int s(x)\, N(m - x)\,dx$$

$$= \sigma^{-2}\, P\, (m^* - m) = 0,$$

where we have also introduced the mean $m^*$ over the set of selected individuals:

$$m^* = \frac{\int x\, s(x)\, N(m - x)\,dx}{\int s(x)\, N(m - x)\,dx}.$$

As long as $P > 0$ (otherwise we are extinct), a necessary condition for $P(m)$ to be maximal is that $m^*$ becomes equal to $m$. Because all factors that may have an impact on survival (for example egoism, altruism, genes, lightning, earthquakes, etcetera) are in principle included in $s(x)$, the condition is very generally valid. The same condition may also be extended to any number of parameters.
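The relation $dP/dm = \sigma^{-2} P (m^* - m)$ can be checked numerically. The following is only a minimal sketch under stated assumptions: the fitness function `s` (a smooth peak at $x = 1$), the integration range, the step counts and the starting value of `m` are all hypothetical choices made for illustration and do not come from the text. The script compares the formula with a central finite difference of $P(m)$, and then replaces $m$ by $m^*$ repeatedly to see where the condition $m^* = m$ is reached for this particular $s(x)$.

```python
import math

def normal_pdf(x, m, sigma):
    """Normal density N(m - x) of Definition 6.2, with C = 1/(sigma*sqrt(2*pi))."""
    return math.exp(-(m - x) ** 2 / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def s(x):
    """Hypothetical selection probability (an assumption): a smooth peak at x = 1."""
    return 0.9 * math.exp(-(x - 1.0) ** 2)

def moments(m, sigma, lo=-12.0, hi=12.0, steps=2400):
    """Return (P, m_star) using trapezoidal integration over [lo, hi]."""
    dx = (hi - lo) / steps
    p = 0.0
    first = 0.0
    for k in range(steps + 1):
        x = lo + k * dx
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoid end-point weights
        f = weight * s(x) * normal_pdf(x, m, sigma) * dx
        p += f
        first += x * f
    return p, first / p

m, sigma, h = -1.0, 1.0, 1e-4

# Gradient from the formula sigma^-2 * P * (m* - m)
P, m_star = moments(m, sigma)
grad_formula = P * (m_star - m) / sigma ** 2

# Gradient from a central finite difference of P(m)
grad_numeric = (moments(m + h, sigma)[0] - moments(m - h, sigma)[0]) / (2.0 * h)
print("formula:", grad_formula, "finite difference:", grad_numeric)

# Replace m by m* repeatedly; the loop settles where m* = m
for _ in range(40):
    m = moments(m, sigma)[1]
m_equilibrium = m
print("equilibrium m:", m_equilibrium)
```

For this particular choice of `s`, the equilibrium lies at the peak of the fitness function; with a different or time-varying $s(x)$ the fixed point would of course differ.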
Assuming random mating in proportion to $s(x)$, evolution strives to fulfill the condition $m^* = m$, because the gene frequencies among the offspring equal those among their parents. That is to say, the random process lacks prejudices and has no compass advising the direction; only $s(x)$ can have an impact on $m^*$. Therefore evolution strives toward a selective equilibrium where $m^* = m$, even though $s(x)$ changes with time.

Suppose now that we have found a placement of $m$ that makes $P$ maximal. If $m$ is moved slightly in an arbitrary direction, then $P$ will decrease, but it may be recovered if the standard deviation $\sigma$ of $N$ is slightly decreased, which is the same as decreasing the average information of $N$ (see theorem 5.2). Therefore we may say that the mean fitness and the average information (disorder) are simultaneously maximized. For a general proof see section 6.2.

If $N$ deviates from normal, then the condition of optimality, $m^* = m$, is no longer valid and $P$ can no longer be maximized. In a computer simulation, however, $P$ can still be maximized if the weight of certain individuals in the distribution is changed. But the natural process can hardly do such a thing. Instead it has to increase the number of certain more valuable individuals and decrease the number of less valuable ones, which is the same as changing the distribution; and if the result is not normal, it has to be changed again, and so on. Of course, this situation is absurd; it contradicts the original task. See also section 3.1.

6.2 Normal adaptation in many parameters; university level.

This chapter provides more general mathematical support for the pop-model and for the statement, presented earlier, that evolution strives toward a simultaneous maximization of the collective parameters mean fitness and average information (disorder). It is assumed that a sufficiently important and large number of morphological (and even mental) parameters are normally distributed in a large population.
Thus our model may be seen as a statistical second order approximation of the real process. The problem of speciation will be ignored.

Definition 6.3. The fitness of the individual has been defined by Hartl as the probability $s(x)$ that the individual having the $n$ characteristic parameters $x^T = (x_1, x_2, \ldots, x_n)$, where $x^T$ is the transpose of $x$, will survive, i. e. become selected as a parent of new individuals in the progeny.

Definition 6.4. The genetic value landscape due to Eigen is entirely defined by $s(x)$. This is also in contrast to the fitness surface defined by Wright (see Ridley), which relies on the fitness of genes.

Definition 6.5. The region of acceptability, $A$, is defined as the set of all $x$ satisfying the condition $s(x) = q$, where $q$ is a scalar constant with $0 < q \le 1$; $s(x) = 0$ for all $x$ outside $A$. More generally, $s(x)$ may be replaced by $A$. This is possible by dividing the parameter space into a lattice of small cells so that some fraction (equal to $s(x)$) of each cell, having the centre $x$, belongs to $A$, while the other part does not. Then, as the mesh size of the lattice $\to 0$, $A$ may approximate any $s(x)$ to any degree of precision.

Definition 6.6. The mean fitness of a large population is defined over the set of individuals as

$$P(m) = \int s(x)\, N(m - x)\,dx = \int_A N(m - x)\,dx,$$

where $N(m - x)$ is the p. d. f. of parameters in the population and $m$ is the mean of $N$.

Definition 6.7.

$$N(m - x) = (2\pi)^{-n/2} (\det M)^{-1/2} \exp\left\{ -\tfrac{1}{2} (m - x)^T M^{-1} (m - x) \right\}$$

is a normal p. d. f. with mean $m$ $(= [m_i])$ and moment matrix $M$ $(= [m_{ij}])$.

Definition 6.8. The mean of parameter values (first order moments) of the set of selected parents in a large population is defined as

$$m^* = [m_i^*] = \frac{\int x\, s(x)\, N(m - x)\,dx}{\int s(x)\, N(m - x)\,dx} = \frac{1}{P} \int x\, s(x)\, N(m - x)\,dx.$$

Definition 6.9. The second order moments of the set of selected parents in a large population are defined as $M^* = [m_{ij}^*]$, with elements
$$m_{ij}^* = \frac{1}{P} \int (m_i - x_i)(m_j - x_j)\, s(x)\, N(m - x)\,dx.$$

6.2.1 The theorems of normal adaptation

In this section it will be shown that the mean fitness $P$ of a normal p. d. f. over an arbitrary landscape $s(x)$ may be maximized by fulfilling the condition $m^* = m$. The average information or disorder of a normal p. d. f. will also be maximized, while keeping $P$ at some suitable level, if the conditions $m^* = m$ and $M^*$ proportional to $M$ are satisfied. Earlier versions of the theorems may be found in Kjellström & Taxén, 1981.

Assuming random mating and that the gene frequencies among the offspring equal those among their parents, it follows that $m$ moves to $m^*$ in every generation. Therefore the process will strive toward a state of equilibrium where $m^* = m$. The condition $M^*$ proportional to $M$ seems to be more difficult to fulfill, which may be unfavorable while the process is climbing a mountain crest in the genetic landscape. In this case it is important that the offspring from two parents spread out along the crest, avoiding the steep slopes at the sides. But, as shown earlier (Kjellström, 1996), the condition may be approximately fulfilled under certain circumstances, for instance if consecutive evolutionary steps are stored consecutively along a chromosome. On the other hand, if they are randomly distributed along the chromosomes there will still be an adaptation of $M$, but it is not as good. Nevertheless, the efficiency of the process may probably be increased by many orders of magnitude even in the latter case.

Now, let us see what the average information of a normal p. d. f. should look like.

Theorem 6.2.1. The average information of a normal p. d. f. is

$$H = \log \sqrt{(2\pi e)^n \det(M)}.$$

Proof: As shown earlier by theorem 5.2, $H$ is equal to $\log\{ (2\pi e)^{n/2}\, \sigma_1 \sigma_2 \cdots \sigma_n \}$ in the case of statistically independent parameters.
But since the product of the standard deviations is then equal to $\det(M)^{1/2}$, and arbitrary orthogonal rotations of the coordinate system do not affect the average information, it follows that the average information is always equal to $H = \log \sqrt{(2\pi e)^n \det(M)}$.

Theorem 6.2.2. The gradient of the mean fitness of a normal p. d. f. with respect to $m$ is equal to $M^{-1} P\, (m^* - m)$. The necessary condition for maximum mean fitness is $m^* = m$.

Proof: $P(m) = \int s(x)\, N(m - x)\,dx$. Since differentiation under the integral sign is allowed here, we get

$$\frac{\partial P(m)}{\partial m_j} = \int s(x)\, \frac{\partial N(m - x)}{\partial m_j}\,dx = -\int s(x)\, N(m - x)\, \frac{u_j^T M^{-1} (m - x) + (m - x)^T M^{-1} u_j}{2}\,dx,$$

where the components of the vector $u_j$ are $0$, except for component number $j$, which is $1$. Thus we have

$$\mathrm{grad}_m P(m) = -\int s(x)\, N(m - x)\, M^{-1} (m - x)\,dx = -P\, M^{-1} \left\{ m - \frac{1}{P} \int x\, s(x)\, N(m - x)\,dx \right\} = P\, M^{-1} (m^* - m),$$

where we have introduced the mean of phenotypes of the set of selected parents,

$$m^* = [m_j^*] = \frac{\int x\, s(x)\, N(m - x)\,dx}{\int s(x)\, N(m - x)\,dx} = \frac{1}{P} \int x\, s(x)\, N(m - x)\,dx,$$

which proves the theorem.

Compared with Fisher's fundamental theorem, this result seems more reliable, because in a state of selective equilibrium we have $m^* = m$ and consequently no increase in $P$, i. e. $\mathrm{grad}_m P(m) = 0$, but the phenotypic variance, displayed by $M$ and $H$, need not be equal to zero. Instead, $P$ has been maximized and, as will be shown by the next theorem, $\det(M)$ and $H$ are simultaneously maximized with respect to variations in $m$, keeping $P$ constant, even though this is only a sub-optimal solution as long as the condition $M$ proportional to $M^*$ is not utilized.

A correspondence to Fisher's increase in mean fitness may now be derived from Gaussian adaptation. In this case the increase is defined from the offspring in one generation to the offspring in the next (it is assumed that $M$ is fairly constant from one generation to the next).
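We have

$$\Delta P = \frac{\partial P}{\partial m_1} \Delta m_1 + \frac{\partial P}{\partial m_2} \Delta m_2 + \cdots + \frac{\partial P}{\partial m_n} \Delta m_n.$$

If the Gaussian is moved from $m$ to $m^*$, we get the approximation

$$\Delta P = P\, (m^* - m)^T M^{-1} (m^* - m).$$

Theorem 6.2.3. A normal p. d. f. may be adapted for maximum average information to any $A$ or $s(x)$ at any given value of $P = a < q$ (= the maximum attainable value of $P$). The necessary conditions for the maximum are $m^* = m$ and $M^*$ proportional to $M$.

Proof. Introduce the Lagrangian function

$$F(m, P, \Lambda) = \log \sqrt{(2\pi e)^n \det(M)} + \gamma\, (P - a) + \sum_{i > j} \gamma_{ij} (\lambda_{ij} - \lambda_{ji}),$$

where $\Lambda = M^{-1} = [\lambda_{ij}]$; $\gamma$ and $\gamma_{ij}$ are Lagrange multipliers; and

$$P = \int_A N(m - x)\,dx, \qquad N(m - x) = (2\pi)^{-n/2} \det(\Lambda)^{1/2} \exp(Q), \qquad Q = -\tfrac{1}{2} (m - x)^T \Lambda\, (m - x).$$

We will also use $\det(\Lambda) = \sum_k \lambda_{ik} c_{ik}$, $k = 1, 2, \ldots, n$, where $c_{ik}$ is the cofactor of $\lambda_{ik}$. Also $m_{ij} = \det(\Lambda)^{-1} c_{ij}$ and $m_{ij} = m_{ji}$.

Differentiating $F$ with respect to $\lambda_{ij}$ gives

$$\frac{\partial F}{\partial \lambda_{ij}} = -\frac{1}{2} (\det \Lambda)^{-1} \frac{\partial (\det \Lambda)}{\partial \lambda_{ij}} + \gamma\, \frac{\partial P}{\partial \lambda_{ij}} + \Psi_{ij},$$

where $\Psi_{ij} = \gamma_{ij}$ if $i > j$, $\Psi_{ij} = 0$ if $i = j$, and $\Psi_{ij} = -\gamma_{ji}$ if $i < j$. The ingredients are

$$\frac{\partial (\det \Lambda)}{\partial \lambda_{ij}} = c_{ij}, \qquad \frac{\partial P}{\partial \lambda_{ij}} = \int_A \frac{\partial N(m - x)}{\partial \lambda_{ij}}\,dx,$$

$$\frac{\partial N(m - x)}{\partial \lambda_{ij}} = (2\pi)^{-n/2} \left[ \det(\Lambda)^{1/2} \exp(Q)\, \frac{\partial Q}{\partial \lambda_{ij}} + \tfrac{1}{2} \det(\Lambda)^{-1/2}\, \frac{\partial (\det \Lambda)}{\partial \lambda_{ij}} \exp(Q) \right],$$

$$\frac{\partial Q}{\partial \lambda_{ij}} = -\tfrac{1}{2} (m_i - x_i)(m_j - x_j).$$

Thus we have

$$\frac{\partial N(m - x)}{\partial \lambda_{ij}} = \tfrac{1}{2}\, N(m - x) \left[ -(m_i - x_i)(m_j - x_j) + m_{ij} \right],$$

$$\frac{\partial F}{\partial \lambda_{ij}} = -\frac{m_{ij}}{2} + \gamma \int_A \tfrac{1}{2}\, N(m - x) \left[ -(m_i - x_i)(m_j - x_j) + m_{ij} \right] dx + \Psi_{ij} = -\frac{m_{ij}}{2} + \frac{\gamma P\, (m_{ij} - m_{ij}^*)}{2} + \Psi_{ij},$$

where

$$m_{ij}^* = \frac{\int_A N(m - x)\, (m_i - x_i)(m_j - x_j)\,dx}{\int_A N(m - x)\,dx}.$$

Since $M$ and $M^*$ are both symmetric, the antisymmetric terms $\Psi_{ij}$ drop out, and the maximizing condition is

$$-\frac{m_{ij}}{2} + \frac{\gamma P\, (m_{ij} - m_{ij}^*)}{2} = 0, \qquad \text{i. e.} \qquad m_{ij} = m_{ij}^*\, \frac{\gamma P}{\gamma P - 1},$$

or $M^*$ proportional to $M$.

Differentiating $F$ with respect to $m_i$ gives

$$\frac{\partial F}{\partial m_i} = \gamma\, \frac{\partial P}{\partial m_i},$$

which means that the gradient of $F$ with respect to $m$ points in the same direction as the gradient of $P$. It follows from theorem 6.2.2 that the maximizing condition of $P$ with respect to $m$ is $m^* = m$. Thus average information is maximized by fulfilling the same condition, and the theorem is proved.

Theorem 6.2.4. A normal distribution has the highest average information as compared to other distributions having the same moment matrix $M$.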
Proof: Because it is required that $\int x^2 N(x)\,dx = \sigma^2$ and that $\int N(x)\,dx = 1$, we introduce the Lagrangian function

$$F(N(x), x) = \int \left\{ -N(x) \log[N(x)] + a\, x^2 N(x) + b\, N(x) \right\} dx = \text{extremum},$$

with $a$ and $b$ as Lagrange multipliers. This is a problem within the calculus of variations, and we could use the Euler-Lagrange differential equation. But in this case $F$ is independent of $\partial N(x)/\partial x$, and it is therefore sufficient to differentiate with respect to $N$ and set the differential equal to zero. We have

$$\delta F = \int \left\{ -\log[N(x)] - 1 + a\, x^2 + b \right\} \delta N\,dx = 0.$$

Now assume that the constants $a$ and $b$ have the values $a = -1/(2\sigma^2)$ and $b = \log[(2\pi)^{-1/2} \sigma^{-1}] + 1$. Then we have

$$\delta F = \int \left\{ -\log[N(x)] - 1 - \frac{x^2}{2\sigma^2} + \log[(2\pi)^{-1/2} \sigma^{-1}] + 1 \right\} \delta N\,dx = 0.$$

In order to guarantee that the integral vanishes for all variations $\delta N$, the expression in braces must be zero for all $x$, so we must have

$$\log[N(x)] = \log[(2\pi)^{-1/2} \sigma^{-1}] - \frac{x^2}{2\sigma^2},$$

and thus $N(x)$ must be normal, i. e.

$$N(x) = (2\pi)^{-1/2} \sigma^{-1} \exp\left( -\frac{x^2}{2\sigma^2} \right).$$

The reason for showing an interest in theorem 6.2.4 is that in a technical application the moment matrix, $M$, is estimated, and it is therefore of interest to find the distribution having the greatest average information assuming $M$ given. But in the natural case this argument is not equally plausible.
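Theorem 6.2.4 can be illustrated in one dimension by comparing closed-form average information (differential entropy, in nats) for a few densities that share the same variance. This is only a sketch: the uniform and Laplace densities are comparison cases chosen here for illustration and are not discussed in the text, and the value of `sigma` is arbitrary. The entropy expressions used are the standard ones for these three densities.

```python
import math

def entropy_normal(sigma):
    """Normal density with standard deviation sigma: log sqrt(2*pi*e*sigma^2)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)

def entropy_uniform(sigma):
    """Uniform density of width w has variance w^2/12 and entropy log(w)."""
    width = sigma * math.sqrt(12.0)
    return math.log(width)

def entropy_laplace(sigma):
    """Laplace density with scale b has variance 2*b^2 and entropy 1 + log(2*b)."""
    b = sigma / math.sqrt(2.0)
    return 1.0 + math.log(2.0 * b)

sigma = 1.5  # arbitrary illustrative standard deviation (an assumption)
h_normal = entropy_normal(sigma)
h_laplace = entropy_laplace(sigma)
h_uniform = entropy_uniform(sigma)
print("normal:", h_normal, "Laplace:", h_laplace, "uniform:", h_uniform)
```

With the variance held fixed, the normal density gives the largest of the three values, in line with the theorem. A comparison of three densities is of course not a proof; it only makes the statement concrete.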