How to Explore Morphological Integration in Human Evolution and Development? Philipp Mitteroecker

Evol Biol (2012) 39:536–553
DOI 10.1007/s11692-012-9178-3
How to Explore Morphological Integration in Human Evolution
and Development?
Philipp Mitteroecker • Philipp Gunz • Simon Neubauer • Gerd Müller
Received: 6 February 2012 / Accepted: 5 April 2012 / Published online: 28 April 2012
© Springer Science+Business Media, LLC 2012
Abstract Most studies in evolutionary developmental
biology focus on large-scale evolutionary processes using
experimental or molecular approaches, whereas evolutionary quantitative genetics provides mathematical models of
the influence of heritable phenotypic variation on the short-term response to natural selection. Studies of morphological
integration typically are situated in-between these two
styles of explanation. They are based on the consilience of
observed phenotypic covariances with qualitative developmental, functional, or evolutionary models. Here we review
different forms of integration along with multiple other
sources of phenotypic covariances, such as geometric and
spatial dependencies among measurements. We discuss one
multivariate method [partial least squares analysis (PLS)] to
model phenotypic covariances and demonstrate how it can
be applied to study developmental integration using two
empirical examples. In the first example we use PLS to study
integration between the cranial base and the face in human
postnatal development. Because the data are longitudinal,
we can model both cross-sectional integration and integration of growth itself, i.e., how cross-sectional variance and
covariance are actually generated in the course of ontogeny.
We find one factor of developmental integration (connecting
facial size and the length of the anterior cranial base) that is
highly canalized during postnatal development, leading
to decreasing cross-sectional variance and covariance.
A second factor (overall cranial length to height ratio) is less
canalized and leads to increasing (co)variance. In a second
example, we examine the evolutionary significance of these
patterns by comparing cranial integration in humans to that
in chimpanzees.
Keywords Canalization • Cranial growth • Developmental integration • Modularity • Morphometrics • Partial least squares analysis
P. Mitteroecker (✉) • G. Müller
Department of Theoretical Biology, University of Vienna,
Althanstrasse 14, 1090 Vienna, Austria
e-mail: [email protected]
P. Gunz • S. Neubauer
Department of Human Evolution, Max Planck Institute
for Evolutionary Anthropology, Deutscher Platz 6, 04103
Leipzig, Germany
A central theme in evolutionary developmental biology
(EvoDevo) is the influence of the developmental system—
the processes by which genotype translates into phenotype—on evolutionary change (e.g., Raff 1996; Wagner
2000; Arthur 2002; Müller 2007). EvoDevo studies often
focus on large-scale evolutionary processes, such as the
emergence of novel anatomical structures or of entire body
plans, and on how development constrains or drives these
processes. In parallel, there is a long-standing tradition in
evolutionary quantitative genetics to model the influence of
heritable phenotypic variation—which largely is determined by the developmental system—on the response to
natural selection (e.g., Fisher 1930; Lande 1979; Arnold
et al. 2001). While EvoDevo usually aims at qualitative,
causal explanations, quantitative genetics provides a set of
formal mathematical models. Studies performed under the
heading of morphological integration or phenotypic integration typically are situated in-between these two styles of
explanation. Observed phenotypic variances and covariances are interpreted in terms of (qualitative) developmental or functional models, and evolutionary inferences
are derived from the observed patterns (e.g., Chernoff and
Magwene 1999; Pigliucci and Preston 2004; Mitteroecker
and Bookstein 2007; Hallgrimsson et al. 2009). Conclusions mostly are not based on formal models, but on the
“consilience” (Wilson 1998; Bookstein in press) of multiple lines of evidence, both quantitative and qualitative.
In the early 20th century, pioneers such as D’Arcy
Thompson, Sewall Wright, and Paul Terentjev developed
ingenious approaches to study the integration of morphological traits. Thompson (1917) considered inter-species
differences of complex anatomical structures as relatively
simple—hence structurally integrated—geometric transformations, whereas Terentjev (1931) and Wright (1932)
devised hierarchical statistical models to explain phenotypic covariances within a population. In 1958, at a time
when most scientists focused on the evolution of single
isolated traits, the two paleontologists Everett Olson and
Robert Miller published their influential book “Morphological Integration,” in which they emphasized developmental and functional dependencies among traits and their
resulting coevolution. Olson and Miller’s statistical and
conceptual approaches, however, were relatively simple—
as were those of Berg (1960), who continued Terentjev’s
work in botany. Based on extensive plant breeding and
crossing experiments, Jens Clausen and colleagues (e.g.,
Clausen and Hiesey 1960) computed an array of phenotypic correlations and interpreted them in a thoughtful
genetic context. In the 1980s, Jim Cheverud, Miriam Zelditch, and others (e.g., Cheverud 1982, 1989; Zelditch
1987, 1988; Cheverud et al. 1989) raised renewed interest
in morphological integration by applying novel statistical
techniques to primate and rodent morphology. By connecting morphological integration to the emerging concepts of developmental and genetic modularity, it became
part of contemporary EvoDevo theory and evolutionary
quantitative genetics (e.g., Lande 1980; Bonner 1988;
Cheverud 1982, 1996a, b; Raff 1996; Wagner and Altenberg 1996). Advances in geometric morphometrics and
multivariate statistics have led to another series of publications on morphological integration in the new millennium (e.g., Rohlf and Corti 2000; Klingenberg and Zaklan
2000; Bookstein et al. 2003; Klingenberg et al. 2003;
Hallgrimsson et al. 2004, 2006; Bastir and Rosas 2005;
Monteiro et al. 2005; Gunz and Harvati 2007; Mitteroecker
and Bookstein 2008). All these approaches share the focus
on inter-dependences between measured traits at various
causal and statistical levels, which are interpreted within a
developmental, functional, or evolutionary context.
Most work in contemporary EvoDevo is experimental
and at the molecular level, whereas empirical quantitative
genetic research requires large-scale breeding experiments
to reliably estimate genetic variances and covariances.
Therefore, studies of morphological integration, which can
make use of adult or postnatal individuals and concentrate
on phenotypic instead of genetic covariances, are ideal to
address EvoDevo questions in anthropology and primatology.
In this paper we aim to place the study of morphological
integration in a contemporary biological and biometric
context. We describe a multivariate statistical approach to
explore patterns of integration and apply it to study morphological integration of cranial growth in humans and
chimpanzees. As the line separating an insightful study of
morphological integration from an ad hoc story is relatively
thin, we start with an outline of the conceptual framework (the biometrics of morphological integration),
determining how statistics and biology must meet in order
to arrive at a successful consilience of the two kinds of
The Biometrics of Morphological Integration
Where do Covariances Come From?
The parts of an organism develop in a coordinated way.
Adjacent elements of complex anatomical structures, such
as the cranium, physically interact during development to
form a tightly integrated adult phenotype. Many growth
factors and signaling molecules affect different tissues and
body parts, and thus mechanistically link these parts in the
course of their development. Likewise, signaling cascades
and induction processes interconnect different body parts
during development. These processes have been termed
developmental integration and referred to as “individual-level integration” by Cheverud (1996a, b); the underlying
mechanisms are rooted in individual development and can
be studied experimentally (compare also Needham’s 1933
approach to “dissociability” in development). Variation of
such integrated developmental processes in a sample of
different individuals induces a covariance between the
phenotypic traits affected by these processes.
A related concept in genetics, exactly a century old, is
pleiotropy: the effect of genes (or of mutations of these
genes) on multiple traits (e.g., Hodgkin 1998; Wagner and
Zhang 2011). Pleiotropy can result from multiple molecular functions of a gene product, from the expression of a
gene in multiple tissues, and from the chemical and
mechanical integration of developmental processes. Allelic
variation of a pleiotropic gene induces covariance between
the affected traits. Genotype-phenotype maps are graphical
or mathematical representations of the relationship
between a set of (pleiotropic) genes and a set of phenotypic
traits; they are frequently used in theoretical studies to
represent integration due to pleiotropic genes and to compute the induced phenotypic covariances (e.g., Wagner and
Altenberg 1996; Stadler and Stadler 2006; Mitteroecker
and Bookstein 2007; Pavlicev and Hansen 2011; see also
Fig. 1).
Fig. 1 a A path model of a simple genotype-phenotype map,
illustrating the linear effects of two local growth factors A and B on
the phenotypic traits V1…V6. When both factors have unit variance,
the phenotypic covariances are given by the products of the
corresponding path coefficients, e.g., Cov(V1, V2) = 0.42 × 0.28 =
0.12. Because of the modular genotype-phenotype map, the two
groups of variables V1…V3 and V4…V6 are uncorrelated; they are
variational modules. b A genotype-phenotype map with two pleiotropic factors C, D. Phenotypic covariances are given by the sums of
the covariances induced by C and D, e.g., Cov(V1, V2) = 0.3 ×
0.2 + 0.3 × 0.2 = 0.12. Note that the covariances between V1…V3
and V4…V6 cancel out so that the variables have the same modular
covariance structure as in (a), even though the genotype-phenotype
map is not modular. c Another simple but slightly more realistic
genotype-phenotype map, consisting of one global and two local
factors, such as in Wright’s 1932 model. Phenotypic covariances
reflect the local or modular growth factors only if all path coefficients
are approximately equal (Mitteroecker and Bookstein 2007)
A major, often dominating component of phenotypic
covariation is related to allometry, the effect of overall size
on organismal shape (Huxley 1932; Bookstein 1991; Gould
1977; Klingenberg 1998). Whenever size varies, allometry
induces phenotypic covariances (both in ontogenetic samples and in samples of adult specimens). Individual differences in body size are due in large part to differences in
the timing of growth and development. In primates, body
size is mainly determined by the amount and duration of
the expression of growth hormones during postnatal
development and by the onset of steroid hormone expression during puberty (e.g., Bogin 1999). Apart from these
highly pleiotropic growth factors, covariances in a population due to allometry do not necessarily reflect developmental integration. Two body parts under completely
independent genetic and developmental control would still
covary in a population if the amount or duration of overall
growth varies (they would be uncorrelated only after statistically controlling for overall size; see below).
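The allometric effect described above can be illustrated with a small simulation. This is only a minimal sketch and not part of the original analysis: the additive model, the names `growth`, `part1`, and `part2`, and all numerical values are assumptions chosen for illustration. The shared growth variable is also treated as directly observed, which real data rarely allow.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Hypothetical shared variation in the amount or duration of overall growth.
growth = rng.normal(0.0, 2.0, n)
# Two parts with independent local variation (no developmental link beyond growth).
part1 = growth + rng.normal(0.0, 1.0, n)
part2 = growth + rng.normal(0.0, 1.0, n)

raw_corr = np.corrcoef(part1, part2)[0, 1]

def residuals(y, x):
    """Residuals of a least-squares regression of y on x (with intercept)."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

# Correlation after statistically controlling for the shared growth variable.
partial_corr = np.corrcoef(residuals(part1, growth), residuals(part2, growth))[0, 1]

print(f"raw correlation: {raw_corr:.2f}")
print(f"correlation after controlling for growth: {partial_corr:.2f}")
```

Under these assumptions the raw correlation is high although the two parts share nothing except overall growth, and it largely disappears once growth is regressed out.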
In addition to developmental integration and pleiotropy,
phenotypic covariances in a population can be due to linkage
disequilibrium, the non-random association of alleles at
two or more loci affecting different traits. Linkage disequilibrium can result from genetic linkage, the co-inheritance of genes due to their physical proximity on a
chromosome. Among other factors such as non-random
mating and population structure, linkage disequilibrium
can also result from correlational selection. When several
anatomical elements are jointly involved in a particular
function, their dimensions usually need to fit together
tightly (consider, e.g., the bony and cartilaginous elements
of a joint such as the knee). This functional integration
leads to correlational selection, i.e., selection for particular
character combinations, which in turn leads to the covariance of traits within a population, even if they are not
linked developmentally. Likewise, a joint function of traits
during development can result in prenatal correlational
selection (internal selection). However, the contribution of
linkage disequilibrium to phenotypic covariances is small
compared to developmental integration and pleiotropy
unless correlational selection is strong and persisting
(Lande 1980, 1984; Lynch and Walsh 1998; Sinervo and
Svensson 2002).
Following the usual distinction in quantitative genetics
between genetic variation and environmental variation, one
can contrast genetic integration with environmental integration (Cheverud 1982, 1996a, b). Genetic integration is
the co-inheritance of traits, resulting from pleiotropy and
from linkage disequilibrium. Environmental integration is
the integration of phenotypic traits due to environmental,
non-heritable influences on development.
Covariances and correlations between traits are not only
determined by common developmental and genetic causes,
but also by the variance of the underlying growth processes
and by the allele frequencies of pleiotropic genes (e.g.,
Pigliucci 2006; Mitteroecker and Bookstein 2008; Hallgrimsson et al. 2009). If a pleiotropic factor does not vary
in a sample, it induces no covariance, even though the traits
are developmentally linked. Two species might share the
same pleiotropic growth factor (the same developmental
integration), but differ in the variance of this factor and
hence also in covariance. Reflecting Wagner and Altenberg’s (1996) distinction between variation and variability,
Hallgrimsson et al. (2009) thus defined integration as the
ability to covary, which is determined by the underlying
developmental factors; the manifestation as observable
phenotypic covariance depends on the variation of these
factors in a population.
Phenotypic variances and covariances in samples of adult
individuals are the result of variation in a vast array of
developmental processes. A covariance between two traits
close to zero can result from the absence of any developmental and genetic factors leading to integration (or from the
lack of variation in these factors), but two or more pleiotropic factors may also cancel out: some factors inducing a
positive covariance and some factors inducing a negative
covariance of the same amount would lead to statistically
(but not developmentally) independent traits (Fig. 1a, b).
More often, covariances of opposite sign may not cancel out
exactly but lead to a reduced total covariance, even though
many developmental or genetic factors may link the traits
(Clausen and Hiesey 1960; Houle 1991; Cheverud 1984;
Gromko 1995; Pigliucci 2006; Mitteroecker and Bookstein
2007; Mitteroecker 2009; Pavlicev and Hansen 2011).
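The cancellation of pleiotropic effects can be checked numerically. The following sketch assumes uncorrelated factors with linear effects, so that the induced covariance matrix is L diag(var) Lᵀ for a loading matrix L. Only the coefficients 0.42 and 0.28 are taken from the caption of Fig. 1. All other coefficients are hypothetical, and so are the function name `induced_cov` and the particular construction of the non-modular map.

```python
import numpy as np

def induced_cov(loadings, factor_var):
    """Covariance induced by uncorrelated factors: L diag(var) L^T.
    Trait-specific (unique) variance is ignored in this sketch."""
    return loadings @ np.diag(factor_var) @ loadings.T

a = np.array([0.42, 0.28, 0.35])  # effects on V1..V3; third value is hypothetical
b = np.array([0.30, 0.50, 0.40])  # effects on V4..V6; all values are hypothetical
zero = np.zeros(3)

# Modular map: factor A affects only V1..V3, factor B affects only V4..V6.
modular = np.column_stack([np.concatenate([a, zero]),
                           np.concatenate([zero, b])])
# Non-modular map: both factors affect all six traits,
# with opposite signs on V4..V6 for the second factor.
pleiotropic = np.column_stack([np.concatenate([a, b]),
                               np.concatenate([a, -b])]) / np.sqrt(2)

unit = np.ones(2)
cov_modular = induced_cov(modular, unit)
cov_pleiotropic = induced_cov(pleiotropic, unit)

print("same covariance structure:", np.allclose(cov_modular, cov_pleiotropic))
print("Cov(V1, V2):", round(cov_pleiotropic[0, 1], 2))

# If factor A does not vary in the sample, it induces no covariance among V1..V3.
cov_a_fixed = induced_cov(modular, np.array([0.0, 1.0]))
print("no covariance without variation:", np.allclose(cov_a_fixed[:3, :3], 0.0))
```

The last lines also illustrate the earlier point that the same developmental link yields different covariances depending on how much the underlying factor varies in the sample.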
In addition to all these biological factors, a further (and
often neglected) source of phenotypic covariances is the
nature of the measurements themselves. For example, in a
set of distance measurements between landmarks, distances
sharing the same start or end point necessarily correlate,
but no biological interpretation of this correlation is warranted. Likewise, size-corrected measurements, such as
distance ratios with the same denominator or Procrustes
shape coordinates, are geometrically dependent. Also the
spatial distribution of measurements affects the correlation
structure: closely adjacent measurements necessarily correlate more strongly than more distant measurements (e.g., Mitteroecker 2009; Huttegger and Mitteroecker 2011). For
example, Sawin et al. (1970) reported an approximately
linear decline in the correlation among dimensions of
rabbit bones with their spatial distance. An even more
fundamental difficulty is the definition of measurements or
phenotypic traits (particularly of distance measurements),
as pointed out by Wagner and Zhang (2011). For instance,
the length of the upper jaw (LU) and the length of the lower
jaw (LL) are highly correlated and developmentally integrated, but the mathematically equivalent variables “upper
jaw length plus lower jaw length” (LU + LL) and “difference between upper and lower jaw length” (LU − LL) are
uncorrelated. Phenotypic covariances thus cannot be
interpreted without reference to the generation of the
variables and their spatial and geometric dependencies.
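The jaw-length example rests on the identity Cov(LU + LL, LU − LL) = Var(LU) − Var(LL), so the sum and the difference are exactly uncorrelated when both lengths have the same variance. The sketch below verifies this for two hypothetical covariance matrices. The helper names and all numbers are assumptions for illustration only.

```python
import numpy as np

def transform_cov(cov):
    """Covariance matrix of (LU + LL, LU - LL), given that of (LU, LL)."""
    T = np.array([[1.0, 1.0],
                  [1.0, -1.0]])
    return T @ cov @ T.T

def corr_from_cov(cov):
    """Correlation between the two variables of a 2 x 2 covariance matrix."""
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

# Hypothetical covariance matrices of (LU, LL).
equal_var = np.array([[1.0, 0.9], [0.9, 1.0]])    # equal variances
unequal_var = np.array([[1.5, 0.9], [0.9, 1.0]])  # Var(LU) > Var(LL)

for name, cov in [("equal variances", equal_var), ("unequal variances", unequal_var)]:
    print(name,
          "| corr(LU, LL):", round(corr_from_cov(cov), 2),
          "| corr(sum, difference):", round(corr_from_cov(transform_cov(cov)), 2))
```

With unequal variances a correlation between sum and difference remains, which underlines how strongly the covariance pattern depends on the way the variables are defined.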
How to Interpret Phenotypic Covariances?
Olson and Miller (1958), like many other authors, interpreted high phenotypic correlations or covariances as evidence of developmental or functional integration between
traits, and low correlations as evidence of the absence of
integration. Based on this rationale, they defined q-sets as
sets of variables with high mutual correlations within one
set and low correlations between variables from different
q-sets. Terentjev (1931) and Berg (1960) referred to such
highly correlated sets of variables as correlation pleiades,
whereas in the more recent morphometric and quantitative
genetic literature they are called variational modules
(Wagner et al. 2007; Mitteroecker 2009; Wagner and
Zhang 2011). They are frequently interpreted as indications
of developmental modules (e.g., Klingenberg 2008).
Given the many possible origins of covariances listed
above, it should be evident that phenotypic covariances and
correlations cannot be taken as direct evidence for
developmental integration. In particular, this applies to low
covariances, which can result from different developmental
factors with opposite effects rather than from the absence
of any such factors (which is very unlikely in higher animals). Terentjev (1931) and Wright (1932) thus removed
estimates of pleiotropic factors from the data before
interpreting correlations as the result of local or modular
developmental processes (see also Hansen 2003; Mitteroecker and Bookstein 2007). Both arrived at a hierarchical
model of factors influencing phenotypic variation and
covariation: a nested arrangement of factors with different
pleiotropic ranges (Fig. 1c). Mitteroecker and Bookstein
(2007) showed that net phenotypic covariances reflect
developmental modularity as expected by Olson and Miller
only for size measurements (distances, volumes, etc.) and if
these factors induce almost isometric growth.
The multivariate estimation of factors that together
explain the observed variances and covariances makes much
more biometric sense than the interpretation of raw
covariances. The factors can be interpreted as regressions
of the variables on the (unmeasured) factor scores, that is, as
models of how an underlying growth factor affects phenotypic traits (the path coefficients in Fig. 1). They quantify the ability of phenotypic traits to covary, not the actual
covariance. All the loadings of one factor can be interpreted and visualized as a single spatial pattern.
There is a large body of statistical literature on exploratory and confirmatory factor analysis, but only a few
approaches have been applied to morphometric data.
Below, we describe one multivariate approach to study
morphological integration, two-block partial least squares
analysis, which turns out to be closely related to Wright’s
(1932) method.
How Does Integration Affect Evolution?
Developmental integration due to heritable pleiotropic
factors as well as genetic linkage leads to joint inheritance
(genetic integration) of trait values. Directional selection of
a trait A that is genetically correlated with a trait B will
induce an indirect response in trait B to the selection of
A (e.g., Lande 1979; Falconer and Mackay 1996). For
example, many developmental processes affect both forelimbs and hindlimbs. If individuals with long hindlimbs
produce more offspring than those with short
hindlimbs, a larger fraction of the offspring will have
longer hindlimbs than in the parent generation and, because
of the joint inheritance, also longer forelimbs. Forelimb
length indirectly responds to the selection on hindlimb
length. Interpreting the evolutionary change of forelimb
length itself as an adaptation thus would be highly
If the indirectly affected trait B is neutral with
respect to fitness, it might be permanently changed as an
indirect response to selection of A (Fig. 2a). If B is
itself under stabilizing or conflicting directional selection, the genetic correlation between the two traits would
only affect short-term evolution (e.g., Schluter 1996).
Eventually, selection would compensate for the indirect
response in trait B, leading to a curved instead of a linear
“evolutionary trajectory” (Fig. 2b); genetic integration
would have no persisting evolutionary effect. If, for
instance, forelimb length affects some relevant function
and hence is under stabilizing selection, directional selection of the hindlimbs would initially modify average
forelimb length, but after some generations—depending on
the genetic correlation and the selection pressures—the
forelimbs would again assume their optimal length.
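The curved trajectory can be reproduced with Lande's (1979) equation for the per-generation response of the trait means, Δz̄ = Gβ, where β is the selection gradient. The sketch below is a deliberately idealized illustration, not the model behind Fig. 2. It assumes a constant G matrix and a Gaussian fitness surface whose optimum `theta` has shifted for trait A only. It also assumes selection weak enough that β ≈ (θ − z̄)/ω². All numerical values are hypothetical.

```python
import numpy as np

def simulate_response(G, theta, omega2, generations=1000):
    """Iterate delta z = G beta with beta = (theta - z) / omega2.
    Assumes constant G and weak Gaussian stabilizing selection."""
    z = np.zeros(2)
    path = [z.copy()]
    for _ in range(generations):
        beta = (theta - z) / omega2   # selection gradient at the current mean
        z = z + G @ beta              # direct plus indirect response
        path.append(z.copy())
    return np.array(path)

# Hypothetical G matrix with a strong genetic correlation between traits A and B.
G = np.array([[1.0, 0.8],
              [0.8, 1.0]])
theta = np.array([10.0, 0.0])  # new optimum for A; B already at its optimum
omega2 = 10.0                  # width of the fitness surface (hypothetical)

path = simulate_response(G, theta, omega2)
peak_generation = int(np.argmax(path[:, 1]))

print("largest indirect shift of B:", round(path[:, 1].max(), 2),
      "in generation", peak_generation)
print("final means (A, B):", np.round(path[-1], 3))
```

Trait B is first dragged away from its optimum by the genetic correlation and then returns, while A approaches its new optimum. With a neutral trait B (no stabilizing term) the shift would persist.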
Note that the short-term response to selection is determined by the net genetic variances and covariances,
regardless of the underlying genotype-phenotype map or
the actual developmental integration (e.g., both genotype-phenotype maps in Fig. 1a, b induce the same covariance
structure and hence lead to the same response to selection).
Models of long-term evolution (including the model in
Fig. 2) often are based on the idealized assumption that the
genetic covariance structure remains stable. But genetic
variances and covariances are modified both by directional
and stabilizing selection, and by the pattern of new variation and covariation produced by mutations, which in turn
is largely determined by the developmental system and the
genotype–phenotype map (Lande 1979, 1980; Cheverud
In some cases, when a trait is tightly integrated with
another trait that is under very strong stabilizing selection,
integration might prevent any evolutionary change (developmental constraint; Cheverud 1984; Maynard Smith et al.
1985). For example, Galis et al. (2006) explained the highly
conserved number of cervical vertebrae in mammals by the
deleterious side effects during development that a modification of the number of vertebrae would have. By contrast,
integration between functionally related traits that are
subject to the same selection regime can facilitate evolution
by channeling variation in an adaptive direction.
The response of a population to selection is determined
by genetic variance and covariance (quantified by the G
matrix), not by phenotypic (co)variance (the P matrix). By
contrast, most studies on morphological integration are
based on phenotypic covariances (but see, e.g., Leamy
1977; Cheverud 1982; Martinez-Abadias et al. 2009; in
press). There is a large and inconclusive body of literature
on the question of whether P is a useful substitute for G in
evolutionary models (e.g., Cheverud 1988, 1996a, b; Roff
1997; Marroig and Cheverud 2004). Reliable estimates of
genetic covariances require large-scale breeding experiments, which are not possible in anthropology and primatology; estimates based on collections of human bones with
known genealogies (such as the Hallstatt collection) are
connected with large standard errors. In typical studies of
morphological integration, however, only the major factors
of covariance, which are reflected both by G and P, can be
reliably identified and interpreted.
Fig. 2 The ellipses represent the distribution of heritable phenotypic
variation (G matrices) for the two traits A, B, and the gray values
represent fitness. In a only trait A is under directional selection but
trait B indirectly responds because of the genetic correlation between
traits. In b trait A is under directional selection and B under stabilizing
selection (it is at the fitness optimum already). Trait B initially
responds to the selection of A, but later assumes its original value
Does Integration Evolve?
Evolvability is the capacity for an adaptive response to
selection (e.g., Wagner and Altenberg 1996; Hansen and
Houle 2008). The evolvability of a trait is determined by
the amount of heritable phenotypic variance of this trait
and by the genetic covariance with other traits. If one of
two genetically correlated traits is under directional
selection, the other trait will indirectly respond to selection
(Fig. 2). If this other trait is under stabilizing selection, or
under directional selection in the opposite direction, the
indirect response would have negative effects on fitness.
Thus, functionally unrelated traits, which are subject to
different selection regimes, should be genetically uncorrelated (variational modular) in order to maximize evolvability. On the other hand, functionally related traits should
vary in a concerted way to increase evolvability (think
again of the elements of a joint or of the masticatory
apparatus).
One would thus expect that integration evolves to reflect
functional dependencies among traits; Riedl (1978) called
this the “imitatory epigenotype”. The contemporary EvoDevo literature and some of the quantitative genetics literature use the term modularity instead: development, or
the genotype-phenotype map, should evolve so that most
genes mainly affect functionally related traits (a modular
genotype-phenotype map with restricted pleiotropy; Raff
1996; Wagner and Altenberg 1996; Mitteroecker 2009;
Wagner and Zhang 2011). However, empirical evidence
for the evolution of developmental integration is scarce.
Quantitative genetic models predict that genetic correlations evolve to reflect functional dependencies (Lande
1979, 1980; Cheverud 1996a, b; Arnold et al. 2008), but
models about the evolution of the underlying genotype-phenotype map (developmental integration) are partly
contradictory (reviewed by Pavlicev and Hansen 2011).
It became clear, however, that variational modularity
(reduced genetic correlations between functionally unrelated traits) does not require a modular genotype-phenotype map. Multiple pleiotropic genetic factors can partly
cancel out so that genetic covariances are reduced (Fig. 1).
Hansen (2003) and Pavlicev and Hansen (2011) showed
that under most selection scenarios genotype-phenotype
maps with multiple overlapping pleiotropic factors even
lead to a higher evolvability than purely modular genotype-phenotype maps because of the increased genetic variance.
However, for new mutations with large pleiotropic effects
and for larger evolutionary changes, involving non-linear
genotype-phenotype effects, constrained pleiotropy and
modular genotype-phenotype maps are important for
increasing evolvability (Mitteroecker 2009; Pavlicev and
Hansen 2011).
How to Estimate Factors of Developmental Integration?
In his 1932 paper “General, group and special size factors”, Sewall Wright devised a method to estimate general
factors that account (in a least-squares sense) for all the
pairwise correlations between certain groups of variables.
In addition to general factors, Wright estimated group
factors that account for the residual correlations within
these groups of variables. He selected the actual groups of
variables by careful inspection of the residual correlations
after removing an initial estimate of a general factor. Based
on these groups of variables, he updated the general factor
(see also Bookstein 1991; Mitteroecker and Bookstein
2007). Wright arrived at a hierarchy of nested and overlapping general factors and group factors (e.g., Fig. 1c). He
did not interpret these factors as single genes but as the
“entire array of factors, environmental as well as genetic,
which have a general effect on growth” (p. 605).
The hierarchy of general factors and group factors corresponds well to our usual biological explanations. General
factors (common factors in Mitteroecker and Bookstein
2007, 2008) reflect genetic factors with wide pleiotropic
ranges, such as genes expressed in different body parts or
genes with many downstream effects. They also reflect
epigenetic interactions of developmental processes, such as
tissue inductions or mechanical interactions, as well as
common environmental influences, linking the variation of
different tissues or body parts. These general or common
factors account for the joint variation—the integration—of
different morphological traits. Group factors or local factors, by contrast, reflect factors with more local effects on
growth. Notice that while these local factors are more
“modular” than the general factors, the hierarchical and
overlapping group factors do not necessarily induce morphological modularity.
Wright applied his approach (sometimes referred to as
Wright-style factor analysis) only to a small number of
variables. For more variables, as they occur in modern
morphometrics, a visual inspection of correlations or
covariances is not possible. Furthermore, in geometric
morphometrics not all covariances can be large and positive, not even for isometric growth. Wright’s approach
cannot be completely extended to a modern multivariate
context, even though the algebra remains valid.
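The idea of a general factor that accounts for pairwise correlations in a least-squares sense can be sketched for the simplest case of a single factor. The code below is not a reconstruction of Wright's procedure. It substitutes a standard iterated principal-axis fit, which alternates between a rank-one approximation of the correlation matrix and an update of its diagonal. The function name, the starting values, and the example loadings are all assumptions.

```python
import numpy as np

def one_factor_loadings(R, n_iter=500, tol=1e-12):
    """Least-squares fit of one general factor to the off-diagonal
    correlations of R (iterated principal-axis estimate)."""
    R = np.asarray(R, dtype=float)
    Rh = R.copy()
    off_diagonal = R - np.diag(np.diag(R))
    h = np.abs(off_diagonal).max(axis=1)   # starting values for the diagonal
    a = np.zeros(R.shape[0])
    for _ in range(n_iter):
        np.fill_diagonal(Rh, h)
        vals, vecs = np.linalg.eigh(Rh)    # eigenvalues in ascending order
        a = np.sqrt(max(vals[-1], 0.0)) * vecs[:, -1]
        if a.sum() < 0:                    # resolve the arbitrary sign
            a = -a
        h_new = a ** 2
        converged = np.max(np.abs(h_new - h)) < tol
        h = h_new
        if converged:
            break
    return a

# Hypothetical correlation matrix generated by one general factor.
true_loadings = np.array([0.9, 0.8, 0.7, 0.6, 0.5])
R = np.outer(true_loadings, true_loadings)
np.fill_diagonal(R, 1.0)

estimated = one_factor_loadings(R)
residual = R - np.outer(estimated, estimated)
np.fill_diagonal(residual, 0.0)

print("estimated loadings:", np.round(estimated, 3))
print("largest residual correlation:", float(np.abs(residual).max()))
```

In Wright's setting the residual correlations would not vanish. Their pattern is what he inspected to delimit groups of variables and to estimate group factors.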
A series of recent papers on morphological integration
used another technique, two-block partial least squares
analysis (PLS), which was invented by Herman Wold in
1966 (for morphometric examples see Bookstein 1991;
Rohlf and Corti 2000; Bookstein et al. 2003; Gunz and
Harvati 2007; Mitteroecker and Bookstein 2008). Several
variants of this technique are used in multivariate biometrics and chemometrics; they are often referred to as
“multivariate calibration” techniques (Martens and Naes
1989). For two groups or blocks of variables, the algorithm
seeks a linear combination for each block so that the
covariance between these two linear combinations is a
maximum. Further components can be extracted after
regressing or projecting out these linear combinations
separately from each block of variables. The high-dimensional pattern of covariances between the two blocks can
thus be represented by a small number of dimensions
(linear combinations). Several extensions of the PLS
algorithm to multiple blocks have been published. In
studies of morphological integration, the groups of variables usually are derived from some developmental or
functional models, and PLS is used to explore the multivariate pattern of covariance between these groups.
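The maximization described above is commonly computed from a singular value decomposition of the cross-block covariance matrix. The following minimal sketch assumes that formulation, with the successive singular vectors corresponding to the variant in which earlier dimensions are projected out. The function name `two_block_pls` and the simulated data are hypothetical and serve only to show the mechanics.

```python
import numpy as np

def two_block_pls(X, Y):
    """Two-block PLS via SVD of the cross-block covariance matrix.
    Returns weights for X, singular values, weights for Y, and both score sets."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    cross_cov = Xc.T @ Yc / (X.shape[0] - 1)
    U, s, Vt = np.linalg.svd(cross_cov, full_matrices=False)
    return U, s, Vt.T, Xc @ U, Yc @ Vt.T

# Hypothetical data: one latent factor shared by a 4-variable and a 3-variable block.
rng = np.random.default_rng(1)
n = 500
latent = rng.normal(size=n)
X = np.outer(latent, [0.8, 0.6, 0.4, 0.2]) + rng.normal(size=(n, 4))
Y = np.outer(latent, [0.7, 0.5, 0.3]) + rng.normal(size=(n, 3))

wx, s, wy, scores_x, scores_y = two_block_pls(X, Y)

# The covariance between the first pair of scores equals the first singular value.
first_cov = np.cov(scores_x[:, 0], scores_y[:, 0])[0, 1]
print("first singular value:", round(float(s[0]), 3))
print("covariance of first score pair:", round(float(first_cov), 3))
print("share of squared cross-block covariance:", np.round(s**2 / np.sum(s**2), 3))
```

The weight vectors in `wx` and `wy` have unit sum of squares. The squared singular values indicate how much of the total squared covariance between the blocks each dimension summarizes.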
Even though Wright-style factor analysis and two-block
PLS originate from different statistical contexts, Mitteroecker and Bookstein (2007) demonstrated that both
techniques are numerically identical. Both are least-squares
estimates of between-block covariances or correlations, yet
differing in their typical applications. Wright inferred the
groups of variables from residual correlations, whereas
they are defined prior to the analysis in most PLS applications. When both techniques are applied to the same
groups of variables, the resulting path coefficients or
weightings for the linear combinations differ only by
scaling. In PLS, the weightings are standardized to unit
sum of squares, whereas in Wright’s approach they are
scaled to reflect how much of the pattern in one block
corresponds to how much of the pattern in the other block
(but both approaches give the same pattern). Mitteroecker
and Bookstein (2007) showed how the PLS scores can be
scaled in order to reflect this quantitative relationship as in
Wright-style factor analysis. Only after such a scaling can
PLS vectors (singular warps) be visualized within a single
shape configuration (for examples see Mitteroecker and
Bookstein 2008 and below).
The resemblance of PLS and Wright-style factor analysis allows for a biological interpretation of PLS. When
PLS is applied as an exploratory tool to represent the
multivariate pattern of covariance between two or more
blocks of variables, these blocks need not necessarily
represent developmental or variational modules; they may
just be selected because the corresponding anatomical units
serve different functions or have different evolutionary
histories. But the PLS loadings for all blocks, taken together and scaled accordingly, can be interpreted as a pleiotropic factor integrating these blocks. The choice of the
blocks of variables and the selected sample determine how
well the estimated factors correspond to actual biological
models. Several PLS dimensions (pleiotropic factors) can
be extracted and removed from the data; variances and
covariances of the residuals are then due to local or modular developmental factors—group factors sensu Wright
(for more details see Bookstein 1991; Mitteroecker and
Bookstein 2007, 2008).
What Kinds of Samples do We Need to Study
Developmental Integration?
Developmental integration is best studied experimentally
or in longitudinal growth series. In the latter case, when
individuals are measured at multiple age stages, covariances can be computed for individual growth, i.e., for
individual differences between the age stages (see the
analysis below). But for technical, biological, and ethical
reasons, large ontogenetic samples of primates often are
cross-sectional (i.e., each individual is measured only once,
such as in museum collections) so that individual growth
and development cannot be assessed. Yet, patterns of
covariance in a cross-sectional sample comprising different
age stages do not necessarily reflect developmental integration. For example, two body parts such as the face and
the neurocranium both grow postnatally, but the way the
average facial growth coincides with the average neurocranial growth does not necessarily imply any causal relationship; if the face and the neurocranium were under
completely independent genetic and epigenetic control,
they would still be (spuriously) correlated across different
age groups.
In a cross-sectional sample, developmental integration
should be studied across individuals of the same developmental stage (see also below). Phenotypic covariances
within such a sample result from one or more common
developmental factors or tissue interactions, or from some
of the other sources described above. Covariances in a
sample of adult individuals reflect developmental processes
and interactions throughout the complete prenatal and
postnatal ontogeny (Hallgrimsson et al. 2007; Mitteroecker
and Bookstein 2009).
When a sample is comprised of multiple populations or
species that differ in average phenotype, overall covariances are dominated by the species differences. Hence
these covariances among populations not only depend on
developmental integration and linkage, but to a large extent
on the coevolution of traits due to joint selection, drift, and
gene flow, as well as on the phylogenetic relations among
the populations in the sample (Armbruster and Schwaegerle 1996). Developmental and genetic integration must
be assessed from within-population covariances. If it can
be expected that the populations have similar integration
patterns and are not too different in mean shape (see
below), one can use pooled within-population covariances,
which are the covariances after subtracting from each
individual the corresponding population mean. To some
degree, the same argument also applies to different sexes
within a sample, so that sexual dimorphism might be
removed from the data by subtracting the corresponding
population-specific sex mean from each individual. The
ensuing integration pattern may then be compared to the
species mean differences or the between-species covariances (the covariances across the species means), i.e., to
the pattern of evolutionary integration.
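Pooled within-population covariances, as described above, can be sketched as follows. The helper `pooled_within_cov`, the group labels, and the simulated populations are illustrative assumptions; the same function could be applied to population-specific sex groups by passing the corresponding labels.

```python
import numpy as np

def pooled_within_cov(X, groups):
    """Covariance matrix after subtracting each specimen's own group mean."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    resid = X.copy()
    labels = np.unique(groups)
    for g in labels:
        mask = groups == g
        resid[mask] -= X[mask].mean(axis=0)   # remove the group mean
    dof = X.shape[0] - len(labels)            # one mean estimated per group
    return resid.T @ resid / dof

rng = np.random.default_rng(1)
# Two hypothetical populations: same within-group covariance, different means.
within = np.array([[1.0, 0.6], [0.6, 1.0]])
a = rng.multivariate_normal([0, 0], within, size=300)
b = rng.multivariate_normal([5, -5], within, size=300)
X = np.vstack([a, b])
groups = np.array([0] * 300 + [1] * 300)

total_cov = np.cov(X, rowvar=False)           # dominated by the mean difference
pooled_cov = pooled_within_cov(X, groups)     # recovers the within-group pattern
```

In this simulated case the total covariance between the two traits is negative because of the mean difference, whereas the pooled within-group covariance is positive.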
As mentioned above, in a cross-sectional sample, integration should be studied across individuals of the same
developmental stage. But developmental stages often cannot
be clearly identified or even defined. Alternatively,
one may use specimens of the same age, or adult individuals (of any age). In such a sample, variability in developmental timing would considerably affect phenotypic
variances and covariances. Static allometry or ontogenetic
allometry based on the final age period likely captures most
of these shape differences and hence should be removed
from the data (e.g., by regressing or projecting out size
from the variables; Rohlf and Bookstein 1987).
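Regressing size out of the variables can be sketched with an ordinary least-squares fit. This is a minimal sketch under assumed names (`regress_out`, `log_size`) and simulated data; it shows the regression variant only, not the projection variant.

```python
import numpy as np

def regress_out(X, size):
    """Residuals of each column of X after linear regression on `size`."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(size)), size])   # intercept + size
    coef, *_ = np.linalg.lstsq(design, X, rcond=None)
    return X - design @ coef

rng = np.random.default_rng(2)
log_size = rng.normal(size=150)              # simulated size measure
allometry = np.array([0.5, -0.3, 0.2])       # hypothetical allometric coefficients
X = np.outer(log_size, allometry) + 0.1 * rng.normal(size=(150, 3))

residuals = regress_out(X, log_size)
max_abs_corr = max(
    abs(np.corrcoef(residuals[:, j], log_size)[0, 1]) for j in range(3)
)
```

The residuals are uncorrelated with size by construction, so any covariance remaining among them is not due to the linear allometric effect.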
How to Register Landmark Configurations in Studies
of Integration?
Geometric morphometric studies require the superimposition (registration) of landmark configurations in order to
remove variation in overall position, size, and orientation
(e.g., Rohlf and Slice 1990; Bookstein 1996; Mitteroecker
and Gunz 2009). When studying the integration between
two or more anatomical parts (sets of landmarks), the
landmark configurations can either be superimposed by a
single Procrustes registration, or each part can be superimposed separately. In the first case, relative size, position,
and orientation of the two parts are retained whereas they
are lost when superimposing the parts separately. A single
superimposition, however, induces covariances between
the parts that must not be biologically interpreted. For
example, the standardization of overall size during the
Procrustes superimposition induces a negative correlation
between the relative sizes of two parts: if one part increases
in relative size, the relative size of the other part necessarily decreases. The same applies to size corrections for
other kinds of measurements (e.g., linear distances).
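The induced negative correlation can be illustrated numerically. The sketch below is a deliberate simplification: it standardizes by the sum of two part sizes rather than by centroid size as in an actual Procrustes superimposition, and all data are simulated.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
# Simulated, statistically independent absolute sizes of two parts.
size_a = rng.lognormal(mean=0.0, sigma=0.2, size=n)
size_b = rng.lognormal(mean=0.0, sigma=0.2, size=n)
r_absolute = np.corrcoef(size_a, size_b)[0, 1]

# Standardize overall size: the relative sizes sum to 1 in every specimen.
total = size_a + size_b
rel_a, rel_b = size_a / total, size_b / total
r_relative = np.corrcoef(rel_a, rel_b)[0, 1]
```

Although the absolute sizes are independent, the relative sizes are perfectly negatively correlated here, purely as a consequence of the standardization.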
Does the Number of Measurements Matter?
The number of measurements used to describe a structure
can be interpreted as a form of weighting of this part relative to other parts (Mitteroecker and Huttegger 2009;
Huttegger and Mitteroecker 2011): the more measurements
(e.g., linear distances, landmarks) per anatomical part, the
more influence this part has on multivariate statistical
parameters. For example, the covariance between two
singular warp scores (which is equal to the singular value)
depends on the number of landmarks (Mitteroecker and
Bookstein 2007). Likewise, most indices of overall integration (e.g., Pavlicev et al. 2009; Haber 2011) depend on
the number and spatial distribution of measurements.
However, when sufficiently many measurements are taken,
estimates of the pattern of integration (such as singular
warps or common factors) usually are unchanged by small
modifications of the number and position of measurements
(see also the analysis below). Likewise, estimating the
position of semilandmarks along curves or surfaces
(Bookstein 1997; Gunz et al. 2005) or the position of
completely missing landmarks (Gunz et al. 2009) increases
the covariance between sets of (semi)landmarks, but usually does not considerably affect the spatial pattern of integration.
How to Compare Integration Across Populations
and Species?
Primates, like other groups of closely related species, share
the vast majority of genes, organs, bones, and muscles. The
physical and chemical conditions affecting development
are the same in these species. Environments and life styles
vary, but only within a limited range. Anatomical differences between related primates are mainly quantitative
(e.g., bones differ in size and shape across primates, but
basically all primates have the same bones). Developmental integration (the way changes in the development
of one trait affect the development of other traits) thus is
expected to be conserved across primates. Because of size
differences, however, patterns of variation (and thus also of
covariation) may differ considerably.
For the given reasons, it is important to separate differences in variance from differences in covariance when
comparing integration in multiple populations (see also
Mitteroecker and Bookstein 2008; Hallgrimsson et al.
2009). As an example, consider integration between the
length of the upper and the lower jaw in humans and
chimpanzees. Apparently, upper and lower jaws must be
tightly integrated in length to maintain dental occlusion; if
the length of the upper jaw increases by 1 cm, the length of
the lower jaw will also increase by about 1 cm, both in humans
and in chimps. But because average jaw length in chimpanzees is larger than that in humans, jaw length is also
more variable in chimps. Thus—despite the same developmental and functional relationship—the covariance (and
usually also the correlation) between upper and lower jaws
is larger in chimpanzees than in humans. Differences in
covariance between two groups do not necessarily indicate
differences in developmental integration; a comparison of
regression slopes, for instance, would be more useful.
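The jaw example can be made concrete with simulated numbers. The lengths and standard deviations below are invented for illustration only; the point is that an identical 1:1 relationship yields very different covariances when the variances differ.

```python
import numpy as np

def cov_and_slope(x, y):
    """Covariance of x and y, and the least-squares slope of y on x."""
    c = np.cov(x, y)
    return c[0, 1], c[0, 1] / c[0, 0]

rng = np.random.default_rng(4)
n = 500
# Hypothetical upper-jaw lengths (cm): less variable in one group than the other.
upper_h = rng.normal(5.0, 0.2, n)
upper_c = rng.normal(8.0, 0.6, n)
# The same 1:1 relationship with a small deviation in both groups.
lower_h = upper_h + rng.normal(0.0, 0.05, n)
lower_c = upper_c + rng.normal(0.0, 0.05, n)

cov_h, slope_h = cov_and_slope(upper_h, lower_h)
cov_c, slope_c = cov_and_slope(upper_c, lower_c)
```

Both slopes are close to 1, while the covariance in the more variable group is several times larger.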
Consider further a data set comprised of many cranial
measurements on humans and chimpanzees. If applying
PLS to both species separately, the first PLS dimension
might capture integration between upper and lower jaws in
chimpanzees, as it dominates both variance and covariance.
In humans, however, where integration of the jaws contributes less to total variance and covariance, it might be
represented by the second or higher dimension, or might be
“smeared” over multiple dimensions. But concluding that
integration differs across the two species, just because the
first PLS dimensions differ, would be misleading. Integration is the same; only the variance differs. One way to avoid
this problem is to compare the statistical association
between the same two traits or between the same two linear
combinations of traits across different groups. For example,
the PLS axes might be computed from only one species or
from the pooled within-species distribution, and the scores
along these axes are compared across both species (see
Mitteroecker and Bookstein 2008 and the analysis below
for examples).
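Computing the axes from one group and scoring all groups along them might look as follows. This is a sketch under stated assumptions: the loadings, group sizes, and the choice to center both groups at the reference mean are illustrative, and the helper names are hypothetical.

```python
import numpy as np

def pls_axes(X, Y):
    """First pair of singular vectors of the cross-block covariance matrix."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    U, s, Vt = np.linalg.svd(Xc.T @ Yc / (len(X) - 1), full_matrices=False)
    return U[:, 0], Vt[0]

def slope(x, y):
    c = np.cov(x, y)
    return c[0, 1] / c[0, 0]

rng = np.random.default_rng(5)
a = np.array([0.6, 0.8, 0.0])       # hypothetical loadings, block 1
b = np.array([0.0, 1.0])            # hypothetical loadings, block 2

def simulate(n, factor_sd):
    f = rng.normal(0.0, factor_sd, (n, 1))        # shared factor
    X = f * a + 0.05 * rng.normal(size=(n, 3))
    Y = f * b + 0.05 * rng.normal(size=(n, 2))
    return X, Y

X_ref, Y_ref = simulate(300, 1.0)   # reference group
X_oth, Y_oth = simulate(300, 0.3)   # same factor, much less variance

u, v = pls_axes(X_ref, Y_ref)       # axes estimated from the reference only
# Scores of both groups along the SAME axes, centered at the reference mean.
sx_ref, sy_ref = (X_ref - X_ref.mean(0)) @ u, (Y_ref - Y_ref.mean(0)) @ v
sx_oth, sy_oth = (X_oth - X_ref.mean(0)) @ u, (Y_oth - Y_ref.mean(0)) @ v

slope_ref, slope_oth = slope(sx_ref, sy_ref), slope(sx_oth, sy_oth)
cov_ref = np.cov(sx_ref, sy_ref)[0, 1]
cov_oth = np.cov(sx_oth, sy_oth)[0, 1]
```

Along the shared axes the two groups show nearly the same slope between block scores, even though their covariances differ by an order of magnitude.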
Another problem is that if average population phenotypes differ substantially, changes in the position of
homologous landmarks need not necessarily be comparable
across populations. For example, the foramen magnum is
approximately horizontally oriented and located below the
brain in humans, whereas it is almost vertical and posterior
to the brain in mice. The landmarks basion and opisthion
(the two borders of the foramen magnum in the midsagittal
plane) may be considered as biologically homologous in
both species, but an upward shift of these landmarks would
indicate a completely different process in humans than in
mice. In such a case, integration cannot be compared
quantitatively between the two species, but shape changes
and integration patterns can be compared qualitatively,
e.g., by visual comparison of deformation grids. Primates
are more similar than humans and mice, of course, but still
differ considerably in certain morphological aspects (e.g.,
prognathism, brow ridges, cranial crests). Comparative
analyses of integration should be carefully interpreted in
this regard.
Integration of Postnatal Cranial Growth
We applied the principles described above to study morphological integration between the cranial base and the
face during postnatal human development. The cranial base
and the face differ in developmental origin, mode of
ossification, and postnatal growth pattern (e.g., Lieberman
et al. 2000; Sperber 2001; Helms et al. 2005; Mitteroecker
and Bookstein 2008), but they physically interact in the
course of development. A large number of studies focused
on the influence of the cranial base on facial form and
orientation during human development and evolution (e.g.,
Moss and Young 1960; Biegert 1963; Ross and Henneberg
1995; Enlow and Hans 1996; Bookstein et al. 2003; Bastir
and Rosas 2005, 2006; Lieberman et al. 2000; Bastir et al.
2010; Lieberman 2011). Most, if not all, of these studies
analyzed covariances and correlations in cross-sectional
samples or studied average growth patterns. In our first
analysis, we study integration of individual postnatal facial
growth using longitudinal data in order to investigate how
adult integration is generated during ontogeny. In the
second analysis, we compare morphological integration in
adult humans to that in adult chimpanzees.
Analysis 1: Longitudinal Growth
Our sample consists of 13 male and 13 female untreated
Caucasian individuals of the Denver Growth Study, a
longitudinal X-ray study carried out between 1931 and
1966. On a total of 500 lateral radiographs, covering the
age range from birth to early adulthood, 18 landmarks were
digitized by Ekaterina Stansfield. In the present study we
used three of these landmarks to represent the cranial base
(basion, sella, nasion) and further three landmarks to represent the maxilla (posterior nasal spine, nasospinale,
prosthion) (Fig. 3a; Bulygina 2003; Bulygina et al. 2006).
The landmark configurations were superimposed by a
Generalized Procrustes Analysis, standardizing for overall
size, position, and orientation of the configurations (Rohlf
and Slice 1990; Mitteroecker and Gunz 2009). We chose
a single superimposition of all landmarks (instead of
two separate ones for the face and the cranial base; see
above) because position and orientation of the maxilla
relative to the cranial base determines facial size and is an
important aspect of cranial morphology. Because not all
individuals were radiographed at the same ages, we interpolated the shape coordinates for each individual by a local
linear regression. We removed sexual dimorphism by
subtracting from each configuration the age- and sex-specific average. The resulting shape coordinates of the
landmarks were used for further statistical analysis.
Fig. 3 a The landmarks on the cranial base (basion, sella, nasion) and
the upper jaw (posterior nasal spine, nasospinale, prosthion) used in
the geometric morphometric analysis. The landmarks are a subset of
the data used in Bulygina et al. (2006). b The first pair of singular
warps is visualized as a single shape deformation (in both directions);
it can be interpreted as the first general or common factor
integrating cranial shape. c The second pair of singular warps (second
common factor)
Because the data are longitudinal (the same 26 individuals were measured at different ages), we could study
morphological integration of growth itself. In other words,
we did not study covariation between the shape variables $x_t$
at a given age t, but covariation between the shape differences $x_{t+1} - x_t$. We started our analysis with the shape
differences between 2 and 4 years of age. We computed a
two-block PLS analysis for the age-related shape differences
between the six shape coordinates (three x- and three y-coordinates) of the cranial base and the six shape coordinates
of the maxilla, giving for each extracted dimension one
singular vector for the cranial base and one singular vector
for the maxilla (the term singular vector is derived from the
actual computation, a singular value decomposition). These
vectors contain a weighting or loading for each variable, so
that the covariance between the linear combinations
(weighted sums) specified by these vectors is a maximum.
Because in geometric morphometrics the variables are shape
coordinates, the vectors can be represented as shape deformations and are called singular warps (Bookstein 1991;
Bookstein et al. 2003). The corresponding linear combinations are called singular warp scores; they can be interpreted
as coordinates along the singular vectors in shape space.
As in a principal component analysis, multiple dimensions
(pairs of singular vectors) can be extracted, each one
orthogonal to all previous dimensions.
By convention, the singular vectors are unit vectors, i.e.,
the squared elements of each vector sum up to 1. They
represent the shape features with maximum covariance in
the sample, but they do not specify how much of one pattern relates to how much of the other pattern. This relationship can be estimated by a major axis regression
between the singular warp scores, which is equal to the
first principal component axis of these scores. When the
singular vectors are weighted by the corresponding loadings, they are equal to Wright’s general factors (Mitteroecker and Bookstein 2007).
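The major axis relationship between the two sets of scores can be sketched as the first principal component of their 2 x 2 covariance matrix. The scores below are simulated and the function name `major_axis_slope` is hypothetical; the sketch shows only how the slope is obtained, not the full scaling procedure of Mitteroecker and Bookstein (2007).

```python
import numpy as np

def major_axis_slope(sx, sy):
    """Slope of the first principal component of two score vectors."""
    evals, evecs = np.linalg.eigh(np.cov(sx, sy))
    pc1 = evecs[:, np.argmax(evals)]          # direction of maximum variance
    return pc1[1] / pc1[0]

rng = np.random.default_rng(6)
f = rng.normal(size=400)
sx = 2.0 * f + 0.1 * rng.normal(size=400)     # simulated block-1 scores
sy = 1.0 * f + 0.1 * rng.normal(size=400)     # simulated block-2 scores
ma_slope = major_axis_slope(sx, sy)
```

Here two units of change along the first block's singular vector correspond to about one unit along the second, which is the kind of quantitative relationship that unit-length singular vectors alone do not convey.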
The first pair of singular warps is visualized as a single
shape deformation in Fig. 3b and can be interpreted as the
first general or common factor integrating facial shape. A
relatively long maxilla with an anteriorly positioned prosthion and an inferiorly positioned posterior nasal spine is
associated with a flexed cranial base and a relatively short
clivus (sella-basion distance). Conversely, a less flexed
cranial base and an elongated clivus are associated with a
short upper jaw. Note that these shape changes affect the
relative width of the pharynx. Basically, this factor seems
to reflect a large face together with a long anterior cranial
base relative to the clivus, which is part of the middle
cranial fossa. The second common factor reflects the
mainly uniform shape differences between short and high
faces versus long and low faces: both the cranial base and
the jaw are affected in the same way by this pattern.
The two common factors account for 95.5 % of the
summed squared covariances between the variables of the
face and the cranial base; they are both statistically significant with P < 0.01 (the covariances or singular values
significantly deviate from a permutation distribution; Rohlf
and Corti 2000; Mitteroecker and Bookstein 2008). The
two factors are uncorrelated in the growth period from 2 to
4 years (r = 0.04), and they also show very low correlation
for the other growth periods as well as for the cross-sectional samples. The common factors thus seem to represent
independent growth processes (but not necessarily two
single genes), even though average shape change from 2 to
4 years comprises a combination of the two common factors, both an increase of facial height and a relative
enlargement of the face and the anterior cranial base (see
Bulygina et al. 2006).
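A permutation test of the first singular value, of the kind referred to above, can be sketched as follows. The number of permutations, the test statistic, and the simulated blocks are assumptions for illustration; they are not the settings used in the analysis.

```python
import numpy as np

def first_singular_value(X, Y):
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    return np.linalg.svd(Xc.T @ Yc / (len(X) - 1), compute_uv=False)[0]

def permutation_p(X, Y, n_perm=999, seed=0):
    """Share of permuted data sets whose first singular value
    reaches or exceeds the observed one."""
    rng = np.random.default_rng(seed)
    observed = first_singular_value(X, Y)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(Y))        # break the pairing of the blocks
        if first_singular_value(X, Y[perm]) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

# Simulated blocks sharing one latent factor (n chosen arbitrarily).
rng = np.random.default_rng(7)
n = 26
f = rng.normal(size=(n, 1))
X = f * np.array([1.0, 0.5, -0.5]) + 0.5 * rng.normal(size=(n, 3))
Y = f * np.array([0.8, 0.6]) + 0.5 * rng.normal(size=(n, 2))
p_linked = permutation_p(X, Y)
```

Permuting the rows of one block destroys the association between blocks while preserving the covariance structure within each block, which is why the permuted singular values serve as a null distribution.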
We estimated and visualized integration of growth from
2 to 4 years of age; the common factors of the subsequent
growth periods closely resemble the ones presented in
Fig. 3 and hence are not shown. Figure 4a plots the
covariance between the first pair of singular warps
(between the shape features affected by common factor 1
as estimated from the growth from 2 to 4 years) for all one-year growth intervals within the sampled age range (from 2
to 3 years, 3–4, 4–5, etc.). The covariance decreases
sharply and almost goes to zero at about 8 years of age; it
rises again during puberty and decreases thereafter. This
plot also shows the variance of common factor 1 during the
different growth intervals. Apparently, covariance decreases because variance decreases. The pleiotropic nature of
the common factor remains unchanged but it ceases to vary
across individual growth, probably because it stops contributing to development.
Instead of covariances of growth, we can also compute
the usual cross-sectional covariances (covariance across
individuals of the same age) between the shape features
affected by common factor 1. Interestingly, even though
growth of these shape features is integrated and the
underlying common factor varies during the first 6–8 years
of postnatal growth (Fig. 4a), cross-sectional variance and
covariance decrease (Fig. 4b). But how can the cross-sectional variance of the factor decrease, even though it varies
during growth? The answer is given in Fig. 5a: Growth (the
shape difference $x_{t+1} - x_t$) is negatively correlated with
individual morphology ($x_t$) until about 8 years of age. This
means that individuals with a high score for common factor
1 (relatively small face, long clivus) experience less than
average growth of these features (relative facial size
increases, relative clivus length decreases as compared to
the average), and individuals with a low score experience
more than average growth along this factor. Such a process
of variance reduction during growth has been termed targeted growth or developmental canalization (e.g., Waddington 1942; Tanner 1963; Debat and David 2001). From 2
to 8 years of age, cross-sectional variance of common
factor 1 is halved; the individuals gain similar relative sizes
of the face and the clivus, including a similar relative
pharyngeal width.
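The link between a negative correlation of growth with morphology and shrinking cross-sectional variance follows from the identity Var($x_{t+1}$) = Var($x_t$) + Var($x_{t+1} - x_t$) + 2 Cov($x_t$, $x_{t+1} - x_t$). The sketch below checks this on simulated scores; the coefficient -0.4 and the noise levels are arbitrary assumptions, not estimates from the Denver data.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 5000
x_t = rng.normal(0.0, 1.0, n)                 # simulated factor scores at age t

# Targeted growth: increments partly compensate the current deviation.
growth_targeted = -0.4 * x_t + rng.normal(0.0, 0.3, n)
# Untargeted growth: increments are independent of current morphology.
growth_free = rng.normal(0.0, 0.3, n)

def summarize(x, d):
    """Cor(x_{t+1} - x_t, x_t) and the variance of x_{t+1} = x_t + d."""
    r = np.corrcoef(d, x)[0, 1]
    return r, np.var(x + d, ddof=1)

r_targeted, var_targeted = summarize(x_t, growth_targeted)
r_free, var_free = summarize(x_t, growth_free)
var_before = np.var(x_t, ddof=1)

# Right-hand side of the identity for the targeted case.
identity_rhs = (var_before + np.var(growth_targeted, ddof=1)
                + 2 * np.cov(x_t, growth_targeted)[0, 1])
```

With targeted growth the variance after growth is smaller than before, even though growth itself varies among individuals; with untargeted growth the variance accumulates.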
The developmental dynamics of common factor 2 clearly
differ from the dynamics of the first factor. Cross-sectional
variance and covariance continuously increase (Fig. 4d)
even though variance and covariance of growth decrease
during the first years of age (Fig. 4c). Cross-sectional
(co)variance increases because growth along common factor 2 is uncorrelated with individual age-specific morphology—growth is not canalized (Fig. 5b). Variances and
covariances thus accumulate during ontogeny.
The increasing variance of both common factors during
early puberty reflects the massive variation in the onset of
puberty. After all individuals have reached puberty and
experienced a growth spurt, variance decreases again. But
such a reduction of variance usually is not interpreted as a
canalization process.
Analysis 2: Comparing Integration Between Humans
and Chimpanzees
In order to compare integration between the face and the
cranial base in humans to that in chimpanzees, we used CT
scans of 20 adult human individuals (10 males, 10 females
from different human populations) from the sample used in
Bookstein et al. (2003) and 22 adult chimpanzees (Pan
troglodytes, most specimens of unknown sex) from the
sample used in Neubauer et al. (2010). Additionally, we
included two fossils with a preserved cranial base from the
Bookstein et al. (2003) data set, the early modern human
skull Mladeč I from central Europe (Czech Republic) and
the South African Australopithecus africanus STS 5. The
cranial base is represented by the four landmarks basion,
dorsum sellae, sphenobasilare, and nasion, and the face by
the five landmarks posterior nasal spine, nasospinale,
prosthion, rhinion, and glabella (Fig. 6a). The assignment
of nasion to the cranial base instead of to the face (or to
both parts) is somewhat arbitrary, especially as glabella is
assigned to the face. However, the presented results do not
depend on these choices (see also the Discussion). The
landmark configurations were superimposed by a single
Procrustes registration.
After subtracting from each human individual the corresponding sex average and projecting out allometry, we
estimated common factors (scaled singular warps) from the
residual shape coordinates as described above. The first
common factor reflects variation in overall cranial length
relative to cranial height, whereas the second common
factor represents the size of the face and the anterior cranial
base relative to the length of the clivus (Fig. 6b, c). These
two factors clearly resemble the factors estimated from the
longitudinal sample above (Fig. 3). They account for 93 %
of summed squared covariances in the cross-sectional
human sample.
Fig. 4 a Covariance between the first pair of singular warp scores
(covariance between face and cranial base due to common factor 1)
during one-year growth periods (individual shape differences between 2
and 3 years, 3 and 4 years, etc.), shown as a black dashed line. The red
line indicates the variance of common factor 1 in these growth intervals
(note the different scale at the right vertical axis). b Cross-sectional
variance (red line) and covariance (black dashed line) for common
factor 1 across individuals of the same age. c Variance and covariance
for common factor 2 during one-year growth periods. d Cross-sectional
variance and covariance for common factor 2 (Color figure online)
Fig. 5 Correlation of individual shape changes during one-year
growth intervals with the morphology before growth (Cor($x_{t+1} - x_t$, $x_t$)) plotted against age (t) for both common factors. A negative
correlation indicates targeted growth or canalization, because individuals with a low score for this factor grow more than the average
and individuals with a high score grow less
Fig. 6 a Midsagittal landmarks on the cranial base (basion, dorsum
sellae, nasion, sphenobasilare) and the face (posterior nasal spine,
nasospinale, prosthion, rhinion, glabella) measured on CT scans of
adult humans, chimpanzees, and two fossils. The first two singular
warps estimated from the human sample are visualized as common
factors in (b) and (c). They clearly resemble the common factors in
Fig. 3, just the order is reversed
Using these common factor estimates from the human
sample, we computed scores along the factors (singular
warp scores) both for humans and chimps. The scores
were computed separately for the face and the cranial
base. Note that we use the original human sample here,
not the one corrected for sex and allometry, in order to
investigate if average sex and species differences follow
the integration pattern. Figure 7a shows that, despite
apparent mean shape differences, integration due to
common factor 1 is similar in humans and in chimps: the
point clouds are similarly oriented, i.e., one unit of shape
change in the cranial base is associated with a similar
amount of facial shape change in both species. Yet, variance (and thus also covariance) along this factor is much
larger in humans than in chimpanzees. Mladeč I falls
within the modern human distribution, whereas STS 5
clusters with the chimpanzees. The situation is different
for common factor 2: In contrast to humans, chimpanzees
are not integrated along the second common factor; both
fossils are closer to the human than to the chimpanzee
distribution. Human males and females completely overlap for both factors.
Olson and Miller coined the term “morphological integration” with their 1958 book and raised a broader interest
for this topic in the paleontological and biological communities. Yet, they never defined morphological integration in their book (as pointed out, e.g., by Chernoff and
Magwene 1999). Like Olson and Miller, many subsequent
authors referred to morphological integration either as the
underlying developmental and functional causes or as the
statistical pattern of phenotypic variances and covariances.
We showed that net phenotypic covariances do not directly
reflect the underlying developmental and genetic factors of
integration, most importantly because covariances depend
on the variance of pleiotropic factors in a sample, and
multiple pleiotropic factors with opposite effects may
partly cancel out. Like Cheverud (1996a, b), we thus used
the term integration to denote the biological processes and
properties leading to phenotypic covariance. The definition
of integration by Hallgrimsson et al. (2009) as the ability to
covary, rather than actual covariance, closely reflects this
Fig. 7 Scores along the first two common factors, computed
separately for the face and the cranial base (first two pairs of singular
warp scores). Chimpanzees are shown as filled gray circles, male and
female humans as filled and empty black circles, respectively. Along
common factor 1, humans and chimpanzees are similarly integrated
but differ in variance (a), whereas chimpanzees are not as integrated
along common factor 2 as humans are (b)
The use of covariances to describe the relationship
between phenotypic traits originates from early biometrics
and from their role in predicting response to selection. But
actual scientific models usually are not based on covariances, but on regressions (e.g., Bookstein in press).
Regression quantifies the average effect of one variable
onto another—how a change in one trait would alter another
trait in the course of development or evolution—reflecting
the typical reasoning in biology. Multivariate factors, such
as Wright’s general factors or singular warps, can be
interpreted as regressions of the variables on the (unmeasured) factor score—they are models of how an underlying
growth factor affects phenotypic traits (the path coefficients
in Fig. 1). When comparing integration between humans
and chimps in Fig. 7, we interpreted similar regression
slopes between the singular warp scores as an indication of
similar integration, regardless of the different covariances.
Regressions and factor models describe the ability or propensity of phenotypic traits to covary; the induced covariance depends on the sample variance.
We reviewed different forms of integration (developmental integration, genetic integration, environmental integration) along with multiple other sources of phenotypic
covariances, such as geometric and spatial dependences
between the measurements. Developmental integration is
the result of a very large number of developmental processes. The effect of single developmental factors can only
be identified experimentally (see, e.g., the work by Hallgrimsson et al. on mice; Hallgrimsson et al. 2004, 2006,
2009). Modern imaging techniques allow for the three-dimensional measurement of morphological features even
during organogenesis and fetal development (e.g., Metscher
2009; Metscher and Müller 2011). Based on morphometrics
alone, it is impossible to specify how many genes or growth
processes actually contribute to an estimated common factor
and the induced phenotypic covariance, but it is a well-known phenomenon that the vast array of genetic variation is
“funneled” into a smaller set of pathways, which in turn
influence a smaller set of developmental processes (Hallgrimsson and Lieberman 2008). Furthermore, it has been
argued that structural and organizational integration at the
morphological level determines the identity and homology
of anatomical elements, regardless of the complex underlying genetic and developmental networks (Müller and
Newman 1999; Müller 2003). Careful studies of morphological integration can model these few mediating processes
and can compare them across age groups and across different species. The heritable (additive genetic) part of the
induced phenotypic variances and covariances is sufficient
to predict short-term response to selection. Because experimental approaches are not possible in humans and primates,
studies of morphological integration are a good alternative
to investigate developmental integration in these taxa.
Cranial Integration in Humans and Chimpanzees
We studied morphological integration between the cranial
base and the face during postnatal human growth using 26
individuals of the longitudinal Denver Growth Study. This
is a relatively small sample for reliably estimating variances and covariances and it is probably not a representative
random sample, but we could still estimate and interpret
two common factors (general factors in Wright’s terminology, or pleiotropic factors in the quantitative genetic
language) underlying the integration between the cranial
base and the face. These factors were estimated from
individual growth between 2 and 4 years of age, but the
same two factors were also detectable from the other
growth periods and even from the cross-sectional subsamples. In the second analysis, the same two factors could
be estimated from a different and very heterogeneous sample of 20 adult humans, even though the number and position
of the landmarks differed slightly across the two data sets.
The first common factor reflects a large face together
with a long anterior cranial base (the ‘‘roof’’ of the face)
relative to the clivus (which is part of the middle cranial
fossa), thus also determining the relative width of the
pharynx. The face is under different developmental control
than the brain and the basicranium, but facial length and
the length of the anterior cranial base need to fit (‘‘growth
counterparts’’; Enlow and Hans 1996)—they are developmentally integrated. The anterior cranial base elongates in
concert with the frontal lobes of the brain, reaching
approximately 95 % of its adult length by the end of the
neural growth period, but the more inferior portions of the
anterior cranial base continue to grow as part of the face
after the neural growth phase, forming the ethmomaxillary
complex (Lieberman et al. 2000). This common factor is of
apparent functional relevance because of its effect on jaw
size and on pharyngeal width. This is likely the reason why
common factor 1 is canalized during postnatal ontogeny:
the sample variance of common factor 1 is halved within
six years of postnatal development.
The second common factor is a standard finding in
cephalometrics: Relative height and length of anatomical
elements are integrated within the cranium (dolichocephalic versus brachycephalic crania), where shorter faces
are associated with a more flexed cranial base than longer
faces (e.g., Enlow and Hans 1996; Bookstein et al. 2003;
Bastir and Rosas 2004; Mitteroecker and Bookstein 2008).
Growth processes are integrated in this way throughout full
postnatal development and the cross-sectional covariance
between these shape aspects increases during ontogeny.
This factor seems to be less developmentally canalized
than common factor 1, probably because it is of no obvious
functional relevance.
Using longitudinal data, we could study both cross-sectional integration and integration of growth itself, i.e.,
how cross-sectional variance and covariance are actually
generated. The variance of common factor 1 during growth
is much higher than the variance of common factor 2, and
likewise, common factor 1 has a higher cross-sectional
variance at age 2 than the second factor (cross-sectional
variances of about 0.4 versus 0.3; Fig. 4). But owing to the
different developmental dynamics, the situation is reversed
at age 16: Common factor 2 is three times more variable
than common factor 1 in the cross-sectional sample of
16-year-olds. It is a common finding in cephalometrics that
the ratio of overall length to height (our common factor 2)
is the most dominant pattern of variance and covariance
apart from allometry (e.g., Bookstein et al. 2003; Bastir
and Rosas 2004; Mitteroecker and Bookstein 2008).
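How growth generates or removes cross-sectional variance can be illustrated with the identity Var(x + g) = Var(x) + Var(g) + 2 Cov(x, g), where x is a factor score at an earlier age and g is the individual growth increment. The sketch below uses simulated values only. The coefficients -0.5 and 0.5, the noise level, and the function name `variance_after_growth` are assumptions chosen for illustration, not estimates from the paper.

```python
import numpy as np


def variance_after_growth(x, g):
    """Variance of x + g computed from its three components."""
    var_x = np.var(x, ddof=1)
    var_g = np.var(g, ddof=1)
    cov_xg = np.cov(x, g)[0, 1]
    return var_x + var_g + 2.0 * cov_xg


rng = np.random.default_rng(1)
n = 1000
x = rng.normal(scale=1.0, size=n)      # hypothetical factor scores at the early age
noise = rng.normal(scale=0.3, size=n)  # growth variation unrelated to x

# Canalized factor: growth compensates for early deviations (negative covariance).
g_canalized = -0.5 * x + noise
# Accumulating factor: growth amplifies early deviations (positive covariance).
g_accumulating = 0.5 * x + noise

v_early = np.var(x, ddof=1)
v_can = variance_after_growth(x, g_canalized)
v_acc = variance_after_growth(x, g_accumulating)
print("early variance:", round(float(v_early), 3))
print("later variance, canalized:", round(float(v_can), 3))
print("later variance, accumulating:", round(float(v_acc), 3))
```

In this toy setting both factors show the same amount of growth variation, yet the later cross-sectional variance shrinks in one case and grows in the other. The sign of the covariance between current state and subsequent growth decides which.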
In our second analysis, we compared integration
between the face and the cranial base in cross-sectional
samples of adult humans and chimpanzees. Basically the
same two common factors result from this cross-sectional
human sample as from the longitudinal X-ray sample (just
in reversed order). Both humans and chimpanzees are
similarly integrated regarding the overall length to height
ratio of the face and the cranial base, but differ in the
integration between facial size and anterior cranial base
length. In humans, the face is positioned below the anterior
cranial base and hence both parts are developmentally
integrated. In chimpanzees, large parts of the face are more
anteriorly positioned than the brain case, so that facial size
and the length of the anterior cranial base are less integrated and almost uncorrelated in our sample. This is
probably the reason why the average species difference
between humans and chimpanzees—the evolutionary
integration—along this common factor does not resemble
the human integration pattern (Fig. 7b). Evolutionary
integration of length-to-height ratios, by contrast, more
closely resembles the common pattern of developmental
integration in both species (Fig. 7a).
As we set out in the beginning, these insights are not
derived from a formal mathematical model, nor are they
based on biological experiments. They are based on the
consilience of multiple lines of evidence: spatial statistical
patterns (deformation grids), temporal statistical patterns
(ontogenetic dynamics of variance and covariance), and
qualitative biological models (both of development and
The Palimpsest
Most studies on integration and evolutionary quantitative
genetics assess covariances in cross-sectional samples of
adult individuals. It has been shown that cross-sectional
variances and covariances continually change during
development (e.g., Zelditch et al. 2006; Hallgrimsson et al.
2007, 2009; Mitteroecker and Bookstein 2009). Hallgrimsson et al. (2007) used the metaphor of a medieval
palimpsest to describe the ontogeny of the adult covariance structure: Much like a reused scroll on which the
shadows of the various texts accumulate over time, ‘‘the
covariation structure of an adult skull represents the summed imprint of a succession of effects, each of which
leaves a distinctive covariation signal determined by the
specific set of developmental interactions involved’’ (p.
164). In our analysis we were able to show how variation in
growth at different age stages contributes to the final pattern. We can thus add a further piece to the palimpsest
metaphor: Variation of growth processes can accumulate
during ontogeny, whereas other growth processes are
canalized so that cross-sectional variances and covariances
decrease. Some text fragments on the palimpsest accumulate over time, whereas others get (partly) erased in the
course of ontogeny. Many, but not all growth processes
varying in a population might be reflected in the adult
covariance structure.
Acknowledgments
We thank Mihaela Pavlicev and Fred Bookstein
for stimulating discussions and helpful comments on the manuscript.
We are grateful to Ekaterina Stansfield for loaning us the digitized
Denver growth study data.
References
Armbruster, W. S., & Schwaegerle, K. E. (1996). Causes of
covariation of phenotypic traits among populations. Journal of
Evolutionary Biology, 9, 261–276.
Arnold, S. J., Bürger, R., Holenhole, P. A., Beverly, C. A., & Jones,
A. G. (2008). Understanding the evolution and stability of the
G-matrix. Evolution, 62, 2451–2461.
Arnold, S. J., Pfrender, M. E., & Jones, A. (2001). The adaptive
landscape as a conceptual bridge between micro- and macroevolution. Genetica, 112-113, 9–32.
Arthur, W. (2002). The emerging conceptual framework of evolutionary developmental biology. Nature, 415(14), 757–764.
Bastir, M., & Rosas, A. (2004). Facial heights: Evolutionary
relevance of postnatal ontogeny for facial orientation and skull
morphology in humans and chimpanzees. American Journal of
Physical Anthropology, 47, 359–381.
Bastir, M., & Rosas, A. (2005). Hierarchical nature of morphological
integration and modularity in the human posterior face. American Journal of Physical Anthropology, 128(1), 26–34.
Bastir, M., & Rosas, A. (2006). Correlated variation between the
lateral basicranium and the face: A geometric morphometric
study in different human groups. Archives of Oral Biology, 51,
Bastir, M., Rosas, A., Stringer, C., Manuel Cuétara, J., Kruszynski,
R., Weber, G. W., et al. (2010). Effects of brain and facial size
on basicranial form in human and primate evolution. Journal of
Human Evolution, 58(5), 424–431.
Berg, R. L. (1960). The ecological significance of correlation
pleiades. Evolution, 14, 171–180.
Bogin, B. (1999). Patterns of human growth. Cambridge: Cambridge
University Press.
Bonner, J. T. (1988). The evolution of complexity by means of natural
selection. Princeton, NJ: Princeton University Press.
Bookstein, F. (1991). Morphometric tools for landmark data:
Geometry and biology. Cambridge, UK: Cambridge University
Bookstein, F. (1996). Biometrics, biomathematics and the morphometric synthesis. Bulletin of Mathematical Biology, 58(2),
Bookstein, F. (1997). Landmark methods for forms without landmarks: Morphometrics of group differences in outline shape.
Medical Image Analysis, 1(3), 225–243.
Bookstein, F. L. (in press). Reasoning and measuring: Numerical
inferences in the sciences. Cambridge: Cambridge University
Bookstein, F. L., Gunz, P., Mitteroecker, P., Prossinger, H., Schaefer,
K., & Seidler, H. (2003). Cranial integration in Homo: Singular
warps analysis of the midsagittal plane in ontogeny and
evolution. Journal of Human Evolution, 44(2), 167–187.
Bulygina, E., Mitteroecker, P., & Aiello, L. C. (2006). Ontogeny of
facial dimorphism and patterns of individual development within
one human population. American Journal of Physical Anthropology, 131(3), 432–443.
Chernoff, B., & Magwene, P. M. (1999). Morphological integration:
Forty years later. In: Morphological integration, (pp. 319–354).
Chicago: University of Chicago Press.
Cheverud, J. M. (1982). Phenotypic, genetic, and environmental
morphological integration in the cranium. Evolution, 36,
Cheverud, J. M. (1984). Quantitative genetic and developmental
constraints on evolution by selection. Journal of Theoretical
Biology, 110, 155–171.
Cheverud, J. M. (1988). A comparison of genetic and phenotypic
correlations. Evolution, 42(5), 958–968.
Cheverud, J. M. (1989). A comparative analysis of morphological
variation patterns in papionins. Evolution, 43, 1737–1747.
Cheverud, J. M. (1996a). Developmental integration and the evolution
of pleiotropy. American Zoologist, 36, 44–50.
Cheverud, J. M. (1996b). Quantitative genetic analysis of cranial
morphology in the cotton-top (Saguinus oedipus) and saddleback (S. fuscicollis) tamarins. Journal of Evolutionary Biology,
9, 5–42.
Cheverud, J. M., Wagner, G. P., & Dow, M. M. (1989). Methods for
the comparative analysis of variation patterns. Systematic
Zoology, 38, 201–213.
Clausen, J., & Hiesey, W. M. (1960). The balance between coherence
and variation in evolution. PNAS, 46(4), 494–506.
Debat, V., & David, P. (2001). Mapping phenotypes: Canalization,
plasticity and developmental stability. Trends in Ecology &
Evolution, 16(10), 555–561.
Enlow, D., & Hans, M. (1996). Essentials of facial growth.
Philadelphia, PA: Saunders Company.
Falconer, D. S., & Mackay, T. F. C. (1996). Introduction to
quantitative genetics. Essex: Longman.
Fisher, R. A. (1930). The genetical theory of natural selection.
Oxford: Clarendon.
Galis, F., Van Dooren, T. J., Feuth, J. D., Metz, J. A., Witkam, A., &
Ruinard, S., et al. (2006). Extreme selection in humans against
homeotic transformations of cervical vertebrae. Evolution,
60(12), 2643–2654.
Gromko, M. H. (1995). Unpredictability of correlated response to
selection: Pleiotropy and sampling interact. Evolution, 49,
Gould, S. J. (1977). Ontogeny and phylogeny. Cambridge: Harvard
University Press.
Gunz, P., & Harvati, K. (2007). The Neanderthal ‘‘chignon’’:
Variation, integration, and homology. Journal of Human Evolution, 52(3), 262–274.
Gunz, P., Mitteroecker, P., & Bookstein, F. L. (2005). Semilandmarks
in three dimensions. In: D. E. Slice (Ed.), Modern morphometrics in physical anthropology (pp. 73–98). New York: Kluwer
Gunz, P., Mitteroecker, P., Neubauer, S., Weber, G. W., & Bookstein,
F. L. (2009). Principles for the virtual reconstruction of hominin
crania. Journal of Human Evolution, 57(1), 48–62.
Haber, A. (2011). A Comparative Analysis of Integration Indices.
Evolutionary Biology, 38, 476–488.
Hallgrimsson, B., Brown, J. J., Ford-Hutchinson, A. F., Sheets, H. D.,
Zelditch, M. L., & Jirik, F. R. (2006). The brachymorph mouse
and the developmental-genetic basis for canalization and morphological integration. Evolution & Development, 8(1), 61–73.
Hallgrimsson, B., Dorval, C. J., Zelditch, M. L., & German, R. Z.
(2004). Craniofacial variability and morphological integration in
mice susceptible to cleft lip and palate. Journal of Anatomy,
205(6), 501–517.
Hallgrimsson, B., Jamniczky, H., Young, N. M., Rolian, C., Parson,
T. E., Boughner, J. C., et al. (2009). Deciphering the palimpsest:
Studying the relationship between morphological integration and
phenotypic covariation. Evolutionary Biology, 36(4), 355–376.
Hallgrimsson, B., & Lieberman, D. E. (2008). Mouse models and the
evolutionary developmental biology of the skull. Integrative and
Comparative Biology, 48, 373–384.
Hallgrimsson, B., Lieberman, D. E., Young, N. M., Parsons, T., &
Wat, S. (2007). Evolution of covariance in the mammalian skull.
Novartis Found Symp 284 (Tinkering—The Microevolution of
Development), 284, 164–190.
Hansen, T. F. (2003). Is modularity necessary for evolvability?
Remarks on the relationship between pleiotropy and evolvability. Biosystems, 69(2–3), 83–94.
Hansen, T. F., & Houle, D. (2008). Measuring and comparing
evolvability and constraint in multivariate characters. Journal of
Evolutionary Biology, 21(5), 1201–1219.
Helms, J. A., Cordero, D., & Tapadia, M. D. (2005). New insights
into craniofacial morphogenesis. Development, 132(5), 851–861.
Hodgkin, J. (1998). Seven types of pleiotropy. The International
Journal of Developmental Biology, 42(3), 501–505.
Houle, D. (1991). Genetic covariance of fitness correlates: What
genetic correlations are made of and why it matters. Evolution,
45, 630–648.
Huttegger, S., & Mitteroecker, P. (2011). Invariance and meaningfulness in phenotype spaces. Evolutionary Biology, 38, 335–352.
Huxley, J. S. (1932). Problems of relative growth. London: Methuen
and Co.
Klingenberg, C. P. (2008). Morphological Integration and Developmental Modularity. Annual Review of Ecology, Evolution and
Systematics, 39, 115–132.
Klingenberg, C. P. (1998). Heterochrony and allometry: The analysis of
evolutionary change in ontogeny. Biological Reviews, 73, 70–123.
Klingenberg, C. P., Mebus, K., & Auffray, J. C. (2003). Developmental integration in a complex morphological structure: how
distinct are the modules in the mouse mandible? Evolution &
Development, 5(5), 522–531.
Klingenberg, C. P., & Zaklan, S. D. (2000). Morphological integration between developmental compartments in the Drosophila
wing. Evolution, 54(4), 1273–1285.
Lande, R. (1979). Quantitative genetic analysis of multivariate
evolution, applied to brain: Body size allometry. Evolution, 33,
Lande, R. (1980). The genetic covariance between characters
maintained by pleiotropic mutations. Genetics, 94, 203–215.
Lande, R. (1984). The genetic correlation between characters maintained by selection, linkage and inbreeding. Genetical Research,
44, 309–320.
Lieberman, D. E. (2011). The evolution of the human head.
Cambridge, MA: Belknap Press/Harvard University Press.
Lieberman, D. E., Ross, C., & Ravosa, M. J. (2000). The primate
cranial base: Ontogeny, function, and integration. Yearbook of
Physical Anthropology, 43, 117–169.
Leamy, L. (1977). Genetic and Environmental Correlations of
Morphometric Traits in Randombred House Mice. Evolution,
31(2), 357–369.
Lynch, M., & Walsh, B. (1998). Genetics and analysis of quantitative
traits. Sunderland, MA: Sinauer Associates.
Marroig, G., & Cheverud, J. M. (2004). Did natural selection or
genetic drift produce the cranial diversification of neotropical
monkeys? American Naturalist, 163(3), 417–428.
Martens, H., & Naes, T. (1989). Multivariate calibration. Chichester:
Martinez-Abadias, N., Esparza, M., Sjovold, T., Gonzalez-Jose, R.,
Santos, M., & Hernandez, M. (2009). Heritability of human
cranial dimensions: Comparing the evolvability of different
cranial regions. Journal of Anatomy, 214(1), 19–35.
Martínez-Abadías, N., Esparza, M., Sjøvold, T., González-José, R.,
Santos, M., Hernández, M., et al. (in press). Pervasive genetic
integration directs the evolution of human skull shape.
Maynard Smith, J., Burian, R., Kauffman, S., Alberch, P., Campbell,
J., Goodwin, B., et al. (1985). Developmental constraints and
evolution: A perspective from the mountain lake conference on
development and evolution. The Quarterly Review of Biology,
60(3), 265–287.
Metscher, B. D. (2009). MicroCT for developmental biology: A
versatile tool for high-contrast 3D imaging at histological
resolutions. Developmental Dynamics, 238(3), 632–640.
Metscher, B. D., & Müller, G. B. (2011). MicroCT for Molecular
Imaging: Quantitative Visualization of Complete Three-Dimensional Distributions of Gene Products in Embryonic Limbs.
Developmental Dynamics, 240, 2301–2308.
Mitteroecker, P. (2009). The developmental basis of variational
modularity: Insights from quantitative genetics, morphometrics,
and developmental biology. Evolutionary Biology, 36, 377–385.
Mitteroecker, P., & Bookstein, F. (2009). The ontogenetic trajectory
of the phenotypic covariance matrix, with examples from
craniofacial shape in rats and humans. Evolution, 63(3),
Mitteroecker, P., & Bookstein, F. L. (2007). The conceptual and
statistical relationship between modularity and morphological
integration. Systematic Biology, 56(5), 818–836.
Mitteroecker, P., & Bookstein, F. L. (2008). The evolutionary role of
modularity and integration in the hominoid cranium. Evolution,
62(4), 943–958.
Mitteroecker, P., & Gunz, P. (2009). Advances in geometric
morphometrics. Evolutionary Biology, 36, 235–247.
Mitteroecker, P., & Huttegger, S. (2009). The concept of morphospaces in evolutionary and developmental biology: Mathematics
and metaphors. Biological Theory, 4(1), 54–67.
Monteiro, L. R., Bonato, V., & Reis, S. F. (2005). Evolutionary
integration and morphological diversification in complex morphological structures: Mandible shape divergence in spiny rats
(Rodentia, Echimyidae). Evolution & Development, 7(5), 429–439.
Müller, G. B. (2003). Homology: The evolution of morphological
organization. In: G. B. Müller, & S. A. Newman (Eds.),
Origination of organismal form: Beyond the gene in developmental and evolutionary biology. Cambridge, MA: MIT Press.
Müller, G. B. (2007). Evo-devo: extending the evolutionary synthesis.
Nature Reviews Genetics, 8(12), 943–949.
Müller, G. B., & Newman, S. A. (1999). Generation, integration,
autonomy: Three steps in the evolution of homology. Novartis
Foundation Symposium, 222, 65–73.
Needham, J. (1933). On the dissociability of the fundamental
processes in ontogenesis. Biological Reviews, 8, 180–223.
Neubauer, S., Gunz, P., & Hublin, J. -J. (2010). Endocranial shape
changes during growth in chimpanzees and humans: A morphometric analysis of unique and shared aspects. Journal of Human
Evolution, 59, 555–566.
Olson, E. C., & Miller, R. L. (1958). Morphological Integration.
Chicago: University of Chicago Press.
Pavlicev, M., & Hansen, T. F. (2011). Genotype-phenotype maps
maximizing evolvability: Modularity revisited. Evolutionary
Biology, 38(4), 371–389.
Pavlicev, M., Wagner, G., & Cheverud, J. M. (2009). Measuring
evolutionary constraints through the dimensionality of the
phenotype: Adjusted bootstrap method to estimate rank of
phenotypic covariance matrices. Evolutionary Biology, 36,
Pigliucci, M. (2006). Genetic variance-covariance matrices: A
critique of the evolutionary quantitative genetics research
program. Biology and Philosophy, 21, 1–23.
Pigliucci, M., & Preston, K. (eds.) (2004). Phenotypic integration:
Studying the ecology and evolution of complex phenotypes.
Oxford: Oxford University Press.
Raff, R. (1996). The shape of life: Genes, development, and the
evolution of animal form. Chicago: University of Chicago Press.
Riedl, R. J. (1978). Order in Living Organisms. New York: John
Wiley and Sons.
Roff, D. A. (1997). Evolutionary quantitative genetics. New York:
Chapman & Hall.
Rohlf, F. J., & Bookstein, F. (1987). A comment on shearing as a
method for ‘‘size correction’’. Systematic Zoology, 36, 356–367.
Rohlf, F. J., & Corti, M. (2000). The use of two-block partial leastsquares to study covariation in shape. Systematic Biology, 49,
Rohlf, F. J., & Slice, D. E. (1990). Extensions of the Procrustes
method for the optimal superimposition of landmarks. Systematic Zoology, 39, 40–59.
Ross, C., & Henneberg, M. (1995). Basicranial flexion, relative brain
size and facial kyphosis in Homo sapiens and some fossil
hominids. American Journal of Physical Anthropology, 98,
Sawin, P. B., Fox, R. R., & Latimer, H. B. (1970). Morphogenetic
studies of the rabbit XLI. Gradients of correlation in the
architecture of morphology. American Journal of Anatomy,
128(2), 137–145.
Schluter, D. (1996). Adaptive radiation along genetic lines of least
resistance. Evolution, 50(5), 1766–1774.
Sinervo, B., & Svensson, E. (2002). Correlational selection and the
evolution of genomic architecture. Heredity, 89, 329–338.
Sperber, G. H. (2001). Craniofacial development. Ontario: BC Decker
Stadler, P. F., & Stadler, B. M. R. (2006). Genotype-phenotype maps.
Biological Theory, 1(3), 268–279.
Tanner, J. M. (1963). Regulation of Growth in Size in Mammals.
Nature, 199, 845–850.
Terentjev, P. V. (1931). Biometrische Untersuchungen über die
morphologischen Merkmale von Rana ridibunda Pall. (Amphibia, Salientia). Biometrika, 23, 23–51.
Thompson, D. A. W. (1917). On growth and form. Cambridge:
Cambridge University Press.
Waddington, C. H. (1942). The canalization of development and the
inheritance of acquired characters. Nature, 150, 563.
Wagner, G., & Zhang, J. (2011). The pleiotropic structure of the
genotype-phenotype map: The evolvability of complex organisms. Nature Reviews Genetics, 12, 204–213.
Wagner, G. P. (2000). What Is the Promise of Developmental
Evolution? Part I: Why Is Developmental Biology Necessary to
Explain Evolutionary Innovations? Journal of Experimental
Zoology. Part B, Molecular and Developmental Evolution, 288,
Wagner, G. P., & Altenberg, L. (1996). Complex adaptations and the
evolution of evolvability. Evolution, 50(3), 967–976.
Wagner, G. P., Pavlicev, M., & Cheverud, J. M. (2007). The road to
modularity. Nature Reviews Genetics, 8, 921–931.
Wright, S. (1932). General, group and special size factors. Genetics,
15, 603–619.
Zelditch, M. L. (1987). Evaluating models of developmental integration in the laboratory rat using confirmatory factor analysis.
Systematic Zoology, 36, 368–380.
Zelditch, M. L. (1988). Ontogenetic variation in patterns of phenotypic integration in the laboratory rat. Evolution, 42(1), 28–41.
Zelditch, M. L., Mezey, J. G., Sheets, H. D., Lundrigan, B. L., &
Garland, T. (2006). Developmental regulation of skull morphology II: Ontogenetic dynamics of covariance. Evolution &
Development, 8, 46–60.