How to Generate Reliable and Predictive CoMFA Models L. Zhang

Current Medicinal Chemistry, 2011, 18, ????-????
How to Generate Reliable and Predictive CoMFA Models
L. Zhang1, K.-C. Tsai2, L. Du1, H. Fang1, M. Li*,1 and W. Xu*,1
Department of Medicinal Chemistry, School of Pharmacy, Shandong University, Jinan, Shandong 250012, China
The Genomics Research Center, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 115, Taiwan
Abstract: Comparative Molecular Field Analysis (CoMFA) is a mainstream, practical 3D QSAR technique in drug discovery and development. Although CoMFA is noted for its high predictive capacity, its intrinsic dependence on the input data makes the methodology susceptible to noise. It is well known that the default CoMFA settings can yield predictive QSAR models, while optimized parameters have been shown to give more predictive results. Accordingly, numerous efforts have been made to improve the robustness and predictive accuracy of CoMFA models by considering various factors, including molecular conformation and alignment, field descriptors and grid spacing. Herein, we survey these factors and their contribution to the predictive ability of CoMFA models.
Keywords: CoMFA, conformation, alignment, fields, grid spacing.
Quantitative structure-activity relationship (QSAR) studies play a crucial role in drug discovery and design as a ligand-based approach [1]. Such approaches are valuable not only for reliably predicting specific properties of new compounds, but also for helping to elucidate the possible molecular mechanism of receptor-ligand interactions when no experimental NMR or crystal structure of the target protein is available [2]. As one of the most widely used QSAR methods, Comparative Molecular Field Analysis (CoMFA) employs interactive graphics and statistical techniques to correlate molecular features, such as steric and electrostatic properties, with biological activities [3]. Over the past few decades, CoMFA has become highly prevalent in both industrial and academic QSAR research. A SciFinder Scholar survey with the term ‘CoMFA’ indicated about 160 publications in 2009, compared with only about 50 publications in 1995 (Fig. 1). These CoMFA applications in drug design have been comprehensively summarized in several excellent book chapters and review articles [4-6].
In CoMFA, the steric and electrostatic interactions are described in terms of Lennard-Jones and Coulombic potentials, respectively. The steric and electrostatic potential energies are calculated with a probe atom located at each vertex of a regularly spaced lattice in which the series of molecules is embedded. The standard CoMFA procedure requires the specification of both the conformations and the alignment of the molecules. Despite their popularity, CoMFA analyses can be highly sensitive to the parameters used in QSAR modeling, including the steric field settings, the molecular alignment, and the grid spacing and dimensions. Because of this sensitivity, there has been interest in finding ways to enhance QSAR quality and to build robust CoMFA models.
The common steps for CoMFA modeling (Fig. 2) include:
1. Active molecules are placed in a three-dimensional grid (2 Å spacing) encompassing all of the molecules.
2. At each grid point, the steric energy (Lennard-Jones potential) and the electrostatic energy (Coulombic potential) are calculated for each molecule with a probe atom (sp3-hybridized carbon with +1 charge).
3. To minimize domination by large steric and electrostatic energies, all energies that exceed a specified value (default 30 kcal/mol) are set to the cutoff value.
4. CoMFA uses a partial least-squares (PLS) analysis to predict activity from the energy values at the grid points.
Based on these procedures, we review the up-to-date literature on enhancing the quality of CoMFA modeling.
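Steps 1 to 3 above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any CoMFA package: the two-atom molecule, the Lennard-Jones constants `A_IJ` and `C_IJ`, the box limits and all function names are hypothetical, and clamping negative energies at the negative cutoff is an assumption of this sketch. The PLS step is omitted; the code only builds one row of the descriptor matrix.

```python
import math

# Hypothetical toy "molecule": (x, y, z, partial_charge) per atom, in angstroms.
ATOMS = [(0.0, 0.0, 0.0, -0.4), (1.5, 0.0, 0.0, 0.4)]

# Assumed Lennard-Jones constants (illustrative only, not force-field values).
A_IJ = 1.0e5        # repulsive r^-12 coefficient
C_IJ = 1.0e2        # attractive r^-6 coefficient
PROBE_CHARGE = 1.0  # +1 probe, as in the standard procedure
DIELECTRIC = 1.0
COULOMB_K = 332.06  # converts e^2/angstrom to kcal/mol
CUTOFF = 30.0       # kcal/mol, the default truncation value cited in the text


def lattice(lo, hi, spacing=2.0):
    """Yield the points of a cubic lattice spanning [lo, hi] on each axis."""
    n = int(round((hi - lo) / spacing)) + 1
    axis = [lo + k * spacing for k in range(n)]
    for x in axis:
        for y in axis:
            for z in axis:
                yield (x, y, z)


def truncate(energy, cutoff=CUTOFF):
    """Clamp an energy to [-cutoff, +cutoff] (lower bound is an assumption)."""
    return max(-cutoff, min(cutoff, energy))


def fields_at(point, atoms):
    """Return (steric, electrostatic) probe energies at one lattice point."""
    steric = 0.0
    electro = 0.0
    for ax, ay, az, q in atoms:
        r = math.dist(point, (ax, ay, az))
        if r < 1e-6:
            # Probe sits on an atom: both terms diverge, so use the cutoff.
            return CUTOFF, math.copysign(CUTOFF, q * PROBE_CHARGE)
        steric += A_IJ / r**12 - C_IJ / r**6
        electro += COULOMB_K * q * PROBE_CHARGE / (DIELECTRIC * r)
    return truncate(steric), truncate(electro)


def descriptor_row(atoms, lo=-4.0, hi=4.0, spacing=2.0):
    """Flatten both fields over the lattice into one row of the X block."""
    steric, electro = [], []
    for p in lattice(lo, hi, spacing):
        s, e = fields_at(p, atoms)
        steric.append(s)
        electro.append(e)
    return steric + electro


row = descriptor_row(ATOMS)
print(len(row), "descriptors for one molecule")
```

Stacking one such row per aligned molecule would give the X block that is passed to PLS together with the activity values.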
CoMFA Settings
It is a demanding task for modelers to determine which settings, or combinations of settings, are suitable for their data sets. Although a number of settings are provided, most CoMFA users still rely on the default parameters (Fig. 3). The influence of adjusting one or more CoMFA settings, such as the steric molecular field settings, grid spacing and cutoff values, has been explored in several studies [7-10].
Fig. (1). CoMFA publication number by year, 1988-2009.
In the CoMFA philosophy, the biological properties of molecules are correlated with their steric and electrostatic potential energies.
*Address correspondence to this author at the Department of Medicinal Chemistry,
School of Pharmacy, Shandong University, Jinan, Shandong 250012, China;
Tel/Fax: +86-531-8838-2076; E-mail: [email protected];
Tel/Fax: +86-531-88382264; E-mail: [email protected]
Karlen and coworkers produced a total of 6120 CoMFA models through settings optimization to evaluate the possibility of improving predictive ability [11]. The approach was evaluated on nine different data sets, and the optimal models shared the common feature of steric fields derived with either the indicator or the squared transform type. Both the internal and the external predictive abilities of the derived models were improved by testing distinct combinations of CoMFA settings.
Molecular Conformation
Cramer and his coworkers noted that the “active conformation” and the “alignment rules” are major potential difficulties for CoMFA [12]. Data sets for CoMFA should act through the same mechanism of action and share a common pharmacophore, even if the molecular skeletons differ [4, 5, 13]. Three-dimensional structures of the selected compounds are mandatory for CoMFA modeling, and bioactive conformations are a prerequisite for a credible model. It is well recognized that bioactive conformations for a whole set of molecules are difficult to obtain from crystallographic studies. Nevertheless, diverse computational approaches, including docking, molecular dynamics simulation and conformational sampling, are readily available for predicting the bioactive conformation.
Provided that a holo-form crystal structure of the target-ligand complex is available and the data set compounds share an analogous scaffold with the ligand, the cocrystallized ligand can be used as a template for building the structures of the selected data set. This method is straightforward and commonly used in 3D-QSAR studies [14-18]. It should be noted that the method is plausible only for sets of rigid and structurally similar molecules; for collections of highly flexible or structurally diverse molecules, it is too simple to be convincing.
Fig. (2). The standard CoMFA process.
© 2011 Bentham Science Publishers Ltd.
A commonly used approach for predicting the bioactive conformation is molecular docking. In the docking procedure, a ligand is positioned in the functional site of the target to determine the binding mode of the complex and to generate the active conformation of the ligand. A number of predictive CoMFA models have been generated from docked conformations, such as Chretien’s acetylcholinesterase (AChE) model [19], Bharatam’s glycogen synthase kinase-3 (GSK-3) model [20], Qiu’s c-Jun N-terminal kinase-1 (JNK-1) model [21], and Yao’s vascular endothelial growth factor receptor tyrosine kinase-2 (VEGFR-2) model [22].
Molecular dynamics (MD) simulation of the ligand-target complex can also provide a precise conformation. There are numerous successful examples of CoMFA modeling based on MD simulation; for example, Shang and coworkers built the initial structures of inhibitors by MD simulation of enzyme-substrate complexes for their CoMFA study [23]. Nevertheless, such accurate calculations are still too time-consuming and resource-dependent to be applied routinely in QSAR modeling.
Fig. (3). Decision tree for determining possible combinations of CoMFA settings [11].
Bioactive conformations have to be simulated in silico when the experimental receptor structure is not available. A systematic search [24-26] or an annealing process [27-29] is usually used to generate a low-energy conformation for a template molecule; the remaining molecules in the data set are then constructed on the basis of this reference conformation. The active analog approach generates possible conformations by conformational analysis and selects those that best satisfy the interatomic distances of a working hypothesis [30-32]. The local minimization method determines active conformations through systematic conformational searches followed by minimization on the rigid rotor search surface [33]. However, this procedure can lead only to minimized conformations, even though the bioactive conformation is not necessarily the lowest-energy one.
A “similarity” problem must also be considered: if compounds in the test set differ significantly in conformation from the training set molecules (i.e., they contain features in a region not explored by the training set), a CoMFA model generated from the training set cannot accurately predict the activities of the test set [34].
Molecular Alignment
Molecular superposition plays a decisive role in CoMFA analysis, since the relative interaction energies depend strongly on the relative molecular positions. Flexible groups and conformational diversity complicate molecular alignment, and a low-quality alignment introduces unwanted input noise. Pseudoreceptor modeling [35-40] and 4D QSAR analysis [41-43] perform well because they reduce input noise and decrease the risk of overfitting. For CoMFA, various methods have been developed with the aim of achieving a strict or “perfect” molecular overlay.
A receptor-based alignment can be achieved by molecular docking. In this procedure, similar molecules are placed in the same region of the active site, and an alignment is thereby generated automatically. Alignments derived from this receptor-based approach can yield highly predictive CoMFA models and have even been shown to be superior to ligand-based alignments [40, 44-46].
Pharmacophore modeling is another approach for generating molecular alignments automatically. Unlike molecular docking, this method is independent of the receptor structure. It determines the pharmacophore features of the ligands and uses these features to align the molecular structures. Since the problem of conformational diversity and flexibility is solved to some extent, it is widely applied in the construction of CoMFA models [47-50].
Force fields can also be used to align molecules: the field fit algorithm calculates field values at the grid points, and lattice points whose field values reach a certain magnitude are employed for molecular overlay. Dove and coworkers compared three different alignment rules and found that the weighted field fit alignment rule gave the best model [51]. They concluded that the weighted field fit method reduced the risk of producing artificial redundancy of the structures and of ignoring entropy contributions to the free energy.
Even though automatic alignment rules are widely and successfully used in CoMFA, manual alignment is a competent alternative that can also yield highly predictive models. Tervo and coworkers derived CoMFA models based on two alignment rules, docking and manual alignment, for a set of flexible molecules [52]. Interestingly, the results suggested that the model with the better predictive quality was the one built on the manual alignment.
The more recent topomer CoMFA method, introduced in 1998, breaks the input structures into fragments and removes any core fragment structurally common to the entire series [53]. The field values are calculated for the remaining fragments. This fragment-based alignment method has proved to be a promising approach to CoMFA model construction for both QSAR experts and beginners: models are generated within a few minutes, and the results are generally equivalent to those of canonical CoMFA analysis [53, 54].
CoMFA Fields
In CoMFA analysis, a collection of structurally aligned molecules is represented in terms of property fields, which are evaluated between a probe atom and each molecule at regularly spaced intervals on a grid. The default CoMFA fields, steric and electrostatic, are calculated with the Lennard-Jones potential (eq. 1) and the Coulombic potential (eq. 2), respectively.
$$E_{vdW} = \sum_{i=1}^{n} \left( A_{ij}\, r_{ij}^{-12} - C_{ij}\, r_{ij}^{-6} \right) \qquad (1)$$
$$E_{C} = \sum_{i=1}^{n} \frac{q_i\, q_j}{D\, r_{ij}} \qquad (2)$$
where $E_{vdW}$ is the van der Waals interaction energy, the sums run over the $n$ atoms of the molecule, $r_{ij}$ is the distance between atom $i$ of the molecule and the grid point $j$ where the probe atom is located, and $A_{ij}$ and $C_{ij}$ are constants depending on the van der Waals radii of the corresponding atoms. $E_{C}$ is the Coulomb interaction energy, $q_i$ is the partial charge of atom $i$ of the molecule, $q_j$ is the charge of the probe atom, and $D$ is the dielectric constant.
Both potential functions are very steep near the van der Waals surface of the molecules, resulting in rapid changes in the surface description (Fig. 4). The steric field represents the van der Waals interactions between the molecule and its receptor. The standard CoMFA method uses the Lennard-Jones 6-12 potential, which is characterized by a very steep rise in energy at short distances, to calculate the steric interaction between a probe atom and a molecule. Different methods have been applied to calculate the steric field, and their contributions to QSAR quality have been evaluated as well [55, 56]. The indicator method assigns an energy of 0 to steric potentials that fall below the cutoff value, and a nominal energy (equal to the cutoff) to potentials at or above the cutoff [34]. The parabolic approach squares the magnitude of the calculated potential at each lattice point [11]. A Gaussian function, which decays more slowly and smoothly, is used in the CoMSIA method for calculating molecular field potentials.
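The steric field variants described above differ only in how a raw Lennard-Jones energy at a lattice point is transformed. The following Python sketch illustrates the three treatments as they are described in the text; the function names are hypothetical, the cutoff of 30 kcal/mol is the default mentioned earlier, and retaining the sign of the energy in the parabolic transform is an assumption of this sketch rather than a documented detail of any CoMFA implementation.

```python
CUTOFF = 30.0  # kcal/mol; default truncation value mentioned in the text


def truncated(energy, cutoff=CUTOFF):
    """Standard treatment: cap the Lennard-Jones energy at the cutoff."""
    return min(energy, cutoff)


def indicator(energy, cutoff=CUTOFF):
    """Indicator field: 0 below the cutoff, the cutoff value at or above it."""
    return cutoff if energy >= cutoff else 0.0


def parabolic(energy, cutoff=CUTOFF):
    """Parabolic field: square the magnitude of the truncated energy.

    Keeping the sign of the original value is an assumption of this sketch.
    """
    e = truncated(energy, cutoff)
    return e * abs(e)


# Hypothetical raw steric energies, from far away to inside the molecule.
raw = [-0.2, 5.0, 30.0, 1200.0]
for e in raw:
    print(e, truncated(e), indicator(e), parabolic(e))
```

The indicator transform discards all graded information and only records whether a lattice point lies inside the molecular volume, which is why it behaves like a shape descriptor.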
To prevent unjustifiably large variance in the descriptors, truncation of the steric energy at lower values is well documented, because the steric field contribution increases steeply at lattice points close to the molecule. For example, Kim and Martin truncated the steric energy at 4.0 [58], while Klebe and coworkers used two different truncation values, 30 and 5, to evaluate the effect of energy truncation [59].
In 2008, Sorich and coworkers investigated the influence of steric field settings on the predictive performance of CoMFA [7]. In that study, 3D QSAR models based on Lennard-Jones, indicator, parabolic and Gaussian steric fields were compared using 28 data sets. The results demonstrated that the Lennard-Jones and indicator fields, whose values fall off steeply, performed better than the Gaussian type, which falls off more smoothly.
Fig. (4). Steric and electrostatic fields in CoMFA studies [57].
As one of the predominant contributors to a CoMFA model, the electrostatic field is typically obtained by calculating the Coulomb potential between a probe and the molecule. Empirical partial charge methods, including Gasteiger-Marsili, Gasteiger-Hückel and MMFF94, are widely used in published CoMFA studies. Based on the concept of equalization of electronegativities, these empirical approaches are simple and fast in assigning partial charges. The commonly used semi-empirical methods, including modified neglect of differential overlap (MNDO), Austin model 1 (AM1) and parametric model 3 (PM3), are a tradeoff between the empirical and the ab initio quantum chemical approaches (HF/STO-3G, HF/3-21G* and HF/6-31G*) in terms of precision and computational time. The ab initio approaches are highly accurate but computationally inefficient. Conversely, the empirical methods are fast but relatively inaccurate. The semi-empirical techniques are faster than the ab initio methods; nevertheless, they do not show a significant improvement in accuracy over the empirical methods.
A number of studies have evaluated the importance of partial charges for the predictive quality of CoMFA models [60-63]. For example, Welsh and coworkers performed CoMFA on a set of human immunodeficiency virus type 1 (HIV-1) protease inhibitors and evaluated different charge assignment schemes (Discover CVFF, Gasteiger-Marsili and AM1-ESP) [64]. The best model was constructed using AM1-ESP partial charges.
Recently, Sorich and coworkers examined the contribution of partial charge calculation methods to the predictive ability of the generated CoMFA models [65]. In their study, Gasteiger, Gasteiger-Hückel, MMFF94, AM1, MNDO and PM3 charges were assigned to 30 data sets. The authors found that the semi-empirical charge calculation methods gave the most predictive models; MMFF94 was also a good alternative because of its predictive ability (not significantly worse than the semi-empirical methods) and fast calculation speed.
We also compared twelve semi-empirical and empirical charge-assigning methods, namely AM1, AM1-BCC, CFF, Del-Re, Formal, Gasteiger, Gasteiger-Hückel, Hückel, MMFF, PRODRG, Pullman and VC2003, for their effect on prediction accuracy in CoMFA and CoMSIA studies using several benchmark data sets [66]. In general, the semi-empirical charges, such as AM1 and AM1-BCC, provided more predictive CoMFA and CoMSIA models than the Gasteiger and Gasteiger-Hückel charges that are commonly used in QSAR studies. Interestingly, the CFF partial charges gave the most predictive CoMFA and CoMSIA models. As an empirical charge-assigning method with a short computing time, the CFF charge offered advantages over the other eleven semi-empirical and empirical charges in performing the most accurate electrostatic potential calculations for CoMFA and CoMSIA studies. These results should help in the selection of electrostatic potential models in CoMFA and CoMSIA studies.
Grid Spacing
Kroemer and coworkers pointed out that the shape of the molecular structures should be characterized precisely in the lattice, whereas the degree of differentiation should not be excessively high [33]. For that reason, a suitable grid must discriminate between the atoms of different molecules and then transfer the corresponding values into the descriptor matrix.
The influence of grid spacing has been thoroughly evaluated in earlier studies. These studies suggested that the influence of lattice location and size was not appreciable, a conclusion drawn from limited data sets [67, 68]. Tropsha and coworkers showed that changing the dimensions of the region can unreasonably affect the predictive ability of a CoMFA model [69]. They noted that this characteristic should be taken into account when running CoMFA. Likewise, Sorich and coworkers found a statistically significant effect of grid spacing on predictivity for certain steric field settings [7]. They also showed that lattice density has only a minor influence on the predictive capacity of Gaussian steric fields, whereas it can affect the quality of CoMFA models based on Lennard-Jones, indicator and parabolic steric fields.
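One practical consequence of the grid spacing is the size of the descriptor matrix. The short Python sketch below counts the field descriptors per molecule for a hypothetical lattice box at three spacings; the box dimensions and function names are assumptions chosen for illustration, and two fields (steric and electrostatic) are assumed.

```python
def points_per_axis(extent, spacing):
    """Number of lattice points along one axis of length `extent` (angstroms)."""
    return int(round(extent / spacing)) + 1


def n_descriptors(box, spacing, n_fields=2):
    """Descriptors per molecule: lattice points times number of fields."""
    nx, ny, nz = (points_per_axis(e, spacing) for e in box)
    return nx * ny * nz * n_fields


BOX = (20.0, 16.0, 12.0)  # hypothetical box edge lengths in angstroms

for spacing in (2.0, 1.0, 0.5):
    print(f"spacing {spacing} A: {n_descriptors(BOX, spacing)} descriptors")
```

For this box the count rises from 1386 descriptors at 2 Å to 9282 at 1 Å and 67650 at 0.5 Å, while the number of molecules stays the same, so a finer lattice makes the PLS problem considerably more underdetermined.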
Other Descriptors
Hydrophobic properties have frequently been considered in CoMFA studies in order to improve the robustness and predictive capability of the models [70-72]. Wiese and coworkers established more than 350 CoMFA models and benchmarked them using steric, electrostatic and hydrophobic fields alone and in combination [73]. The results indicated that hydrophobic fields could boost the correlative and predictive power in all cases. In the following year, they compared model quality using 3D (HINT hydrophobic field) and logP (HINT and ClogP values) representations of hydrophobicity, and evaluated the similarity between the standard CoMFA fields and the hydrophobic fields [74]. They further showed that ClogP is preferable to logP when deriving hydrophobic fields for the generation of CoMFA models. This evidence revealed that hydrophobic properties can play an informative role in describing the molecule. Even so, if the hydrophobic properties have no correlation with the activity, they may contribute negatively to QSAR quality.
A number of studies have shown that including H-bond descriptors can improve the quality of the CoMFA model [75, 76]. Xu and coworkers built CoMFA models for a series of gamma-hydroxy butenolide endothelin antagonists with additional H-bond fields [77]. The results confirmed that the H-bond fields significantly improved the quality of the derived model. Moreover, Pan and coworkers emphasized the influence of H-bond fields on improving CoMFA model quality for a set of protein tyrosine phosphatase 1B (PTP1B) inhibitors [78].
The influence of the frontier orbital (HOMO and LUMO) energies as additional physical descriptors has also been evaluated in diverse QSAR studies. The introduction of HOMO and LUMO energies proved to be informative for the resulting CoMFA models [79-82]. The incorporation of ligand-receptor energies into the CoMFA model can contribute to model quality as well [83].
In CoMFA modeling, an sp3 carbon with a +1.0 charge is generally applied as the default probe atom. However, the choice of probe atom can influence the resulting CoMFA model. Hannongbua and coworkers examined the impact of different probe atoms (Csp3 (+1), Osp3 (-1) and H (+1)) on the quality of the generated CoMFA models. Their results showed that a combination of these three probes led to the most predictive model [84].
Statistical Analysis
Partial least squares (PLS), developed in 1986, is a prevailing statistical algorithm for deriving linear relationships among columns of data [85-88]. PLS is a regression method that looks for a linear correlation between the variance in the target properties and the variation in the explanatory properties, minimizing the sum of squared deviations. It resembles a factor analysis of the explanatory properties in which the objective is to maximize alignment with the target property values rather than with the Cartesian or other axes. For this reason, PLS is sometimes compared to principal component analysis (PCA) in its derivation of vectors from the Y and X blocks. PCA, introduced by Karl Pearson, is a mathematical procedure that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables [89]. Unlike PCA, which considers only the input data array, PLS takes both the input and the output data matrices into consideration. Another important algorithm, SAMPLS, created by Bruce Bush, greatly accelerates the cross-validation procedure [90]. In SAMPLS, the latent variables are derived from the n × n covariance matrix (Fig. 5). Validation methods such as cross-validation (including leave-one-out [12], leave-many-out [91-93] and leave-group-out [94]) and bootstrapping [95-97] have been employed to examine the robustness of the generated models.
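The leave-one-out procedure can be illustrated with a small Python sketch that computes q² = 1 − PRESS/SD. To keep the example self-contained, an ordinary least-squares line on a single descriptor is used as a stand-in for PLS, which is a deliberate simplification; the data values and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fitted_r2(xs, ys):
    """Conventional r2 of the model fitted on all compounds."""
    a, b = fit_line(xs, ys)
    my = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def loo_q2(xs, ys):
    """Leave-one-out q2 = 1 - PRESS / SD."""
    my = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        # Refit the model without compound i, then predict compound i.
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    sd = sum((y - my) ** 2 for y in ys)
    return 1.0 - press / sd


# Hypothetical descriptor values and activities for six compounds.
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Y = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
print("r2 =", fitted_r2(X, Y), "q2 =", loo_q2(X, Y))
```

Because each left-out compound is predicted by a model that never saw it, q² cannot exceed the fitted r² for a least-squares model of this kind, and a large gap between the two is a common warning sign of overfitting.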
The quality of the resulting QSAR models can be judged by statistical measures such as r² (the fraction of explained variance, eq. 3) and q² (the cross-validated or predictive r², eq. 4, obtained on the training set). The fraction of explained variance, r², measures the model’s ability to account for the variance in the data; in other words, it estimates the goodness-of-fit of the regression model derived from the training set. The cross-validated r², or q², reflects the internal robustness of the QSAR model. Model quality is estimated either by cross-validation (internal validation) or by predicting external compounds that were not used to build the model (external validation).
Fig. (5). Description of PLS analysis. Vectors u and t are derived from the Y block and the X block, respectively. BAi = logarithm of the activity of molecule i, Sij = steric field variable of molecule i at grid point j, Eij = electrostatic field variable of molecule i at grid point j.
$$r^2 = 1 - \frac{\sum (Y_p - Y_a)^2}{\sum (Y_c - Y_a)^2} \qquad (3)$$
$$q^2 = 1 - \frac{PRESS}{SD} = 1 - \frac{\sum (Y_a - Y_p)^2}{\sum (Y_a - Y_m)^2} \qquad (4)$$
where $Y_a$ is an actual value, $Y_p$ is a predicted value, $Y_c$ is the average of the predicted values, and $Y_m$ is the average of the observed activities. PRESS = predictive sum of squares, SD = sum of squared deviations of the observed activities from their mean.
Recently, Tropsha and coworkers introduced new validation criteria for robust QSAR models (eq. 5-8) [98, 99]. They consider a QSAR model predictive if the following conditions are satisfied:
$$q^2 > 0.5 \qquad (5)$$
$$r^2 > 0.6 \qquad (6)$$
$$\frac{r^2 - r_0^2}{r^2} < 0.1 \quad \text{or} \quad \frac{r^2 - r_0'^2}{r^2} < 0.1 \qquad (7)$$
$$0.85 \le k \le 1.15 \quad \text{or} \quad 0.85 \le k' \le 1.15 \qquad (8)$$
where $r$ is the correlation coefficient between the predicted and observed activities, $r_0^2$ is the coefficient of determination between predicted and observed activities for the linear regression with the Y-intercept set to zero (i.e., described by Y = kX, where Y and X are the actual and predicted activities, respectively), $r_0'^2$ is the corresponding coefficient of determination between observed and predicted activities, and $k$ and $k'$ are the slopes of the regression lines through the origin.
The aforementioned criteria emphasize that a high q² does not ensure a predictive QSAR model. The predictive capability can only be assessed with an external set of molecules that were not used for building the model. Models with a high internal q² but low external predictive ability can be identified using these equations, which were verified on 160 QSAR models.
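As an illustration, the four acceptance conditions (q² > 0.5, r² > 0.6, a relative difference between r² and the through-origin coefficient of determination below 0.1, and a through-origin slope between 0.85 and 1.15) can be checked with a short Python function. This is a sketch under stated assumptions: the formulas used for the slope and for the through-origin coefficient of determination follow one common formulation and are not taken from the cited papers, and all names and data values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def through_origin(xs, ys):
    """Fit y = k*x (zero intercept); return (k, coefficient of determination)."""
    k = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    my = sum(ys) / len(ys)
    ss_res = sum((y - k * x) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return k, 1.0 - ss_res / ss_tot


def tropsha_checks(q2, observed, predicted):
    """Evaluate the four conditions for an external test set."""
    r2 = pearson_r(predicted, observed) ** 2
    k, r0_sq = through_origin(predicted, observed)          # Y = k X
    k_prime, r0p_sq = through_origin(observed, predicted)   # X = k' Y
    if r2 > 0:
        ratio_ok = (r2 - r0_sq) / r2 < 0.1 or (r2 - r0p_sq) / r2 < 0.1
    else:
        ratio_ok = False  # no correlation at all, so the ratio is undefined
    return {
        "q2 > 0.5": q2 > 0.5,
        "r2 > 0.6": r2 > 0.6,
        "r0 ratio < 0.1": ratio_ok,
        "0.85 <= slope <= 1.15": 0.85 <= k <= 1.15 or 0.85 <= k_prime <= 1.15,
    }


# Hypothetical external test set: observed and predicted activities.
observed = [5.0, 6.0, 7.0, 8.0, 9.0]
predicted = [5.2, 5.9, 7.1, 7.8, 9.1]
checks = tropsha_checks(q2=0.7, observed=observed, predicted=predicted)
print(checks, "-> predictive:", all(checks.values()))
```

A model would be accepted only if every entry of the returned dictionary is true; here q² is passed in from the training-set cross-validation, while the remaining quantities are computed from the external test set.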
Conclusion
The CoMFA technique has been under development for more than two decades, and a great number of CoMFA studies have been performed with this approach. Scientists have also made sustained and growing efforts to improve the predictive quality of CoMFA models. Herein, the factors that determine CoMFA quality, including molecular conformation, structural alignment, molecular fields, grid spacing and additional physicochemical properties, have been presented as a tutorial review to provide guidance for further CoMFA studies. Among these determinants, the bioactive conformation and the molecular superposition play an essential role in the CoMFA procedure, while different combinations of fields and physicochemical properties result in different levels of predictivity. Highly predictive models can also be obtained by adjusting settings such as energy cutoff values, lattice size and probe types.
In sum, suggestions for future CoMFA studies are outlined below.
1. The initial geometries of the molecules should be in the bioactive or a theoretically derived active conformation;
2. Different charge methods should be carefully considered to establish a robust CoMFA model;
3. A reasonable molecular alignment is mandatory for a trustworthy CoMFA model;
4. Cutoff values are needed both for the steric and electrostatic energy calculations and for the PLS analysis to reduce unwanted variance;
5. Other descriptors, such as ClogP, can substantially improve the reliability of the CoMFA model. When a CoMFA model lacks statistical significance, such descriptors can be taken into consideration;
6. Different probe atoms can be considered to improve the credibility of the CoMFA model;
7. The lattice location and size should be carefully considered.
Abbreviations
CoMFA = comparative molecular field analysis
QSAR = quantitative structure-activity relationship
PLS = partial least-squares
PCA = principal component analysis
MD = molecular dynamics
3D-QSAR = three dimensional quantitative structure-activity relationship
CoMSIA = comparative molecular similarity indices analysis
GSK-3 = glycogen synthase kinase-3
JNK-1 = c-Jun N-terminal kinase-1
VEGFR-2 = vascular endothelial growth factor receptor tyrosine kinase-2
MNDO = modified neglect of differential overlap
AM1 = Austin model 1
PM3 = parametric model 3
HIV-1 = human immunodeficiency virus type 1
PTP1B = protein tyrosine phosphatase 1B
SAMPLS = sample-distance partial least squares
Acknowledgements
The present work was financed by grants from the PhD Programs Foundation of Ministry of Education of China (No. 20090131120080), the Doctoral Fund of Shandong Province (No. BS2009SW011), Fok Ying Tung Education Foundation (No. 122036), National Natural Science Foundation of China (No. 30901836 and 81001362), Shandong Natural Science Foundation (No. JQ201019) and Independent Innovation Foundation of Shandong University, IIFSDU (No. 2010JQ005).
Lill, M.A. Multi-dimensional QSAR in drug discovery. Drug Discov. Today,
2007, 12, 1013-1017.
Yang, G.F.; Huang, X. Development of quantitative structure-activity
relationships and its application in rational drug design. Curr. Pharm. Des.,
2006, 12, 4601-4611.
Cramer, R.D.; Patterson, D.E.; Bunce, J.D. Comparative molecular field
analysis (CoMFA). 1. Effect of shape on binding of steriods to carrier
proteins. J. Am. Chem. Soc., 1988, 110, 5959-5967.
Kubinyi, H. QSAR and 3D QSAR in drug design Part 1: methodology. Drug
Discov. Today, 1997, 2, 457-467.
Kubinyi, H. QSAR and 3D QSAR in drug design Part 2: applications and
problems. Drug Discov. Today, 1997, 2, 538-546.
Podlogar, B.L.; Ferguson, D.M. QSAR and CoMFA: a perspective on the
practical application to drug discovery. Drug Des. Discov., 2000, 17, 4-12.
Mittal, R.R.; McKinnon, R.A.; Sorich, M.J. Effect of steric molecular field
settings on CoMFA predictivity. J. Mol. Model., 2008, 14, 59-67.
Bursi, R.; Grootenhuis, P.D.J. Comparative molecular field analysis and
energy interaction studies of thrombin-inhibitor complexes. J. Comput. Aided
Mol. Des., 1999, 13, 221-232.
Dinan, L.; Hormann, R.E.; Fujimoto, T. An extensive ecdysteroid CoMFA.
J. Comput. Aided Mol. Des., 1999, 13, 185-207.
Melville, J.L.; Hirst, J.D. On the stability of CoMFA models. J. Chem. Inf.
Comput. Sci., 2004, 44, 1294-1300.
Peterson, S.D.; Schaal, W.; Karlen, A. Improved CoMFA modeling by
optimization of settings. J. Chem. Inf. Model., 2006, 46, 355-364.
Richard D. Cramer, I., David E. Patterson, and Jeffrey D. Bunce.
Comparative Molecular Field Analysis (CoMFA). 1. Effect of Shape on
Binding of Steroids to Carrier Proteins. J. Am. Chem. Soc., 1988, 110, 59595967.
Kubinyi, H. QSAR and 3D QSAR in drug design Part 1: methodology. Drug
Discov. Today, 1997, 2, 457-467.
Chavatte, P.; Yous, S.; Beaurain, N.; Mesangeau, C.; Ferry, G.; Lesieur, D.
arylalkylamine N-acetyltransferase (AANAT) inhibitors: A comparative
molecular field analysis. Quant. Struct-Act. Rel., 2002, 20, 414-421.
Tsai, K.C.; Lin, T.H. A ligand-based molecular modeling study on some
matrix metalloproteinase-1 inhibitors using several 3D QSAR techniques. J.
Chem. Inf. Comput. Sci., 2004, 44, 1857-1871.
Yu, Z.H.; Niu, C.W.; Ban, S.R.; Wen, X.; Xi, Z. Study on structure-activity
relationship of mutation-dependent herbicide resistance acetohydroxyacid
synthase through 3D-QSAR and mutation. Chinese Sci. Bull., 2007, 52,
Lei, B.L.; Du, J.; Li, S.Y.; Liu, H.X.; Ren, Y.Y.; Yao, X.J. Comparative
molecular field analysis (CoMFA) and comparative molecular similarity
indices analysis (CoMSIA) of thiazolone derivatives as hepatitis C virus
NS5B polymerase allosteric inhibitors. J. Comput. Aided Mol. Des., 2008,
22, 711-725.
Kaur, K.; Talele, T. Structure-based CoMFA and CoMSIA study of
indolinone inhibitors of PDK1. J. Comput. Aided Mol. Des., 2009, 23, 25-36.
Bernard, P.P.; Kireev, D.B.; Pintore, M.; Chretien, J.R.; Fortier, P.L.;
Froment, D. A CoMFA study of enantiomeric organophosphorus inhibitors
of acetylcholinesterase. J. Mol. Model., 2000, 6, 618-629.
Dessalew, N.; Patel, D.S.; Bharatam, P.V. 3D-QSAR and molecular docking
studies on pyrazolopyrimidine derivatives as glycogen synthase kinase-3 beta
inhibitors. J. Mol. Graph. Model., 2007, 25, 885-895.
Yi, P.; Qiu, M.H. 3D-QSAR and docking studies of aminopyridine
carboxamide inhibitors of c-Jun N-terminal kinase-1. Eur. J. Med. Chem.,
2008, 43, 604-613.
Du, J.; Lei, B.L.; Qin, J.; Liu, H.X.; Yao, X.J. Molecular modeling studies of
vascular endothelial growth factor receptor tyrosine kinase inhibitors using
QSAR and docking. J. Mol. Graph. Model., 2009, 27, 642-654.
Huang, M.L.; Yang, D.Y.; Shang, Z.C.; Zou, J.W.; Yu, Q.S. 3D-QSAR
studies on 4-hydroxyphenylpyruvate dioxygenase inhibitors by comparative
molecular field analysis (CoMFA). Bioorg. Med. Chem. Lett., 2002, 12,
Aher, Y.D.; Agrawal, A.; Bharatam, P.V.; Garg, P. 3D-QSAR studies of
substituted 1-(3,3-diphenylpropyl)-piperidinyl amides and ureas as CCR5
receptor antagonists. J. Mol. Model., 2007, 13, 519-529.
Cho, W.J.; Kim, E.K.; Park, I.Y.; Jeong, E.Y.; Kim, T.S.; Le, T.N.; Kim,
D.D.; Lee, E.S. Molecular Modeling of 3-arylisoquinoline antitumor agents
active against A-549. A comparative molecular field analysis study. Bioorg.
Med. Chem., 2002, 10, 2953-2961.
Juan, A.A.S. Towards predictive inhibitor design for the EGFR
autophosphorylation activity. Eur. J. Med. Chem., 2008, 43, 781-791.
Chakraborti, A.K.; Gopalakrishnan, B.; Sobhia, M.E.; Malde, A. 3D-QSAR
studies of indole derivatives as phosphodiesterase IV inhibitors. Eur. J. Med.
Chem., 2003, 38, 975-982.
Ganguly, S.; Banerjee, S. 3D-QSAR studies of imidazole derivatives as
Candida albicans P450-demethylase inhibitors. Asian J. Chem., 2008, 20,
Adane, L.; Bharatam, P.V. 3D-QSAR analysis of cycloguanil derivatives as
inhibitors of A16V+S108T mutant Plasmodium falciparum dihydrofolate
reductase enzyme. J. Mol. Graph. Model., 2009, 28, 357-367.
Sufrin, J.R.; Dunn, D.A.; Marshall, G.R. Steric mapping of the L-methionine
binding site of ATP:L-methionine S-adenosyltransferase. Mol. Pharmacol.,
1981, 19, 307-313.
Bjorkroth, J.-P.; Pakkanen, T.A.; Lindroos, J. Comparative Molecular Field
Analysis of Some Clodronic Acid Esters. J. Med. Chem., 1991, 34, 2338-2343.
Loughney, D.A.; Schwender, C.F. A comparison of progestin and androgen
receptor binding using the CoMFA technique. J. Comput. Aided Mol.
Des., 1992, 6, 569-581.
Demeter, D.A.; Weintraub, H.J.R.; Knittel, J.J. The local minima method
(LMM) of pharmacophore determination: A protocol for predicting the
bioactive conformation of small, conformationally flexible molecules. J.
Chem. Inf. Comput. Sci., 1998, 38, 1125-1136.
Kroemer, R.T.; Hecht, P.; Guessregen, S.; Liedl, K.R. Improving the
predictive quality of CoMFA models. Perspect. Drug Discov. Des., 1998, 12,
Pei, J.F.; Zhou, J.J.; Xie, G.R.; Chen, H.M.; He, X.F. PARM: A practical
utility for drug design. J. Mol. Graph. Model., 2001, 19, 448-454.
Peng, T.; Pei, J.F.; Zhou, J.J. 3D-QSAR and receptor modeling of tyrosine
kinase inhibitors with flexible atom receptor model (FLARM). J. Chem. Inf.
Comput. Sci., 2003, 43, 298-303.
Pei, J.F.; Chen, H.; Liu, Z.M.; Han, X.F.; Wang, Q.; Shen, B.; Zhou, J.J.;
Lai, L.H. Improving the quality of 3D-QSAR by using flexible-ligand
receptor models. J. Chem. Inf. Model., 2005, 45, 1920-1933.
Lu, A.J.; Zhou, J.J. Pseudoreceptor models and 3D-QSAR for
alpha(x)beta(3)gamma(2) [x=1-3, 5, and 6] via flexible atom receptor model.
J. Chem. Inf. Comput. Sci., 2004, 44, 1130-1136.
Shen, B.; Lu, Z.H.; Chi, X.B.; Lu, H.F.; Ren, T.R. Research on
pseudoreceptor models for the inhibitors at GABA receptors via flexible
atom receptor model. Acta Phys.-Chim. Sin., 2005, 21, 800-803.
Wichapong, K.; Lindner, M.; Pianwanit, S.; Kokpol, S.; Sippl, W.
Receptor-based 3D-QSAR studies of checkpoint Wee1 kinase inhibitors. Eur. J. Med.
Chem., 2009, 44, 1383-1395.
Hopfinger, A.J.; Wang, S.; Tokarski, J.S.; Jin, B.; Albuquerque, M.;
Madhav, P.J.; Duraiswami, C. Construction of 3D-QSAR Models
Using the 4D-QSAR Analysis Formalism. J. Am. Chem. Soc., 1997, 119,
Ravi, M.; Hopfinger, A.J.; Hormann, R.E.; Dinan, L. 4D-QSAR analysis of a
set of ecdysteroids and a comparison to CoMFA modeling. J. Chem. Inf.
Comput. Sci., 2001, 41, 1587-1604.
Martins, J.P.A.; Barbosa, E.G.; Pasqualoto, K.F.M.; Ferreira, M.M.C.
LQTA-QSAR: A New 4D-QSAR Methodology. J. Chem. Inf. Model., 2009,
49, 1428-1436.
Datar, P.A.; Coutinho, E.C. A CoMFA study of COX-2 inhibitors with
receptor based alignment. J. Mol. Graph. Model., 2004, 23, 239-251.
Huang, H.Q.; Pan, X.L.; Tan, N.H.; Zeng, G.Z.; Ji, C.J. 3D-QSAR study of
sulfonamide inhibitors of human carbonic anhydrase II. Eur. J. Med. Chem.,
2007, 42, 365-372.
Ma, X.; Zhou, L.; Zuo, Z.L.; Liu, J.; Yang, M.; Wang, R.W. Molecular
docking and 3-D QSAR studies of substituted 2,2-bisaryl-bicycloheptanes as
human 5-Lipoxygenase-Activating Protein (FLAP) inhibitors. Qsar. Comb.
Sci., 2008, 27, 1083-1091.
Palomer, A.; Pascual, J.; Cabre, F.; Garcia, M.L.; Mauleon, D. Derivation of
pharmacophore and CoMFA models for leukotriene D-4 receptor antagonists
of the quinolinyl(bridged) aryl series. J. Med. Chem., 2000, 43, 392-400.
Long, W.; Liu, P.X.; Li, Q.; Xu, Y.; Gao, J. 3D-QSAR studies on a class of
IKK-2 inhibitors with GALAHAD used to develop molecular alignment
models. Qsar. Comb. Sci., 2008, 27, 1113-1119.
Chen, Y.D.; Li, F.F.; Tang, W.Q.; Zhu, C.C.; Jiang, Y.J.; Zou, J.W.; Yu,
Q.S.; You, Q.D. 3D-QSAR studies of HDACs inhibitors using
pharmacophore-based alignment. Eur. J. Med. Chem., 2009, 44, 2868-2876.
Narkhede, S.S.; Degani, M.S. Pharmacophore refinement and 3D-QSAR
studies of histamine H-3 antagonists. Qsar. Comb. Sci., 2007, 26, 744-753.
Dove, S.; Buschauer, A. Improved alignment by weighted field fit in
CoMFA of histamine H-2 receptor agonists imidazolylpropylguanidines.
Quant. Struct-Act. Rel., 1999, 18, 329-341.
Tervo, A.J.; Nyronen, T.H.; Ronkko, T.; Poso, A. Comparing the quality and
predictiveness between 3D QSAR models obtained from manual and
automated alignment. J. Chem. Inf. Comput. Sci., 2004, 44, 807-816.
Cramer, R.D. Topomer CoMFA: A design methodology for rapid lead
optimization. J. Med. Chem., 2003, 46, 374-388.
Chung, J.Y.; Pasha, F.A.; Chung, H.; Yang, B.S.; Lee, C.; Oh, J.S.; Moon,
M.W.; Cho, S.J.; Cho, A.E. Topomer-CoMFA study of tricyclic azepine
derivatives-EGFR inhibitors. Mol. Cell. Toxicol., 2008, 4, 78-84.
Chuman, H.; Karasawa, M.; Fujita, T. A novel three-dimensional QSAR
procedure: Voronoi field analysis. Quant. Struct-Act. Rel., 1998, 17, 313-326.
Timofei, S.; Kurunczi, L.; Schmidt, W.; Simon, Z. Steric and electrostatic
effects in dye-cellulose interactions by the MTD and CoMFA approaches.
Sar. Qsar. Environ. Res., 2002, 13, 219-226.
Böhm, H.-J.; Klebe, G.; Kubinyi, H. Wirkstoffdesign. Spektrum
Akademischer: Heidelberg, 1996.
Kim, K.H.; Martin, Y.C. Direct prediction of dissociation constants (pKa's) of
clonidine-like imidazolines, 2-substituted imidazoles and
1-methyl-2-substituted-imidazoles from 3D structures using a comparative molecular
field analysis (CoMFA) approach. J. Med. Chem., 1991, 34, 2056-2060.
Klebe, G.; Abraham, U. On the prediction of binding properties of drug molecules by
comparative molecular field analysis. J. Med. Chem., 1993, 36, 70-80.
Puri, S.; Chickos, J.S.; Welsh, W.J. Three-dimensional quantitative
structure-property relationship (3D-QSPR) models for prediction of thermodynamic
properties of polychlorinated biphenyls (PCBs): Enthalpy of sublimation. J.
Chem. Inf. Comput. Sci., 2002, 42, 109-116.
Puri, S.; Chickos, J.S.; Welsh, W.J. Three-dimensional Quantitative
Structure-Property Relationship (3D-QSPR) models for prediction of
thermodynamic properties of polychlorinated biphenyls (PCBs): Enthalpy of
vaporization. J. Chem. Inf. Comput. Sci., 2002, 42, 299-304.
Hirons, L.; Holliday, J.D.; Jelfs, S.P.; Willett, P.; Gedeck, P. Use of the
R-group descriptor for alignment-free QSAR. Qsar. Comb. Sci., 2005, 24, 611-619.
Luo, H.B.; Cheng, Y.K. Quantitative structure-retention relationship of
nucleic-acid bases revisited. CoMFA on purine RPLC retention. Qsar.
Comb. Sci., 2005, 24, 968-975.
Jayatilleke, P.R.N.; Nair, A.C.; Zauhar, R.; Welsh, W.J. Computational
studies on HIV-1 protease inhibitors: Influence of calculated
inhibitor-enzyme binding affinities on the statistical quality of 3D-QSAR CoMFA
models. J. Med. Chem., 2000, 43, 4446-4451.
Mittal, R.R.; Harris, L.; McKinnon, R.A.; Sorich, M.J. Partial Charge
Calculation Method Affects CoMFA QSAR Prediction Accuracy. J. Chem.
Inf. Model., 2009, 49, 704-709.
Tsai, K.C.; Chen, Y.C.; Hsiao, N.W.; Wang, C.L.; Lin, C.L.; Lee, Y.C.; Li,
M.; Wang, B. A comparison of different electrostatic potentials on prediction
accuracy in CoMFA and CoMSIA studies. Eur. J. Med. Chem., 2010, 45,
Jung, M.; Kim, H. CoMFA of artemisinin derivatives: Effect of location and
size of lattice. Bioorg. Med. Chem. Lett., 2001, 11, 2041-2044.
Kunick, C.; Lauenroth, K.; Wieking, K.; Xie, X.; Schultz, C.; Gussio, R.;
Zaharevitz, D.; Leost, M.; Meijer, L.; Weber, A.; Jorgensen, F.S.; Lemcke,
T. Evaluation and comparison of 3D-QSAR CoMSIA models for CDK1,
CDK5, and GSK-3 inhibition by paullones. J. Med. Chem., 2004, 47, 22-36.
Bucholtz, E.C.; Tropsha, A. The effect of region size on CoMFA analyses.
Med. Chem. Res., 1999, 9, 675-685.
Carpy, A. Importance of lipophilicity in molecular design - Foreword.
Analusis, 1999, 27, 3-6.
Vajragupta, O.; Boonchoong, P.; Wongkrajang, Y. Comparative quantitative
structure-activity study of radical scavengers. Bioorg. Med. Chem., 2000, 8,
Nakagawa, Y.; Takahashi, K.; Kishikawa, H.; Ogura, T.; Minakuchi, C.;
Miyagawa, H. Classical and three-dimensional QSAR for the inhibition of
[H-3]ponasterone A binding by diacylhydrazine-type ecdysone agonists to
insect Sf-9 cells. Bioorg. Med. Chem., 2005, 13, 1333-1340.
Pajeva, I.; Wiese, M. Molecular modeling of phenothiazines and related
drugs as multidrug resistance modifiers: A comparative molecular field
analysis study. J. Med. Chem., 1998, 41, 1815-1826.
Pajeva, I.K.; Wiese, M. Interpretation of CoMFA results - A probe set study
using hydrophobic fields. Quant. Struct-Act. Rel., 1999, 18, 369-379.
Bursi, R.; Sawa, M.; Hiramatsu, Y.; Kondo, H. A three-dimensional
quantitative structure-activity relationship study of heparin-binding
epidermal growth factor shedding inhibitors using comparative molecular
field analysis. J. Med. Chem., 2002, 45, 781-788.
Bursi, R.; Grootenhuis, A.; van der Louw, J.; Verhagen, J.; de Gooyer, M.;
Jacobs, P.; Leysen, D. Structure-activity relationship study of human liver
microsomes-catalyzed hydrolysis rate of ester prodrugs of MENT by
comparative molecular field analysis (CoMFA). Steroids, 2003, 68, 213-220.
Gu, C.M.; Hou, T.J.; Xu, X.J. Comparative molecular field analysis of
gamma-hydroxy butenolide endothelin antagonists. Chem. J. Chinese U.,
2001, 22, 1864-1868.
Pan, Y.M.; Ji, M.J.; Ye, X.Q.; Kuang, P.X. 3D-QSAR analyses of novel
benzofuranyl and benzothiophenyl biphenyls as PTP1B inhibitors. Chinese J.
Org. Chem., 2003, 23, 167-172.
Tuppurainen, K. Frontier orbital energies, hydrophobicity and steric factors
as physical QSAR descriptors of molecular mutagenicity. A review with a
case study: MX compounds. Chemosphere, 1999, 38, 3015-3030.
Chen, H.F.; Yao, X.J.; Petitjean, M.; Xia, H.O.; Yao, J.H.; Panaye, A.;
Doucet, J.P.; Fan, B.T. Insight into the bioactivity and metabolism of human
glucagon receptor antagonists from 3D-QSAR analyses. Qsar. Comb. Sci.,
2004, 23, 603-620.
Christensen, H.S.; Boye, S.V.; Thinggaard, J.; Sinning, S.; Wiborg, O.;
Schiott, B.; Bols, M. QSAR studies and pharmacophore identification for
arylsubstituted cycloalkenecarboxylic acid methyl esters with affinity for the
human dopamine transporter. Bioorg. Med. Chem., 2007, 15, 5262-5274.
Chen, H.F. Computational study of histamine H-3-receptor antagonist with
support vector machines and three dimension quantitative structure activity
relationship methods. Anal. Chim. Acta, 2008, 624, 203-209.
Wolohan, P.; Reichert, D.E. Use of binding energy in comparative molecular
field analysis of isoform selective estrogen receptor ligands. J. Mol. Graph.
Model., 2004, 23, 23-38.
Maitarad, P.; Saparpakorn, P.; Hannongbua, S.; Kamchonwongpaisan, S.;
Tarnchompoo, B.; Yuthavong, Y. Particular interaction between
pyrimethamine derivatives and quadruple mutant type dihydrofolate
reductase of Plasmodium falciparum: CoMFA and quantum chemical
calculations studies. J. Enzyme Inhib. Med. Chem., 2009, 24, 471-479.
Kubinyi, H. QSAR: Hansch analysis and related approaches. VCH:
Weinheim, 1993.
Kubinyi, H. 3D QSAR in Drug Design. Theory, Methods and Applications.
Kubinyi, H., Ed. ESCOM: Leiden, 1993.
van de Waterbeemd, H. Chemometric Methods in Molecular Design. VCH:
Weinheim, 1995.
van de Waterbeemd, H. Advanced Computer-Assisted Techniques in Drug
Discovery. VCH: Weinheim, 1995.
Pearson, K. On lines and planes of closest fit to systems of points in space.
Philos. Mag., 1901, 6, 559-576.
Bush, B.L.; Nachbar, R.B. Sample-distance Partial Least Squares: PLS
optimized for many variables, with application to CoMFA. J. Comput. Aided
Mol. Des., 1993, 7, 587-619.
Clark, R.D. Boosted leave-many-out cross-validation: the effect of training
and test set diversity on PLS statistics. J. Comput. Aided Mol. Des., 2003, 17,
Polanski, J.; Gieleciak, R.; Bak, A. Probability issues in molecular design:
Predictive and modeling ability in 3D-QSAR schemes. Comb. Chem. High
Throughput Screen., 2004, 7, 793-807.
Jojart, B.; Marki, A. Receptor-based QSAR studies of non-peptide human
oxytocin receptor antagonists. J. Mol. Graph. Model., 2007, 25, 711-720.
Maw, H.H.; Hall, L.H. E-state modeling of corticosteroids binding affinity
validation of model for small data set. J. Chem. Inf. Comput. Sci., 2001, 41,
Puntambekar, D.S.; Giridhar, R.; Yadav, M.R. Understanding the antitumor
activity of novel tricyclicpiperazinyl derivatives as farnesyltransferase
inhibitors using CoMFA and CoMSIA. Eur. J. Med. Chem., 2006, 41, 1279-1292.
Nair, P.C.; Sobhia, M.E. CoMFA based de novo design of pyridazine
analogs as PTP1B inhibitors. J. Mol. Graph. Model., 2007, 26, 117-123.
Ramar, S.; Bag, S.; Tawari, N.R.; Degani, M.S. 3-D-QSAR analysis of
2-(oxalylamino) benzoic acid class of protein tyrosine phosphatase 1B
inhibitors by CoMFA and Cerius2.GA. Qsar. Comb. Sci., 2007, 26, 608-617.
Golbraikh, A.; Tropsha, A. Beware of q2! J. Mol. Graph. Model., 2002, 20,
Golbraikh, A.; Shen, M.; Xiao, Z.; Xiao, Y.D.; Lee, K.H.; Tropsha, A.
Rational selection of training and test sets for the development of validated
QSAR models. J. Comput. Aided. Mol. Des., 2003, 17, 241-253.