Predicting aggregation-forming sequences is essential for elucidating protein misfolding mechanisms and designing effective antiamyloid inhibitors. In this work, we predict and characterize self-assembled hexapeptides with a quantitative sequence-aggregation relationship (QSAR) model, which characterizes sequences using the factor analysis scale of generalized amino acid information (FASGAI) and models them with a support vector machine (SVM) with a radial basis function kernel. The QSAR model achieves a maximum accuracy of 78.33% and an area under the receiver operating characteristic curve of 0.83 in leave-one-out cross-validation on 180 training hexapeptides. By analyzing the sequence-aggregation relationships, we determine the "hotspots" and key factors that contribute most to the self-assembly of these hexapeptides. We also explore three applications of the model. First, we identify aggregation-forming sequences within both β-amyloid peptide (Aβ42) and human islet amyloid polypeptide (hIAPP) using a 6-residue sliding window, and obtain good agreement with previous experimental observations. Second, we perform in silico design of potential aggregation-forming hexapeptides, which are validated by all-atom molecular dynamics simulations and density functional theory calculations. Third, we predict potential self-assembled tri-, tetra-, and pentapeptides, in which hydrophobic amino acids such as isoleucine, leucine, valine, phenylalanine, and methionine occur at higher frequencies. The present QSAR model is helpful for (i) predicting the self-assembly behavior of peptides, (ii) scanning and identifying aggregation-forming sequences within proteins, (iii) understanding the mechanisms of peptide/protein aggregation, and (iv) designing potential self-assembled sequences for drug discovery and nanomaterials.
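The 6-residue window scan mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The names `sliding_windows`, `scan`, and `hydrophobic_fraction` are hypothetical, the Aβ42 sequence is the commonly cited one rather than something given in the abstract, and the scoring function is a toy placeholder standing in for the FASGAI/SVM model, which is not reproduced here.

```python
from typing import Callable, Iterator, List, Tuple

# Commonly cited Abeta42 sequence (assumption: not stated in the abstract).
ABETA42 = "DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVVIA"

# Hydrophobic residues named in the abstract (I, L, V, F, M).
HYDROPHOBIC = set("ILVFM")


def sliding_windows(sequence: str, width: int = 6) -> Iterator[Tuple[int, str]]:
    """Yield (1-based start position, window) for every full-width window."""
    for i in range(len(sequence) - width + 1):
        yield i + 1, sequence[i:i + width]


def hydrophobic_fraction(peptide: str) -> float:
    """Toy placeholder score: fraction of I/L/V/F/M residues.

    This is NOT the FASGAI/SVM model; it only stands in for a real
    per-window predictor so that the scan can be run end to end.
    """
    return sum(residue in HYDROPHOBIC for residue in peptide) / len(peptide)


def scan(
    sequence: str,
    score: Callable[[str], float],
    width: int = 6,
) -> List[Tuple[int, str, float]]:
    """Score every window of the sequence with the supplied function."""
    return [(start, pep, score(pep)) for start, pep in sliding_windows(sequence, width)]


hits = scan(ABETA42, hydrophobic_fraction)
for start, pep, value in hits:
    if value >= 0.5:  # arbitrary illustrative cutoff
        print(f"{start:2d}-{start + len(pep) - 1:2d}  {pep}  {value:.2f}")
```

In practice, the `score` argument would be replaced by a trained classifier's decision function applied to the descriptor vector of each hexapeptide; the window-enumeration logic stays the same.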
Keywords
- Factor analysis scale of generalized amino acid information (FASGAI)
- Quantitative sequence-aggregation relationship (QSAR)
- Support vector machine (SVM)
ASJC Scopus subject areas
- Analytical Chemistry
- Computer Science Applications
- Process Chemistry and Technology