A quantitative sequence-aggregation relationship predictor applied as identification of self-assembled hexapeptides

Chen Chen, Yonglan Liu, Jin Zhang, Mingzhen Zhang, Jie Zheng, Yong Teng, Guizhao Liang

Research output: Contribution to journal, Article

7 Citations (Scopus)

Abstract

Predicting aggregation-forming sequences is essential for elucidating protein misfolding mechanisms and for designing effective antiamyloid inhibitors. In this work, we predict and characterize self-assembled hexapeptides with a quantitative sequence-aggregation relationship (QSAR) model that characterizes sequences using the factor analysis scale of generalized amino acid information (FASGAI) and models them with a support vector machine (SVM) with a radial basis function kernel. The QSAR model achieves a maximum accuracy of 78.33% and an area under the receiver operating characteristic curve of 0.83 under leave-one-out cross-validation on 180 training hexapeptides. By analyzing the sequence-aggregation relationships, we determine the "hotspots" and key factors that contribute most to the self-assembly of these hexapeptides. We also explore three applications of the model. First, we identify the aggregation-forming sequences within β-amyloid peptide (Aβ42) and human islet amyloid polypeptide (hIAPP) using a 6-residue sliding window, and obtain good agreement with previous experimental observations. Second, we perform in silico design of potential aggregation-forming hexapeptides, which are validated by all-atom molecular dynamics simulations and density functional theory calculations. Third, we predict potential self-assembled tri-, tetra-, and pentapeptides, in which hydrophobic amino acids such as isoleucine, leucine, valine, phenylalanine, and methionine occur at higher frequencies. The present QSAR model is helpful for (i) predicting the self-assembly behavior of peptides, (ii) scanning and identifying aggregation-forming sequences within proteins, (iii) understanding the mechanisms of peptide/protein aggregation, and (iv) designing potential self-assembled sequences for drug discovery and nanomaterials.
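The 6-residue window scan mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the scoring function below is a hypothetical stand-in (the fraction of the hydrophobic residues I, L, V, F, and M named in the abstract) rather than the FASGAI/SVM model, and the 0.5 threshold is an arbitrary assumption. The Aβ42 sequence is the standard 42-residue one-letter sequence and is supplied here for illustration; it does not appear in this record.

```python
from typing import Callable, Iterator, List, Tuple

# Standard one-letter sequence of the 42-residue beta-amyloid peptide
# (supplied for illustration; not taken from this record).
ABETA42 = "DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVVIA"

# Hydrophobic residues named in the abstract: Ile, Leu, Val, Phe, Met.
HYDROPHOBIC = frozenset("ILVFM")


def sliding_windows(sequence: str, width: int = 6) -> Iterator[Tuple[int, str]]:
    """Yield (1-based start position, window) for every full-width window."""
    for start in range(len(sequence) - width + 1):
        yield start + 1, sequence[start:start + width]


def hydrophobic_fraction(peptide: str) -> float:
    """Hypothetical stand-in score: share of residues that are I, L, V, F or M.

    The published model uses FASGAI descriptors and an RBF-kernel SVM instead.
    """
    return sum(residue in HYDROPHOBIC for residue in peptide) / len(peptide)


def scan(
    sequence: str,
    score: Callable[[str], float],
    width: int = 6,
    threshold: float = 0.5,  # arbitrary cut-off chosen for this sketch
) -> List[Tuple[int, str, float]]:
    """Return (position, window, score) for windows scoring at or above threshold."""
    hits = []
    for position, window in sliding_windows(sequence, width):
        value = score(window)
        if value >= threshold:
            hits.append((position, window, value))
    return hits


if __name__ == "__main__":
    for position, window, value in scan(ABETA42, hydrophobic_fraction):
        print(f"{position:2d} {window} {value:.2f}")
```

A 42-residue sequence yields 37 overlapping hexapeptide windows. Passing the scorer as a parameter means a trained classifier's decision function could be substituted for the stand-in without changing the scanning code.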

Original language: English (US)
Pages (from-to): 7-16
Number of pages: 10
Journal: Chemometrics and Intelligent Laboratory Systems
Volume: 145
DOI: 10.1016/j.chemolab.2015.04.009
State: Published - Jul 5 2015
Externally published: Yes

Fingerprint

Agglomeration
Peptides
Proteins
Amino acids
Islet Amyloid Polypeptide
Isoleucine
Polypeptides
Valine
Factor analysis
Phenylalanine
Amyloid
Leucine
Methionine
Self assembly
Density functional theory
Molecular dynamics
Scanning
Atoms
Computer simulation

Keywords

  • Aggregation
  • Factor analysis scale of generalized amino acid information (FASGAI)
  • Hexapeptide
  • Quantitative sequence-aggregation relationship (QSAR)
  • Support vector machine (SVM)

ASJC Scopus subject areas

  • Analytical Chemistry
  • Software
  • Process Chemistry and Technology
  • Spectroscopy
  • Computer Science Applications

Cite this

A quantitative sequence-aggregation relationship predictor applied as identification of self-assembled hexapeptides. / Chen, Chen; Liu, Yonglan; Zhang, Jin; Zhang, Mingzhen; Zheng, Jie; Teng, Yong; Liang, Guizhao.

In: Chemometrics and Intelligent Laboratory Systems, Vol. 145, 05.07.2015, p. 7-16.

@article{0b65c6fd6d234c46b0dfd31c57bf4ff2,
title = "A quantitative sequence-aggregation relationship predictor applied as identification of self-assembled hexapeptides",
keywords = "Aggregation, Factor analysis scale of generalized amino acid information (FASGAI), Hexapeptide, Quantitative sequence-aggregation relationship (QSAR), Support vector machine (SVM)",
author = "Chen Chen and Yonglan Liu and Jin Zhang and Mingzhen Zhang and Jie Zheng and Yong Teng and Guizhao Liang",
year = "2015",
month = "7",
day = "5",
doi = "10.1016/j.chemolab.2015.04.009",
language = "English (US)",
volume = "145",
pages = "7--16",
journal = "Chemometrics and Intelligent Laboratory Systems",
issn = "0169-7439",
publisher = "Elsevier",

}

TY  - JOUR
T1  - A quantitative sequence-aggregation relationship predictor applied as identification of self-assembled hexapeptides
AU  - Chen, Chen
AU  - Liu, Yonglan
AU  - Zhang, Jin
AU  - Zhang, Mingzhen
AU  - Zheng, Jie
AU  - Teng, Yong
AU  - Liang, Guizhao
PY  - 2015/7/5
Y1  - 2015/7/5
KW  - Aggregation
KW  - Factor analysis scale of generalized amino acid information (FASGAI)
KW  - Hexapeptide
KW  - Quantitative sequence-aggregation relationship (QSAR)
KW  - Support vector machine (SVM)
UR  - http://www.scopus.com/inward/record.url?scp=84928597737&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84928597737&partnerID=8YFLogxK
U2  - 10.1016/j.chemolab.2015.04.009
DO  - 10.1016/j.chemolab.2015.04.009
M3  - Article
AN  - SCOPUS:84928597737
VL  - 145
SP  - 7
EP  - 16
JO  - Chemometrics and Intelligent Laboratory Systems
JF  - Chemometrics and Intelligent Laboratory Systems
SN  - 0169-7439
ER  -