TY - JOUR
T1 - Identifying functional single nucleotide polymorphisms in the human CArGome
AU - Benson, Craig C.
AU - Zhou, Qian
AU - Long, Xiaochun
AU - Miano, Joseph M.
PY - 2011/9
Y1 - 2011/9
N2 - Regulatory SNPs (rSNPs) reside primarily within the nonprotein coding genome and are thought to disturb normal patterns of gene expression by altering DNA binding of transcription factors. Nevertheless, despite the explosive rise in SNP association studies, there is little information as to the function of rSNPs in human disease. Serum response factor (SRF) is a widely expressed DNA-binding transcription factor that has variable affinity to at least 1,216 permutations of a 10 bp transcription factor binding site (TFBS) known as the CArG box. We developed a robust in silico bioinformatics screening method to evaluate sequences around RefSeq genes for conserved CArG boxes. Utilizing a predetermined phastCons threshold score, we identified 8,252 strand-specific CArGs within an 8 kb window around the transcription start site of 5,213 genes, including all previously defined SRF target genes. We then interrogated this CArG dataset for the presence of previously annotated common polymorphisms. We found a total of 118 unique CArG boxes harboring a SNP within the 10 bp CArG sequence and 1,130 CArG boxes with SNPs located just outside the CArG element. Gel shift and luciferase reporter assays validated SRF binding and functional activity of several new CArG boxes. Importantly, SNPs within or just outside the CArG box often resulted in altered SRF binding and activity. Collectively, these findings demonstrate a powerful approach to computationally define rSNPs in the human CArGome and provide a foundation for similar analyses of other TFBS. Such information may find utility in genetic association studies of human disease where little insight is known regarding the functionality of rSNPs.
AB - Regulatory SNPs (rSNPs) reside primarily within the nonprotein coding genome and are thought to disturb normal patterns of gene expression by altering DNA binding of transcription factors. Nevertheless, despite the explosive rise in SNP association studies, there is little information as to the function of rSNPs in human disease. Serum response factor (SRF) is a widely expressed DNA-binding transcription factor that has variable affinity to at least 1,216 permutations of a 10 bp transcription factor binding site (TFBS) known as the CArG box. We developed a robust in silico bioinformatics screening method to evaluate sequences around RefSeq genes for conserved CArG boxes. Utilizing a predetermined phastCons threshold score, we identified 8,252 strand-specific CArGs within an 8 kb window around the transcription start site of 5,213 genes, including all previously defined SRF target genes. We then interrogated this CArG dataset for the presence of previously annotated common polymorphisms. We found a total of 118 unique CArG boxes harboring a SNP within the 10 bp CArG sequence and 1,130 CArG boxes with SNPs located just outside the CArG element. Gel shift and luciferase reporter assays validated SRF binding and functional activity of several new CArG boxes. Importantly, SNPs within or just outside the CArG box often resulted in altered SRF binding and activity. Collectively, these findings demonstrate a powerful approach to computationally define rSNPs in the human CArGome and provide a foundation for similar analyses of other TFBS. Such information may find utility in genetic association studies of human disease where little insight is known regarding the functionality of rSNPs.
KW - Bioinformatics
KW - Carg box
KW - Serum response factor
KW - Transcription factor binding site
UR - http://www.scopus.com/inward/record.url?scp=80053153925&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80053153925&partnerID=8YFLogxK
U2 - 10.1152/physiolgenomics.00098.2011
DO - 10.1152/physiolgenomics.00098.2011
M3 - Article
C2 - 21771879
AN - SCOPUS:80053153925
SN - 1094-8341
VL - 43
SP - 1038
EP - 1048
JO - Physiological Genomics
JF - Physiological Genomics
IS - 18
ER -