Genomic interrogation

(This lab was adapted for statsTeachR by Sara Nunez, Nicholas Reich and Andrea Foulkes from our GWAS tutorial.)

The last step in the GWAS would be to conduct a post-analytic genomic interrogation to provide further insight into the association results.

In the previous labs, we’ve generated results for both typed and imputed SNPs based on their association with HDL cholestrol. The next step is to combine these results and isolate the SNPs in a region of interest. First, the typed SNP results are loaded from the file generated by the GWAA function and manipulated to have them in the same format as the imputed SNP results. We follow similar steps as in the last lab to attach chromosome and position information to each SNP, order them by significance, and take \(−log_{10}(\mbox{p values})\). We also want to add the same ‘type’ column to the typed SNP results data as we did to the imputed data. Note that we can get the SNP coordinate information from genoBim which contains genomic information on every SNP.

In the following code chunk, we first load the data from previous labs and install necessary packages. Then, the results from the typed SNP analysis are handled as mentioned above.

load("m3_lab2_save.RData")
library(plyr)

GWASout <- read.table("GWAA.txt", header = TRUE, colClasses = c("character", 
    rep("numeric", 4)))

GWASout$Neg_logP <- -log10(GWASout$p.value)

GWASout <- merge(GWASout, genoBim[, c("SNP", "chr", "position")])

GWASout <- arrange(GWASout, -Neg_logP)

GWASout$type <- "typed"

head(GWASout)
##          SNP   Estimate  Std.Error   t.value      p.value Neg_logP chr
## 1  rs1532625  0.2076611 0.03760084  5.522779 4.041152e-08 7.393495  16
## 2  rs3803768 -0.3128523 0.06769878 -4.621241 4.195818e-06 5.377183  17
## 3 rs16951746 -0.3219660 0.07856772 -4.097943 4.429065e-05 4.353688  15
## 4 rs17643302  0.2422581 0.06183232  3.917984 9.398109e-05 4.026960  17
## 5 rs17778044  0.1882179 0.04804567  3.917479 9.423620e-05 4.025782  17
## 6  rs4149504 -0.1824303 0.04664973 -3.910640 9.685440e-05 4.013881  16
##   position  type
## 1 57005301 typed
## 2 80872028 typed
## 3 68605486 typed
## 4 64770822 typed
## 5 68310164 typed
## 6 75565845 typed

Now that the have the typed SNP results ready, we can merge them with the imputed SNP results. The two tables of typed and imputed genotypes are combined into a single table using function rbind.fill(), which fills in missing columns with NAs.

GWASout$type <- "typed"

GWAScomb<-rbind.fill(GWASout, imputeOut)

head(GWAScomb)
##          SNP   Estimate  Std.Error   t.value      p.value Neg_logP chr
## 1  rs1532625  0.2076611 0.03760084  5.522779 4.041152e-08 7.393495  16
## 2  rs3803768 -0.3128523 0.06769878 -4.621241 4.195818e-06 5.377183  17
## 3 rs16951746 -0.3219660 0.07856772 -4.097943 4.429065e-05 4.353688  15
## 4 rs17643302  0.2422581 0.06183232  3.917984 9.398109e-05 4.026960  17
## 5 rs17778044  0.1882179 0.04804567  3.917479 9.423620e-05 4.025782  17
## 6  rs4149504 -0.1824303 0.04664973 -3.910640 9.685440e-05 4.013881  16
##   position  type
## 1 57005301 typed
## 2 80872028 typed
## 3 68605486 typed
## 4 64770822 typed
## 5 68310164 typed
## 6 75565845 typed
tail(GWAScomb)
##               SNP Estimate Std.Error t.value   p.value     Neg_logP chr
## 190467   rs857176       NA        NA      NA 0.9999661 1.472859e-05  16
## 190468   rs857177       NA        NA      NA 0.9999661 1.472859e-05  16
## 190469   rs860730       NA        NA      NA 0.9999661 1.472859e-05  16
## 190470   rs863228       NA        NA      NA 0.9999661 1.472859e-05  16
## 190471  rs9989429       NA        NA      NA 0.9999661 1.472859e-05  16
## 190472 rs72774978       NA        NA      NA 0.9999699 1.307962e-05  16
##        position    type
## 190467 13930841 imputed
## 190468 13930825 imputed
## 190469 13930975 imputed
## 190470 13931061 imputed
## 190471 13924679 imputed
## 190472  7661950 imputed

Next, we use the map2gene() function defined in the previous lab to subset the data for SNPs in the CETP gene and print the observations. Note that the function was saved and reloaded into your workspace, so you don’t need to go searching for it.

CETP <- map2gene("CETP", coords = genes, SNPs = GWAScomb)

CETP <- CETP[,c("SNP","p.value","Neg_logP","chr","position","type","gene")]

print(CETP)
##                SNP      p.value  Neg_logP chr position    type gene
## 1        rs1532625 4.041152e-08 7.3934948  16 57005301   typed CETP
## 16        rs289742 3.496236e-04 3.4563993  16 57017762   typed CETP
## 130       rs289715 4.047466e-03 2.3928168  16 57008508   typed CETP
## 32369    rs1800775 3.970058e-08 7.4012031  16 56995236 imputed CETP
## 32370    rs3816117 3.970058e-08 7.4012031  16 56996158 imputed CETP
## 32371    rs1532624 4.763363e-08 7.3220863  16 57005479 imputed CETP
## 32372    rs7205804 4.763363e-08 7.3220863  16 57004889 imputed CETP
## 32386   rs17231506 3.464271e-05 4.4603881  16 56994528 imputed CETP
## 32388     rs183130 3.464271e-05 4.4603881  16 56991363 imputed CETP
## 32391    rs3764261 3.464271e-05 4.4603881  16 56993324 imputed CETP
## 32392     rs821840 3.464271e-05 4.4603881  16 56993886 imputed CETP
## 32394     rs820299 4.474057e-05 4.3492985  16 57000284 imputed CETP
## 32420   rs12149545 9.991848e-05 4.0003542  16 56993161 imputed CETP
## 32445   rs12447839 2.061185e-04 3.6858831  16 56993935 imputed CETP
## 32446   rs12447924 2.061185e-04 3.6858831  16 56994192 imputed CETP
## 32448    rs4783962 2.061185e-04 3.6858831  16 56995038 imputed CETP
## 32480   rs34620476 3.516690e-04 3.4538660  16 56996649 imputed CETP
## 32481     rs708272 3.516690e-04 3.4538660  16 56996288 imputed CETP
## 32482     rs711752 3.516690e-04 3.4538660  16 56996211 imputed CETP
## 32525   rs12447620 3.573152e-04 3.4469486  16 57014319 imputed CETP
## 32526     rs158480 3.573152e-04 3.4469486  16 57008227 imputed CETP
## 32527     rs158617 3.573152e-04 3.4469486  16 57008287 imputed CETP
## 32754   rs11508026 1.932329e-03 2.7139189  16 56999328 imputed CETP
## 32755   rs12444012 1.932329e-03 2.7139189  16 57001438 imputed CETP
## 32756   rs12720926 1.932329e-03 2.7139189  16 56998918 imputed CETP
## 32757    rs4784741 1.932329e-03 2.7139189  16 57001216 imputed CETP
## 33224  rs112039804 4.055013e-03 2.3920078  16 57018856 imputed CETP
## 33225   rs12708985 4.055013e-03 2.3920078  16 57014610 imputed CETP
## 33226     rs736274 4.055013e-03 2.3920078  16 57009769 imputed CETP
## 37683     rs289746 3.208886e-02 1.4936456  16 57020205 imputed CETP
## 73171   rs12934552 2.569000e-01 0.5902359  16 57021433 imputed CETP
## 84766   rs12708983 3.234698e-01 0.4901663  16 57014411 imputed CETP
## 116938  rs12598522 5.233758e-01 0.2811864  16 57022352 imputed CETP
## 116939  rs56315364 5.233758e-01 0.2811864  16 57021524 imputed CETP

Exercise

How many CETP SNPs there are there total? How many were imputed and how many were typed?

Saving work for following labs…

Now we save the data containing CETP SNPs from both the imputed and typed analysis, as well as our work for furture labs.

write.csv(CETP, "CETP.csv", row.names = FALSE)  # save for future use

save(imputed, target, rules, phenodata, support, genes, impCETP, impCETPgeno, 
    imputeOut, GWAA, map2gene, GWASout, GWAScomb, CETP, file = "m4_lab1_save.RData")

On your own

  • Subset the SNPs in the CETP gene for those that reach genome-wide significance (less than \(5 x10^{-8}\)). How many are there? If there are any, how many were typed and how many were imputed?
  • Explain why we use a threshold of \(< 5 x10^{-8}\) to imply ‘genome-wide significance.’
  • What would the equivalent negative log p value threshold be? Would a SNP be significant if it was less than or greater than this threshold?