## SNP level filtering (Part II) - Hardy-Weinberg Equilibrium

(This lab was adapted for statsTeachR by Tu Dao, Siying Chen, Nicholas Reich and Andrea Foulkes from our GWAS tutorial.)

After filtering the sample, we proceed to filter the SNPs with Hardy Weinberg Equilibrium (HWE). HWE states that probability of an allele occuring on one homolog does not depend on which allele is present on the other homolog in a large, randomly mating population that remain stable over generations. Violation of HWE can be an indication of the presence of population substructure or occurence of a genotyping error. While it is not always true, it is common to remove any SNPs that violated HWE to avoid genotyping error. Departures of HWE at a given SNPs are usually measured with a chi-square goodnness-of-fit test between the observed and expected genotypes.

In this lab, we filter our SNPs based on HWE p < 1*10^-6 in CAD controls. It results in 7643 SNPs remain for the association analysis.

load("lab3_save.RData")
source("http://bioconductor.org/biocLite.R")
biocLite("snpStats")
##
##  /var/folders/_9/8f93yrbn70v__gsn6550wbqr0000gn/T//Rtmp2scphX/downloaded_packages
biocLite("gdsfmt")
##
##  /var/folders/_9/8f93yrbn70v__gsn6550wbqr0000gn/T//Rtmp2scphX/downloaded_packages
library(snpStats)
library(gdsfmt)
hardy <- 10^-6
HWEuse <- with(snpsum.colCont, !is.na(z.HWE) & ( abs(z.HWE) < abs( qnorm(hardy/2) ) ) )
rm(snpsum.colCont)
HWEuse[is.na(HWEuse)] <- FALSE
cat(ncol(genotype)-sum(HWEuse),"SNPs will be removed due to high HWE.\n") 
## 765 SNPs will be removed due to high HWE.
genotype <- genotype[,HWEuse]
print(genotype)                           
## A SnpMatrix with  1400 rows and  381257 columns
## Row names:  10002 ... 11596
## Col names:  rs4579145 ... rs946221

#### Exercise

Why do we have to filter the sample before filter the SNPs with HWE? How many SNPs have a HWE p-value of 10^-5 or less?

### Saving work for following labs…

You can save the necessary data from this lab by running the following commands:

save(genotype, genoBim, clinical, file= "lab4_save.RData")