Asthma Genetics in the UK Biobank and the Global Biobank Engine

I recently wrote a blog post with Manuel Rivas for the Rivas Lab website that explores the genetics of asthma in UK Biobank participants.  In that post we look for known variants associated with asthma and identify a loss-of-function (LOF) mutation in IL33 that confers a protective effect on carriers.  Any LOF mutation that reduces the incidence of disease is interesting because the gene it disrupts might be a good target for drug development.  It is far easier to develop a drug that reduces the function of a protein than one that enhances it, so finding variants like these is quite exciting.
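
To make "protective effect" a bit more concrete, here is a minimal sketch of how one might quantify it as an odds ratio from a 2x2 table of carriers and non-carriers.  The counts are invented purely for illustration (they are not UK Biobank numbers and not the IL33 result), and the function names are my own, not part of GBE.  An odds ratio below 1 means carriers have lower odds of disease than non-carriers.

```python
import math


def odds_ratio(carrier_cases, carrier_controls, noncarrier_cases, noncarrier_controls):
    """Odds of disease in carriers divided by odds of disease in non-carriers."""
    return (carrier_cases * noncarrier_controls) / (carrier_controls * noncarrier_cases)


def wald_interval(carrier_cases, carrier_controls, noncarrier_cases, noncarrier_controls, z=1.96):
    """Approximate 95% confidence interval using the standard error of the log odds ratio."""
    log_or = math.log(
        odds_ratio(carrier_cases, carrier_controls, noncarrier_cases, noncarrier_controls)
    )
    se = math.sqrt(
        1 / carrier_cases
        + 1 / carrier_controls
        + 1 / noncarrier_cases
        + 1 / noncarrier_controls
    )
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts, made up for this example only.
counts = dict(
    carrier_cases=20,
    carrier_controls=980,
    noncarrier_cases=4000,
    noncarrier_controls=96000,
)

estimate = odds_ratio(**counts)
low, high = wald_interval(**counts)
print(f"odds ratio = {estimate:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

A real analysis would typically use a regression model that adjusts for covariates such as age, sex, and ancestry, so treat this only as the back-of-the-envelope version of the idea.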

The post we wrote also serves as a guide to using the Global Biobank Engine (GBE).  GBE is a project I worked on during my rotation in the Rivas Lab that has since gone live and is available for the world to use.  GBE allows users to explore genetic and phenotypic data from the UK Biobank and perform statistical analyses including genome-wide association studies and phenome-wide association studies.  

The UK Biobank is a very exciting dataset for the world and provides an unprecedented opportunity to study the genetics of disease for a broad range of phenotypes.  The phenotypes include lots of things you would expect, like all sorts of diseases, demographics, and medications.  They also include some ridiculous stuff like "Thickness of butter spread on baguettes".  But what if the individual doesn't like baguettes?  What if they prefer oatcakes?  Well, don't worry, because they asked about that, too.  Now of course you also want to know how many buttered oatcakes they're eating per day, so they asked that as well.  In case you were actually wondering, most people who eat buttered oatcakes have two per day and apply a thin layer of butter.  You may be disappointed to learn that we do not currently have a GWAS available for thickness of butter spread on oatcakes.  Sorry.

We built GBE in Python with the Flask framework.  The backend database is managed by SciDB.  We initially built it using MongoDB but found that once we scaled up to the full dataset it ran too slowly to serve an interactive website.

I have found GBE to be a great resource for quickly exploring phenotype-genotype relationships when working on GWAS.  I encourage everyone to go take a look and make it part of their regular analysis pipeline.