A Literature And Database Mining Pipeline To Systematically Identify And Curate Renal Cancer Risk Factors

Ryan LANGDON, University of Bristol, United Kingdom

1 School of Social and Community Medicine, University of Bristol, Bristol, United Kingdom


Renal cancer is the twelfth most common cancer in the world, with over 328,000 incident cases in 2012. Observational epidemiology has identified a range of risk factors for renal cancer, from diabetes to elevated urinary arsenic levels. However, whether these observational associations are causal or the result of confounding requires further investigation. Mendelian randomization is an approach to robustly evaluate causality using genetic variants that are associated with hypothesised risk factors of interest and that are not susceptible to confounding or reverse causation. We used a literature and database mining pipeline to systematically identify and curate renal cancer risk factors and their associated genetic proxies in order to appraise their causal nature.
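To make the Mendelian randomization idea concrete, the sketch below computes a single-variant Wald ratio, a standard estimator in this field. The abstract does not state which estimator the analysis will use, so this is an illustration only: the function name and the numbers are hypothetical, and the standard error is a first-order approximation that ignores uncertainty in the variant-exposure estimate.

```python
def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-variant Wald ratio (illustrative, not the authors' stated method).

    beta_exposure: variant-to-risk-factor effect estimate
    beta_outcome:  variant-to-renal-cancer effect estimate (e.g. log odds)
    se_outcome:    standard error of beta_outcome
    """
    if beta_exposure == 0:
        raise ValueError("variant has no effect on the exposure; ratio undefined")
    # Causal effect of the risk factor on the outcome, per unit of exposure.
    estimate = beta_outcome / beta_exposure
    # First-order approximation: treats beta_exposure as known without error.
    standard_error = se_outcome / abs(beta_exposure)
    return estimate, standard_error


# Hypothetical numbers, chosen only to show the arithmetic.
estimate, standard_error = wald_ratio(
    beta_exposure=0.5, beta_outcome=0.25, se_outcome=0.125
)
```

With several variants per risk factor, per-variant ratios like this one are typically combined into a single estimate.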

Methods and Results

Literature outlining hypothesised renal cancer risk factors was collated from the PubMed database and organised by prevalence using the statistical programme R (version 3.0.1) in conjunction with the bibliographic extraction package 'RISmed'. Of an initial 2296 literature results, 232 papers remained, outlining 205 hypothesised risk factors. Genetic variants associated with these risk factors were then identified by matching the risk factors against 'traits', in the form of Experimental Factor Ontology (EFO) terms, curated by the GWAS Catalog. After matching literature-reported risk factors to EFO terms, the strength of these genetic proxies was assessed using a P-value threshold of 5 × 10⁻⁸. Each trait was also assigned to a primary or secondary analysis, taking into account the variance explained by each trait. A total of 37 risk factors with robust genetic instruments remained to undergo genetic correlation and Mendelian randomization analyses.
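The matching and thresholding step described above can be sketched as follows. This is a minimal illustration rather than the pipeline itself, which was run in R. The record layout, the exact-label matching rule, the strict inequality at the threshold, and all trait names and rsIDs are assumptions made for the example; matching against real EFO terms would need synonym and hierarchy handling.

```python
from dataclasses import dataclass

GENOME_WIDE_P = 5e-8  # threshold stated in the abstract


@dataclass(frozen=True)
class Association:
    """One variant-trait association (hypothetical record layout)."""
    trait: str      # trait label, standing in for an EFO term
    variant: str    # variant identifier (made-up rsIDs below)
    p_value: float


def normalise(label):
    # Case- and whitespace-insensitive comparison; an assumed, simplistic rule.
    return " ".join(label.lower().split())


def instruments_by_risk_factor(risk_factors, associations, p_threshold=GENOME_WIDE_P):
    """Map each risk factor to variants of a matching trait with p < threshold."""
    wanted = {normalise(rf): rf for rf in risk_factors}
    hits = {rf: set() for rf in risk_factors}
    for assoc in associations:
        key = normalise(assoc.trait)
        if key in wanted and assoc.p_value < p_threshold:
            hits[wanted[key]].add(assoc.variant)
    # Drop risk factors that end up with no qualifying variant.
    return {rf: sorted(variants) for rf, variants in hits.items() if variants}


# Toy inputs: none of these values come from the study.
risk_factors = ["Type 2 diabetes", "Body mass index", "Urinary arsenic"]
associations = [
    Association("type 2 diabetes", "rs0000001", 3e-12),
    Association("type 2 diabetes", "rs0000002", 1e-6),   # fails the threshold
    Association("body mass index", "rs0000003", 4e-9),
    Association("body mass index", "rs0000003", 2e-10),  # duplicate variant
    Association("height", "rs0000004", 1e-20),           # not a listed risk factor
]
instruments = instruments_by_risk_factor(risk_factors, associations)
```

Risk factors with no qualifying variant, such as the toy "Urinary arsenic" entry, drop out of the result, mirroring how the 205 hypothesised risk factors were reduced to those with robust genetic instruments.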


Many risk factors associated with renal cancer can be proxied by common genetic variants. Further steps are being taken to appraise the magnitude of the association between these genetic variants and renal cancer in a large GWAS, which may ultimately be used to establish whether the risk factors they proxy are causal in influencing disease onset.