Some tips and tricks for creating Manhattan plots in R

Recently, I was trying to create a Manhattan plot of gene expression for a genome-wide RNA-seq study. I had created Manhattan plots a year or two ago, and I recall writing my own functions to do so because all the packages and functions out there required the data to be in one particular format or another. This was frustrating, as all I had was chromosome, start, end, and p-values or some other value, such as fold change, that I wanted to interrogate. Long story short, I could not find the code I had written and had to rewrite my functions from scratch. This time round I was more efficient, however, as I found a few tricks that were useful. I ended up simply using the base R plot function for the plotting.

  • Trick #1 Using levels to re-order the chromosomes: Let's say we had a data set with 3 chromosomes, and when you queried the levels you got something like

> levels(data$chromosome)

[1] "chr1"  "chr10" "chr2"

In the default order, the Manhattan plot is rendered with chr1 first, chr10 second, and chr2 last. To reorder the chromosomes for plotting, just rebuild the factor with its levels in the order you want:


data$chromosome <- factor(data$chromosome, levels = levels(data$chromosome)[c(1,3,2)])
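Picking the index vector by hand works for three chromosomes but gets tedious for a full genome. A minimal sketch of one way to derive the order automatically is below. It assumes the chromosome names carry a "chr" prefix, and the toy data frame is made up purely for illustration.

```r
# Toy data (made up); only the column name `chromosome` comes from the post.
data <- data.frame(
  chromosome = factor(c("chr10", "chr1", "chr2", "chr1", "chrX"))
)

chrom_levels <- levels(data$chromosome)

# Strip the assumed "chr" prefix and convert to integer.
# Non-numeric names such as "chrX" become NA (warning suppressed on purpose).
chrom_number <- suppressWarnings(as.integer(sub("^chr", "", chrom_levels)))

# order() places NA last by default, so numbered chromosomes come first;
# the second key sorts the non-numeric ones by name.
natural_order <- order(chrom_number, chrom_levels)

# Rebuild the factor with the reordered levels; the data values are unchanged.
data$chromosome <- factor(data$chromosome, levels = chrom_levels[natural_order])

levels(data$chromosome)
# [1] "chr1"  "chr2"  "chr10" "chrX"
```

Rebuilding the factor with factor(x, levels = ...) changes only the level order. Assigning to levels(x) directly would relabel the observations instead, which is not what you want here.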

  • Other tips: I just use base R graphics to create a dot plot, as this allows a level of control over each point that I have not found in any package. You can use ggplot2 if you are so inclined. The hardest part was figuring out how to quickly reorder the levels of the chromosome factor. I will post my code and an example figure here some time soon, and I would welcome any comments on improving the code.
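
In the meantime, a rough sketch of the base R approach might look like the following. It is not the original code, only an illustration under some assumptions: the column names (chromosome, start, end, pvalue) follow the description above, the data are simulated, and the colours and helper variable names are arbitrary.

```r
set.seed(1)  # simulated data, for illustration only

dat <- data.frame(
  chromosome = factor(rep(c("chr1", "chr2", "chr10"), each = 50),
                      levels = c("chr1", "chr2", "chr10")),
  start = rep(seq(1, by = 1e6, length.out = 50), times = 3)
)
dat$end    <- dat$start + 999
dat$pvalue <- runif(nrow(dat))

# Sort by chromosome (in factor level order) and then by position.
dat <- dat[order(dat$chromosome, dat$start), ]

# Shift each chromosome by the summed length of the chromosomes before it,
# so that all points share one continuous x axis.
chrom_len <- tapply(dat$end, dat$chromosome, max)
offset <- c(0, cumsum(as.numeric(chrom_len)))[seq_along(chrom_len)]
names(offset) <- names(chrom_len)
dat$x <- dat$start + unname(offset[as.character(dat$chromosome)])

# Place one axis label at the middle of each chromosome.
mid <- tapply(dat$x, dat$chromosome, function(v) mean(range(v)))

# Alternate two colours by chromosome; any per-point vector works here.
point_col <- c("grey30", "steelblue")[(as.integer(dat$chromosome) - 1) %% 2 + 1]

plot(dat$x, -log10(dat$pvalue), col = point_col, pch = 20, xaxt = "n",
     xlab = "Chromosome", ylab = "-log10(p-value)")
axis(1, at = mid, labels = names(mid))
```

Because col, pch and cex all accept one value per point, the same pattern can be used to highlight individual genes, for example by switching the colour of points below a chosen p-value threshold.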