Installing Kallisto (RNA-Seq quantification program) on our Red Hat Cluster

Today I decided to install the RNA-Seq quantification tool Kallisto for testing. It appears to be a good tool (detailed here), and I have high confidence in its underlying theoretical rigor, the lack of which is usually the bane of any new tool.

Straight away, I ran into some installation troubles. After troubleshooting for a couple of hours, I finally got the tool installed from source. First, I ran into this error using gcc:

Untitled0

After googling around a little, I found that the version of gcc we have did not support the C++11 standard, so I had to make sure we had the latest gcc modules loaded. But lo and behold, cmake still failed to recognize the new gcc modules:

Untitled1

Some more googling suggested that I needed to change the CC and CXX environment variables. I did that as well, to no avail:

Untitled2

More googling revealed the catch: cmake caches the C compiler it found, so to reset it the entire build directory needs to be deleted and the configuration started clean. So I went ahead and deleted the kallisto source folder, untarred the download again and re-ran cmake. Voila! It finally compiled:

Untitled3

A word of caution here: you need to have the $CC and $CXX variables set as well, before cmake is run in the clean directory.
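The sequence that finally worked can be sketched as a small shell helper. This is only a sketch under assumptions: the function name clean_rebuild and the "build" subdirectory are illustrative (I deleted and re-extracted the whole source folder instead), and CC and CXX are expected to already point at the newer gcc and g++.

```shell
# Sketch: reconfigure a CMake project from scratch with a chosen compiler.
# clean_rebuild and the "build" subdirectory are assumptions for illustration.
clean_rebuild() {
    src_dir=$1

    # CMake reads CC/CXX only on the first configure of a build tree,
    # so both must be set before cmake runs.
    : "${CC:?set CC to the C compiler you want cmake to use}"
    : "${CXX:?set CXX to the matching C++ compiler}"
    export CC CXX

    # Remove the old build tree (and its CMakeCache.txt), then start clean.
    rm -rf "$src_dir/build"
    mkdir -p "$src_dir/build"
    ( cd "$src_dir/build" && cmake .. && make )
}

# Example call (the source path is hypothetical):
#   export CC=$(command -v gcc) CXX=$(command -v g++)
#   clean_rebuild "$HOME/src/kallisto"
```

Removing only the build directory is enough to discard the cached compiler, because that is where cmake writes CMakeCache.txt when it is run from a separate build folder.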

Finally, if your HDF5 libraries are installed in a non-standard location, or the ones in the standard location are old and you want to use newer libraries for whatever reason, you need to edit the CMakeLists.txt file as below:

Untitled5

I added the two lines below; cmake will then find the relevant libraries for you, provided the find call comes after them in the file.


set(CMAKE_LIBRARY_PATH ${CMAKE_LIBRARY_PATH} /apps/lab/miket/hdf/1.8.14)
set(CMAKE_INCLUDE_PATH /apps/lab/miket/hdf/1.8.14/include ${CMAKE_INCLUDE_PATH})

How to edit your CMakeLists.txt if you want to include libraries from a non-standard location


Some tips and tricks for creating Manhattan plots in R

Recently, I was trying to create a Manhattan plot of gene expression for a genome-wide RNA-seq study. I had created Manhattan plots a year or two ago, and I recall writing my own functions to do so, as all the packages and functions out there required the data to be in one particular format or another. This was frustrating, as all I had was chromosome, start, end, and p-values or some other value, such as fold change, that I wanted to interrogate. Long story short, I could not find the code I had written and had to rewrite my functions from scratch. This time round I was more efficient, however, as I found a few tricks that were useful. I just ended up using the R plot function for the plotting.

  • Trick #1 Using levels to re-order the chromosomes: Let's say we had a data set with three chromosomes, and when you queried the levels you got something like

> levels(data$chromosome)

[1] "chr1"  "chr10" "chr2"

In the default order, the Manhattan plot gets rendered with chr1 first, chr10 second and chr2 last. To reorder the chromosomes for plotting, rebuild the factor with its levels in the order you want:


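# Rebuild the factor with reordered levels; assigning to levels() would relabel the data instead
data$chromosome <- factor(data$chromosome, levels = levels(data$chromosome)[c(1, 3, 2)])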

  • Other tips: I just use base R graphics to create a dot plot, as this allows a level of control over each point that I have not found in any package. You can use ggplot2 if you are so inclined. The hardest part was figuring out how to quickly reorder the levels of the chromosome factor. I will post my code and an example figure here some time soon, and I would welcome any comments on improving the code.

Remove Soft Clipped reads from bam files

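I came across this awk one-liner while following a post about a cufflinks issue and thought it would be good to keep a note of it. Strictly speaking, it does not remove the reads themselves: it trims the soft-clipped bases and their qualities from each read and strips the S operations from the CIGAR string. It works on SAM text, so a BAM file has to be converted to SAM first.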

awk 'BEGIN {OFS="\t"} {split($6,C,/[0-9]*/); split($6,L,/[SMDIN]/); if (C[2]=="S") {$10=substr($10,L[1]+1); $11=substr($11,L[1]+1)}; if (C[length(C)]=="S") {L1=length($10)-L[length(L)-1]; $10=substr($10,1,L1); $11=substr($11,1,L1); }; gsub(/[0-9]*S/,"",$6); print}' Aligned.out.sam > Aligned.noS.sam
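For reference, here is a more spelled-out variant of the same idea. It is a sketch rather than a drop-in replacement: trim_softclips is a hypothetical helper name, it uses match() instead of split() on a pattern that can match the empty string (which may be handled differently between awk implementations), and it assumes that SEQ and QUAL are present (not "*") and that soft clips are the outermost CIGAR operations (no hard clips around them).

```shell
# Sketch: trim soft-clipped bases from SAM records with POSIX awk.
# trim_softclips is a hypothetical helper name, not part of any tool.
trim_softclips() {
    awk '
    BEGIN { FS = OFS = "\t" }
    /^@/ { print; next }            # pass header lines through unchanged
    {
        cigar = $6
        # Leading soft clip (the 2S in 2S4M): drop the first n bases.
        if (match(cigar, /^[0-9]+S/)) {
            n = substr(cigar, 1, RLENGTH - 1) + 0
            $10 = substr($10, n + 1)
            $11 = substr($11, n + 1)
            cigar = substr(cigar, RLENGTH + 1)
        }
        # Trailing soft clip (the 2S in 3M2S): drop the last n bases.
        if (match(cigar, /[0-9]+S$/)) {
            n = substr(cigar, RSTART, RLENGTH - 1) + 0
            keep = length($10) - n
            $10 = substr($10, 1, keep)
            $11 = substr($11, 1, keep)
            cigar = substr(cigar, 1, RSTART - 1)
        }
        $6 = cigar
        print
    }' "$@"
}

# Usage on a real file (file names as in the one-liner above):
#   trim_softclips Aligned.out.sam > Aligned.noS.sam
# Small made-up record to try it on:
printf 'r1\t0\tchr1\t100\t60\t2S4M\t*\t0\t0\tACGTAC\tABCDEF\n' | trim_softclips
```

The made-up record comes out with CIGAR 4M, sequence GTAC and qualities CDEF. The POS field is left alone, since in SAM it already refers to the first aligned base rather than the first clipped one.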

How to send mail stuck in the outbox in Thunderbird

Recently, I ran into this conundrum. I was trying to send an email using the Thunderbird mail client while my connectivity was intermittent, and the email got stuck in the unsent messages list. I tried resending after restarting the client, but it refused to do so. I forwarded the message to the person I was replying to, just in case I was unable to send the original. Then I wondered what would happen if I moved the message to the Drafts folder, and voila! There my message was, ready to edit, and I could just hit send.

Installation error in IRanges for R 3.0 str_utils.c:158: error: ‘timezone’

I ran into this error on our HPC cluster when installing IRanges for R-3.0.0

gcc -std=c99 -L/usr/lib64 -L/source/gnu_4.4/lib64 -I/apps/source/R/R-3.0.0/R-3.0.0/lib64/R/include -DNDEBUG -I/usr/local/include -fpic -o2 -c str_utils.c -o str_utils.o
str_utils.c: In function ‘get_svn_time’:
str_utils.c:156: warning: implicit declaration of function ‘tzset’
str_utils.c:158: error: ‘timezone’ undeclared (first use in this function)
str_utils.c:158: error: (Each undeclared identifier is reported only once
str_utils.c:158: error: for each function it appears in.)
make: *** [str_utils.o] Error 1
ERROR: compilation failed for package ‘IRanges’
* removing ‘/PHShome/ar474/R/x86_64-unknown-linux-gnu-library/3.0/IRanges’
WARNING: ignoring environment value of R_HOME
ERROR: dependency ‘IRanges’ is not available for package ‘XVector’
* removing ‘/PHShome/ar474/R/x86_64-unknown-linux-gnu-library/3.0/XVector’
WARNING: ignoring environment value of R_HOME
ERROR: dependencies ‘IRanges’, ‘XVector’ are not available for package ‘GenomicRanges’
* removing ‘/PHShome/ar474/R/x86_64-unknown-linux-gnu-library/3.0/GenomicRanges’  

This is annoying, as there are R packages that depend on IRanges, e.g. GenomicRanges. As I don’t have sysadmin privileges, nor am I interested in reinstalling R from scratch locally, I took the following steps and the install worked:

  1. Download the IRanges package that is compatible with your version of R, in my case it was IRanges_1.22.9.tar.gz
  2. Untar the package somewhere local; this unpacks it into a folder called IRanges
     tar -xvzf IRanges_1.22.9.tar.gz
  3. I then modified lines 156 and 158 of the source file IRanges/src/str_utils.c
     vi IRanges/src/str_utils.c
     Before:
     152 #if defined(__APPLE__) || defined(__FreeBSD__)
     153 //'struct tm' has no member named 'tm_gmtoff' on Windows+MinGW
     154 utc_offset = result.tm_gmtoff / 3600;
     155 #else /* defined(__APPLE__) || defined(__FreeBSD__) */
     156 tzset();
     157 //timezone is not portable (is a function, not a long, on OS X Tiger)
     158 utc_offset = - (timezone / 3600);
     After:
     152 #if defined(__APPLE__) || defined(__FreeBSD__)
     153 //'struct tm' has no member named 'tm_gmtoff' on Windows+MinGW
     154 utc_offset = result.tm_gmtoff / 3600;
     155 #else /* defined(__APPLE__) || defined(__FreeBSD__) */
     156 //tzset(); AR
     157 //timezone is not portable (is a function, not a long, on OS X Tiger)
     158 utc_offset = - (time(NULL) / 3600);
     Note that time(NULL) returns seconds since the epoch rather than a timezone offset, so this edit gets the file to compile, but the utc_offset it computes is not a true UTC offset.
  4. Finally, I re-tarred and gzipped the package
    tar -czvf IRanges.tar.gz IRanges
  5. I then ran a local installation of the IRanges package I had created
    > install.packages("~/downloads/IRanges.tar.gz", repos = NULL, type = "source")
  6. The resulting installation message is
     Creating a generic function for ‘as.table’ from package ‘base’ in package ‘IRanges’
    Creating a generic function for ‘t’ from package ‘base’ in package ‘IRanges’
    ** help
    *** installing help indices
    ** building package indices
    ** installing vignettes
    ** testing if installed package can be loaded
    * DONE (IRanges)
    >