Quick and Dirty Save for R workspace objects

I recently found that when I used the same function on multiple datasets, I needed to save the objects it created into an RData file, so that I could load the processed data from all datasets simultaneously. One way to do this is to hard-code a unique ID prefix for each variable into the function. For example, if I have dataset1 and dataset2 and process them with a function that creates obj1 and obj2, I would have to create objects called d1_obj1 and d1_obj2 for the first dataset and d2_obj1 and d2_obj2 for the second. This might be simple enough with a couple of objects, but I wanted to save 20+ objects from within the function, as I planned some comparisons involving them. Then I came across the R functions get and assign, which made the job I was dreading (reassigning each object somehow and debugging it all) very easy. Just before the function exits, I create a named list, use get to add every object to it under a name carrying my dataset-specific prefix, and then save that list using the save function. Below is the code I use:

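# proj holds the dataset-specific prefix, e.g. "d1"
objectNames <- ls(all.names = TRUE)
res <- list()
for (o in objectNames) {
  res[[paste(proj, o, sep = "_")]] <- get(o)
}
# report how much memory the collected objects take
format(object.size(res), units = "Gb")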

If anyone has anything better, please do not hesitate to let me know, as I feel that this is somewhat hackish.


Fixing RDAVIDWebService on Yosemite

I was recently trying to install the R/Bioconductor package RDAVIDWebService and got an error saying that the URL had changed. Of course, I had to run GO enrichments for about 36 gene lists and needed an automated way to do this, which meant I needed to figure out whether I could get the package working.

After some wrangling I got the package to run. While a lot of the steps were mentioned by the package maintainer Cristóbal Fresno here, I found that I had to resolve some issues myself, so I am putting up the steps I went through in case anyone runs into a similar situation. Briefly, all the steps were described in the post by Fresno, and I will walk through them as I carried them out.

One main thing is to register for an account to use the DAVID web service.

Work around for the new DAVID Web service configuration V 2.0

1) First of all the HTTPS certificate needs Java 8 in order to run.

Previous versions will not run because of the prime size: the maximum size that they accept is 1024 bits. This is a known issue (see JDK-6521495).

1.1) Check your java version

java -version

If the version is 1.7.XX or earlier then you need to install Java 8.
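The version check above can be scripted. The following is a minimal sketch; `java_major` is a hypothetical helper name (not part of the original instructions), and it assumes version strings of the form `1.8.0_60` (old scheme) or `11.0.2` (newer scheme).

```shell
# Hypothetical helper: print the major Java version for a version string.
# Old-style strings start with "1." and carry the major version second.
java_major() {
    case $1 in
        1.*) v=${1#1.}; echo "${v%%.*}" ;;   # 1.8.0_60 -> 8
        *)   echo "${1%%.*}" ;;              # 11.0.2   -> 11
    esac
}

# Example use (java -version writes to stderr, with the version in quotes):
#   ver=$(java -version 2>&1 | awk -F'"' 'NR==1 {print $2}')
#   [ "$(java_major "$ver")" -ge 8 ] || echo "Java 8 or later is needed"
```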

I found that I had to install Java, so I went and got the Java 1.8 JDK from the Oracle website. Specifically, I downloaded the files from this particular section


and then proceeded to install the Apple package.

Installing openssl

2.3) On Mac (tested on Yosemite) the certificate will not work with the present stable openssl version 0.9.8.

2.3.1) Check your openssl version

openssl version

OpenSSL 0.9.8

If it is >= 1.0.2d then go to step 2.3.3)
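Comparing OpenSSL versions by eye is easy to get wrong because of the letter suffixes (0.9.8zg is older than 1.0.2d). Here is a rough bash sketch of that comparison; `openssl_at_least_102d` is a made-up helper name, and it assumes versions of the form `major.minor.patch` followed by an optional lowercase letter suffix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 when the version string is >= 1.0.2d.
openssl_at_least_102d() {
    local v=$1
    local nums=${v%%[a-z]*}      # numeric part, e.g. "1.0.2"
    local suffix=${v#"$nums"}    # letter suffix, e.g. "d" or "zg" (may be empty)
    local major minor patch
    IFS=. read -r major minor patch <<< "$nums"
    (( major != 1 )) && { (( major > 1 )); return; }
    (( minor != 0 )) && { (( minor > 0 )); return; }
    (( patch != 2 )) && { (( patch > 2 )); return; }
    # Same 1.0.2 base: a two-letter suffix (e.g. "za") comes after any single letter
    (( ${#suffix} != 1 )) && { (( ${#suffix} > 1 )); return; }
    [[ ! $suffix < d ]]
}

# Example use (the version is the second word printed by "openssl version"):
#   openssl_at_least_102d "$(openssl version | awk '{print $2}')" && echo "new enough"
```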

2.3.2) Update your openssl, i.e., download, compile and install it

Download the official OpenSSL release (>= 1.0.2d)

tar -xzvf openssl-1.0.2d.tar.gz

cd openssl-1.0.2d

#Compile it with 64 bits support

./Configure darwin64-x86_64-cc
make test
sudo make install

Now you may need to reflect the change in your system if openssl version keeps pointing to the old version.

cd /usr/bin

sudo mv openssl openssl098
sudo ln -s /usr/local/ssl/bin/openssl openssl

I first tried to do this using Homebrew, that is, by simply updating openssl through my brew installation. However, I ran into a Ruby version error (see this post). Luckily, I think I had installed brew from GitHub, so one of the solutions there worked for me


However, when I did a git pull I got an error regarding one of the files, and I ended up deleting it. See below for the set of commands I used. [I have to apologize that I am unable to reconstruct the error messages, as I am writing this post after successfully installing the R packages and I had closed my terminal window.]


Once openssl was installed I realized that it was installed in


So I ended up making a symlink to the openssl over there, as mentioned above

cd /usr/bin/
ls -ltrh openssl
openssl version
sudo mv openssl openssl098zg
sudo ln -s /usr/local/Cellar/openssl/1.0.2d_1/bin/openssl

Now finally we have openssl working.

Adding the CA cert for DAVID worked as per the instructions below

2.3.3)  Get DAVID’s certificate and install it into cacerts

Get the certificate:


echo -n | openssl s_client -connect david.ncifcrf.gov:443 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > ncifcrf.cert

Check if it was properly downloaded:
openssl x509 -in ncifcrf.cert -text
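The sed range expression in the download command keeps only the lines between the BEGIN and END markers and drops the rest of the s_client chatter. A small sketch of that filter on made-up input follows; `extract_pem` is a hypothetical name, and the sample text is not a real certificate.

```shell
# Keep only the PEM certificate block(s) from whatever arrives on stdin.
extract_pem() {
    sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p'
}

# Made-up sample standing in for s_client output (not a real certificate).
sample='CONNECTED(00000003)
-----BEGIN CERTIFICATE-----
MIIFakeBase64Payload
-----END CERTIFICATE-----
Verify return code: 0 (ok)'

# Prints the three lines from the BEGIN marker to the END marker.
printf '%s\n' "$sample" | extract_pem
```

If the resulting file is empty or lacks either marker, the download did not work and the openssl x509 check will fail as well.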

Back up the cacerts file. In my case the 1.8.0_60 JDK version is located in the /Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/security/ directory

sudo find / -name cacerts
/Library/Internet Plug-Ins/JavaAppletPlugin.plugin/Contents/Home/lib/security/cacerts

sudo cp /Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/security/cacerts .

sudo cp /Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/security/cacerts cacerts.org
sudo keytool -import -trustcacerts -keystore cacerts -storepass changeit -noprompt -alias david -file ncifcrf.cert
Certificate was added to keystore

The certificate should be added to the keystore. Now, copy the new cacerts version to the original position

sudo cp cacerts /Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/security/cacerts

The only hiccups in updating the R Java configuration were the following

  • I wanted to update the R.app install. The actual executable was under

however, if you call this from the terminal it keeps popping up the GUI, and I could not use R CMD as I originally planned. The workaround is to cd into the directory

cd /Applications/R.app/Contents/MacOS/

and then call R locally which works

sudo R CMD javareconf

Also, to reinstall the rJava package from source I had to specify the repo, i.e. use

install.packages("rJava", type="source", repos="http://cran.case.edu")

as there was an issue with the tcl/tk libraries when opening the pop-up for selecting CRAN mirrors

3) Update Java configuration in R. The output may vary slightly between Windows, Linux and Mac.

R CMD javareconf

Java interpreter : /usr/bin/java

Java version     : 1.8.0_60

Java home path   : /usr/lib/jvm/java-8-oracle/jre

Java compiler    : /usr/bin/javac

Java headers gen.: /usr/bin/javah

Java archive tool: /usr/bin/jar

Please check that both Java version and path are appropriate. In addition JNI support should also be available.

4) Check that the rJava R library works as it is supposed to.




library(rJava)
.jinit()
.jcall("java/lang/System", "S", "getProperty", "java.runtime.version")

[1] "1.8.0_20-b26"

On Mac, the rJava binary that gets downloaded is tied to Java 1.6. If that is the case, you should install it from source.

install.packages('rJava', type='source')



.jcall("java/lang/System", "S", "getProperty", "java.runtime.version")

[1] "1.8.0_20-b26"

And that's it, now DAVID works.

Installing Kallisto (RNA-Seq quantification program) on our Red Hat Cluster

Today I decided to install the RNA-Seq quantification tool Kallisto for testing. It appears to be a good tool (detailed here), and I have high confidence in its underlying theoretical rigor, the lack of which is usually the bane of any new tool.

Straight away, I ran into some installation troubles. After troubleshooting for a couple of hours I finally got the tool installed from source. First, I ran into this error using gcc


After googling around a little, I found that the version of gcc we had did not support the C++11 standard, so I had to make sure we had the latest gcc modules loaded. But lo and behold, cmake still failed to recognize the new gcc modules


Some more googling revealed that perhaps I needed to change the CC and CXX environment variables. I did that as well, to no avail


More googling revealed that there was a catch to all this: apparently, to reset the cmake variables for the C compiler, the entire build directory needs to be deleted and we should start clean. So I went ahead and deleted the kallisto source folder, untarred the download again, re-ran cmake and voila! It finally compiled


A word of caution here: you need to have the $CC and $CXX variables set as well.
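The clean rebuild can be sketched roughly as follows. This assumes an out-of-source build in a directory named build, which is a common cmake convention rather than something stated in this post, and `fresh_build_dir` is a hypothetical helper name. The point is that cmake caches the compiler it found on the first run, so a stale cache has to go before new CC and CXX values are picked up.

```shell
# Hypothetical helper: recreate an empty build directory so that cmake
# cannot reuse a compiler cached by an earlier run (CMakeCache.txt).
fresh_build_dir() {
    build_dir=$1
    rm -rf "$build_dir"
    mkdir -p "$build_dir"
}

# Example use (the compilers are whatever your loaded gcc module provides):
#   export CC=$(command -v gcc) CXX=$(command -v g++)
#   fresh_build_dir build && cd build && cmake .. && make
```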

Finally, if you have your HDF5 libraries installed in a non-standard location, or the ones in your standard location are old and you want to use newer libraries for whatever reason, you need to edit the CMakeLists.txt file as below


I added the two lines below, so that the find functions that follow them can locate the relevant libraries for you

set(CMAKE_LIBRARY_PATH ${CMAKE_LIBRARY_PATH} /apps/lab/miket/hdf/1.8.14)
set(CMAKE_INCLUDE_PATH /apps/lab/miket/hdf/1.8.14/include ${CMAKE_INCLUDE_PATH})

 how to edit your CMakeLists.txt if you want to include libraries from non-standard location

Some tips and tricks for creating Manhattan plots in R

Recently, I was trying to create a Manhattan plot of gene expression for a genome-wide RNA-seq study. I had created Manhattan plots a year or two earlier, and I recall that I wrote my own functions to do so, as all the packages/functions out there required the data to be in some particular format or other. This was frustrating, as all I had was chromosome, start, end, and p-values or some other value, such as fold change, that I wanted to interrogate. Long story short, I could not find the code I had written and had to rewrite my functions from scratch. This time round I was more efficient, however, as I found some tricks that were useful. I just ended up using the R plot function for the plotting.

  • Trick #1 Using levels to reorder the chromosomes: Let's say we had a data set with three chromosomes, and when you queried the levels you got something like


> levels(data$chromosome)
[1] "chr1"  "chr10" "chr2"

In the default order, the Manhattan plot gets rendered with chr1 first, chr10 second and chr2 last. To reorder the chromosomes for plots, just rebuild the factor with its levels in the order you want

data$chromosome <- factor(data$chromosome, levels = levels(data$chromosome)[c(1,3,2)])

  • Other tips: I just use base R graphics to create a dot plot, as this allows a level of control over each point that I have not found in any package. You can use ggplot2 if you are so inclined. The hardest part was figuring out how to quickly reorder the levels of the chromosome factor. I will post my code and an example figure here some time soon, and I would welcome any comments on improving the code.

Remove Soft Clipped reads from bam files

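I came across this awk one-liner for removing soft clipping from reads while following a post about a cufflinks issue, and thought it would be good to keep a note of it. It trims the soft-clipped bases and qualities from each read and drops the S operations from the CIGAR string. Note that it works on a SAM file, so a BAM file needs to be converted first.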

awk 'BEGIN {OFS="\t"} {split($6,C,/[0-9]*/); split($6,L,/[SMDIN]/); if (C[2]=="S") {$10=substr($10,L[1]+1); $11=substr($11,L[1]+1)}; if (C[length(C)]=="S") {L1=length($10)-L[length(L)-1]; $10=substr($10,1,L1); $11=substr($11,1,L1); }; gsub(/[0-9]*S/,"",$6); print}' Aligned.out.sam > Aligned.noS.sam

How to send mail stuck in the Outbox in Thunderbird

Recently, I ran into this conundrum. I was trying to send an email using the Thunderbird mail client while my connectivity was intermittent, and the email got stuck in the Unsent Messages list. I tried resending after restarting the client, but it refused to send. I forwarded the message to the person I was replying to, just in case I was unable to get the original out. Then I wondered what would happen if I moved the message to the Drafts folder, and voila! There my message was, ready to edit, and I could just hit send.

Installation error in IRanges for R 3.0 str_utils.c:158: error: ‘timezone’

I ran into this error on our HPC cluster when installing IRanges for R-3.0.0

gcc -std=c99 -L/usr/lib64 -L/source/gnu_4.4/lib64 -I/apps/source/R/R-3.0.0/R-3.0.0/lib64/R/include -DNDEBUG -I/usr/local/include -fpic -o2 -c str_utils.c -o str_utils.o
str_utils.c: In function ‘get_svn_time’:
str_utils.c:156: warning: implicit declaration of function ‘tzset’
str_utils.c:158: error: ‘timezone’ undeclared (first use in this function)
str_utils.c:158: error: (Each undeclared identifier is reported only once
str_utils.c:158: error: for each function it appears in.)
make: *** [str_utils.o] Error 1
ERROR: compilation failed for package ‘IRanges’
* removing ‘/PHShome/ar474/R/x86_64-unknown-linux-gnu-library/3.0/IRanges’
WARNING: ignoring environment value of R_HOME
ERROR: dependency ‘IRanges’ is not available for package ‘XVector’
* removing ‘/PHShome/ar474/R/x86_64-unknown-linux-gnu-library/3.0/XVector’
WARNING: ignoring environment value of R_HOME
ERROR: dependencies ‘IRanges’, ‘XVector’ are not available for package ‘GenomicRanges’
* removing ‘/PHShome/ar474/R/x86_64-unknown-linux-gnu-library/3.0/GenomicRanges’  

This is annoying, as there are packages in R that depend on IRanges, e.g. GenomicRanges. As I don’t have sysadmin privileges, nor am I interested in reinstalling R from scratch locally, I did the following steps and the install worked:

  1. Download the IRanges package that is compatible with your version of R; in my case it was IRanges_1.22.9.tar.gz
  2. Untar the package somewhere local, which will unpack it into a folder called IRanges
     tar -xvzf IRanges_1.22.9.tar.gz
  3. I then modified the following lines (156, 158) of the source code in IRanges/src/
     vi IRanges/src/str_utils.c
     Before:
     152 #if defined(__APPLE__) || defined(__FreeBSD__)
     153 //'struct tm' has no member named 'tm_gmtoff' on Windows+MinGW
     154 utc_offset = result.tm_gmtoff / 3600;
     155 #else /* defined(__APPLE__) || defined(__FreeBSD__) */
     156 tzset();
     157 //timezone is not portable (is a function, not a long, on OS X Tiger)
     158 utc_offset = - (timezone / 3600);
     After:
     152 #if defined(__APPLE__) || defined(__FreeBSD__)
     153 //'struct tm' has no member named 'tm_gmtoff' on Windows+MinGW
     154 utc_offset = result.tm_gmtoff / 3600;
     155 #else /* defined(__APPLE__) || defined(__FreeBSD__) */
     156 //tzset(); AR
     157 //timezone is not portable (is a function, not a long, on OS X Tiger)
     158 utc_offset = - (time(NULL) / 3600);
     Note that time(NULL) returns seconds since the epoch rather than a time zone offset, so after this change utc_offset no longer holds a real UTC offset; the edit only gets the file to compile.
  4. Finally, I re-tarred and gzipped the package
    tar -czvf IRanges.tar.gz IRanges
  5. I then ran a local installation of the IRanges package I had created
    > install.packages("~/downloads/IRanges.tar.gz", repos = NULL, type = "source")
  6. The resulting installation message is
     Creating a generic function for ‘as.table’ from package ‘base’ in package ‘IRanges’
    Creating a generic function for ‘t’ from package ‘base’ in package ‘IRanges’
    ** help
    *** installing help indices
    ** building package indices
    ** installing vignettes
    ** testing if installed package can be loaded
    * DONE (IRanges)
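Steps 4 and 5 can also be done entirely from the shell. The sketch below repacks the edited sources and checks that the result really is gzip-compressed before handing it to R; `repack_pkg` is a hypothetical helper name, and the file and directory names are just the ones used in the steps above.

```shell
# Hypothetical helper: pack a source directory into a gzip-compressed tarball.
repack_pkg() {
    src_dir=$1
    out=$2
    tar -czf "$out" "$src_dir"
    # gzip -t exits non-zero if the archive is not valid gzip data
    gzip -t "$out"
}

# Example use, assuming the edited sources sit in ./IRanges:
#   repack_pkg IRanges IRanges.tar.gz && R CMD INSTALL IRanges.tar.gz
```

R CMD INSTALL installs into the first library in your library path, which for a user without sysadmin privileges is normally the personal library.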