UPGRADING R on Mac OSX: Quick and Dirty

A quick note on painless upgrade of R in the OSX environment and I think this should work on most systems. I was originally running 3.2.2 on my macbook and when i tried to install a package i ran into dependency issues and bugs that were fixed in the later version. So, of course back to installing the latest version of R. One issue that always crops up is that since the last install, I have downloaded a bunch of packages from both CRAN and BioConductor and wanted a quick way of updating and re-installing. Previously, I had just worked from a clean install as I did not have any major analysis ongoing and this time it was different as I do have analysis ongoing and did not want to deal with the install on demand. Googling turned up a few suggestions and I used it to construct a quick way to upgrade

First, before re-installing get the .libPaths() value for the current install. Now reinstall R. After reinstalling, we get a list of packages installed in the previous version using the commands below

## shows the path for the new version
>.libPaths()
[1] "/Library/Frameworks/R.framework/Versions/3.3/Resources/library"

So we use the paths from the previous version of R

> package_df <- as.data.frame(installed.packages("/Library/Frameworks/R.framework/Versions/3.2/Resources/library"))
> package_list <- as.character(package_df$Package)
> install.packages(package_list)

This works to install all packages from CRAN, however I got this error message after:

Warning message:
packages ‘affy’, ‘affyio’, ‘airway’, ‘ALL’, ‘annotate’, ‘AnnotationDbi’, ‘AnnotationForge’, ‘aroma.light’, ‘Biobase’, ‘BiocGenerics’, ‘biocGraph’, ‘BiocInstaller’, ‘BiocParallel’, ‘BiocStyle’, ‘biomaRt’, ‘BioNet’, ‘Biostrings’, ‘biovizBase’, ‘BSgenome’, ‘BSgenome.Hsapiens.UCSC.hg19’, ‘Category’, ‘cellHTS2’, ‘chipseq’, ‘clipper’, ‘clusterProfiler’, ‘ComplexHeatmap’, ‘cqn’, ‘DEGraph’, ‘DESeq’, ‘DESeq2’, ‘DO.db’, ‘DOSE’, ‘DynDoc’, ‘EDASeq’, ‘edgeR’, ‘EnrichmentBrowser’, ‘FGNet’, ‘fibroEset’, ‘gage’, ‘gageData’, ‘genefilter’, ‘geneplotter’, ‘GenomeInfoDb’, ‘GenomicAlignments’, ‘GenomicFeatures’, ‘GenomicRanges’, ‘ggbio’, ‘globaltest’, ‘GO.db’, ‘GOSemSim’, ‘GOSim’, ‘GOstats’, ‘GOsummaries’, ‘graph’, ‘graphite’, ‘GSEABase’, ‘hgu133a.db’, ‘hgu133plus2.db’, ‘hgu95 [... truncated]

As I suspected, the packages from BioConductor were not installed. So I decided to use the same approach and came up with the following:

  • Get a list of the packages installed in the current version from CRAN
> package_df_new <- as.data.frame(installed.packages("/Library/Frameworks/R.framework/Versions/3.3/Resources/library"))
> package_list_new <- as.character(package_df_new$Package)
  • Compare that list to the old list and the packages not in the new list are from BioConductor
> package_bioc <- package_list[-c(which(package_list %in%package_list_new))]
  • Finally, install those packages from Bioconductor
> source("https://bioconductor.org/biocLite.R")
trying URL 'https://bioconductor.org/packages/3.3/bioc/bin/macosx/mavericks/contrib/3.3/BiocInstaller_1.22.3.tgz'
Content type 'application/x-gzip' length 54312 bytes (53 KB)
==================================================
downloaded 53 KB


The downloaded binary packages are in
        /var/folders/vd/jrcwgw214pv9svrflbdtjl3m0000gn/T//RtmpKKXIWa/downloaded_packages
Bioconductor version 3.3 (BiocInstaller 1.22.3), ?biocLite for help
> biocLite(package_bioc)
Advertisements

Fixing rJava error on Mac OSX El Capitan

I was trying to reinstall RDAVIDWebService after upgrading to R-3.3.1 on El Capitan and I followed the steps from my previous blog post. Now you dont have to fix the ssl issue, but you still have to reconfigure java. After the step for reinstalling rJava I still ran into this error

Error : .onLoad failed in loadNamespace() for 'rJava', details:
  call: dyn.load(file, DLLpath = DLLpath, ...)
  error: unable to load shared object '/Library/Frameworks/R.framework/Versions/3.3/Resources/library/rJava/libs/rJava.so':
  dlopen(/Library/Frameworks/R.framework/Versions/3.3/Resources/library/rJava/libs/rJava.so, 6): Library not loaded: @rpath/libjvm.dylib
  Referenced from: /Library/Frameworks/R.framework/Versions/3.3/Resources/library/rJava/libs/rJava.so
  Reason: image not found
Error: package or namespace load failed for ‘RDAVIDWebService’

Ugh.. some issues with the rJava library. Google turned up with this result from StackOverflow and i just ran the following command in the terminal

sudo ln -f -s $(/usr/libexec/java_home)/jre/lib/server/libjvm.dylib /usr/local/lib

and it worked right away

Steps to create R packages in Emacs Org-mode

Create a package skeleton

I am following the steps in Hadley Wickam’s excellent tutorial on how to create R-packages, but i have decided to see whether I can use emacs-org mode to do it. First lets load the devtools package

library(devtools)
devtools
stats
graphics
grDevices
utils
datasets
methods
base

Now we create a skeleton for the package using devtools::*create*. Since, we are using emacs and not RStudio i set the rstudio option to false

devtools::create("~/Documents/Research/R-packages/bootWGCNA",rstudio=F)
Package: bootWGCNA
Title: What the Package Does (one line, title case)
Version: 0.0.0.9000
Authors@R: person("First", "Last", email = "first.last@example.com", role = c("aut", "cre"))
Description: What the package does (one paragraph).
Depends: R (>= 3.2.2)
License: What license is it under?
LazyData: true

Now we edit the basic DESCRIPTION in the meta data file and add the title etc

We can add Imports and Suggests to the DESCRIPTION using devtools as follows

devtools::use_package("WGCNA", pkg="~/Documents/Research/R-packages/bootWGCNA")
Adding WGCNA to Imports
Refer to functions with WGCNA::fun()

It might print the message below based on whether you run the function in the R console or not. The best way is to check the DESCRIPTION file in another emacs buffer

Finally, we create an automated tests folder as follows

devtools::use_testthat(pkg="~/Documents/Research/R-packages/bootWGCNA")

I think now you can just create your R files in the package and edit clean it up using emacs. You could test a function in org-mode keeping track of your work and once done, just create an R file in the package directory structure

Quick Note: ggplot2 namespace error on loading

More than anything else this is a quick reminder to myself and perhaps anyone else who encounters this error. When running an R session with the WGCNA package loaded, i get an error with ggplot2. This is because WGCNA loads the Hmisc namespace which in turn loads the ggplot2 namesake. However, both ggplot2 and Hmisc packages are not loaded. Therefore, when  i try to load ggplot2 i get this error

library(ggplot2)
Error in unloadNamespace(package) :
namespace ggplot2€™ is imported by €˜Hmisc€™ so cannot be unloaded
Error in library(ggplot2) :
Package €ggplot2€™ version 2.0.0 cannot be unloaded

I keep forgetting the load/detach/unload stuff.. so seeing that I had to google this a second time I decided to write this up so that I remember

First unload the WGCNA package

detach(package:WGCNA)

then unload the WGCNA namespace

unloadNamespace(“WGCNA”)

finally unload the Hmisc namespace

unloadNamespace(“Hmisc”)

and now library(ggplot2) will work as before

 

R sink() … closeAllConnections

This is more to remind me about the closeAllConnections() function in R and hope its also useful for someone out there. While using sink() in a function to write output to a file, if that function exits then R does not resume writing to the console, even if you use the sink() function to close. I think this is because the connection is still open in the functions environment. The closeAllConnections tip that i found from this stackOverflow thread resolved it.  See also  documentation in base R 

Quick and Dirty Save for R workspace objects

I recently found that if i used the same function on multiple datasets, I needed to save the objects created into a RData object, so that I can upload all the processed data from all datasets simultaneously. One way to do this is to somehow hard code this into your function so that each variable is prefixed with a unique id. For example , if I have dataset1 and dataset2 and I wanted to process them using my function which creates obj1, obj2  then i would have to create objects called d1_obj1, d1_obj2 for the first dataset and d2_obj1 and d2_obj2 for the second dataset. This might be simple enough with a couple of objects, however I wanted to save about 20+ objects from within the function as I planned for some comparisons that would involve those objects. Then i came across the R function get and assign , which made the job i was dreading of ( reassigning each  object somehow and debugging it all ) very easy. I just created a named list and assigned all the objects to the list using get with my specific prefix for the objects just before the function exits and save that list using the save function. Below is the code I use

objectNames <-ls(all.names=T)
res<- list()
for ( o in objectNames)
{
res[[paste(proj,o,sep="_")]] <- get(o)
}
format(object.size(res), units="Gb")

If anyone has anything better please do not hesitate to let me know, as i feel that this is somewhat hawkish.

Fixing RDAVIDWebService on Yosemite

I was recently trying to install the R/Bioconductor package RDAVIDWebService and I got the error that the URL has changed. Of course i have to run GO enrichments for about 36 Gene Lists and needed and automated way to do this, which means I needed to figure out if I can install the package.

After some wrangling I got the package to run and while a lot of the steps were mentioned by the package maintainer Cristóbal Fresno here, I found that i had to get some issues resolved myself. So I am putting up the steps i went through in case anyone runs into a similar situation. Briefly, all the steps were described in the post by Fresno and i will walk through those steps as i did myself.

One main thing is to register yourself for an account for using DAVID web service

Work around for the new DAVID Web service configuration V 2.0

1) First of all the HTTPS certificate needs Java 8 in order to run.

Previous versions will not run due to prime size. The maximum size that Java accepts is 1024 bits. This is a known issue (see JDK-6521495).

1.1) Check your java version

java -version

If the version is 1.7.XX or earlier then you need to install Java 8.

I  found that I had to install Java and i went the got the Java 1.8 JDK from the Oracle Website. Specifically I downloaded the files from this particular section

javaLocation

and then proceeded to install the apple package.

Installing openssl

2.3) In MAC (tested in Yosemite) the certificate will not work for the present stable openssl version 0.9.8.

2.3.1) Check your openssl version

openssl version

OpenSSL 0.9.8

If it is >= 1.0.2.d then go to step 2.3.3)

2.3.2) Update your openssl, i.e., download, compile and install it

Download the official release from OpenSSL >= 1.0.2.d

tar -xzvf openssl-1.0.2d.tar.gz

cd openssl-1.0.2d

#Compile it with 64 bits support

./Configure darwin64-x86_64-cc
make
make test
sudo make install

Now you may need to reflect the change in your system if openssl version keeps pointing to the old version.

cd /usr/bin

mv sudo openssl openssl098
sudo ln -s /usr/local/ssl/bin/openssl openssl

Now i tried to do this using homebrew  that is just trying to update using my brew installation. However I ran into a ruby version error see this post.  Luckily i think i had installed brew from github, so one of the solutions there worked for me

brewfix

However when I did a it pull i got an error regrarding one of the files and I ended up deleting it. See below for the set of commands i used [I have to apologize that i am unable to reconstruct the error messages as I am writing this post after successfully installing the R packages and I had closed my terminal window]

brewIssues

Once openssl was installed I realized that it was installed in

 /usr/local/Cellar

So I ended up making a symlink to the openssl over there as mentioned about

 467 cd /usr/bin/
 468 ls -ltrh openssl
 469 openssl version
 470 sudo mv openssl openssl098zg
 471 sudo ln -s /usr/local/Cellar/openssl/1.0.2d_1/bin/openssl

Now finally we have openssl working.

Adding the CA cert for DAVID worked as the instructions below

2.3.3)  Get DAVID’s certificate and install it into cacerts

Get the certificate:

cd

echo -n | openssl s_client -connect david.ncifcrf.gov:443 | sed -ne ‘/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p’ > ncifcrf.cert

Check if it was properly downloaded:
openssl x509 -in ncifcrf.cert -text

Backup the cacerts file. In my case the 1.8.0_60 jdk version is located in /Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/security/ directory

sudo find / -name cacerts
/Library/Internet Plug-Ins/JavaAppletPlugin.plugin/Contents/Home/lib/security/cacerts
/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/lib/security/cacerts
/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/security/cacerts
/System/Library/Java/Support/CoreDeploy.bundle/Contents/Home/lib/security/cacerts
/System/Library/Java/Support/Deploy.bundle/Contents/Home/lib/security/cacerts

sudo cp /Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/security/cacerts .

sudo cp /Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/security/cacerts cacerts.org
sudo keytool -import -trustcacerts -keystore cacerts -storepass changeit -noprompt -alias david -file ncifcrf.cert
Certificate was added to keystore

The certificate should be added to the keystore. Now, copy the new cacerts version to the original position

sudo cp cacerts /Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/security/cacerts

The only hiccups in updating the R Java configuration were the following

  • I wanted to update the R.app  install. The actually executable was under
    /Applications/R.app/Contents/MacOS/R

however if you to call this in terminal it kept popping up the GUI and i could not use R CMD as i originally planned. The workaround is to cd into the directory

cd /Applications/R.app/Contents/MacOS/

and then call R locally which works

sudo R CMD javareconf

Also to reinstall the rJava  package  from source I also had to specify the repo i.e use

 install.packages ("rJava", type="source",repos="http://cran.case.edu")

as there was an issue with the tcl/tk libraries in opening up the pop-up for selecting can mirrors

3) Update Java configuration in R. The output may slightly change from windows, linux or mac.

R CMD javareconf

Java interpreter : /usr/bin/java

Java version     : 1.8.0_60

Java home path   : /usr/lib/jvm/java-8-oracle/jre

Java compiler    : /usr/bin/javac

Java headers gen.: /usr/bin/javah

Java archive tool: /usr/bin/jar

Please check that both Java version and path are appropriate. In addition JNI support should also be available.

4) Check that rJava R library works as supposed to.

R

library(rJava)

.jinit()

.jcall(“java/lang/System”, “S”, “getProperty”, “java.runtime.version”)

[1] “1.8.0_20-b26”

In Mac the rJava that downloads is tied to 1.6 java version. If it is the case, you should install it from the source.

install.packages(‘rJava’, type=’source’)

library(rJava)

.jinit()

.jcall(“java/lang/System”, “S”, “getProperty”, “java.runtime.version”)

[1] “1.8.0_20-b26”

And thats it now DAVID works.

Installing Kallisto (RNA-Seq quantification program) on our Red Hat Cluster

Today I decided to install the RNA-Seq quantification tool Kallisto  for testing, this appears to be a good tool ( detailed here ) and I have high confidence on the underlying theoretical rigor, which usually the bane of any new tool.

Straight awa, I ran into some installation troubles. After troubleshooting for a couple of hours I finally got the tool installed from source. First, I ran into this error using gcc

Untitled0

After googling around a little bit, i found that the version of gcc we have did not support the C++-11 standard  and i had to make sure  we had the latest  gcc modules loaded .. but lo behold the cmake still failed to recognize the newgcc  modules

Untitled1

Some more googling revealed that perhaps i need to change the CC and CXX environment variables and i did that as well to no avail

Untitled2

More googling revealed that there was a catch to all this, apparently to reset cmake variables for the C compiler, the entire build directory needs to be deleted and we should start clean. So I went ahead and deleted the kalissto source folder, and untarred the download and re-ran CMAKE and voila!!! it finally compiled

Untitled3

a word of caution here is that you need to have the $CC and $CXX variables set as well

Finally, if you have your hd5 libraries installed in a non-standard location or the ones in your standard location are old and you want to use newer libraries for whatever reason, you need to edit the CMakeLists.txt file as below

Untitled5

i have added the two lines whereupon make will find the relevant libraries for you if the find function follows


set(CMAKE_LIBRARY_PATH ${CMAKE_LIBRARY_PATH} /apps/lab/miket/hdf/1.8.14)
set(CMAKE_INCLUDE_PATH /apps/lab/miket/hdf/1.8.14/include {CMAKE_INCLUDE_PATH})

 how to edit your CMakeLists.txt if you want to include libraries from non-standard location

Some tips and tricks for creating Manhattan plots in R

Recently, I was trying to create a manhattan plot of gene expression for a genome-wide RNA-seq study. I had earlier created manhattan plots about an year or two ago and I recall i wrote my own functions to do so as all the packages/functions out there required data to be in some particular format or the other. This was frustrating as all i had was  chromosome, start, end, pvalues or any other value such as fold change that I wanted to interrogate. Long story short, I could not find the code I had written and had to rewrite my functions from scratch. This time round i found myself to be more efficient however  as there were some tricks that I found was useful. I just ended up using the R plot function for the plotting

  • Trick #1 Using levels to re-order the chromosome: Lets say we had a data set with 4 chromosomes and when you queried the levels you got something like

>levels(dat$chromosome)

> "chr1" "chr10" "chr2"

In the default order , the manhattan plot gets rendered as chr1 first, chr10, second and chr2 last. To reorder the chromosomes for plots, just reorder the levels of the factor


levels(data$chromosome) <- factor(data$chromosome, levels(data$chromosome)[c(1,3,2)])

  • Other tips are that I just use base R graphics to create a dot plot as this allows a level of control on each point that I have not found in any package. you can use GGplot if you are so inclined, the hardest part was to figure out how to quickly reorder the levels of the Chromosome. I will post my code and an example figure up here some time soon and I would also welcome any comments on improving the code

Remove Soft Clipped reads from bam files

I came across this awk one liner to remove soft clipped reads from a bam file, while following a post for a cufflinks issue and thought it would be good to keep note

awk 'BEGIN {OFS="\t"} {split($6,C,/[0-9]*/); split($6,L,/[SMDIN]/); if (C[2]=="S") {$10=substr($10,L[1]+1); $11=substr($11,L[1]+1)}; if (C[length(C)]=="S") {L1=length($10)-L[length(L)-1]; $10=substr($10,1,L1); $11=substr($11,1,L1); }; gsub(/[0-9]*S/,"",$6); print}' Aligned.out.sam > Aligned.noS.sam