Fixing rJava error on Mac OSX El Capitan

I was trying to reinstall RDAVIDWebService after upgrading to R 3.3.1 on El Capitan, and I followed the steps from my previous blog post. You no longer have to fix the SSL issue, but you still have to reconfigure Java. After the step for reinstalling rJava I still ran into this error:

Error : .onLoad failed in loadNamespace() for 'rJava', details:
  call: dyn.load(file, DLLpath = DLLpath, ...)
  error: unable to load shared object '/Library/Frameworks/R.framework/Versions/3.3/Resources/library/rJava/libs/':
  dlopen(/Library/Frameworks/R.framework/Versions/3.3/Resources/library/rJava/libs/, 6): Library not loaded: @rpath/libjvm.dylib
  Referenced from: /Library/Frameworks/R.framework/Versions/3.3/Resources/library/rJava/libs/
  Reason: image not found
Error: package or namespace load failed for ‘RDAVIDWebService’

Ugh, some issue with the rJava library. Googling turned up this result from StackOverflow, and I just ran the following command in the terminal:

sudo ln -f -s $(/usr/libexec/java_home)/jre/lib/server/libjvm.dylib /usr/local/lib

and it worked right away.


UPGRADING R on Mac OSX: Quick and Dirty

A quick note on a painless upgrade of R in the OS X environment; I think this should work on most systems. I was originally running 3.2.2 on my MacBook, and when I tried to install a package I ran into dependency issues and bugs that were fixed in a later version. So, of course, it was back to installing the latest version of R. One issue that always crops up is that since the last install I have downloaded a bunch of packages from both CRAN and Bioconductor, and I wanted a quick way of updating and re-installing them. Previously I had just worked from a clean install, as I did not have any major analysis ongoing. This time I did, and I did not want to deal with installing packages on demand. Googling turned up a few suggestions, and I used them to construct a quick way to upgrade.

First, before re-installing, get the .libPaths() value for the current install. Now reinstall R. After reinstalling, we get a list of the packages installed in the previous version using the commands below.

> .libPaths()  ## shows the path for the new version
[1] "/Library/Frameworks/R.framework/Versions/3.3/Resources/library"

So we use the path from the previous version of R:

> package_df <- as.data.frame(installed.packages("/Library/Frameworks/R.framework/Versions/3.2/Resources/library"))
> package_list <- as.character(package_df$Package)
> install.packages(package_list)

This installs all the packages from CRAN; however, I got this warning message afterwards:

Warning message:
packages ‘affy’, ‘affyio’, ‘airway’, ‘ALL’, ‘annotate’, ‘AnnotationDbi’, ‘AnnotationForge’, ‘aroma.light’, ‘Biobase’, ‘BiocGenerics’, ‘biocGraph’, ‘BiocInstaller’, ‘BiocParallel’, ‘BiocStyle’, ‘biomaRt’, ‘BioNet’, ‘Biostrings’, ‘biovizBase’, ‘BSgenome’, ‘BSgenome.Hsapiens.UCSC.hg19’, ‘Category’, ‘cellHTS2’, ‘chipseq’, ‘clipper’, ‘clusterProfiler’, ‘ComplexHeatmap’, ‘cqn’, ‘DEGraph’, ‘DESeq’, ‘DESeq2’, ‘DO.db’, ‘DOSE’, ‘DynDoc’, ‘EDASeq’, ‘edgeR’, ‘EnrichmentBrowser’, ‘FGNet’, ‘fibroEset’, ‘gage’, ‘gageData’, ‘genefilter’, ‘geneplotter’, ‘GenomeInfoDb’, ‘GenomicAlignments’, ‘GenomicFeatures’, ‘GenomicRanges’, ‘ggbio’, ‘globaltest’, ‘GO.db’, ‘GOSemSim’, ‘GOSim’, ‘GOstats’, ‘GOsummaries’, ‘graph’, ‘graphite’, ‘GSEABase’, ‘hgu133a.db’, ‘hgu133plus2.db’, ‘hgu95 [... truncated]

As I suspected, the packages from Bioconductor were not installed. So I decided to use the same approach and came up with the following:

  • Get a list of the packages installed from CRAN in the current version
> package_df_new <- as.data.frame(installed.packages("/Library/Frameworks/R.framework/Versions/3.3/Resources/library"))
> package_list_new <- as.character(package_df_new$Package)
  • Compare that list to the old list; the packages missing from the new list are the ones from Bioconductor
> package_bioc <- setdiff(package_list, package_list_new)
  • Finally, install those packages from Bioconductor
> source("")
trying URL ''
Content type 'application/x-gzip' length 54312 bytes (53 KB)
downloaded 53 KB

The downloaded binary packages are in
Bioconductor version 3.3 (BiocInstaller 1.22.3), ?biocLite for help
> biocLite(package_bioc)
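
The comparison step can be sketched on its own with base R's setdiff(). The package names below are made up for illustration; in practice the two vectors would come from the old and new library paths. The second half shows why setdiff() is a safer choice than negative indexing with which(): when nothing matches, negative indexing silently drops every element.

```r
## Hypothetical package lists standing in for the old and new libraries
old_pkgs <- c("ggplot2", "affy", "plyr", "DESeq2")
new_pkgs <- c("ggplot2", "plyr")

## Packages present in the old library but absent from the new one
missing_pkgs <- setdiff(old_pkgs, new_pkgs)
print(missing_pkgs)

## Pitfall: with no overlap at all, which() returns integer(0) and
## negative indexing then selects nothing instead of everything
no_overlap <- character(0)
by_index   <- old_pkgs[-which(old_pkgs %in% no_overlap)]
by_setdiff <- setdiff(old_pkgs, no_overlap)
print(length(by_index))
print(length(by_setdiff))
```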

Steps to create R packages in Emacs Org-mode

Create a package skeleton

I am following the steps in Hadley Wickham's excellent tutorial on how to create R packages, but I have decided to see whether I can use Emacs Org-mode to do it. First, let's load the devtools package.


Now we create a skeleton for the package using devtools::create. Since we are using Emacs and not RStudio, I set the rstudio option to FALSE.

Package: bootWGCNA
Title: What the Package Does (one line, title case)
Authors@R: person("First", "Last", email = "", role = c("aut", "cre"))
Description: What the package does (one paragraph).
Depends: R (>= 3.2.2)
License: What license is it under?
LazyData: true

Now we edit the basic metadata in the DESCRIPTION file and add the title, etc.

We can add Imports and Suggests to the DESCRIPTION using devtools as follows:

devtools::use_package("WGCNA", pkg="~/Documents/Research/R-packages/bootWGCNA")
Adding WGCNA to Imports
Refer to functions with WGCNA::fun()

It might print the message above, depending on whether you run the function in the R console or not. The best way to confirm the change is to check the DESCRIPTION file in another Emacs buffer.
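
The DESCRIPTION file can also be checked from within R using base R's read.dcf(). This is only a sketch: it writes a small stand-in DESCRIPTION to a temporary folder rather than touching the real package directory, and the field values are copied from the skeleton above with an Imports line added by hand.

```r
## Write a small DESCRIPTION to a temporary folder (stand-in for the real package folder)
pkg_dir <- file.path(tempdir(), "bootWGCNA")
dir.create(pkg_dir, showWarnings = FALSE)
writeLines(c(
  "Package: bootWGCNA",
  "Title: What the Package Does (one line, title case)",
  "Depends: R (>= 3.2.2)",
  "Imports: WGCNA",
  "LazyData: true"
), file.path(pkg_dir, "DESCRIPTION"))

## read.dcf() returns a character matrix with one column per field
desc <- read.dcf(file.path(pkg_dir, "DESCRIPTION"))

## Split the comma-separated Imports field into individual package names
imports <- trimws(strsplit(desc[1, "Imports"], ",")[[1]])
print(imports)
```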

Finally, we create an automated tests folder as follows:


I think you can now just create your R files in the package and edit and clean them up using Emacs. You could test a function in Org-mode, keeping track of your work, and once done just create an R file in the package directory structure.

Quick Note: ggplot2 namespace error on loading

More than anything else, this is a quick reminder to myself and perhaps to anyone else who encounters this error. When running an R session with the WGCNA package loaded, I get an error with ggplot2. This is because WGCNA loads the Hmisc namespace, which in turn loads the ggplot2 namespace. However, neither ggplot2 nor Hmisc is attached as a package. Therefore, when I try to load ggplot2 I get this error:

Error in unloadNamespace(package) :
namespace ‘ggplot2’ is imported by ‘Hmisc’ so cannot be unloaded
Error in library(ggplot2) :
Package ‘ggplot2’ version 2.0.0 cannot be unloaded

I keep forgetting the load/detach/unload stuff, so seeing that I had to google this a second time, I decided to write it up so that I remember.

First unload the WGCNA package


then unload the WGCNA namespace


finally unload the Hmisc namespace


and now library(ggplot2) will work as before
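
The three steps above can be sketched with base R's detach() and unloadNamespace(). The helper name unload_pkg is made up for this example, and the demonstration uses the splines package that ships with R so that it runs without WGCNA installed. The order matters: a namespace cannot be unloaded while another loaded namespace still imports it, so the importer goes first.

```r
## Detach a package from the search path (if attached) and unload its namespace (if loaded)
unload_pkg <- function(pkg) {
  search_name <- paste0("package:", pkg)
  if (search_name %in% search()) {
    detach(search_name, character.only = TRUE)
  }
  if (isNamespaceLoaded(pkg)) {
    unloadNamespace(pkg)
  }
  invisible(!isNamespaceLoaded(pkg))
}

## Demonstration with 'splines', which ships with R
library(splines)
unload_pkg("splines")
print("package:splines" %in% search())

## For the situation in this post the order would be (not run here):
## unload_pkg("WGCNA")   # the importer first
## unload_pkg("Hmisc")   # then the namespace it imported
## library(ggplot2)
```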


Quick Hacks for R BatchJobs: An awesome package that instantly enhanced my R workflow:

I work on an HPC cluster, and we use the LSF scheduler to run our jobs. I do most of my interactive data analysis on my laptop when I am testing code on small datasets or doing day-to-day post-processing such as plots.
Working with NGS datasets, even my interactive analysis workflow has shifted to the cluster, for two reasons:

  1. Most of the interactive work still involves quite large datasets, so my heavy lifting is usually done on the cluster in an interactive session (I use Emacs Org-mode quite heavily for this).
  2. Eventually my workflows usually turn into multiple parallel jobs that are submitted to the scheduler, and working on the cluster directly ensures that there are no hiccups in terms of dependencies, versions, etc.

As many might agree, this was a little cumbersome: once I had a working piece of code that could be run in parallel many times, I had to transform it into a script that could be submitted from the command line using Rscript or R CMD BATCH as batch jobs. Two issues immediately come to mind with this workflow:

  1. If something changed in the main code, it would take some time to get everything working right again.
  2. Many times I would have part of the analysis ready that could take a few hours, say a bootstrap estimation. Getting it up and running on the scheduler would require either interrupting my current interactive session or starting another session while the process ran in the current one. This meant setting up my environment (data and variables) exactly as in the previous session to pick up where I left off.

I came across the R BatchJobs package a few months ago and was excited, but was unable to play around with it. Recently I started working on some co-expression analysis using the WGCNA package and was also testing out some glasso approaches. With the number of genes involved, some of my code for bootstrapping WGCNA was taking a couple of hours, and one of the glasso runs was taking about 10 hours. I now had the bootstrap code ready to go, but I would have had to take myself away from my interactive analysis to write the scripts to submit the batch jobs.

Enter R BatchJobs to save the day. Here I will present a quick hack to get started right away. I haven't gone through the entire package in detail; rather, I just picked the functions that would get me off the ground on a cluster using an LSF scheduler (I can imagine that it will be quite different for the other supported schedulers).

The first step is to create a cluster script template file; my file is posted below.

## Default resources can be set in your .BatchJobs.R by defining the variable
## 'default.resources' as a named list.

## remove everything in [] if your cluster does not support arrayjobs
#BSUB -J <%= %>[1-<%= arrayjobs %>] # name of the job / array jobs
#BSUB -o <%= log.file %> # output is sent to logfile, stdout + stderr by default
#BSUB -q <%= resources$queue %> # Job queue
##BSUB -W <%= resources$walltime %> # Walltime in minutes
##BSUB -M <%= resources$memory %> # Memory requirements in Kbytes

# we merge R output with stdout from LSF, which gets then logged via -o option
module load R/3.2.2
Rscript --no-save --no-restore --verbose <%= rscript %> /dev/stdout


Then I saved it in a specific location with the name lsfTemplate.tmpl, and I was off and running with just three functions, as below.

The first function reads in the LSF template and sets up the configuration for the scheduler in the environment:
cluster.functions <- makeClusterFunctionsLSF("/data/talkowski/ar474/lsfTemplate.tmpl")

Next, create a registry, for which you need three pieces of information:

  1. id: I think of this as a project ID rather than a job ID, and I will explain this soon
  2. file.dir: this is where everything gets stored, so make sure you have plenty of space here
  3. src.dirs: any .R files in this folder will be "sourced" in the bsub job

reg <- makeRegistry(id="test_boot_reg", seed=123,

The next function is batchMap, which I think of as the equivalent of the .lsf scripts we write to submit jobs. You need to pass the registry, the function that you want to run, a vector to split over, and any additional arguments that you want to pass to the function:

batchMap(reg,bootRun,seq(from=10,by=120, length.out=10),more.args=list(indat=thresh.wide.dat,nBoot=nBoot, nSamp=nSamp, nCol=nCol))

The main trick here is that you can wrap almost anything within the function, but you need to specify a vector or list after the function, which the documentation describes as

… [any]
Arguments to vectorize over (list or vector).

Suppose you wanted to run a large job that uses a for loop to bootstrap a dataset 10000 times, calculating the correlation matrix each time and then returning the average of the correlation matrices. Say you just wanted to submit this as a single job to the cluster with the batchMap function. I found that if I used the dataset as the main argument it got vectorized over, so I made the first argument to my function a dummy variable and passed it a single value:

exampleBoot <- function(a, indat, nBoot) { ## bootstrap using a for loop here and return the result
}

and then call batchMap like below
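
A minimal sketch of the dummy-argument idea follows. The body of exampleBoot, the toy data and the seed value are all assumptions made for illustration, and the function is run locally with base R's mapply(), whose MoreArgs argument plays a role similar to more.args. The commented batchMap() line is a guess at what the single-job call would look like, assuming a registry reg already exists.

```r
## Hypothetical bootstrap function: 'a' is the dummy first argument, used here as a seed
exampleBoot <- function(a, indat, nBoot) {
  set.seed(a)
  acc <- matrix(0, ncol(indat), ncol(indat))
  for (b in seq_len(nBoot)) {
    rows <- sample(nrow(indat), replace = TRUE)   # resample rows with replacement
    acc  <- acc + cor(indat[rows, , drop = FALSE])
  }
  acc / nBoot                                     # average correlation matrix
}

## Toy data standing in for the real expression matrix
set.seed(1)
toy <- matrix(rnorm(50 * 4), nrow = 50, ncol = 4)

## Local analogue: one element in the vectorised argument gives exactly one call
res <- mapply(exampleBoot, 1, MoreArgs = list(indat = toy, nBoot = 20),
              SIMPLIFY = FALSE)
print(length(res))
print(dim(res[[1]]))

## Assumed cluster equivalent (needs BatchJobs and a registry 'reg'; not run here):
## batchMap(reg, exampleBoot, 1, more.args = list(indat = toy, nBoot = 20))
```

Because the dataset travels in MoreArgs (or more.args), it is passed whole to every call instead of being split element by element.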


Now the above generates the code to submit one job to the cluster, and we can see the job IDs using


Finally, you submit the jobs using the command below; any additional bsub options can be passed using the resources argument.

submitJobs(reg,resources=list(queue="medium"), progressbar=FALSE,max.retries=0)

One thing to note is that I had to change the max.retries value from the default, as I was getting an error documented in my other post, so you should check whether the default works for you.

The fun part comes when you want to submit, say, 10 jobs of 100 bootstraps each, which is exactly what I did in the first batchMap example. Here I provide a sequence for the vector, and the function automatically creates a job for each element of the vector. So I decided to use the dummy variable to pass a sequence of seeds. I also specify the number of bootstraps I want in my function, which I pass as an additional argument:

batchMap(reg,bootRun,seq(from=10,by=120, length.out=10),more.args=list(indat=thresh.wide.dat,nBoot=nBoot, nSamp=nSamp, nCol=nCol))

The tremendous advantage of this whole process is that I am still in my interactive session. I tested out some code to run WGCNA and wanted to run a full-fledged bootstrap. I just wrapped my code in a function in the session and submitted it as jobs to the LSF scheduler, and now I can continue working with the same dataset to test other types of analysis.

Other useful functions are


res1 <- loadResult(reg,1)

To Be UPDATED .. multiruns

I wanted to run bootstrap estimates at multiple sample sizes, so I created a sampleSizes vector to loop over. You can also point to a directory where you store your generic R scripts, and these will get sourced for each job that you run on the cluster; otherwise, if you have already sourced those files in your current environment, they will be accessible to the jobs. So I am setting up to run 1000 bootstraps for each of the sample sizes below, and I create a list of registries as well, which seems to work well. I will add more explanation if anyone needs it.

sampleSizes <- c(8,10,12,14,16,18,20,25,30,50)
outDirPrefix <- "/data/talkowski/Samples/16pMouse_TissueAtlas/DataAnalysis/wgcnaAnalysis/boot_wgcna/BatchJobs"
funcDirs <- "/data/talkowski/Samples/16pMouse_TissueAtlas/DataAnalysis/wgcnaAnalysis/Rfunctions"
## one row of seeds per sample size, ten seeds per row
seedMat <- matrix(seq(from=10,by=110, length.out=10*length(sampleSizes)),ncol=10, byrow=TRUE)
if(nCol=="all") nCol <- ncol(expr)
nBoot <- 100
##regId <- "boot_wgcna"
reg <- list()
for (i in 1:length(sampleSizes)) {
  sSize <- sampleSizes[i]
  regId <- paste("boot_wgcna",sSize, sep="_")
  outDir <- paste(outDirPrefix,sSize, sep="_")
  reg[[i]] <- makeRegistry(id=regId, seed=123,file.dir=outDir,src.dirs=funcDirs)
  ## the start of this call is missing from the post; bootRun and seedMat[i,] are assumed from the earlier batchMap example
  batchMap(reg[[i]], bootRun, seedMat[i,],
           more.args=list(indat=expr,nBoot=nBoot, nSamp=sSize, nCol=nCol))
}

for (i in 1:length(reg)) {
  submitJobs(reg[[i]],resources=list(queue="medium"), progressbar=FALSE,max.retries=0)
}


##reg <- loadRegistry(file.dir=outDirPrefix)
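
As a quick local check of the seed layout, the sketch below rebuilds seedMat exactly as defined above and confirms that it has one row per sample size, ten seeds per row, and no repeated seeds. Treating each row as the seeds for one registry is an assumption about how the matrix is meant to be used.

```r
sampleSizes <- c(8, 10, 12, 14, 16, 18, 20, 25, 30, 50)

## Same construction as in the post: seeds 10, 120, 230, ... filled row by row
seedMat <- matrix(seq(from = 10, by = 110, length.out = 10 * length(sampleSizes)),
                  ncol = 10, byrow = TRUE)

print(dim(seedMat))                        # rows = sample sizes, columns = jobs
print(seedMat[1, ])                        # seeds for the first sample size
print(anyDuplicated(as.vector(seedMat)))   # 0 means every seed is unique
```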

Fixing Kernel panic in mac OSX El Capitan with Reference to hk.uds.netusb.controller

I had installed the Hawking wireless function app on Yosemite, and on updating to El Capitan a few weeks ago I realized that it was no longer compatible. More annoyingly, my Mac kept rebooting with a kernel panic error, and the backtrace pointed to the hk.uds.netusb.controller kernel extension. Googling traced the error back to the Hawking kernel module. The first time round, I ended up booting into recovery mode and re-installing the OS.

Today my Mac gave the same issues; apparently the kernel module got installed again somehow (or it was never removed), and I was geared up to re-install. However, when booting into recovery mode (press Option during boot-up) and exploring a little, I found that I could get a terminal open. I was relieved, as I had not backed up the Mac and was a little worried about a re-install wiping my drive.

So I wondered if I could somehow remove the kernel module manually instead of re-installing like last time. I came across this post on the Apple forums, which pointed out that I could remove the kernel module. One point to remember is that your root is the recovery HD, not your normal boot-up disk. To get to your normal boot-up disk you need to cd into /Volumes/MacHD (my startup disk was called MacHD) and then follow the instructions in the above post. Basically, move the /Library/Extensions/kudshawking.kext folder under /Volumes/MacHD somewhere else.

As an aside, this might also be a good time to back up, at least over the network using rsync or something, if needed. But of course I haven't done it yet.

Then I came across this post on manually installing kernel extensions. However, I could not find the kext caches they referenced. Further googling brought me to this post by someone trying to re-install a custom kernel, and I found they used the kextcache command. The kextcache command has a -system-cache option that updates the system cache, but we need to point it to the right system cache (in my case on MacHD) using the -u option. My final command was something like

kextcache -system-cache -u /Volumes/MacHD

and voilà! After a reboot my Mac was back online again.

Hopefully this post is useful to someone. I think this might work for any kernel modules that break under El Capitan; the one thing that comes to mind is the NetGear Genie modules.

Troubleshooting R BatchJobs Error is.list(control)

I was testing out the R package BatchJobs and ran into this error, which was a little hard to troubleshoot.

The output (if any) follows:

'/source/R/3.2.2/lib64/R/bin/R --slave --no-restore --no-save --no-restore --file=/data/talkowski/Samples/16pMouse_TissueAtlas/DataAnalysis/wgcnaAnalysis/boot_wgcna/BatchJobs/jobs/01/1.R --args /dev/stdout'

WARNING: ignoring environment value of R_HOME
Loading required package: BBmisc
Loading required package: methods
Loading registry: /data/talkowski/Samples/16pMouse_TissueAtlas/DataAnalysis/wgcnaAnalysis/boot_wgcna/BatchJobs/regis:
Loading conf:
2016-03-24 16:09:17: Starting job on node
Auto-mailer settings: start=first+last, done=first+last, error=all.
Setting work dir: /data/talkowski/Samples/16pMouse_TissueAtlas/DataAnalysis/wgcnaAnalysis/Rscripts
Error : is.list(control) is not TRUE
Error in doWithOneRestart(return(expr), restart) : bad error message
Calls: <Anonymous> -> sendMail -> warningf
Setting work back to: /data/talkowski/Samples/16pMouse_TissueAtlas/DataAnalysis/wgcnaAnalysis/Rscripts
Memory usage according to gc:
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 352464 18.9 592000 31.7 460000 24.6
Vcells 475908 3.7 1023718 7.9 786431 6.0
Execution halted

After loading the .RData files in the registry folder and tracing through the run, I finally figured out that the error was mainly due to the max.retries option in the submitJobs function. I changed it from the default of 10 to 0. The error was resolved, and I could start submitting my jobs to LSF from R.

I am really grateful to the people who developed this package for their efforts. Given the fix I found, I would rather not take up their time by posting a bug on GitHub, as this might be a system-specific issue. If the developers have the time and are inclined to comment on this someday, that would be great.

R sink() … closeAllConnections

This is mostly to remind me about the closeAllConnections() function in R, and I hope it is also useful for someone out there. While using sink() in a function to write output to a file, if that function exits then R does not resume writing to the console, even if you call sink() again to close it. I think this is because the connection is still open in the function's environment. The closeAllConnections() tip that I found in this StackOverflow thread resolved it. See also the documentation in base R.
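
One way to keep a sink from outliving the function that opened it is to register the matching sink() call with on.exit(), so it runs even when the function stops with an error. This is a sketch of that alternative rather than the closeAllConnections() approach; the function name write_report and its fail argument are made up for the example.

```r
## Hypothetical function that diverts output to a file and may fail halfway through
write_report <- function(path, fail = FALSE) {
  sink(path)
  on.exit(sink(), add = TRUE)   # always undo the diversion, even on error
  cat("header line\n")
  if (fail) stop("something went wrong")
  cat("body line\n")
  invisible(path)
}

sinks_before <- sink.number()
out <- tempfile(fileext = ".txt")

## The function errors after the first line, but the sink is still removed
try(write_report(out, fail = TRUE), silent = TRUE)

sinks_after <- sink.number()
written <- readLines(out)
print(sinks_after == sinks_before)
print(written)
```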

Quick and Dirty Save for R workspace objects

I recently found that if I used the same function on multiple datasets, I needed to save the objects it created into an RData file so that I could load all the processed data from all datasets at once. One way to do this is to hard-code it into your function so that each variable is prefixed with a unique ID. For example, if I have dataset1 and dataset2 and I want to process them using my function, which creates obj1 and obj2, then I would have to create objects called d1_obj1 and d1_obj2 for the first dataset and d2_obj1 and d2_obj2 for the second. This might be simple enough with a couple of objects; however, I wanted to save 20+ objects from within the function, as I planned some comparisons that would involve those objects. Then I came across the R functions get and assign, which made the job I was dreading (reassigning each object somehow and debugging it all) very easy. Just before the function exits, I create a named list, add every object to it using get with my dataset-specific prefix in the name, and save that list using the save function. Below is the code I use:

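objectNames <- ls(all.names=TRUE)            ## every object in the function's environment
res <- list()
for (o in objectNames)
  res[[paste(proj, o, sep="_")]] <- get(o)   ## proj is the dataset prefix, e.g. "d1"
format(object.size(res), units="Gb")         ## check the size of the list before saving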
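
A self-contained sketch of the same idea is shown below, using mget() to collect the objects in one call and save()/load() to round-trip the list. The function name process_dataset, its toy computations and the outfile argument are assumptions for illustration only.

```r
## Hypothetical processing function: creates a few objects, then saves them with a prefix
process_dataset <- function(x, proj, outfile) {
  obj1 <- mean(x)
  obj2 <- range(x)
  ## collect everything created in the function, excluding its own arguments
  objectNames <- setdiff(ls(all.names = TRUE), c("x", "proj", "outfile"))
  res <- mget(objectNames, envir = environment())
  names(res) <- paste(proj, objectNames, sep = "_")
  save(res, file = outfile)
  invisible(res)
}

out <- tempfile(fileext = ".RData")
process_dataset(c(1, 2, 3, 4), proj = "d1", outfile = out)

## Load the saved list into a separate environment to inspect it
loaded <- new.env()
load(out, envir = loaded)
print(names(loaded$res))
```

Loading into a fresh environment keeps the saved res from overwriting an object of the same name in the current workspace.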

If anyone has anything better, please do not hesitate to let me know, as I feel that this is somewhat hackish.

Fixing RDAVIDWebService on Yosemite

I was recently trying to install the R/Bioconductor package RDAVIDWebService, and I got an error saying that the URL has changed. Of course, I have to run GO enrichments for about 36 gene lists and needed an automated way to do this, which meant I needed to figure out whether I could install the package.

After some wrangling I got the package to run. While a lot of the steps were mentioned by the package maintainer, Cristóbal Fresno, here, I found that I had to resolve some issues myself. So I am putting up the steps I went through in case anyone runs into a similar situation. Briefly, all the steps were described in the post by Fresno, and I will walk through them as I did them myself.

One main thing is to register for an account to use the DAVID web service.

Work around for the new DAVID Web service configuration V 2.0

1) First of all the HTTPS certificate needs Java 8 in order to run.

Previous versions will not run due to prime size. The maximum size that Java accepts is 1024 bits. This is a known issue (see JDK-6521495).

1.1) Check your java version

java -version

If the version is 1.7.XX or earlier then you need to install Java 8.
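
The version rule above can be sketched in R with numeric_version(). The helper name needs_java8 is made up, and "1.7.0_79" is just an example of an older version string; the update suffix after the underscore is dropped before comparing.

```r
## Returns TRUE when a Java version string such as "1.7.0_79" is older than 1.8
needs_java8 <- function(version_string) {
  core <- sub("[_-].*$", "", version_string)   # drop update/build suffix, e.g. "_60" or "-b26"
  numeric_version(core) < numeric_version("1.8")
}

old_check <- needs_java8("1.7.0_79")   # hypothetical Java 7 install
new_check <- needs_java8("1.8.0_60")   # the version used in this post
print(old_check)
print(new_check)
```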

I found that I had to install Java, so I went and got the Java 1.8 JDK from the Oracle website. Specifically, I downloaded the files from this particular section


and then proceeded to install the Apple package.

Installing openssl

2.3) In MAC (tested in Yosemite) the certificate will not work for the present stable openssl version 0.9.8.

2.3.1) Check your openssl version

openssl version

OpenSSL 0.9.8

If it is >= 1.0.2.d then go to step 2.3.3)

2.3.2) Update your openssl, i.e., download, compile and install it

Download the official release from OpenSSL >= 1.0.2.d

tar -xzvf openssl-1.0.2d.tar.gz

cd openssl-1.0.2d

#Compile it with 64 bits support

./Configure darwin64-x86_64-cc
make test
sudo make install

Now you may need to reflect the change in your system if openssl version keeps pointing to the old version.

cd /usr/bin

sudo mv openssl openssl098
sudo ln -s /usr/local/ssl/bin/openssl openssl

Now, I tried to do this using Homebrew, that is, just trying to update using my brew installation. However, I ran into a Ruby version error (see this post). Luckily, I think I had installed brew from GitHub, so one of the solutions there worked for me.


However, when I did a git pull I got an error regarding one of the files, and I ended up deleting it. See below for the set of commands I used. (I have to apologize that I am unable to reconstruct the error messages, as I am writing this post after successfully installing the R packages and I had closed my terminal window.)


Once openssl was installed I realized that it was installed in


So I ended up making a symlink to the openssl over there, as mentioned above:

cd /usr/bin/
ls -ltrh openssl
openssl version
sudo mv openssl openssl098zg
sudo ln -s /usr/local/Cellar/openssl/1.0.2d_1/bin/openssl

Now finally we have openssl working.

Adding the CA cert for DAVID worked as per the instructions below.

2.3.3)  Get DAVID’s certificate and install it into cacerts

Get the certificate:


echo -n | openssl s_client -connect | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > ncifcrf.cert

Check if it was properly downloaded:
openssl x509 -in ncifcrf.cert -text

Backup the cacerts file. In my case the 1.8.0_60 jdk version is located in /Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/security/ directory

sudo find / -name cacerts
/Library/Internet Plug-Ins/JavaAppletPlugin.plugin/Contents/Home/lib/security/cacerts

sudo cp /Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/security/cacerts .

sudo cp /Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/security/cacerts
sudo keytool -import -trustcacerts -keystore cacerts -storepass changeit -noprompt -alias david -file ncifcrf.cert
Certificate was added to keystore

The certificate should be added to the keystore. Now, copy the new cacerts version to the original position

sudo cp cacerts /Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/security/cacerts

The only hiccups in updating the R Java configuration were the following:

  • I wanted to update the  install. The actual executable was under

however, if you try to call this in the terminal it keeps popping up the GUI, and I could not use R CMD as I originally planned. The workaround is to cd into the directory

cd /Applications/

and then call R locally which works

sudo R CMD javareconf

Also, to reinstall the rJava package from source I had to specify the repo, i.e. use

 install.packages("rJava", type="source",repos="")

as there was an issue with the tcl/tk libraries in opening the pop-up for selecting CRAN mirrors.

3) Update Java configuration in R. The output may slightly change from windows, linux or mac.

R CMD javareconf

Java interpreter : /usr/bin/java

Java version     : 1.8.0_60

Java home path   : /usr/lib/jvm/java-8-oracle/jre

Java compiler    : /usr/bin/javac

Java headers gen.: /usr/bin/javah

Java archive tool: /usr/bin/jar

Please check that both Java version and path are appropriate. In addition JNI support should also be available.

4) Check that the rJava R library works as it is supposed to.




.jcall("java/lang/System", "S", "getProperty", "java.runtime.version")

[1] "1.8.0_20-b26"

On Mac, the rJava binary that downloads is tied to Java 1.6. If that is the case, you should install it from source.

install.packages('rJava', type='source')



.jcall("java/lang/System", "S", "getProperty", "java.runtime.version")

[1] "1.8.0_20-b26"

And that's it, now DAVID works.