Sphinx docs make : ImportError: cannot import name normalize

I ran into this simple error that could have consumed my whole day. I had set up Sphinx in a conda environment. However, when I tried to run `make html`, I ended up with the following error:

CIS2X1NFGTF1:bioflows aragaven$ make html
Traceback (most recent call last):
  File "/Users/aragaven/anaconda2/lib/python2.7/site-packages/Sphinx-1.4.6-py2.7.egg/sphinx/__main__.py", line 14, in <module>
    sys.exit(main(sys.argv))
  File "/Users/aragaven/anaconda2/lib/python2.7/site-packages/Sphinx-1.4.6-py2.7.egg/sphinx/__init__.py", line 51, in main
    sys.exit(build_main(argv))
  File "/Users/aragaven/anaconda2/lib/python2.7/site-packages/Sphinx-1.4.6-py2.7.egg/sphinx/__init__.py", line 61, in build_main
    from sphinx import cmdline
  File "/Users/aragaven/anaconda2/lib/python2.7/site-packages/Sphinx-1.4.6-py2.7.egg/sphinx/cmdline.py", line 14, in <module>
    import optparse
  File "/Users/aragaven/anaconda2/lib/python2.7/optparse.py", line 419, in <module>
    _builtin_cvt = { "int" : (_parse_int, _("integer")),
  File "/Users/aragaven/anaconda2/lib/python2.7/gettext.py", line 569, in gettext
    return dgettext(_current_domain, message)
  File "/Users/aragaven/anaconda2/lib/python2.7/gettext.py", line 533, in dgettext
    codeset=_localecodesets.get(domain))
  File "/Users/aragaven/anaconda2/lib/python2.7/gettext.py", line 468, in translation
    mofiles = find(domain, localedir, languages, all=1)
  File "/Users/aragaven/anaconda2/lib/python2.7/gettext.py", line 440, in find
    for nelang in _expand_lang(lang):
  File "/Users/aragaven/anaconda2/lib/python2.7/gettext.py", line 133, in _expand_lang
    from locale import normalize
ImportError: cannot import name normalize
make: *** [html] Error 1

I was puzzled, as this had worked just before. Some googling revealed that this could be due to a conflict between Python versions. That led me to suspect that the Python version in my root environment differed from the one in the environment I had been using to build some of the documentation. This was indeed the case: once I ran `source activate` for that other environment, `make html` worked without any errors.
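A quick way to check for this kind of mismatch is to see whether the interpreter and the Sphinx entry point on your PATH live in the same environment. The sketch below is only an illustration: `bin_prefix` is a hypothetical helper name, and it assumes the Makefile ends up calling whatever `python` and `sphinx-build` are found first on PATH.

```shell
# Hypothetical helper: given the full path of an executable that lives in
# <prefix>/bin, print <prefix> (the environment it belongs to).
bin_prefix() {
    dirname "$(dirname "$1")"
}

# Look up the interpreter and the Sphinx entry point found first on PATH.
# Either may be missing on a given machine, so guard the lookups.
py_path="$(command -v python || true)"
sphinx_path="$(command -v sphinx-build || true)"

if [ -n "$py_path" ] && [ -n "$sphinx_path" ]; then
    if [ "$(bin_prefix "$py_path")" = "$(bin_prefix "$sphinx_path")" ]; then
        echo "python and sphinx-build come from the same environment"
    else
        echo "mismatch: $py_path vs $sphinx_path"
    fi
fi
```

If the two prefixes differ, activating the environment that holds Sphinx before running `make html` is the first thing to try.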


Error installing R/igraph unable to load shared object ‘../igraph.so’: libgfortran.so.4: cannot open shared object file: No such file or directory


I ran into this tiny error that could have consumed my whole day. I had set up an AWS Ubuntu 16.04 (Xenial) image and installed R. I think I followed some random web page and ended up installing the latest version of R, v3.4.2.

I was trying to install the package `phangorn`, which has igraph as a dependency, and lo and behold, I could not install it; it kept failing with this error:

igraph-error.png

Google turned up a few links that seemed helpful, including a suggestion to install libxml2-dev.

The link below helped me first troubleshoot the foreign-graphml error:

igraph_hacks_internal.h:42:0: warning: "strdup" redefined
 ^
In file included from /usr/include/string.h:630:0,
                 from src/foreign-gml-parser.y:54:
/usr/include/x86_64-linux-gnu/bits/string2.h:1291:0: note: this is the location of the previous definition
 ^
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG -DUSING_R -I. -Iinclude -Ics -Iglpk -Iplfit -ICHOLMOD/Include -IAMD/Include -ICOLAMD/Include -ISuiteSparse_config -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -DNDEBUG -DNPARTITION -DNTIMER -DNCAMD -DNPRINT -DPACKAGE_VERSION=\"1.1.1\" -DINTERNAL_ARPACK -DIGRAPH_THREAD_LOCAL=/**/ -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c foreign-graphml.c -o foreign-graphml.o
foreign-graphml.c: In function ‘igraph_write_graph_graphml’:
foreign-graphml.c:1408:46: error: expected ‘)’ before ‘GRAPHML_NAMESPACE_URI’
 ret=fprintf(outstream, "\n");
 ^
/usr/lib/R/etc/Makeconf:159: recipe for target ‘foreign-graphml.o’ failed
make: *** [foreign-graphml.o] Error 1
ERROR: compilation failed for package ‘igraph’
removing ‘/home/ubuntu/R/x86_64-pc-linux-gnu-library/3.4/igraph’
ERROR: dependency ‘igraph’ is not available for package ‘phangorn’
removing ‘/home/ubuntu/R/x86_64-pc-linux-gnu-library/3.4/phangorn’

The downloaded source packages are in
        ‘/tmp/Rtmp7mCd4h/downloaded_packages’
Warning messages:
1: In install.packages("phangorn", repos = "http://cran.mtu.edu") :
  installation of package ‘igraph’ had non-zero exit status
2: In install.packages("phangorn", repos = "http://cran.mtu.edu") :
  installation of package ‘phangorn’ had non-zero exit status

However, none of them seemed to address the second round of errors, where even after compiling, igraph.so failed to load.

It turned out to be a simple thing to fix. The key is to recognize that the second line of the error message is the culprit, even though that line is not itself flagged as an error:

libgfortran.so.4: cannot open shared object file: No such file or directory

There was no libgfortran.so.4 installed on my machine, and it is not available by default on Xenial.

ubuntu@ip-172-31-93-178:/usr/local/lib$ find /usr -name "libgfortran*"
/usr/share/doc/libgfortran3
/usr/share/doc/libgfortran-5-dev
/usr/lib/gcc/x86_64-linux-gnu/5/libgfortran.a
/usr/lib/gcc/x86_64-linux-gnu/5/libgfortranbegin.a
/usr/lib/gcc/x86_64-linux-gnu/5/libgfortran.so
/usr/lib/gcc/x86_64-linux-gnu/5/libgfortran.spec
/usr/lib/x86_64-linux-gnu/libgfortran.so.3
/usr/lib/x86_64-linux-gnu/libgfortran.so.3.0.0
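Another way to spot this kind of missing dependency is to ask the dynamic linker which libraries a compiled file cannot resolve. The sketch below is only an illustration: `missing_libs` is a hypothetical helper name, and the path in the usage comment is a placeholder for wherever the failing .so ends up on your machine.

```shell
# Hypothetical helper: read `ldd` output on stdin and print the names of
# the shared libraries that the dynamic linker could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage (placeholder path; point it at the .so that failed to load):
#   ldd /path/to/igraph.so | missing_libs
```

An empty result means every dependency was found; any name printed is a library you still need to install.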

However, the R version I had installed was somehow compiled against this version, and further investigation revealed that it is part of the gcc-7 toolchain. So, for my purposes, I installed gcc-7 and gfortran-7 from a PPA on Ubuntu, based on this SO post and this post. I added ppa:jonathonf/gcc-7.1, as specified in one of the comments, and then installed as follows:

sudo apt-get install gcc-7 g++-7 gfortran-7

The gfortran-7 package is key, as that is what installs the gfortran command, and voila! Now I can install igraph.

Hope this is helpful to someone out there.

UPGRADING R on Mac OSX: Quick and Dirty

A quick note on a painless upgrade of R in the OS X environment; I think this should work on most systems. I was originally running 3.2.2 on my MacBook, and when I tried to install a package I ran into dependency issues and bugs that had been fixed in a later version. So, of course, it was back to installing the latest version of R. One issue that always crops up is that since the last install I had downloaded a bunch of packages from both CRAN and Bioconductor, and I wanted a quick way of updating and re-installing them. Previously, I had just worked from a clean install, as I did not have any major analysis ongoing. This time I did, and I did not want to deal with installing packages on demand. Googling turned up a few suggestions, and I used them to construct a quick way to upgrade.

First, before re-installing get the .libPaths() value for the current install. Now reinstall R. After reinstalling, we get a list of packages installed in the previous version using the commands below

## shows the path for the new version
>.libPaths()
[1] "/Library/Frameworks/R.framework/Versions/3.3/Resources/library"

So we use the paths from the previous version of R

> package_df <- as.data.frame(installed.packages("/Library/Frameworks/R.framework/Versions/3.2/Resources/library"))
> package_list <- as.character(package_df$Package)
> install.packages(package_list)

This works to install all packages from CRAN; however, I got this warning message afterwards:

Warning message:
packages ‘affy’, ‘affyio’, ‘airway’, ‘ALL’, ‘annotate’, ‘AnnotationDbi’, ‘AnnotationForge’, ‘aroma.light’, ‘Biobase’, ‘BiocGenerics’, ‘biocGraph’, ‘BiocInstaller’, ‘BiocParallel’, ‘BiocStyle’, ‘biomaRt’, ‘BioNet’, ‘Biostrings’, ‘biovizBase’, ‘BSgenome’, ‘BSgenome.Hsapiens.UCSC.hg19’, ‘Category’, ‘cellHTS2’, ‘chipseq’, ‘clipper’, ‘clusterProfiler’, ‘ComplexHeatmap’, ‘cqn’, ‘DEGraph’, ‘DESeq’, ‘DESeq2’, ‘DO.db’, ‘DOSE’, ‘DynDoc’, ‘EDASeq’, ‘edgeR’, ‘EnrichmentBrowser’, ‘FGNet’, ‘fibroEset’, ‘gage’, ‘gageData’, ‘genefilter’, ‘geneplotter’, ‘GenomeInfoDb’, ‘GenomicAlignments’, ‘GenomicFeatures’, ‘GenomicRanges’, ‘ggbio’, ‘globaltest’, ‘GO.db’, ‘GOSemSim’, ‘GOSim’, ‘GOstats’, ‘GOsummaries’, ‘graph’, ‘graphite’, ‘GSEABase’, ‘hgu133a.db’, ‘hgu133plus2.db’, ‘hgu95 [... truncated]

As I suspected, the packages from BioConductor were not installed. So I decided to use the same approach and came up with the following:

  • Get a list of the packages installed in the current version from CRAN
> package_df_new <- as.data.frame(installed.packages("/Library/Frameworks/R.framework/Versions/3.3/Resources/library"))
> package_list_new <- as.character(package_df_new$Package)
  • Compare that list to the old list and the packages not in the new list are from BioConductor
> ## !(x %in% y) stays correct even when no package names match
> package_bioc <- package_list[!(package_list %in% package_list_new)]
  • Finally, install those packages from Bioconductor
> source("https://bioconductor.org/biocLite.R")
trying URL 'https://bioconductor.org/packages/3.3/bioc/bin/macosx/mavericks/contrib/3.3/BiocInstaller_1.22.3.tgz'
Content type 'application/x-gzip' length 54312 bytes (53 KB)
==================================================
downloaded 53 KB


The downloaded binary packages are in
        /var/folders/vd/jrcwgw214pv9svrflbdtjl3m0000gn/T//RtmpKKXIWa/downloaded_packages
Bioconductor version 3.3 (BiocInstaller 1.22.3), ?biocLite for help
> biocLite(package_bioc)

Fixing rJava error on Mac OSX El Capitan

I was trying to reinstall RDAVIDWebService after upgrading to R-3.3.1 on El Capitan, and I followed the steps from my previous blog post. You no longer have to fix the SSL issue, but you still have to reconfigure Java. After the step for reinstalling rJava, I still ran into this error:

Error : .onLoad failed in loadNamespace() for 'rJava', details:
  call: dyn.load(file, DLLpath = DLLpath, ...)
  error: unable to load shared object '/Library/Frameworks/R.framework/Versions/3.3/Resources/library/rJava/libs/rJava.so':
  dlopen(/Library/Frameworks/R.framework/Versions/3.3/Resources/library/rJava/libs/rJava.so, 6): Library not loaded: @rpath/libjvm.dylib
  Referenced from: /Library/Frameworks/R.framework/Versions/3.3/Resources/library/rJava/libs/rJava.so
  Reason: image not found
Error: package or namespace load failed for ‘RDAVIDWebService’

Ugh, some issues with the rJava library. Google turned up this result from StackOverflow, and I just ran the following command in the terminal

sudo ln -f -s $(/usr/libexec/java_home)/jre/lib/server/libjvm.dylib /usr/local/lib

and it worked right away.
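A slightly more defensive variant checks that the library actually exists before creating the link, since the location of libjvm.dylib inside the JDK can differ between Java versions. This is only a sketch: `link_into` is a hypothetical helper name, and the `jre/lib/server` layout in the usage comment is simply copied from the command above.

```shell
# Hypothetical helper: symlink a file into a directory, but only if the
# source file actually exists. Returns non-zero otherwise.
link_into() {
    src="$1"
    dest_dir="$2"
    if [ ! -e "$src" ]; then
        echo "not found: $src" >&2
        return 1
    fi
    ln -f -s "$src" "$dest_dir"
}

# Usage on macOS (run from a root shell if /usr/local/lib is not writable):
#   link_into "$(/usr/libexec/java_home)/jre/lib/server/libjvm.dylib" /usr/local/lib
```

If the helper reports that the file is not found, search the JDK directory for libjvm.dylib and pass the path you find instead.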

Steps to create R packages in Emacs Org-mode

Create a package skeleton

I am following the steps in Hadley Wickham’s excellent tutorial on how to create R packages, but I have decided to see whether I can use Emacs Org-mode to do it. First, let’s load the devtools package

library(devtools)
devtools
stats
graphics
grDevices
utils
datasets
methods
base

Now we create a skeleton for the package using devtools::create. Since we are using Emacs and not RStudio, I set the rstudio option to FALSE

devtools::create("~/Documents/Research/R-packages/bootWGCNA", rstudio=FALSE)
Package: bootWGCNA
Title: What the Package Does (one line, title case)
Version: 0.0.0.9000
Authors@R: person("First", "Last", email = "first.last@example.com", role = c("aut", "cre"))
Description: What the package does (one paragraph).
Depends: R (>= 3.2.2)
License: What license is it under?
LazyData: true

Now we edit the basic metadata in the DESCRIPTION file and add the title, etc.

We can add Imports and Suggests to the DESCRIPTION using devtools as follows

devtools::use_package("WGCNA", pkg="~/Documents/Research/R-packages/bootWGCNA")
Adding WGCNA to Imports
Refer to functions with WGCNA::fun()

It might print the message above, depending on whether you run the function in the R console or not. The best way to confirm the change is to check the DESCRIPTION file in another Emacs buffer

Finally, we create an automated tests folder as follows

devtools::use_testthat(pkg="~/Documents/Research/R-packages/bootWGCNA")

I think you can now just create your R files in the package and edit and clean them up using Emacs. You could test a function in Org-mode, keeping track of your work, and once done, just create an R file in the package directory structure

Quick Note: ggplot2 namespace error on loading

More than anything else, this is a quick reminder to myself and perhaps anyone else who encounters this error. When running an R session with the WGCNA package loaded, I get an error with ggplot2. This is because WGCNA loads the Hmisc namespace, which in turn loads the ggplot2 namespace. However, neither the ggplot2 nor the Hmisc package is attached. Therefore, when I try to load ggplot2 I get this error

library(ggplot2)
Error in unloadNamespace(package) :
namespace ‘ggplot2’ is imported by ‘Hmisc’ so cannot be unloaded
Error in library(ggplot2) :
Package ‘ggplot2’ version 2.0.0 cannot be unloaded

I keep forgetting the load/detach/unload stuff, so seeing that I had to google this a second time, I decided to write it up so that I remember

First unload the WGCNA package

detach(package:WGCNA)

then unload the WGCNA namespace

unloadNamespace("WGCNA")

finally unload the Hmisc namespace

unloadNamespace("Hmisc")

and now library(ggplot2) will work as before

 

R sink() … closeAllConnections

This is more to remind me about the closeAllConnections() function in R, and I hope it’s also useful for someone out there. When sink() is used inside a function to write output to a file and that function exits, R does not resume writing to the console, even if you call sink() again to close the diversion. I think this is because the connection is still open in the function’s environment. The closeAllConnections() tip that I found in this StackOverflow thread resolved it. See also the documentation in base R.

Quick and Dirty Save for R workspace objects

I recently found that if I used the same function on multiple datasets, I needed to save the objects it created into an RData file, so that I could later load the processed data from all datasets at the same time. One way to do this is to hard code it into your function so that each variable is prefixed with a unique id. For example, if I have dataset1 and dataset2 and I want to process them using my function, which creates obj1 and obj2, then I would have to create objects called d1_obj1 and d1_obj2 for the first dataset and d2_obj1 and d2_obj2 for the second dataset. This might be simple enough with a couple of objects; however, I wanted to save 20+ objects from within the function, as I planned some comparisons that would involve those objects. Then I came across the R functions get and assign, which made the job I was dreading (reassigning each object somehow and debugging it all) very easy. Just before the function exits, I create a named list and add every object to it using get, naming each entry with my dataset-specific prefix, and then save that list using the save function. Below is the code I use

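## collect every object in the current (function) environment into a named list
objectNames <- ls(all.names=TRUE)
res <- list()
for (o in objectNames)
{
  ## proj holds the dataset-specific prefix, e.g. "d1"
  res[[paste(proj, o, sep="_")]] <- get(o)
}
## report the size of the collected list
format(object.size(res), units="Gb")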

If anyone has anything better, please do not hesitate to let me know, as I feel that this is somewhat hackish.

Fixing RDAVIDWebService on Yosemite

I was recently trying to install the R/Bioconductor package RDAVIDWebService and I got the error that the URL has changed. Of course, I had to run GO enrichments for about 36 gene lists and needed an automated way to do this, which meant I needed to figure out whether I could install the package.

After some wrangling I got the package to run, and while a lot of the steps were mentioned by the package maintainer Cristóbal Fresno here, I found that I had to resolve some issues myself. So I am putting up the steps I went through in case anyone runs into a similar situation. Briefly, all the steps were described in the post by Fresno, and I will walk through those steps as I did them myself.

One main thing is to register for an account to use the DAVID web service.

Work around for the new DAVID Web service configuration V 2.0

1) First of all the HTTPS certificate needs Java 8 in order to run.

Previous versions will not run due to prime size. The maximum size that Java accepts is 1024 bits. This is a known issue (see JDK-6521495).

1.1) Check your java version

java -version

If the version is 1.7.XX or earlier then you need to install Java 8.

I found that I had to install Java, so I went and got the Java 1.8 JDK from the Oracle website. Specifically, I downloaded the files from this particular section

javaLocation

and then proceeded to install the apple package.

Installing openssl

2.3) In MAC (tested in Yosemite) the certificate will not work for the present stable openssl version 0.9.8.

2.3.1) Check your openssl version

openssl version

OpenSSL 0.9.8

If it is >= 1.0.2d then go to step 2.3.3)
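Comparing OpenSSL versions by eye is easy to get wrong because of the trailing letter (0.9.8zg, 1.0.2d and so on). The sketch below shows one way to script the check; `version_ge` is a hypothetical helper name, and it assumes versions made of dot-separated numbers with an optional letter suffix on a component.

```shell
# Hypothetical helper: succeed if version $1 is at least version $2.
# Each dot-separated component is compared first by its leading number,
# then by any trailing letters (so 1.0.2k counts as newer than 1.0.2d).
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xn = x[i] + 0; yn = y[i] + 0          # numeric part
            xs = x[i]; sub(/^[0-9]+/, "", xs)     # letter suffix
            ys = y[i]; sub(/^[0-9]+/, "", ys)
            if (xn != yn) exit !(xn > yn)
            if (xs != ys) exit !(xs > ys)
        }
        exit 0
    }'
}

# Usage: the second field of `openssl version` is the version string.
#   current="$(openssl version | awk '{ print $2 }')"
#   if version_ge "$current" 1.0.2d; then echo "new enough"; fi
```

The helper sticks to awk rather than `sort -V` so that it does not depend on GNU coreutils being installed.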

2.3.2) Update your openssl, i.e., download, compile and install it

Download an official OpenSSL release >= 1.0.2d

tar -xzvf openssl-1.0.2d.tar.gz

cd openssl-1.0.2d

#Compile it with 64 bits support

./Configure darwin64-x86_64-cc
make
make test
sudo make install

Now you may need to reflect the change in your system if openssl version keeps pointing to the old version.

cd /usr/bin

sudo mv openssl openssl098
sudo ln -s /usr/local/ssl/bin/openssl openssl

I then tried to do this using Homebrew, that is, just updating openssl through my brew installation. However, I ran into a Ruby version error (see this post). Luckily, I think I had installed brew from GitHub, so one of the solutions there worked for me

brewfix

However, when I did a git pull I got an error regarding one of the files, and I ended up deleting it. See below for the set of commands I used [I have to apologize that I am unable to reconstruct the error messages, as I am writing this post after successfully installing the R packages and I had closed my terminal window]

brewIssues

Once openssl was installed I realized that it was installed in

 /usr/local/Cellar

So I ended up making a symlink to the openssl over there, as mentioned above

cd /usr/bin/
ls -ltrh openssl
openssl version
sudo mv openssl openssl098zg
sudo ln -s /usr/local/Cellar/openssl/1.0.2d_1/bin/openssl

Now finally we have openssl working.

Adding the CA cert for DAVID worked as described in the instructions below

2.3.3)  Get DAVID’s certificate and install it into cacerts

Get the certificate:

cd

echo -n | openssl s_client -connect david.ncifcrf.gov:443 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > ncifcrf.cert

Check if it was properly downloaded:
openssl x509 -in ncifcrf.cert -text
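The sed step in the download command is what trims the chatty `s_client` output down to just the certificate. As a small illustration, the same filter can be wrapped in a function and tried on any text; `extract_pem` is a hypothetical helper name, not part of openssl.

```shell
# Hypothetical helper: keep only the PEM certificate block(s) from whatever
# arrives on stdin, using the same sed range as the download command.
extract_pem() {
    sed -n '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p'
}

# Usage (needs network access; host name taken from the step above):
#   echo -n | openssl s_client -connect david.ncifcrf.gov:443 | extract_pem > ncifcrf.cert
```

Everything before the BEGIN line and after the END line, such as the handshake details, is dropped.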

Backup the cacerts file. In my case the 1.8.0_60 jdk version is located in /Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/security/ directory

sudo find / -name cacerts
/Library/Internet Plug-Ins/JavaAppletPlugin.plugin/Contents/Home/lib/security/cacerts
/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/lib/security/cacerts
/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/security/cacerts
/System/Library/Java/Support/CoreDeploy.bundle/Contents/Home/lib/security/cacerts
/System/Library/Java/Support/Deploy.bundle/Contents/Home/lib/security/cacerts

sudo cp /Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/security/cacerts .

sudo cp /Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/security/cacerts cacerts.org
sudo keytool -import -trustcacerts -keystore cacerts -storepass changeit -noprompt -alias david -file ncifcrf.cert
Certificate was added to keystore

The certificate should be added to the keystore. Now, copy the new cacerts version to the original position

sudo cp cacerts /Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/security/cacerts

The only hiccups in updating the R Java configuration were the following

  • I wanted to update the R.app install. The actual executable was under
    /Applications/R.app/Contents/MacOS/R

However, calling this from the terminal kept popping up the GUI, so I could not use R CMD as I originally planned. The workaround is to cd into the directory

cd /Applications/R.app/Contents/MacOS/

and then call R locally which works

sudo R CMD javareconf

Also, to reinstall the rJava package from source I had to specify the repo, i.e. use

 install.packages ("rJava", type="source",repos="http://cran.case.edu")

as there was an issue with the tcl/tk libraries when opening the pop-up for selecting CRAN mirrors

3) Update Java configuration in R. The output may slightly change from windows, linux or mac.

R CMD javareconf

Java interpreter : /usr/bin/java

Java version     : 1.8.0_60

Java home path   : /usr/lib/jvm/java-8-oracle/jre

Java compiler    : /usr/bin/javac

Java headers gen.: /usr/bin/javah

Java archive tool: /usr/bin/jar

Please check that both Java version and path are appropriate. In addition JNI support should also be available.

4) Check that rJava R library works as supposed to.

R

library(rJava)

.jinit()

.jcall("java/lang/System", "S", "getProperty", "java.runtime.version")

[1] "1.8.0_20-b26"

In Mac the rJava that downloads is tied to 1.6 java version. If it is the case, you should install it from the source.

install.packages('rJava', type='source')

library(rJava)

.jinit()

.jcall("java/lang/System", "S", "getProperty", "java.runtime.version")

[1] "1.8.0_20-b26"

And that’s it; now DAVID works.

Installing Kallisto (RNA-Seq quantification program) on our Red Hat Cluster

Today I decided to install the RNA-Seq quantification tool Kallisto for testing. It appears to be a good tool (detailed here), and I have high confidence in the underlying theoretical rigor, the lack of which is usually the bane of any new tool.

Straight away, I ran into some installation troubles. After troubleshooting for a couple of hours I finally got the tool installed from source. First, I ran into this error using gcc

Untitled0

After googling around a little bit, I found that the version of gcc we had did not support the C++11 standard, and I had to make sure we had the latest gcc modules loaded. But lo and behold, cmake still failed to recognize the new gcc modules

Untitled1

Some more googling revealed that perhaps I needed to change the CC and CXX environment variables, and I did that as well, to no avail

Untitled2

More googling revealed that there was a catch to all this: apparently, to reset the cmake variables for the C compiler, the entire build directory needs to be deleted so that we start clean. So I went ahead and deleted the kallisto source folder, untarred the download again, re-ran cmake, and voila! It finally compiled

Untitled3

A word of caution here: you need to have the $CC and $CXX variables set as well.
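If you build in a separate build directory, you may not need to delete and untar the whole source tree; wiping just the build directory also discards the cached compiler choice. The sketch below assumes such an out-of-source build. `fresh_build_dir` is a hypothetical helper name, and the compiler names and paths in the usage comment are placeholders for whatever your modules provide.

```shell
# Hypothetical helper: throw away a CMake build directory (and with it the
# cached compiler choice in CMakeCache.txt) and recreate it empty.
fresh_build_dir() {
    [ -n "$1" ] || return 1          # refuse an empty argument
    rm -rf -- "$1" && mkdir -p -- "$1"
}

# Usage (compiler names and the build path are assumptions):
#   export CC=gcc CXX=g++            # must be the C++11-capable compilers
#   fresh_build_dir kallisto/build
#   cd kallisto/build && cmake .. && make
```

Setting CC and CXX before the first cmake run in the fresh directory is what lets cmake pick up the newer compilers.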

Finally, if you have your HDF5 libraries installed in a non-standard location, or the ones in your standard location are old and you want to use newer libraries for whatever reason, you need to edit the CMakeLists.txt file as below

Untitled5

I have added the two lines below; with them, cmake will find the relevant libraries for you, as long as the find call comes after them


set(CMAKE_LIBRARY_PATH ${CMAKE_LIBRARY_PATH} /apps/lab/miket/hdf/1.8.14)
set(CMAKE_INCLUDE_PATH /apps/lab/miket/hdf/1.8.14/include ${CMAKE_INCLUDE_PATH})

 how to edit your CMakeLists.txt if you want to include libraries from non-standard location