Creating and sharing an R package which calls C code

In this post I will demonstrate the minimum number of steps needed to write and share an R package which calls C code. It grew out of my own experiences in writing and sharing the LXB package with colleagues at work.

I have created a repository at GitHub called reverse which contains a complete example of a minimal R package (as well as the latest version of this document). In order to follow this post you may want to clone that repository.

The Writing R Extensions manual contains a reference for everything that is mentioned here (and more).

Packages with C code (more painful than you’d like unless your intended audience is programmers)

My assumption is that you want to build an R package and share it with your colleagues. As it turns out, this is not so simple. When you create a package it will by default just be an archive containing the source code that you write. This means that a C compiler is needed in order to install the package. If your colleagues all have a C compiler installed (maybe they are programmers or all use Linux) then this is no problem. If not, get ready for a world of pain.

It is possible to create binary packages that do not require a C compiler for installation; the catch is that they will only install on a machine with the same operating system version and R version as was used to build the package. You can get around this problem by uploading your package to CRAN, since it creates binary versions of your package for you. However, submitting to CRAN involves quite a bit of work. You have to make sure your package compiles on Linux, Mac OS X and Windows. The submission process is not automated, so expect a delay of a couple of days between submission and a binary version of your package being available.

Before going any further you should ask yourself if the pain is worth it. If you can get away with writing all your intended functionality in R without it being too slow or using too much memory, then I’d suggest you stick with a pure R solution.

How to call C from R

Let me outline how R calls out to C without getting into the details of the R Foreign Function Interface since this would take too much space.

R calls a C compiler to build a shared library of your C code. This library can then be dynamically loaded into R and you can call functions that you have exported in your C code. Here are the steps involved:

  • To generate a shared library call

    R CMD SHLIB reverse.c

    in the src/ directory. This will create the shared library reverse.so. The file name extension depends on which operating system you are using (I am using Mac OS X).

  • To load the library inside R you call dyn.load('reverse.so').
  • Finally, to call the function exported from your C code, call .Call('reverse', 1:10). Here reverse is the name of the function exported in reverse.c and 1:10 is the (only) parameter this function takes.

Manually generating a shared library is a bit messy (it generates .o and .so files in the directory where your source code is) and using .Call() can be slightly dangerous (e.g. what happens if you pass the wrong type of parameters?). The solution is to generate a package.
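Before moving on, it may help to see roughly what sits on the C side of .Call(). The following is only a sketch of what a file like reverse.c could contain; it is not the code from the repository. The helper reverse_ints and the HAVE_R_HEADERS guard are made-up names for illustration, and the guard (which assumes a compiler that supports __has_include) exists only so that the file also compiles on a machine without R.

```c
#include <stddef.h>

/* Plain C core: write the n ints of src into dst in reverse order.
 * (reverse_ints is a hypothetical helper, not part of R's API.) */
void reverse_ints(const int *src, int *dst, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i)
        dst[i] = src[n - 1 - i];
}

/* The guard below is only there so this file also compiles without R.
 * Under R CMD SHLIB the R headers are on the include path, so the
 * wrapper further down gets built. */
#if defined(__has_include)
#  if __has_include(<Rinternals.h>)
#    define HAVE_R_HEADERS 1
#  endif
#endif

#ifdef HAVE_R_HEADERS
#include <R.h>
#include <Rinternals.h>

/* Entry point reached by .Call('reverse', x). Every argument and the
 * return value are SEXPs, i.e. pointers to R objects. */
SEXP reverse(SEXP x)
{
    int n;
    SEXP out;

    /* .Call() does no type checking, so reject unexpected input here. */
    if (!Rf_isInteger(x))
        Rf_error("reverse: expected an integer vector");

    n = Rf_length(x);
    /* PROTECT keeps the new vector safe from R's garbage collector. */
    out = PROTECT(Rf_allocVector(INTSXP, n));
    reverse_ints(INTEGER(x), INTEGER(out), (size_t) n);
    UNPROTECT(1);
    return out;
}
#endif
```

The explicit type check is one way of dealing with the "wrong type of parameters" problem on the C side; coercing in an R wrapper function is the other.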

Preparing a package

A package is a directory with the structure of the reverse repository (there may be more files and folders, but I tried to make the package minimal). Here is an overview of what goes where in the package:

  • Put the C source code in the src/ subdirectory.
  • Create an R wrapper function which calls out to your C code and put it in the R/ subdirectory. In this example it is sufficient to have a wrapper function like reverse <- function(x) .Call('reverse', x) but you may want to coerce any variables before passing them to your C function. Note that it is not necessary to load the shared library with dyn.load() in the wrapper. R takes care of this for us.
  • A NAMESPACE file which tells R to load the shared library and what wrapper functions to expose from the R/ subdirectory. In this example useDynLib(reverse) loads the library, and export(reverse) exports the reverse function from the R wrapper code.
  • A DESCRIPTION file which contains a summary of the package. For example the package version is entered here. Please use a sensible scheme for versioning your package (e.g. X.Y, where X is incremented when changes to the public API exposed inside R/ break backward compatibility, and Y is incremented otherwise). You also need to enter the version of R your package depends on; at the moment I am not sure what the best practice is when filling out this field. A safe bet is the version of R that you are using, but this may be too restrictive. Another idea is to pick the lowest version that is used by the people you are sharing your package with (if this is known to you). As for the license field -- if you pick a non-standard license then you will get a warning when you check your package (see below), so it may make sense to pick a standard license.

These four items are all that is needed to create a functioning package, but you will get a warning about missing documentation when checking the package (see below). To avoid warnings you will also need

  • A man/ folder with documentation for the package itself and for each function that is exported in the NAMESPACE file. The format used for documentation is described in the guide on extending R. Note: you should include code examples in your help files. These examples are run when you check the files so make sure your examples are complete. If you want to write examples that can't be run for some reason, you need to wrap them in \dontrun{}.

Building and installing

Creating a package which is ready to be shared consists of the following steps:

  • Building the package. This creates an archive that you can share. Change directory to the parent of the package and type

    R CMD build reverse

    This will generate the archive reverse_1.0.tar.gz, where 1.0 is the version of the package.

  • Checking the package. This ensures that the package can be installed. Type

    R CMD check reverse_1.0.tar.gz

    If there are any problems you will be notified and a log file is created in the directory reverse.Rcheck/. Note that you should pass the name of the archive to the check command, not the name of the directory the package resides in (this can be confusing because the latter works but it creates temporary files inside your package directory structure).

At this point you can go ahead and install the package by typing

R CMD INSTALL reverse_1.0.tar.gz

Now start R, make sure the library loads by typing library(reverse) and that the exported function works by typing reverse(1:10) (you should see the numbers 1 to 10 in reversed order).

To uninstall the package type

R CMD REMOVE reverse

If you only want to distribute the source version of your package then you are done at this point. Simply send the reverse_1.0.tar.gz archive to the people you want to share with and tell them how to install your package. The downside to this is that everybody you share with must have a C compiler installed. If you want to share with people who may not have a C compiler, then you need to create a binary version of your package.

To create a binary version of your package you use the command

R CMD INSTALL --build reverse_1.0.tar.gz

This will first install your package and then create a binary package archive called reverse_1.0.tgz which you can share (the installation procedure for binary packages is the same as for source packages). The problem with this method is that the binary package will only install on computers with the exact same operating system version and R version that you used to build the binary package. To work around this problem you will have to submit your package to CRAN.

Submitting to CRAN

Before submitting to CRAN you should ensure that your package passes all checks. This is not really a problem. What is a problem, however, is that packages on CRAN should compile on Linux, Mac OS X and Windows, and it is up to you to make sure that they do. Getting your C code to compile on all three platforms can be a big problem. There is a site called win-builder to which you can upload a source package and it will automatically check it on a Windows machine. This is useful if you do not have access to Windows, but it can be very time consuming to fix compilation problems this way. I do not know of any similar sites for Linux and Mac OS X, so if you do not have access to one of these operating systems then you are out of luck.

To actually submit a package you should first read through the submission guidelines (you will be forced to confirm that you've read through this later anyway). Next, upload a source package to the CRAN ftp and send an email to the CRAN mailing list (current addresses to the ftp and mailing list can be found at CRAN). In your email the subject should include package name and version (e.g. "reverse 1.0") and in the body simply ask for the package to be added to CRAN. The submission process is not automated; instead the mailing list is read by the maintainers, so be polite. You will get a reply to your submission. If you need to reply back, make sure you CC the mailing list again as there may be more than one maintainer handling your case. If problems are found in your package and you need to upload a new version, make sure you send a new submission email as well, since the maintainers expect a submission email to accompany each archive uploaded to the ftp.

Setting up Xcode 4.3 (for MacVim, Homebrew and Haskell)

Xcode 4.3 was released recently and one of the changes it brought with it was that the /Developer folder now has moved into the Xcode app bundle. This has caused headaches for lots of developers and MacVim was not spared either. I recently did a clean install of Mac OS X Lion and Xcode 4.3 and thought I'd document my experiences in this post.

After installing Xcode 4.3 (my version says 4.3.1) you must manually go into the Xcode preferences, select the Downloads tab and install the Command Line Tools. Even after this step you will not be able to use automake and friends; these have to be installed manually.

My intention was to use Homebrew with my fresh install (I had never bothered with this before as my /usr/local was occupied by my stuff and Homebrew strongly advises against that). However, before Homebrew will work you have to tell Xcode where the Developer dir is, otherwise Xcode still thinks it should be at /Developer (this is after a clean install mind you, so I think this is a bug in Xcode 4.3). Open up Terminal and enter:

$ xcode-select -switch /Applications/Xcode.app/Contents/Developer

Now you can install autotools with Homebrew by typing

$ brew install autoconf automake

With this setup it is now possible to compile MacVim without any problems (actually, you don't need autotools to build MacVim since it comes with a pre-generated configure script but I need autoconf to generate said script).

Lastly, it turns out Haskell was broken by Xcode 4.3 as well (cabal would complain about gcc not being found). To fix it, open up /usr/bin/ghc with a text editor and look for the line which says pgmgcc="/Developer/usr/bin/gcc" and change this to say

pgmgcc="/usr/bin/gcc"

Using the conceal Vim feature with LaTeX

Vim 7.3 has just been released and with it comes the “conceal” feature (you can download MacVim 7.3 here). One neat application of this feature is that when editing LaTeX files certain backslash commands are replaced by their corresponding Unicode glyph. This is what I am talking about:

You’ll see that Greek letter commands, superscripts/subscripts and mathematics commands are concealed and in their place is rendered the corresponding Unicode glyph. The cursor line however is rendered as is without any concealment so that you can still edit the LaTeX code. (I have on purpose made this line identical to the line above it to let you see what has been concealed.) Inline mathematics (that which goes between two dollar signs) is shown on the last line. Note that the dollar signs are hidden. All in all this makes it a lot more pleasant to skim through a .tex file (but it won’t replace compiling and reading the pdf instead).

To help get you on your way there are a few things you need to know in order to get started with the conceal feature. First of all you need to enable it by typing :set cole=2. You'll immediately notice lots of grey on grey characters...uh, what? This is the (unfortunate) default syntax coloring for concealed items. To fix it you have to change the Conceal highlight group, e.g. try :hi Conceal guibg=white guifg=black (reverse the colors if you are using a dark color scheme). After fiddling a bit with the colors to match your color scheme you are ready to go!

However, I have found that concealed superscripts and subscripts often do not look very good. Fortunately there is a way to disable them, namely by adding the line let g:tex_conceal="adgm" to your ~/.vimrc file (it also works to put this line in ~/.vim/ftplugin/tex.vim as mentioned below). The g:tex_conceal variable is a string of one-character flags:

a = conceal accents/ligatures
d = conceal delimiters
g = conceal Greek
m = conceal math symbols
s = conceal superscripts/subscripts

Thus "adgm" means conceal everything except superscripts and subscripts. (I did not mention accents/ligatures earlier but "a" does what you'd expect: for example, \"a and \ae will turn into ä and æ respectively, if accents are enabled.)

The conceal support for editing tex files is still in its early stages and you may come across commands that do not get concealed, or perhaps you have some custom LaTeX commands that you would like to add to the list of concealed commands. In either case you should edit the file ~/.vim/after/syntax/tex.vim (create the folders and the file if they don't exist). Say you would like \eps to render as ε, then add this line:

syn match texGreek '\\eps\>' contained conceal cchar=ε

Mathematics commands should be added to the texMathSymbol group. For example, if you want \arr to be concealed by ←, then add this line:

syn match texMathSymbol '\\arr\>' contained conceal cchar=←

If you find standard LaTeX commands that should be concealed but aren't, please notify the tex syntax file author so that he may add them (you can find the contact details by looking at the syntax file :tabe $VIMRUNTIME/syntax/tex.vim).

Finally, I personally edit several different types of files and like to keep separate settings for each file type. The simplest way of doing this is to keep your filetype-specific settings inside ~/.vim/ftplugin/filetype.vim. Here's an excerpt from my ~/.vim/ftplugin/tex.vim file:

" Set colorscheme, enable conceal (except for
" subscripts/superscripts), and match conceal
" highlight to colorscheme
colorscheme topfunky-light
set cole=2
let g:tex_conceal = 'adgm'
hi Conceal guibg=White guifg=DarkMagenta

Some of the relevant help files on this topic are :h 'cole, :h 'cocu, and :h conceal. I should also mention :h 'ambw; it may be helpful to set this to double if you find that some Unicode glyphs "spill over" into the neighboring display cell.

MacVim Services (again)

In a previous post I discussed MacVim Services on Mac OS X 10.5 (Leopard) and earlier. With Mac OS X 10.6 (Snow Leopard) Apple has polished Services to make them more easily accessible, but unfortunately this broke some of the MacVim Services at the same time.  As of Snapshot 52 (released today!) MacVim Services work on Snow Leopard and in this post I’ll quickly demonstrate how they can be put to good use.

MacVim now exposes two Services: “New MacVim Buffer With Selection” and “New MacVim Buffer Here”. Both can be accessed in the usual (pre-10.6) manner via the Services submenu of the current application's menu, or via a context menu that pops up when you control-click (or right-click) something. The context menu is new in Snow Leopard and makes it so much easier to access Services.

The first Service (New MacVim Buffer With Selection) is available when you control-click the selection in any application (e.g. Safari). When used it will copy the selection, open a new MacVim buffer, and paste the selection into the buffer so you can start editing it.

The second Service (New MacVim Buffer Here) is available when you control-click a file or folder inside a Finder window. When used it will open a new MacVim buffer and set the current directory to that of the file or folder you had selected. This can be handy if you’ve browsed to some folder in the Finder and want to create a new text file inside that folder: simply control-click on any file in the folder, select the Service, add some text, then type :w filename to save the buffer in a file called filename in the folder you had open in the Finder.

Finally, if you don’t want these menu entries clogging up your context menus there is an easy way to disable them: open up System Preferences, click on Keyboard and select the Keyboard Shortcuts tab. In the left-hand list click on Services to bring up a list of available Services in the right-hand view. Search for the Services you don’t want and untick them one at a time and they won’t bother you again.

MacVim on Snow Leopard

As a courtesy to early adopters I am posting a link to a custom binary of MacVim that I currently use on Snow Leopard (+cscope, +perl, +python, +ruby, +tcl, 32 bit Intel, 10.6 only). When I get time I will make a proper snapshot and post it via the usual channels and remove this binary [edit: it has been removed now]. Note that I cannot provide any support for this binary. If you do run into problems I would appreciate it if you reported them on the vim_mac mailing list (not in the comments here).

You can always build your own binary but do note that the icon generation currently is broken.  This can be worked around by commenting out lines 52-57 and 242 in the src/MacVim/icons/docerator.py script. All build issues have been fixed now and the build procedure has been simplified, so go ahead and build your own 64 bit binary — it has never been easier.

To conclude this story: I have now uploaded a new snapshot that will run as 64 bit on Snow Leopard.  Enjoy!

Using gettext in a Cocoa application

Vim uses gettext (from libintl) to support localized messages like (I’m guessing) many *nix programs do. Mac OS X on the other hand uses bundles for localized messages so I had to struggle somewhat to get these two to cooperate when internationalizing MacVim. In this post I’ll describe how gettext and Cocoa decide which language to use for localized messages and how I managed to get both to choose the same language.

The “International” System Preference is used to control which language to use for localized messages. Unfortunately, Cocoa applications use the “Language” tab setting whereas gettext uses the “Formats” tab. For example, if I choose English as my preferred language in the “Language” tab, but set the region in the “Formats” tab to “Sweden”, then gettext will use Swedish for messages (if available) but Cocoa will use English.

My preference was to get gettext to behave in the same manner as Cocoa instead of the other way around. Fortunately, this is made easy via the LANGUAGE environment variable. This is a colon separated list of preferred languages (e.g. sv:en) and it takes precedence over the LC_* and LANG environment variables when gettext chooses which language to use for localized messages. The list of user-preferred languages (as set in the "Language" tab) can be accessed via +[NSLocale preferredLanguages]. The following Objective-C code sets LANGUAGE to match the user's choice:

NSArray *languages = [NSLocale preferredLanguages];
if (languages && [languages count] > 0) {
    int i, count = [languages count];
    for (i = 0; i < count; ++i) {
        if ([[languages objectAtIndex:i]
                isEqualToString:@"en"]) {
            count = i+1;
            break;
        }
    }
    NSRange r = { 0, count };
    NSString *s = [[languages subarrayWithRange:r]
            componentsJoinedByString:@":"];
    setenv("LANGUAGE", [s UTF8String], 0);
}

One note about this code: I've only included fallback languages up to (and including) "en" (English) since Vim has strings in English in the actual code and hence there is no .mo file for English translations. If I did not disregard all languages after "en" then English would never be used. Also note that the last parameter in the call to setenv() is 0 so that if the user has already set LANGUAGE then we do not override the user's choice.
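For readers who do not work in Objective-C, the same two ideas (cut the list off after "en", and never override a LANGUAGE the user has already set) can be sketched in plain C. This is only an illustration and not MacVim code: build_language_list and export_language are hypothetical names, and the language list is assumed to arrive as an array of C strings.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#include <stdlib.h>
#include <string.h>

/* Join the preferred languages with ':' and stop after the first "en".
 * Returns 0 on success, -1 if buf is too small.
 * (build_language_list is a hypothetical helper name.) */
int build_language_list(const char *const *langs, size_t count,
                        char *buf, size_t bufsize)
{
    size_t i, used = 0;

    if (bufsize == 0)
        return -1;
    buf[0] = '\0';

    for (i = 0; i < count; ++i) {
        size_t len = strlen(langs[i]);
        size_t sep = (i > 0) ? 1 : 0;

        if (used + sep + len + 1 > bufsize)
            return -1;
        if (sep)
            buf[used++] = ':';
        memcpy(buf + used, langs[i], len);
        used += len;
        buf[used] = '\0';

        /* English strings live in the code itself, so anything listed
         * after "en" must be dropped or English would never be used. */
        if (strcmp(langs[i], "en") == 0)
            break;
    }
    return 0;
}

/* Export the list as LANGUAGE unless it is already set: the final 0
 * tells setenv() not to overwrite an existing value. */
int export_language(const char *list)
{
    return setenv("LANGUAGE", list, 0);
}
```

With the list sv, en, de this produces sv:en, and calling export_language when LANGUAGE is already set leaves the existing value untouched.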

Inverse functions in Haskell

In this post I will show a simple way of finding the inverse of a function on the real line in Haskell.

Let f be a continuous (real) function defined on a closed interval [a,b]. The inverse of f is well-defined if and only if f is injective (i.e. no value can be assumed at two distinct points, i.e. f(x)=f(y) implies x=y). This is not a property that Haskell can easily check for us, so it is up to us to make sure that f is injective. From now on, we assume that f is injective.

The problem at hand is this: for each y in the range of f find x such that f(x)=y (the range of f in our case is simply the interval [f(a),f(b)], or [f(b),f(a)] if f(b)<f(a)). The function which maps y to x is called the inverse of f. This problem can be solved by introducing a new function F(x)=f(x)-y and then finding the zero of F for each y in the range of f. Note that I say the zero here, since f is assumed to be injective and hence the zero is unique. In general the function F need not be linear, so we’ll need a non-linear equation solver to tackle this problem.

To find zeros of a non-linear equation I’ll use the bisection method. Given f and two points l<r such that f changes sign on the open interval (l,r), the bisection method halves the interval and picks a subinterval where f changes sign and repeats. If the function is zero at the midpoint the method stops (otherwise f must change sign on at least one of the two subintervals since f is continuous and hence the intermediate value theorem applies).

When implementing the bisection method we will eventually run out of precision and the interval can no longer be halved. Our implementation checks for this condition first, and bisects only if it is possible. (For simplicity, we always bisect to maximum precision in this implementation instead of allowing an arbitrary precision to be specified. This is fine as long as we’re using fixed precision arithmetic and the function f does not take too long to evaluate.)

> bisect' f l r
>     | m <= l    = l
>     | m >= r    = r
>     | f l * f m < 0 = bisect' f l m
>     | f m * f r < 0 = bisect' f m r
>     | otherwise = m
>     where m  = (l+r)/2

Note that f(l)*f(m)<0 implies that f changes sign on the open interval (l,m) and f(m)*f(r)<0 implies that f changes sign on (m,r).

The function bisect' is guaranteed to find a zero if f changes sign on (l,r). If f does not change sign then it will still return a solution even though there may be none. To fix this, so that our bisection method only returns a zero if there is one, we call bisect' from a "wrapper" which first checks that f changes sign. We also make some sanity checks on the endpoints a and b.

> bisect f a b
>     | a > b = bisect f b a
>     | f a * f b < 0 = bisect' f a b
>     | f a == 0 = a
>     | f b == 0 = b
>     | otherwise = error "bisect: failed"

Normally the return value should be wrapped in a Maybe and return Nothing instead of raising an exception in the case where the method fails, but for my purposes I really want the program to halt if there is no solution, hence the use of error.

With bisect in hand we can now easily find the inverse of a continuous and injective function f defined on the closed interval [a,b]:

> inverse f a b y = bisect (\x -> f x - y) a b
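As a point of comparison, here is a rough C sketch of the same idea for readers who would rather get a failure code back than have the program halt. It is not a line-by-line translation of the Haskell above: the sign test compares signs directly instead of multiplying, and the int return value plays the role that Maybe would play in Haskell. The names real_fn and cube_plus_x are made up for this example.

```c
typedef double (*real_fn)(double);

/* Find x in [a, b] with f(x) = y by bisecting F(x) = f(x) - y.
 * Returns 1 and stores x in *x_out on success; returns 0 when F has
 * the same sign at both endpoints, so no zero is guaranteed. */
int inverse(real_fn f, double a, double b, double y, double *x_out)
{
    double l, r, fl, fr;

    if (a > b) { double t = a; a = b; b = t; }
    l = a;
    r = b;
    fl = f(l) - y;
    fr = f(r) - y;

    if (fl == 0.0) { *x_out = l; return 1; }
    if (fr == 0.0) { *x_out = r; return 1; }
    if ((fl < 0.0) == (fr < 0.0))
        return 0;

    for (;;) {
        double m = (l + r) / 2;
        double fm;

        /* Out of precision: the interval cannot be halved any more. */
        if (m <= l) { *x_out = l; return 1; }
        if (m >= r) { *x_out = r; return 1; }

        fm = f(m) - y;
        if (fm == 0.0) { *x_out = m; return 1; }

        if ((fl < 0.0) != (fm < 0.0)) {
            r = m;          /* sign change on (l, m) */
        } else {
            l = m;          /* sign change on (m, r) */
            fl = fm;
        }
    }
}

/* Example function, injective on [0, 2] (an arbitrary choice). */
double cube_plus_x(double x)
{
    return x * x * x + x;
}
```

Calling inverse(cube_plus_x, 0.0, 2.0, 2.0, &x) returns 1 and sets x to 1, while asking for y = 100 on the same interval returns 0 because 100 lies outside the range of the function there.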

Let's use this to find the inverse of a function f on [0,1] which cannot be inverted "by hand". Note that the f below is injective on this interval, but not on any interval which strictly contains [0,1]. In GHCi:

*Main> let f x = x^2 * abs (tan x)
*Main> let g = inverse f 0 1
*Main> g 1
0.8952060453842319
*Main> f it
1.0
*Main> f (g 0.3)
0.30000000000000004
*Main> it - 0.3
5.551115123125783e-17

That looks good! (Haskell uses double precision floating point, so the last result is zero up to machine epsilon [approx. 10^-16 for double precision].) Note that you have to be careful with the inverse g since it is only defined on the range of f, namely [f(0),f(1)] in this case. At this point it would probably make sense to write some QuickCheck tests, but I'll stop now anyway.