This manual covers installation, configuration, and various uses of the rApache software distribution and supporting packages. This document is intended to contain the most up-to-date information about rApache. Comments and suggestions should be forwarded to the maintainer.
rApache is a project supporting web application development using the R statistical language and environment and the Apache web server. The current software distribution runs on UNIX/Linux and Mac OS X operating systems. Apache servers with threaded Multi-Processing Modules are now supported, but the Apache Prefork Multi-Processing Module is still recommended (refer to the Multi-Processing Modules chapter from Apache for more about this).
The rApache software distribution provides the Apache module named mod_R that embeds the R interpreter inside the web server. It also comes bundled with libapreq, an Apache module for manipulating client request data. Together, they provide the glue to transform R into a server-side scripting environment.
Another important project that's not bundled with rApache, but plays an important role in server-side scripting, is the R package brew (also available on CRAN). It implements a templating framework for report generation, and it's perfect for generating HTML on the fly. Its syntax is similar to PHP, Ruby's erb module, Java Server Pages, and Python's psp module. brew can be used stand-alone as well, so it's not part of the distribution.
As with any web server-side scripting environment, rApache is at the mercy of YOU THE PROGRAMMER. If you allow your R code to accept information from a 3rd party, then it is up to you to vet that information appropriately. So, the author maintains that rApache is more or less as secure as other popular web scripting environments such as PHP, Ruby on Rails, etc.
rApache follows the typical GNU/Linux source install procedure: run 'configure', then 'make', and 'make install' from the shell.
At the moment, rApache is a source-only downloadable package. I've made attempts in the past to learn the Debian packaging system but have not quite got it down yet. If anyone is interested in providing binary packages, for any Linux distribution, please email me.
Requirements for installing and using rApache are as follows:
configure does its best to probe your system to meet the above requirements. Failing that, you should use the following flags:
Here's an example with some common argument values:
    ./configure \
        --with-R=/usr/bin/R \
        --with-apache2-apxs=/usr/bin/apxs2 \
        --with-apreq2-config=/usr/bin/apreq2-config
Running make with no target will build rApache. Use 'make install' to install. The following make targets are also available:
Run the following as root or use sudo before each command:
    apt-get install r-base-dev apache2-mpm-prefork apache2-prefork-dev
    wget http://biostat.mc.vanderbilt.edu/rapache/files/rapache-latest.tar.gz
    rapachedir=`tar tzf rapache-latest.tar.gz | head -1`
    tar xzvf rapache-latest.tar.gz
    cd $rapachedir
    ./configure
    make
    make install
This chapter details what to put in your Apache server configuration file(s). If you install Apache from source, then you will only have to edit the main file httpd.conf. If you install from a binary distribution, chances are the configuration is split among multiple files, so you may want to research your distribution's config layout before making any changes.
This is how Apache loads mod_R (the LoadModule line must appear before any rApache directives):
    # All Apache modules are loaded this way. The most important thing to
    # remember is that the string "R_module" is case sensitive, so get it
    # right in the config file.
    LoadModule R_module /apache/module/path/mod_R.so
LoadModule is an Apache directive that links in object files that contain module structures. rApache's module structure is named "R_module".
NOTE:
When attempting to start apache2 on some UNIX systems, you may see an error message similar to this one:
apache2: Syntax error on line 186 of /etc/apache2/apache2.conf: Cannot load
/usr/lib/apache2/modules/mod_R.so into server: libR.so: cannot open shared
object file: No such file or directory
To fix this, you will want to instruct your run-time linker, ld.so, on where to find libR.so. The easiest way to do this is by adding the directory to the /etc/ld.so.conf file and re-running ldconfig as root. If you don't know the location you can easily find out by running:
$ R CMD config --ldflags
-L/usr/lib/R/lib -lR
In the output above, /usr/lib/R/lib is the directory you will want to place in /etc/ld.so.conf.
3.2 ROutputErrors
The presence of this directive tells rApache to turn R errors into HTML and print them to the browser rather than to Apache's error log file. Note that this alters the HTTP response status code sent to the browser. When ROutputErrors is placed in the Apache config file, 200 OK is returned. When it is omitted from the config file, 500 INTERNAL_SERVER_ERROR is returned. Here's a config snippet:
    # Place this in the config file to turn error messages into HTML which
    # are printed in the browser. Without this, all warning and error messages
    # are printed to the Apache error log file.
    ROutputErrors
When configuring rApache for the first time, you may want to add the following directive to ensure that your system is working. It produces a report about R and Apache when you visit the url at /RApacheInfo. For production systems, you might want to leave it out.
    # Prints out a nice report about R running within Apache
    <Location /RApacheInfo>
        SetHandler r-info
    </Location>
The REvalOnStartup directive takes one argument: a string containing R expressions to evaluate upon startup. Any number of these directives can appear throughout the config files, and they are evaluated in the global environment in the order they appear. This is useful for setting options and loading libraries, like so:
    # Load required DBI and RMySQL packages
    REvalOnStartup "library(DBI); library(RMySQL)"
Sometimes you want to evaluate quite a bit of code on startup. The RSourceOnStartup directive takes the path to an R file and is equivalent to calling source() on that file in the global environment. Just like REvalOnStartup, these directives can appear anywhere in the config files and they are evaluated in the order that they appear.
    # Configure system with startup file
    RSourceOnStartup "/var/www/lib/R/startup.R"
It's important to read through the following section carefully as it will easily cause the most confusion. You have two Apache directives, two rApache directives, and two SetHandler options to learn.
The Apache directives Location and Directory are used here to match up urls and files to R handlers. Certainly, Apache is very configurable and other directives exist to provide more fine-grained control over urls and files, but please refer to the Apache documentation for further info.
"Location" is used to define the behavior of URLs that do not map to files on the filesystem. In rApache, it can be used to invoke an R handler. For instance, suppose we have set up rApache on the site example.com and added the following "Location" directive as in section 2.3:
    <Location /Risneat>
        SetHandler r-info
    </Location>
Then, the url http://example.com/Risneat will invoke the handler r-info, but that url doesn't map to any file on the filesystem.
"Directory" is used to define the behavior of files on the filesystem. In rApache, it can be used to evaluate files containing R expressions through an R handler like so:
    # Any file under /var/www/brew is passed through the function brew located in
    # the package brew
    <Directory /var/www/brew>
        SetHandler r-script
        RHandler brew::brew
    </Directory>
If example.com's Apache DocumentRoot is /var/www/, then the file /var/www/brew/foo.html maps to the URL http://example.com/brew/foo.html and is run through the R package brew.
r-handler, r-script, and r-info are valid arguments to Apache's SetHandler directive ("r-info" is already described in section 2.3). Calling SetHandler forces URLs to be parsed through the named handler.
r-handler is used when you want to call an R function without arguments. You can also specify a particular file as well (see below). It is generally used within Location directives.
r-script is used when you want to call an R function with 2 arguments: the first is the full path to the file, and the second is an R environment. It is generally used within Directory directives.
Using either of these requires that the function or script return a suitable HTTP response code. For most cases this will be the value OK, which sends an HTTP response code of 200 to the browser. If the function or script would like to signal an error condition, then it should return an object with an S3 class of 'try-error' (read the R documentation for try and tryCatch for further info on the 'try-error' class).
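As a minimal sketch of the two return conventions described above, the handler below returns OK on success and hands back the 'try-error' object produced by try() on failure. The names compute_report and report_handler are hypothetical, and the OK stand-in is an assumption made only so the sketch also runs in a plain R session; under rApache, OK is already available to the handler.

```r
# Stand-in for a plain R session (assumption); rApache supplies OK itself.
if (!exists("OK")) OK <- 0L

# Hypothetical unit of work that may fail.
compute_report <- function(fail = FALSE) {
  if (fail) stop("report could not be built")
  "report body"
}

# Hypothetical handler: callable with no arguments, as r-handler expects.
report_handler <- function(fail = FALSE) {
  result <- try(compute_report(fail), silent = TRUE)
  if (inherits(result, "try-error")) {
    return(result)              # signal the error condition
  }
  cat(result, "\n", sep = "")
  OK                            # normal completion
}
```

Calling report_handler() prints the report body and returns OK, while report_handler(fail = TRUE) returns an object whose class is 'try-error'.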
RHandler is used to specify an R function to handle incoming web requests. The function must exist either in an attached package or it must be found on the R search path. You can use "::" to prefix the function with the package name, just as in R. Examples:
    # Specify foo as the function to run. Probably created
    # by REvalOnStartup or RSourceOnStartup
    RHandler foo

    # Run the function bar located in the package foo
    RHandler foo::bar
RFileHandler is used to specify a file and/or the function to handle incoming web requests. The "::" notation is used to specify the file and the function together. Absolute paths to files are expected. Examples:
    # Hello world example. equivalent to calling source('/var/www/R/hello.R')
    # on each request.
    RFileHandler /var/www/R/hello.R

    # Call the function foo within the file bar.R
    RFileHandler /var/www/R/bar.R::foo
Another note about RFileHandler: each file specified is re-parsed only when its timestamp changes. This is useful for debugging. Once you are happy with the functionality, you may want to move the code into a package, or load it with RSourceOnStartup, and turn it into a function call for more efficiency.
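The idea of re-parsing a file only when its timestamp changes can be illustrated in plain R. This is a sketch of the general technique, not rApache's actual implementation; parsed_cache and parse_if_changed are hypothetical names.

```r
# Cache of parsed files, keyed by path (illustrative only).
parsed_cache <- new.env()

# Hypothetical helper: parse `path` again only if its modification
# time differs from the one recorded at the previous parse.
parse_if_changed <- function(path) {
  mtime <- file.info(path)$mtime
  entry <- parsed_cache[[path]]
  if (is.null(entry) || !identical(entry$mtime, mtime)) {
    entry <- list(mtime = mtime, exprs = parse(file = path))
    parsed_cache[[path]] <- entry
  }
  entry$exprs
}
```

Repeated calls with an unchanged file return the cached expressions; editing the file changes its modification time and triggers a fresh parse on the next call.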
By far the easiest configuration will be the brew example. Each file under /var/www/brew is treated as a brew script:
    # Any file under /var/www/brew is passed through the function brew located in
    # the package brew.
    <Directory /var/www/brew>
        SetHandler r-script
        RHandler brew::brew
    </Directory>
Another option is to use sys.source:
    # Any file under /var/www/R-files is passed through the function sys.source.
    <Directory /var/www/R-files>
        SetHandler r-script
        RHandler sys.source
    </Directory>
Hello World for the url at /test/helloworld:
    # Runs the R expressions in helloworld.r for every request
    # that matches /test/helloworld, including /test/helloworld/foobar
    <Location /test/helloworld>
        SetHandler r-handler
        RFileHandler /path/to/R/scripts/helloworld.r
    </Location>
The following functions can be used inside R handlers.
setHeader adds HTTP response headers (RFC2616) to the response. All headers must be added before the first output from print() or cat().
Example:
setHeader(header='X-Powered-By',value='rApache')
Arguments:
Returns:
setContentType allows a handler to set the content type of the response. It must be called before any output from print() or cat().
Example:
setContentType(type='image/png')
Arguments:
Returns:
Add HTTP Cookies to the response headers. In the simplest case, calling setCookie('foo','bar') sets the cookie 'foo' to the value 'bar'. Calling setCookie('foo') will delete the cookie 'foo'. Any non-standard key value pairs can be appended by using ...
Example:
setCookie(name='sessionID',value=paste(rnorm(1)))
Arguments:
Returns:
urlEncode and urlDecode perform percent encoding and decoding (URL encoding for short) of character vectors.
Example:
    urlEncode(str='hello world@example.com')
    urlDecode(str='hello+world%40example.com')
Arguments:
Returns:
RApacheInfo prints out a report about rApache. It should be the only call in your R handler. Equivalent to using "SetHandler r-info".
Example:
RApacheInfo()
Arguments:
Returns:
sendBin sends binary data to the browser. This function is equivalent to R's writeBin() function, but the connection argument is ignored. See the documentation to writeBin() in your R distribution for more information.
Example:
    sendBin(object=readBin(t,'raw',n=file.info(t)$size))
Arguments:
Returns:
RApacheOutputErrors determines where rApache sends error output: to the browser or to the Apache error log. It overrides the module-wide Apache config directive ROutputErrors. It also governs which HTTP response code to send to the requestor: 200 when status=TRUE, or 500 when status=FALSE.
Example:
    # Turn warnings and errors into HTML comments
    RApacheOutputErrors(status=TRUE,prefix='<!--\n',suffix='-->\n')
Arguments:
Returns:
In previous releases of rApache, information from the web server was passed to R handlers in a single variable. This system design element was copied from other apache modules, but it has proven to be too cumbersome to support in software maintenance. Rather, because R has support for lexical scoping and in a far broader sense the ability to manipulate the language, a simpler approach was implemented.
rApache variables, named similarly to PHP variables, are read-only list variables whose values are in most cases character vectors. They are injected into the environment of the R handler, and they are found by your R code via lexical scoping rules.
The GET variable contains those values obtained from an HTTP GET method, i.e. the key-value pairs found after the "?" of a URL, or data passed from an HTTP form when the method attribute is "GET". For example, the following form:
    <form method="GET" action="http://example.com/brew/get.html">
      <input type="text" name="p1" value="0.95">
      <input type="text" name="p2" value="0.7">
      <input type="submit" name="Submit">
    </form>
produces the following GET list variable:
    > str(GET)
    List of 3
     $ p1    : chr "0.95"
     $ p2    : chr "0.7"
     $ Submit: chr "Submit Query"
The POST variable contains those values obtained from an HTTP POST method, i.e. data passed from an HTTP form when the method attribute is "POST". Switching the method to "POST" in the example form from the previous section will produce the same values, yet in the POST variable:
    > str(POST)
    List of 3
     $ p1    : chr "0.95"
     $ p2    : chr "0.7"
     $ Submit: chr "Submit Query"
The COOKIES variable contains those values obtained from the HTTP request header named "Cookie". It is a list variable whose values are character vectors. See the setCookie function for more info.
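Because GET and POST values arrive as character vectors supplied by the client, a handler will usually want to convert and check them before use. The following is a minimal sketch under that assumption; as_probability is a hypothetical helper, and the query list is a stand-in for GET (or POST) so the code also runs in a plain R session.

```r
# Hypothetical helper: convert one form value to a number strictly
# between 0 and 1, returning NA for anything else.
as_probability <- function(x) {
  if (is.null(x) || length(x) != 1L) return(NA_real_)
  value <- suppressWarnings(as.numeric(x))
  if (is.na(value) || value <= 0 || value >= 1) return(NA_real_)
  value
}

# Stand-in for the GET variable shown above.
query <- list(p1 = "0.95", p2 = "0.7", Submit = "Submit Query")

p1 <- as_probability(query$p1)
p2 <- as_probability(query$p2)
if (is.na(p1) || is.na(p2)) {
  cat("p1 and p2 must be numbers between 0 and 1\n")
} else {
  cat("p1 =", p1, "p2 =", p2, "\n")
}
```

Missing keys are handled as well, since indexing a list with an absent name yields NULL.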
The FILES variable contains information about uploaded files via HTTP forms when the enctype attribute is set to "multipart/form-data". The following form:
    <form enctype="multipart/form-data" method="POST" action="URL">
      <input type="file" name="FirstFile">
      <input type="file" name="SecondFile">
      <input type="submit" name="Upload">
    </form>
produces this FILES variable:
    > str(FILES)
    List of 2
     $ FirstFile :List of 2
      ..$ name    : chr "useR2007poster.pdf"
      ..$ tmp_name: chr "/tmp/apreqc9GlXE"
     $ SecondFile:List of 2
      ..$ name    : chr "rapache-1.0.0-useR2007.tar.gz"
      ..$ tmp_name: chr "/tmp/apreqoQ2hhX"
It is a list of lists, with the nested list giving you the "name" of the file and the "tmp_name", the location of the temporary file. For instance, to reference the uploaded file from the input tag named "FirstFile" you would use FILES$FirstFile$tmp_name. Here's a code snippet to copy the file to the '/usr/local/uploaded_files' directory:
    destination <- file.path('/usr/local/uploaded_files',FILES$FirstFile$name)
    file.copy(FILES$FirstFile$tmp_name,destination,overwrite=TRUE)
NOTE: the temporary files are deleted after the R handler completes handling the request. Thus, it is imperative that you copy/move this file to your desired location before the R handler returns.
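Since the "name" element is chosen by the client, one cautious variation on the copy step is to keep only the final path component before building the destination. The sketch below assumes that approach; save_upload is a hypothetical helper, and the temporary file and upload list are stand-ins for an entry of FILES so the code runs outside rApache.

```r
# Hypothetical helper: copy one FILES entry into dest_dir and
# return the path of the saved copy.
save_upload <- function(upload, dest_dir) {
  # Keep only the final path component of the client-supplied name.
  safe_name <- basename(upload$name)
  if (!nzchar(safe_name) || safe_name %in% c(".", "..")) {
    stop("unusable upload name")
  }
  destination <- file.path(dest_dir, safe_name)
  if (!file.copy(upload$tmp_name, destination, overwrite = TRUE)) {
    stop("could not copy upload to ", destination)
  }
  destination
}

# Stand-ins for FILES$FirstFile and the destination directory.
tmp <- tempfile()
writeLines("example upload", tmp)
upload <- list(name = "../../report.pdf", tmp_name = tmp)
dest_dir <- tempfile("uploads")
dir.create(dest_dir)

saved <- save_upload(upload, dest_dir)
```

Under rApache the call would be made with an element of FILES, for example save_upload(FILES$FirstFile, '/usr/local/uploaded_files'), before the handler returns.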
As you can see from the below output, the SERVER variable contains a wealth of information about the incoming web request:
    > str(SERVER)
    List of 30
     $ headers_in        :List of 9
      ..$ Host           : chr "localhost:8181"
      ..$ User-Agent     : chr "Mozilla/5.0 (X11; U; Linux i686; en-US; ..."
      ..$ Accept         : chr "text/xml,application/xml,application/x..."
      ..$ Accept-Language: chr "en-us,en;q=0.5"
      ..$ Accept-Encoding: chr "gzip,deflate"
      ..$ Accept-Charset : chr "ISO-8859-1,utf-8;q=0.7,*;q=0.7"
      ..$ Keep-Alive     : chr "300"
      ..$ Connection     : chr "keep-alive"
      ..$ Cache-Control  : chr "max-age=0"
     $ proto_num         : int 1001
     $ protocol          : chr "HTTP/1.1"
     $ unparsed_uri      : chr "/brew/server.html/beetles/?foo=bar"
     $ uri               : chr "/brew/server.html/beetles/"
     $ filename          : chr "/home/hornerj/rapache/branches/rapache-1-0-br..."
     $ canonical_filename: chr "/home/hornerj/rapache/branches/rapache-1-0-br..."
     $ path_info         : chr "/beetles/"
     $ args              : chr "foo=bar"
     $ content_type      : chr "text/html"
     $ handler           : chr "r-script"
     $ content_encoding  : NULL
     $ range             : NULL
     $ hostname          : chr "localhost"
     $ user              : NULL
     $ header_only       : logi FALSE
     $ no_cache          : logi FALSE
     $ no_local_copy     : logi FALSE
     $ assbackwards      : logi FALSE
     $ status            : int 200
     $ method_number     : int 0
     $ eos_sent          : logi FALSE
     $ the_request       : chr "GET /brew/server.html/beetles/?foo=bar HTTP/1.1"
     $ method            : chr "GET"
     $ status_line       : NULL
     $ bytes_sent        : num 0
     $ clength           : num 0
     $ remaining         : num 0
     $ read_length       : num 0
     $ request_time      :'POSIXct', format: chr "2007-08-15 11:11:49"
     $ mtime             :'POSIXct', format: chr "1969-12-31 18:00:00"
Here's a description of each list element:
headers_in list containing all the HTTP headers sent by the client.
proto_num Integer. Protocol version number; HTTP/1.1 = 1001
protocol Character. Protocol string, as given to us, or HTTP/0.9.
unparsed_uri Character. The URI without any parsing performed.
uri Character. The path portion of the URI.
filename Character. The name of the file with full path information.
canonical_filename Character. The true filename. Case and aliases/symbolic links have been resolved.
path_info Character. The suffix portion of the url after it has been matched to an asset that the web server knows about. An asset is either a file or an url defined by an Apache Location directive.
args Character. The HTTP GET data extracted from this request.
content_type Character. The content-type for the current request.
handler Character. The handler string that we use to call a handler function.
content_encoding Character. How to encode the data.
range Character. The HTTP request header named "Range:".
hostname Character. The server hostname.
user Character. If an HTTP authentication check was made, this gets set to the user name.
header_only Logical. HEAD request, as opposed to GET.
no_cache Logical. This response can not be cached.
no_local_copy Logical. There is no local copy of this response.
assbackwards Logical. HTTP/0.9, 'simple' request (e.g. GET /foo\n w/no headers). Developers have found this a useful way to internally redirect without headers.
status Integer. Status line.
method_number Integer value of GET, POST, etc.
eos_sent Logical. A flag to determine if the eos bucket has been sent yet.
the_request Character. First line of the request.
method Character. Request method (eg. GET, HEAD, POST, etc.)
status_line Character. Status line, if set by script.
bytes_sent Numeric. Number of bytes sent.
clength Numeric. The 'real' content length.
remaining Numeric. Remaining bytes left to read from the request body.
read_length Numeric. Number of bytes that have been read from the request body.
request_time POSIXct DateTime object. Time when the request started.
mtime POSIXct DateTime object. Last modified time of the requested resource.
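One common use of these elements is to treat path_info as a list of path segments, for example to pick which report to render. The following is a small sketch of that idea; path_segments is a hypothetical helper, and the server list is a stand-in for SERVER so the code runs in a plain R session.

```r
# Hypothetical helper: split a path_info value into its non-empty
# segments; a NULL or empty value yields character(0).
path_segments <- function(path_info) {
  if (is.null(path_info)) return(character(0))
  parts <- strsplit(path_info, "/", fixed = TRUE)[[1]]
  parts[nzchar(parts)]
}

# Stand-in for SERVER, using the path_info value from the output above.
server <- list(path_info = "/beetles/")
segments <- path_segments(server$path_info)   # "beetles"
```

A handler could then branch on segments[1], returning an appropriate status when the segment is missing or unknown.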
The following table describes variables that exist as integer vectors of length 1 in the R handler environment and are proper return values for R handlers. They consist of Apache module return values and HTTP Status Codes (see Status Code Definitions from RFC2616 for more info). The most reasonable response value and the one that most handlers will return is the value DONE.
name | value | description |
---|---|---|
DECLINED | -1 | Module declines to handle |
DONE | -2 | Module has served response completely. |
OK | 0 | Module has handled this Apache response stage. |
HTTP_CONTINUE | 100 | |
HTTP_SWITCHING_PROTOCOLS | 101 | |
HTTP_PROCESSING | 102 | |
HTTP_OK | 200 | |
HTTP_CREATED | 201 | |
HTTP_ACCEPTED | 202 | |
HTTP_NON_AUTHORITATIVE | 203 | |
HTTP_NO_CONTENT | 204 | |
HTTP_RESET_CONTENT | 205 | |
HTTP_PARTIAL_CONTENT | 206 | |
HTTP_MULTI_STATUS | 207 | |
HTTP_MULTIPLE_CHOICES | 300 | |
HTTP_MOVED_PERMANENTLY | 301 | |
HTTP_MOVED_TEMPORARILY | 302 | |
HTTP_SEE_OTHER | 303 | |
HTTP_NOT_MODIFIED | 304 | |
HTTP_USE_PROXY | 305 | |
HTTP_TEMPORARY_REDIRECT | 307 | |
HTTP_BAD_REQUEST | 400 | |
HTTP_UNAUTHORIZED | 401 | |
HTTP_PAYMENT_REQUIRED | 402 | |
HTTP_FORBIDDEN | 403 | |
HTTP_NOT_FOUND | 404 | |
HTTP_METHOD_NOT_ALLOWED | 405 | |
HTTP_NOT_ACCEPTABLE | 406 | |
HTTP_PROXY_AUTHENTICATION_REQUIRED | 407 | |
HTTP_REQUEST_TIME_OUT | 408 | |
HTTP_CONFLICT | 409 | |
HTTP_GONE | 410 | |
HTTP_LENGTH_REQUIRED | 411 | |
HTTP_PRECONDITION_FAILED | 412 | |
HTTP_REQUEST_ENTITY_TOO_LARGE | 413 | |
HTTP_REQUEST_URI_TOO_LARGE | 414 | |
HTTP_UNSUPPORTED_MEDIA_TYPE | 415 | |
HTTP_RANGE_NOT_SATISFIABLE | 416 | |
HTTP_EXPECTATION_FAILED | 417 | |
HTTP_UNPROCESSABLE_ENTITY | 422 | |
HTTP_LOCKED | 423 | |
HTTP_FAILED_DEPENDENCY | 424 | |
HTTP_UPGRADE_REQUIRED | 426 | |
HTTP_INTERNAL_SERVER_ERROR | 500 | |
HTTP_NOT_IMPLEMENTED | 501 | |
HTTP_BAD_GATEWAY | 502 | |
HTTP_SERVICE_UNAVAILABLE | 503 | |
HTTP_GATEWAY_TIME_OUT | 504 | |
HTTP_VERSION_NOT_SUPPORTED | 505 | |
HTTP_VARIANT_ALSO_VARIES | 506 | |
HTTP_INSUFFICIENT_STORAGE | 507 | |
HTTP_NOT_EXTENDED | 510 |
Another suitable value to return from a handler is an object with S3 class of 'try-error'.
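As a sketch of how these return values might be used, the function below returns HTTP_BAD_REQUEST when a required parameter is missing and DONE otherwise. The function name greet and its query argument are hypothetical, and the two stand-in constants (with the values from the table) are defined only so the sketch also runs in a plain R session; under rApache they already exist in the handler environment.

```r
# Stand-ins for a plain R session (assumption); values taken from the table.
if (!exists("DONE")) DONE <- -2L
if (!exists("HTTP_BAD_REQUEST")) HTTP_BAD_REQUEST <- 400L

# Hypothetical handler body: `query` plays the role of the GET variable.
greet <- function(query) {
  name <- query$name
  if (is.null(name) || length(name) != 1L || !nzchar(name)) {
    return(HTTP_BAD_REQUEST)    # required parameter missing or empty
  }
  cat("Hello, ", name, "\n", sep = "")
  DONE                          # response served completely
}
```

Here greet(list()) returns 400 without producing output, while greet(list(name = "R")) prints a greeting and returns DONE.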
For a complete look at an rApache application, download the useR2007 application which uses Hmisc and brew for power and sample size calculations.
The following code exercises all rApache functionality by just echoing what was sent from the browser. Copy and paste the following code to a file and then set up apache with the following configuration. You should then be able to point your browser at http://example.com/rapachetest (replacing example.com with your own hostname).
    #
    # Place this in your Apache config file.
    #
    <Location /rapachetest>
        SetHandler r-handler
        RFileHandler /var/www/R/test.R
    </Location>

    #
    # Copy and save this code to /var/www/R/test.R
    #
    hrefify <- function(title) gsub('[\\.()]','_',title,perl=TRUE)
    scrub <- function(str){
        if (is.null(str)) return('NULL')
        if (length(str) == 0) return('length 0 string')
        cat("\n<!-- before as.character: (",str,")-->\n",sep='')
        str <- as.character(str)
        cat("\n<!-- after as.character: (",str,")-->\n",sep='')
        # Escape HTML metacharacters so echoed values cannot inject markup
        str <- gsub('&','&amp;',str); str <- gsub('@','_at_',str);
        str <- gsub('<','&lt;',str); str <- gsub('>','&gt;',str);
        if (length(str) == 0 || is.null(str) || str == '')
            str <- ' '
        str
    }
    cl<-'e'
    zebary <- function(i){
        cl <<- ifelse(cl=='e','o','e')
        cat('<tr class="',cl,'"><td>',scrub(i),'</td></tr>\n',sep='')
    }
    zeblist <- function(i,l){
        cl <<- ifelse(cl=='e','o','e')
        cat('<tr class="',cl,'"><td class="l">',names(l)[i],'</td><td>')
        if(is.list(l[[i]]))
            zebra(names(l)[i],l[[i]])
        else {
            if (length(l[[i]]) > 1)
                zebary(l[[i]])
            else
                cat(scrub(l[[i]]))
        }
        cat('</td></tr>\n',sep='')
    }
    zebra <- function(title,l){
        cat('<h2><a name="',hrefify(title),'"> </a>',title,'</h2>\n<table><tbody>',sep='')
        ifelse(is.list(l),lapply(1:length(l),zeblist,l), lapply(l,zebary))
        cat('</tbody></table>\n<br/><hr/>')
    }

    # Output starts here
    setContentType("text/html")
    if(is.null(GET)){
        called <- 1
    } else {
        called <- as.integer(GET$called) + 1
    }
    setCookie('called',called,expires=Sys.time()+100)
    cat('<HTML><head><style type="text/css">\n')
    cat('table { border: 1px solid #8897be; border-spacing: 0px; font-size: 10pt; }')
    cat('td { border-bottom:1px solid #d9d9d9; border-left:1px solid #d9d9d9; border-spacing: 0px; padding: 3px 8px; }')
    cat('td.l { font-weight: bold; width: 10%; }\n')
    cat('tr.e { background-color: #eeeeee; border-spacing: 0px; }\n')
    cat('tr.o { background-color: #ffffff; border-spacing: 0px; }\n')
    cat('</style></head><BODY><H1>Canonical Test for rApache</H1>\n')
    cat('<form enctype=multipart/form-data method=POST action="?called=',called,'">\n',sep='')
    cat('Enter a string: <input type=text name=name value=""><br>\n',sep='')
    cat('Enter another string: <input type=text name=name value=""><br>\n',sep='')
    cat('Upload a file: <input type=file name=fileUpload><br>\n')
    cat('Upload another file: <input type=file name=anotherFile><br>\n')
    cat('<input type=submit name=Submit>')
    cat("<hr>\n")
    zebra('CGI GET Data',GET)
    zebra('CGI POST Data',POST)
    zebra('Cookies',COOKIES)
    if (!is.null(FILES)){
        cat('<h2>Files Uploaded in POST Data</h2>\n')
        for (n in names(FILES)){
            zebra(paste("Form Variable",n),FILES[[n]])
        }
    }
    zebra("SERVER Variables",SERVER)
    cat("</BODY></HTML>\n")
    DONE
The rApache source code is licensed under the Apache License Version 2.0.
To cite rApache, use the following:
A BibTeX entry for LaTeX users is
    @Manual{,
      title = {rApache: Web application development with R and Apache.},
      author = {Jeffrey Horner},
      year = {2011},
      url = {http://www.rapache.net/},
    }
Thanks to the following people for their contributions, giving advice, noticing when things were broken and such. If I've forgotten to mention you, please email me.
Gregoire Thomas, Jan de Leeuw, Keven E. Thorpe, Jeremy Stephens, Aleksander Wawer, David Konerding, Robert Kofler, Jeroen Ooms, Michael Driscoll