Introduction
savvy is a simple R extension interface using Rust, like the
extendr framework. The name “savvy” comes
from the Japanese word “錆” (pronounced as sàbí), which means “Rust”.
With savvy, you can automatically generate R functions from Rust code. This is an example of what a savvy-powered function would look like:
Rust
use savvy::savvy;
use savvy::NotAvailableValue; // for is_na() and na()
/// Convert to Upper-case
///
/// @param x A character vector.
/// @export
#[savvy]
fn to_upper(x: StringSexp) -> savvy::Result<savvy::Sexp> {
// Use `Owned{type}Sexp` to allocate an R vector for output.
let mut out = OwnedStringSexp::new(x.len())?;
for (i, e) in x.iter().enumerate() {
// To Rust, missing value is an ordinary value. In `&str`'s case, it's just "NA".
// You have to use `.is_na()` method to distinguish the missing value.
if e.is_na() {
// Set the i-th element to NA
out.set_na(i)?;
continue;
}
let e_upper = e.to_uppercase();
out.set_elt(i, e_upper.as_str())?;
}
out.into()
}
R
to_upper(c("a", "b", "c"))
#> [1] "A" "B" "C"
Examples
A toy example R package can be found in the R-package/
directory.
Links
Thanks
Savvy is not quite unique. This project is made possible by heavily taking inspiration from other great projects:
- The basic idea is of course based on extendr. Savvy would not exist without extendr.
- cpp11’s “writable” concept influenced the design a lot. Also, I learned a lot from the great implementation such as the protection mechanism.
- PyO3 made me realize that the FFI crate doesn’t need to be a “sys” crate.
Get Started
Prerequisite
Rust
First of all, you need a Rust toolchain installed. You can follow the official instructions.
If you are on Windows, you need an additional step of installing the
x86_64-pc-windows-gnu target.
rustup target add x86_64-pc-windows-gnu
A helper R package
Then, install a helper R package for savvy.
install.packages(
"savvy",
repos = c("https://yutannihilation.r-universe.dev", "https://cloud.r-project.org")
)
Note that, under the hood, this is just a simple wrapper around savvy-cli. So,
if you prefer shell, you can directly use the CLI instead, which is available on
the releases.
Create a new R package
First, create a new R package. usethis::create_package() is convenient for
this.
usethis::create_package("path/to/foo")
Then, move to the package directory and generate necessary files like Makevars
and Cargo.toml, as well as the C and R wrapper code corresponding to the Rust
code. savvy::savvy_init() does this all (under the hood, this simply runs
savvy-cli init).
Lastly, run devtools::document() to generate NAMESPACE and documents.
savvy::savvy_init()
devtools::document()
Now, this package is ready to install! After installing (e.g. by running “Install Package” on RStudio IDE), confirm you can run this example function that multiplies the first argument by the second argument.
library(<your package>)
int_times_int(1:4, 2L)
#> [1] 2 4 6 8
Package structure
After savvy::savvy_init(), the structure of your R package should look like below.
.
├── .Rbuildignore
├── DESCRIPTION
├── NAMESPACE
├── R
│   └── 000-wrappers.R          <-------(1)
├── configure                   <-------(2)
├── configure.win               <-------(2)
├── cleanup                     <-------(2)
├── cleanup.win                 <-------(2)
├── foofoofoofoo.Rproj
└── src
    ├── Makevars.in             <-------(2)
    ├── Makevars.win.in         <-------(2)
    ├── init.c                  <-------(3)
    ├── <your package>-win.def  <---(4)
    └── rust
        ├── .cargo
        │   └── config.toml     <-------(4)
        ├── api.h               <-------(3)
        ├── Cargo.toml          <-------(5)
        └── src
            └── lib.rs          <-------(5)
1. 000-wrappers.R: R functions for the corresponding Rust functions
2. configure*, cleanup*, Makevars.in, and Makevars.win.in: Necessary build settings for compiling Rust code
3. init.c and api.h: C functions for the corresponding Rust functions
4. <your package>-win.def and .cargo/config.toml: These are tricks to avoid a minor error on Windows. See extendr/rextendr#211 and savvy#98 for the details.
5. Cargo.toml and lib.rs: Rust code
Write your own function
The most revolutionary point of savvy::savvy_init() is that it kindly leaves
the most important task to you; let’s define a typical hello-world function for
practice!
Write some Rust code
Open src/rust/lib.rs and add the following lines. r_println! is the R
version of println! macro.
/// @export
#[savvy]
fn hello() -> savvy::Result<()> {
savvy::r_println!("Hello world!");
Ok(())
}
Update wrapper files
Every time you modify or add some Rust code, you need to update the C and R
wrapper files by running savvy::savvy_update() (under the hood, this simply
runs savvy-cli update). Don’t forget to run devtools::document() as well.
savvy::savvy_update()
devtools::document()
After re-installing your package, you should be able to run the hello()
function on your R session.
hello()
#> Hello world!
Key Ideas
Treating external SEXP and owned SEXP differently
Savvy is opinionated in many points. Among these, one thing I think should be explained first is that savvy uses separate types for a SEXP passed from outside and a SEXP created within a Rust function. The former, an external SEXP, is read-only, and the latter, an owned SEXP, is writable. Here’s the list:
| R type | Read-only version | Writable version |
|---|---|---|
| INTSXP (integer) | IntegerSexp | OwnedIntegerSexp |
| REALSXP (double) | RealSexp | OwnedRealSexp |
| RAWSXP (raw) | RawSexp | OwnedRawSexp |
| LGLSXP (logical) | LogicalSexp | OwnedLogicalSexp |
| STRSXP (character) | StringSexp | OwnedStringSexp |
| VECSXP (list) | ListSexp | OwnedListSexp |
| EXTPTRSXP (external pointer) | ExternalPointerSexp | n/a |
| CPLXSXP (complex) [1] | ComplexSexp | OwnedComplexSexp |
You might wonder why this is needed when we can just use mut to distinguish
the difference of mutability. I mainly had two motivations for this:
- avoid unnecessary protection: an external SEXP is already protected by the caller, while an owned SEXP needs to be protected by ourselves.
- avoid unnecessary ALTREP checks: an external SEXP can be ALTREP, so it’s better to handle it in an ALTREP-aware way, while an owned SEXP is never ALTREP.
The full explanation would be a bit lengthy, so let’s skip it here. You can read the details on my blog post. But, one correction is that I found the second reason might not be very important because a benchmark showed it’s more efficient to be non-ALTREP-aware in most of the cases. Actually, the current implementation of savvy is non-ALTREP-aware for int, real, and logical (See #18).
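As a rough illustration of the read-only versus writable split, here is a plain-Rust sketch that does not use savvy at all. ReadOnlyInts, OwnedInts, and this times_two are hypothetical stand-ins invented for this example; savvy's real types wrap R's SEXP and also handle protection, which is not modelled here.

```rust
/// Read-only view over data owned by someone else
/// (hypothetical stand-in for an external SEXP).
pub struct ReadOnlyInts<'a> {
    data: &'a [i32],
}

impl<'a> ReadOnlyInts<'a> {
    pub fn new(data: &'a [i32]) -> Self {
        Self { data }
    }

    pub fn len(&self) -> usize {
        self.data.len()
    }

    /// Only shared access is offered; there is no way to write through this type.
    pub fn iter(&self) -> std::slice::Iter<'a, i32> {
        self.data.iter()
    }
}

/// Writable buffer allocated by the function itself
/// (hypothetical stand-in for an owned SEXP).
pub struct OwnedInts {
    data: Vec<i32>,
}

impl OwnedInts {
    /// Allocate a zero-filled buffer of the given length.
    pub fn new(len: usize) -> Self {
        Self { data: vec![0; len] }
    }

    /// Write one element, returning an error instead of panicking on a bad index.
    pub fn set_elt(&mut self, i: usize, v: i32) -> Result<(), String> {
        let len = self.data.len();
        match self.data.get_mut(i) {
            Some(slot) => {
                *slot = v;
                Ok(())
            }
            None => Err(format!("index {i} is out of bounds for length {len}")),
        }
    }

    pub fn as_slice(&self) -> &[i32] {
        &self.data
    }
}

/// Read from the external input, write into a freshly allocated output.
pub fn times_two(x: &ReadOnlyInts) -> Result<OwnedInts, String> {
    let mut out = OwnedInts::new(x.len());
    for (i, &v) in x.iter().enumerate() {
        out.set_elt(i, v * 2)?;
    }
    Ok(out)
}
```

Because the input type has no mutating methods, modifying the caller's data in place is a compile-time error rather than a runtime surprise.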
No implicit conversions
Savvy doesn’t convert between types unless you do so explicitly. For
example, you cannot supply a double vector to a function with an IntegerSexp
argument.
#[savvy]
fn identity_int(x: IntegerSexp) -> savvy::Result<savvy::Sexp> {
let mut out = OwnedIntegerSexp::new(x.len())?;
for (i, &v) in x.iter().enumerate() {
out[i] = v;
}
out.into()
}
identity_int(c(1, 2))
#> Error in identity_int(c(1, 2)) :
#> Unexpected type: Cannot convert double to integer
While you probably feel this is inconvenient, this is also a design decision. My concerns about supporting these conversions are:
- Complexity. It would make savvy’s spec and implementation complicated.
- Hidden allocation. Conversion requires a new allocation for storing the converted values, which might be undesirable in some cases.
So, you have to write some wrapper R function like below. This might feel a bit tiring, but, in general, please do not avoid writing R code. Since you are creating an R package, there’s a lot you can do in R code instead of making things complicated in Rust code. Especially, it’s easier on R’s side to show user-friendly error messages.
identity_int_wrapper <- function(x) {
x <- vctrs::vec_cast(x, integer())
identity_int(x)
}
Alternatively, you can use NumericSexp as input. This provides methods to
convert the input either to i32 or to f64 on the fly. For more details,
please read the section about NumericSexp.
#[savvy]
fn identity_num(x: NumericSexp) -> savvy::Result<savvy::Sexp> {
let mut out = OwnedIntegerSexp::new(x.len())?;
for (i, v) in x.iter_i32().enumerate() {
// iter_i32() yields Result<i32>, so propagate a failed conversion with `?`
out[i] = v?;
}
out.into()
}
[1] Complex is optionally supported under the feature flag complex.
#[savvy] macro
This is a simple Rust function to add the specified suffix to the input
character vector. #[savvy] macro turns this into an R function.
use savvy::NotAvailableValue; // for is_na() and na()
/// Add Suffix
///
/// @export
#[savvy]
fn add_suffix(x: StringSexp, y: &str) -> savvy::Result<savvy::Sexp> {
let mut out = OwnedStringSexp::new(x.len())?;
for (i, e) in x.iter().enumerate() {
if e.is_na() {
out.set_na(i)?;
continue;
}
out.set_elt(i, &format!("{e}_{y}"))?;
}
out.into()
}
Convention for a #[savvy] function
The example function above has this signature.
fn add_suffix(x: StringSexp, y: &str) -> savvy::Result<savvy::Sexp>
As you can guess, #[savvy] macro cannot be applied to arbitrary functions. The
function must satisfy the following conditions:
- The function’s inputs can be
  - a non-owned savvy type (e.g., IntegerSexp and RealSexp)
  - a corresponding Rust type for scalar (e.g., i32 and f64)
  - a user-defined struct marked with #[savvy] (&T, &mut T, or T)
  - a user-defined enum marked with #[savvy] (&T, or T)
  - any of above wrapped with Option (this is translated as an optional arg)
- The function’s return value must be either
  - savvy::Result<()> for the case of no actual return value
  - savvy::Result<savvy::Sexp> for the case of some return value of R object
  - savvy::Result<T> for the case of some return value of a user-defined struct or enum marked with #[savvy]
How things work under the hood
If you mark a function with the #[savvy] macro, the corresponding implementations are generated:
- Rust functions
- a wrapper function to handle Rust and R errors gracefully
- a function with the original body and some conversion from raw
SEXPs to savvy types.
- C function signature for the Rust function
- C implementation for bridging between R and Rust
- R implementation
For example, the above implementation generates the following code. (The #[savvy]
macro can also be used on struct and enum, but let’s focus on the function
case for now for simplicity.)
Rust functions
(The actual code is a bit more complex to handle possible panic! properly.)
#[allow(clippy::missing_safety_doc)]
#[no_mangle]
pub unsafe extern "C" fn savvy_add_suffix__ffi(x: SEXP, y: SEXP) -> SEXP {
match savvy_add_suffix_inner(x, y) {
Ok(result) => result.0,
Err(e) => savvy::handle_error(e),
}
}
unsafe fn savvy_add_suffix_inner(x: SEXP, y: SEXP) -> savvy::Result<savvy::Sexp> {
let x = <savvy::StringSexp>::try_from(savvy::Sexp(x))?;
let y = <&str>::try_from(savvy::Sexp(y))?;
// original function
add_suffix(x, y)
}
// original function
fn add_suffix(x: StringSexp, y: &str) -> savvy::Result<savvy::Sexp> {
// ..original body..
}
C function signature
SEXP savvy_add_suffix__ffi(SEXP c_arg__x, SEXP c_arg__y);
C implementation
(let’s skip the details about handle_result for now)
SEXP savvy_add_suffix__impl(SEXP c_arg__x, SEXP c_arg__y) {
SEXP res = savvy_add_suffix__ffi(c_arg__x, c_arg__y);
return handle_result(res);
}
R implementation
The Rust comments with three slashes (///) are converted into Roxygen comments
in the R code.
#' Add Suffix
#'
#' @export
add_suffix <- function(x, y) {
.Call(savvy_add_suffix__impl, x, y)
}
Using #[savvy] on other files than lib.rs
You can use the #[savvy] macro in other files in the same way as in lib.rs. Since #[savvy]
automatically marks the functions necessary to be exposed as pub, you don’t
need to care about the visibility.
For example, if you define a function in src/foo.rs,
#[savvy]
fn do_nothing() -> savvy::Result<()> {
Ok(())
}
just declaring mod foo in src/lib.rs is enough to make do_nothing()
available to R.
mod foo;
Handling Vector Input
Basic rule
As described in Key Ideas, the input SEXP is read-only. You cannot modify the values in place.
Methods
1. iter()
IntegerSexp, RealSexp, LogicalSexp, and StringSexp provide the iter()
method so that you can access the values one by one.
for (i, e) in x.iter().enumerate() {
// ...snip...
}
Similarly, NumericSexp, which handles both integer and double, provides
iter_i32() and iter_f64(). But, this might allocate if the type conversion
is needed.
2. as_slice() (for integer and double)
IntegerSexp and RealSexp can expose their underlying C array as a Rust slice
by as_slice().
/// @export
#[savvy]
fn foo(x: IntegerSexp) -> savvy::Result<()> {
some_function_takes_slice(x.as_slice());
Ok(())
}
Similarly, NumericSexp, which handles both integer and double, provides
as_slice_i32() and as_slice_f64(). But, this might allocate if the type
conversion is needed.
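The "might allocate" caveat can be pictured with std::borrow::Cow in plain Rust. This is only a sketch of the borrow-or-convert idea; the Numeric enum and its as_f64() method are hypothetical names for this example, and savvy's actual implementation may be organised differently. NA handling is ignored here.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for a numeric input that is either integer or double.
pub enum Numeric<'a> {
    Integer(&'a [i32]),
    Real(&'a [f64]),
}

impl<'a> Numeric<'a> {
    /// Borrow the data when it is already f64; otherwise allocate a converted copy.
    pub fn as_f64(&self) -> Cow<'a, [f64]> {
        match self {
            // Same type as requested: no allocation, just hand out the slice.
            Numeric::Real(v) => Cow::Borrowed(*v),
            // Different type: every element is converted into a new Vec.
            Numeric::Integer(v) => Cow::Owned(v.iter().map(|&i| f64::from(i)).collect()),
        }
    }
}
```

If the caller knows the input type in advance, choosing the matching accessor avoids the extra allocation entirely.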
3. to_vec()
As the name indicates, to_vec() copies the values to a new Rust vector.
Copying can be costly for big data, but a vector is handy if you need to pass
the data around among Rust functions.
let v = x.to_vec();
some_function_takes_vec(v);
If a function requires a slice and the type is not integer or double, you have
no choice but to use to_vec() to create a new vector and then pass a reference to it as a slice.
let v = x.to_vec();
another_function_takes_slice(&v);
Missing values
There’s no concept of “missing value” in the corresponding Rust types. So,
a missing value looks like a normal value on Rust’s side.
The good news is that R uses sentinel values to represent NA, so it’s
possible to check whether a value is NA to R when the type is either i32,
f64 or &str.
- i32: The minimum value of int is used for representing NA.
- f64: A special value is used for representing NA.
- &str: A CHARSXP of string "NA" is used for representing NA; this cannot be distinguished by comparing the content of the string, but we can compare the pointer address of the underlying C char array.
By using NotAvailableValue trait, you can check if the value is NA by
is_na(), and refer to the sentinel value of NA by <T>::na(). If you care
about missing values, you always have to have an if branch for missing values
like below.
use savvy::NotAvailableValue;
/// @export
#[savvy]
fn sum_real(x: RealSexp) -> savvy::Result<savvy::Sexp> {
let mut sum: f64 = 0.0;
for e in x.iter() {
if !e.is_na() {
sum += e;
}
}
// ...snip...
}
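To make the sentinel idea concrete without depending on savvy, the following plain-Rust sketch models the integer case only. MissingAware is a hypothetical trait written for this example and is not savvy's NotAvailableValue; the only fact taken from the text above is that the minimum i32 value stands for NA.

```rust
/// Hypothetical trait modelled on the idea of NA-awareness; not savvy's own definition.
pub trait MissingAware {
    fn is_na(&self) -> bool;
    fn na() -> Self;
}

impl MissingAware for i32 {
    /// To Rust, i32::MIN is an ordinary number; only this check gives it the NA meaning.
    fn is_na(&self) -> bool {
        *self == i32::MIN
    }

    fn na() -> Self {
        i32::MIN
    }
}

/// Sum the values while skipping NA, widening to i64 so the sum itself cannot overflow
/// for realistic input lengths.
pub fn sum_skipping_na(x: &[i32]) -> i64 {
    x.iter()
        .filter(|v| !v.is_na())
        .map(|&v| i64::from(v))
        .sum()
}
```

Forgetting the is_na() branch would silently add -2147483648 to the total, which is why the check has to be written explicitly.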
The bad news is that this is not the case for bool. bool doesn’t have is_na() or
na(). NA is treated as TRUE without any errors. So, you have to make sure
the input doesn’t contain any missing values on R’s side. For example, this
function is not an identity function.
/// @export
#[savvy]
fn identity_logical(x: LogicalSexp) -> savvy::Result<savvy::Sexp> {
let mut out = OwnedLogicalSexp::new(x.len())?;
for (i, e) in x.iter().enumerate() {
out.set_elt(i, e)?;
}
out.into()
}
identity_logical(c(TRUE, FALSE, NA))
#> [1] TRUE FALSE TRUE
The good news is that LogicalSexp has an expert-only method as_slice_raw().
See “Logical” section of Integer, Real, String, Logical, And Complex
for the details.
Handling a scalar NA
You might find it a bit inconvenient that functions that take RealSexp
or IntegerSexp don’t accept a bare NA, because NA is logical.
sum_real(NA)
#> Error:
#> ! Argument `x` must be double, not logical
If you want to accept such a scalar NA, the primary recommendation is to
handle it in R code. But, you can also use Sexp as input. You can detect a
missing value by is_scalar_na() and then convert it to a specific type by
try_into().
/// @export
#[savvy]
fn sum_v2(x: savvy::Sexp) -> savvy::Result<savvy::Sexp> {
if x.is_scalar_na() {
return 0.0.try_into();
}
let x_real: RealSexp = x.try_into()?;
let mut sum: f64 = 0.0;
for e in x_real.iter() {
if !e.is_na() {
sum += e;
}
}
// ...snip...
}
Handling Vector Output
Basically, there are two ways to prepare an output to the R session.
1. Create a new R object first and put values on it
An owned SEXP can be allocated by using Owned{type}Sexp::new(). new() takes
the length of the vector as the argument. If you need the same length of vector
as the input, you can pass the len() of the input SEXP.
new() returns Result because the memory allocation can fail when the
vector is too large. You can probably just add ? to it to handle the error.
let mut out = OwnedStringSexp::new(x.len())?;
Use set_elt() to put the values one by one. Note that you can also assign
values like out[i] = value for integer and double. See Type-specific
Topics for more details.
for (i, e) in x.iter().enumerate() {
// ...snip...
out.set_elt(i, &format!("{e}_{y}"))?;
}
You can use set_na() to set the specified element as NA. For example, it’s a
common case to use this in order to propagate the missingness like below.
for (i, e) in x.iter().enumerate() {
// ...snip...
if e.is_na() {
out.set_na(i)?;
} else {
// ...snip...
}
}
After putting the values to the vector, you can convert it to Result<Sexp> by
into().
/// @export
#[savvy]
fn foo(x: StringSexp) -> savvy::Result<savvy::Sexp> {
let mut out = OwnedStringSexp::new(x.len())?;
// ...snip...
out.into()
}
2. Convert a Rust vector by methods like try_into()
Another way is to use a Rust vector to store the results and convert it to an R
object at the end of the function. This is also fallible because this anyway
needs to create a new R object under the hood, which can fail. So, this time,
the conversion is try_into(), not into().
// Let's not consider for handling NAs at all for simplicity...
/// @export
#[savvy]
fn times_two(x: IntegerSexp) -> savvy::Result<savvy::Sexp> {
let out: Vec<i32> = x.iter().map(|v| v * 2).collect();
out.try_into()
}
Note that, while this looks handy, this might not be very efficient; for example,
times_two() above allocates a Rust vector, and then copies the values into a new
R vector in try_into(). The copying cost can be non-negligible when the vector
is very large.
try_from_slice()
The same conversions are also available in the form of
Owned{type}Sexp::try_from_slice(). While this says “slice”, this accepts
AsRef<[T]>, which means both Vec<T> and &[T] can be used.
For converting the return value, try_into() is probably shorter in most of the
cases. But, sometimes you might find this useful (e.g., the return value is a
list and you need to construct the elements of it).
/// @export
#[savvy]
fn times_two2(x: IntegerSexp) -> savvy::Result<savvy::Sexp> {
let out: Vec<i32> = x.iter().map(|v| v * 2).collect();
let out_sexp = OwnedIntegerSexp::try_from_slice(out)?;
out_sexp.into()
}
try_from_iter()
If you only have an iterator, try_from_iter() is more efficient. This example
function is the case. The previous examples first collect()ed into a Vec,
but it’s not necessary in theory.
/// @export
#[savvy]
fn times_two3(x: IntegerSexp) -> savvy::Result<savvy::Sexp> {
let iter = x.iter().map(|v| v * 2);
let out_sexp = OwnedIntegerSexp::try_from_iter(iter)?;
out_sexp.into()
}
Note that, if you already have a slice or vec, you should use try_from_slice()
instead of calling iter() on the slice or vec and using try_from_iter(). In
such cases, try_from_slice() is more performant for integer, double, and
complex because it just copies the underlying memory into SEXP rather than
handling the elements one by one.
Handling Scalar
Input
Scalar inputs are handled transparently. The corresponding types are shown in the table below.
/// @export
#[savvy]
fn scalar_input_int(x: i32) -> savvy::Result<()> {
savvy::r_println!("{x}");
Ok(())
}
| R type | Rust scalar type |
|---|---|
| integer | i32 |
| double | f64 |
| logical | bool |
| raw | u8 |
| character | &str |
| complex | num_complex::Complex64 |
| integer or double | savvy::NumericScalar |
NumericScalar
NumericScalar is a special type that can handle both integer and double. You
can get the value from it by as_i32() for i32, or as_f64() for f64.
These methods convert the value if the input type is different from the target
type.
#[savvy]
fn times_two_numeric_i32_scalar(x: NumericScalar) -> savvy::Result<Sexp> {
let v = x.as_i32()?;
if v.is_na() {
(i32::na()).try_into()
} else {
(v * 2).try_into()
}
}
Note that, while as_f64() is infallible, as_i32() can fail when the
conversion is from f64 to i32 and
- the value is Inf or -Inf
- the value is out of range for i32
- the value is not integer-ish (e.g. 1.1)
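The three failure conditions above can be sketched as a standalone check in plain Rust. The function name f64_to_i32_checked and the String error type are assumptions made for this example; savvy's own conversion may word its errors differently. How NA or NaN should be treated is not stated in the text, so this sketch simply rejects NaN together with the infinities.

```rust
/// Convert an f64 to i32 only when the value is finite, integer-ish, and in range.
/// Hypothetical helper for illustration; not savvy's implementation.
pub fn f64_to_i32_checked(x: f64) -> Result<i32, String> {
    // Rejects Inf, -Inf, and (as an assumption of this sketch) NaN.
    if !x.is_finite() {
        return Err(format!("{x} is not finite"));
    }
    // A non-zero fractional part means the value is not integer-ish (e.g. 1.1).
    if x.fract() != 0.0 {
        return Err(format!("{x} is not integer-ish"));
    }
    // Both bounds are exactly representable as f64, so the comparison is exact.
    if x < f64::from(i32::MIN) || x > f64::from(i32::MAX) {
        return Err(format!("{x} is out of range for i32"));
    }
    Ok(x as i32)
}
```

Doing the checks before the cast matters because a bare `as` cast would saturate or truncate silently instead of reporting the problem.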
For convenience, NumericScalar also provides a conversion to usize by
as_usize(). What’s good is that this can handle integer-ish numeric, which
means you can allow users to input a larger number than the integer max
(2147483647)!
fn usize_to_string_scalar(x: NumericScalar) -> savvy::Result<Sexp> {
let x_usize = x.as_usize()?;
x_usize.to_string().try_into()
}
usize_to_string_scalar(2147483648)
#> [1] "2147483648"
Output
Just like a Rust vector, a Rust scalar value can be converted into Sexp by
try_into(). It’s as simple as this:
/// @export
#[savvy]
fn scalar_output_int() -> savvy::Result<savvy::Sexp> {
1.try_into()
}
Alternatively, the same conversion is available in the form of
Owned{type}Sexp::try_from_scalar().
/// @export
#[savvy]
fn scalar_output_int() -> savvy::Result<savvy::Sexp> {
let out = OwnedIntegerSexp::try_from_scalar(1)?;
out.into()
}
Missing values
If the type of the input is scalar, NA is always rejected. This is
inconsistent with the rule for vector input, but, this is my design decision in
the assumption that a scalar missing value is rarely found useful on Rust’s
side.
/// @export
#[savvy]
fn identity_logical_single(x: bool) -> savvy::Result<savvy::Sexp> {
let mut out = OwnedLogicalSexp::new(1)?;
out.set_elt(0, x)?;
out.into()
}
identity_logical_single(NA)
#> Error in identity_logical_single(NA) :
#> Must be length 1 of non-missing value
If you want to accept NA, the primary recommendation is to handle it in R
code. But, you can also use Sexp as input. You can detect a missing value
by is_scalar_na() and then convert it to a specific type by try_into().
/// @export
#[savvy]
fn times_two_numeric_i32_scalar_v2(x: savvy::Sexp) -> savvy::Result<savvy::Sexp> {
if x.is_scalar_na() {
return (i32::na()).try_into();
}
let x_num: NumericScalar = x.try_into()?;
let v = x_num.as_i32()?;
// Note: NA check is already done, so you don't need to check v.is_na()
(v * 2).try_into()
}
Optional Argument
To represent an optional argument, you can wrap it with Option. Then, the
corresponding R function sets the default value of NULL on the argument.
#[savvy]
fn default_value_vec(x: Option<IntegerSexp>) -> savvy::Result<Sexp> {
if let Some(x) = x {
x.iter().sum::<i32>().try_into()
} else {
(-1).try_into()
}
}
function(x = NULL) {
.Call(savvy_default_value_vec__impl, x)
}
This function works with or without the argument.
default_value_vec(1:10)
#> [1] 55
default_value_vec()
#> [1] -1
Type-specific Topics
You can use these types as an argument of a #[savvy] function.
| R type | vector | scalar |
|---|---|---|
| integer | IntegerSexp | i32 |
| double | RealSexp | f64 |
| integer or double | NumericSexp | NumericScalar |
| logical | LogicalSexp | bool |
| raw | RawSexp | u8 |
| character | StringSexp | &str |
| complex [1] | ComplexSexp | Complex64 |
| list | ListSexp | n/a |
| (any) | Sexp | n/a |
If you want to handle multiple types, you can cast an Sexp into a specific
type by .into_typed() and write match branches to deal with each type. This
is important when the interface returns Sexp. For example, ListSexp returns
Sexp because the list element can be any type. For more details about List,
please read List section.
#[savvy]
fn print_list(x: ListSexp) -> savvy::Result<()> {
for (k, v) in x.iter() {
let content = match v.into_typed() {
TypedSexp::Integer(x) => {
format!(
"integer [{}]",
x.iter().map(|i| i.to_string()).collect::<Vec<String>>().join(", ")
)
}
TypedSexp::Real(x) => {
format!(
"double [{}]",
x.iter().map(|r| r.to_string()).collect::<Vec<String>>().join(", ")
)
}
TypedSexp::Logical(x) => {
format!(
"logical [{}]",
x.iter().map(|l| if l { "TRUE" } else { "FALSE" }).collect::<Vec<&str>>().join(", ")
)
}
TypedSexp::String(x) => {
format!(
"character [{}]",
x.iter().collect::<Vec<&str>>().join(", ")
)
}
TypedSexp::List(_) => "list".to_string(),
TypedSexp::Null(_) => "NULL".to_string(),
_ => "other".to_string(),
};
let name = if k.is_empty() { "(no name)" } else { k };
r_print!("{name}: {content}\n");
}
Ok(())
}
Likewise, NumericSexp also provides into_typed(). You can match it with
either IntegerSexp or RealSexp and apply an appropriate function.
Alternatively, you can rely on the type conversion that NumericSexp provides.
See more details in the next section.
#[savvy]
fn identity_num(x: NumericSexp) -> savvy::Result<savvy::Sexp> {
match x.into_typed() {
NumericTypedSexp::Integer(i) => identity_int(i),
NumericTypedSexp::Real(r) => identity_real(r),
}
}
[1] Complex is optionally supported under the feature flag complex.
Integer, Real, String, Logical, Raw, And Complex
Integer and real
In cases of integer (IntegerSexp, OwnedIntegerSexp) and real (RealSexp,
OwnedRealSexp), the internal representation of the SEXPs matches the Rust
type we expect, i.e., i32 and f64. Taking advantage of this, these types have
more methods than other types:
- as_slice() and as_mut_slice()
- Index and IndexMut
- efficient TryFrom<&[T]>
as_slice() and as_mut_slice()
These types can expose their underlying C array as a Rust slice by as_slice().
as_mut_slice() is available only for the owned versions. So, you don’t need to
use to_vec() to create a new vector just to pass the data to the function that
requires slice.
/// @export
#[savvy]
fn foo(x: IntegerSexp) -> savvy::Result<()> {
some_function_takes_slice(x.as_slice());
Ok(())
}
Index and IndexMut
You can also access the underlying data with the index syntax ([]). These are available
only for the owned versions. This means you can write an assignment like
below instead of using set_elt().
/// @export
#[savvy]
fn times_two(x: IntegerSexp) -> savvy::Result<savvy::Sexp> {
let mut out = OwnedIntegerSexp::new(x.len())?;
for (i, &v) in x.iter().enumerate() {
if v.is_na() {
out[i] = i32::na();
} else {
out[i] = v * 2;
}
}
out.into()
}
Efficient TryFrom<&[T]>
TryFrom<&[T]> is not special to real and integer, but the implementation is
different from that of logical and string; since the internal representations
are the same, savvy uses copy_from_slice(), which does a
memcpy, to copy the data efficiently (in logical and string case, the values
are copied one by one).
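The difference between the two copying strategies can be shown on ordinary Rust buffers. This is a sketch of the general technique rather than savvy's code: copy_bulk and copy_one_by_one are hypothetical names, and a Vec<i32> stands in for the memory of an R vector.

```rust
/// Source and destination share the same element type, so the whole block
/// can be copied in one call (copy_from_slice panics if the lengths differ).
pub fn copy_bulk(src: &[i32]) -> Vec<i32> {
    let mut dst = vec![0; src.len()];
    dst.copy_from_slice(src);
    dst
}

/// bool and i32 have different layouts, so each element has to be
/// converted and stored individually.
pub fn copy_one_by_one(src: &[bool]) -> Vec<i32> {
    let mut dst = vec![0; src.len()];
    for (slot, &b) in dst.iter_mut().zip(src) {
        *slot = i32::from(b);
    }
    dst
}
```

The bulk version is only possible because nothing has to be reinterpreted per element, which is exactly the situation for integer and real vectors.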
NumericSexp
It’s ideal to ensure the function takes the expected type on R’s side (e.g., you
can use vctrs::vec_cast(), or define S3 methods for integer and double
separately). But, it’s not always possible.
You can use NumericSexp to accept both real and integer. NumericSexp
provides methods to get either i32 or f64 values:
- as_slice_i32() returns &[i32]. This is fallible.
- as_slice_f64() returns &[f64].
- iter_i32() returns an iterator of Result<i32>.
- iter_f64() returns an iterator of f64.
These functions return the underlying data directly if the type is the same as
wanted, and otherwise convert the values. If the conversion is from f64 to i32,
it fails when any of the values is
- Inf or -Inf
- out of range for i32
- not integer-ish (e.g. 1.1)
For convenience, NumericSexp also provides iter_usize(), which returns an
iterator of Result<usize>.
With NumericSexp, you can rewrite the above times_two function like this:
#[savvy]
fn times_two(x: NumericSexp) -> savvy::Result<Sexp> {
let mut out = OwnedIntegerSexp::new(x.len())?;
for (i, v) in x.iter_i32().enumerate() {
let v = v?;
if v.is_na() {
out[i] = i32::na();
} else {
out[i] = v * 2;
}
}
out.into()
}
Alternatively, you can use .into_typed() and match the result to apply an
appropriate function depending on the type. In this case, you need to define two
different functions, but this might be useful when the logic is very different
for integer values and real values.
#[savvy]
fn times_two(x: NumericSexp) -> savvy::Result<savvy::Sexp> {
match x.into_typed() {
NumericTypedSexp::Integer(i) => times_two_int(i),
NumericTypedSexp::Real(r) => times_two_real(r),
}
}
Logical
While logical is 3-state (TRUE, FALSE and NA) on R’s side, bool can
represent only 2 states (true and false). This mismatch is a headache. There
are many possible ways to handle this (e.g., use Option<bool>), but savvy
chose to convert NA to true silently, assuming NA is not useful on Rust’s
side anyway. So, you have to make sure the input logical vector doesn’t contain
NA on R’s side. For example,
wrapper_of_some_savvy_fun <- function(x) {
out <- rep(NA, length(x))
idx <- !is.na(x)
# apply the function only to non-NA elements
out[idx] <- some_savvy_fun(x[idx])
out
}
If you really want to handle the 3 states, use an expert-only method
as_slice_raw(). This returns &[i32] instead of &[bool]. Why i32? It’s
the internal representation of a logical vector, which is the same as an integer
vector. By treating the data as i32, you can use is_na().
use savvy::NotAvailableValue; // for is_na()
/// @export
#[savvy]
fn flip_logical_expert_only(x: LogicalSexp) -> savvy::Result<savvy::Sexp> {
let mut out = OwnedLogicalSexp::new(x.len())?;
for (i, e) in x.as_slice_raw().iter().enumerate() {
if e.is_na() {
out.set_na(i)?;
} else {
out.set_elt(i, *e != 1)?; // 1 means TRUE
}
}
out.into()
}
flip_logical_expert_only(c(TRUE, FALSE, NA))
#> [1] FALSE  TRUE    NA
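For comparison, the three-state handling can be modelled in plain Rust with Option<bool>, the alternative mentioned at the top of this section. decode_logical and flip are hypothetical helpers for this sketch; it assumes, as in the integer case, that the NA sentinel in the raw i32 data is the minimum i32 value.

```rust
/// Decode one raw logical value: the sentinel becomes None,
/// zero becomes false, and anything else becomes true.
pub fn decode_logical(raw: i32) -> Option<bool> {
    if raw == i32::MIN {
        None
    } else {
        Some(raw != 0)
    }
}

/// Flip every non-missing value and keep missing values missing.
pub fn flip(raw: &[i32]) -> Vec<Option<bool>> {
    raw.iter()
        .map(|&v| decode_logical(v).map(|b| !b))
        .collect()
}
```

Option<bool> makes the missing state impossible to forget, at the cost of an extra conversion step that savvy chose not to impose on every logical vector.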
String
STRSXP is a vector of CHARSXP, not something like *char. So, it’s not
possible to expose the internal representation as &str. Instead, it requires
several calls to R’s C API. To get a &str:
- STRING_ELT() to subset a CHARSXP
- R_CHAR() to extract the string from CHARSXP
Similarly, to set a &str:
- Rf_mkCharLenCE() to convert &str to a CHARSXP
- SET_STRING_ELT() to put the CHARSXP to the STRSXP
This is a bit costly. So, if the strings need to be referenced and updated
frequently, probably you should avoid using OwnedStringSexp as a substitute of
Vec<String>.
Encoding and 'static lifetime
While Rust’s string is UTF-8, R’s string is not guaranteed to be UTF-8. R
provides Rf_translateCharUTF8() to convert the string to UTF-8. However, savvy
chose not to use it. There are two reasons:
- As of version 4.2.0, R uses UTF-8 as the native encoding even on Windows systems. While this is not the case on old Windows systems, I bravely assume that it’s rare and that time will solve the problem.
- The result of R_CHAR() is the string stored in R_StringHash, the global CHARSXP cache. In my understanding, this will never be removed during the session. So, this allows savvy to mark the result &str with 'static lifetime. However, the result of Rf_translateCharUTF8() is on an R_alloc()-ed memory (code), which can be reclaimed by GC.
In short, in order to stick with 'static lifetime for the sake of simplicity,
I decided to neglect a relatively rare case. Note that invalid UTF-8 characters
are rejected (= currently, silently replaced with "") by CStr, so it’s not
very unsafe.
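The fallback to an empty string can be reproduced with the standard library alone. The helper name to_str_or_empty is hypothetical, and this is only a sketch of the behaviour described above, not a copy of savvy's code.

```rust
use std::ffi::CStr;

/// Interpret a NUL-terminated byte buffer as UTF-8, falling back to ""
/// when the buffer is malformed or is not valid UTF-8.
pub fn to_str_or_empty(bytes_with_nul: &[u8]) -> &str {
    CStr::from_bytes_with_nul(bytes_with_nul)
        .ok()
        // CStr::to_str() validates UTF-8 and returns an error on invalid bytes.
        .and_then(|c| c.to_str().ok())
        .unwrap_or("")
}
```

Because the validation happens before a &str is ever produced, invalid input cannot turn into an invalid Rust string; it is just lost.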
Raw
A raw vector is a sequence of u8, which can be used for representing various
binary data. But, please be aware that you can use a Rust struct (see the
section about struct) to store the data instead of copying the
whole data into R’s memory.
Complex
Complex is optionally supported under feature flag complex. If it’s enabled,
you can use ComplexSexp and OwnedComplexSexp to use a complex vector for
input or output, and you can extract the slice of num_complex::Complex64 from
it.
/// @export
#[savvy]
fn abs_complex(x: savvy::ComplexSexp) -> savvy::Result<savvy::Sexp> {
let mut out = savvy::OwnedRealSexp::new(x.len())?;
for (i, c) in x.iter().enumerate() {
if !c.is_na() {
out[i] = (c.re * c.re + c.im * c.im).sqrt();
} else {
out.set_na(i)?;
}
}
out.into()
}
List
List is a different beast. It’s pretty complex. You might think of it as a
HashMap, but it’s different in that:
- List elements can be either named or unnamed individually (e.g., list(a = 1, 2, c = 3)).
- List names can be duplicated (e.g., list(a = 1, a = 2)).
To make things simple, savvy treats a list as a pair of the same length of
- a character vector containing names, using "" (empty string) to represent missingness (actually, this is the convention of R itself)
- a collection of arbitrary SEXP elements
Since list is a very convenient data structure in R, you can come up with a lot of convenient interfaces for list. However, savvy intentionally provides only very limited interfaces. In my opinion, Rust should touch list data as little as possible because it’s too complex.
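The names-plus-values pairing described above can be modelled in plain Rust to show why a HashMap would not fit. NamedList and Value are hypothetical types invented for this sketch; they are not part of savvy.

```rust
/// A few possible element types; a real list element can be any SEXP.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Integer(Vec<i32>),
    Real(Vec<f64>),
    Character(Vec<String>),
    Null,
}

/// Two parallel vectors of the same length: names (with "" for unnamed) and values.
pub struct NamedList {
    names: Vec<String>,
    values: Vec<Value>,
}

impl NamedList {
    pub fn new() -> Self {
        Self { names: Vec::new(), values: Vec::new() }
    }

    /// Append an element; pass "" as the name for an unnamed element.
    pub fn push(&mut self, name: &str, value: Value) {
        self.names.push(name.to_string());
        self.values.push(value);
    }

    /// Iterate over (name, value) pairs by zipping the two vectors.
    pub fn iter(&self) -> impl Iterator<Item = (&str, &Value)> + '_ {
        self.names.iter().map(String::as_str).zip(self.values.iter())
    }
}
```

Unlike a HashMap, this layout keeps insertion order, allows the same name to appear twice, and lets any element go unnamed, which mirrors list(a = 1, 2, a = 3) in R.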
Read values from a list
names_iter()
names_iter() returns an iterator of &str.
/// @export
#[savvy]
fn print_list_names(x: ListSexp) -> savvy::Result<()> {
for k in x.names_iter() {
if k.is_empty() {
r_println!("(no name)");
} else {
r_println!("{k}");
}
}
Ok(())
}
print_list_names(list(a = 1, 2, c = 3))
#> a
#> (no name)
#> c
values_iter()
values_iter() returns an iterator of Sexp enum. You can convert Sexp to
TypedSexp by .into_typed() and then use match to extract the inner data.
/// @export
#[savvy]
fn print_list_values_if_int(x: ListSexp) -> savvy::Result<()> {
for v in x.values_iter() {
match v.into_typed() {
TypedSexp::Integer(i) => r_println!("int {}", i.as_slice()[0]),
_ => r_println!("not int"),
}
}
Ok(())
}
print_list_values_if_int(list(a = 1, b = 1L, c = "1"))
#> not int
#> int 1
#> not int
iter()
If you want pairs of name and value, you can use iter(). This is basically a
std::iter::Zip of the two iterators explained above.
/// @export
#[savvy]
fn print_list(x: ListSexp) -> savvy::Result<()> {
for (k, v) in x.iter() {
// ...snip...
}
Ok(())
}
Put values to a list
new()
OwnedListSexp’s new() differs from that of other types; the second argument
(named) indicates whether the list is named or unnamed. If false, the list
has no names, and all operations on names like set_name() are simply
ignored.
set_name()
set_name() simply sets a name at the specified position.
/// @export
#[savvy]
fn list_with_no_values() -> savvy::Result<savvy::Sexp> {
let mut out = OwnedListSexp::new(2, true)?;
out.set_name(0, "foo")?;
out.set_name(1, "bar")?;
out.into()
}
list_with_no_values()
#> $foo
#> NULL
#>
#> $bar
#> NULL
#>
set_value()
set_value() sets a value at the specified position. “Value” is an arbitrary
type that implements the Into<Sexp> trait. Since all {type}Sexp types
implement it, you can simply pass it like below.
/// @export
#[savvy]
fn list_with_no_names() -> savvy::Result<savvy::Sexp> {
let mut out = OwnedListSexp::new(2, false)?;
let mut e1 = OwnedIntegerSexp::new(1)?;
e1[0] = 100;
let mut e2 = OwnedStringSexp::new(1)?;
e2.set_elt(0, "cool")?;
out.set_value(0, e1)?;
out.set_value(1, e2)?;
out.into()
}
list_with_no_names()
#> [[1]]
#> [1] 100
#>
#> [[2]]
#> [1] "cool"
#>
set_name_and_value()
set_name_and_value() is simply set_name() + set_value(). Probably this is
what you need in most of the cases.
/// @export
#[savvy]
fn list_with_both() -> savvy::Result<savvy::Sexp> {
let mut out = OwnedListSexp::new(2, true)?;
let mut e1 = OwnedIntegerSexp::new(1)?;
e1[0] = 100;
let mut e2 = OwnedStringSexp::new(1)?;
e2.set_elt(0, "cool")?;
out.set_name_and_value(0, "foo", e1)?;
out.set_name_and_value(1, "bar", e2)?;
out.into()
}
list_with_both()
#> $foo
#> [1] 100
#>
#> $bar
#> [1] "cool"
#>
Struct
Basic usage
You can use #[savvy] macro on a struct to convert it to an R object. More
precisely, this macro adds implementations of TryFrom between Sexp and the
struct so you can specify the type as the function input and output.
/// @export
#[savvy]
struct Person {
pub name: String,
}
The most handy form is to implement methods and associated functions for the
type. You can add #[savvy] before the impl block to make it available on R
sessions.
#[savvy]
impl Person {
fn new() -> Self {
Self {
name: "".to_string(),
}
}
fn set_name(&mut self, name: &str) -> savvy::Result<()> {
self.name = name.to_string();
Ok(())
}
fn name(&self) -> savvy::Result<savvy::Sexp> {
let mut out = OwnedStringSexp::new(1)?;
out.set_elt(0, &self.name)?;
out.into()
}
fn say_hello() -> savvy::Result<savvy::Sexp> {
"Hello!".try_into()
}
}
If we focus on the arguments, there are two types of functions here:
- method: the first argument is self¹ (set_name() and name())
- associated function: no self argument (new() and say_hello())
On an R session, associated functions are available as elements of an R object
that has the same name as the Rust type (in this case, Person).
p <- Person$new()
Person$say_hello()
#> [1] "Hello!"
Among these two associated functions, new() is a constructor which returns
Self. This creates an instance of the struct.
The instance has the methods. You can call them like below.
# create an instance
p <- Person$new()
# call methods
p$set_name("たかし")
p$name()
#> [1] "たかし"
The instance has two S3 classes that you can use for implementing S3 methods for the class:
PKG_NAME::STRUCT_NAME and STRUCT_NAME.
For example, in this case, if the package name is savvyExamples, p is a savvyExamples::Person and a Person.
class(p)
#> [1] "savvyExamples::Person" "Person" "savvy_savvyExamples__sealed"
The short one is handy to use. For example, you can implement a print.Person method.
print.Person <- function(x, ...) print(x$name())
registerS3method("print", "Person", print.Person)
p
#> [1] "たかし"
In order to avoid name collisions between packages (e.g., since Person is a very general name, other packages might use the same S3 class name), you can also use the longer one instead.
Struct output
The above example uses -> Self as the return type of the associated function,
but it’s not the only option. You can wrap it with savvy::Result<Self>.
#[savvy]
impl Person {
fn new_fallible() -> savvy::Result<Self> {
let x = Self {
name: "".to_string(),
};
Ok(x)
}
}
More generally, you can specify an arbitrary struct marked with #[savvy] as
the return type. For example, you can create an instance of the struct outside
of impl,
/// @export
#[savvy]
fn create_person() -> savvy::Result<Person> {
let x = Person {
name: "".to_string(),
};
Ok(x)
}
and you can generate another type of instance from an instance.
/// @export
#[savvy]
struct UpperPerson {
pub name: String,
}
#[savvy]
impl Person {
fn reborn_as_upper_person(&self) -> savvy::Result<UpperPerson> {
let x = UpperPerson {
name: self.name.to_uppercase(),
};
Ok(x)
}
}
Struct input
You can also use the struct as the argument of a #[savvy]-ed function. Note
that, in most of the cases, you should specify &T or &mut T, not T.
/// @export
#[savvy]
fn get_name_external(x: &Person) -> savvy::Result<savvy::Sexp> {
x.name()
}
get_name_external(p)
#> [1] "たかし"
&T vs T
If you are familiar with Rust, you should know the difference. T moves the
ownership while &T is just borrowing. But, why does this matter in savvy? What
actually happens when you specify T in a #[savvy] function?
Say, you mistyped &Person above as Person like this:
/// @export
#[savvy]
fn get_name_external2(x: Person) -> savvy::Result<savvy::Sexp> {
x.name()
}
This function works the same as the previous one. The result of the first call is the same. Yay!
get_name_external2(p)
#> [1] "たかし"
Then, what’s wrong? You’ll find out when you call the function on the same object a second time; it doesn’t work anymore.
get_name_external2(p)
#> Error: This external pointer is already consumed or deleted
This is because the Person object is already moved. The R variable p doesn’t
hold the ownership anymore. So, you should almost always specify &T (or &mut T),
not T.
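What happens to p can be imitated in plain Rust with an Option slot that is emptied when the value is moved out. This is a conceptual sketch only: Slot, get(), and take() are hypothetical names, and this is not how savvy actually manages external pointers.

```rust
pub struct Person {
    pub name: String,
}

/// Stand-in for the R-side handle that owns a Rust value (hypothetical).
pub struct Slot<T>(Option<T>);

impl<T> Slot<T> {
    pub fn new(value: T) -> Self {
        Self(Some(value))
    }

    /// `&T`-style access: the value stays in the slot, so this can be repeated.
    pub fn get(&self) -> Result<&T, String> {
        self.0.as_ref().ok_or_else(|| "already consumed".to_string())
    }

    /// `T`-style access: the value is moved out, leaving the slot empty.
    pub fn take(&mut self) -> Result<T, String> {
        self.0.take().ok_or_else(|| "already consumed".to_string())
    }
}
```

Calling get() any number of times succeeds, while the second take() returns an error, which mirrors the second call of get_name_external2(p).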
The same is true for a method. Use &self and &mut self instead of self
unless you want such a method like this!
#[savvy]
impl Person {
fn invalidate(self) -> savvy::Result<()> {
r_println!("This instance is invalidated!");
Ok(())
}
}
When is T useful?
You might wonder why savvy allows this specification at all. Are there any cases when this is useful?
The answer is yes. The advantage of moving the ownership is that you can avoid
copying. For example, suppose there’s a type HeavyData, which contains a huge
amount of data, and HeavyDataBundle, which bundles two HeavyData instances.
/// @export
#[savvy]
#[derive(Clone)]
struct HeavyData(Vec<i32>);
/// @export
#[savvy]
struct HeavyDataBundle {
data1: HeavyData,
data2: HeavyData,
}
#[savvy]
impl HeavyData {
// ...snip...
}
HeavyDataBundle requires the ownership of the two HeavyData instances. So, if the input
is a reference (&), you need to clone() the data, which can be costly.
/// @export
#[savvy]
impl HeavyDataBundle {
fn new(
data1: &HeavyData,
data2: &HeavyData,
) -> Self {
Self {
data1: data1.clone(),
data2: data2.clone(),
}
}
}
In this case, you can move the ownership to avoid copying.
/// @export
#[savvy]
impl HeavyDataBundle {
fn new(
data1: HeavyData,
data2: HeavyData,
) -> Self {
Self { data1, data2 }
}
}
Of course, this is an expert-only usage and is rarely needed. Again, you should
almost always use &T or &mut T instead of T. Use T only if you are really sure
that a reference doesn’t work well.
Lifetime
The #[savvy] macro doesn’t support a struct with lifetimes. This is because
crossing the boundary of FFI means losing track of the lifetimes.
For example, the struct below contains a reference to a variable of usize.
However, once an instance of Foo is passed to R’s side, Rust cannot know
whether the variable is still alive when Foo is passed back to Rust’s side.
struct Foo<'a>(&'a usize);
Then, what should we do to deal with such structs? I’m yet to find the best practices, but you might be able to
- use a 'static lifetime (i.e. struct Foo(&'static usize)), probably by referencing a global variable
- instead of passing the struct itself to R, store the struct in a global std::sync::OnceLock<HashMap> and pass the key
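The idea of keeping the struct in a global map and passing only a key can be sketched as follows. This is a minimal illustration, assuming a Mutex guards the map so that it can be modified through a shared static; REGISTRY, register(), and lookup_value() are hypothetical names. The struct never leaves Rust, and only a plain integer key would be handed to R.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// A value we don't want to pass to R directly (hypothetical example type).
pub struct Foo {
    pub value: usize,
}

// Global storage: key -> Foo.
static REGISTRY: OnceLock<Mutex<HashMap<i32, Foo>>> = OnceLock::new();

fn registry() -> &'static Mutex<HashMap<i32, Foo>> {
    REGISTRY.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Store a value and return the key that could be handed to R as an integer.
/// Keys stay unique here because entries are never removed.
pub fn register(foo: Foo) -> Result<i32, String> {
    let mut map = registry().lock().map_err(|e| e.to_string())?;
    let key = map.len() as i32 + 1;
    map.insert(key, foo);
    Ok(key)
}

/// Look up the stored value by key; Ok(None) means the key is unknown.
pub fn lookup_value(key: i32) -> Result<Option<usize>, String> {
    let map = registry().lock().map_err(|e| e.to_string())?;
    Ok(map.get(&key).map(|foo| foo.value))
}
```

Errors are returned as Result rather than unwrapped, in line with the advice to avoid panics.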
External pointer?
Under the hood, the Person struct is stored in an EXTPTRSXP. But, you don’t
need to care about how to deal with the EXTPTRSXP. This is because it’s stored in
a closure environment on creation and never exposed to the user. As it’s
guaranteed on R’s side that self is always an EXTPTRSXP of Person, Rust
code just restores a Person instance from the EXTPTRSXP without any checks.
.savvy_wrap_Person <- function(ptr) {
e <- new.env(parent = emptyenv())
e$.ptr <- ptr
e$set_name <- Person_set_name(ptr)
e$name <- Person_name(ptr)
class(e) <- "Person"
e
}
Person <- new.env(parent = emptyenv())
Person$new <- function() {
.savvy_wrap_Person(.Call(Person_new__impl))
}
Person$say_hello <- function() {
.Call(Person_say_hello__impl)
}
Person_set_name <- function(self) {
function(name) {
invisible(.Call(Person_set_name__impl, self, name))
}
}
Person_name <- function(self) {
function() {
.Call(Person_name__impl, self)
}
}
It’s important to mention that savvy only wraps the EXTPTRSXP in a closure
environment when the type is used directly as the return type of the function.
If the user wants to return Person inside a List, for example, the external
pointer will be directly exposed to the user and it will be the user’s responsibility
to deal with it.
#[savvy]
struct Person {}
// This case savvy handles nicely.
/// @export
#[savvy]
impl Person {
fn new() -> savvy::Result<Person> {
Ok(Person {})
}
}
// In this case, the user is handed an external pointer.
/// @export
#[savvy]
fn create_list() -> savvy::Result<Sexp> {
let mut list = OwnedListSexp::new(1, false)?;
let person = Person {};
list.set_value(0, Sexp::try_from(person)?)?;
list.into()
}
in R:
> person = Person$new()
> print(person)
<environment: 0x0000027cf9d46a20>
attr(,"class")
[1] "Person"
> l = create_list()
> print(l)
[[1]]
<pointer: 0x0000000000000001>
Traps about protection
This is a slightly advanced topic. It’s okay to have a struct contain arbitrary
things. However, if you want to keep an SEXP passed from an R session in it, it’s your
responsibility to take care of the protection on it.
The SEXP passed from outside doesn’t need an additional protection at the time
of the function call because it belongs to some environment on the R session, which
means it’s not GC-ed accidentally. However, after the function call, it’s
possible that the SEXP loses its link to any other R objects. To prevent the
tragedy (i.e., an R session crash), you should create an owned version and copy the
values into it because savvy takes care of the protection on it. So, in short,
you should never define a struct like this:
struct Foo {
a: IntegerSexp
}
Instead, you should write
struct Foo {
a: OwnedIntegerSexp
}
¹ You should almost always use &self or &mut self, not self, except when you are an expert and your intention is really to consume it. This is discussed in the “&T vs T” section above.
Enum
Savvy supports fieldless enum to express the possible options for a
parameter. For example, if you define such an enum with #[savvy],
/// @export
#[savvy]
enum LineType {
Solid,
Dashed,
Dotted,
}
it will be available on R’s side as this.
LineType$Solid
LineType$Dashed
LineType$Dotted
You can use the enum type as the argument of such a function like this
/// @export
#[savvy]
fn plot_line(x: IntegerSexp, y: IntegerSexp, line_type: &LineType) -> savvy::Result<()> {
match line_type {
LineType::Solid => {
...
},
LineType::Dashed => {
...
},
LineType::Dotted => {
...
},
}
}
so that the users can use it instead of specifying it by an integer or a character, which might be mistyped.
plot_line(x, y, LineType$Solid)
Of course, you can achieve the same thing with i32 or &str as the input and
match the value. The difference is that an enum is typo-proof. But, you might find
it handier to use a plain integer or character.
/// @export
#[savvy]
fn plot_line(x: IntegerSexp, y: IntegerSexp, line_type: &str) -> savvy::Result<()> {
match line_type {
"solid" => {
...
},
"dashed" => {
...
},
"dotted" => {
...
},
_ => {
return Err(savvy_err!("Unsupported line type!"));
}
}
}
Limitation
As noted above, savvy supports only fieldless enums for simplicity. If you want to use an enum that contains some value, please wrap it with a struct.
// You don't need to mark this with #[savvy]
enum AnimalEnum {
Dog(String, f64),
Cat { name: String, weight: f64 },
}
/// @export
#[savvy]
struct Animal(AnimalEnum);
Also, savvy currently doesn’t support discriminants. For example, this one won’t compile.
/// @export
#[savvy]
enum HttpStatus {
Ok = 200,
NotFound = 404,
}
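If numeric codes are still needed, one possible workaround (a sketch, not a pattern taken from savvy itself) is to keep the enum fieldless and map the variants to codes in ordinary methods. The code() and from_code() names below are hypothetical, and the #[savvy] attribute is omitted so that the snippet is plain Rust.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum HttpStatus {
    Ok,
    NotFound,
}

impl HttpStatus {
    /// Map each variant to its numeric code.
    pub fn code(self) -> i32 {
        match self {
            HttpStatus::Ok => 200,
            HttpStatus::NotFound => 404,
        }
    }

    /// Reverse lookup; unknown codes yield None.
    pub fn from_code(code: i32) -> Option<Self> {
        match code {
            200 => Some(HttpStatus::Ok),
            404 => Some(HttpStatus::NotFound),
            _ => None,
        }
    }
}
```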
Error handling
To propagate your errors to the R session, you can return a savvy::Error.
savvy_err!() macro is a shortcut of savvy::Error::new(format!(...)) to
create a new error.
use savvy::savvy_err;
#[savvy]
fn raise_error() -> savvy::Result<savvy::Sexp> {
Err(savvy_err!("This is my custom error"))
}
raise_error()
#> Error: This is my custom error
Like anyhow, you can use ? to easily propagate any error that implements the
std::error::Error trait.
#[savvy]
fn no_such_file() -> savvy::Result<()> {
let _ = std::fs::read_to_string("no_such_file")?;
Ok(())
}
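The mechanism behind this kind of ? propagation can be sketched with the standard library alone. RError below is a hypothetical, simplified stand-in for an error type like savvy::Error, and the blanket From implementation is an assumption about the general technique (the one anyhow uses), not a copy of savvy's code.

```rust
/// A simplified stand-in for an R-facing error type (hypothetical).
#[derive(Debug)]
pub struct RError(pub String);

// Blanket conversion: any std error can become an RError, so `?` just works.
// RError itself deliberately does not implement std::error::Error; otherwise
// this impl would overlap with the built-in `impl From<T> for T`.
impl<E: std::error::Error> From<E> for RError {
    fn from(e: E) -> Self {
        RError(e.to_string())
    }
}

/// Parse an integer, converting the parse error automatically via `?`.
pub fn parse_number(s: &str) -> Result<i32, RError> {
    let n: i32 = s.parse()?; // ParseIntError -> RError via the blanket impl
    Ok(n)
}
```

A blanket conversion of this sort is also why a separate, hand-written conversion from a custom error type that implements std::error::Error would overlap with it.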
Custom error
If you want to implement your own error type and the conversion to
savvy::Error, it would conflict with the conversion of From<dyn std::error::Error>.
To avoid a compile error, please specify the use-custom-error feature to opt out of
the conversion.
savvy = { version = "...", features = ["use-custom-error"] }
Show a warning
To show a warning, you can use r_warn().
savvy::io::r_warn("foo")?;
Note that a warning can raise an error when options(warn = 2) is set, so you should not
ignore the error from r_warn(). The error should be propagated to the R
session.
Dealing with panic!
First of all, don’t use panic!
If you are familiar with extendr, you might be used to using panic! casually.
But, in the savvy framework, panic! crashes your R session. So, please don’t
use panic! directly. Also, please avoid operations that can cause panic!
(e.g., unwrap()) when you are unsure.
This is because, in Rust, the meaning of panic! is an unrecoverable
error. In theory, it’s a sign that something impossible happens and
there’s no hope of recovery so there should be no way but to terminate the
entire session. Savvy just respects what is supposed to happen.
But, if the session terminates immediately, it’s hard to investigate the cause. What can I do?
Use debug build
If DEBUG envvar is set to true on building (i.e., devtools::load_all()),
savvy catches panic! and shows the backtrace instead of crashing the R
session.
For example, if you write this Rust function and load it by devtools::load_all(),
#[savvy]
fn must_panic() -> savvy::Result<()> {
let x = &[1];
let _ = x[1]; // Rust's index starts from 0!
Ok(())
}
you’ll see such an error like this with a backtrace instead of the RStudio bomb icon. You can check the line of the file suggested in the error message to guess what was happening.
must_panic()
#> panic occured!
#>
#> Original message:
#> panicked at src\error_handling.rs:33:13:
#> index out of bounds: the len is 1 but the index is 1
#>
#> Backtrace:
#> ...
#> 18: std::panic::catch_unwind
#> at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04\library\std\src/panic.rs:142:14
#> 19: simple_savvy::error_handling::savvy_must_panic_inner
#> at .\src\rust\src\error_handling.rs:30:1
#> 20: must_panic
#> at .\src\rust\src\error_handling.rs:30:1
#> 21: must_panic__impl
#> at .\src\init.c:291:16
#> ...
#>
#> note: Run with `RUST_BACKTRACE=1` for a full backtrace.
#>
#>
#> Error: panic happened
Set panic="unwind"
As described above, panic! is an unrecoverable error. It should not be
recovered on the release build in principle.
That said, in some cases, panic! happens from the code out of your control.
For example, if it is thrown by one of the dependency crates, there’s little you
can do. You should report the problem to the author, but the behavior is not always
fixed immediately, nor is the fixed version always published promptly. Also, keep in
mind that, depending on what originates the error, some authors deliberately
prefer to use panic! instead of Result.
Note that panic! also happens in Rust’s standard library in situations such as division
by zero or an out-of-bounds error when indexing a Vec.
In such cases, you can change the following setting included in the template
Cargo.toml generated by savvy-cli init. Set this to panic = "unwind"
to gracefully convert a panic into an R error just like the debug build.
Note that the backtrace is not available on the release build because
there’s no debug info.
[profile.release]
# ...snip...
panic = "unwind"
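To illustrate what "converting a panic into an error" means, here is a sketch using only std::panic::catch_unwind. It shows the general mechanism, not savvy's actual implementation; call_catching_panic() and element_at() are hypothetical names. It only works when the panic strategy is "unwind"; with "abort", the process terminates before anything can be caught.

```rust
use std::panic;

/// Run `f`, turning a panic into an `Err` instead of letting it propagate.
pub fn call_catching_panic<F>(f: F) -> Result<i32, String>
where
    F: FnOnce() -> i32 + panic::UnwindSafe,
{
    match panic::catch_unwind(f) {
        Ok(value) => Ok(value),
        Err(payload) => {
            // Panic payloads are usually a &str or a String.
            let msg = if let Some(s) = payload.downcast_ref::<&str>() {
                s.to_string()
            } else if let Some(s) = payload.downcast_ref::<String>() {
                s.clone()
            } else {
                "unknown panic".to_string()
            };
            Err(format!("panic happened: {msg}"))
        }
    }
}

/// Indexing past the end of a slice panics, like the must_panic() example.
pub fn element_at(data: &[i32], i: usize) -> i32 {
    data[i]
}
```

For instance, call_catching_panic(|| element_at(&[1], 1)) returns an Err carrying the "index out of bounds" message instead of aborting.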
Handling Attributes
You sometimes need to deal with attributes like names and class. Savvy
provides the following methods for getting and setting the value of the
attribute.
| Attribute | Getter method | Setter method | Type |
|---|---|---|---|
| names | get_names() | set_names() | Vec<&str> |
| class | get_class() | set_class() | Vec<&str> |
| dim | get_dim() | set_dim() | &[i32] |
| arbitrary | get_attrib() | set_attrib() | Sexp |
The getter methods return Option<T> because the object doesn’t always have the
attribute. You can match the result like this:
/// @export
#[savvy]
fn get_class_int(x: IntegerSexp) -> savvy::Result<savvy::Sexp> {
match x.get_class() {
Some(class) => class.try_into(),
None => ().try_into(),
}
}
The setter methods are available only for owned SEXPs. The return type is
savvy::Result<()> because the conversion from a Rust type to SEXP is fallible.
/// @export
#[savvy]
fn set_class_int() -> savvy::Result<savvy::Sexp> {
let mut x = OwnedIntegerSexp::new(1)?;
x.set_class(&["foo", "bar"])?;
x.into()
}
For attributes other than names, class, dim, you can use get_attrib()
and set_attrib(). Since an attribute can store arbitrary values, the type is
Sexp. In order to extract the underlying value, you can use .into_typed()
and match.
/// @export
#[savvy]
fn print_attr_values_if_int(attr: &str, value: savvy::Sexp) -> savvy::Result<()> {
let attr_value = value.get_attrib(attr)?;
match attr_value.into_typed() {
TypedSexp::Integer(i) => r_println!("int {:?}", i.as_slice()),
_ => r_println!("not int"),
}
Ok(())
}
In order to set values, you can use .into() to convert from the owned SEXP to
a savvy::Sexp.
/// @export
#[savvy]
fn set_attr_int(attr: &str) -> savvy::Result<savvy::Sexp> {
let s: &[i32] = &[1, 2, 3];
let attr_value: OwnedIntegerSexp = s.try_into()?;
let mut out = OwnedIntegerSexp::new(1)?;
out.set_attrib(attr, attr_value.into())?;
out.into()
}
Handling Data Frames
A data.frame is a list. You should simply handle it as a list in Rust code, and
all data.frame-related operations should be done in R code.
For example, if you want to return the result as a data.frame, the Rust
function should return a list and be wrapped by an R function that converts the
list into a data.frame. tibble::as_tibble() should be the right choice for
this purpose. Or, if you prefer lightweight dependencies, you can use
vctrs::new_data_frame(), or simply as.data.frame().
/// @export
#[savvy]
fn foo_impl() -> savvy::Result<savvy::Sexp> {
// create a named list
let mut out = savvy::OwnedListSexp::new(2, true)?;
let x: Vec<f64> = some_function();
let y: Vec<f64> = another_function();
out.set_name_and_value(0, "x", OwnedRealSexp::try_from_slice(x)?)?;
out.set_name_and_value(1, "y", OwnedRealSexp::try_from_slice(y)?)?;
out.into()
}
foo <- function() {
result <- foo_impl()
tibble::as_tibble(result)
}
Handling Factors
A factor is internally an integer vector with the levels attribute. You can
handle this on Rust’s side, but the recommended way is to write a wrapper R
function to convert the factor vector to a character vector.
Say there’s a Rust function that takes a character vector as its argument.
/// @export
#[savvy]
fn foo_impl(x: StringSexp) -> savvy::Result<()> {
...
}
Then, you can write a function like below to convert the input to a character
vector. If you want better validation, you can use vctrs::vec_cast() instead.
foo <- function(x) {
x <- as.character(x)
foo_impl(x)
}
If you need the information about the order of the levels, you should pass it as another argument.
/// @export
#[savvy]
fn foo_impl2(x: StringSexp, levels: StringSexp) -> savvy::Result<()> {
...
}
foo2 <- function(x) {
levels <- levels(x)
x <- as.character(x)
foo_impl2(x, levels)
}
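Once the values and the levels arrive as two character vectors, the position of each value within the levels can be recovered with ordinary slice operations. The sketch below works on plain &str slices; level_positions() is a hypothetical helper, and how NA is represented on the savvy side is left out.

```rust
/// For each value, find its 0-based position in `levels`.
/// Values that are not among the levels yield None.
pub fn level_positions(x: &[&str], levels: &[&str]) -> Vec<Option<usize>> {
    x.iter()
        .map(|v| levels.iter().position(|l| l == v))
        .collect()
}
```

Note that the positions are 0-based here, whereas the integer codes of an R factor are 1-based.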
Handling Matrices And Arrays
Savvy doesn’t provide a convenient way of converting matrices and arrays. You have to do it by yourself. But, don’t worry, it’s probably not very difficult thanks to the fact that major Rust matrix crates are column-major, or at least support column-major.
- ndarray: row-major is default (probably for compatibility with Python ndarray?), but it offers column-major as well
- nalgebra: column-major
- glam (and probably all other rust-gamedev crates): column-major, probably because GLSL is column-major
The example code can be found at https://github.com/yutannihilation/savvy-matrix-examples/tree/master/src/rust/src.
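For readers unfamiliar with the term, column-major means that the elements of each column are stored contiguously, which is how R lays out a matrix in memory. The following sketch uses hypothetical helper names and no matrix crate; it shows the index arithmetic and a conversion to row-major order.

```rust
/// Position of element (row, col) in a column-major buffer with `nrow` rows (0-based).
pub fn col_major_index(row: usize, col: usize, nrow: usize) -> usize {
    row + col * nrow
}

/// Re-order a column-major buffer into row-major order.
/// Assumes data.len() == nrow * ncol.
pub fn to_row_major(data: &[f64], nrow: usize, ncol: usize) -> Vec<f64> {
    let mut out = Vec::with_capacity(data.len());
    for r in 0..nrow {
        for c in 0..ncol {
            out.push(data[col_major_index(r, c, nrow)]);
        }
    }
    out
}
```

For the data behind R's matrix(1:6, nrow = 2), the buffer is 1, 2, 3, 4, 5, 6, and the first row is made of the elements at positions 0, 2, and 4.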
R to Rust
ndarray
By default, ndarray is row-major, but you can specify column-major by
f().
So, all you have to do is simply to extract the dim and pass it to ndarray.
use ndarray::Array;
use ndarray::ShapeBuilder;
use savvy::{r_println, savvy, RealSexp};
/// @export
#[savvy]
fn ndarray_input(x: RealSexp) -> savvy::Result<()> {
// In R, dim is i32, so you need to convert it to usize first.
let dim_i32 = x.get_dim().ok_or("no dimension found")?;
let dim: Vec<usize> = dim_i32.iter().map(|i| *i as usize).collect();
// f() changes the order from row-major (C-style convention) to column-major (Fortran-style convention).
let a = Array::from_shape_vec(dim.f(), x.to_vec());
r_println!("{a:?}");
Ok(())
}
nalgebra
nalgebra is column-major, so you can simply pass the dim.
use nalgebra::DMatrix;
use savvy::{r_println, savvy, savvy_err, RealSexp};
/// @export
#[savvy]
fn nalgebra_input(x: RealSexp) -> savvy::Result<()> {
let dim = x.get_dim().ok_or("no dimension found")?;
if dim.len() != 2 {
return Err(savvy_err!("Input must be matrix!"));
}
let m = DMatrix::from_vec(dim[0] as _, dim[1] as _, x.to_vec());
r_println!("{m:?}");
Ok(())
}
glam
glam is also column-major. With glam, the dimension is probably fixed (e.g. 3 x 3 in the following code). You can check that the dimension is as expected before passing the data to the constructor of a matrix.
use glam::{dmat3, dvec3, DMat3};
use savvy::{r_println, savvy, savvy_err, OwnedRealSexp, RealSexp};
/// @export
#[savvy]
fn glam_input(x: RealSexp) -> savvy::Result<()> {
let dim = x.get_dim().ok_or("no dimension found")?;
if dim != [3, 3] {
return Err(savvy_err!("Input must be 3x3 matrix!"));
}
// As we already check the dimension, this must not fail
let x_array: &[f64; 9] = x.as_slice().try_into().unwrap();
let m = DMat3::from_cols_array(x_array);
r_println!("{m:?}");
Ok(())
}
Rust to R
The matrix libraries typically provide methods to get the dimension and the
slice of the underlying memory. You can set the dimension by set_dim().
/// @export
#[savvy]
fn nalgebra_output() -> savvy::Result<savvy::Sexp> {
let m = DMatrix::from_vec(2, 3, vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0]);
let shape = m.shape();
let dim = &[shape.0, shape.1];
let mut out = OwnedRealSexp::try_from(m.as_slice())?;
out.set_dim(dim)?;
out.into()
}
Testing
Write integration tests on R’s side
The most recommended way is to write tests on R’s side just as you do with an ordinary R package. You can write tests on Rust’s side as described later, but, ultimately, the R functions are the user interface, so you should test the behavior of actual R functions.
Write Rust tests
The sad news is that cargo test doesn’t work with savvy. This is because savvy
always requires a real R session to work. But, don’t worry, savvy-cli test is
the tool for this. savvy-cli test does the following:
- extracts the Rust code of the test modules and the doc tests
- creates a temporary R package¹ and injects the extracted Rust code
- builds and runs the test functions via the R package
Note that this takes the path to the root of a crate, not that of an R package.
savvy-cli test path/to/your_crate
Limitations
savvy-cli test tries to mimic what cargo test does as much as possible, but
there are some limitations.
First, in order to run tests, you need to add "lib" to the crate-type. This
is because your crate is used as a Rust library when run by savvy-cli test.
[lib]
crate-type = ["staticlib", "lib"]
                           ^^^^^
Second, if you want to test a function or a struct, it must be public. The
ones marked with #[savvy] are automatically made public, but, if you want to
test other functions, you need to add pub to them by yourself.
pub fn foo() -> savvy::Result<()> {
^^^
Test module
You can write tests under a module marked with #[cfg(feature = "savvy-test")] instead of
#[cfg(test)]. A #[test] function needs to have the return value of
savvy::Result<()>, which is the same convention as #[savvy].
To check if an SEXP contains the expected data, assert_eq_r_code is convenient.
#[cfg(feature = "savvy-test")]
mod test {
use savvy::{OwnedIntegerSexp, assert_eq_r_code};
#[test]
fn test_integer() -> savvy::Result<()> {
let mut x = OwnedIntegerSexp::new(3)?;
assert_eq_r_code(x, "c(0L, 0L, 0L)");
Ok(())
}
}
Note that savvy-test is just a marker for savvy-cli, not a real feature. So,
in theory, you don’t really need this. However, in reality, you probably want to
add it to the [features] section of Cargo.toml because otherwise Cargo warns.
[features]
savvy-test = []
To test a function that takes user-supplied SEXPs like IntegerSexp, you can
use .as_read_only() to convert from the corresponding Owned- type. For
example, if you have a function your_fn() that accepts IntegerSexp, you can
construct an OwnedIntegerSexp and convert it to IntegerSexp before passing
it to your_fn().
#[savvy]
pub fn your_fn(x: IntegerSexp) -> savvy::Result<()> {
// ...snip...
}
#[cfg(feature = "savvy-test")]
mod test {
use savvy::{assert_eq_r_code, OwnedIntegerSexp};
#[test]
fn test_integer() -> savvy::Result<()> {
let x = savvy::OwnedIntegerSexp::new(3)?;
let x_ro = x.as_read_only();
let result = super::your_fn(x_ro);
assert_eq_r_code(result, "...");
Ok(())
}
}
Doc tests
You can also write doc tests. Since savvy-cli test wraps each of them in a function with the
return value of savvy::Result<()>, you can use ? to extract the Result
value in the code.
/// ```
/// let x = savvy::OwnedIntegerSexp::new(3)?;
/// assert_eq!(x.as_slice(), &[0, 0, 0]);
/// ```
Features and dependencies
If you need to specify some features for testing, use --features argument.
savvy-cli test --features foo path/to/your_crate
For dependencies, savvy-cli test picks all dependencies in [dependencies]
and [dev-dependencies]. If you need some additional crate for the test code,
you can just use [dev-dependencies] section of the Cargo.toml just as you do
when you do cargo test.
Reminder: You can use cargo test
While #[savvy] requires a real R session, you can utilize cargo test by
separating the actual logic to a function that doesn’t rely on savvy. For
example, suppose you have the following function times_two_int() that doubles
the input numbers.
#[savvy]
fn times_two_int(x: IntegerSexp) -> savvy::Result<savvy::Sexp> {
let mut out = OwnedIntegerSexp::new(x.len())?;
for (i, e) in x.iter().enumerate() {
if e.is_na() {
out.set_na(i)?;
} else {
out[i] = e * 2;
}
}
out.into()
}
In this case, you can rewrite the code to the following so that you can test
times_two_int_impl() with cargo test.
#[savvy]
fn times_two_int(x: IntegerSexp) -> savvy::Result<savvy::Sexp> {
let result: Vec<i32> = times_two_int_impl(x.as_slice());
result.try_into()
}
fn times_two_int_impl(x: &[i32]) -> Vec<i32> {
x.iter()
.map(|x| if x.is_na() { *x } else { *x * 2 })
.collect::<Vec<i32>>()
}
But, as you might notice, this implementation is a bit inefficient in that it
allocates a Vec<i32> just to store the temporary result. As this shows, separating
a function might be a bit tricky, and it might not really be worth it in some cases.
(In this case, the function can probably return an iterator instead.)
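A sketch of the iterator-returning variant, written without any savvy types so that it can be tested with cargo test. It assumes that NA for integers is the minimum i32 value, which is how R represents NA_integer_; NA_INTEGER and times_two_iter() are hypothetical names. As an extra assumption beyond the original function, an overflowing product is mapped to NA instead of panicking.

```rust
/// R's NA_integer_ is represented by the minimum i32 value.
pub const NA_INTEGER: i32 = i32::MIN;

/// Lazily doubles each element, passing NA through unchanged.
/// No intermediate Vec is allocated; the caller decides where the items go.
pub fn times_two_iter(x: &[i32]) -> impl Iterator<Item = i32> + '_ {
    x.iter().map(|&v| {
        if v == NA_INTEGER {
            v
        } else {
            // Overflow becomes NA rather than a panic.
            v.checked_mul(2).unwrap_or(NA_INTEGER)
        }
    })
}
```

The #[savvy] function could then consume this iterator and write each item directly into the output vector by index.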
¹ The R package is created in the OS’s cache dir by default, but you can specify the location by --cache-dir.
Advanced Topics
“External” external pointers
As described in the Struct section, a struct marked with #[savvy] is
transparently converted from and into an SEXP of an external pointer. So,
usually, you don’t need to think about external pointers.
However, in some cases, you might need to deal with an external pointer created
by another R package. For example, you might want to access Apache Arrow data
created by the nanoarrow R package. In such cases, you can use the unsafe methods
.cast_unchecked() or .cast_mut_unchecked().
let foo: &Foo = unsafe { &*ext_ptr_sexp.cast_unchecked::<Foo>() };
Initialization Routine
#[savvy_init] is a special version of #[savvy]. The function marked with
this macro is called when the package is loaded, which is what Writing R
Extensions calls an “initialization routine”. The function must take *mut DllInfo as its argument.
For example, if you write such a Rust function like this,
use savvy::ffi::DllInfo;
#[savvy_init]
fn init_foo(_dll_info: *mut DllInfo) -> savvy::Result<()> {
r_eprintln!("Initialized!");
Ok(())
}
you’ll see the following message on your R session when you load the package.
library(yourPackage)
#> Initialized!
Under the hood, savvy-cli update . inserts the following line in a C function
R_init_*(), which is called when the DLL is loaded.
void R_init_yourPackage(DllInfo *dll) {
R_registerRoutines(dll, NULL, CallEntries, NULL, NULL);
R_useDynamicSymbols(dll, FALSE);
savvy_init_foo__impl(dll); // added!
}
This is useful for initializing resources. For example, you can initialize a global variable.
use std::sync::OnceLock;
static GLOBAL_FOO: OnceLock<Foo> = OnceLock::new();
#[savvy_init]
fn init_global_foo(dll_info: *mut DllInfo) -> savvy::Result<()> {
GLOBAL_FOO.get_or_init(|| Foo::new());
Ok(())
}
You can also register an ALTREP class using this mechanism; see the ALTREP section that follows.
ALTREP
You can implement an ALTREP class using savvy.
Disclaimer
- This feature is very experimental, so it’s possible that the interface will be significantly changed or even removed in future.
- The current API might be a bit oversimplified. For example, you cannot prevent the vector from being materialized (i.e., allocated as a normal SEXP and put into the data2 slot of the ALTREP object).
Using ALTREP
Savvy currently provides only the following traits for ALTREP. The other ALTREPs
like ALTCOMPLEX are not yet supported.
For example, consider the following struct that simply wraps a Vec<i32>.
struct MyAltInt(Vec<i32>);
impl MyAltInt {
fn new(x: Vec<i32>) -> Self {
Self(x)
}
}
First, you need to implement IntoExtPtrSexp trait for the struct, which is
required by Alt* traits. This trait is what works under the hood of #[savvy]
when it’s placed on a struct. You can just rely on the default implementation.
impl savvy::IntoExtPtrSexp for MyAltInt {}
Second, you need to implement one of the Alt* traits. More specifically, the
trait has 4 members you need to implement:
- CLASS_NAME is the name of the class. This is used for distinguishing the class, so please use a unique string.
- PACKAGE_NAME is the name of your package. This probably doesn’t matter much.
- length() returns the length of the object.
- elt(i) returns the i-th element of the object. An important note is that, usually, R handles the out-of-bound check and returns NA if the index exceeds the length. So, you don’t need to check the length here.
In this case, the actual data is i32, so let’s implement AltInteger.
impl AltInteger for MyAltInt {
const CLASS_NAME: &'static str = "MyAltInt";
const PACKAGE_NAME: &'static str = "TestPackage";
fn length(&mut self) -> usize {
self.0.len()
}
fn elt(&mut self, i: usize) -> i32 {
self.0[i]
}
}
Optionally, you can implement these methods:
- sum(): This is used when the R function sum() is called.
- min(), max(): These are used when the R function min() or max() is called. Note that you need to handle empty cases (e.g. min(integer(0L)) and min(NA, na.rm = TRUE)), which are supposed to return Inf for min() and -Inf for max().
- copy_date(dst, offset): This copies the range of values starting from offset into dst, a &mut [T]. The default implementation just calls elt() repeatedly, but there might be a more efficient implementation (e.g. copy_from_slice()).
- inspect(): This is called by .Internal(inspect(x)). You might want to print some information useful for debugging.
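The remark about the default copy implementation can be illustrated outside of savvy. The sketch below uses a hypothetical trait, `ToyAltInt`, with a hypothetical method `copy_range()`; neither is savvy’s actual API. It only shows the general idea: a default method that calls `elt()` per element, and an override that uses `copy_from_slice()` when the data is contiguous.

```rust
// A toy trait that only mirrors the shape described above.
// It is NOT savvy's AltInteger; all names here are for illustration.
trait ToyAltInt {
    fn length(&mut self) -> usize;
    fn elt(&mut self, i: usize) -> i32;

    // Default: fill `dst` by calling `elt()` once per element.
    // Assumes the caller guarantees `offset + dst.len() <= length()`.
    fn copy_range(&mut self, dst: &mut [i32], offset: usize) {
        for (i, slot) in dst.iter_mut().enumerate() {
            *slot = self.elt(offset + i);
        }
    }
}

// Relies on the default `copy_range()`.
struct Naive(Vec<i32>);

// Overrides `copy_range()` with a bulk copy.
struct Fast(Vec<i32>);

impl ToyAltInt for Naive {
    fn length(&mut self) -> usize {
        self.0.len()
    }
    fn elt(&mut self, i: usize) -> i32 {
        self.0[i]
    }
}

impl ToyAltInt for Fast {
    fn length(&mut self) -> usize {
        self.0.len()
    }
    fn elt(&mut self, i: usize) -> i32 {
        self.0[i]
    }

    // The data is contiguous, so copy the whole range at once.
    fn copy_range(&mut self, dst: &mut [i32], offset: usize) {
        let src = &self.0[offset..offset + dst.len()];
        dst.copy_from_slice(src);
    }
}
```

Both implementations produce the same result; the override merely avoids one method call per element, which matters when the range is large.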
The next step is a bit advanced. You need to create a definition of the ALTREP class
from the above trait. This is done by the corresponding register_alt*_class()
function (for example, register_altinteger_class for an integer class). This
function generates an ALTREP class and registers it with the R session.
The registration needs to happen when an R session loads the DLL of your crate.
As explained in the section on the initialization routine,
you can define a #[savvy_init] function, which will be called in the
initialization routine.
#[savvy_init]
fn init_altrep_class(dll_info: *mut DllInfo) -> savvy::Result<()> {
register_altinteger_class::<MyAltInt>(dll_info)?;
Ok(())
}
Finally, you’ll probably want to implement a user-visible function to create an
instance of the ALTREP class. You can convert the struct into an ALTREP with the
.into_altrep() method, which is provided by the Alt* trait. For example, you
can create the following function that returns an ALTREP vector of length 3
to the R session.
#[savvy]
fn altint() -> savvy::Result<savvy::Sexp> {
let v = MyAltInt::new(vec![1, 2, 3]);
v.into_altrep()
}
This function can be used like this:
x <- altint()
x
#> [1] 1 2 3
This looks like a normal integer vector, but this is definitely an ALTREP.
.Internal(inspect(x))
#> @0x0000021684acac40 13 INTSXP g0c0 [REF(65535)] (MyAltInt)
Going deeper…
Once the ALTREP object leaves your hand, it looks like a normal vector. But, if
you really wish, you can convert it back to the original object. Alt* trait
provides 3 methods for this conversion:
- try_from_altrep_ref() for &T
- try_from_altrep_mut() for &mut T
- try_from_altrep() for T
For example, you can print the underlying data using the Debug trait (this assumes MyAltInt derives Debug).
#[savvy]
fn print_altint(x: IntegerSexp) -> savvy::Result<()> {
if let Ok(x) = MyAltInt::try_from_altrep_ref(&x) {
r_println!("{x:?}");
return Ok(());
};
Err(savvy_err!("Not a known ALTREP"))
}
print_altint(x)
#> MyAltInt([1, 2, 3])
But, before getting excited, you need to be aware of the tricky nature of R.
First, your ALTREP object can be easily lost in the sea of copy-on-modify. For example, if the object gets modified, it’s no longer an ALTREP object.
x <- altint()
x[1L] <- 3L
print_altint(x)
#> Error: Not a known ALTREP
Second, and this is much trickier: as there is try_from_altrep_mut(), you can
modify the underlying data. For example, you can multiply each number by two.
#[savvy]
fn tweak_altint(mut x: IntegerSexp) -> savvy::Result<()> {
if let Ok(x) = MyAltInt::try_from_altrep_mut(&mut x, false) {
for i in x.0.iter_mut() {
*i *= 2;
}
return Ok(());
};
Err(savvy_err!("Not a known ALTREP"))
}
Let’s confirm this function modifies the underlying data as expected.
x <- altint()
c(x) # This is for a side effect! Let's discuss later.
#> [1] 1 2 3
tweak_altint(x)
print_altint(x)
#> MyAltInt([2, 4, 6])
So far, so good. But, if you print x, you’ll find that the values have diverged
between Rust and R. Why does this happen?
x
#> [1] 1 2 3
This is because savvy’s implementation caches the SEXP object converted from the
underlying data. It can be costly to create a fresh SEXP object every time
the R session requires one, so the result is cached the first time it’s created
(in the above case, by c(x)). As far as I know, most ALTREP
implementations adopt this caching strategy (more specifically, an ALTREP object
has two slots, data1 and data2, and data2 is usually used for the cache).
But, don’t worry. try_from_altrep_mut() has a second argument,
invalidate_cache. You can set this to true to clear the cache.
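#[savvy]
fn tweak_altint2(mut x: IntegerSexp) -> savvy::Result<()> {
// Same as tweak_altint(), except that invalidate_cache is now `true`.
if let Ok(x) = MyAltInt::try_from_altrep_mut(&mut x, true) {
x.0.iter_mut().for_each(|i| *i *= 2);
return Ok(());
};
Err(savvy_err!("Not a known ALTREP"))
}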
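x <- altint()
c(x) # Materialize the vector so that the cache is populated, as before.
#> [1] 1 2 3
tweak_altint2(x)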
print_altint(x)
#> MyAltInt([2, 4, 6])
x
#> [1] 2 4 6
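The caching behavior described above can be modeled without R at all. The following is a toy sketch, not savvy’s implementation: `ToyAltrep`, `materialize()`, `data_mut()`, and `double_all()` are hypothetical names, with `data` playing the role of the Rust-side data and `cache` playing the role of the materialized vector kept in the data2 slot.

```rust
// A toy model of an ALTREP-like object. This is NOT savvy's implementation.
struct ToyAltrep {
    // Plays the role of the Rust-side data.
    data: Vec<i32>,
    // Plays the role of the materialized vector (the cache).
    cache: Option<Vec<i32>>,
}

impl ToyAltrep {
    fn new(data: Vec<i32>) -> Self {
        Self { data, cache: None }
    }

    // What the R side would see: materialize on first use, then reuse the cache.
    fn materialize(&mut self) -> &[i32] {
        if self.cache.is_none() {
            self.cache = Some(self.data.clone());
        }
        self.cache.as_deref().unwrap()
    }

    // Mutable access to the Rust-side data, optionally dropping the cache.
    fn data_mut(&mut self, invalidate_cache: bool) -> &mut Vec<i32> {
        if invalidate_cache {
            self.cache = None;
        }
        &mut self.data
    }
}

// Doubles every element of the Rust-side data.
fn double_all(x: &mut ToyAltrep, invalidate_cache: bool) {
    for v in x.data_mut(invalidate_cache).iter_mut() {
        *v *= 2;
    }
}
```

In this model, calling `double_all(&mut x, false)` after `materialize()` leaves the cached copy untouched, so the two views diverge. Passing `true` drops the cache, and the next `materialize()` rebuilds it from the modified data.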
This API is still experimental and I have yet to find a nicer design. Feedback is really appreciated!
Linkage
Savvy compiles the Rust code into a static library and then uses it to generate a DLL for the R package. There’s one tricky thing about static libraries. Rust’s official documentation on linkage says:
Note that any dynamic dependencies that the static library may have (such as dependencies on system libraries, or dependencies on Rust libraries that are compiled as dynamic libraries) will have to be specified manually when linking that static library from somewhere.
What does this mean? If a dependency crate needs to link to a native
library, the necessary compiler flags are added by cargo. But, after creating
the static library, cargo’s turn is over. It’s you who has to tell the linker
the necessary flags because there’s no automatic mechanism.
If some of the flags are missing, you’ll see a “symbol not found” error. For example, this is what I got on macOS. Some dependency of my package uses the objc2 crate, and it needs to be linked against Apple’s Objective-C frameworks.
unable to load shared object '.../foo.so':
dlopen(../foo.so, 0x0006): symbol not found in flat namespace '_NSAppKitVersionNumber'
Execution halted
So, how can we know the necessary flags? The official document provides a pro-tip!
The --print=native-static-libs flag may help with this.
You can add this option to src/Makevars.in and src/Makevars.win.in via
RUSTFLAGS envvar. Please edit this line.
# Add flags if necessary
- RUSTFLAGS =
+ RUSTFLAGS = --print=native-static-libs
Then, you’ll find this note in the installation log.
Compiling ahash v0.8.11
Compiling serde v1.0.210
Compiling zerocopy v0.7.35
...snip...
note: Link against the following native artifacts when linking against this static library. The order and any duplication can be significant on some platforms.
note: native-static-libs: -framework CoreText -framework CoreGraphics -framework CoreFoundation -framework Foundation -lobjc -liconv -lSystem -lc -lm
Finished `dev` profile [unoptimized + debuginfo] target(s) in 19.17s
gcc -shared -L/usr/lib64/R/lib -Wl,-O1 -Wl,--sort-common -Wl,...
installing to /tmp/RtmpvQv8Ur/devtools_install_...
** checking absolute paths in shared objects and dynamic libraries
You can copy these flags into the linker flags of your package (i.e., PKG_LIBS). Please be aware that the flags differ
between platforms, so you probably need to run this command on CI for each target platform, not only on your local machine.
Also, since Linux and macOS require different options, you need to tweak them in
the configure script.
For example, here’s my setup on the vellogd package.
./configure:
if [ "$(uname)" = "Darwin" ]; then
FEATURES=""
# result of --print=native-static-libs
ADDITIONAL_PKG_LIBS="-framework CoreText -framework CoreGraphics -framework CoreFoundation -framework Foundation -lobjc -liconv -lSystem -lc -lm"
else
FEATURES="--features use_winit"
fi
src/Makevars.in:
PKG_LIBS = -L$(LIBDIR) -lvellogd @ADDITIONAL_PKG_LIBS@
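If you collect installation logs from several platforms, picking the flags out by eye gets tedious. The following is a small sketch of a helper that extracts them from a log. The function name `native_static_libs()` is hypothetical, and the code assumes the note has the same `native-static-libs: ...` format as in the log shown above.

```rust
// Extract the flags from a `note: native-static-libs: ...` line in a build log.
// Returns `None` if the log contains no such line.
// Assumption: the note format matches the one printed by rustc in the log above.
fn native_static_libs(log: &str) -> Option<String> {
    const MARKER: &str = "native-static-libs:";
    log.lines().find_map(|line| {
        line.split_once(MARKER)
            .map(|(_, flags)| flags.trim().to_string())
    })
}
```

The returned string is what would go into a variable such as ADDITIONAL_PKG_LIBS in the configure script.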
Comparison with extendr
What the hell is this?? Why do you need another framework when there’s extendr?
extendr is great and ready to use! However, I needed to create a new, simple framework to experiment with. The main goal of savvy is to provide a simpler option other than extendr, not to be a complete alternative to extendr.
Pros and cons compared to extendr
Pros:
(Now that extendr has been improved so much, I think savvy lost all the obvious pros. Kudos to the extendr developers!)
Cons:
- savvy prefers explicitness over ergonomics
- savvy provides a limited set of APIs and might not be a good fit for complex use cases