Introduction

savvy is a simple R extension interface using Rust, like the extendr framework. The name “savvy” comes from the Japanese word “錆” (pronounced as sàbí), which means “Rust”.

With savvy, you can automatically generate R functions from Rust code. This is an example of what a savvy-powered function would look like:

Rust

use savvy::savvy;
use savvy::NotAvailableValue;   // for is_na() and na()
use savvy::{OwnedStringSexp, StringSexp};

/// Convert to Upper-case
/// 
/// @param x A character vector.
/// @export
#[savvy]
fn to_upper(x: StringSexp) -> savvy::Result<savvy::Sexp> {
    // Use `Owned{type}Sexp` to allocate an R vector for output.
    let mut out = OwnedStringSexp::new(x.len())?;

    for (i, e) in x.iter().enumerate() {
        // To Rust, a missing value is an ordinary value. In `&str`'s case, it's just "NA".
        // You have to use the `.is_na()` method to distinguish a missing value.
        if e.is_na() {
            // Set the i-th element to NA
            out.set_na(i)?;
            continue;
        }

        let e_upper = e.to_uppercase();
        out.set_elt(i, e_upper.as_str())?;
    }

    out.into()
}

R

to_upper(c("a", "b", "c"))
#> [1] "A" "B" "C"

Examples

A toy example R package can be found in the R-package/ directory.

Thanks

Savvy is not quite unique. This project is made possible by heavily taking inspiration from other great projects:

  • The basic idea is of course based on extendr. Savvy would not exist without extendr.
  • cpp11's "writable" concept influenced the design a lot. I also learned a lot from its great implementation of things such as the protection mechanism.
  • PyO3 made me realize that the FFI crate doesn't need to be a "sys" crate.

Get Started

Prerequisite

Rust

First of all, you need a Rust toolchain installed. You can follow the official instructions.

If you are on Windows, you need an additional step: installing the x86_64-pc-windows-gnu target.

rustup target add x86_64-pc-windows-gnu

A helper R package

Then, install a helper R package for savvy.

install.packages(
  "savvy",
  repos = c("https://yutannihilation.r-universe.dev", "https://cloud.r-project.org")
)

Note that, under the hood, this is just a simple wrapper around savvy-cli. So, if you prefer the shell, you can use the CLI directly instead; it is available on the releases page.

Create a new R package

First, create a new R package. usethis::create_package() is convenient for this.

usethis::create_package("path/to/foo")

Then, move to the package directory and generate necessary files like Makevars and Cargo.toml, as well as the C and R wrapper code corresponding to the Rust code. savvy::savvy_init() does this all (under the hood, this simply runs savvy-cli init).

Lastly, run devtools::document() to generate NAMESPACE and documents.

savvy::savvy_init()
devtools::document()

Now, this package is ready to install! After installing it (e.g., by running "Install Package" in the RStudio IDE), confirm that you can run this example function, which multiplies the first argument by the second.

library(<your package>)

int_times_int(1:4, 2L)
#> [1] 2 4 6 8

Package structure

After savvy::savvy_init(), the structure of your R package should look like this.

.
├── .Rbuildignore
├── DESCRIPTION
├── NAMESPACE
├── R
│   └── 000-wrappers.R      <-------(1)
├── configure               <-------(2)
├── configure.win           <-------(2)
├── cleanup                 <-------(2)
├── cleanup.win             <-------(2)
├── foofoofoofoo.Rproj
└── src
    ├── Makevars.in         <-------(2)
    ├── Makevars.win.in     <-------(2)
    ├── init.c              <-------(3)
    ├── <your package>-win.def  <---(4)
    └── rust
        ├── .cargo
        │   └── config.toml <-------(4)
        ├── api.h           <-------(3)
        ├── Cargo.toml      <-------(5)
        └── src
            └── lib.rs      <-------(5)
  1. 000-wrappers.R: R functions for the corresponding Rust functions
  2. configure*, cleanup*, Makevars.in, and Makevars.win.in: Necessary build settings for compiling Rust code
  3. init.c and api.h: C functions for the corresponding Rust functions
  4. <your package>-win.def and .cargo/config.toml: These are tricks to avoid a minor error on Windows. See extendr/rextendr#211 and savvy#98 for the details.
  5. Cargo.toml and lib.rs: Rust code

Write your own function

The most revolutionary point of savvy::savvy_init() is that it kindly leaves the most important task to you; let's define a typical hello-world function for practice!

Write some Rust code

Open src/rust/lib.rs and add the following lines. r_println! is the R version of the println! macro.

/// @export
#[savvy]
fn hello() -> savvy::Result<()> {
    savvy::r_println!("Hello world!");
    Ok(())
}

Update wrapper files

Every time you modify or add some Rust code, you need to update the C and R wrapper files by running savvy::savvy_update() (under the hood, this simply runs savvy-cli update). Don't forget to run devtools::document() as well.

savvy::savvy_update()
devtools::document()

After re-installing your package, you should be able to run the hello() function in your R session.

hello()
#> Hello world!

Key Ideas

Treating external SEXP and owned SEXP differently

Savvy is opinionated in many respects. Of these, the one I think should be explained first is that savvy uses separate types for SEXPs passed from outside and for SEXPs created within a Rust function. The former, an external SEXP, is read-only, and the latter, an owned SEXP, is writable. Here's the list:

| R type                       | Read-only version   | Writable version  |
|------------------------------|---------------------|-------------------|
| INTSXP (integer)             | IntegerSexp         | OwnedIntegerSexp  |
| REALSXP (double)             | RealSexp            | OwnedRealSexp     |
| RAWSXP (raw)                 | RawSexp             | OwnedRawSexp      |
| LGLSXP (logical)             | LogicalSexp         | OwnedLogicalSexp  |
| STRSXP (character)           | StringSexp          | OwnedStringSexp   |
| VECSXP (list)                | ListSexp            | OwnedListSexp     |
| EXTPTRSXP (external pointer) | ExternalPointerSexp | n/a               |
| CPLXSXP (complex) [1]        | ComplexSexp         | OwnedComplexSexp  |

[1] Complex is optionally supported under the feature flag complex.

You might wonder why this is needed when we could just use mut to distinguish mutability. I had two main motivations:

  1. Avoid unnecessary protection: an external SEXP is already protected by the caller, while an owned SEXP needs to be protected by ourselves.
  2. Avoid unnecessary ALTREP checks: an external SEXP can be ALTREP, so it's better to handle it in an ALTREP-aware way, while an owned SEXP is never ALTREP.

The details are a bit lengthy, so I'll skip them here; you can read them in my blog post. One correction, though: the second reason turned out to be less important than I thought, because a benchmark showed that being non-ALTREP-aware is more efficient in most cases. In fact, the current implementation of savvy is non-ALTREP-aware for int, real, and logical (see #18).
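The separation can be sketched in plain Rust, without R. IntegerView and OwnedInteger below are hypothetical stand-ins, not savvy's types. A borrowed slice plays the role of input memory owned by R, and a Vec plays the role of a newly allocated output. There is no protection logic at all. The point is only that the read-only type has no setters, so the distinction is enforced by the types themselves rather than by mut. Each type is also free to carry its own responsibilities, such as protection in savvy's case.

```rust
/// Hypothetical stand-in for a read-only input vector (the role IntegerSexp plays).
/// It only borrows memory owned by someone else, and it has no setters.
pub struct IntegerView<'a> {
    data: &'a [i32],
}

impl<'a> IntegerView<'a> {
    pub fn new(data: &'a [i32]) -> Self {
        Self { data }
    }

    pub fn len(&self) -> usize {
        self.data.len()
    }

    pub fn iter(&self) -> std::slice::Iter<'a, i32> {
        self.data.iter()
    }
}

/// Hypothetical stand-in for a writable output vector (the role OwnedIntegerSexp plays).
/// In savvy, this is where protection would be handled; here a Vec simply owns the memory.
pub struct OwnedInteger {
    data: Vec<i32>,
}

impl OwnedInteger {
    pub fn new(len: usize) -> Self {
        Self { data: vec![0; len] }
    }

    /// Writes one element, reporting an out-of-bounds index as an error.
    pub fn set_elt(&mut self, i: usize, v: i32) -> Result<(), String> {
        let len = self.data.len();
        match self.data.get_mut(i) {
            Some(slot) => {
                *slot = v;
                Ok(())
            }
            None => Err(format!("index {i} is out of bounds (len {len})")),
        }
    }

    pub fn as_slice(&self) -> &[i32] {
        &self.data
    }
}

/// Reads from the view and writes into a freshly created owned vector.
pub fn times_two(x: &IntegerView<'_>) -> Result<OwnedInteger, String> {
    let mut out = OwnedInteger::new(x.len());
    for (i, v) in x.iter().enumerate() {
        out.set_elt(i, v * 2)?;
    }
    Ok(out)
}
```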

No implicit conversions

Savvy doesn't convert between types unless you do so explicitly. For example, you cannot supply a double vector to a function with an IntegerSexp argument.

#[savvy]
fn identity_int(x: IntegerSexp) -> savvy::Result<savvy::Sexp> {
    let mut out = OwnedIntegerSexp::new(x.len())?;

    for (i, &v) in x.iter().enumerate() {
        out[i] = v;
    }

    out.into()
}
identity_int(c(1, 2))
#> Error in identity_int(c(1, 2)) : 
#>   Unexpected type: Cannot convert double to integer

While you'll probably find this inconvenient, it is also a design decision. My concerns about supporting these conversions are:

  • Complexity. It would make savvy's spec and implementation complicated.
  • Hidden allocation. Conversion requires a new allocation to store the converted values, which might be undesirable in some cases.

So, you have to write a wrapper R function like the one below. This might feel a bit tiring but, in general, please don't avoid writing R code. Since you are creating an R package, there's a lot you can do in R instead of complicating your Rust code. In particular, it's easier to show user-friendly error messages on R's side.

identity_int_wrapper <- function(x) {
  x <- vctrs::vec_cast(x, integer())
  identity_int(x)
}

Alternatively, you can use NumericSexp as input. It provides methods to convert the input either to i32 or to f64 on the fly. For more details, please read the section about NumericSexp.

#[savvy]
fn identity_num(x: NumericSexp) -> savvy::Result<savvy::Sexp> {
    let mut out = OwnedIntegerSexp::new(x.len())?;

    // iter_i32() yields Result<i32>, because converting a double can fail.
    for (i, v) in x.iter_i32().enumerate() {
        out[i] = v?;
    }

    out.into()
}

#[savvy] macro

This is a simple Rust function that adds the specified suffix to each element of the input character vector. The #[savvy] macro turns it into an R function.

use savvy::NotAvailableValue;   // for is_na() and na()

/// Add Suffix
/// 
/// @export
#[savvy]
fn add_suffix(x: StringSexp, y: &str) -> savvy::Result<savvy::Sexp> {
    let mut out = OwnedStringSexp::new(x.len())?;

    for (i, e) in x.iter().enumerate() {
        if e.is_na() {
            out.set_na(i)?;
            continue;
        }

        out.set_elt(i, &format!("{e}_{y}"))?;
    }

    out.into()
}

Convention for a #[savvy] function

The example function above has this signature.

fn add_suffix(x: StringSexp, y: &str) -> savvy::Result<savvy::Sexp>

As you can guess, the #[savvy] macro cannot be applied to arbitrary functions. The function must satisfy the following conditions:

  • The function's inputs can be
    • a non-owned savvy type (e.g., IntegerSexp and RealSexp)
    • a corresponding Rust type for scalar (e.g., i32 and f64)
    • a user-defined struct marked with #[savvy] (&T, &mut T, or T)
    • a user-defined enum marked with #[savvy] (&T, or T)
    • any of above wrapped with Option (this is translated as an optional arg)
  • The function's return value must be either
    • savvy::Result<()> for the case of no actual return value
    • savvy::Result<savvy::Sexp> for the case of some return value of R object
    • savvy::Result<T> for the case of some return value of a user-defined struct or enum marked with #[savvy]

How things work under the hood

If you mark a function with the #[savvy] macro, the following are generated:

  1. Rust functions
    1. a wrapper function to handle Rust and R errors gracefully
    2. a function with the original body and some conversion from raw SEXPs to savvy types.
  2. C function signature for the Rust function
  3. C implementation for bridging between R and Rust
  4. R implementation

For example, the add_suffix() function above generates the following code. (The #[savvy] macro can also be used on structs and enums, but let's focus on functions for now for simplicity.)

Rust functions

(The actual code is a bit more complex to handle possible panic! properly.)

#[allow(clippy::missing_safety_doc)]
#[no_mangle]
pub unsafe extern "C" fn savvy_add_suffix__ffi(x: SEXP, y: SEXP) -> SEXP {
    match savvy_add_suffix_inner(x, y) {
        Ok(result) => result.0,
        Err(e) => savvy::handle_error(e),
    }
}

unsafe fn savvy_add_suffix_inner(x: SEXP, y: SEXP) -> savvy::Result<savvy::Sexp> {
    let x = <savvy::StringSexp>::try_from(savvy::Sexp(x))?;
    let y = <&str>::try_from(savvy::Sexp(y))?;
    
    // original function
    add_suffix(x, y)
}

// original function
fn add_suffix(x: StringSexp, y: &str) -> savvy::Result<savvy::Sexp> {

    // ..original body..

}
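The two-layer structure can be sketched in plain Rust. This is not savvy's generated code. FfiOutcome and ffi_guard are hypothetical names, and a plain enum stands in for returning a SEXP or raising an R error. The sketch only illustrates the general technique of turning both an Err and a panic into an ordinary value before anything crosses the FFI boundary, since a panic must not unwind into C code.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical stand-in for what the outer wrapper hands back across the FFI boundary
/// (in savvy this would be a SEXP or an R error; here it is a plain enum).
#[derive(Debug, PartialEq)]
pub enum FfiOutcome {
    Value(String),
    Error(String),
}

/// Runs the inner function and turns both an Err and a panic into a value,
/// so that nothing unwinds out of the (would-be) extern "C" function.
pub fn ffi_guard<F>(inner: F) -> FfiOutcome
where
    F: FnOnce() -> Result<String, String>,
{
    match panic::catch_unwind(AssertUnwindSafe(inner)) {
        Ok(Ok(value)) => FfiOutcome::Value(value),
        Ok(Err(msg)) => FfiOutcome::Error(msg),
        Err(payload) => {
            // A panic!("...") without arguments typically carries a &'static str;
            // a formatted panic carries a String.
            let msg = if let Some(s) = payload.downcast_ref::<&str>() {
                s.to_string()
            } else if let Some(s) = payload.downcast_ref::<String>() {
                s.clone()
            } else {
                "unknown panic".to_string()
            };
            FfiOutcome::Error(format!("panic: {msg}"))
        }
    }
}
```

Wrapping the closure in AssertUnwindSafe is a simplification. It asserts that any captured state is safe to observe after a panic, which a real wrapper would have to reason about.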

C function signature

SEXP savvy_add_suffix__ffi(SEXP c_arg__x, SEXP c_arg__y);

C implementation

(let's skip the details about handle_result for now)

SEXP savvy_add_suffix__impl(SEXP c_arg__x, SEXP c_arg__y) {
    SEXP res = savvy_add_suffix__ffi(c_arg__x, c_arg__y);
    return handle_result(res);
}

R implementation

Rust comments with three slashes (///) are converted into Roxygen comments in the R code.

#' Add Suffix
#' 
#' @export
add_suffix <- function(x, y) {
  .Call(savvy_add_suffix__impl, x, y)
}

Using #[savvy] in files other than lib.rs

You can use the #[savvy] macro in other files just as in lib.rs. Since #[savvy] automatically marks the functions that need to be exposed as pub, you don't need to care about visibility.

For example, if you define a function in src/foo.rs,

#[savvy]
fn do_nothing() -> savvy::Result<()> {
    Ok(())
}

just declaring mod foo in src/lib.rs is enough to make do_nothing() available to R.

mod foo;

Handling Vector Input

Basic rule

As described in Key Ideas, the input SEXP is read-only. You cannot modify the values in place.

Methods

1. iter()

IntegerSexp, RealSexp, LogicalSexp, and StringSexp provide an iter() method so that you can access the values one by one.

for (i, e) in x.iter().enumerate() {
    // ...snip...
}

Similarly, NumericSexp, which handles both integer and double, provides iter_i32() and iter_f64(). But, this might allocate if the type conversion is needed.

2. as_slice() (for integer and double)

IntegerSexp and RealSexp can expose their underlying C array as a Rust slice by as_slice().

/// @export
#[savvy]
fn foo(x: IntegerSexp) -> savvy::Result<()> {
    some_function_takes_slice(x.as_slice());
    Ok(())
}

Similarly, NumericSexp, which handles both integer and double, provides as_slice_i32() and as_slice_f64(). But, this might allocate if the type conversion is needed.

3. to_vec()

As the name indicates, to_vec() copies the values to a new Rust vector. Copying can be costly for big data, but a vector is handy if you need to pass the data around among Rust functions.

let v = x.to_vec();
some_function_takes_vec(v);

If a function requires a slice and the type is neither integer nor double, you have no choice but to use to_vec() to create a new vector and then pass it as a slice.

let v = x.to_vec();
another_function_takes_slice(&v);

Missing values

There's no concept of a "missing value" in the corresponding Rust types. So, to Rust, NA looks like a normal value.

The good news is that R uses sentinel values to represent NA, so it's possible to check whether a value is NA in R's sense when the type is i32, f64, or &str.

By using NotAvailableValue trait, you can check if the value is NA by is_na(), and refer to the sentinel value of NA by <T>::na(). If you care about missing values, you always have to have an if branch for missing values like below.

use savvy::NotAvailableValue;

/// @export
#[savvy]
fn sum(x: RealSexp) -> savvy::Result<savvy::Sexp> {
    let mut sum: f64 = 0.0;
    for e in x.iter() {
        if !e.is_na() {
            sum += e;
        }
    }

    sum.try_into()
}
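Outside of savvy, the sentinel idea can be sketched in plain Rust. NaSentinel is a hypothetical trait that only imitates NotAvailableValue. The constants come from R's C internals: NA_integer_ is INT_MIN, and NA_real_ is a NaN whose lower 32 bits hold 1954. This guide does not say whether savvy's own is_na() checks the exact NaN payload, as the sketch does here.

```rust
/// Hypothetical trait imitating the idea of savvy's NotAvailableValue.
pub trait NaSentinel: Sized {
    fn na() -> Self;
    fn is_na(&self) -> bool;
}

impl NaSentinel for i32 {
    // R's NA_integer_ is INT_MIN.
    fn na() -> Self {
        i32::MIN
    }
    fn is_na(&self) -> bool {
        *self == i32::MIN
    }
}

impl NaSentinel for f64 {
    // R's NA_real_ is a NaN with high word 0x7FF00000 and low word 1954 (0x7A2).
    fn na() -> Self {
        f64::from_bits(0x7FF0_0000_0000_07A2)
    }
    // Mirrors R's C-level R_IsNA(), which, unlike R's is.na(), is false for a plain NaN.
    fn is_na(&self) -> bool {
        self.is_nan() && (self.to_bits() & 0xFFFF_FFFF) == 1954
    }
}

/// Sums the non-missing values, skipping NA like the sum() example above.
pub fn sum_skip_na(x: &[f64]) -> f64 {
    x.iter().filter(|v| !v.is_na()).sum()
}
```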

The bad news is that bool is not the case. bool doesn't have is_na() or na(), and NA is silently treated as TRUE without any error. So, you have to make sure the input doesn't contain any missing values on R's side. For example, the following function is not an identity function:

/// @export
#[savvy]
fn identity_logical(x: LogicalSexp) -> savvy::Result<savvy::Sexp> {
    let mut out = OwnedLogicalSexp::new(x.len())?;

    for (i, e) in x.iter().enumerate() {
        out.set_elt(i, e)?;
    }

    out.into()
}
identity_logical(c(TRUE, FALSE, NA))
#> [1]  TRUE FALSE  TRUE
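One plausible model of this behavior can be written in plain Rust. R stores a logical vector as 32-bit integers, where TRUE is 1, FALSE is 0, and NA is INT_MIN. A two-state conversion that maps every non-zero value to true therefore turns NA into true. That savvy converts with exactly v != 0 is an assumption; only the observable result (NA becomes TRUE) comes from this guide. Both function names are hypothetical.

```rust
/// R stores a logical vector as 32-bit integers: TRUE = 1, FALSE = 0, NA = INT_MIN.
pub const NA_LOGICAL: i32 = i32::MIN;

/// A naive two-state conversion: every non-zero value becomes `true`,
/// so NA (a non-zero sentinel) silently turns into `true`.
pub fn to_bool_lossy(raw: &[i32]) -> Vec<bool> {
    raw.iter().map(|&v| v != 0).collect()
}

/// A three-state alternative that keeps NA as `None`.
pub fn to_option_bool(raw: &[i32]) -> Vec<Option<bool>> {
    raw.iter()
        .map(|&v| if v == NA_LOGICAL { None } else { Some(v != 0) })
        .collect()
}
```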

Fortunately, LogicalSexp has an expert-only method, as_slice_raw(). See the "Logical" section of "Integer, Real, String, Logical, Raw, And Complex" for the details.

Handling Vector Output

Basically, there are two ways to prepare an output to the R session.

1. Create a new R object first and put values on it

An owned SEXP can be allocated by Owned{type}Sexp::new(). new() takes the length of the vector as its argument. If you need a vector of the same length as the input, you can pass the len() of the input SEXP.

new() returns Result because the memory allocation can fail when the vector is too large. In most cases, you can just add ? to propagate the error.

let mut out = OwnedStringSexp::new(x.len())?;
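Plain Rust has a similar pattern, used here purely as an analogy. Vec::try_reserve_exact() reports an allocation failure as a TryReserveError instead of aborting, so it composes with ? in the same way. try_new_zeroed and doubled are hypothetical helpers. R's allocator and savvy's error conversion are not modeled.

```rust
use std::collections::TryReserveError;

/// Hypothetical helper: allocate a zero-filled buffer of `len` elements,
/// returning an error instead of aborting when the allocation cannot be made.
pub fn try_new_zeroed(len: usize) -> Result<Vec<i32>, TryReserveError> {
    let mut buf = Vec::new();
    buf.try_reserve_exact(len)?; // fails on capacity overflow or allocator failure
    buf.resize(len, 0); // capacity is already reserved, so this does not reallocate
    Ok(buf)
}

/// A caller can simply forward the error with `?`.
pub fn doubled(x: &[i32]) -> Result<Vec<i32>, TryReserveError> {
    let mut out = try_new_zeroed(x.len())?;
    for (slot, v) in out.iter_mut().zip(x) {
        *slot = v * 2;
    }
    Ok(out)
}
```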

Use set_elt() to put the values one by one. Note that you can also assign values like out[i] = value for integer and double. See Type-specific Topics for more details.

for (i, e) in x.iter().enumerate() {
    // ...snip...

    out.set_elt(i, &format!("{e}_{y}"))?;
}

You can use set_na() to set the specified element as NA. For example, it's a common case to use this in order to propagate the missingness like below.

for (i, e) in x.iter().enumerate() {
    // ...snip...
    if e.is_na() {
        out.set_na(i)?;
    } else {
        // ...snip...
    }
}

After putting the values to the vector, you can convert it to Result<Sexp> by into().

/// @export
#[savvy]
fn foo(x: StringSexp) -> savvy::Result<savvy::Sexp> {
    let mut out = OwnedStringSexp::new(x.len())?;

    // ...snip...

    out.into()
}

2. Convert a Rust vector by methods like try_into()

Another way is to use a Rust vector to store the results and convert it to an R object at the end of the function. This is also fallible, because it still needs to create a new R object under the hood, which can fail. So, this time, the conversion is try_into(), not into().

// For simplicity, NAs are not handled at all here...

/// @export
#[savvy]
fn times_two(x: IntegerSexp) -> savvy::Result<savvy::Sexp> {
    let out: Vec<i32> = x.iter().map(|v| v * 2).collect();
    out.try_into()
}

Note that, while this looks handy, it might not be very efficient. For example, times_two() above allocates a Rust vector and then copies the values into a new R vector in try_into(). The copying cost can be non-negligible when the vector is huge.

try_from_slice()

The same conversions are also available in the form of Owned{type}Sexp::try_from_slice(). While this says "slice", this accepts AsRef<[T]>, which means both Vec<T> and &[T] can be used.

For converting the return value, try_into() is probably shorter in most cases. But sometimes you might find this useful (e.g., when the return value is a list and you need to construct its elements).

/// @export
#[savvy]
fn times_two2(x: IntegerSexp) -> savvy::Result<savvy::Sexp> {
    let out: Vec<i32> = x.iter().map(|v| v * 2).collect();
    let out_sexp = OwnedIntegerSexp::try_from_slice(out)?;
    out_sexp.into()
}

try_from_iter()

If you only have an iterator, try_from_iter() is more efficient. The function below is such a case: the previous examples first collect()ed the values into a Vec, but in theory that intermediate Vec isn't necessary.

/// @export
#[savvy]
fn times_two3(x: IntegerSexp) -> savvy::Result<savvy::Sexp> {
    let iter = x.iter().map(|v| v * 2);
    let out_sexp = OwnedIntegerSexp::try_from_iter(iter)?;
    out_sexp.into()
}

Note that, if you already have a slice or Vec, you should use try_from_slice() rather than calling iter() on it and passing the result to try_from_iter(). In such cases, try_from_slice() is faster for integer, double, and complex, because it just copies the underlying memory into the SEXP instead of handling the elements one by one.
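The difference between the two constructors can be seen in a plain-Rust sketch. OwnedIntVec is a hypothetical stand-in that uses a Vec instead of R-allocated memory, so this is not savvy's implementation. try_from_slice() accepts any AsRef<[i32]> (a Vec, a borrowed Vec, a slice, or an array) and copies everything at once with copy_from_slice(). try_from_iter() has to write the values one at a time.

```rust
/// Hypothetical stand-in for an owned R integer vector; a Vec replaces R-allocated memory.
#[derive(Debug, PartialEq)]
pub struct OwnedIntVec(pub Vec<i32>);

impl OwnedIntVec {
    /// Accepts anything viewable as &[i32]: Vec<i32>, &Vec<i32>, &[i32], arrays.
    pub fn try_from_slice<S: AsRef<[i32]>>(x: S) -> Result<Self, String> {
        let src = x.as_ref();
        let mut buf = vec![0; src.len()]; // the length is known up front
        buf.copy_from_slice(src); // one bulk copy (a memcpy)
        Ok(Self(buf))
    }

    /// Accepts any iterator of i32; values are written one by one.
    pub fn try_from_iter<I: IntoIterator<Item = i32>>(iter: I) -> Result<Self, String> {
        let mut buf = Vec::new();
        for v in iter {
            buf.push(v);
        }
        Ok(Self(buf))
    }
}
```

Both constructors return Result only to mirror savvy's fallible signatures; nothing in this sketch can actually fail.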

Handling Scalar

Input

Scalar inputs are handled transparently. The corresponding types are shown in the table below.

/// @export
#[savvy]
fn scalar_input_int(x: i32) -> savvy::Result<()> {
    savvy::r_println!("{x}");
    Ok(())
}
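| R type            | Rust scalar type       |
|-------------------|------------------------|
| integer           | i32                    |
| double            | f64                    |
| logical           | bool                   |
| raw               | u8                     |
| character         | &str                   |
| complex           | num_complex::Complex64 |
| integer or double | savvy::NumericScalar   |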

NumericScalar

NumericScalar is a special type that can handle both integer and double. You can get the value from it by as_i32() for i32, or by as_f64() for f64. These methods convert the value if the input type differs from the target type.

#[savvy]
fn times_two_numeric_i32_scalar(x: NumericScalar) -> savvy::Result<Sexp> {
    let v = x.as_i32()?;
    if v.is_na() {
        (i32::na()).try_into()
    } else {
        (v * 2).try_into()
    }
}

Note that, while as_f64() is infallible, as_i32() can fail when the conversion is from f64 to i32 and

  • the value is Inf or -Inf
  • the value is out of range for i32
  • the value is not integer-ish (e.g. 1.1)
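The rules above can be expressed as a small plain-Rust function. f64_to_i32 is hypothetical and not savvy's implementation. The list above doesn't say how NA/NaN is handled or whether integer-ishness allows some tolerance, so this sketch simply rejects NaN and uses an exact fract() == 0.0 check. It also treats i32::MIN as out of range, because R reserves that value for NA_integer_.

```rust
/// Hypothetical conversion from f64 to i32 following the failure cases listed above.
/// NaN handling and the exact integer-ish test are this sketch's own assumptions.
pub fn f64_to_i32(x: f64) -> Result<i32, String> {
    if x.is_nan() {
        return Err("NaN is not handled by this sketch".to_string());
    }
    if x.is_infinite() {
        return Err(format!("{x} is infinite"));
    }
    // R reserves i32::MIN for NA_integer_, so the usable range is symmetric.
    if x < -(i32::MAX as f64) || x > i32::MAX as f64 {
        return Err(format!("{x} is out of range for i32"));
    }
    if x.fract() != 0.0 {
        return Err(format!("{x} is not integer-ish"));
    }
    Ok(x as i32)
}
```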

For convenience, NumericScalar also provides a conversion to usize via as_usize(). What's nice is that it can handle an integer-ish double, which means you can allow users to input a number larger than the integer maximum (2147483647)!

/// @export
#[savvy]
fn usize_to_string_scalar(x: NumericScalar) -> savvy::Result<Sexp> {
    let x_usize = x.as_usize()?;
    x_usize.to_string().try_into()
}
usize_to_string_scalar(2147483648)
#> [1] "2147483648"

Output

Just like a Rust vector, a Rust scalar value can be converted into Sexp by try_into(). It's as simple as this:

/// @export
#[savvy]
fn scalar_output_int() -> savvy::Result<savvy::Sexp> {
    1i32.try_into()
}

Alternatively, the same conversion is available in the form of Owned{type}Sexp::try_from_scalar().

/// @export
#[savvy]
fn scalar_output_int2() -> savvy::Result<savvy::Sexp> {
    let out = OwnedIntegerSexp::try_from_scalar(1)?;
    out.into()
}

Missing values

If the input type is a scalar, NA is always rejected. This is inconsistent with the rule for vector input, but it is a deliberate design decision, based on the assumption that a scalar missing value is rarely useful on Rust's side.

/// @export
#[savvy]
fn identity_logical_single(x: bool) -> savvy::Result<savvy::Sexp> {
    let mut out = OwnedLogicalSexp::new(1)?;
    out.set_elt(0, x)?;
    out.into()
}
identity_logical_single(NA)
#> Error in identity_logical_single(NA) : 
#>   Must be length 1 of non-missing value
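The check behind this error can be sketched for an integer scalar in plain Rust. scalar_i32 is a hypothetical function, not savvy's code. It accepts a slice only when it holds exactly one element that is not R's NA_integer_ (INT_MIN). The error text mirrors the message shown above.

```rust
/// R's NA_integer_ is the minimum 32-bit integer.
pub const NA_INTEGER: i32 = i32::MIN;

/// Hypothetical check for a scalar argument: exactly one element, and not NA.
pub fn scalar_i32(x: &[i32]) -> Result<i32, String> {
    match x {
        [v] if *v != NA_INTEGER => Ok(*v),
        _ => Err("Must be length 1 of non-missing value".to_string()),
    }
}
```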

Optional Argument

To represent an optional argument, you can wrap it with Option. Then, the corresponding R function sets the default value of NULL on the argument.

#[savvy]
fn default_value_vec(x: Option<IntegerSexp>) -> savvy::Result<Sexp> {
    if let Some(x) = x {
        x.iter().sum::<i32>().try_into()
    } else {
        (-1i32).try_into()
    }
}
function(x = NULL) {
  .Call(savvy_default_value_vec__impl, x)
}

This function works with or without the argument.

default_value_vec(1:10)
#> [1] 55

default_value_vec()
#> [1] -1

Type-specific Topics

You can use these types as an argument of a #[savvy] function.

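| R type            | vector      | scalar        |
|-------------------|-------------|---------------|
| integer           | IntegerSexp | i32           |
| double            | RealSexp    | f64           |
| integer or double | NumericSexp | NumericScalar |
| logical           | LogicalSexp | bool          |
| raw               | RawSexp     | u8            |
| character         | StringSexp  | &str          |
| complex [1]       | ComplexSexp | Complex64     |
| list              | ListSexp    | n/a           |
| (any)             | Sexp        | n/a           |

[1] Complex is optionally supported under the feature flag complex.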

If you want to handle multiple types, you can cast an Sexp into a specific type by .into_typed() and write match branches to deal with each type. This is important when an interface returns Sexp. For example, ListSexp returns Sexp because a list element can be of any type. For more details about lists, please read the List section.

#[savvy]
fn print_list(x: ListSexp) -> savvy::Result<()> {
    for (k, v) in x.iter() {
        let content = match v.into_typed() {
            TypedSexp::Integer(x) => {
                format!(
                    "integer [{}]",
                    x.iter().map(|i| i.to_string()).collect::<Vec<String>>().join(", ")
                )
            }
            TypedSexp::Real(x) => {
                format!(
                    "double [{}]",
                    x.iter().map(|r| r.to_string()).collect::<Vec<String>>().join(", ")
                )
            }
            TypedSexp::Logical(x) => {
                format!(
                    "logical [{}]",
                    x.iter().map(|l| if l { "TRUE" } else { "FALSE" }).collect::<Vec<&str>>().join(", ")
                )
            }
            TypedSexp::String(x) => {
                format!(
                    "character [{}]",
                    x.iter().collect::<Vec<&str>>().join(", ")
                )
            }
            TypedSexp::List(_) => "list".to_string(),
            TypedSexp::Null(_) => "NULL".to_string(),
            _ => "other".to_string(),
        };

        let name = if k.is_empty() { "(no name)" } else { k };

        r_print!("{name}: {content}\n");
    }

    Ok(())
}

Likewise, NumericSexp also provides into_typed(). You can match the result against the Integer or Real variant and apply an appropriate function. Alternatively, you can rely on the type conversion that NumericSexp provides. See the next section for more details.

#[savvy]
fn identity_num(x: NumericSexp) -> savvy::Result<savvy::Sexp> {
    match x.into_typed() {
        NumericTypedSexp::Integer(i) => identity_int(i),
        NumericTypedSexp::Real(r) => identity_real(r),
    }
}

Integer, Real, String, Logical, Raw, And Complex

Integer and real

For integer (IntegerSexp, OwnedIntegerSexp) and real (RealSexp, OwnedRealSexp), the internal representation of the SEXP matches the Rust type we expect, i.e., i32 and f64. Taking advantage of this, these types have more methods than the other types:

  • as_slice() and as_mut_slice()
  • Index and IndexMut
  • efficient TryFrom<&[T]>

as_slice() and as_mut_slice()

These types can expose their underlying C array as a Rust slice by as_slice(). as_mut_slice() is available only for the owned versions. So, you don't need to use to_vec() to create a new vector just to pass the data to a function that requires a slice.

/// @export
#[savvy]
fn foo(x: IntegerSexp) -> savvy::Result<()> {
    some_function_takes_slice(x.as_slice());
    Ok(())
}

Index and IndexMut

You can also access the underlying data by indexing with [ ]. Index and IndexMut are available only for the owned versions. This means you can write assignments like the one below instead of calling set_elt().

/// @export
#[savvy]
fn times_two(x: IntegerSexp) -> savvy::Result<savvy::Sexp> {
    let mut out = OwnedIntegerSexp::new(x.len())?;

    for (i, &v) in x.iter().enumerate() {
        if v.is_na() {
            out[i] = i32::na();
        } else {
            out[i] = v * 2;
        }
    }

    out.into()
}
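The following plain-Rust sketch shows how out[i] = ... works on a custom type through these two traits. OwnedInts is a hypothetical stand-in that holds a Vec<i32> instead of R-allocated memory. It is not savvy's implementation.

```rust
use std::ops::{Index, IndexMut};

/// Hypothetical stand-in for an owned integer vector (a Vec<i32> instead of R memory).
pub struct OwnedInts {
    data: Vec<i32>,
}

impl OwnedInts {
    pub fn new(len: usize) -> Self {
        Self { data: vec![0; len] }
    }

    pub fn as_slice(&self) -> &[i32] {
        &self.data
    }
}

// `out[i]` in an expression calls Index::index().
impl Index<usize> for OwnedInts {
    type Output = i32;
    fn index(&self, i: usize) -> &i32 {
        &self.data[i] // panics if i is out of bounds
    }
}

// `out[i] = v` calls IndexMut::index_mut().
impl IndexMut<usize> for OwnedInts {
    fn index_mut(&mut self, i: usize) -> &mut i32 {
        &mut self.data[i]
    }
}
```

In Rust, Index::index() has to return a reference, so an out-of-bounds access can only panic. A method like set_elt(), by contrast, is free to return a Result instead.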

Efficient TryFrom<&[T]>

TryFrom<&[T]> is not unique to real and integer, but its implementation differs from that of logical and string. Since the internal representations are the same, savvy uses copy_from_slice(), which does a memcpy, to copy the data efficiently (for logical and string, the values are copied one by one).

NumericSexp

It's ideal to ensure the function takes the expected type on R's side (e.g., you can use vctrs::vec_cast(), or define S3 methods for integer and double separately). But, it's not always possible.

You can use NumericSexp to accept both real and integer. NumericSexp provides methods to get either i32 or f64 values:

  • as_slice_i32() returns &[i32]. This is fallible.
  • as_slice_f64() returns &[f64].
  • iter_i32() returns an iterator of Result<i32>.
  • iter_f64() returns an iterator of f64.

These methods return the underlying data directly if it already has the wanted type; otherwise, they convert the values. If the conversion is from f64 to i32, it fails when any of the values is

  • Inf or -Inf
  • out of range for i32
  • not integer-ish (e.g. 1.1)

For convenience, NumericSexp also provides iter_usize(), which returns an iterator of Result<usize>.
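iter_usize() yields one Result per element, and NumericScalar's as_usize() performs the same kind of conversion on a single value. The plain-Rust sketch below shows what such a conversion might check. It also shows the common pattern of collecting an iterator of Result into a single Result. f64_to_usize and to_usize_vec are hypothetical names, and this guide does not spell out the exact checks savvy performs.

```rust
/// Hypothetical per-element conversion from f64 to usize (not savvy's implementation).
fn f64_to_usize(x: f64) -> Result<usize, String> {
    if !x.is_finite() {
        return Err(format!("{x} is not finite"));
    }
    if x < 0.0 {
        return Err(format!("{x} is negative"));
    }
    if x.fract() != 0.0 {
        return Err(format!("{x} is not integer-ish"));
    }
    // On 64-bit targets, `usize::MAX as f64` rounds up to 2^64, so compare with >=.
    if x >= usize::MAX as f64 {
        return Err(format!("{x} is too large for usize"));
    }
    Ok(x as usize)
}

/// Collecting an iterator of Result into Result<Vec<_>, _> stops at the first error.
pub fn to_usize_vec(x: &[f64]) -> Result<Vec<usize>, String> {
    x.iter().map(|&v| f64_to_usize(v)).collect()
}
```

Keep in mind that f64 represents every integer exactly only up to 2^53, so a very large input may already have been rounded on R's side before any conversion happens.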

With NumericSexp, you can rewrite the above times_two function like this:

#[savvy]
fn times_two(x: NumericSexp) -> savvy::Result<Sexp> {
    let mut out = OwnedIntegerSexp::new(x.len())?;

    for (i, v) in x.iter_i32().enumerate() {
        let v = v?;
        if v.is_na() {
            out[i] = i32::na();
        } else {
            out[i] = v * 2;
        }
    }

    out.into()
}

Alternatively, you can use .into_typed() and match the result to apply an appropriate function depending on the type. In this case, you need to define two different functions, but this might be useful when the logic is very different for integer values and real values.

#[savvy]
fn times_two(x: NumericSexp) -> savvy::Result<savvy::Sexp> {
    match x.into_typed() {
        NumericTypedSexp::Integer(i) => times_two_int(i),
        NumericTypedSexp::Real(r) => times_two_real(r),
    }
}

Logical

While logical is 3-state (TRUE, FALSE and NA) on R's side, bool can represent only 2 states (true and false). This mismatch is a headache. There are many possible ways to handle this (e.g., use Option<bool>), but savvy chose to convert NA to true silently, assuming NA is not useful on Rust's side anyway. So, you have to make sure the input logical vector doesn't contain NA on R's side. For example,

wrapper_of_some_savvy_fun <- function(x) {
  out <- rep(NA, length(x))
  idx <- !is.na(x)

  # apply the function only to the non-NA elements
  out[idx] <- some_savvy_fun(x[idx])

  out
}

If you really want to handle the 3 states, use an expert-only method as_slice_raw(). This returns &[i32] instead of &[bool]. Why i32? It's the internal representation of a logical vector, which is the same as an integer vector. By treating the data as i32, you can use is_na().

use savvy::NotAvailableValue;   // for is_na()

/// @export
#[savvy]
fn flip_logical_expert_only(x: LogicalSexp) -> savvy::Result<savvy::Sexp> {
    let mut out = OwnedLogicalSexp::new(x.len())?;

    for (i, e) in x.as_slice_raw().iter().enumerate() {
        if e.is_na() {
            out.set_na(i)?;
        } else {
            out.set_elt(i, *e != 1)?; // 1 means TRUE
        }
    }

    out.into()
}
flip_logical_expert_only(c(TRUE, FALSE, NA))
#> [1] FALSE  TRUE    NA

String

STRSXP is a vector of CHARSXP, not something like char*. So, it's not possible to expose the internal representation as &str, and several calls to R's C API are required. To get a &str:

  1. STRING_ELT() to subset a CHARSXP
  2. R_CHAR() to extract the string from CHARSXP

Similarly, to set a &str:

  1. Rf_mkCharLenCE() to convert &str to a CHARSXP
  2. SET_STRING_ELT() to put the CHARSXP to the STRSXP

This is a bit costly. So, if the strings need to be referenced and updated frequently, you should probably avoid using OwnedStringSexp as a substitute for Vec<String>.

Encoding and 'static lifetime

While Rust's string is UTF-8, R's string is not guaranteed to be UTF-8. R provides Rf_translateCharUTF8() to convert the string to UTF-8. However, savvy chose not to use it. There are two reasons:

  1. As of version 4.2.0, R uses UTF-8 as the native encoding even on Windows. Old Windows systems are an exception, but I bravely assume they are rare and that time will solve this.
  2. The result of R_CHAR() is the string stored in R_StringHash, the global CHARSXP cache. In my understanding, it is never removed during the session, which allows savvy to give the resulting &str a 'static lifetime. The result of Rf_translateCharUTF8(), however, lives in R_alloc()-ed memory (code), which can be reclaimed by GC.

In short, to stick with the 'static lifetime for the sake of simplicity, I decided to neglect this relatively rare case. Note that invalid UTF-8 characters are rejected by CStr (currently, they are silently replaced with ""), so it's not very unsafe.
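The fallback can be illustrated with the standard library's CStr. cstr_to_str_or_empty is a hypothetical helper, not savvy's code. Here the returned &str borrows from the input bytes; in savvy, the string comes from R's CHARSXP cache, which is why it can be 'static.

```rust
use std::ffi::CStr;

/// Hypothetical helper: read NUL-terminated bytes as &str, falling back to ""
/// when the bytes are not valid UTF-8 (or are not a well-formed C string).
pub fn cstr_to_str_or_empty(bytes: &[u8]) -> &str {
    match CStr::from_bytes_with_nul(bytes) {
        Ok(c) => c.to_str().unwrap_or(""),
        Err(_) => "",
    }
}
```

For example, the Latin-1 encoding of "café" ends with the single byte 0xE9, which is not valid UTF-8 on its own, so the helper returns "".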

Raw

A raw vector is the sequence of u8, which can be used for representing various binary data. But, please be aware that you can use a Rust struct (see the section about struct) to store the data instead of copying the whole data into R's memory.

Complex

Complex is optionally supported under feature flag complex. If it's enabled, you can use ComplexSexp and OwnedComplexSexp to use a complex vector for input or output, and you can extract the slice of num_complex::Complex64 from it.

/// @export
#[savvy]
fn abs_complex(x: savvy::ComplexSexp) -> savvy::Result<savvy::Sexp> {
    let mut out = savvy::OwnedRealSexp::new(x.len())?;

    for (i, c) in x.iter().enumerate() {
        if !c.is_na() {
            out[i] = (c.re * c.re + c.im * c.im).sqrt();
        } else {
            out.set_na(i)?;
        }
    }

    out.into()
}

List

List is a different beast. It's pretty complex. You might think of it as a HashMap, but it's different in that:

  • List elements can be either named or unnamed individually (e.g., list(a = 1, 2, c = 3)).
  • List names can be duplicated (e.g., list(a = 1, a = 2)).

To keep things simple, savvy treats a list as a pair of equal-length sequences:

  • a character vector containing names, using "" (empty string) to represent missingness (actually, this is the convention of R itself)
  • a collection of arbitrary SEXP elements
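This pair-of-sequences view can be sketched in plain Rust. List and Value are hypothetical types, not savvy's ListSexp. Value is a tiny enum standing in for arbitrary SEXP elements. An unnamed element is just an empty name, and duplicate names are allowed; a HashMap could represent neither.

```rust
/// Hypothetical element type: a tiny subset of what an R list can hold.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(Vec<i32>),
    Real(Vec<f64>),
    Null,
}

/// Hypothetical model: names and values stored side by side, with equal length.
pub struct List {
    names: Vec<String>, // "" marks an unnamed element; duplicates are allowed
    values: Vec<Value>,
}

impl List {
    pub fn new(entries: Vec<(&str, Value)>) -> Self {
        let (names, values): (Vec<String>, Vec<Value>) = entries
            .into_iter()
            .map(|(k, v)| (k.to_string(), v))
            .unzip();
        Self { names, values }
    }

    pub fn names_iter(&self) -> impl Iterator<Item = &str> + '_ {
        self.names.iter().map(|s| s.as_str())
    }

    pub fn values_iter(&self) -> impl Iterator<Item = &Value> + '_ {
        self.values.iter()
    }

    /// Pairs of (name, value): a zip of the two iterators above.
    pub fn iter(&self) -> impl Iterator<Item = (&str, &Value)> + '_ {
        self.names_iter().zip(self.values_iter())
    }
}
```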

Since list is a very convenient data structure in R, you can come up with a lot of convenient interfaces for list. However, savvy intentionally provides only very limited interfaces. In my opinion, Rust should touch list data as little as possible because it's too complex.

Read values from a list

names_iter()

names_iter() returns an iterator of &str.

/// @export
#[savvy]
fn print_list_names(x: ListSexp) -> savvy::Result<()> {
    for k in x.names_iter() {
        // An unnamed element has "" as its name.
        if k.is_empty() {
            r_println!("(no name)");
        } else {
            r_println!("{k}");
        }
    }

    Ok(())
}
print_list_names(list(a = 1, 2, c = 3))
#> a
#> (no name)
#> c

values_iter()

values_iter() returns an iterator of Sexp. You can convert a Sexp into a TypedSexp with .into_typed() and then use match to extract the inner data.

/// @export
#[savvy]
fn print_list_values_if_int(x: ListSexp) -> savvy::Result<()> {
    for v in x.values_iter() {
        match v.into_typed() {
            TypedSexp::Integer(i) => r_println!("int {}", i.as_slice()[0]),
            _ => r_println!("not int"),
        }
    }

    Ok(())
}
print_list_values_if_int(list(a = 1, b = 1L, c = "1"))
#> not int
#> int 1
#> not int

iter()

If you want pairs of name and value, you can use iter(). This is basically a std::iter::Zip of the two iterators explained above.

/// @export
#[savvy]
fn print_list(x: ListSexp)  -> savvy::Result<()> {
    for (k, v) in x.iter() {
        // ...snip...
    }

    Ok(())
}

Put values to a list

new()

OwnedListSexp's new() is different from other types: the second argument (named) indicates whether the list is named or unnamed. If it's false, the list has no names, and all operations on names, like set_name(), are simply ignored.

set_name()

set_name() simply sets a name at the specified position.

/// @export
#[savvy]
fn list_with_no_values() -> savvy::Result<savvy::Sexp> {
    let mut out = OwnedListSexp::new(2, true)?;

    out.set_name(0, "foo")?;
    out.set_name(1, "bar")?;

    out.into()
}
list_with_no_values()
#> $foo
#> NULL
#> 
#> $bar
#> NULL
#> 

set_value()

set_value() sets a value at the specified position. "Value" is an arbitrary type that implements the Into<Sexp> trait. Since all {type}Sexp types implement it, you can simply pass them like below.

/// @export
#[savvy]
fn list_with_no_names() -> savvy::Result<savvy::Sexp> {
    let mut out = OwnedListSexp::new(2, false)?;

    let mut e1 = OwnedIntegerSexp::new(1)?;
    e1[0] = 100;
    
    let mut e2 = OwnedStringSexp::new(1)?;
    e2.set_elt(0, "cool")?;

    out.set_value(0, e1)?;
    out.set_value(1, e2)?;

    out.into()
}
list_with_no_names()
#> [[1]]
#> [1] 100
#> 
#> [[2]]
#> [1] "cool"
#> 

set_name_and_value()

set_name_and_value() is simply set_name() + set_value(). This is probably what you need in most cases.

/// @export
#[savvy]
fn list_with_both() -> savvy::Result<savvy::Sexp> {
    let mut out = OwnedListSexp::new(2, true)?;

    let mut e1 = OwnedIntegerSexp::new(1)?;
    e1[0] = 100;
    
    let mut e2 = OwnedStringSexp::new(1)?;
    e2.set_elt(0, "cool")?;

    out.set_name_and_value(0, "foo", e1)?;
    out.set_name_and_value(1, "bar", e2)?;

    out.into()
}
list_with_both()
#> $foo
#> [1] 100
#> 
#> $bar
#> [1] "cool"
#> 

Struct

Basic usage

You can use the #[savvy] macro on a struct to convert it to an R object. More precisely, this macro adds implementations of TryFrom between Sexp and the struct, so you can specify the type as a function's input and output.

/// @export
#[savvy]
struct Person {
    pub name: String,
}

The handiest way is to implement methods and associated functions for the type. You can add #[savvy] before the impl block to make them available in R sessions.

#[savvy]
impl Person {
    fn new() -> Self {
        Self {
            name: "".to_string(),
        }
    }

    fn set_name(&mut self, name: &str) -> savvy::Result<()> {
        self.name = name.to_string();
        Ok(())
    }

    fn name(&self) -> savvy::Result<savvy::Sexp> {
        let mut out = OwnedStringSexp::new(1)?;
        out.set_elt(0, &self.name)?;
        out.into()
    }

    fn say_hello() -> savvy::Result<savvy::Sexp> {
        "Hello!".try_into()
    }
}

If we focus on the arguments, there are two types of functions here:

  1. method: the first argument is self [1] (set_name() and name())
  2. associated function: no self argument (new() and say_hello())

[1] You should almost always use &self or &mut self, not self, unless you are an expert and really intend to consume the object. This is discussed later.

In an R session, associated functions are available as elements of the R object that has the same name as the Rust type (in this case, Person).

p <- Person$new()

Person$say_hello()
#> [1] "Hello!"

Of these two associated functions, new() is a constructor, which returns Self. It creates an instance of the struct.

The instance has the methods. You can call them like below.

# create an instance
p <- Person$new()

# call methods
p$set_name("たかし")
p$name()
#> [1] "たかし"

The instance has an S3 class with the same name as the Rust type, so you can implement S3 methods such as print.<your struct>() if necessary.

class(p)
#> [1] "Person"

# register print() S3 method for Person
print.Person <- function(x, ...) cat(x$name(), "\n", sep = "")
registerS3method("print", "Person", print.Person)

p
#> たかし

Struct output

The above example uses -> Self as the return type of the associated function, but that's not the only option. You can also wrap it with savvy::Result<Self>.

#[savvy]
impl Person {
    fn new_fallible() -> savvy::Result<Self> {
        let x = Self {
            name: "".to_string(),
        };
        Ok(x)
    }
}

More generally, you can specify an arbitrary struct marked with #[savvy] as the return type. For example, you can create an instance of the struct outside of impl,

/// @export
#[savvy]
fn create_person() -> savvy::Result<Person> {
    let x = Person {
        name: "".to_string(),
    };
    Ok(x)
}

and you can create an instance of another type from an existing instance.

/// @export
#[savvy]
struct UpperPerson {
    pub name: String,
}

#[savvy]
impl Person {
    fn reborn_as_upper_person(&self) -> savvy::Result<UpperPerson> {
        let x = UpperPerson {
            name: self.name.to_uppercase(),
        };
        Ok(x)
    }
}

Struct input

You can also use the struct as an argument of a #[savvy]-ed function. Note that, in most cases, you should specify &T or &mut T, not T.

/// @export
#[savvy]
fn get_name_external(x: &Person) -> savvy::Result<savvy::Sexp> {
    x.name()
}
get_name_external(p)
#> [1] "たかし"

&T vs T

If you are familiar with Rust, you know the difference: T moves the ownership, while &T just borrows. But why does this matter for savvy? What actually happens when you specify T in a #[savvy] function?

Say, you mistyped &Person above as Person like this:

/// @export
#[savvy]
fn get_name_external2(x: Person) -> savvy::Result<savvy::Sexp> {
    x.name()
}

This function works the same as the previous one, and the first call returns the same result. Yay!

get_name_external2(p)
#> [1] "たかし"

Then, what's wrong? You'll find out when you call the function on the same object a second time; it doesn't work anymore.

get_name_external2(p)
#> Error: This external pointer is already consumed or deleted

This is because the Person object is already moved. The R variable p doesn't hold the ownership anymore. So, you should almost always specify &T (or &mut T), not T.

The same is true for a method. Use &self and &mut self instead of self unless you want such a method like this!

#[savvy]
impl Person {
    fn invalidate(self) -> savvy::Result<()> {
        r_println!("This instance is invalidated!");
        Ok(())
    }
}
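The "already consumed" error can be mimicked in plain Rust. In the sketch below, ExtPtrSlot is a hypothetical stand-in for an external pointer, not savvy's implementation. It holds an Option<Box<T>>, so borrowing leaves the value in place, while taking it by value empties the slot for good.

```rust
const CONSUMED: &str = "This external pointer is already consumed or deleted";

/// Hypothetical stand-in for an external pointer slot (not savvy's code).
struct ExtPtrSlot<T> {
    inner: Option<Box<T>>,
}

impl<T> ExtPtrSlot<T> {
    fn new(value: T) -> Self {
        Self { inner: Some(Box::new(value)) }
    }

    /// Like an `&T` argument: borrow the value and leave it in the slot.
    fn get_ref(&self) -> Result<&T, String> {
        self.inner.as_deref().ok_or_else(|| CONSUMED.to_string())
    }

    /// Like a `T` argument: move the value out, leaving the slot empty.
    fn take_value(&mut self) -> Result<T, String> {
        self.inner
            .take()
            .map(|boxed| *boxed)
            .ok_or_else(|| CONSUMED.to_string())
    }
}

/// Hypothetical copy of the Person struct used in this section.
struct Person {
    name: String,
}
```

Borrowing can be repeated any number of times. After one take_value(), every later access fails with the same message R shows.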

When is T useful?

You might wonder why savvy allows this specification at all. Are there any cases when this is useful?

The answer is yes. The advantage of moving the ownership is that you can avoid copying. For example, consider a type HeavyData, which contains a huge amount of data, and HeavyDataBundle, which bundles two HeavyDatas.

/// @export
#[savvy]
#[derive(Clone)]
struct HeavyData(Vec<i32>);

/// @export
#[savvy]
struct HeavyDataBundle {
    data1: HeavyData,
    data2: HeavyData,
}

#[savvy]
impl HeavyData {
    // ...snip...
}

HeavyDataBundle requires ownership of the HeavyData values. So, if the inputs are references (&HeavyData), you need to clone() the data, which can be costly.

/// @export
#[savvy]
impl HeavyDataBundle {
    fn new(
        data1: &HeavyData,
        data2: &HeavyData,
    ) -> Self {
        Self {
            data1: data1.clone(),
            data2: data2.clone(),
        }
    }
}

In this case, you can move the ownership to avoid copying.

/// @export
#[savvy]
impl HeavyDataBundle {
    fn new(
        data1: HeavyData,
        data2: HeavyData,
    ) -> Self {
        Self { data1, data2 }
    }
}
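You can check the saving in plain Rust by comparing buffer addresses. This sketch drops the #[savvy] attributes so it runs without R. The constructor names from_refs() and from_owned() are hypothetical, since the savvy example calls both new(). Cloning allocates a new buffer, while moving keeps the original one.

```rust
#[derive(Clone)]
struct HeavyData(Vec<i32>);

struct HeavyDataBundle {
    data1: HeavyData,
    data2: HeavyData,
}

impl HeavyDataBundle {
    /// Borrowing version: must clone, which allocates and copies every element.
    fn from_refs(data1: &HeavyData, data2: &HeavyData) -> Self {
        Self {
            data1: data1.clone(),
            data2: data2.clone(),
        }
    }

    /// Moving version: the existing Vec buffers are reused; nothing is copied.
    fn from_owned(data1: HeavyData, data2: HeavyData) -> Self {
        Self { data1, data2 }
    }
}
```

Comparing data1.0.as_ptr() before and after each call shows the difference. The clone points to new memory, while the moved value still points to the original buffer.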

Of course, this is an expert-only usage and is rarely needed. Again, you should almost always use &T or &mut T instead of T. Use T only if you are really sure that &T doesn't work well for your case.

Lifetime

The #[savvy] macro doesn't support structs with lifetimes. This is because crossing the FFI boundary means losing track of the lifetimes.

For example, the struct below contains a reference to a variable of usize. However, once an instance of Foo is passed to R's side, Rust cannot know whether the variable is still alive when Foo is passed back to Rust's side.

struct Foo<'a>(&'a usize);

Then, what should we do to deal with such structs? I have yet to find the best practice, but you might be able to

  • use 'static lifetime (i.e. struct Foo(&'static usize)) probably by referencing a global variable
  • instead of passing the struct itself to R, store the struct in a global OnceCell<HashMap> and pass the key
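Here is a rough sketch of the second idea in plain Rust, without savvy: a global registry keyed by an integer, where only the key would be passed to R. The registry(), store() and lookup() helpers are hypothetical. A Mutex is added because a OnceLock alone gives only shared access. Anything stored in a global must itself be 'static, so in practice the two ideas above tend to go together.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Mutex, OnceLock};

/// A struct with a lifetime, which #[savvy] cannot accept directly.
struct Foo<'a>(&'a usize);

/// A global value, so Foo can borrow it for 'static.
static VALUE: usize = 42;

/// Counter used to hand out unique keys.
static NEXT_KEY: AtomicUsize = AtomicUsize::new(0);

/// Lazily created global store (hypothetical helper).
fn registry() -> &'static Mutex<HashMap<usize, Foo<'static>>> {
    static REGISTRY: OnceLock<Mutex<HashMap<usize, Foo<'static>>>> = OnceLock::new();
    REGISTRY.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Stores the struct and returns the key; only this usize would cross to R.
fn store(foo: Foo<'static>) -> usize {
    let key = NEXT_KEY.fetch_add(1, Ordering::Relaxed);
    registry().lock().unwrap().insert(key, foo);
    key
}

/// Looks the struct up again when R passes the key back.
fn lookup(key: usize) -> Option<usize> {
    registry().lock().unwrap().get(&key).map(|foo| *foo.0)
}
```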

External pointer?

Under the hood, the Person struct is stored in an EXTPTRSXP. But you don't need to care about how to deal with EXTPTRSXP, because it's stored in a closure environment on creation and never exposed to the user. Since R's side guarantees that self is always an EXTPTRSXP of Person, the Rust code simply restores a Person instance from the EXTPTRSXP without any checks.

.savvy_wrap_Person <- function(ptr) {
  e <- new.env(parent = emptyenv())
  e$.ptr <- ptr
  e$set_name <- Person_set_name(ptr)
  e$name <- Person_name(ptr)

  class(e) <- "Person"
  e
}

Person <- new.env(parent = emptyenv())
Person$new <- function() {
  .savvy_wrap_Person(.Call(Person_new__impl))
}

Person$say_hello <- function() {
  .Call(Person_say_hello__impl)
}

Person_set_name <- function(self) {
  function(name) {
    invisible(.Call(Person_set_name__impl, self, name))
  }
}

Person_name <- function(self) {
  function() {
    .Call(Person_name__impl, self)
  }
}

Note that savvy wraps the EXTPTRSXP in a closure environment only when the type is used directly as the return type of the function. If you return Person inside a list, for example, the external pointer is exposed directly to the user, and dealing with it becomes the user's responsibility.

#[savvy]
struct Person {}

// This case savvy handles nicely.
/// @export
#[savvy]
impl Person {
    fn new() -> savvy::Result<Person> {
        Ok(Person {})
    }
}

// In this case, the user is handed an external pointer.
/// @export
#[savvy]
fn create_list() -> savvy::Result<Sexp> {
    let mut list = OwnedListSexp::new(1, false)?;
    let person = Person {};
    list.set_value(0, Sexp::try_from(person)?)?;
    list.into()
}

In R:

> person = Person$new()
> print(person)
<environment: 0x0000027cf9d46a20>
attr(,"class")
[1] "Person"

> l = create_list()
> print(l)
[[1]]
<pointer: 0x0000000000000001>

Traps about protection

This is a somewhat advanced topic. A struct can contain arbitrary things. However, if you want it to hold an SEXP passed from an R session, it's your responsibility to take care of the protection on it.

An SEXP passed from outside doesn't need additional protection at the time of the function call, because it belongs to some environment in the R session and so won't be GC-ed accidentally. However, after the function call, the SEXP may lose its link to any other R objects. To prevent this tragedy (i.e., an R session crash), you should create an owned version and copy the values into it, because savvy takes care of the protection of owned objects. In short, you should never define a struct like this:

struct Foo {
    a: IntegerSexp
}

Instead, you should write

struct Foo {
    a: OwnedIntegerSexp
}

Enum

Savvy supports fieldless enums to express the possible options for a parameter. For example, if you define such an enum with #[savvy],

/// @export
#[savvy]
enum LineType {
    Solid,
    Dashed,
    Dotted,
}

it will be available on R's side like this:

LineType$Solid
LineType$Dashed
LineType$Dotted

You can use the enum type as a function argument like this,

/// @export
#[savvy]
fn plot_line(x: IntegerSexp, y: IntegerSexp, line_type: &LineType) -> savvy::Result<()> {
    match line_type {
        LineType::Solid => {
            // ...snip...
        }
        LineType::Dashed => {
            // ...snip...
        }
        LineType::Dotted => {
            // ...snip...
        }
    }

    Ok(())
}

so that the users can use it instead of specifying it by an integer or a character, which might be mistyped.

plot_line(x, y, LineType$Solid)

Of course, you can achieve the same thing with i32 or &str as the input and match on the value. The difference is that an enum is typo-proof. But you might find a plain integer or character handier.

/// @export
#[savvy]
fn plot_line(x: IntegerSexp, y: IntegerSexp, line_type: &str) -> savvy::Result<()> {
    match line_type {
        "solid" => {
            // ...snip...
        }
        "dashed" => {
            // ...snip...
        }
        "dotted" => {
            // ...snip...
        }
        _ => {
            return Err(savvy_err!("Unsupported line type!"));
        }
    }

    Ok(())
}
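A middle ground, which savvy itself does not provide, is to accept a string but convert it to a fieldless enum once at the boundary, so typos are caught in one place. This plain-Rust sketch uses the standard FromStr trait. The error message format is an assumption.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq)]
enum LineType {
    Solid,
    Dashed,
    Dotted,
}

impl FromStr for LineType {
    type Err = String;

    /// The only place where string spellings are checked.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "solid" => Ok(LineType::Solid),
            "dashed" => Ok(LineType::Dashed),
            "dotted" => Ok(LineType::Dotted),
            other => Err(format!("Unsupported line type: {other}")),
        }
    }
}
```

After "dashed".parse::<LineType>()?, the rest of the Rust code can match on LineType exhaustively, just as in the enum version.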

Limitation

As noted above, savvy supports only fieldless enums for simplicity. If you want to use an enum that contains some values, please wrap it in a struct.

// You don't need to mark this with #[savvy]
enum AnimalEnum {
    Dog(String, f64),
    Cat { name: String, weight: f64 },
}

/// @export
#[savvy]
struct Animal(AnimalEnum);

Also, savvy currently doesn't support discriminants. For example, this one won't compile.

/// @export
#[savvy]
enum HttpStatus {
    Ok = 200,
    NotFound = 404,
}

Error handling

To propagate your errors to the R session, you can return a savvy::Error. The savvy_err!() macro is a shortcut for savvy::Error::new(format!(...)) to create a new error.

use savvy::savvy_err;

#[savvy]
fn raise_error() -> savvy::Result<savvy::Sexp> {
    Err(savvy_err!("This is my custom error"))
}
raise_error()
#> Error: This is my custom error

Like anyhow, you can use ? to easily propagate any error that implements the std::error::Error trait.

#[savvy]
fn no_such_file() -> savvy::Result<()> {
    let _ = std::fs::read_to_string("no_such_file")?;
    Ok(())
}

Custom error

If you implement your own error type and its conversion to savvy::Error, it would conflict with the existing conversion From<dyn std::error::Error>. To avoid a compile error, please specify the use-custom-error feature to opt out of that conversion.

savvy = { version = "...", features = ["use-custom-error"] }

Show a warning

To show a warning, you can use r_warn().

savvy::io::r_warn("foo")?;

Note that a warning can be turned into an error when options(warn = 2) is set, so you should not ignore the error from r_warn(). The error should be propagated to the R session.

Dealing with panic!

First of all, don't use panic!

If you are familiar with extendr, you might be used to using panic! casually. But in the savvy framework, panic! crashes your R session. So, please don't use panic! directly. Also, please avoid operations that can panic (e.g., unwrap()) unless you are sure they won't.

This is because, in Rust, panic! means an unrecoverable error. In theory, it's a sign that something impossible has happened and there's no hope of recovery, so there is no option but to terminate the entire session. Savvy just respects what is supposed to happen.

But, if the session terminates immediately, it's hard to investigate the cause. What can I do?

Use debug build

If the DEBUG environment variable is set to true at build time (which is the case with devtools::load_all()), savvy catches panic! and shows the backtrace instead of crashing the R session.

For example, if you write this Rust function and load it by devtools::load_all(),

#[savvy]
fn must_panic() -> savvy::Result<()> {
    let x = &[1];
    let _ = x[1];  // Rust's index starts from 0!
    Ok(())
}

you'll see an error like the following with a backtrace, instead of the RStudio bomb icon. You can check the file and line suggested in the error message to guess what was happening.

must_panic()
#> panic occured!
#> 
#> Original message:
#>     panicked at src\error_handling.rs:33:13:
#>     index out of bounds: the len is 1 but the index is 1
#> 
#> Backtrace:
#>     ...
#>       18: std::panic::catch_unwind
#>                  at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04\library\std\src/panic.rs:142:14
#>       19: simple_savvy::error_handling::savvy_must_panic_inner
#>                  at .\src\rust\src\error_handling.rs:30:1
#>       20: must_panic
#>                  at .\src\rust\src\error_handling.rs:30:1
#>       21: must_panic__impl
#>                  at .\src\init.c:291:16
#>     ...
#> 
#> note: Run with `RUST_BACKTRACE=1` for a full backtrace.
#> 
#> 
#> Error: panic happened

Set panic="unwind"

As described above, panic! is an unrecoverable error. It should not be recovered on the release build in principle.

That said, in some cases panic! comes from code outside your control. For example, if it's thrown by one of the dependency crates, there's little you can do. You should report the problem to the author, but a fix may not be made and published immediately. Also, keep in mind that, depending on the cause of the error, some authors may deliberately choose panic! over Result. Note that panic! also happens in Rust's standard library in situations such as integer division by zero or out-of-bounds indexing of a Vec.

In such cases, you can change the following setting in the template Cargo.toml generated by savvy-cli init. Setting it to panic = "unwind" gracefully converts a panic into an R error, just like the debug build. Note that the backtrace is not available on the release build because there's no debug info.

[profile.release]
# ...snip...
panic = "unwind"
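To show what panic = "unwind" makes possible, here is a plain-Rust sketch that catches a panic with std::panic::catch_unwind and turns it into an error value. It is an illustration only, not savvy's internal code. With panic = "abort", the process would terminate before catch_unwind could run. The helper name run_catching_panic() is hypothetical.

```rust
use std::panic;

/// Runs `f` and converts a panic into `Err(message)` instead of unwinding further.
/// Works only when the crate is built with panic = "unwind".
fn run_catching_panic<F, T>(f: F) -> Result<T, String>
where
    F: FnOnce() -> T + panic::UnwindSafe,
{
    panic::catch_unwind(f).map_err(|payload| {
        // panic!("literal") carries a &str; formatted panics carry a String.
        if let Some(s) = payload.downcast_ref::<&str>() {
            s.to_string()
        } else if let Some(s) = payload.downcast_ref::<String>() {
            s.clone()
        } else {
            "panic happened".to_string()
        }
    })
}
```

The default panic hook still prints the message to stderr. The caller, however, gets an ordinary Err that can be reported as an error instead of crashing.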

Handling Attributes

You sometimes need to deal with attributes like names and class. Savvy provides the following methods for getting and setting the value of the attribute.

Attribute   Getter method   Setter method   Type
names       get_names()     set_names()     Vec<&str>
class       get_class()     set_class()     Vec<&str>
dim         get_dim()       set_dim()       &[i32]
arbitrary   get_attrib()    set_attrib()    Sexp

The getter methods return Option<T> because the object doesn't always have the attribute. You can match the result like this:

/// @export
#[savvy]
fn get_class_int(x: IntegerSexp) -> savvy::Result<savvy::Sexp> {
    match x.get_class() {
        Some(class) => class.try_into(),
        None => ().try_into(),
    }
}

The setter methods are available only for owned SEXPs. The return type is savvy::Result<()> because the conversion from a Rust type to SEXP is fallible.

/// @export
#[savvy]
fn set_class_int() -> savvy::Result<savvy::Sexp> {
    let mut x = OwnedIntegerSexp::new(1)?;

    x.set_class(&["foo", "bar"])?;

    x.into()
}

For attributes other than names, class, and dim, you can use get_attrib() and set_attrib(). Since an attribute can store arbitrary values, the type is Sexp. To extract the underlying value, you can use .into_typed() and match.

/// @export
#[savvy]
fn print_attr_values_if_int(attr: &str, value: savvy::Sexp) -> savvy::Result<()>  {
    let attr_value = value.get_attrib(attr)?;
    match attr_value.into_typed() {
        TypedSexp::Integer(i) => r_println!("int {:?}", i.as_slice()),
        _ => r_println!("not int"),
    }

    Ok(())
}

In order to set values, you can use .into() to convert from the owned SEXP to a savvy::Sexp.

/// @export
#[savvy]
fn set_attr_int(attr: &str) -> savvy::Result<savvy::Sexp> {
    let s: &[i32] = &[1, 2, 3];
    let attr_value: OwnedIntegerSexp = s.try_into()?;
    let mut out = OwnedIntegerSexp::new(1)?;

    out.set_attrib(attr, attr_value.into())?;

    out.into()
}

Handling Data Frames

A data.frame is a list. You should simply handle it as a list in Rust code, and all data.frame-related operations should be done in R code.

For example, if you want to return the result as a data.frame, the Rust function should return a list, which is then wrapped by an R function that converts the list into a data.frame. tibble::as_tibble() is a good choice for this purpose. Or, if you prefer a lighter dependency, you can use vctrs::new_data_frame(), or simply as.data.frame().

/// @export
#[savvy]
fn foo_impl() -> savvy::Result<savvy::Sexp> {
    // create a named list
    let mut out = savvy::OwnedListSexp::new(2, true)?;

    let x: Vec<f64> = some_function();
    let y: Vec<f64> = another_function();
    
    out.set_name_and_value(0, "x", OwnedRealSexp::try_from_slice(x)?)?;
    out.set_name_and_value(1, "y", OwnedRealSexp::try_from_slice(y)?)?;

    out.into()
}
foo <- function() {
  result <- foo_impl()
  tibble::as_tibble(result)
}

Handling Factors

A factor is internally an integer vector with the levels attribute. You can handle this on Rust's side, but the recommended way is to write a wrapper R function to convert the factor vector to a character vector.

Say there's a Rust function that takes a character vector as its argument.

/// @export
#[savvy]
fn foo_impl(x: StringSexp) -> savvy::Result<()> {
    // ...snip...
}

Then, you can write a function like below to convert the input to a character vector. If you want better validation, you can use vctrs::vec_cast() instead.

foo <- function(x) {
    x <- as.character(x)
    foo_impl(x)
}

If you need the order of the levels, you should pass it as another argument.

/// @export
#[savvy]
fn foo_impl2(x: StringSexp, levels: StringSexp) -> savvy::Result<()> {
    // ...snip...
}
foo2 <- function(x) {
    levels <- levels(x)
    x <- as.character(x)
    foo_impl2(x, levels)
}

Handling Matrices And Arrays

Savvy doesn't provide a convenient way of converting matrices and arrays, so you have to do it yourself. But don't worry, it's probably not very difficult, because the major Rust matrix crates are column-major like R, or at least support column-major.

  • ndarray: row-major is default (probably for compatibility with Python ndarray?), but it offers column-major as well
  • nalgebra: column-major
  • glam (and probably all other rust-gamedev crates): column-major, probably because GLSL is column-major

The example code can be found at https://github.com/yutannihilation/savvy-matrix-examples/tree/master/src/rust/src.
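Concretely, "column-major" means that R stores the element at (row, col) of a matrix with nrow rows at offset row + col * nrow of the flat vector. Here is a small plain-Rust sketch with 0-based indices. The helper names are hypothetical.

```rust
/// Offset of element (row, col) in a column-major buffer with `nrow` rows.
fn col_major_index(row: usize, col: usize, nrow: usize) -> usize {
    row + col * nrow
}

/// Reads element (row, col) from an R-style matrix stored as a flat slice.
/// Returns None if the position is outside the matrix.
fn get_elt(data: &[f64], nrow: usize, row: usize, col: usize) -> Option<f64> {
    if row >= nrow {
        return None;
    }
    data.get(col_major_index(row, col, nrow)).copied()
}
```

For example, matrix(1:6, nrow = 2) in R is the flat vector 1, 2, ..., 6. Element (row 0, col 1) is 3, which is x[1, 2] in R's 1-based notation.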

R to Rust

ndarray

By default, ndarray is row-major, but you can specify column-major by f(). So, all you have to do is simply to extract the dim and pass it to ndarray.

use ndarray::Array;
use ndarray::ShapeBuilder;
use savvy::{r_println, savvy, RealSexp};

/// @export
#[savvy]
fn ndarray_input(x: RealSexp) -> savvy::Result<()> {
    // In R, dim is i32, so you need to convert it to usize first.
    let dim_i32 = x.get_dim().ok_or("no dimension found")?;
    let dim: Vec<usize> = dim_i32.iter().map(|i| *i as usize).collect();

    // f() changes the order from row-major (C-style convention) to column-major (Fortran-style convention).
    // from_shape_vec() returns a Result; ? propagates a shape mismatch as an R error.
    let a = Array::from_shape_vec(dim.f(), x.to_vec())?;

    r_println!("{a:?}");

    Ok(())
}

nalgebra

nalgebra is column-major, so you can simply pass the dim.

use nalgebra::DMatrix;
use savvy::{r_println, savvy, savvy_err, RealSexp};

/// @export
#[savvy]
fn nalgebra_input(x: RealSexp) -> savvy::Result<()> {
    let dim = x.get_dim().ok_or("no dimension found")?;

    if dim.len() != 2 {
        return Err(savvy_err!("Input must be matrix!"));
    }

    let m = DMatrix::from_vec(dim[0] as _, dim[1] as _, x.to_vec());

    r_println!("{m:?}");

    Ok(())
}

glam

glam is also column-major. With glam, the dimension is probably fixed (e.g., 3 x 3 in the following code), so you can check that the dimension is as expected before passing the data to the matrix constructor.

use glam::DMat3;
use savvy::{r_println, savvy, savvy_err, RealSexp};

/// @export
#[savvy]
fn glam_input(x: RealSexp) -> savvy::Result<()> {
    let dim = x.get_dim().ok_or("no dimension found")?;

    if dim != [3, 3] {
        return Err(savvy_err!("Input must be 3x3 matrix!"));
    }

    // As we already check the dimension, this must not fail
    let x_array: &[f64; 9] = x.as_slice().try_into().unwrap();

    let m = DMat3::from_cols_array(x_array);

    r_println!("{m:?}");

    Ok(())
}

Rust to R

Matrix libraries typically provide methods to get the dimension and the slice of the underlying memory. You can set the dimension with set_dim().

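use nalgebra::DMatrix;
use savvy::{savvy, OwnedRealSexp};

/// @export
#[savvy]
fn nalgebra_output() -> savvy::Result<savvy::Sexp> {
    let m = DMatrix::from_vec(2, 3, vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0]);

    // R's dim attribute is an integer vector, so convert usize to i32.
    let shape = m.shape();
    let dim = &[shape.0 as i32, shape.1 as i32];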

    let mut out = OwnedRealSexp::try_from(m.as_slice())?;
    out.set_dim(dim)?;

    out.into()
}
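nalgebra's buffer is already column-major, so it can be copied as-is. If your data is row-major instead (for example, an ndarray array with the default layout), it has to be reordered before copying into an R vector. This plain-Rust sketch shows the reordering and a checked usize-to-i32 dim conversion. Both helper names are hypothetical.

```rust
/// Reorders a row-major (C-order) nrow x ncol buffer into column-major (R) order.
fn row_major_to_col_major(data: &[f64], nrow: usize, ncol: usize) -> Vec<f64> {
    assert_eq!(data.len(), nrow * ncol, "buffer length must be nrow * ncol");
    let mut out = Vec::with_capacity(data.len());
    for col in 0..ncol {
        for row in 0..nrow {
            // In row-major order, (row, col) lives at row * ncol + col.
            out.push(data[row * ncol + col]);
        }
    }
    out
}

/// Converts a shape to R's integer dim, failing if it doesn't fit in i32.
fn r_dim(nrow: usize, ncol: usize) -> Option<[i32; 2]> {
    Some([i32::try_from(nrow).ok()?, i32::try_from(ncol).ok()?])
}
```

For the row-major 2 x 3 matrix [[1, 2, 3], [4, 5, 6]], the result is 1, 4, 2, 5, 3, 6. That is the order R expects for matrix(c(1, 4, 2, 5, 3, 6), nrow = 2).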

Testing

Write integration tests on R's side

The most recommended way is to write tests on R's side just as you do with an ordinary R package. You can write tests on Rust's side as described later, but, ultimately, the R functions are the user interface, so you should test the behavior of actual R functions.

Write Rust tests

The sad news is that cargo test doesn't work with savvy, because savvy always requires a real R session to work. But don't worry, savvy-cli test is the tool for this. savvy-cli test does the following:

  1. extract the Rust code of the test modules and the doc tests
  2. create a temporary R package [1] and inject the extracted Rust code
  3. build and run the test functions via the R package

[1] The R package is created in the OS's cache dir by default, but you can specify the location with --cache-dir.

Note that this command takes the path to the root of a crate, not that of an R package.

savvy-cli test path/to/your_crate

Limitations

savvy-cli test tries to mimic what cargo test does as much as possible, but there are some limitations.

First, in order to run tests, you need to add "lib" to the crate-type. This is because your crate is used as a Rust library when run by savvy-cli test.

[lib]
crate-type = ["staticlib", "lib"]
                           ^^^^^

Second, if you want to test a function or a struct, it must be public. Items marked with #[savvy] are automatically made public, but if you want to test other functions, you need to add pub to them yourself.

pub fn foo() -> savvy::Result<()> {
^^^

Test module

You can write tests under a module marked with #[cfg(feature = "savvy-test")] instead of #[cfg(test)]. A #[test] function must return savvy::Result<()>, the same convention as #[savvy]. To check whether an SEXP contains the expected data, assert_eq_r_code is convenient.

#[cfg(feature = "savvy-test")]
mod test {
    use savvy::{OwnedIntegerSexp, assert_eq_r_code};

    #[test]
    fn test_integer() -> savvy::Result<()> {
        let mut x = OwnedIntegerSexp::new(3)?;

        assert_eq_r_code(x, "c(0L, 0L, 0L)");

        Ok(())
    }
}

Note that savvy-test is just a marker for savvy-cli, not a real feature. So, in theory, you don't really need this. However, in reality, you probably want to add it to the [features] section of Cargo.toml because otherwise Cargo warns.

[features]
savvy-test = []

To test a function that takes user-supplied SEXPs like IntegerSexp, you can use .as_read_only() to convert from the corresponding Owned- type. For example, if you have a function your_fn() that accepts IntegerSexp, you can construct an OwnedIntegerSexp and convert it to IntegerSexp before passing it to your_fn().

#[savvy]
pub fn your_fn(x: IntegerSexp) -> savvy::Result<savvy::Sexp> {
    // ...snip...
}

#[cfg(feature = "savvy-test")]
mod test {
    use savvy::{assert_eq_r_code, OwnedIntegerSexp};

    #[test]
    fn test_integer() -> savvy::Result<()> {
        let x = OwnedIntegerSexp::new(3)?;
        // Convert the owned vector into the read-only type your_fn() expects.
        let x_ro = x.as_read_only();
        let result = super::your_fn(x_ro)?;

        assert_eq_r_code(result, "...");

        Ok(())
    }
}

Doc tests

You can also write doc tests. Since savvy-cli test wraps each doc test in a function that returns savvy::Result<()>, you can use ? to extract the Result value in the code.

/// ```
/// let x = savvy::OwnedIntegerSexp::new(3)?;
/// assert_eq!(x.as_slice(), &[0, 0, 0]);
/// ```

Features and dependencies

If you need to specify some features for testing, use --features argument.

savvy-cli test --features foo path/to/your_crate

For dependencies, savvy-cli test picks up all dependencies in [dependencies] and [dev-dependencies]. If you need an additional crate for the test code, you can add it to the [dev-dependencies] section of Cargo.toml, just as you would with cargo test.

Reminder: You can use cargo test

While #[savvy] requires a real R session, you can still use cargo test by separating the actual logic into a function that doesn't rely on savvy. For example, suppose you have the following function times_two_int() that doubles the input numbers.

#[savvy]
fn times_two_int(x: IntegerSexp) -> savvy::Result<savvy::Sexp> {
    let mut out = OwnedIntegerSexp::new(x.len())?;

    for (i, e) in x.iter().enumerate() {
        if e.is_na() {
            out.set_na(i)?;
        } else {
            out[i] = e * 2;
        }
    }

    out.into()
}

In this case, you can rewrite the code to the following so that you can test times_two_int_impl() with cargo test.

#[savvy]
fn times_two_int(x: IntegerSexp) -> savvy::Result<savvy::Sexp> {
    let result: Vec<i32> = times_two_int_impl(x.as_slice());
    result.try_into()
}

fn times_two_int_impl(x: &[i32]) -> Vec<i32> {
    x.iter()
        .map(|x| if x.is_na() { *x } else { *x * 2 })
        .collect::<Vec<i32>>()
}

But, as you might notice, this implementation is a bit inefficient in that it allocates a Vec<i32> just to store the temporary result. Separating a function like this can be a bit tricky, and it might not be worth it in some cases. (In this case, the function could probably return an iterator instead.)
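Following that hint, here is a sketch of the iterator variant in plain Rust, which can be tested with cargo test. Because savvy's is_na() is not available without savvy, the sketch assumes that R's NA_integer_ is i32::MIN and defines a local NA_INTEGER constant for it.

```rust
/// Assumption: R's NA_integer_ is stored as i32::MIN.
const NA_INTEGER: i32 = i32::MIN;

/// Pure-Rust logic, testable with plain `cargo test`.
/// Returns a lazy iterator, so no temporary Vec is allocated.
fn times_two_int_impl(x: &[i32]) -> impl Iterator<Item = i32> + '_ {
    x.iter()
        .map(|&e| if e == NA_INTEGER { NA_INTEGER } else { e * 2 })
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn doubles_and_keeps_na() {
        let out: Vec<i32> = times_two_int_impl(&[1, NA_INTEGER, 3]).collect();
        assert_eq!(out, vec![2, NA_INTEGER, 6]);
    }
}
```

The #[savvy] wrapper could then iterate over times_two_int_impl(x.as_slice()) and write each value into an OwnedIntegerSexp, as in the first version, without the temporary Vec<i32>.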

Advanced Topics

"External" external pointers

As described in Struct section, a struct marked with #[savvy] is transparently converted from and into an SEXP of an external pointer. So, usually, you don't need to think about external pointers.

However, in some cases, you might need to deal with an external pointer created by another R package. For example, you might want to access Apache Arrow data created by the nanoarrow R package. In such cases, you can use the unsafe methods .cast_unchecked() or .cast_mut_unchecked().

let foo: &Foo = unsafe { &*ext_ptr_sexp.cast_unchecked::<Foo>() };

Initialization Routine

#[savvy_init] is a special version of #[savvy]. The function marked with this macro is called when the package is loaded, which is what Writing R Extensions calls an "initialization routine". The function must take *mut DllInfo as its argument.

For example, if you write a Rust function like this,

use savvy::ffi::DllInfo;

#[savvy_init]
fn init_foo(_dll_info: *mut DllInfo) -> savvy::Result<()> {
    r_eprintln!("Initialized!");
    Ok(())
}

You'll see the following message on your R session when you load the package.

library(yourPackage)
#> Initialized!

Under the hood, savvy-cli update . inserts the following line in a C function R_init_*(), which is called when the DLL is loaded.

void R_init_yourPackage(DllInfo *dll) {
    R_registerRoutines(dll, NULL, CallEntries, NULL, NULL);
    R_useDynamicSymbols(dll, FALSE);

    savvy_init_foo__impl(dll); // added!
}

This is useful for initializing resources. For example, you can initialize a global variable.

use std::sync::OnceLock;

use savvy::ffi::DllInfo;

static GLOBAL_FOO: OnceLock<Foo> = OnceLock::new();

#[savvy_init]
fn init_global_foo(_dll_info: *mut DllInfo) -> savvy::Result<()> {
    GLOBAL_FOO.get_or_init(|| Foo::new());

    Ok(())
}
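The following self-contained sketch runs outside of R and shows what get_or_init() guarantees. Foo, init() and global_foo() are hypothetical. init() plays the role of the #[savvy_init] function. global_foo() is the kind of accessor a later #[savvy] function might use, where you would probably turn the error into savvy_err!.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for the `Foo` type used above.
struct Foo {
    name: String,
}

impl Foo {
    fn new() -> Self {
        Foo { name: "initialized once".to_string() }
    }
}

static GLOBAL_FOO: OnceLock<Foo> = OnceLock::new();

/// Plays the role of the #[savvy_init] function. `get_or_init()` runs
/// `Foo::new` only on the first call; later calls are no-ops.
fn init() {
    GLOBAL_FOO.get_or_init(Foo::new);
}

/// The kind of accessor a later function might use. `get()` returns `None`
/// if the initialization routine has not run yet.
fn global_foo() -> Result<&'static Foo, String> {
    GLOBAL_FOO
        .get()
        .ok_or_else(|| "GLOBAL_FOO is not initialized".to_string())
}
```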

You can also register an ALTREP class using this mechanism (see the next section).

ALTREP

You can implement an ALTREP class using savvy.

Disclaimer

  • This feature is very experimental, so the interface might change significantly or even be removed in the future.

  • The current API might be a bit oversimplified. For example, you cannot prevent the vector from being materialized (i.e., allocated as a normal SEXP and put into the data2 slot of the ALTREP object).

Using ALTREP

Savvy currently provides Alt* traits for only some of the ALTREP classes (for example, AltInteger, which is used below). The other ALTREPs, like ALTCOMPLEX, are not supported yet.

For example, consider the following struct that simply wraps a Vec<i32>.

#[derive(Debug)] // used later to print the underlying data
struct MyAltInt(Vec<i32>);

impl MyAltInt {
    fn new(x: Vec<i32>) -> Self {
        Self(x)
    }
}

First, you need to implement the IntoExtPtrSexp trait for the struct, which is required by the Alt* traits. This trait is what works under the hood of #[savvy] when it's placed on a struct. You can just rely on the default implementation.

impl savvy::IntoExtPtrSexp for MyAltInt {}

Second, you need to implement one of the Alt* traits. More specifically, the trait has 4 members you need to implement:

  • CLASS_NAME is the name of the class. This is used for distinguishing the class, so please use a unique string.
  • PACKAGE_NAME is the name of your package. This probably doesn't matter much.
  • length() returns the length of the object.
  • elt(i) returns the i-th element of the object. Note that R usually handles the out-of-bounds check itself and returns NA if i exceeds the length, so you don't need to check the length here.

In this case, the actual data is i32, so let's implement AltInteger.

impl AltInteger for MyAltInt {
    const CLASS_NAME: &'static str = "MyAltInt";
    const PACKAGE_NAME: &'static str = "TestPackage";

    fn length(&mut self) -> usize {
        self.0.len()
    }

    fn elt(&mut self, i: usize) -> i32 {
        self.0[i]
    }
}

Optionally, you can implement these methods:

  • copy_data(dst, offset): This copies the range of values starting from offset into dst, a &mut [T]. The default implementation just calls elt() repeatedly, but a more efficient implementation might be possible (e.g., using copy_from_slice()).
  • inspect(): This is called when .Internal(inspect(x)) is evaluated. You might want to print some information useful for debugging.
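The difference between the default copy_data() and an override can be sketched with a stand-in trait. AltIntLike below is hypothetical and only mirrors the members described above. It is not savvy's AltInteger. The sketch also assumes that dst.len() determines how many elements are copied.

```rust
/// Hypothetical stand-in that mirrors the members described above.
/// This is NOT savvy's `AltInteger` trait.
trait AltIntLike {
    fn length(&mut self) -> usize;
    fn elt(&mut self, i: usize) -> i32;

    /// Default behavior: fill `dst` by calling `elt()` once per element,
    /// starting at `offset` (assumption: `dst.len()` is the number to copy).
    fn copy_data(&mut self, dst: &mut [i32], offset: usize) {
        for (j, slot) in dst.iter_mut().enumerate() {
            *slot = self.elt(offset + j);
        }
    }
}

/// Same shape as `MyAltInt` above; overrides `copy_data()`.
struct MyAltInt(Vec<i32>);

impl AltIntLike for MyAltInt {
    fn length(&mut self) -> usize {
        self.0.len()
    }

    fn elt(&mut self, i: usize) -> i32 {
        self.0[i]
    }

    /// Override: one bulk copy from the backing Vec instead of many `elt()` calls.
    /// Panics if `offset + dst.len()` exceeds the length.
    fn copy_data(&mut self, dst: &mut [i32], offset: usize) {
        dst.copy_from_slice(&self.0[offset..offset + dst.len()]);
    }
}

/// Hypothetical type that keeps the default `copy_data()`: 1, 2, ..., n computed on the fly.
struct Sequence(usize);

impl AltIntLike for Sequence {
    fn length(&mut self) -> usize {
        self.0
    }

    fn elt(&mut self, i: usize) -> i32 {
        i as i32 + 1
    }
}
```

The override pays off when the data already sits in a contiguous buffer. When each element is computed on demand, as in Sequence, the default is often all you need.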

The next step is a bit advanced. You need to create a definition of the ALTREP class from the above trait. This is done by the corresponding register_alt*_class() function (for example, register_altinteger_class() for an integer class). This function generates an ALTREP class and registers it with the R session.

The registration needs to happen when an R session loads the DLL of your crate. As explained in the section on the initialization routine, you can define a #[savvy_init] function, which will be called in the initialization routine.

#[savvy_init]
fn init_altrep_class(dll_info: *mut DllInfo) -> savvy::Result<()> {
    register_altinteger_class::<MyAltInt>(dll_info)?;
    Ok(())
}

Finally, you'll probably want to implement a user-visible function that creates an instance of the ALTREP class. You can convert the struct into an ALTREP with the .into_altrep() method, which is provided by the Alt* trait. For example, the following function returns an ALTREP vector of length 3 to the R session.

#[savvy]
fn altint() -> savvy::Result<savvy::Sexp> {
    let v = MyAltInt::new(vec![1, 2, 3]);
    v.into_altrep()
}

This function can be used like this:

x <- altint()

x
#> [1] 1 2 3

This looks like a normal integer vector, but this is definitely an ALTREP.

.Internal(inspect(x))
#> @0x0000021684acac40 13 INTSXP g0c0 [REF(65535)] (MyAltInt)

Going deeper...

Once the ALTREP object leaves your hands, it looks like a normal vector. But, if you really wish, you can convert it back to the original object. The Alt* traits provide three methods for this conversion:

  • try_from_altrep_ref() for &T
  • try_from_altrep_mut() for &mut T
  • try_from_altrep() for T

For example, you can print the underlying data using the Debug trait.

#[savvy]
fn print_altint(x: IntegerSexp) -> savvy::Result<()> {
    if let Ok(x) = MyAltInt::try_from_altrep_ref(&x) {
        r_println!("{x:?}");
        return Ok(());
    };

    Err(savvy_err!("Not a known ALTREP"))
}

print_altint(x)
#> MyAltInt([1, 2, 3])

But, before getting excited, you need to be aware of the tricky nature of R.

First, your ALTREP object can easily get lost in the sea of copy-on-modify. For example, if the object gets modified, it's no longer an ALTREP object.

x <- altint()

x[1L] <- 3L

print_altint(x)
#> Error: Not a known ALTREP

Second, and much trickier: since there is try_from_altrep_mut(), you can modify the underlying data. For example, you can multiply each number by two.

#[savvy]
fn tweak_altint(mut x: IntegerSexp) -> savvy::Result<()> {
    if let Ok(x) = MyAltInt::try_from_altrep_mut(&mut x, false) {
        for i in x.0.iter_mut() {
            *i *= 2;
        }
        return Ok(());
    };

    Err(savvy_err!("Not a known ALTREP"))
}

Let's confirm this function modifies the underlying data as expected.

x <- altint()
c(x) # This is for a side effect! Let's discuss later.
#> [1] 1 2 3

tweak_altint(x)

print_altint(x)
#> MyAltInt([2, 4, 6])

So far, so good. But, if you print x, you'll find that the values have diverged between Rust and R... Why does this happen?

x
#> [1] 1 2 3

This is because savvy's implementation caches the SEXP object converted from the underlying data. Creating a fresh SEXP object every time the R session requests it can be costly, so the result is cached the first time it's created (in the above case, by c(x)). As far as I know, most ALTREP implementations adopt this caching strategy (more specifically, an ALTREP object has two slots, data1 and data2, and data2 is usually used for the cache).
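A toy model can make the divergence concrete. AltrepModel below is hypothetical and only mimics the data1/data2 idea described above. It is not savvy's code. materialize() stands in for R asking for the vector, and data_mut() stands in for mutable access from Rust.

```rust
/// Hypothetical toy model of an ALTREP object's two slots; not savvy's code.
struct AltrepModel {
    /// The Rust-side data (what `MyAltInt` holds).
    data1: Vec<i32>,
    /// The cached, materialized copy that R actually reads.
    data2: Option<Vec<i32>>,
}

impl AltrepModel {
    fn new(data: Vec<i32>) -> Self {
        Self { data1: data, data2: None }
    }

    /// Stand-in for R requesting the vector (e.g. via `c(x)` or printing).
    /// The cache is built on first access and reused afterwards.
    fn materialize(&mut self) -> &[i32] {
        if self.data2.is_none() {
            self.data2 = Some(self.data1.clone());
        }
        self.data2.as_deref().unwrap()
    }

    /// Stand-in for mutable access from Rust. With `invalidate_cache = true`,
    /// the stale cache is dropped, so the next `materialize()` rebuilds it.
    fn data_mut(&mut self, invalidate_cache: bool) -> &mut Vec<i32> {
        if invalidate_cache {
            self.data2 = None;
        }
        &mut self.data1
    }
}
```

If you double the values through data_mut(false) after the first materialize(), data1 becomes [2, 4, 6] but materialize() still returns [1, 2, 3]. That is the same mismatch shown above. Passing true drops the cache, so the next materialize() reflects the new data.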

But, don't worry. try_from_altrep_mut() has a second argument, invalidate_cache. You can set this to true to clear the cache.

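#[savvy]
fn tweak_altint2(mut x: IntegerSexp) -> savvy::Result<()> {
    // The second argument, invalidate_cache, is now `true` (changed!).
    if let Ok(x) = MyAltInt::try_from_altrep_mut(&mut x, true) {
        for i in x.0.iter_mut() {
            *i *= 2;
        }
        return Ok(());
    };

    Err(savvy_err!("Not a known ALTREP"))
}

x <- altint()
c(x) # populate the cache, as before

tweak_altint2(x)
print_altint(x)
#> MyAltInt([2, 4, 6])

x
#> [1] 2 4 6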

This API is still experimental, and I have yet to find a nicer design. Feedback is really appreciated!

Linkage

Savvy compiles the Rust code into a static library and then uses it to generate a DLL for the R package. There's one tricky thing about static libraries. Rust's official documentation on linkage says:

Note that any dynamic dependencies that the static library may have (such as dependencies on system libraries, or dependencies on Rust libraries that are compiled as dynamic libraries) will have to be specified manually when linking that static library from somewhere.

What does this mean? If a dependency crate needs to link against a native library, cargo adds the necessary flags. But, once the static library is created, cargo's job is over. It's up to you to tell the linker the necessary flags, because there's no automatic mechanism.

If some of the flags are missing, you'll see a "symbol not found" error. For example, this is what I got on macOS. One of my package's dependencies uses the objc2 crate, which needs to be linked against Apple's Objective-C frameworks.

 unable to load shared object '.../foo.so':
  dlopen(../foo.so, 0x0006): symbol not found in flat namespace '_NSAppKitVersionNumber'
Execution halted

So, how can we know the necessary flags? The official documentation provides a pro-tip!

The --print=native-static-libs flag may help with this.

You can add this option to src/Makevars.in and src/Makevars.win.in via the RUSTFLAGS environment variable. Please edit this line:

  # Add flags if necessary
- RUSTFLAGS = 
+ RUSTFLAGS = --print=native-static-libs

Then, you'll find this note in the installation log.

   Compiling ahash v0.8.11
   Compiling serde v1.0.210
   Compiling zerocopy v0.7.35

...snip...

note: Link against the following native artifacts when linking against this static library. The order and any duplication can be significant on some platforms.

note: native-static-libs: -framework CoreText -framework CoreGraphics -framework CoreFoundation -framework Foundation -lobjc -liconv -lSystem -lc -lm

    Finished `dev` profile [unoptimized + debuginfo] target(s) in 19.17s
   gcc -shared -L/usr/lib64/R/lib -Wl,-O1 -Wl,--sort-common -Wl,...
   installing to /tmp/RtmpvQv8Ur/devtools_install_...
   ** checking absolute paths in shared objects and dynamic libraries

You can copy these flags into PKG_LIBS in src/Makevars.in, which holds the flags the R package's DLL is linked with. Please be aware that the output differs between platforms, so you probably need to run this on CI, not just on your local machine. Also, since Linux and macOS require different options, you need to tweak them in the configure script.
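If you want to automate this on CI, a small helper can pull the flags out of the build log. The following is a hypothetical sketch, not part of savvy or savvy-cli. It assumes the log contains a line starting with "note: native-static-libs:", as shown above.

```rust
/// Hypothetical helper: return the linker flags from the
/// `note: native-static-libs: ...` line printed by rustc, if present.
fn native_static_libs(log: &str) -> Option<&str> {
    log.lines()
        .map(str::trim)
        .find_map(|line| line.strip_prefix("note: native-static-libs:"))
        .map(str::trim)
}
```

The returned string can be pasted as-is into a variable such as ADDITIONAL_PKG_LIBS in the configure script below.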

For example, here's my setup on the vellogd package.

./configure:

if [ "$(uname)" = "Darwin" ]; then
  FEATURES=""
  # result of --print=native-static-libs
  ADDITIONAL_PKG_LIBS="-framework CoreText -framework CoreGraphics -framework CoreFoundation -framework Foundation -lobjc -liconv -lSystem -lc -lm"
else
  FEATURES="--features use_winit"
fi

src/Makevars.in:

PKG_LIBS = -L$(LIBDIR) -lvellogd @ADDITIONAL_PKG_LIBS@

Comparison with extendr

What the hell is this?? Why do you need another framework when there's extendr?

extendr is great and ready to use, but it's not perfect in some respects (e.g., error handling) and it's kind of stuck; extendr is so feature-rich and complex that no one can easily introduce a big breaking change. So, I needed to create a new, simple framework to experiment with. The main goal of savvy is to provide a simpler option besides extendr, not to be a complete replacement for it.

Pros and cons compared to extendr

Pros:

  • You can use Result for error handling instead of panic!
  • You can compile your package for webR (I hope extendr gets webR-ready soon)

Cons:

  • savvy prefers explicitness over ergonomics
  • savvy provides a limited set of APIs and might not be a good fit for complex use cases