Introduction
savvy is a simple R extension interface using Rust, like the extendr framework. The name “savvy” comes from the Japanese word “錆” (pronounced as sàbí), which means “Rust”.
With savvy, you can automatically generate R functions from Rust code. This is an example of what a savvy-powered function would look like:
Rust
use savvy::savvy;
use savvy::{OwnedStringSexp, StringSexp};
use savvy::NotAvailableValue; // for is_na() and na()
/// Convert to Upper-case
///
/// @param x A character vector.
/// @export
#[savvy]
fn to_upper(x: StringSexp) -> savvy::Result<savvy::Sexp> {
// Use `Owned{type}Sexp` to allocate an R vector for output.
let mut out = OwnedStringSexp::new(x.len())?;
for (i, e) in x.iter().enumerate() {
// To Rust, missing value is an ordinary value. In `&str`'s case, it's just "NA".
// You have to use `.is_na()` method to distinguish the missing value.
if e.is_na() {
// Set the i-th element to NA
out.set_na(i)?;
continue;
}
let e_upper = e.to_uppercase();
out.set_elt(i, e_upper.as_str())?;
}
out.into()
}
R
to_upper(c("a", "b", "c"))
#> [1] "A" "B" "C"
Examples
A toy example R package can be found in the R-package/ directory.
Thanks
Savvy is not quite unique. This project is made possible by heavily taking inspiration from other great projects:
- The basic idea is of course based on extendr. Savvy would not exist without extendr.
- cpp11's "writable" concept influenced the design a lot. Also, I learned a lot from the great implementation such as the protection mechanism.
- PyO3 made me realize that the FFI crate doesn't need to be a "sys" crate.
Get Started
Prerequisite
Rust
First of all, you need a Rust toolchain installed. You can follow the official instructions.
If you are on Windows, you need an additional step of installing the x86_64-pc-windows-gnu target.
rustup target add x86_64-pc-windows-gnu
A helper R package
Then, install a helper R package for savvy.
install.packages(
"savvy",
repos = c("https://yutannihilation.r-universe.dev", "https://cloud.r-project.org")
)
Note that, under the hood, this is just a simple wrapper around savvy-cli. So, if you prefer the shell, you can use the CLI directly instead; it is available on the releases page.
Create a new R package
First, create a new R package. usethis::create_package() is convenient for this.
usethis::create_package("path/to/foo")
Then, move to the package directory and generate the necessary files like Makevars and Cargo.toml, as well as the C and R wrapper code corresponding to the Rust code. savvy::savvy_init() does all of this (under the hood, this simply runs savvy-cli init).
Lastly, run devtools::document() to generate NAMESPACE and the documentation.
savvy::savvy_init()
devtools::document()
Now, this package is ready to install! After installing (e.g. by running "Install Package" on RStudio IDE), confirm you can run this example function that multiplies the first argument by the second argument.
library(<your package>)
int_times_int(1:4, 2L)
#> [1] 2 4 6 8
Package structure
After savvy::savvy_init(), the structure of your R package should look like below.
.
├── .Rbuildignore
├── DESCRIPTION
├── NAMESPACE
├── R
│   └── 000-wrappers.R      <-------(1)
├── configure               <-------(2)
├── configure.win           <-------(2)
├── cleanup                 <-------(2)
├── cleanup.win             <-------(2)
├── foofoofoofoo.Rproj
└── src
    ├── Makevars.in         <-------(2)
    ├── Makevars.win.in     <-------(2)
    ├── init.c              <-------(3)
    ├── <your package>-win.def <---(4)
    └── rust
        ├── .cargo
        │   └── config.toml <-------(4)
        ├── api.h           <-------(3)
        ├── Cargo.toml      <-------(5)
        └── src
            └── lib.rs      <-------(5)
1. 000-wrappers.R: R functions for the corresponding Rust functions
2. configure*, cleanup*, Makevars.in, and Makevars.win.in: Necessary build settings for compiling Rust code
3. init.c and api.h: C functions for the corresponding Rust functions
4. <your package>-win.def and .cargo/config.toml: These are tricks to avoid a minor error on Windows. See extendr/rextendr#211 and savvy#98 for the details.
5. Cargo.toml and lib.rs: Rust code
Write your own function
The most revolutionary point of savvy::savvy_init()
is that it kindly leaves
the most important task to you; let's define a typical hello-world function for
practice!
Write some Rust code
Open src/rust/src/lib.rs and add the following lines. r_println! is the R version of the println! macro.
/// @export
#[savvy]
fn hello() -> savvy::Result<()> {
savvy::r_println!("Hello world!");
Ok(())
}
Update wrapper files
Every time you modify or add some Rust code, you need to update the C and R wrapper files by running savvy::savvy_update() (under the hood, this simply runs savvy-cli update). Don't forget to run devtools::document() as well.
savvy::savvy_update()
devtools::document()
After re-installing your package, you should be able to run the hello()
function on your R session.
hello()
#> Hello world!
Key Ideas
Treating external SEXP and owned SEXP differently
Savvy is opinionated in many respects. The one that should be explained first is that savvy uses separate types for a SEXP passed from outside and a SEXP created within a Rust function. The former, an external SEXP, is read-only, and the latter, an owned SEXP, is writable. Here's the list:
R type | Read-only version | Writable version |
---|---|---|
INTSXP (integer) | IntegerSexp | OwnedIntegerSexp |
REALSXP (double) | RealSexp | OwnedRealSexp |
RAWSXP (raw) | RawSexp | OwnedRawSexp |
LGLSXP (logical) | LogicalSexp | OwnedLogicalSexp |
STRSXP (character) | StringSexp | OwnedStringSexp |
VECSXP (list) | ListSexp | OwnedListSexp |
EXTPTRSXP (external pointer) | ExternalPointerSexp | n/a |
CPLXSXP (complex) [1] | ComplexSexp | OwnedComplexSexp |
[1] Complex is optionally supported under the feature flag complex.
You might wonder why this is needed when we can just use mut to express the difference in mutability. I mainly had two motivations for this:
- avoid unnecessary protection: an external SEXP is already protected by the caller, while an owned SEXP needs to be protected by ourselves.
- avoid unnecessary ALTREP checks: an external SEXP can be ALTREP, so it's better to handle it in an ALTREP-aware way, while an owned SEXP never is.
The full explanation is a bit lengthy, so let's skip it here. You can read the details in my blog post. One correction, though: I found the second reason might not be very important, because a benchmark showed it's more efficient to be non-ALTREP-aware in most cases. In fact, the current implementation of savvy is non-ALTREP-aware for int, real, and logical (see #18).
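The read-only versus writable split can be illustrated without R at all. The following is a minimal sketch in plain Rust, not savvy's implementation: ExternalInts and OwnedInts are hypothetical stand-ins for IntegerSexp and OwnedIntegerSexp, and the point is only that the borrowed type has no method that can write, while the owned type does.

```rust
/// Hypothetical stand-in for an external SEXP: a borrowed, read-only view
/// over data that someone else (the caller) owns and keeps alive.
struct ExternalInts<'a> {
    data: &'a [i32],
}

/// Hypothetical stand-in for an owned SEXP: a buffer we allocated ourselves,
/// so we are responsible for it and are allowed to write to it.
struct OwnedInts {
    data: Vec<i32>,
}

impl<'a> ExternalInts<'a> {
    fn len(&self) -> usize {
        self.data.len()
    }

    // Only read access is offered; there is deliberately no `set_elt()`.
    fn iter(&self) -> impl Iterator<Item = &i32> + '_ {
        self.data.iter()
    }
}

impl OwnedInts {
    fn new(len: usize) -> Self {
        Self { data: vec![0; len] }
    }

    // Writing is only possible on the owned type.
    fn set_elt(&mut self, i: usize, v: i32) -> Result<(), String> {
        match self.data.get_mut(i) {
            Some(slot) => {
                *slot = v;
                Ok(())
            }
            None => Err(format!("index {i} is out of bounds")),
        }
    }
}

/// Reads from the external input and writes into a freshly allocated output.
fn times_two(x: &ExternalInts) -> Result<OwnedInts, String> {
    let mut out = OwnedInts::new(x.len());
    for (i, v) in x.iter().enumerate() {
        out.set_elt(i, v * 2)?;
    }
    Ok(out)
}
```

Because the distinction lives in the types rather than in a mut qualifier, each type can carry its own bookkeeping (such as protection), which is the motivation described above.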
No implicit conversions
Savvy doesn't convert between types unless you do so explicitly. For example, you cannot supply a double vector to a function with an IntegerSexp argument.
#[savvy]
fn identity_int(x: IntegerSexp) -> savvy::Result<savvy::Sexp> {
let mut out = OwnedIntegerSexp::new(x.len())?;
for (i, &v) in x.iter().enumerate() {
out[i] = v;
}
out.into()
}
identity_int(c(1, 2))
#> Error in identity_int(c(1, 2)) :
#> Unexpected type: Cannot convert double to integer
While you probably feel this is inconvenient, this is also a design decision. My concerns about supporting these conversions are
- Complexity. It would make savvy's spec and implementation complicated.
- Hidden allocation. Conversion requires a new allocation for storing the converted values, which might be undesirable in some cases.
So, you have to write a wrapper R function like the one below. This might feel a bit tiring, but, in general, please do not avoid writing R code. Since you are creating an R package, there's a lot you can do in R code instead of making things complicated in Rust code. In particular, it's easier to show user-friendly error messages on R's side.
identity_int_wrapper <- function(x) {
x <- vctrs::vec_cast(x, integer())
identity_int(x)
}
Alternatively, you can use NumericSexp as input. This provides a method to convert the input either to i32 or to f64 on the fly. For more details, please read the section about NumericSexp.
#[savvy]
fn identity_num(x: NumericSexp) -> savvy::Result<savvy::Sexp> {
let mut out = OwnedIntegerSexp::new(x.len())?;
for (i, v) in x.iter_i32().enumerate() {
out[i] = v?; // iter_i32() yields Result<i32>
}
out.into()
}
#[savvy] macro
This is a simple Rust function to add the specified suffix to the input character vector. The #[savvy] macro turns this into an R function.
use savvy::NotAvailableValue; // for is_na() and na()
/// Add Suffix
///
/// @export
#[savvy]
fn add_suffix(x: StringSexp, y: &str) -> savvy::Result<savvy::Sexp> {
let mut out = OwnedStringSexp::new(x.len())?;
for (i, e) in x.iter().enumerate() {
if e.is_na() {
out.set_na(i)?;
continue;
}
out.set_elt(i, &format!("{e}_{y}"))?;
}
out.into()
}
Convention for a #[savvy] function
The example function above has this signature.
fn add_suffix(x: StringSexp, y: &str) -> savvy::Result<savvy::Sexp>
As you can guess, the #[savvy] macro cannot be applied to arbitrary functions. The function must satisfy the following conditions:
- The function's inputs can be
  - a non-owned savvy type (e.g., IntegerSexp and RealSexp)
  - a corresponding Rust type for scalar (e.g., i32 and f64)
  - a user-defined struct marked with #[savvy] (&T, &mut T, or T)
  - a user-defined enum marked with #[savvy] (&T, or T)
  - any of the above wrapped with Option (this is translated as an optional arg)
- The function's return value must be either
  - savvy::Result<()> for the case of no actual return value
  - savvy::Result<savvy::Sexp> for the case of some return value of R object
  - savvy::Result<T> for the case of some return value of a user-defined struct or enum marked with #[savvy]
How things work under the hood
If you mark a function with the #[savvy] macro, the corresponding implementations are generated:
- Rust functions
  - a wrapper function to handle Rust and R errors gracefully
  - a function with the original body and some conversion from raw SEXPs to savvy types.
- C function signature for the Rust function
- C implementation for bridging between R and Rust
- R implementation
For example, the above implementation generates the following code. (The #[savvy] macro can also be used on struct and enum, but let's focus on the function case for now for simplicity.)
Rust functions
(The actual code is a bit more complex to handle possible panic!
properly.)
#[allow(clippy::missing_safety_doc)]
#[no_mangle]
pub unsafe extern "C" fn savvy_add_suffix__ffi(x: SEXP, y: SEXP) -> SEXP {
match savvy_add_suffix_inner(x, y) {
Ok(result) => result.0,
Err(e) => savvy::handle_error(e),
}
}
unsafe fn savvy_add_suffix_inner(x: SEXP, y: SEXP) -> savvy::Result<savvy::Sexp> {
let x = <savvy::StringSexp>::try_from(savvy::Sexp(x))?;
let y = <&str>::try_from(savvy::Sexp(y))?;
// original function
add_suffix(x, y)
}
// original function
fn add_suffix(x: StringSexp, y: &str) -> savvy::Result<savvy::Sexp> {
// ..original body..
}
C function signature
SEXP savvy_add_suffix__ffi(SEXP c_arg__x, SEXP c_arg__y);
C implementation
(let's skip the details about handle_result
for now)
SEXP savvy_add_suffix__impl(SEXP c_arg__x, SEXP c_arg__y) {
SEXP res = savvy_add_suffix__ffi(c_arg__x, c_arg__y);
return handle_result(res);
}
R implementation
The Rust comments with three slashes (///) are converted into Roxygen comments in the R code.
#' Add Suffix
#'
#' @export
add_suffix <- function(x, y) {
.Call(savvy_add_suffix__impl, x, y)
}
Using #[savvy] in files other than lib.rs
You can use the #[savvy] macro in other files just as you do in lib.rs. Since #[savvy] automatically marks the functions that need to be exposed as pub, you don't need to care about the visibility.
For example, if you define a function in src/foo.rs,
#[savvy]
fn do_nothing() -> savvy::Result<()> {
Ok(())
}
just declaring mod foo in src/lib.rs is enough to make do_nothing() available to R.
mod foo;
Handling Vector Input
Basic rule
As described in Key Ideas, the input SEXP is read-only. You cannot modify the values in place.
Methods
1. iter()
IntegerSexp, RealSexp, LogicalSexp, and StringSexp provide the iter() method so that you can access the values one by one.
for (i, e) in x.iter().enumerate() {
// ...snip...
}
Similarly, NumericSexp, which handles both integer and double, provides iter_i32() and iter_f64(). But, this might allocate if a type conversion is needed.
2. as_slice()
(for integer and double)
IntegerSexp
and RealSexp
can expose their underlying C array as a Rust slice
by as_slice()
.
/// @export
#[savvy]
fn foo(x: IntegerSexp) -> savvy::Result<()> {
some_function_takes_slice(x.as_slice());
Ok(())
}
Similarly, NumericSexp
, which handles both integer and double, provides
as_slice_i32()
and as_slice_f64()
. But, this might allocate if the type
conversion is needed.
3. to_vec()
As the name indicates, to_vec()
copies the values to a new Rust vector.
Copying can be costly for big data, but a vector is handy if you need to pass
the data around among Rust functions.
let mut v = x.to_vec();
some_function_takes_vec(v);
If a function requires a slice and the type is not integer or double, you have
no choice but to_vec()
to create a new vector and then convert it to a slice.
let mut v = x.to_vec();
another_function_takes_slice(&v);
Missing values
Rust's corresponding types have no concept of a "missing value". So, to Rust, a missing value looks like an ordinary value.
The good news is that R uses sentinel values to represent NA, so it's possible to check whether a value is NA on R's side when the type is i32, f64, or &str.
- i32: The minimum value of int is used for representing NA.
- f64: A special value is used for representing NA.
- &str: A CHARSXP of string "NA" is used for representing NA; this cannot be distinguished by comparing the content of the string, but we can compare the pointer address of the underlying C char array.
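The three sentinel checks listed above can be sketched in plain Rust. This is only an illustration of R's conventions, not savvy's actual code: the helper names are made up, the f64 check relies on R's NA_real_ being a NaN whose lower 32 bits carry the payload 1954, and the string check uses two heap-allocated strings to stand in for the "NA" CHARSXP and an ordinary "NA" string.

```rust
/// R's NA_integer_ is the minimum value of a 32-bit int.
const NA_INTEGER: i32 = i32::MIN;

/// R's NA_real_ is a NaN whose lower 32 bits hold the payload 1954 (0x7A2).
const NA_REAL_BITS: u64 = 0x7FF0_0000_0000_07A2;

/// Hypothetical helper: integer NA is a plain value comparison.
fn is_na_i32(x: i32) -> bool {
    x == NA_INTEGER
}

/// Hypothetical helper: an ordinary NaN is not NA, so the payload in the
/// lower word has to be inspected as well.
fn is_na_f64(x: f64) -> bool {
    x.is_nan() && (x.to_bits() & 0xFFFF_FFFF) == 1954
}

/// Hypothetical helper: for strings, the content is just "NA", so the only
/// reliable test is whether both point at the very same bytes in memory.
fn is_same_str_ptr(x: &str, na: &str) -> bool {
    std::ptr::eq(x.as_ptr(), na.as_ptr())
}
```

Note that comparing the content of the string would wrongly classify a legitimate "NA" string (for example, an abbreviation) as missing, which is why the pointer comparison is needed.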
By using the NotAvailableValue trait, you can check if the value is NA by is_na(), and refer to the sentinel value of NA by <T>::na(). If you care about missing values, you always have to have an if branch for missing values like below.
use savvy::NotAvailableValue;
/// @export
#[savvy]
fn sum(x: RealSexp) -> savvy::Result<savvy::Sexp> {
let mut sum: f64 = 0.0;
for e in x.iter() {
if !e.is_na() {
sum += e;
}
}
...snip...
}
The bad news is that bool is different. bool has neither is_na() nor na(), and NA is treated as TRUE without any error. So, you have to make sure on R's side that the input doesn't contain any missing values. For example, this function is not an identity function.
/// @export
#[savvy]
fn identity_logical(x: LogicalSexp) -> savvy::Result<savvy::Sexp> {
let mut out = OwnedLogicalSexp::new(x.len())?;
for (i, e) in x.iter().enumerate() {
out.set_elt(i, e)?;
}
out.into()
}
identity_logical(c(TRUE, FALSE, NA))
#> [1] TRUE FALSE TRUE
The good news is that LogicalSexp has an expert-only method as_slice_raw(). See the "Logical" section of Integer, Real, String, Logical, Raw, And Complex for the details.
Handling Vector Output
Basically, there are two ways to prepare an output to the R session.
1. Create a new R object first and put values on it
An owned SEXP can be allocated by using Owned{type}Sexp::new(). new() takes the length of the vector as the argument. If you need a vector of the same length as the input, you can pass the len() of the input SEXP.
new() returns Result because the memory allocation can fail when the vector is too large. You can probably just add ? to it to handle the error.
let mut out = OwnedStringSexp::new(x.len())?;
Use set_elt()
to put the values one by one. Note that you can also assign
values like out[i] = value
for integer and double. See Type-specific
Topics for more details.
for (i, e) in x.iter().enumerate() {
// ...snip...
out.set_elt(i, &format!("{e}_{y}"))?;
}
You can use set_na()
to set the specified element as NA. For example, it's a
common case to use this in order to propagate the missingness like below.
for (i, e) in x.iter().enumerate() {
// ...snip...
if e.is_na() {
out.set_na(i)?;
} else {
// ...snip...
}
}
After putting the values to the vector, you can convert it to Result<Sexp>
by
into()
.
/// @export
#[savvy]
fn foo(x: StringSexp) -> savvy::Result<savvy::Sexp> {
let mut out = OwnedStringSexp::new(x.len())?;
// ...snip...
out.into()
}
2. Convert a Rust vector by methods like try_into()
Another way is to use a Rust vector to store the results and convert it to an R object at the end of the function. This is also fallible because it needs to create a new R object under the hood anyway, which can fail. So, this time, the conversion is try_into(), not into().
// Let's not consider for handling NAs at all for simplicity...
/// @export
#[savvy]
fn times_two(x: IntegerSexp) -> savvy::Result<savvy::Sexp> {
let out: Vec<i32> = x.iter().map(|v| v * 2).collect();
out.try_into()
}
Note that, while this looks handy, this might not be very efficient; for example, times_two() above allocates a Rust vector and then copies the values into a new R vector in try_into(). The copying cost can be non-negligible when the vector is very large.
try_from_slice()
The same conversions are also available in the form of Owned{type}Sexp::try_from_slice(). While the name says "slice", this accepts AsRef<[T]>, which means both Vec<T> and &[T] can be used.
For converting the return value, try_into() is probably shorter in most cases. But sometimes you might find this useful (e.g., the return value is a list and you need to construct its elements).
/// @export
#[savvy]
fn times_two2(x: IntegerSexp) -> savvy::Result<savvy::Sexp> {
let out: Vec<i32> = x.iter().map(|v| v * 2).collect();
let out_sexp = OwnedIntegerSexp::try_from_slice(out)?;
out_sexp.into()
}
try_from_iter()
If you only have an iterator, try_from_iter()
is more efficient. This example
function is the case. The previous examples first collect()
ed into a Vec
,
but it's not necessary in theory.
/// @export
#[savvy]
fn times_two3(x: IntegerSexp) -> savvy::Result<savvy::Sexp> {
let iter = x.iter().map(|v| v * 2);
let out_sexp = OwnedIntegerSexp::try_from_iter(iter)?;
out_sexp.into()
}
Note that, if you already have a slice or vec, you should use try_from_slice() instead of calling iter() on the slice or vec and using try_from_iter(). In such cases, try_from_slice() is more performant for integer, double, and complex because it just copies the underlying memory into the SEXP rather than handling the elements one by one.
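The difference between the two strategies can be sketched with plain Rust buffers. This is an illustration under assumptions, not savvy's code: a Vec<i32> stands in for the R vector, and the function names are hypothetical. The slice version copies the whole memory block at once, while the iterator version has to write the elements one at a time.

```rust
/// Hypothetical sketch of the slice path: allocate the destination once and
/// copy the whole block of memory in a single operation.
fn fill_from_slice(src: &[i32]) -> Vec<i32> {
    let mut out = vec![0; src.len()];
    out.copy_from_slice(src);
    out
}

/// Hypothetical sketch of the iterator path: the values only exist as the
/// iterator produces them, so they are written one by one.
/// Requiring `ExactSizeIterator` is an assumption made for this sketch so
/// that the destination can be allocated up front.
fn fill_from_iter<I>(iter: I) -> Vec<i32>
where
    I: ExactSizeIterator<Item = i32>,
{
    let mut out = vec![0; iter.len()];
    for (slot, v) in out.iter_mut().zip(iter) {
        *slot = v;
    }
    out
}
```

When the values are computed lazily (as in times_two3() above), the iterator path avoids the intermediate Vec; when they already sit in contiguous memory, the slice path is the cheaper one.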
Handling Scalar
Input
Scalar inputs are handled transparently. The corresponding types are shown in the table below.
/// @export
#[savvy]
fn scalar_input_int(x: i32) -> savvy::Result<()> {
savvy::r_println!("{x}");
Ok(())
}
R type | Rust scalar type |
---|---|
integer | i32 |
double | f64 |
logical | bool |
raw | u8 |
character | &str |
complex | num_complex::Complex64 |
integer or double | savvy::NumericScalar |
NumericScalar
NumericScalar is a special type that can handle both integer and double. You can get the value from it by as_i32() for i32, or as_f64() for f64. These methods convert the value if the input type is different from the target type.
#[savvy]
fn times_two_numeric_i32_scalar(x: NumericScalar) -> savvy::Result<Sexp> {
let v = x.as_i32()?;
if v.is_na() {
(i32::na()).try_into()
} else {
(v * 2).try_into()
}
}
Note that, while as_f64() is infallible, as_i32() can fail when the conversion is from f64 to i32 and
- the value is Inf or -Inf
- the value is out of range for i32
- the value is not integer-ish (e.g. 1.1)
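The three failure conditions above can be written down as a small checked conversion. This is a sketch in plain Rust with a hypothetical function name and hypothetical error messages; it is not how savvy implements as_i32(). In particular, it treats "integer-ish" as "has no fractional part" and, for simplicity, rejects every non-finite value (including NaN), leaving NA handling out of scope.

```rust
/// Hypothetical checked conversion from f64 to i32 following the rules in
/// the list above. The error type is a plain String for brevity.
fn f64_to_i32_checked(x: f64) -> Result<i32, String> {
    // Inf and -Inf (and, in this sketch, NaN) cannot be represented.
    if !x.is_finite() {
        return Err(format!("{x} is not a finite number"));
    }
    // The value must fit into the range of i32.
    if x < i32::MIN as f64 || x > i32::MAX as f64 {
        return Err(format!("{x} is out of range for i32"));
    }
    // The value must be integer-ish, i.e. have no fractional part.
    if x.fract() != 0.0 {
        return Err(format!("{x} is not integer-ish"));
    }
    Ok(x as i32)
}
```

A call such as f64_to_i32_checked(2.0) succeeds, while 1.1, Inf, and 2147483648.0 each fail one of the checks.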
For convenience, NumericScalar also provides a conversion to usize by as_usize(). What's good is that this can handle integer-ish numeric values, which means you can allow users to input a number larger than the integer max (2147483647)!
#[savvy]
fn usize_to_string_scalar(x: NumericScalar) -> savvy::Result<Sexp> {
let x_usize = x.as_usize()?;
x_usize.to_string().try_into()
}
usize_to_string_scalar(2147483648)
#> [1] "2147483648"
Output
Just like a Rust vector, a Rust scalar value can be converted into Sexp by try_into(). It's as simple as this:
/// @export
#[savvy]
fn scalar_output_int() -> savvy::Result<savvy::Sexp> {
1.try_into()
}
Alternatively, the same conversion is available in the form of
Owned{type}Sexp::try_from_scalar()
.
/// @export
#[savvy]
fn scalar_output_int2() -> savvy::Result<savvy::Sexp> {
let out = OwnedIntegerSexp::try_from_scalar(1)?;
out.into()
}
Missing values
If the type of the input is scalar, NA is always rejected. This is inconsistent with the rule for vector input, but this is my design decision, made on the assumption that a scalar missing value is rarely useful on Rust's side.
/// @export
#[savvy]
fn identity_logical_single(x: bool) -> savvy::Result<savvy::Sexp> {
let mut out = OwnedLogicalSexp::new(1)?;
out.set_elt(0, x)?;
out.into()
}
identity_logical_single(NA)
#> Error in identity_logical_single(NA) :
#> Must be length 1 of non-missing value
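The rule behind this error message (exactly one element, and that element must not be NA) can be sketched as a single pattern match. This is plain Rust with a hypothetical function name, using a slice of i32 to stand in for an integer vector coming from R; it is not savvy's actual implementation.

```rust
/// R's NA_integer_ is the minimum value of a 32-bit int.
const NA_INTEGER: i32 = i32::MIN;

/// Hypothetical sketch: accept the input only when it has exactly one
/// element and that element is not the NA sentinel.
fn scalar_i32(x: &[i32]) -> Result<i32, String> {
    match x {
        [v] if *v != NA_INTEGER => Ok(*v),
        _ => Err("Must be length 1 of non-missing value".to_string()),
    }
}
```

Rejecting NA at the boundary means the function body can use a plain i32 (or bool, f64, and so on) without any further missing-value checks.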
Optional Argument
To represent an optional argument, you can wrap it with Option
. Then, the
corresponding R function sets the default value of NULL
on the argument.
#[savvy]
fn default_value_vec(x: Option<IntegerSexp>) -> savvy::Result<Sexp> {
if let Some(x) = x {
x.iter().sum::<i32>().try_into()
} else {
(-1).try_into()
}
}
function(x = NULL) {
.Call(savvy_default_value_vec__impl, x)
}
This function works with or without the argument.
default_value_vec(1:10)
#> [1] 55
default_value_vec()
#> [1] -1
Type-specific Topics
You can use these types as an argument of a #[savvy]
function.
R type | vector | scalar |
---|---|---|
integer | IntegerSexp | i32 |
double | RealSexp | f64 |
integer or double | NumericSexp | NumericScalar |
logical | LogicalSexp | bool |
raw | RawSexp | u8 |
character | StringSexp | &str |
complex [1] | ComplexSexp | Complex64 |
list | ListSexp | n/a |
(any) | Sexp | n/a |
[1] Complex is optionally supported under the feature flag complex.
If you want to handle multiple types, you can cast an Sexp into a specific type by .into_typed() and write match branches to deal with each type. This is important when the interface returns Sexp. For example, ListSexp returns Sexp because the list element can be any type. For more details about List, please read the List section.
#[savvy]
fn print_list(x: ListSexp) -> savvy::Result<()> {
for (k, v) in x.iter() {
let content = match v.into_typed() {
TypedSexp::Integer(x) => {
format!(
"integer [{}]",
x.iter().map(|i| i.to_string()).collect::<Vec<String>>().join(", ")
)
}
TypedSexp::Real(x) => {
format!(
"double [{}]",
x.iter().map(|r| r.to_string()).collect::<Vec<String>>().join(", ")
)
}
TypedSexp::Logical(x) => {
format!(
"logical [{}]",
x.iter().map(|l| if l { "TRUE" } else { "FALSE" }).collect::<Vec<&str>>().join(", ")
)
}
TypedSexp::String(x) => {
format!(
"character [{}]",
x.iter().collect::<Vec<&str>>().join(", ")
)
}
TypedSexp::List(_) => "list".to_string(),
TypedSexp::Null(_) => "NULL".to_string(),
_ => "other".to_string(),
};
let name = if k.is_empty() { "(no name)" } else { k };
r_print!("{name}: {content}\n");
}
Ok(())
}
Likewise, NumericSexp also provides into_typed(). You can match it with either IntegerSexp or RealSexp and apply an appropriate function. Alternatively, you can rely on the type conversion that NumericSexp provides. See more details in the next section.
#[savvy]
fn identity_num(x: NumericSexp) -> savvy::Result<savvy::Sexp> {
match x.into_typed() {
NumericTypedSexp::Integer(i) => identity_int(i),
NumericTypedSexp::Real(r) => identity_real(r),
}
}
Integer, Real, String, Logical, Raw, And Complex
Integer and real
In the cases of integer (IntegerSexp, OwnedIntegerSexp) and real (RealSexp, OwnedRealSexp), the internal representation of the SEXPs matches the Rust types we expect, i.e., i32 and f64. Taking advantage of this, these types have more methods than other types:
- as_slice() and as_mut_slice()
- Index and IndexMut
- efficient TryFrom<&[T]>
as_slice() and as_mut_slice()
These types can expose their underlying C array as a Rust slice by as_slice(). as_mut_slice() is available only for the owned versions. So, you don't need to use to_vec() to create a new vector just to pass the data to a function that requires a slice.
/// @export
#[savvy]
fn foo(x: IntegerSexp) -> savvy::Result<()> {
some_function_takes_slice(x.as_slice());
Ok(())
}
Index and IndexMut
You can also access the underlying data by [. These methods are available only for the owned versions. This means you can write an assignment like the one below instead of calling set_elt().
/// @export
#[savvy]
fn times_two(x: IntegerSexp) -> savvy::Result<savvy::Sexp> {
let mut out = OwnedIntegerSexp::new(x.len())?;
for (i, &v) in x.iter().enumerate() {
if v.is_na() {
out[i] = i32::na();
} else {
out[i] = v * 2;
}
}
out.into()
}
Efficient TryFrom<&[T]>
TryFrom<&[T]> is not special to real and integer, but the implementation is different from that of logical and string; since the internal representations are the same, savvy uses copy_from_slice(), which does a memcpy, to copy the data efficiently (in the logical and string cases, the values are copied one by one).
NumericSexp
It's ideal to ensure the function takes the expected type on R's side (e.g., you
can use vctrs::vec_cast()
, or define S3 methods for integer and double
separately). But, it's not always possible.
You can use NumericSexp to accept both real and integer. NumericSexp provides methods to get either i32 or f64 values:
- as_slice_i32() returns &[i32]. This is fallible.
- as_slice_f64() returns &[f64].
- iter_i32() returns an iterator of Result<i32>.
- iter_f64() returns an iterator of f64.
These functions return the underlying data directly if the type is the same as wanted, and otherwise convert the values. If the conversion is from f64 to i32, it fails when any of the values is
- Inf or -Inf
- out of range for i32
- not integer-ish (e.g. 1.1)
For convenience, NumericSexp also provides iter_usize(), which returns an iterator of Result<usize>.
With NumericSexp
, you can rewrite the above times_two
function like this:
#[savvy]
fn times_two(x: NumericSexp) -> savvy::Result<Sexp> {
let mut out = OwnedIntegerSexp::new(x.len())?;
for (i, v) in x.iter_i32().enumerate() {
let v = v?;
if v.is_na() {
out[i] = i32::na();
} else {
out[i] = v * 2;
}
}
out.into()
}
Alternatively, you can use .into_typed() and match the result to apply an appropriate function depending on the type. In this case, you need to define two different functions, but this might be useful when the logic is very different for integer values and real values.
#[savvy]
fn times_two(x: NumericSexp) -> savvy::Result<savvy::Sexp> {
match x.into_typed() {
NumericTypedSexp::Integer(i) => times_two_int(i),
NumericTypedSexp::Real(r) => times_two_real(r),
}
}
Logical
While logical is 3-state (TRUE, FALSE and NA) on R's side, bool can represent only 2 states (true and false). This mismatch is a headache. There are many possible ways to handle this (e.g., use Option<bool>), but savvy chose to convert NA to true silently, assuming NA is not useful on Rust's side anyway. So, you have to make sure on R's side that the input logical vector doesn't contain NA. For example,
wrapper_of_some_savvy_fun <- function(x) {
out <- rep(NA, length(x))
idx <- !is.na(x)
# apply the function only to non-NA elements
out[idx] <- some_savvy_fun(x[idx])
out
}
If you really want to handle the 3 states, use the expert-only method as_slice_raw(). This returns &[i32] instead of &[bool]. Why i32? It's the internal representation of a logical vector, which is the same as an integer vector. By treating the data as i32, you can use is_na().
use savvy::NotAvailableValue; // for is_na()
/// @export
#[savvy]
fn flip_logical_expert_only(x: LogicalSexp) -> savvy::Result<savvy::Sexp> {
let mut out = OwnedLogicalSexp::new(x.len())?;
for (i, e) in x.as_slice_raw().iter().enumerate() {
if e.is_na() {
out.set_na(i)?;
} else {
out.set_elt(i, *e != 1)?; // 1 means TRUE
}
}
out.into()
}
flip_logical_expert_only(c(TRUE, FALSE, NA))
#> [1] FALSE  TRUE    NA
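The Option<bool> alternative mentioned above can be sketched on top of the raw i32 representation. This is plain Rust with hypothetical helper names, not part of savvy's API; it assumes, following R's convention, that NA is stored as the minimum i32 value and that any other non-zero value counts as TRUE.

```rust
/// R stores logical NA with the same sentinel as integer NA.
const NA_LOGICAL: i32 = i32::MIN;

/// Hypothetical helper: decode one raw logical value into three states.
/// `None` represents NA, `Some(true)` TRUE, and `Some(false)` FALSE.
fn decode_logical(raw: i32) -> Option<bool> {
    if raw == NA_LOGICAL {
        None
    } else {
        Some(raw != 0)
    }
}

/// Hypothetical helper: flip every value while propagating NA.
fn flip_logical(raw: &[i32]) -> Vec<Option<bool>> {
    raw.iter()
        .map(|&v| decode_logical(v).map(|b| !b))
        .collect()
}
```

With such a helper, the slice returned by as_slice_raw() could be processed without ever confusing NA with true, at the cost of carrying Option<bool> through the computation.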
String
STRSXP is a vector of CHARSXP, not something like char*. So, it's not possible to expose the internal representation as &str, and accessing the values requires several calls to R's C API. To get a &str:
1. STRING_ELT() to subset a CHARSXP
2. R_CHAR() to extract the string from the CHARSXP
Similarly, to set a &str:
1. Rf_mkCharLenCE() to convert the &str to a CHARSXP
2. SET_STRING_ELT() to put the CHARSXP into the STRSXP
This is a bit costly. So, if the strings need to be referenced and updated frequently, you should probably avoid using OwnedStringSexp as a substitute for Vec<String>.
Encoding and 'static lifetime
While Rust's string is UTF-8, R's string is not guaranteed to be UTF-8. R provides Rf_translateCharUTF8() to convert the string to UTF-8. However, savvy chose not to use it. There are two reasons:
- As of version 4.2.0, R uses UTF-8 as the native encoding even on Windows systems. While this does not hold on old Windows systems, I bravely assume such cases are rare and that time will solve the problem.
- The result of R_CHAR() is the string stored in R_StringHash, the global CHARSXP cache. In my understanding, this will never be removed during the session. So, this allows savvy to mark the resulting &str with the 'static lifetime. However, the result of Rf_translateCharUTF8() is on R_alloc()-ed memory, which can be claimed by GC.
In short, in order to stick with the 'static lifetime for the sake of simplicity, I decided to neglect a relatively rare case. Note that invalid UTF-8 characters are rejected (= currently, silently replaced with "") by CStr, so it's not very unsafe.
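The "rejected by CStr" behaviour can be illustrated with the standard library alone. The sketch below is an assumption about the general shape of such a fallback, not a copy of savvy's code, and the function name is hypothetical: CStr::to_str() validates the bytes as UTF-8 and returns an error for invalid input, which is then replaced with an empty string.

```rust
use std::ffi::CStr;

/// Hypothetical helper: view a C string as `&str`, falling back to ""
/// when the bytes are not valid UTF-8.
fn c_str_or_empty(c: &CStr) -> &str {
    // `to_str()` performs the UTF-8 validation; no copy is made on success.
    c.to_str().unwrap_or("")
}
```

Because the validation happens before a &str is ever handed out, Rust code never observes a string slice with invalid UTF-8 in it, even when the R session uses a non-UTF-8 encoding.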
Raw
A raw vector is the sequence of u8
, which can be used for representing various
binary data. But, please be aware that you can use a Rust struct (see the
section about struct) to store the data instead of copying the
whole data into R's memory.
Complex
Complex is optionally supported under feature flag complex
. If it's enabled,
you can use ComplexSexp
and OwnedComplexSexp
to use a complex vector for
input or output, and you can extract the slice of num_complex::Complex64
from
it.
/// @export
#[savvy]
fn abs_complex(x: savvy::ComplexSexp) -> savvy::Result<savvy::Sexp> {
let mut out = savvy::OwnedRealSexp::new(x.len())?;
for (i, c) in x.iter().enumerate() {
if !c.is_na() {
out[i] = (c.re * c.re + c.im * c.im).sqrt();
} else {
out.set_na(i)?;
}
}
out.into()
}
List
List is a different beast. It's pretty complex. You might think of it as a HashMap, but it's different in that:
- List elements can be either named or unnamed individually (e.g., list(a = 1, 2, c = 3)).
- List names can be duplicated (e.g., list(a = 1, a = 2)).
To make things simple, savvy treats a list as a pair of the same length of
- a character vector containing names, using "" (empty string) to represent missingness (actually, this is the convention of R itself)
- a collection of arbitrary SEXP elements
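The "pair of names and values" model can be sketched as a plain Rust data structure. Everything here is hypothetical (the NamedList and Value types are not part of savvy); the sketch only shows why two parallel vectors, unlike a HashMap, can hold unnamed elements and duplicated names.

```rust
/// Hypothetical stand-in for an arbitrary SEXP element.
#[derive(Debug, PartialEq)]
enum Value {
    Integer(Vec<i32>),
    Real(Vec<f64>),
}

/// Hypothetical list model: two parallel vectors of the same length.
/// An empty string in `names` means "this element has no name".
struct NamedList {
    names: Vec<String>,
    values: Vec<Value>,
}

impl NamedList {
    fn new() -> Self {
        Self { names: Vec::new(), values: Vec::new() }
    }

    /// Appends one element; nothing prevents an empty or repeated name.
    fn push(&mut self, name: &str, value: Value) {
        self.names.push(name.to_string());
        self.values.push(value);
    }

    /// Iterates over (name, value) pairs by zipping the two vectors.
    fn iter(&self) -> impl Iterator<Item = (&str, &Value)> + '_ {
        self.names.iter().map(|n| n.as_str()).zip(self.values.iter())
    }
}
```

A list like list(a = 1, 2, a = 3L) fits this model directly: the names would be "a", "", and "a", which a HashMap keyed by name could not represent.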
Since list is a very convenient data structure in R, you can come up with a lot of convenient interfaces for list. However, savvy intentionally provides only very limited interfaces. In my opinion, Rust should touch list data as little as possible because it's too complex.
Read values from a list
names_iter()
names_iter() returns an iterator of &str.
/// @export
#[savvy]
fn print_list_names(x: ListSexp) -> savvy::Result<()> {
for k in x.names_iter() {
if k.is_empty() {
r_println!("(no name)");
} else {
r_println!("{k}");
}
}
Ok(())
}
print_list_names(list(a = 1, 2, c = 3))
#> a
#> (no name)
#> c
values_iter()
values_iter() returns an iterator of Sexp enum. You can convert Sexp to TypedSexp by .into_typed() and then use match to extract the inner data.
/// @export
#[savvy]
fn print_list_values_if_int(x: ListSexp) -> savvy::Result<()> {
for v in x.values_iter() {
match v.into_typed() {
TypedSexp::Integer(i) => r_println!("int {}", i.as_slice()[0]),
_ => r_println!("not int"),
}
}
Ok(())
}
print_list_values_if_int(list(a = 1, b = 1L, c = "1"))
#> not int
#> int 1
#> not int
iter()
If you want pairs of name and value, you can use iter(). This is basically a std::iter::Zip of the two iterators explained above.
/// @export
#[savvy]
fn print_list(x: ListSexp) -> savvy::Result<()> {
for (k, v) in x.iter() {
// ...snip...
}
Ok(())
}
Put values to a list
new()
OwnedListSexp's new() is different from that of other types; the second argument (named) indicates whether the list is named or unnamed. If false, the list doesn't have names, and all operations on names like set_name() are simply ignored.
set_name()
set_name()
simply sets a name at the specified position.
/// @export
#[savvy]
fn list_with_no_values() -> savvy::Result<savvy::Sexp> {
let mut out = OwnedListSexp::new(2, true)?;
out.set_name(0, "foo")?;
out.set_name(1, "bar")?;
out.into()
}
list_with_no_values()
#> $foo
#> NULL
#>
#> $bar
#> NULL
#>
set_value()
set_value() sets a value at the specified position. "Value" is an arbitrary type that implements the Into<Sexp> trait. Since all {type}Sexp types implement it, you can simply pass one like below.
/// @export
#[savvy]
fn list_with_no_names() -> savvy::Result<savvy::Sexp> {
let mut out = OwnedListSexp::new(2, false)?;
let mut e1 = OwnedIntegerSexp::new(1)?;
e1[0] = 100;
let mut e2 = OwnedStringSexp::new(1)?;
e2.set_elt(0, "cool")?;
out.set_value(0, e1)?;
out.set_value(1, e2)?;
out.into()
}
list_with_no_names()
#> [[1]]
#> [1] 100
#>
#> [[2]]
#> [1] "cool"
#>
set_name_and_value()
set_name_and_value()
is simply set_name()
+ set_value()
. Probably this is
what you need in most of the cases.
/// @export
#[savvy]
fn list_with_both() -> savvy::Result<savvy::Sexp> {
let mut out = OwnedListSexp::new(2, true)?;
let mut e1 = OwnedIntegerSexp::new(1)?;
e1[0] = 100;
let mut e2 = OwnedStringSexp::new(1)?;
e2.set_elt(0, "cool")?;
out.set_name_and_value(0, "foo", e1)?;
out.set_name_and_value(1, "bar", e2)?;
out.into()
}
list_with_both()
#> $foo
#> [1] 100
#>
#> $bar
#> [1] "cool"
#>
Struct
Basic usage
You can use #[savvy]
macro on a struct
to convert it to an R object. More
precisely, this macro adds implementations of TryFrom
between Sexp
and the
struct so you can specify the type as the function input and output.
/// @export
#[savvy]
struct Person {
pub name: String,
}
The most handy form is to implement methods and associated functions for the
type. You can add #[savvy]
before the impl
block to make it available on R
sessions.
#[savvy]
impl Person {
fn new() -> Self {
Self {
name: "".to_string(),
}
}
fn set_name(&mut self, name: &str) -> savvy::Result<()> {
self.name = name.to_string();
Ok(())
}
fn name(&self) -> savvy::Result<savvy::Sexp> {
let mut out = OwnedStringSexp::new(1)?;
out.set_elt(0, &self.name)?;
out.into()
}
fn say_hello() -> savvy::Result<savvy::Sexp> {
"Hello!".try_into()
}
}
If we focus on the arguments, there are two types of functions here:
- method: the first argument is self (set_name() and name())
- associated function: no self argument (new() and say_hello())
You should almost always use &self or &mut self, not self, except when you are an expert and your intention is really to consume it. This is discussed later.
In an R session, associated functions are available as elements of an R object with the same name as the Rust type (in this case, Person).
p <- Person$new()
Person$say_hello()
#> [1] "Hello!"
Among these two associated functions, new()
is a constructor which returns
Self
. This creates an instance of the struct.
The instance has the methods. You can call them like below.
# create an instance
p <- Person$new()
# call methods
p$set_name("たかし")
p$name()
#> [1] "たかし"
The instance has an S3 class of the same name as the Rust type, so you can implement S3 methods such as print.<your struct>() if necessary.
class(p)
#> [1] "Person"
# register print() S3 method for Person
print.Person <- function(x, ...) print(x$name())
registerS3method("print", "Person", print.Person)
p
#> [1] "たかし"
Struct output
The above example uses -> Self as the return type of the associated function, but it's not the only option. You can also wrap it in savvy::Result<Self>.
#[savvy]
impl Person {
fn new_fallible() -> savvy::Result<Self> {
let x = Self {
name: "".to_string(),
};
Ok(x)
}
}
More generally, you can specify an arbitrary struct marked with #[savvy]
as
the return type. For example, you can create an instance of the struct outside
of impl
,
/// @export
#[savvy]
fn create_person() -> savvy::Result<Person> {
let x = Person {
name: "".to_string(),
};
Ok(x)
}
and you can generate another type of instance from an instance.
/// @export
#[savvy]
struct UpperPerson {
pub name: String,
}
#[savvy]
impl Person {
fn reborn_as_upper_person(&self) -> savvy::Result<UpperPerson> {
let x = UpperPerson {
name: self.name.to_uppercase(),
};
Ok(x)
}
}
Struct input
You can also use the struct as the argument of a #[savvy]
-ed function. Note
that, in most of the cases, you should specify &T
or &mut T
, not T
.
/// @export
#[savvy]
fn get_name_external(x: &Person) -> savvy::Result<savvy::Sexp> {
x.name()
}
get_name_external(p)
#> [1] "たかし"
&T
vs T
If you are familiar with Rust, you should know the difference. T moves the ownership while &T just borrows. But why does this matter in savvy? What actually happens when you specify T in a #[savvy] function?
Say, you mistyped &Person
above as Person
like this:
/// @export
#[savvy]
fn get_name_external2(x: Person) -> savvy::Result<savvy::Sexp> {
x.name()
}
This function works the same as the previous one. The result of the first call is the same. Yay!
get_name_external2(p)
#> [1] "たかし"
Then, what's wrong? You'll find out when you call the function on the same object a second time; it doesn't work anymore.
get_name_external2(p)
#> Error: This external pointer is already consumed or deleted
This is because the Person
object is already moved. The R variable p
doesn't
hold the ownership anymore. So, you should almost always specify &T
(or &mut T
),
not T
.
The same is true for a method. Use &self
and &mut self
instead of self
unless you want such a method like this!
#[savvy]
impl Person {
fn invalidate(self) -> savvy::Result<()> {
r_println!("This instance is invalidated!");
Ok(())
}
}
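The "already consumed" behavior above can be mimicked in plain Rust. The following is only a sketch of the idea, not how savvy is implemented: Handle is a hypothetical stand-in for the R-side object, with borrow() playing the role of &T and take() playing the role of T.

```rust
/// Hypothetical stand-in for an R-side handle to a Rust value.
struct Handle<T> {
    inner: Option<T>,
}

impl<T> Handle<T> {
    fn new(value: T) -> Self {
        Self { inner: Some(value) }
    }

    /// Like `&T`: borrowing leaves the value in place, so it can be repeated.
    fn borrow(&self) -> Result<&T, String> {
        self.inner
            .as_ref()
            .ok_or_else(|| "already consumed or deleted".to_string())
    }

    /// Like `T`: taking moves the value out, leaving nothing behind.
    fn take(&mut self) -> Result<T, String> {
        self.inner
            .take()
            .ok_or_else(|| "already consumed or deleted".to_string())
    }
}
```

Calling borrow() any number of times succeeds, while the second take() (or any borrow() after a take()) returns an error, which mirrors what happens to the R variable p after its value has been moved.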
When is T
useful?
You might wonder why savvy allows this specification at all. Are there any cases when this is useful?
The answer is yes. The advantage of moving the ownership is that you can avoid copying. For example, suppose there's a type HeavyData, which contains a huge amount of data, and HeavyDataBundle, which bundles two HeavyData values.
/// @export
#[savvy]
#[derive(Clone)]
struct HeavyData(Vec<i32>);
/// @export
#[savvy]
struct HeavyDataBundle {
data1: HeavyData,
data2: HeavyData,
}
#[savvy]
impl HeavyData {
// ...snip...
}
HeavyDataBundle requires the ownership of the two HeavyData values. So, if the inputs are references (&HeavyData), you need to clone() the data, which can be costly.
/// @export
#[savvy]
impl HeavyDataBundle {
fn new(
data1: &HeavyData,
data2: &HeavyData,
) -> Self {
Self {
data1: data1.clone(),
data2: data2.clone(),
}
}
}
In this case, you can move the ownership to avoid copying.
/// @export
#[savvy]
impl HeavyDataBundle {
fn new(
data1: HeavyData,
data2: HeavyData,
) -> Self {
Self { data1, data2 }
}
}
Of course, this is an expert-only usage and is rarely needed. Again, you should almost always use &T or &mut T instead of T. Use T only if you are really sure that a reference doesn't work well.
Lifetime
The #[savvy] macro doesn't support a struct with lifetimes. This is because crossing the FFI boundary means losing track of the lifetimes.
For example, the struct below contains a reference to a variable of usize
.
However, once an instance of Foo
is passed to R's side, Rust cannot know
whether the variable is still alive when Foo
is passed back to Rust's side.
struct Foo<'a>(&'a usize);
Then, what should we do to deal with such structs? I'm yet to find the best practices, but you might be able to
- use the 'static lifetime (i.e. struct Foo(&'static usize)), probably by referencing a global variable
- instead of passing the struct itself to R, store the struct in a global OnceCell<HashMap> and pass the key
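The second option could look roughly like the sketch below. It is only one possible shape under stated assumptions: it uses std's OnceLock with a Mutex rather than a OnceCell, the key type (u32) and the functions register() and lookup() are hypothetical, and Foo here is a simplified owned payload.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Hypothetical payload that we do not want to hand to R directly.
#[derive(Debug, Clone, PartialEq)]
struct Foo(usize);

/// Global registry; only the key would cross the FFI boundary.
static REGISTRY: OnceLock<Mutex<HashMap<u32, Foo>>> = OnceLock::new();

fn registry() -> &'static Mutex<HashMap<u32, Foo>> {
    REGISTRY.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Store a value and return its key.
fn register(value: Foo) -> u32 {
    // `unwrap()` only fails if another thread panicked while holding the lock.
    let mut map = registry().lock().unwrap();
    // Keys are handed out sequentially. This is only valid because this sketch
    // never removes entries; real code would need a proper key generator.
    let key = map.len() as u32;
    map.insert(key, value);
    key
}

/// Look a value up by key; `None` means the key is unknown.
fn lookup(key: u32) -> Option<Foo> {
    registry().lock().unwrap().get(&key).cloned()
}
```

With this layout, the R side only ever holds an integer key, so no Rust lifetime has to survive the round trip through R.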
External pointer?
Under the hood, the Person struct is stored in an EXTPTRSXP. But you don't need to care about how to deal with the EXTPTRSXP. This is because it's stored in a closure environment on creation and never exposed to the user. As it's guaranteed on R's side that self is always an EXTPTRSXP of Person, the Rust code just restores a Person instance from the EXTPTRSXP without any checks.
.savvy_wrap_Person <- function(ptr) {
e <- new.env(parent = emptyenv())
e$.ptr <- ptr
e$set_name <- Person_set_name(ptr)
e$name <- Person_name(ptr)
class(e) <- "Person"
e
}
Person <- new.env(parent = emptyenv())
Person$new <- function() {
.savvy_wrap_Person(.Call(Person_new__impl))
}
Person$say_hello <- function() {
.Call(Person_say_hello__impl)
}
Person_set_name <- function(self) {
function(name) {
invisible(.Call(Person_set_name__impl, self, name))
}
}
Person_name <- function(self) {
function() {
.Call(Person_name__impl, self)
}
}
It's important to mention that savvy only wraps the EXTPTRSXP in a closure environment when the type is used directly as the return type of the function. If the user wants to return Person inside a List, for example, the external pointer will be directly exposed to the user and it will be the user's responsibility to deal with it.
#[savvy]
struct Person {}
// This case savvy handles nicely.
/// @export
#[savvy]
impl Person {
fn new() -> savvy::Result<Person> {
Ok(Person {})
}
}
// In this case, the user is handed an external pointer.
/// @export
#[savvy]
fn create_list() -> savvy::Result<Sexp> {
let mut list = OwnedListSexp::new(1, false)?;
let person = Person {};
list.set_value(0, Sexp::try_from(person)?)?;
list.into()
}
in R:
> person = Person$new()
> print(person)
<environment: 0x0000027cf9d46a20>
attr(,"class")
[1] "Person"
> l = create_list()
> print(l)
[[1]]
<pointer: 0x0000000000000001>
Traps about protection
This is a somewhat advanced topic. It's okay for a struct to contain arbitrary things. However, if you want to store an SEXP passed from an R session, it's your responsibility to take care of its protection.
The SEXP passed from outside doesn't need additional protection at the time of the function call because it belongs to some environment in the R session, which means it isn't GC-ed accidentally. However, after the function call, it's possible that the SEXP loses its link to any other R objects. To prevent the tragedy (i.e., an R session crash), you should create an owned version and copy the values into it, because savvy takes care of the protection of owned values. So, in short, you should never define a struct like this:
struct Foo {
a: IntegerSexp
}
Instead, you should write
struct Foo {
a: OwnedIntegerSexp
}
Enum
Savvy supports fieldless enums to express the possible options for a parameter. For example, if you define an enum with #[savvy],
/// @export
#[savvy]
enum LineType {
Solid,
Dashed,
Dotted,
}
it will be available on R's side as this.
LineType$Solid
LineType$Dashed
LineType$Dotted
You can use the enum type as the argument of a function like this
/// @export
#[savvy]
fn plot_line(x: IntegerSexp, y: IntegerSexp, line_type: &LineType) -> savvy::Result<()> {
match line_type {
LineType::Solid => {
// ...snip...
},
LineType::Dashed => {
// ...snip...
},
LineType::Dotted => {
// ...snip...
},
}
Ok(())
}
so that the users can use it instead of specifying it by an integer or a character, which might be mistyped.
plot_line(x, y, LineType$Solid)
Of course, you can achieve the same thing with i32 or &str as the input and match the value. The difference is that an enum is typo-proof. But you might find it handier to use a plain integer or character.
/// @export
#[savvy]
fn plot_line(x: IntegerSexp, y: IntegerSexp, line_type: &str) -> savvy::Result<()> {
match line_type {
"solid" => {
// ...snip...
},
"dashed" => {
// ...snip...
},
"dotted" => {
// ...snip...
},
_ => {
return Err(savvy_err!("Unsupported line type!"));
}
}
Ok(())
}
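If you go with the string-based interface, one way to keep the rest of the Rust code typo-proof is to convert the string into an enum once, at the boundary. The sketch below uses only the standard library's FromStr trait; the lowercase spellings and the error message format are assumptions chosen to match the example above, and this LineType is a plain Rust enum, not one marked with #[savvy].

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq)]
enum LineType {
    Solid,
    Dashed,
    Dotted,
}

impl FromStr for LineType {
    type Err = String;

    /// Map a user-supplied string to a variant, rejecting anything else.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "solid" => Ok(LineType::Solid),
            "dashed" => Ok(LineType::Dashed),
            "dotted" => Ok(LineType::Dotted),
            other => Err(format!("Unsupported line type: {other}")),
        }
    }
}
```

After the conversion (for example, "dashed".parse::<LineType>()), the remaining logic can match on LineType exhaustively, so a mistyped string is caught in exactly one place.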
Limitation
As noted above, savvy supports only fieldless enum for simplicity. If you want to use an enum that contains some value, please wrap it with struct.
// You don't need to mark this with #[savvy]
enum AnimalEnum {
Dog(String, f64),
Cat { name: String, weight: f64 },
}
/// @export
#[savvy]
struct Animal(AnimalEnum);
Also, savvy currently doesn't support discriminants. For example, this one won't compile.
/// @export
#[savvy]
enum HttpStatus {
Ok = 200,
NotFound = 404,
}
Error handling
To propagate your errors to the R session, you can return a savvy::Error
.
The savvy_err!() macro is a shortcut for savvy::Error::new(format!(...)) to create a new error.
use savvy::savvy_err;
#[savvy]
fn raise_error() -> savvy::Result<savvy::Sexp> {
Err(savvy_err!("This is my custom error"))
}
raise_error()
#> Error: This is my custom error
Like anyhow, you can use ?
to easily propagate any error that implements the
std::error::Error
trait.
#[savvy]
fn no_such_file() -> savvy::Result<()> {
let _ = std::fs::read_to_string("no_such_file")?;
Ok(())
}
Custom error
If you want to implement your own error type and its conversion to savvy::Error, it would conflict with the conversion of From<dyn std::error::Error>. To avoid a compile error, please specify the use-custom-error feature to opt out of the conversion.
savvy = { version = "...", features = ["use-custom-error"] }
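To illustrate what "your own error type and the conversion" means, here is a plain-Rust sketch of the pattern. RError is a hypothetical stand-in for the framework's error type (it is not savvy::Error), and MyError, check(), and doubled() are made-up names; the point is only that ? relies on an explicit From implementation.

```rust
use std::fmt;

/// Hypothetical stand-in for the framework's error type (not savvy::Error).
#[derive(Debug, PartialEq)]
struct RError(String);

/// A custom error type defined by the package author.
#[derive(Debug)]
enum MyError {
    Negative(i32),
}

impl fmt::Display for MyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MyError::Negative(v) => write!(f, "negative value: {v}"),
        }
    }
}

/// The explicit conversion that `?` relies on.
impl From<MyError> for RError {
    fn from(e: MyError) -> Self {
        RError(e.to_string())
    }
}

/// Inner logic that reports failures with the custom error type.
fn check(x: i32) -> Result<i32, MyError> {
    if x < 0 {
        Err(MyError::Negative(x))
    } else {
        Ok(x)
    }
}

/// Outer function that returns the stand-in framework error.
fn doubled(x: i32) -> Result<i32, RError> {
    let v = check(x)?; // MyError is converted to RError here
    Ok(v * 2)
}
```

A blanket conversion from every std::error::Error type and a hand-written From implementation like this one cannot coexist for the same target type, which is the conflict the feature flag is there to avoid.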
Show a warning
To show a warning, you can use r_warn()
.
savvy::io::r_warn("foo")?;
Note that a warning can raise an error when options(warn = 2) is set, so you should not ignore the error from r_warn(). The error should be propagated to the R session.
Dealing with panic!
First of all, don't use panic!
If you are familiar with extendr, you might be used to using panic! casually. But, in the savvy framework, panic! crashes your R session. So, please don't use panic! directly. Also, please avoid operations that can cause panic! (e.g., unwrap()) when you are unsure.
This is because, in Rust, panic! means an unrecoverable error. In theory, it's a sign that something impossible has happened and there's no hope of recovery, so there should be no way but to terminate the entire session. Savvy just respects what is supposed to happen.
But, if the session terminates immediately, it's hard to investigate the cause. What can I do?
Use debug build
If DEBUG
envvar is set to true
on building (i.e., devtools::load_all()
),
savvy catches panic!
and shows the backtrace instead of crashing the R
session.
For example, if you write this Rust function and load it by devtools::load_all()
,
#[savvy]
fn must_panic() -> savvy::Result<()> {
let x = &[1];
let _ = x[1]; // Rust's index starts from 0!
Ok(())
}
you'll see an error like this with a backtrace instead of the RStudio bomb icon. You can check the line of the file suggested in the error message to guess what was happening.
must_panic()
#> panic occured!
#>
#> Original message:
#> panicked at src\error_handling.rs:33:13:
#> index out of bounds: the len is 1 but the index is 1
#>
#> Backtrace:
#> ...
#> 18: std::panic::catch_unwind
#> at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04\library\std\src/panic.rs:142:14
#> 19: simple_savvy::error_handling::savvy_must_panic_inner
#> at .\src\rust\src\error_handling.rs:30:1
#> 20: must_panic
#> at .\src\rust\src\error_handling.rs:30:1
#> 21: must_panic__impl
#> at .\src\init.c:291:16
#> ...
#>
#> note: Run with `RUST_BACKTRACE=1` for a full backtrace.
#>
#>
#> Error: panic happened
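The backtrace above passes through std::panic::catch_unwind. As a rough illustration of how a panic can be turned into an ordinary error value, here is a minimal standard-library sketch. It is not savvy's actual implementation: catch_panic() and element_at() are hypothetical helpers, and the error is a plain String.

```rust
use std::panic;

/// Run `f`, converting a panic into an `Err` that carries the panic message.
fn catch_panic<F>(f: F) -> Result<i32, String>
where
    F: FnOnce() -> i32 + panic::UnwindSafe,
{
    panic::catch_unwind(f).map_err(|payload| {
        // The payload is usually a `&str` or a `String`.
        if let Some(msg) = payload.downcast_ref::<&str>() {
            format!("panic happened: {msg}")
        } else if let Some(msg) = payload.downcast_ref::<String>() {
            format!("panic happened: {msg}")
        } else {
            "panic happened".to_string()
        }
    })
}

/// Indexing past the end of a slice panics, as in `must_panic()` above.
fn element_at(values: &[i32], i: usize) -> i32 {
    values[i]
}
```

For example, catch_panic(|| element_at(&[1], 1)) returns an Err instead of tearing down the process. Note that catch_unwind() can only catch a panic when the panic strategy is "unwind"; with panic = "abort" the process terminates immediately.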
Set panic="unwind"
As described above, panic!
is an unrecoverable error. It should not be
recovered on the release build in principle.
That said, in some cases, panic! happens in code out of your control. For example, if it is thrown by one of the dependency crates, there's little you can do. You should report the problem to the author, but the behavior is not always fixed immediately, and the fixed version is not always published right away. Also, keep in mind that, depending on what causes the error, some authors deliberately prefer to use panic! instead of Result.
Note that panic! also happens in Rust's standard library in situations such as division by zero or an out-of-bounds error when indexing a Vec.
In such cases, you can change the following setting included in the template
Cargo.toml
generated by savvy-cli init
. Set this to panic = "unwind"
to gracefully convert a panic into an R error just like the debug build.
Note that the backtrace is not available on the release build because
there's no debug info.
[profile.release]
# ...snip...
panic = "unwind"
Handling Attributes
You sometimes need to deal with attributes like names
and class
. Savvy
provides the following methods for getting and setting the value of the
attribute.
Attribute | Getter method | Setter method | Type |
---|---|---|---|
names | get_names() | set_names() | Vec<&str> |
class | get_class() | set_class() | Vec<&str> |
dim | get_dim() | set_dim() | &[i32] |
arbitrary | get_attrib() | set_attrib() | Sexp |
The getter methods return Option<T>
because the object doesn't always have the
attribute. You can match
the result like this:
/// @export
#[savvy]
fn get_class_int(x: IntegerSexp) -> savvy::Result<savvy::Sexp> {
match x.get_class() {
Some(class) => class.try_into(),
None => ().try_into(),
}
}
The setter methods are available only for owned SEXPs. The return type is savvy::Result<()> because the conversion from a Rust type to SEXP is fallible.
/// @export
#[savvy]
fn set_class_int() -> savvy::Result<savvy::Sexp> {
let mut x = OwnedIntegerSexp::new(1)?;
x.set_class(&["foo", "bar"])?;
x.into()
}
For attributes other than names
, class
, dim
, you can use get_attrib()
and set_attrib()
. Since an attribute can store arbitrary values, the type is
Sexp
. In order to extract the underlying value, you can use .into_typed()
and match
.
/// @export
#[savvy]
fn print_attr_values_if_int(attr: &str, value: savvy::Sexp) -> savvy::Result<()> {
let attr_value = value.get_attrib(attr)?;
match attr_value.into_typed() {
TypedSexp::Integer(i) => r_println!("int {:?}", i.as_slice()),
_ => r_println!("not int"),
}
Ok(())
}
In order to set values, you can use .into()
to convert from the owned SEXP to
a savvy::Sexp
.
/// @export
#[savvy]
fn set_attr_int(attr: &str) -> savvy::Result<savvy::Sexp> {
let s: &[i32] = &[1, 2, 3];
let attr_value: OwnedIntegerSexp = s.try_into()?;
let mut out = OwnedIntegerSexp::new(1)?;
out.set_attrib(attr, attr_value.into())?;
out.into()
}
Handling Data Frames
A data.frame
is a list. You should simply handle it as a list in Rust code, and
all data.frame
-related operations should be done in R code.
For example, if you want to return the result as a data.frame, the Rust function should return a list and be wrapped by an R function that converts the list into a data.frame. tibble::as_tibble() should be the right choice for this purpose. Or, if you prefer a lightweight dependency, you can use vctrs::new_data_frame(), or simply as.data.frame().
/// @export
#[savvy]
fn foo_impl() -> savvy::Result<savvy::Sexp> {
// create a named list
let mut out = savvy::OwnedListSexp::new(2, true)?;
let x: Vec<f64> = some_function();
let y: Vec<f64> = another_function();
out.set_name_and_value(0, "x", OwnedRealSexp::try_from_slice(x)?)?;
out.set_name_and_value(1, "y", OwnedRealSexp::try_from_slice(y)?)?;
out.into()
}
foo <- function() {
result <- foo_impl()
tibble::as_tibble(result)
}
Handling Factors
A factor is internally an integer vector with the levels
attribute. You can
handle this on Rust's side, but the recommended way is to write a wrapper R
function to convert the factor vector to a character vector.
Say there's a Rust function that takes a character vector as its argument.
/// @export
#[savvy]
fn foo_impl(x: StringSexp) -> savvy::Result<()> {
// ...snip...
Ok(())
}
Then, you can write a function like below to convert the input to a character
vector. If you want better validation, you can use vctrs::vec_cast()
instead.
foo <- function(x) {
x <- as.character(x)
foo_impl(x)
}
If you need the order of the levels, you should pass it as another argument.
/// @export
#[savvy]
fn foo_impl2(x: StringSexp, levels: StringSexp) -> savvy::Result<()> {
// ...snip...
Ok(())
}
foo2 <- function(x) {
levels <- levels(x)
x <- as.character(x)
foo_impl2(x, levels)
}
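To make the "integer vector with the levels attribute" description concrete, here is a plain-Rust sketch of how factor codes map to labels. It is for illustration only and does not use savvy: factor_to_labels() is a hypothetical helper, and it treats out-of-range codes as missing, which is an assumption of this sketch.

```rust
/// R's integer NA is represented as the minimum i32 value.
const NA_INTEGER: i32 = i32::MIN;

/// Convert factor codes (1-based indices into `levels`) to labels.
/// `None` stands for a missing value or a code outside the levels.
fn factor_to_labels<'a>(codes: &[i32], levels: &[&'a str]) -> Vec<Option<&'a str>> {
    codes
        .iter()
        .map(|&code| {
            if code == NA_INTEGER || code < 1 {
                None
            } else {
                // R indices start from 1, Rust indices from 0.
                levels.get((code - 1) as usize).copied()
            }
        })
        .collect()
}
```

The off-by-one adjustment and the NA handling are exactly the details that as.character() takes care of on R's side, which is why the wrapper approach above is usually simpler.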
Handling Matrices And Arrays
Savvy doesn't provide a convenient way of converting matrices and arrays. You have to do it yourself. But, don't worry, it's probably not very difficult thanks to the fact that major Rust matrix crates are column-major, or at least support column-major.
- ndarray: row-major is default (probably for compatibility with Python ndarray?), but it offers column-major as well
- nalgebra: column-major
- glam (and probably all other rust-gamedev crates): column-major, probably because GLSL is column-major
The example code can be found at https://github.com/yutannihilation/savvy-matrix-examples/tree/master/src/rust/src.
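For readers less familiar with the term, the sketch below shows what "column-major" means for a flat buffer such as the data of an R matrix. It uses no matrix crate; col_major_index() and col_major_to_row_major() are hypothetical helpers written for this illustration.

```rust
/// Position of element (row, col) in a column-major buffer with `nrow` rows.
fn col_major_index(row: usize, col: usize, nrow: usize) -> usize {
    col * nrow + row
}

/// Re-order a column-major buffer into a row-major one.
/// Returns `None` if the buffer length doesn't match the dimensions.
fn col_major_to_row_major(data: &[f64], nrow: usize, ncol: usize) -> Option<Vec<f64>> {
    if data.len() != nrow * ncol {
        return None;
    }
    let mut out = Vec::with_capacity(data.len());
    for row in 0..nrow {
        for col in 0..ncol {
            out.push(data[col_major_index(row, col, nrow)]);
        }
    }
    Some(out)
}
```

For a 2 x 3 matrix stored as [1, 2, 3, 4, 5, 6] in column-major order, the first row is 1, 3, 5 and the second row is 2, 4, 6. A crate that is column-major (or supports it) can take the R buffer as-is, without this kind of re-ordering.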
R to Rust
ndarray
By default, ndarray is row-major, but you can specify column-major by
f()
.
So, all you have to do is simply to extract the dim
and pass it to ndarray.
use ndarray::Array;
use ndarray::ShapeBuilder;
use savvy::{r_println, savvy, RealSexp};
/// @export
#[savvy]
fn ndarray_input(x: RealSexp) -> savvy::Result<()> {
// In R, dim is i32, so you need to convert it to usize first.
let dim_i32 = x.get_dim().ok_or("no dimension found")?;
let dim: Vec<usize> = dim_i32.iter().map(|i| *i as usize).collect();
// f() changes the order from row-major (C-style convention) to column-major (Fortran-style convention).
let a = Array::from_shape_vec(dim.f(), x.to_vec());
r_println!("{a:?}");
Ok(())
}
nalgebra
nalgebra is column-major, so you can simply pass the dim
.
use nalgebra::DMatrix;
use savvy::{r_println, savvy, savvy_err, RealSexp};
/// @export
#[savvy]
fn nalgebra_input(x: RealSexp) -> savvy::Result<()> {
let dim = x.get_dim().ok_or("no dimension found")?;
if dim.len() != 2 {
return Err(savvy_err!("Input must be matrix!"));
}
let m = DMatrix::from_vec(dim[0] as _, dim[1] as _, x.to_vec());
r_println!("{m:?}");
Ok(())
}
glam
glam is also column-major. In the case with glam, probably the dimension is fixed (e.g. 3 x 3 in the following code). You can check the dimension is as expected before passing it to the constructor of a matrix.
use glam::{dmat3, dvec3, DMat3};
use savvy::{r_println, savvy, savvy_err, OwnedRealSexp, RealSexp};
/// @export
#[savvy]
fn glam_input(x: RealSexp) -> savvy::Result<()> {
let dim = x.get_dim().ok_or("no dimension found")?;
if dim != [3, 3] {
return Err(savvy_err!("Input must be 3x3 matrix!"));
}
// As we already check the dimension, this must not fail
let x_array: &[f64; 9] = x.as_slice().try_into().unwrap();
let m = DMat3::from_cols_array(x_array);
r_println!("{m:?}");
Ok(())
}
Rust to R
The matrix libraries typically provide methods to get the dimension and the slice of the underlying memory. You can set the dimension by set_dim().
/// @export
#[savvy]
fn nalgebra_output() -> savvy::Result<savvy::Sexp> {
let m = DMatrix::from_vec(2, 3, vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0]);
let shape = m.shape();
let dim = &[shape.0, shape.1];
let mut out = OwnedRealSexp::try_from(m.as_slice())?;
out.set_dim(dim)?;
out.into()
}
Testing
Write integration tests on R's side
The most recommended way is to write tests on R's side just as you do with an ordinary R package. You can write tests on Rust's side as described later, but, ultimately, the R functions are the user interface, so you should test the behavior of actual R functions.
Write Rust tests
The sad news is that cargo test doesn't work with savvy. This is because savvy always requires a real R session to work. But, don't worry, savvy-cli test is the tool for this. savvy-cli test does the following:
- extract the Rust code of the test modules and the doc tests
- create a temporary R package and inject the extracted Rust code
- build and run the test functions via the R package
The R package is created in the OS's cache dir by default, but you can specify the location by --cache-dir.
Note that this takes the path to the root of a crate, not that of an R package.
savvy-cli test path/to/your_crate
Limitations
savvy-cli test tries to mimic what cargo test does as much as possible, but there are some limitations.
First, in order to run tests, you need to add "lib"
to the crate-type
. This
is because your crate is used as a Rust library when run by savvy-cli test
.
[lib]
crate-type = ["staticlib", "lib"]
^^^^^
Second, if you want to test a function or a struct, it must be public. The ones marked with #[savvy] are automatically made public, but, if you want to test other functions, you need to add pub to them yourself.
pub fn foo() -> savvy::Result<()> {
^^^
Test module
You can write tests under a module marked with #[cfg(feature = "savvy-test")]
instead of
#[cfg(test)]
. A #[test]
function needs to have the return value of
savvy::Result<()>
, which is the same convention as #[savvy]
.
To check if an SEXP contains the expected data, assert_eq_r_code
is convenient.
#[cfg(feature = "savvy-test")]
mod test {
use savvy::{OwnedIntegerSexp, assert_eq_r_code};
#[test]
fn test_integer() -> savvy::Result<()> {
let mut x = OwnedIntegerSexp::new(3)?;
assert_eq_r_code(x, "c(0L, 0L, 0L)");
Ok(())
}
}
Note that savvy-test
is just a marker for savvy-cli
, not a real feature. So,
in theory, you don't really need this. However, in reality, you probably want to
add it to the [features]
section of Cargo.toml
because otherwise Cargo warns.
[features]
savvy-test = []
To test a function that takes user-supplied SEXPs like IntegerSexp
, you can
use .as_read_only()
to convert from the corresponding Owned-
type. For
example, if you have a function your_fn()
that accepts IntegerSexp
, you can
construct an OwnedIntegerSexp
and convert it to IntegerSexp
before passing
it to your_fn()
.
#[savvy]
pub fn your_fn(x: IntegerSexp) -> savvy::Result<()> {
// ...snip...
}
#[cfg(feature = "savvy-test")]
mod test {
use savvy::{OwnedIntegerSexp, assert_eq_r_code};
#[test]
fn test_integer() -> savvy::Result<()> {
let x = savvy::OwnedIntegerSexp::new(3)?;
let x_ro = x.as_read_only();
let result = super::your_fn(x_ro);
assert_eq_r_code(result, "...");
Ok(())
}
}
Doc tests
You can also write doc tests. savvy-cli test wraps each doc test in a function with the return value of savvy::Result<()>, so you can use ? to extract the Result value in the code.
/// ```
/// let x = savvy::OwnedIntegerSexp::new(3)?;
/// assert_eq!(x.as_slice(), &[0, 0, 0]);
/// ```
Features and dependencies
If you need to specify some features for testing, use --features
argument.
savvy-cli test --features foo path/to/your_crate
For dependencies, savvy-cli test
picks all dependencies in [dependencies]
and [dev-dependencies]
. If you need some additional crate for the test code,
you can just use [dev-dependencies]
section of the Cargo.toml
just as you do
when you do cargo test
.
Reminder: You can use cargo test
While #[savvy]
requires a real session, you can utilize cargo test
by
separating the actual logic to a function that doesn't rely on savvy. For
example, suppose you have the following function times_two_int()
that doubles
the input numbers.
#[savvy]
fn times_two_int(x: IntegerSexp) -> savvy::Result<savvy::Sexp> {
let mut out = OwnedIntegerSexp::new(x.len())?;
for (i, e) in x.iter().enumerate() {
if e.is_na() {
out.set_na(i)?;
} else {
out[i] = e * 2;
}
}
out.into()
}
In this case, you can rewrite the code to the following so that you can test
times_two_int_impl()
with cargo test
.
#[savvy]
fn times_two_int(x: IntegerSexp) -> savvy::Result<savvy::Sexp> {
let result: Vec<i32> = times_two_int_impl(x.as_slice());
result.try_into()
}
fn times_two_int_impl(x: &[i32]) -> Vec<i32> {
x.iter()
.map(|x| if x.is_na() { *x } else { *x * 2 })
.collect::<Vec<i32>>()
}
But, as you might notice, this implementation is a bit inefficient in that it allocates a Vec<i32> just to store the temporary result. As this shows, separating a function might be a bit tricky and it might not really be worth it in some cases. (In this case, the function could probably return an iterator.)
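The iterator variant mentioned above might look like the following sketch. It is written in plain Rust so that it can run under cargo test without an R session; times_two_iter() is a hypothetical name, and NA is checked against i32::MIN directly (R's integer NA) instead of using savvy's is_na().

```rust
/// R's integer NA is represented as the minimum i32 value.
const NA_INTEGER: i32 = i32::MIN;

/// Double each element lazily, leaving NA untouched.
/// No intermediate Vec is allocated; the caller decides where results go.
fn times_two_iter(x: &[i32]) -> impl Iterator<Item = i32> + '_ {
    x.iter()
        .map(|&e| if e == NA_INTEGER { e } else { e * 2 })
}

#[test]
fn doubles_and_keeps_na() {
    let out: Vec<i32> = times_two_iter(&[1, NA_INTEGER, 3]).collect();
    assert_eq!(out, vec![2, NA_INTEGER, 6]);
}
```

The #[savvy] function could then write the iterator's items directly into its output vector, while the logic itself stays testable with an ordinary #[test].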
Advanced Topics
"External" external pointers
As described in Struct section, a struct marked with #[savvy]
is
transparently converted from and into an SEXP of an external pointer. So,
usually, you don't need to think about external pointers.
However, in some cases, you might need to deal with an external pointer created by another R package. For example, you might want to access Apache Arrow data created by the nanoarrow R package. In such cases, you can use the unsafe methods .cast_unchecked() or .cast_mut_unchecked().
let foo: &Foo = unsafe { &*ext_ptr_sexp.cast_unchecked::<Foo>() };
Initialization Routine
#[savvy_init] is a special version of #[savvy]. The function marked with this macro is called when the package is loaded, which is what Writing R Extensions calls the "initialization routine". The function must take *mut DllInfo as its argument.
For example, if you write such a Rust function like this,
use savvy::ffi::DllInfo;
#[savvy_init]
fn init_foo(_dll_info: *mut DllInfo) -> savvy::Result<()> {
r_eprintln!("Initialized!");
Ok(())
}
You'll see the following message on your R session when you load the package.
library(yourPackage)
#> Initialized!
Under the hood, savvy-cli update .
inserts the following line in a C function
R_init_*()
, which is called when the DLL is loaded.
void R_init_yourPackage(DllInfo *dll) {
R_registerRoutines(dll, NULL, CallEntries, NULL, NULL);
R_useDynamicSymbols(dll, FALSE);
savvy_init_foo__impl(dll); // added!
}
This is useful for initializing resources. For example, you can initialize a global variable.
use std::sync::OnceLock;
static GLOBAL_FOO: OnceLock<Foo> = OnceLock::new();
#[savvy_init]
fn init_global_foo(dll_info: *mut DllInfo) -> savvy::Result<()> {
GLOBAL_FOO.get_or_init(|| Foo::new());
Ok(())
}
You can also register an ALTREP class using this mechanism; see the ALTREP section that follows.
ALTREP
You can implement an ALTREP class using savvy.
Disclaimer
- This feature is very experimental, so it's possible that the interface will be significantly changed or even removed in the future.
- The current API might be a bit oversimplified. For example, you cannot prevent the vector from being materialized (i.e., allocated as a normal SEXP and put into the data2 slot of the ALTREP object).
Using ALTREP
Savvy currently provides only the following traits for ALTREP. The other ALTREPs
like ALTCOMPLEX
are not yet supported.
For example, consider the following struct that simply wraps a Vec<i32>
.
struct MyAltInt(Vec<i32>);
impl MyAltInt {
fn new(x: Vec<i32>) -> Self {
Self(x)
}
}
First, you need to implement IntoExtPtrSexp
trait for the struct, which is
required by Alt*
traits. This trait is what works under the hood of #[savvy]
when it's placed on a struct. You can just rely on the default implementation.
impl savvy::IntoExtPtrSexp for MyAltInt {}
Second, you need to implement one of the Alt*
traits. More specifically, the
trait has 4 members you need to implement:
- CLASS_NAME is the name of the class. This is used for distinguishing the class, so please use a unique string.
- PACKAGE_NAME is the name of your package. This probably doesn't matter much.
- length() returns the length of the object.
- elt(i) returns the i-th element of the object. An important note is that, usually R handles the out-of-bound check and returns NA if it exceeds the length. So, you don't need to check the length here.
In this case, the actual data is i32
, so let's implement AltInteger
.
impl AltInteger for MyAltInt {
const CLASS_NAME: &'static str = "MyAltInt";
const PACKAGE_NAME: &'static str = "TestPackage";
fn length(&mut self) -> usize {
self.0.len()
}
fn elt(&mut self, i: usize) -> i32 {
self.0[i]
}
}
Optionally, you can implement these methods:
- copy_date(dst, offset): This copies the range of values starting from offset into dst, a &mut [T]. The default implementation just calls elt() repeatedly, but there might be a more efficient implementation (e.g. copy_from_slice()).
- inspect(): This is called when .Internal(inspect(x)) is run. You might want to print some information useful for debugging.
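The difference between the element-by-element default and a bulk copy can be sketched in plain Rust as follows. These are free functions written for illustration, not the trait methods themselves: copy_by_elt() and copy_bulk() are hypothetical names, and both assume the caller guarantees that offset + dst.len() does not exceed the source length.

```rust
/// Element-by-element copy, similar in spirit to calling `elt()` repeatedly.
fn copy_by_elt(src: &[i32], dst: &mut [i32], offset: usize) {
    for (i, slot) in dst.iter_mut().enumerate() {
        *slot = src[offset + i];
    }
}

/// Bulk copy of the same range using `copy_from_slice()`.
fn copy_bulk(src: &[i32], dst: &mut [i32], offset: usize) {
    let end = offset + dst.len();
    dst.copy_from_slice(&src[offset..end]);
}
```

Both functions fill dst with the same values; the bulk version simply hands the whole range to a single slice copy instead of going through one call per element.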
The next step is a bit advanced. You need to create a definition of an ALTREP class
from the above trait. This is done by the corresponding register_alt*_class()
function (for example, register_altinteger_class
for an integer class). This
function generates an ALTREP class and registers it to an R session.
The registration needs to happen when an R session loads the DLL of your crate.
As explained in the section of initialization routine,
you can define a #[savvy_init]
function, which will be called in the
initialization routine.
#[savvy_init]
fn init_altrep_class(dll_info: *mut DllInfo) -> savvy::Result<()> {
register_altinteger_class::<MyAltInt>(dll_info)?;
Ok(())
}
Finally, you'll probably want to implement a user-visible function to create an instance of the ALTREP class. You can convert the struct into an ALTREP with the .into_altrep() method, which is provided by the Alt* trait. For example, you can create the following function that returns an ALTREP vector of length 3 to the R session.
#[savvy]
fn altint() -> savvy::Result<savvy::Sexp> {
let v = MyAltInt::new(vec![1, 2, 3]);
v.into_altrep()
}
This function can be used like this:
x <- altint()
x
#> [1] 1 2 3
This looks like a normal integer vector, but this is definitely an ALTREP.
.Internal(inspect(x))
#> @0x0000021684acac40 13 INTSXP g0c0 [REF(65535)] (MyAltInt)
Going deeper...
Once the ALTREP object leaves your hands, it looks like a normal vector. But, if you really wish, you can convert it back to the original object. The Alt* trait provides 3 methods for this conversion:
- try_from_altrep_ref() for &T
- try_from_altrep_mut() for &mut T
- try_from_altrep() for T
For example, you can print the underlying data using the Debug trait.
#[savvy]
fn print_altint(x: IntegerSexp) -> savvy::Result<()> {
if let Ok(x) = MyAltInt::try_from_altrep_ref(&x) {
r_println!("{x:?}");
return Ok(());
};
Err(savvy_err!("Not a known ALTREP"))
}
print_altint(x)
#> MyAltInt([1, 2, 3])
But, before getting excited, you need to be aware of the tricky nature of R.
First, your ALTREP object can be easily lost in the sea of copy-on-modify. For example, once the object is modified, it's no longer an ALTREP object.
x <- altint()
x[1L] <- 3L
print_altint(x)
#> Error: Not a known ALTREP
Second, and this is much trickier: as there is try_from_altrep_mut(), you can modify the underlying data. For example, you can multiply each number by two.
#[savvy]
fn tweak_altint(mut x: IntegerSexp) -> savvy::Result<()> {
if let Ok(x) = MyAltInt::try_from_altrep_mut(&mut x, false) {
for i in x.0.iter_mut() {
*i *= 2;
}
return Ok(());
};
Err(savvy_err!("Not a known ALTREP"))
}
Let's confirm this function modifies the underlying data as expected.
x <- altint()
c(x) # This is for a side effect! Let's discuss later.
#> [1] 1 2 3
tweak_altint(x)
print_altint(x)
#> MyAltInt([2, 4, 6])
So far, so good. But, if you print x, you'll find the values have diverged between Rust and R... Why can this happen?
x
#> [1] 1 2 3
This is because savvy's implementation caches the SEXP object converted from the underlying data. It can be costly to create a fresh SEXP object every time the R session requires one, so the result is cached the first time it's created (in the above case, by c(x)). As far as I know, most ALTREP implementations adopt this caching strategy (more specifically, an ALTREP object has two slots, data1 and data2, and data2 is usually used for the cache).
But, don't worry. try_from_altrep_mut() has a second argument, invalidate_cache. You can set this to true to clear the cache.
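#[savvy]
fn tweak_altint2(mut x: IntegerSexp) -> savvy::Result<()> {
    if let Ok(x) = MyAltInt::try_from_altrep_mut(&mut x, true) { // changed: invalidate_cache is now true
        for i in x.0.iter_mut() {
            *i *= 2;
        }
        return Ok(());
    };
    Err(savvy_err!("Not a known ALTREP"))
}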
tweak_altint2(x)
print_altint(x)
#> MyAltInt([2, 4, 6])
x
#> [1] 2 4 6
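The caching behavior discussed in this section can be modeled in plain Rust without any R involved. The following is only a toy sketch: CachedAlt, materialize(), and data_mut() are hypothetical names made up for this example, and this is not how savvy is actually implemented. It shows why the two sides diverge when the cache is kept, and why invalidating it brings them back in sync.

```rust
/// Toy model of an ALTREP-like object with a cached materialized copy.
/// All names here are hypothetical; this is not savvy's implementation.
struct CachedAlt {
    data: Vec<i32>,          // the Rust-side data (think "data1")
    cache: Option<Vec<i32>>, // the materialized copy (think "data2")
}

impl CachedAlt {
    fn new(data: Vec<i32>) -> Self {
        CachedAlt { data, cache: None }
    }

    /// What the R side would see: materialize once, then reuse the cache.
    fn materialize(&mut self) -> &[i32] {
        if self.cache.is_none() {
            self.cache = Some(self.data.clone());
        }
        self.cache.as_deref().unwrap()
    }

    /// Mutable access to the underlying data, optionally dropping the cache.
    fn data_mut(&mut self, invalidate_cache: bool) -> &mut Vec<i32> {
        if invalidate_cache {
            self.cache = None;
        }
        &mut self.data
    }
}

fn main() {
    let mut x = CachedAlt::new(vec![1, 2, 3]);
    x.materialize(); // plays the role of c(x): fills the cache

    // Without invalidation, the underlying data and the cache diverge.
    for v in x.data_mut(false).iter_mut() {
        *v *= 2;
    }
    assert_eq!(x.data, vec![2, 4, 6]);
    assert_eq!(x.materialize().to_vec(), vec![1, 2, 3]);

    // With invalidation, the next materialization is rebuilt from the data.
    for v in x.data_mut(true).iter_mut() {
        *v *= 2;
    }
    assert_eq!(x.materialize().to_vec(), vec![4, 8, 12]);
}
```

The trade-off is the one described above: keeping the cache avoids repeated conversion, while invalidating it costs one more materialization the next time the values are requested.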
This API is still experimental and I have yet to find a nicer design. Feedback is really appreciated!
Linkage
Savvy compiles the Rust code into a static library and then uses it to generate a DLL for the R package. There's one tricky thing about static libraries. Rust's official documentation on linkage says:
Note that any dynamic dependencies that the static library may have (such as dependencies on system libraries, or dependencies on Rust libraries that are compiled as dynamic libraries) will have to be specified manually when linking that static library from somewhere.
What does this mean? If a dependency crate needs to link to a native library, the necessary compiler flags are added by cargo. But, after creating the static library, cargo's turn is over. It's you who has to tell the linker the necessary flags because there's no automatic mechanism.
If some of the flags are missing, you'll see a "symbol not found" error. For example, this is what I got on macOS. Some dependency of my package uses the objc2 crate, and it needs to be linked against Apple's Objective-C frameworks.
unable to load shared object '.../foo.so':
dlopen(../foo.so, 0x0006): symbol not found in flat namespace '_NSAppKitVersionNumber'
Execution halted
So, how can we know the necessary flags? The official document provides a pro-tip!
The --print=native-static-libs flag may help with this.
You can add this option to src/Makevars.in and src/Makevars.win.in via the RUSTFLAGS environment variable. Please edit this line.
# Add flags if necessary
- RUSTFLAGS =
+ RUSTFLAGS = --print=native-static-libs
Then, you'll find this note in the installation log.
Compiling ahash v0.8.11
Compiling serde v1.0.210
Compiling zerocopy v0.7.35
...snip...
note: Link against the following native artifacts when linking against this static library. The order and any duplication can be significant on some platforms.
note: native-static-libs: -framework CoreText -framework CoreGraphics -framework CoreFoundation -framework Foundation -lobjc -liconv -lSystem -lc -lm
Finished `dev` profile [unoptimized + debuginfo] target(s) in 19.17s
gcc -shared -L/usr/lib64/R/lib -Wl,-O1 -Wl,--sort-common -Wl,...
installing to /tmp/RtmpvQv8Ur/devtools_install_...
** checking absolute paths in shared objects and dynamic libraries
You can copy these flags to the linker flags of your package (PKG_LIBS). Please be aware that the flags differ between platforms, so you probably need to run this command on CI, not only on your local machine. Also, since Linux and macOS require different options, you need to tweak them in the configure script.
For example, here's my setup on the vellogd package.
./configure:
if [ "$(uname)" = "Darwin" ]; then
FEATURES=""
# result of --print=native-static-libs
ADDITIONAL_PKG_LIBS="-framework CoreText -framework CoreGraphics -framework CoreFoundation -framework Foundation -lobjc -liconv -lSystem -lc -lm"
else
FEATURES="--features use_winit"
fi
src/Makevars.in:
PKG_LIBS = -L$(LIBDIR) -lvellogd @ADDITIONAL_PKG_LIBS@
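If you collect the installation log on CI, the native-static-libs note shown earlier can be pulled out programmatically instead of by eye. The following is a small sketch in plain Rust, assuming the log contains a line of the form "note: native-static-libs: ..." as in the output above. The function name native_static_libs() is made up for this example and is not part of savvy or cargo.

```rust
/// Extract the flags from a `note: native-static-libs: ...` line in a build log.
/// Returns `None` if the log contains no such line.
/// (Hypothetical helper for illustration; not part of savvy or cargo.)
fn native_static_libs(log: &str) -> Option<Vec<String>> {
    const MARKER: &str = "native-static-libs:";
    log.lines().find_map(|line| {
        // Everything after the marker is a whitespace-separated flag list.
        let (_, flags) = line.split_once(MARKER)?;
        Some(flags.split_whitespace().map(String::from).collect())
    })
}

fn main() {
    let log = "Compiling serde v1.0.210\n\
               note: native-static-libs: -framework CoreFoundation -lobjc -lm\n\
               Finished `dev` profile";
    let flags = native_static_libs(log).expect("no native-static-libs note found");
    // Re-join the tokens so they can be pasted into a Makevars variable.
    println!("{}", flags.join(" "));
}
```

Note that the order of the flags is preserved, which matters because, as the note itself says, the order and any duplication can be significant on some platforms.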
Comparison with extendr
What the hell is this?? Why do you need another framework when there's extendr?
extendr is great and ready to use, but it's not perfect in some respects (e.g., error handling) and it's kind of stuck; extendr is so feature-rich and complex that no one can easily introduce a big breaking change. So, I needed to create a new, simple framework to experiment with. The main goal of savvy is to provide a simpler option alongside extendr, not to be a complete alternative to extendr.
Pros and cons compared to extendr
Pros:
- You can use Result for error handling instead of panic!
- You can compile your package for webR (I hope extendr gets webR-ready soon)
Cons:
- savvy prefers explicitness over ergonomics
- savvy provides a limited set of APIs and might not fit complex use cases