Cross-Language Data Types
Andreas Weis - 15/06/2026
When using different programming languages like C++ and Rust in the same project, one problem that always comes up is how to share data across language boundaries.
In this article we will explore some of the options that can be used for sharing data between C++ and Rust code. Our working assumption will be that we want to share data without copying it, in order to allow efficient sharing of large sets of data.
Memory Representation
The first step in allowing such forms of data sharing is to ensure that
our data type of choice can actually be represented in both C++ and Rust.
What this usually boils down to is that we restrict ourselves to the same data
types that are available in C: the elementary data types for signed and
unsigned integers, pointers, and floating-point numbers, along with the ability
to build compound array and struct types from those elementary types. C is the
de facto lingua franca when it comes to interoperability between programming
languages, so whenever we want to ensure that data can be passed across language
boundaries, we fall back to what is representable in C.
In C++, thanks to its backward compatibility with C, struct types follow the
same memory layout as in C, as long as they don't use any C++-only
language features that impact the memory layout. The C++ standard uses the term
standard-layout class type for such types. The C++ standard library also
provides the std::is_standard_layout type trait to check whether a type
satisfies these constraints.
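#include <cstdint>
#include <type_traits>

// A point in 3D space
struct Point3 {
    std::int32_t x;
    std::int32_t y;
    std::int32_t z;
};
// Fails to compile if Point3 ever stops being a standard-layout type
static_assert(std::is_standard_layout_v<Point3>);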
Rust by default takes many more liberties with the exact layout of its data
types, but it provides the repr(C) representation for forcing types
to use a memory layout that is compatible with C.
// A point in 3D space
#[repr(C)]
struct Point3 {
    x: i32,
    y: i32,
    z: i32,
}
While each language provides mechanisms for ensuring that the declared type uses the correct memory layout to be compatible with C, there is no built-in way in the languages to ensure that the two types from the C++ and Rust world are compatible with each other. We must be extra careful to ensure that the declarations are indeed consistent.
The reward for those struggles is that we end up with data types that have the exact same layout in both languages, so we can send the raw bits from Rust to C++ (or vice versa) and they can be directly accessed with the same meaning in the other language.
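Since neither language can check the other side's declaration, one pragmatic safeguard is to pin down the expected layout with compile-time assertions in each language. The following is a minimal sketch for the Rust side; the expected numbers are assumptions that have to be kept in sync with the C++ declaration by hand.

```rust
use std::mem::{align_of, offset_of, size_of};

/// A point in 3D space, laid out like the equivalent C struct.
#[repr(C)]
pub struct Point3 {
    pub x: i32,
    pub y: i32,
    pub z: i32,
}

// Compile-time layout checks: the build fails if any of them does not hold.
// The expected values are assumptions that must match the C++ declaration.
const _: () = assert!(size_of::<Point3>() == 12);
const _: () = assert!(align_of::<Point3>() == 4);
const _: () = assert!(offset_of!(Point3, x) == 0);
const _: () = assert!(offset_of!(Point3, y) == 4);
const _: () = assert!(offset_of!(Point3, z) == 8);
```

On the C++ side, the matching checks can be written as static_assert expressions using sizeof, alignof and offsetof. This does not prove that the two declarations agree, but it turns an accidental layout change on either side into a build error.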
Preserving Type Invariants
As long as our shared data is nothing more than a soup of numerical values, ensuring a consistent memory layout is all we need. For more complex data types, additional concerns may arise, in particular if a type relies on complex invariants regarding its state.
The valid values for a member of the type are often constrained, potentially depending on the value of other fields. For example, consider the following type representing rational numbers:
// Rust
struct Rational {
    numerator: i32,
    denominator: i32,
}

// C++
struct Rational {
    std::int32_t numerator;
    std::int32_t denominator;
};
The denominator must not be set to 0. Also, if the fraction is stored in
reduced form, each change to one of the fields potentially requires a change to
the other to maintain the reduced form. Violating these constraints may result
in a value that is no longer valid for this type.
Such problems are well addressed by the use of encapsulation. Encapsulation requires a set of operations to be shipped alongside the data. Data is not accessed directly, but only via the operations operating on the type, which in turn have been carefully designed (and tested) to uphold any type invariants. In the example above, a setter for the fraction could reject values of 0 for the denominator and take care of properly reducing the fraction when writing the fields.
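As a rough illustration of what such encapsulation could look like in Rust, the sketch below keeps the fields private and funnels all construction through a single function that rejects a zero denominator and stores the fraction in reduced form. The names new, numerator and denominator, as well as the choice to keep the denominator positive, are assumptions made for this example.

```rust
/// Greatest common divisor (Euclid's algorithm).
fn gcd(mut a: u64, mut b: u64) -> u64 {
    while b != 0 {
        let r = a % b;
        a = b;
        b = r;
    }
    a
}

/// Rational number with private fields, so the invariants cannot be
/// bypassed from outside this module.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Rational {
    numerator: i32,
    denominator: i32,
}

impl Rational {
    /// Returns `None` for a zero denominator or if the normalized
    /// value does not fit into `i32`.
    pub fn new(numerator: i32, denominator: i32) -> Option<Self> {
        if denominator == 0 {
            return None;
        }
        // Compute in i64 so that negating i32::MIN cannot overflow.
        let (n, d) = (i64::from(numerator), i64::from(denominator));
        let g = gcd(n.unsigned_abs(), d.unsigned_abs()) as i64;
        // Assumed convention: the sign is carried by the numerator.
        let sign = if d < 0 { -1 } else { 1 };
        Some(Self {
            numerator: i32::try_from(sign * n / g).ok()?,
            denominator: i32::try_from(sign * d / g).ok()?,
        })
    }

    pub fn numerator(&self) -> i32 {
        self.numerator
    }

    pub fn denominator(&self) -> i32 {
        self.denominator
    }
}
```

With this design, Rational::new(2, -4) yields the value -1/2, while Rational::new(1, 0) is rejected.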
For complex types, it is not sufficient to ensure a consistent memory layout; we must also ensure that the surrounding program logic operating on such data is consistent.
There are generally two ways to address this.
Language Bindings
Instead of just sharing the data layout between languages, this approach shares code as well. We implement the methods that act on the underlying data once in our programming language of choice, and then provide bindings for all the other languages that allow invoking these functions. Internally these bindings will use the foreign function interface (FFI) of the respective language.
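To make this more concrete, here is a minimal sketch of what the Rust side of such a binding could look like if the core logic lived in Rust. The function names rational_init and rational_to_f64 are hypothetical and chosen only for this example.

```rust
/// Plain-data struct that crosses the language boundary.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Rational {
    numerator: i32,
    denominator: i32,
}

// Note: a real binding would additionally export each function under its
// unmangled name via the no_mangle attribute (spelled #[unsafe(no_mangle)]
// in recent Rust editions). It is left out here to keep the sketch
// independent of the edition in use.

/// Writes `numerator / denominator` to `out`.
/// Returns false for a null pointer or a zero denominator.
///
/// # Safety
/// `out` must be null or point to writable memory for one `Rational`.
pub unsafe extern "C" fn rational_init(
    out: *mut Rational,
    numerator: i32,
    denominator: i32,
) -> bool {
    if out.is_null() || denominator == 0 {
        return false;
    }
    unsafe { out.write(Rational { numerator, denominator }) };
    true
}

/// Converts the value to floating point; returns NaN for a null pointer.
///
/// # Safety
/// `r` must be null or point to a valid `Rational`.
pub unsafe extern "C" fn rational_to_f64(r: *const Rational) -> f64 {
    if r.is_null() {
        return f64::NAN;
    }
    let r = unsafe { r.read() };
    f64::from(r.numerator) / f64::from(r.denominator)
}
```

A C++ caller would declare the same functions inside an extern "C" block and could then wrap them in a class with a more idiomatic interface.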
The obvious advantage of this approach is that it is very easy to enforce consistency between implementations, as there is only one single implementation of the core logic interacting with the data. The maintenance and testing burden is also carried in large part by that single implementation.
The downside of this approach is that the complete interface of the type has to fit through the eye of the needle that is the foreign function interface. Similar to how we had to restrict ourselves to C data types for the memory layout, with language bindings we have to restrict ourselves to simple function calls. Features like Rust generics or C++ templates have no equivalent on the C side.
While this can be compensated to some extent by spending more effort on the layer implementing the language bindings, it can be very challenging to design such a system so that the resulting type exposes the underlying functionality as efficiently and as idiomatically as a native type.
So in particular for fundamental types that are used extensively in interfaces but are unlikely to change much over time, the trade-off between implementation effort and usability may favor a second option: Providing distinct, memory-layout compatible implementations for each language.
Memory-Layout Compatible Data Types
The core idea behind memory-layout compatible data types is quite simple: We implement the same methods for interacting with our data several times, once for each targeted programming language.
Because different languages have different underlying design philosophies, the interfaces will not look the same for each language. For example:
- A C++ implementation may want to use exceptions for reporting errors, while a Rust implementation will probably want to use std::result::Result, together with an error type that implements the std::error::Error trait.
- In Rust the string data type specifies its encoding, while in C++ it does not, so in C++ the encoding needs to be verified manually in code.
- Formatting for printing data in Rust uses the std::fmt::Display or std::fmt::Debug trait, while C++ uses a std::formatter specialization or an iostream inserter function.
- A garbage collected language will need to use a completely different lifetime model than Rust's borrow checker, or the raw references of C++.
- Statically typed languages will want to make extensive use of the type system in interfaces, while scripting languages like Python will use dynamic types.
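To give an impression of the Rust side of the first and third points, the following sketch reports an invalid value through a Result with a dedicated error type and implements Display for printing. The type RationalError and its message are assumptions made for this example, and reducing the fraction is left out for brevity.

```rust
use std::error::Error;
use std::fmt;

/// Error type for invalid rational numbers (hypothetical name).
#[derive(Debug, PartialEq, Eq)]
pub enum RationalError {
    ZeroDenominator,
}

impl fmt::Display for RationalError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RationalError::ZeroDenominator => write!(f, "denominator must not be zero"),
        }
    }
}

// Implementing the Error trait lets callers treat this like any other error.
impl Error for RationalError {}

#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Rational {
    numerator: i32,
    denominator: i32,
}

impl Rational {
    /// Idiomatic Rust: failure is part of the return type, not an exception.
    pub fn new(numerator: i32, denominator: i32) -> Result<Self, RationalError> {
        if denominator == 0 {
            return Err(RationalError::ZeroDenominator);
        }
        Ok(Self { numerator, denominator })
    }
}

// Human-readable formatting via the Display trait.
impl fmt::Display for Rational {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}/{}", self.numerator, self.denominator)
    }
}
```

A C++ implementation of the same type would more likely throw from its constructor and provide a std::formatter specialization, even though both implementations operate on the exact same two integers in memory.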
Because of these differences, idiomatic implementations in different languages may have very little in common in the end, so the benefits from having only a single implementation exposed via language bindings get smaller and smaller in comparison.
It is essential that the different implementations have a common understanding about what the valid values under the shared memory representation are, and we want to ensure that all implementations follow this common understanding.
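One way to make this common understanding explicit is to write down the validity rules as a check that every implementation provides and that tests can run against data produced by the other language. The sketch below does this for the rational number example; the rule set (positive denominator, fully reduced fraction) and the names RawRational and is_valid are assumptions made for illustration.

```rust
/// Raw view of the shared representation, as it arrives from another language.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
pub struct RawRational {
    pub numerator: i32,
    pub denominator: i32,
}

/// Greatest common divisor (Euclid's algorithm).
fn gcd(mut a: u32, mut b: u32) -> u32 {
    while b != 0 {
        let r = a % b;
        a = b;
        b = r;
    }
    a
}

/// Hypothetical shared rule set: the denominator is strictly positive
/// and the fraction is stored in reduced form.
pub fn is_valid(raw: &RawRational) -> bool {
    raw.denominator > 0
        && gcd(raw.numerator.unsigned_abs(), raw.denominator.unsigned_abs()) == 1
}
```

Under these rules, -1/2 and 0/1 are valid values, while 2/4, 0/5, 1/0 and 1/-2 are not, even though all of them fit the memory layout.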
Case Study: Memory-Layout Compatible Data Types in iceoryx2
For iceoryx2, we currently provide two types of memory-layout compatible data types, a static (as in: fixed capacity) vector and a static string.
We plan to extend this collection of types by adding containers with dynamic capacity in a future release.
Both of these types come with distinct, fully self-contained implementations for C++ and Rust that use the exact same underlying memory representation. Data written to memory by the Rust static string implementation can be reinterpreted as a C++ static string, and vice versa.
You can try this out yourself by running the
cross_language_communication_container examples for Rust and C++,
respectively.
Note how the type interfaces were designed to closely follow the idiomatic container interfaces from their respective language's standard library.
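To give a feeling for what a shared representation of a fixed-capacity string can look like, here is a deliberately simplified sketch. It is not the actual iceoryx2 type or layout; the name FixedString, the field order and the use of a 64-bit length are assumptions made purely for illustration.

```rust
/// Hypothetical fixed-capacity string; NOT the actual iceoryx2 layout.
/// A C++ counterpart would declare a 64-bit length followed by an
/// array of N bytes, in the same order.
#[repr(C)]
pub struct FixedString<const N: usize> {
    len: u64,
    data: [u8; N],
}

impl<const N: usize> FixedString<N> {
    /// Copies `bytes` into the fixed buffer; fails if they do not fit.
    pub fn from_bytes(bytes: &[u8]) -> Option<Self> {
        if bytes.len() > N {
            return None;
        }
        let mut data = [0u8; N];
        data[..bytes.len()].copy_from_slice(bytes);
        Some(Self {
            len: bytes.len() as u64,
            data,
        })
    }

    /// Returns the stored bytes. The length is clamped to the capacity,
    /// in case it was written by a faulty peer.
    pub fn as_bytes(&self) -> &[u8] {
        let len = usize::try_from(self.len).unwrap_or(N).min(N);
        &self.data[..len]
    }
}
```

Because the capacity is part of the type and the data lives inline, the value contains no pointers and can be placed directly into shared memory.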
Since iceoryx2 was specifically designed to assist with exchanging data across language boundaries, it provides some simple sanity checks for detecting incompatibilities between type layouts when establishing a connection. The plan is to extend these checks in the future by using the reflection capabilities of Rust and C++, to be able to reliably detect whether the types are compatible.
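The general idea behind such a sanity check can be sketched as follows. This is not the iceoryx2 implementation or API; the type LayoutInfo and the choice of compared properties (an agreed-upon name, size and alignment) are assumptions made for this example.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical description of a payload type that both sides
/// exchange before any data is transferred.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LayoutInfo {
    pub type_name: String,
    pub size: usize,
    pub alignment: usize,
}

impl LayoutInfo {
    /// Describes `T` under a name agreed upon by all languages.
    pub fn of<T>(type_name: &str) -> Self {
        Self {
            type_name: type_name.to_string(),
            size: size_of::<T>(),
            alignment: align_of::<T>(),
        }
    }

    /// A simple sanity check: all recorded properties must match.
    pub fn is_compatible_with(&self, other: &Self) -> bool {
        self == other
    }
}
```

A check like this catches gross mismatches such as a missing field, but it cannot detect two fields that were swapped, which is why a reflection-based comparison of the full type structure is the more reliable option.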
We also implemented a suite of cross-language component tests that exchange data via shared memory IPC using iceoryx2, to validate that operations carried out on a container in another programming language leave the data in the correct state. Part of this test suite is a metrics check, which is used for validating the expected memory layout. This test suite was designed to enable two avenues of extensibility:
- Provide additional implementations for the container types for other programming languages, such as Python
- Provide an extension point for bringing container data types from an existing code base into the iceoryx2 ecosystem. If you already have a static vector implementation in your code base, you may not want to switch over to the iceoryx2 version. But you may be able to make your implementation memory-layout compatible with the iceoryx2 static vector, after which the two can be used interchangeably in IPC communication.
Conclusion
Sharing data across IPC boundaries presents many challenges. In addition to ensuring compatibility of the underlying memory layout, careful consideration of how to share functionality for manipulating the shared data is required.
For the use case of sharing fundamental container types, providing diverse implementations that are mutually memory-layout compatible opens interesting possibilities.