ndarray is a data structure library for Rust that implements an n-dimensional array for general elements and for numerical computation.

By n-dimensional we mean the generalization of the one dimensional vectors that have 1 axis, two dimensional matrices (2 axes), etc to arrays with n axes. In our general use the number of axes of the data is usally small: we use arrays with one, two or three dimensions. The main features of the library is its support for slicing, iteration and array view-taking, using the usual (in Rust) separation of owned (Array) and borrowing (ArrayView, ArrayViewMut) data types.

The new version of ndarray has an exciting feature: support for parallelization of Zip!

Zip is our implementation of lock-step function application across several producers; just like .zip() is used to iterate two iterators in lock step, we use the Zip object to do a similar operation across several producers. Not just two, but up to six producers 1 and not just arrays, but producers. For example, if we cut a 2D array into 2 × 2 chunks, the chunk producer is a two dimensional producer in itself, each item a chunk!

Why Producers?

We use the trait NdProducer for Zip, and not the usual Iterator. We have two goals: efficient loops and rayon-style parallelization. Efficient loops require that we use memory layout information from the producers and adapt the loop the best we can. rayon-style parallelization requires the Zip object can be split, and we again use the memory layout information to split them along the axis that gives the biggest separation in memory location.

An iterator keeps the current state of iteration, and splitting it is tricky when we have changed it from its starting state. So we use producers that are often iterable (convertible to iterators) but not iterators themselves.

Why Zip?

Imagine we want to do an operation of many operands efficiently and without allocating intermediate results. Even computing A + B, storing the result in C is a ternary operation if none of A or B are consumed and reused for the output.

A more involved example is, take 2D arrays A, B, C and 3D array D; for each element in A, B, C, we have a row in D; add to A the elementwise difference of B and C and the difference the second and first element of the matching row in D.

// pseudocode!
A += B - C + D[.., .., 1] - D[.., .., 0]

The question is how to do the operation in place on the elements in A, without allocating intermediate results for the subtractions. In ndarray we don’t have any “algebloat” but I think the underlying implementation would be the same. We have Zip to apply functions elementwise across many operands:

use ndarray::Zip;

// in .apply() we perform the element or item-wise operation

Zip::from(&mut A)
    .and(&B)
    .and(&C)
    .and(D.genrows())
    .apply(|a, &b, &c, d_row| {
    	*a += b - c + d_row[1] - d_row[0];
    });

What we have in Zip is a general tool that allows the user to write efficient loops with a memory safe API using bounds checking by default. The inputs can be arbitrary arrays, slices of other arrays or producers. As many of the inputs as needed can be mutable.

The compiler can often vectorize the code in the loop, even if we have many inputs or outputs.

We embrace rayon’s idea of easy to add parallelization: import the ndarray parallel crate and change .apply( to .par_apply(, and it’s parallelized:

use ndarray::Zip;
use ndarray_parallel::prelude::*;

Zip::from(&mut A)
    .and(&B)
    .and(&C)
    .and(D.genrows())
    .par_apply(|a, &b, &c, d_row| {
    	*a += b - c + d_row[1] - d_row[0];
    });

(See at the end if you wonder how genrows() works.)

Other News in Ndarray 0.9

  • Zip::indexed which passes the index of the current item as the first parameter to Zip’s .apply() or .fold_while()
  • The .windows() producer contributed by @Robbepop. It produces overlapping array views of a particular size, iterating by moving it one element at a time, just like .windows(n) does for a slice. It is an NdProducer and iterable, and can be paired with arrays etc using Zip as well.
  • general_mat_vec_mul was added to allow matrix-vector multiplication in place, without allocating a new result. The fallback implementation is a simple application of Zip, pairing a row-by-row producer of the matrix with the element producer of the output vector.
  • IxDyn which is used for arrays with dynamic number of axes, uses a “short vector” optimization and uses no allocation for short cases.
  • New methods .genrows() and .gencolumns() give easier access to row and column iterables (and you guessed it — producers).

    “gen” is for generalized; the method name rows() is already taken for returning the row count.

    We have generalized it to higher dimensions in this way: an array with dimensions m × n × k has m × n generalized rows, each of k elements (it’s like m stacked matrices of dimension n × k).

See the full release notes for details.

  1. Limited only by our willingness to use macros to implement it for more tuple arities