Python API

Femtodriver has a Python API that allows you to programmatically perform the same tasks as the CLI. We assume that you created a model using PyTorch and created the appropriate fqir_graph using fmot. Please refer to the fmot documentation for how to do this. Once you have the fqir_graph, the general Python workflow is as follows:

Create a Femtodriver object with a context manager
Compile the model with fd.compile()
Generate inputs for the model
Run a simulation specifying the inputs we generated and an input period to get power estimates and outputs
Run the model on real hardware with controlled inputs
Run a comparison between the simulator and the hardware (Coming soon)

In code, this looks like:

import numpy as np
from femtodriver import Femtodriver
from fmot.fqir import GraphProto

model: GraphProto = fqir_graph # You should have generated this using the tools in fmot. Please refer to the fmot docs

output_dir = "model_datas"

with Femtodriver() as fd:
    meta_dir, femtofile_path, femtofile_size = fd.compile(model,
                                                          model_name="my_audio_model",
                                                          output_dir=output_dir)
    print(f"femtofile generated at: {femtofile_path} with size {femtofile_size}KB")

    # Generate 4 frames of random inputs
    # model_inputs = fd.generate_audio_inputs(n_frames=4)

    # Use first 4 frames from an input audio wav file
    # model_inputs = fd.generate_audio_inputs(input="test_yes.wav", input_sample_indices=[0, 4])

    input_arr = np.random.randint(-1024, 1024, size=(1000,))
    model_inputs = fd.generate_audio_inputs(input=input_arr, input_sample_indices=[0, 4])

    # use first 4 frames of random input_arr = 4*32 samples
    results, metrics = fd.simulate(model_inputs=model_inputs, input_period=0.0016)
    print(metrics) # metrics are a stringified yaml

    # Optionally collect debug information to send to Femtosense if something goes wrong
    docker_logs = fd.get_docker_logs()

    # Run inputs through a single runner. For example through the evk2 SPU-001 hardware.
    result, runner = fd.execute_runner(
        requested_runner="hw",
        model_inputs=model_inputs,
        hardware="evk2",
    )

    # Compare different runners to ensure hw and simulator match
    comparison_results = fd.compare(runners_to_compare=["hw", "fasmir"],
                                    model_inputs=model_inputs,
                                    hardware="evk2")

In the following sections we will break down each of these steps and expand the possible arguments you can use to control how each of these steps works.

The Context Manager

We use a context manager by doing:

with Femtodriver() as fd:

A context manager in Python allows us to clean up once we leave the scope of the context. Specifically, when we exit the indented block, we shut down the compiler docker container so that we don’t use those resources on your computer anymore.

If you use a debugger and exit it midway through execution, the context manager cannot shut down the docker container, so we provide a helper to clean up any docker containers left running by a previous session:

fd.cleanup_docker_containers()
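The cleanup guarantee can be illustrated without Femtodriver installed. The sketch below uses a hypothetical stand-in class, FakeDriver (not part of Femtodriver), to show that the exit hook of a context manager runs even when the block raises. It does not describe how Femtodriver implements its own cleanup.

```python
class FakeDriver:
    """Hypothetical stand-in for Femtodriver; it only records lifecycle events."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("container started")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on normal exit and also when the block raises.
        self.events.append("container stopped")
        return False  # do not swallow the exception


driver = FakeDriver()
try:
    with driver as fd:
        fd.events.append("compiling")
        raise RuntimeError("simulated failure inside the block")
except RuntimeError:
    pass

print(driver.events)
```

A debugger that terminates the process skips the exit hook entirely, which is the case fd.cleanup_docker_containers() is meant to cover.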

Compiling a Model and Generating Program Files

The line:

meta_dir, femtofile_path, femtofile_size = fd.compile(model, model_name="my_audio_model", output_dir="model_datas")

does the following:

Compiles the model metadata (useful for Femtosense debugging)
Compiles your model to a .femto file which can be put on an SD card and run on EVK2 (new method)
Compiles your model to 0PROG_A and 0PROG_D program files which can be put on an SD card and run on EVK2 (old method)

The generated files have the following structure where the top level dir ‘model_datas’ is the directory supplied to output_dir:

model_datas
└── my_audio_model
    ├── io_records
    │   ├── apb_records
    │   │   ├── 0PROG_A
    │   │   └── 0PROG_D
    │   ├── ...
    │   └── my_audio_model.femto
    └── meta_from_femtocrux
        ├── metadata.yaml
        ├── ...

Notice that the second-level directory name, my_audio_model, comes from the model_name argument passed into compile.
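The layout above can be turned into paths with pathlib. This is a minimal sketch: expected_artifacts is a hypothetical helper, and the locations are read off the directory tree shown here rather than taken from the Femtodriver source. In practice, prefer the femtofile_path returned by fd.compile().

```python
from pathlib import Path


def expected_artifacts(output_dir: str, model_name: str) -> dict[str, Path]:
    """Hypothetical helper: paths implied by the directory tree above."""
    model_dir = Path(output_dir) / model_name
    io_records = model_dir / "io_records"
    return {
        "prog_a": io_records / "apb_records" / "0PROG_A",
        "prog_d": io_records / "apb_records" / "0PROG_D",
        "femtofile": io_records / f"{model_name}.femto",
        "metadata": model_dir / "meta_from_femtocrux" / "metadata.yaml",
    }


paths = expected_artifacts("model_datas", "my_audio_model")
for label, path in paths.items():
    print(f"{label}: {path.as_posix()}")
```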

The current firmware loads the generated 0PROG_A and 0PROG_D files, so you will need to place these on the SD card in the EVK. We will be switching to the generated .femto file very shortly.

The full options for compile are:

def compile(
    self,
    model: GraphProto | str,
    model_name: str = "fqir",
    model_options_file=None,
    output_dir: str | Path = "model_datas",
) -> tuple[str, str, str]:

Check the glossary at the end of this document for what each of these options does.

Model Inputs For `fd.simulate()`, `fd.execute_runner()` and `fd.compare()`

Before we can simulate or compare different types of runners we need to generate the inputs we will drive through the model to get our simulation metrics.

The inputs are structured as a dictionary with the key being the name of the input and the value being an np.ndarray of integers in the correct shape that the model expects. As an example:

model_inputs = {
    'inputname':
        np.array([
            [100, 223, ... , 421],
            [-23, 155, ... , 654]
        ], dtype=np.int32)  # shape (2, 32)
}

The shape of the np.ndarray is (streaming_sequence_dimension, features). For audio, for example, it can be interpreted as (frames, samples_in_frame). For simulation or testing on real hardware, all frames can be sent at once, but in a real deployment they would arrive as streaming inputs.

You can have more than one input for a model so that’s why we use this dictionary format. However, in the case of audio inputs to models it will typically be a single input for the audio file.
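When the dictionary is built by hand, a quick check of its layout can catch shape mistakes before a long simulation. The sketch below assumes only what is stated above (2-D integer arrays keyed by input name). check_model_inputs is a hypothetical helper, not a Femtodriver function, and the requirement that all inputs share a frame count is an extra assumption.

```python
import numpy as np


def check_model_inputs(model_inputs: dict) -> int:
    """Hypothetical helper: validate the dict layout and return the frame count."""
    if not model_inputs:
        raise ValueError("model_inputs is empty")
    frame_counts = set()
    for name, values in model_inputs.items():
        if not isinstance(values, np.ndarray):
            raise TypeError(f"{name}: expected np.ndarray, got {type(values).__name__}")
        if values.ndim != 2:
            raise ValueError(f"{name}: expected 2 dimensions, got {values.ndim}")
        if not np.issubdtype(values.dtype, np.integer):
            raise TypeError(f"{name}: expected an integer dtype, got {values.dtype}")
        frame_counts.add(values.shape[0])
    # Assumption: every input must supply the same number of frames.
    if len(frame_counts) != 1:
        raise ValueError(f"inputs disagree on frame count: {sorted(frame_counts)}")
    return frame_counts.pop()


model_inputs = {"inputname": np.zeros((2, 32), dtype=np.int32)}
n_frames = check_model_inputs(model_inputs)
print(n_frames)
```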

Audio Inputs

We provide a special helper to create inputs for audio models as they are a common use case. A .wav file or audio input as an ndarray is a continuous list of inputs, but the models work on fixed-size audio frames. Since so many of our models take audio data as an input, Femtodriver provides a simple tool to inspect a model’s input frame size, and reshape the data accordingly.

For example, if a .wav file contains 16,000 samples (e.g. 1 second of audio at 16 kHz), and the model takes 128-sample input frames (8 ms per hop), this tool would simply reshape the 16,000-element .wav data into a (125, 128) array (125 frames of 8 ms, 128 samples each).
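The reshape in this example can be sketched directly in NumPy. frame_audio is a hypothetical helper, and dropping trailing samples that do not fill a whole frame is an assumption; the helper in Femtodriver may handle the remainder differently.

```python
import numpy as np


def frame_audio(samples: np.ndarray, frame_size: int) -> np.ndarray:
    """Reshape a 1-D sample array into (frames, frame_size).

    Assumption: trailing samples that do not fill a frame are dropped.
    """
    n_frames = len(samples) // frame_size
    return samples[: n_frames * frame_size].reshape(n_frames, frame_size)


rng = np.random.default_rng(seed=0)
one_second = rng.integers(-1024, 1024, size=16_000, dtype=np.int32)
frames = frame_audio(one_second, frame_size=128)
print(frames.shape)
```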

You have 4 options for generating inputs for audio models:

Ignore the helper below and manually create the dictionary with the correct shapes as shown in the section above
Use randomly generated inputs
Use input audio from a .wav file
Use input from an np.ndarray of shape (N,) which will be automatically reshaped to (frames, samples)

The signature is:

def generate_audio_inputs(
    self,
    input: str | np.ndarray | None = None,
    spu_runner: SPURunner | None = None,
    n_frames: int = 2,
    input_sample_indices: list | None = None,
) -> dict[str, np.ndarray]:

To generate a random input with 4 frames of data:

model_inputs = fd.generate_audio_inputs(n_frames=4)

To generate an input from a wav file on disk:

model_inputs = fd.generate_audio_inputs(input="test_yes.wav")

To generate an input from an np.ndarray with shape (N_samples,):

# Input randomly generated here but you can use any ndarray of ints
input_arr = np.random.randint(-1024, 1024, size=(1000,))
model_inputs = fd.generate_audio_inputs(input=input_arr)

In all these cases, the returned object will be the dict[str, np.ndarray] described above with the correct shape the model expects.

You can restrict the frames to be a specific section by using the input_sample_indices parameter.

# Input randomly generated here but you can use any ndarray of ints
input_arr = np.random.randint(-1024, 1024, size=(1000,))
model_inputs = fd.generate_audio_inputs(input=input_arr, input_sample_indices=[0, 4])

This is useful to speed up simulation or runner execution.
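If you build the dictionary by hand, the same restriction can be applied with plain slicing. The sketch assumes the two indices are frame indices with an exclusive upper bound, which matches the "first 4 frames" comment in the example but is not confirmed here. select_frames is a hypothetical helper.

```python
import numpy as np


def select_frames(model_inputs: dict, lo: int, hi: int) -> dict:
    """Hypothetical helper: keep frames lo..hi-1 of every input (assumed semantics)."""
    return {name: values[lo:hi] for name, values in model_inputs.items()}


full = {"inputname": np.arange(10 * 32, dtype=np.int32).reshape(10, 32)}
subset = select_frames(full, 0, 4)
print(subset["inputname"].shape)
```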

Model Simulation

Our simulator allows us to gather metrics about power consumption based on the model, a given input to the model and the input_period. A typical invocation looks like:

results, metrics = fd.simulate(model_inputs=model_inputs, input_period=0.0016)

The full list of arguments is:

def simulate(
    self,
    model_inputs: dict[str, np.ndarray],
    input_period: float,
) -> tuple[dict, str]:
"""
@returns: A tuple with the first element as a dictionary of sim results and the second element
          is string representation of the yaml sim metrics

element 1:
    result = {
        "compare_str": "Single Runner",
        "pass": "No Comparisons",
        "internals": internals,
        "outputs": outputs,
    }
element 2:
    The sim metrics which are described in the e2e example document in femtocrux
"""

The inputs must be generated as described above. The results object contains the internals and outputs of the model for the given input after simulation. The simulate call only works with FXRunner, the femtocrux simulation runner.
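For streaming audio, one plausible choice of input_period is the duration of a single frame hop. Treat that choice as an assumption rather than something the API requires. frame_period_seconds is a hypothetical helper that performs the arithmetic.

```python
def frame_period_seconds(samples_per_frame: int, sample_rate_hz: int) -> float:
    """Duration of one frame hop in seconds."""
    if samples_per_frame <= 0 or sample_rate_hz <= 0:
        raise ValueError("both arguments must be positive")
    return samples_per_frame / sample_rate_hz


# 128 samples per frame at 16 kHz, as in the audio example earlier in this document.
period = frame_period_seconds(samples_per_frame=128, sample_rate_hz=16_000)
print(period)
```

The resulting value would then be passed as input_period to fd.simulate().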

Using execute_runner() to run inputs through cable-attached dev kit

fd.execute_runner() is a more generic method than fd.simulate() and it allows us to run inputs through a cable-attached dev kit and control the process of getting outputs through this software. We will be rolling this feature out soon for evk2.

A typical invocation would be:

result, runner = fd.execute_runner(
    requested_runner="hw",
    model_inputs=model_inputs,
    hardware="evk2",
)

The model_inputs argument is described in the model inputs section above. The hardware argument selects which kind of EVK to control (EVK2 or Zynq), and hardware_address selects the specific device (serial number for EVK2, IP address for Zynq). Support for evk2 under hardware is still being rolled out; we will give a full example of using it with this feature once it is available.

The full signature looks like:

def execute_runner(
    self,
    requested_runner: str,
    model_inputs: dict[str, np.ndarray],
    noencrypt: bool = False,
    hardware="fakezynq",
    dummy_output_file: str | None = None,
    debug_vars: str | None = None,
    debug_vars_fname: str | None = None,
    hardware_address: str | None = None,
) -> tuple[dict, FemtoRunner]:
"""
Runs an input through a given spu_runner. Runners could be hw, fasmir, fmir, fqir.
The input could be a fake autogenerated random input or an np.ndarray. Use the helper
function generate_audio_inputs() to turn wav files into the correct shape ndarray.

@param: requested_runner: a string runner out of set({"hw", "fasmir", "fmir", "fqir"})
@param: model_inputs: A dict mapping each input name to an ndarray that matches the shape required by the model
@param: input_period: the input period processing time for a frame in seconds

@returns: returns a tuple
            element1: result: dictionary which contains the internals activations and outputs of a runner.
            element2: femto_runner: the runner object


result = {
    "compare_str": "Single Runner",
    "pass": "No Comparisons",
    "internals": internals,
    "outputs": outputs,
}

femto_runner of type FXRunner for sim or SPURunner for hw
"""

The result object is the same as the one in simulate and gives the internals and outputs of the model for a given input.

Comparing Software Simulation to cable-attached dev kit

You can compare the results of different runners to see if the model running in software simulation produces the same outputs as the model running on SPU-001 through the EVK2.

The snippet of code from the above example that does this is:

# Compare different runners to ensure hw and simulator match
comparison_results = fd.compare(runners_to_compare=["hw", "fasmir"],
                                model_inputs=model_inputs,
                                hardware="evk2")

The full signature looks like:

def compare(
    self,
    model_inputs: dict[str, np.ndarray],
    runners_to_compare: list,
    noencrypt: bool = False,
    hardware="fakezynq",
    dummy_output_file: str | None = None,
    debug_vars: str | None = None,
    debug_vars_fname: str | None = None,
    hardware_address: str | None = None,
) -> dict:
    """
    Compare runs a comparison between different FemtoRunners.

    @param runners_to_compare: The runners to compare. These are fqir, fmir, fasmir and hw.
                                Information on the rest of the arguments is in the run() docstring.
    @returns: returns a dictionary of the status of the comparison and any mismatches as well as the results of
                each runner.
    """

The output looks like:

{
    'pass': True,
    'status_str': 'Comparison of these runners SUCCEEDED!:\n  hardware : took 20.13 s\n  fasmir : took 63.49 s\n',
    'outputs':
        {'hardware':
            {'%x.606': array(
            [[   0,    0,    0,    0,   -1,   -1,   -1,   -1,   -1,   -1,   -1,
                ...],
            [ -66,  -20,  101,  129,   19,   17,   84,  109,  135,  154,    0,
                ...],
            [ -12,  -98,  113,  -74,   97, -102,  -62,  102,   35,   -7,   45,
                ...],
            [ -23,  -99,  -79,   -3,  136,  -79,  -69, -120,  124,   50,  -78,
                ...]])},
         'fasmir':
            {'%x.606': array(
            [[   0.,    0.,    0.,    0.,   -1.,   -1.,   -1.,   -1.,   -1.,
                ...],
            [ -66.,  -20.,  101.,  129.,   19.,   17.,   84.,  109.,  135.,
                ...],
            [ -12.,  -98.,  113.,  -74.,   97., -102.,  -62.,  102.,   35.,
                ...],
            [ -23.,  -99.,  -79.,   -3.,  136.,  -79.,  -69., -120.,  124.,
                ...]])}
        },
    'internals': {'hardware': {}, 'fasmir': {}}
}
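A dictionary of this shape is easy to condense for logs or CI. The sketch below works on a shortened stand-in dictionary that mirrors the keys shown above. summarize_comparison is a hypothetical helper, not part of Femtodriver.

```python
def summarize_comparison(comparison: dict) -> str:
    """Hypothetical helper: short text summary of a compare() result dict."""
    verdict = "PASS" if comparison["pass"] else "FAIL"
    lines = [f"{verdict}: {comparison['status_str'].splitlines()[0]}"]
    for runner, outputs in comparison["outputs"].items():
        lines.append(f"  {runner}: outputs {sorted(outputs)}")
    return "\n".join(lines)


# Stand-in dictionary mirroring the keys shown above; values are shortened.
example = {
    "pass": True,
    "status_str": "Comparison of these runners SUCCEEDED!:\n  hardware : took 20.13 s\n",
    "outputs": {
        "hardware": {"%x.606": [[0, 0], [-66, -20]]},
        "fasmir": {"%x.606": [[0.0, 0.0], [-66.0, -20.0]]},
    },
    "internals": {"hardware": {}, "fasmir": {}},
}
summary = summarize_comparison(example)
print(summary)
```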

Running Inputs one at a time through cable-attached dev kit

You may wish to perform per-frame pre- or post-processing on the inputs or outputs of the model at each iteration. We provide a lower-level mechanism to do this using the FemtoRunner API. The recipe for this is:

from femtodriver import Femtodriver
import numpy as np

output_dir = "model_datas"
# model = "clara2.1L_24b.pt" # We also support FQIR pickles as input.
model = "clara2.1L_24b.femto"

with Femtodriver() as fd:
    meta_dir, femtofile, femtofile_size = fd.compile(model, output_dir=output_dir)

    input_arr = np.random.randint(-1024, 1024, size=(1000,))
    model_inputs = fd.generate_audio_inputs(
        input=input_arr, input_sample_indices=[0, 4]
    )

    femto_runner, runner_name = fd.create_runner(requested_runner="hw", hardware="evk2")

    first_var_vals = next(iter(model_inputs.values()))
    n_steps = first_var_vals.shape[0]
    if not all(val.shape[0] == n_steps for val in model_inputs.values()):
        raise ValueError("Input sequence lengths don't match for all variables")

    femto_runner.reset()
    for i in range(n_steps):
        step_inputs = {varname: values[i] for varname, values in model_inputs.items()}

        output_vals, internal_vals = femto_runner.step(step_inputs)
        print(
            f"input # {i}:\n"
            f"  inputs: {step_inputs}\n"
            f"  outputs {output_vals}, internals {internal_vals}"
        )
    femto_runner.finish()

In the above example the model is supplied as a .femto file. You can also pass your own FQIR pickle (see the commented-out .pt line) if you have the FQIR for your model. When creating the femto_runner object, specify the hardware="evk2" parameter to use the EVK2 cable-attached dev kit.
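The recipe leaves the per-frame processing itself to you. The sketch below shows where such hooks could sit in the loop, using a hypothetical stand-in runner (EchoRunner) that only mimics the reset/step/finish calls and the (outputs, internals) return shape seen above. The preprocess and postprocess functions are illustrative placeholders.

```python
class EchoRunner:
    """Hypothetical stand-in with the reset/step/finish shape used above."""

    def reset(self):
        self.steps = 0

    def step(self, step_inputs: dict):
        self.steps += 1
        outputs = {f"out_{name}": list(values) for name, values in step_inputs.items()}
        return outputs, {}  # (output_vals, internal_vals)

    def finish(self):
        pass


def preprocess(frame):
    # Example pre-processing: clip samples to a fixed range.
    return [max(-1024, min(1023, sample)) for sample in frame]


def postprocess(outputs: dict) -> dict:
    # Example post-processing: peak absolute value per output.
    return {name: max(abs(v) for v in values) for name, values in outputs.items()}


model_inputs = {"inputname": [[5, -2000, 7], [3000, 1, -1]]}  # 2 frames, 3 samples
runner = EchoRunner()
runner.reset()
peaks = []
for i in range(len(model_inputs["inputname"])):
    step_inputs = {name: preprocess(frames[i]) for name, frames in model_inputs.items()}
    output_vals, internal_vals = runner.step(step_inputs)
    peaks.append(postprocess(output_vals))
runner.finish()
print(peaks)
```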

We currently have the following limitations:

Running in cable-attached mode doesn’t give accurate latency measurements because USB communication adds latency. We plan to support accurate latency measurement in the future using a batched send.

List of Femtodriver Calls

For reference here is a list of all the Python API calls without arguments for discoverability.

fd.compile()
fd.simulate()
fd.compare()
fd.execute_runner()
fd.generate_inputs()
fd.generate_audio_inputs()
fd.create_runner()
fd.generate_program_files()
fd.write_metadata_to_disk()
fd.write_metrics_to_disk()
fd.cleanup_docker_containers()
fd.get_docker_logs()

The Full list of arguments to run()

The full list of arguments to run is shown below.

Required params:
model:                          Model to run.

Optional:
model_options_file:             .yaml with run options for different models (e.g., compiler options).
                                Default is femtodriver/femtodriver/models/options.yaml
output_dir:                     Directory where to write fasmir, fqir, programming images,
                                programming streams, etc.
n_frames:                       Number of random sim inputs to drive in.
input_file:                     File with inputs to drive in. Expects .npy from numpy.save.
                                Expecting single 2D array of values, indices are (timestep, vector_dim)
input_file_sample_indices:      lo, hi indices to run from input_file.
force_femtocrux_compile:        Force femtocrux as the compiler, even if FS internal packages present.
force_femtocrux_sim:            Force femtocrux as the simulator, even if FS internal packages present.
hardware:                       Primary runner to use: (options: zynq, fakezynq, redis).
runners:                        Which runners to execute. If there are multiple, compare each of them
                                to the first, comma-separated. Options: hw, fasmir, fqir, fmir, fakehw.
debug_vars:                     Debug variables to collect and compare values for, comma-separated
                                (no spaces), or 'all'.
debug_vars_fname:               File with a debug variable name on each line.
debug:                          Set debug log level.
noencrypt:                      Don't encrypt programming files.
input_period:                   Simulator input period for energy estimation. No impact on runtime.
                                Floating point seconds.
dummy_output_file:              For fakezynq, the values that the runner should reply with.
                                Specify a .npy for a single variable.
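The input_file option expects a .npy file holding a single 2-D array indexed as (timestep, vector_dim). A minimal sketch of writing and reading such a file with NumPy follows; the file name inputs.npy and the 4 x 32 shape are illustrative assumptions.

```python
import os
import tempfile

import numpy as np

# 4 timesteps of a 32-dimensional input, laid out as (timestep, vector_dim).
inputs = np.random.randint(-1024, 1024, size=(4, 32))

with tempfile.TemporaryDirectory() as tmp_dir:
    input_path = os.path.join(tmp_dir, "inputs.npy")  # illustrative file name
    np.save(input_path, inputs)
    loaded = np.load(input_path)  # round-trip check

print(loaded.shape, loaded.dtype)
```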

Misc

Note that many Femtodriver options pertain to running an attached SPU-001 directly. As of 9/24, an EVK has not been made available that allows external use of these features.