Python API
==========

Femtodriver has a Python API that allows you to programmatically perform the same tasks as the CLI.
We assume that you created a model using PyTorch and created the appropriate fqir_graph using fmot.
Please refer to the `fmot documentation <https://fmot.femtosense.ai>`_ for how to do this.
Once you have the fqir_graph, the general Python workflow is as follows:

#. Create a Femtodriver object with a context manager
#. Compile the model with :code:`fd.compile()`
#. Generate inputs for the model
#. Run a simulation specifying the inputs we generated and an input period to get power estimates and outputs
#. Run the model on real hardware with controlled inputs
#. Run a comparison between the simulator and the hardware (Coming soon)

In code, this looks like:

.. code-block:: python

  import numpy as np
  from femtodriver import Femtodriver
  from fmot.fqir import GraphProto

  model: GraphProto = fqir_graph # You should have generated this using the tools in fmot. Please refer to the fmot docs

  output_dir = "model_datas"

  with Femtodriver() as fd:
      meta_dir, femtofile_path, femtofile_size = fd.compile(model,
                                                            model_name="my_audio_model",
                                                            output_dir=output_dir)
      print(f"femtofile generated at: {femtofile_path} with size {femtofile_size}KB")

      # Generate 4 frames of random inputs
      # model_inputs = fd.generate_audio_inputs(n_frames=4)

      # Use first 4 frames from an input audio wav file
      # model_inputs = fd.generate_audio_inputs(input="test_yes.wav", input_sample_indices=[0, 4])

      # Use the first 4 frames of a random input_arr (4 frames * 32 samples)
      input_arr = np.random.randint(-1024, 1024, size=(1000,))
      model_inputs = fd.generate_audio_inputs(input=input_arr, input_sample_indices=[0, 4])

      results, metrics = fd.simulate(model_inputs=model_inputs, input_period=0.0016)
      print(metrics) # metrics are a stringified yaml

      # Optionally get debug information to send to Femtosense if something goes wrong
      docker_logs = fd.get_docker_logs()

      # Run inputs through a single runner. For example through the evk2 SPU-001 hardware.
      result, runner = fd.execute_runner(
          requested_runner="hw",
          model_inputs=model_inputs,
          hardware="evk2",
      )

      # Compare different runners to ensure hw and simulator match
      comparison_results = fd.compare(runners_to_compare=["hw", "fasmir"],
                                      model_inputs=model_inputs,
                                      hardware="evk2")

The following sections break down each of these steps and describe the arguments you can use to control how it works.

The Context Manager
-------------------

We create the Femtodriver object with a context manager:

.. code-block:: python

  with Femtodriver() as fd:
      ...  # compile, simulate, and run the model here

A context manager in Python allows us to clean up once we leave the scope of the context.
Specifically, when we exit the indented block, we shut down the compiler docker container so that we don't use those resources on your computer anymore.
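
The cleanup behaviour can be illustrated with a small stand-in class. This is only a sketch: :code:`FakeDriver` and its :code:`container_running` attribute are hypothetical names used for illustration and are not part of Femtodriver. It shows that :code:`__exit__` runs when the block ends, even if an exception is raised inside it.

.. code-block:: python

  class FakeDriver:
      """Hypothetical stand-in that mimics starting and stopping a resource."""

      def __init__(self):
          self.container_running = False

      def __enter__(self):
          # Acquire the resource when the `with` block is entered
          self.container_running = True
          return self

      def __exit__(self, exc_type, exc_value, traceback):
          # Always release the resource, whether or not an error occurred
          self.container_running = False
          return False  # do not swallow exceptions


  driver = FakeDriver()
  try:
      with driver as fd:
          assert fd.container_running
          raise RuntimeError("simulated failure inside the block")
  except RuntimeError:
      pass

  print(driver.container_running)  # False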

If you use a debugger and exit it midway through execution, the context manager cannot shut down the docker container. For this case we provide a helper that cleans up any docker containers left running from previous sessions:

.. code-block:: python

  fd.cleanup_docker_containers()

Compiling a Model and Generating Program Files
----------------------------------------------

The line:

.. code-block:: python

  meta_dir, femtofile_path, femtofile_size = fd.compile(model, model_name="my_audio_model", output_dir="model_datas")

does the following:

#. Compiles the model metadata (useful for Femtosense debugging)
#. Compiles your model to a :code:`.femto` file which can be put on an SD card and run on EVK2 (new method)
#. Compiles your model to 0PROG_A and 0PROG_D program files which can be put on an SD card and run on EVK2 (old method)


The generated files have the following structure where the top level dir 'model_datas' is the directory supplied to :code:`output_dir`:

.. code-block::

  model_datas
  └── my_audio_model
      ├── io_records
      │   ├── apb_records
      │   │   ├── 0PROG_A
      │   │   └── 0PROG_D
      │   ├── ...
      │   └── my_audio_model.femto
      └── meta_from_femtocrux
          ├── metadata.yaml
          ├── ...

Notice that the second-level directory name is the stem :code:`my_audio_model` taken from the :code:`model_name` argument passed to :code:`fd.compile()`.
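
As a sketch of how the layout above maps to paths, the helper below rebuilds the expected locations from :code:`output_dir` and :code:`model_name`. The function :code:`expected_artifact_paths` is a hypothetical name used for illustration and is not part of Femtodriver; it only assumes the directory tree shown above.

.. code-block:: python

  from pathlib import Path


  def expected_artifact_paths(output_dir, model_name):
      """Hypothetical helper: build paths following the directory tree above."""
      model_dir = Path(output_dir) / model_name
      io_records = model_dir / "io_records"
      return {
          "femtofile": io_records / f"{model_name}.femto",
          "prog_a": io_records / "apb_records" / "0PROG_A",
          "prog_d": io_records / "apb_records" / "0PROG_D",
          "metadata": model_dir / "meta_from_femtocrux" / "metadata.yaml",
      }


  paths = expected_artifact_paths("model_datas", "my_audio_model")
  for name, path in paths.items():
      print(f"{name}: {path.as_posix()}")

In practice, prefer the paths returned by :code:`fd.compile()` over reconstructing them by hand.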

The current firmware loads the generated 0PROG_A and 0PROG_D files, so you will need to place these on the SD card in the EVK. We will be switching to the generated :code:`.femto` file very shortly.

The full options for compile are:

.. code-block:: python

  def compile(
      self,
      model: GraphProto | str,
      model_name: str = "fqir",
      model_options_file=None,
      output_dir: str | Path = "model_datas",
  ) -> tuple[str, str, str]:

Check the glossary at the end of this document for what each of these options does.

Model Inputs For :code:`fd.simulate()`, :code:`fd.execute_runner()` and :code:`fd.compare()`
--------------------------------------------------------------------------------------------

Before we can simulate or compare different types of runners we need to generate the inputs we will drive through the
model to get our simulation metrics.

The inputs are structured as a dictionary with the key being the name of the input and the value being an :code:`np.ndarray` of
integers in the correct shape that the model expects. As an example:

.. code-block:: python

  model_inputs = {
      'inputname':
          np.array([
              [100, 223, ... , 421],
              [-23, 155, ... , 654]
          ], dtype=np.int32)  # shape is (2, 32)
  }

The shape of the :code:`np.ndarray` is (streaming_sequence_dimension, features).
For audio, for example, it can be interpreted as (frames, samples_in_frame).
For simulation or testing on real hardware, all frames can be sent at once, but in a
real-world deployment they would arrive as streaming inputs.

A model can have more than one input, which is why we use this dictionary format. For audio models, however,
there is typically a single input carrying the audio samples.
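
A minimal sketch of building such a dictionary and sanity-checking it before use is shown here. The input name :code:`inputname`, the 32 features per frame, and the helper :code:`check_model_inputs` are placeholders chosen for illustration; the helper is not part of Femtodriver.

.. code-block:: python

  import numpy as np

  # Two frames of 32 integer features each: shape (frames, features)
  rng = np.random.default_rng(seed=0)
  model_inputs = {
      "inputname": rng.integers(-1024, 1024, size=(2, 32), dtype=np.int32),
  }


  def check_model_inputs(inputs):
      """Hypothetical sanity check, not part of Femtodriver."""
      for name, arr in inputs.items():
          if arr.ndim != 2:
              raise ValueError(f"{name}: expected 2 dimensions, got {arr.ndim}")
          if not np.issubdtype(arr.dtype, np.integer):
              raise TypeError(f"{name}: expected integers, got {arr.dtype}")
      # All inputs must cover the same number of frames
      n_frames = {arr.shape[0] for arr in inputs.values()}
      if len(n_frames) > 1:
          raise ValueError("Input sequence lengths don't match for all variables")


  check_model_inputs(model_inputs)
  print(model_inputs["inputname"].shape)  # (2, 32)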

Audio Inputs
************

We provide a special helper to create inputs for audio models as they are a common use case.
A :code:`.wav` file or audio input as an ndarray is a continuous list of inputs, but the models work on fixed-size audio frames.
Since so many of our models take audio data as an input, Femtodriver provides a simple tool to inspect a model's input
frame size, and reshape the data accordingly.

For example, if a :code:`.wav` file contains 16,000 samples (e.g. 1 second of audio at 16 kHz) and the model takes
128-sample input frames (8 ms per hop), this tool simply reshapes the 16,000 samples into a (125, 128) array
(125 frames of 8 ms, 128 samples each).
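
The reshape in this example can be sketched with plain NumPy. This is not Femtodriver's implementation: the frame size is hard-coded instead of being read from the model, and dropping a trailing partial frame is an assumption made for illustration.

.. code-block:: python

  import numpy as np

  sample_rate = 16_000  # samples per second
  frame_size = 128      # samples per frame (assumed model input size)

  # One second of placeholder audio as a flat array of integer samples
  audio = np.arange(sample_rate, dtype=np.int32)

  # Keep only whole frames, then reshape to (frames, samples_in_frame)
  n_frames = audio.shape[0] // frame_size
  frames = audio[: n_frames * frame_size].reshape(n_frames, frame_size)

  print(frames.shape)                      # (125, 128)
  print(1000 * frame_size / sample_rate)   # 8.0 (milliseconds per frame)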

You have 4 options for generating inputs for audio models:

#. Ignore the helper below and manually create the dictionary with the correct shapes as shown in the section above
#. Use randomly generated inputs
#. Use input audio from a :code:`.wav` file
#. Use input from an :code:`np.ndarray` of shape (N,) which will be automatically reshaped to (frames, samples)

The signature is:

.. code-block:: python

  def generate_audio_inputs(
      self,
      input: str | np.ndarray | None = None,
      spu_runner: SPURunner | None = None,
      n_frames: int = 2,
      input_sample_indices: list | None = None,
  ) -> dict[str, np.ndarray]:

To generate a random input with 4 frames of data:

.. code-block:: python

  model_inputs = fd.generate_audio_inputs(n_frames=4)

To generate an input from a wav file on disk:

.. code-block:: python

  model_inputs = fd.generate_audio_inputs(input="test_yes.wav")

To generate an input from an np.ndarray with shape (N_samples,):

.. code-block:: python

  # Input randomly generated here but you can use any ndarray of ints
  input_arr = np.random.randint(-1024, 1024, size=(1000,))
  model_inputs = fd.generate_audio_inputs(input=input_arr)

In all these cases, the returned object will be the :code:`dict[str, np.ndarray]` described above with the correct shape the
model expects.

You can restrict the frames to be a specific section by using the :code:`input_sample_indices` parameter.

.. code-block:: python

  # Input randomly generated here but you can use any ndarray of ints
  input_arr = np.random.randint(-1024, 1024, size=(1000,))
  model_inputs = fd.generate_audio_inputs(input=input_arr, input_sample_indices=[0, 4])

This is useful to speed up simulation or runner execution.
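
The effect of :code:`input_sample_indices` can be pictured as a slice over the frame axis. The sketch below assumes that the two values form a half-open range of frame indices, which matches the "first 4 frames" reading of :code:`[0, 4]` used in this document, and that the frame size is 32; both are assumptions for illustration only.

.. code-block:: python

  import numpy as np

  frame_size = 32  # assumed model frame size
  input_arr = np.arange(1000, dtype=np.int32)

  # Reshape the whole signal into frames, dropping the incomplete last frame
  n_frames = input_arr.shape[0] // frame_size
  frames = input_arr[: n_frames * frame_size].reshape(n_frames, frame_size)

  # Assumed meaning of input_sample_indices=[0, 4]: frames 0, 1, 2 and 3
  lo, hi = 0, 4
  selected = frames[lo:hi]

  print(frames.shape)    # (31, 32)
  print(selected.shape)  # (4, 32)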

Model Simulation
----------------

Our simulator allows us to gather metrics about power consumption based on the model, a given input to the model and the
input_period. A typical invocation looks like:

.. code-block:: python

  results, metrics = fd.simulate(model_inputs=model_inputs, input_period=0.0016)

The full list of arguments is:

.. code-block:: python

  def simulate(
      self,
      model_inputs: dict[str, np.ndarray],
      input_period: float,
  ) -> tuple[dict, str]:
  """
  @returns: A tuple with the first element as a dictionary of sim results and the second element
            is string representation of the yaml sim metrics

  element 1:
      result = {
          "compare_str": "Single Runner",
          "pass": "No Comparisons",
          "internals": internals,
          "outputs": outputs,
      }
  element 2:
      The sim metrics which are described in the e2e example document in femtocrux
  """

The inputs must be generated as described above. The results object contains the internals and outputs of the model for
the given input after simulation. The simulate call only works with FXRunner, the femtocrux simulation runner.
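
If the model processes audio in real time, one plausible choice for :code:`input_period` is the time between consecutive frames. The helper below is a hypothetical illustration of that arithmetic, not a Femtodriver function, and it assumes one frame is delivered per period.

.. code-block:: python

  def frame_period_seconds(samples_per_frame, sample_rate_hz):
      """Hypothetical helper: time between frames for real-time audio."""
      return samples_per_frame / sample_rate_hz


  # 128-sample frames at 16 kHz arrive every 8 ms
  input_period = frame_period_seconds(128, 16_000)
  print(input_period)  # 0.008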


Using :code:`execute_runner()` to run inputs through cable-attached dev kit
--------------------------------------------------------------------------------

:code:`fd.execute_runner()` is a more generic method than :code:`fd.simulate()` and it allows us to run inputs through a
cable-attached dev kit and control the process of getting outputs through this software. We will be rolling this feature
out soon for evk2.

A typical invocation would be:

.. code-block:: python

  result, runner = fd.execute_runner(
      requested_runner="hw",
      model_inputs=model_inputs,
      hardware="evk2",
  )

Here :code:`model_inputs` is as described in the section on model inputs above. The :code:`hardware` argument specifies
which EVK is to be controlled, and :code:`zynq_host` is specific to zynq boards. In the future evk2 will be supported
under :code:`hardware` and we will give a full example of how to use it with this feature.

The full signature looks like:

.. code-block:: python

  def execute_runner(
      self,
      requested_runner: str,
      model_inputs: dict[str, np.ndarray],
      noencrypt: bool = False,
      hardware="fakezynq",
      dummy_output_file: str | None = None,
      debug_vars: str | None = None,
      debug_vars_fname: str | None = None,
      zynq_host: str | None = None,
  ) -> tuple[dict, FemtoRunner]:
  """
  Runs an input through a given spu_runner. Runners could be hw, fasmir, fmir, fqir.
  The input could be a fake autogenerated random input or an np.ndarray. Use the helper
  function generate_audio_inputs() to turn wav files into the correct shape ndarray.

  @param: requested_runner: a string runner out of set({"hw", "fasmir", "fmir", "fqir"})
  @param: model_inputs: An ndarray that matches the shape required by the model
  @param: input_period: the input period processing time for a frame in seconds

  @returns: returns a tuple
              element1: result: dictionary which contains the internals activations and outputs of a runner.
              element2: femto_runner: the runner object


  result = {
      "compare_str": "Single Runner",
      "pass": "No Comparisons",
      "internals": internals,
      "outputs": outputs,
  }

  femto_runner of type FXRunner for sim or SPURunner for hw
  """

The result object is the same as the one in simulate and gives the internals and outputs of the model for a given input.

Comparing Software Simulation to cable-attached dev kit
-------------------------------------------------------

You can compare the results of different runners to see if the model running in software simulation produces the same
outputs as the model running on SPU-001 through the EVK2.

The snippet of code from the above example that does this is:

.. code-block:: python

  # Compare different runners to ensure hw and simulator match
  comparison_results = fd.compare(runners_to_compare=["hw", "fasmir"],
                                  model_inputs=model_inputs,
                                  hardware="evk2")

The full signature looks like:

.. code-block:: python

  def compare(
      self,
      model_inputs: dict[str, np.ndarray],
      runners_to_compare: list,
      noencrypt: bool = False,
      hardware="fakezynq",
      dummy_output_file: str | None = None,
      debug_vars: str | None = None,
      debug_vars_fname: str | None = None,
      zynq_host: str | None = None,
  ) -> dict:
      """
      Compare runs a comparison between different FemtoRunners.

      @param runners_to_compare: The runners to compare. These are fqir, fmir, fasmir and hw.
                                  Information on the rest of the arguments is in the run() docstring
      @returns: returns a dictionary of the status of the comparison and any mismatches as well as the results of
                  each runner.
      """

The output looks like:

.. code-block:: python

  {
      'pass': True,
      'status_str': 'Comparison of these runners SUCCEEDED!:\n  hardware : took 20.13 s\n  fasmir : took 63.49 s\n',
      'outputs':
          {'hardware':
              {'%x.606': array(
              [[   0,    0,    0,    0,   -1,   -1,   -1,   -1,   -1,   -1,   -1,
                  ...],
              [ -66,  -20,  101,  129,   19,   17,   84,  109,  135,  154,    0,
                  ...],
              [ -12,  -98,  113,  -74,   97, -102,  -62,  102,   35,   -7,   45,
                  ...],
              [ -23,  -99,  -79,   -3,  136,  -79,  -69, -120,  124,   50,  -78,
                  ...]])},
           'fasmir':
              {'%x.606': array(
              [[   0.,    0.,    0.,    0.,   -1.,   -1.,   -1.,   -1.,   -1.,
                  ...],
              [ -66.,  -20.,  101.,  129.,   19.,   17.,   84.,  109.,  135.,
                  ...],
              [ -12.,  -98.,  113.,  -74.,   97., -102.,  -62.,  102.,   35.,
                  ...],
              [ -23.,  -99.,  -79.,   -3.,  136.,  -79.,  -69., -120.,  124.,
                  ...]])}
          },
      'internals': {'hardware': {}, 'fasmir': {}}
  }
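
To check the per-runner arrays yourself, a small sketch like the one below can walk a dictionary with the same layout as the output above. The helper :code:`runners_agree` and the tiny example dictionary are made up for illustration; only the :code:`outputs` layout (runner name, then variable name, then array) is taken from the output shown.

.. code-block:: python

  import numpy as np


  def runners_agree(comparison_results):
      """Hypothetical check: every runner's outputs equal the first runner's."""
      outputs = comparison_results["outputs"]
      reference_name, *other_names = list(outputs)
      reference = outputs[reference_name]
      for name in other_names:
          for var, expected in reference.items():
              # array_equal treats -66 and -66.0 as equal, so an integer
              # hardware result can match a floating-point simulator result
              if not np.array_equal(expected, outputs[name][var]):
                  return False
      return True


  # Small made-up result with the same layout as the output above
  example = {
      "pass": True,
      "outputs": {
          "hardware": {"%x.606": np.array([[0, -1], [-66, 101]])},
          "fasmir": {"%x.606": np.array([[0.0, -1.0], [-66.0, 101.0]])},
      },
  }
  print(runners_agree(example))  # True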

Running Inputs one at a time through cable-attached dev kit
-----------------------------------------------------------

You may wish to perform per-frame pre- or post-processing on the inputs or outputs of the model at each
iteration. We provide a lower-level mechanism to do this using the :code:`FemtoRunner` API. The recipe for this is:

.. code-block:: python

  from femtodriver import Femtodriver
  import numpy as np

  output_dir = "model_datas"
  # model = "clara2.1L_24b.pt" # We also support FQIR pickles as input.
  model = "clara2.1L_24b.femto"

  with Femtodriver() as fd:
      meta_dir, femtofile, femtofile_size = fd.compile(model, output_dir=output_dir)

      input_arr = np.random.randint(-1024, 1024, size=(1000,))
      model_inputs = fd.generate_audio_inputs(
          input=input_arr, input_sample_indices=[0, 4]
      )

      femto_runner, runner_name = fd.create_runner(requested_runner="hw", hardware="evk2")

      first_var_vals = next(iter(model_inputs.values()))
      n_steps = first_var_vals.shape[0]
      if not all(val.shape[0] == n_steps for val in model_inputs.values()):
          raise ValueError("Input sequence lengths don't match for all variables")

      femto_runner.reset()
      for i in range(n_steps):
          step_inputs = {varname: values[i] for varname, values in model_inputs.items()}

          output_vals, internal_vals = femto_runner.step(step_inputs)
          print(
              f"input # {i}:\n"
              f"  inputs: {step_inputs}\n"
              f"  outputs {output_vals}, internals {internal_vals}"
          )
      femto_runner.finish()

In the above example the model is supplied as a :code:`.femto` file. You can instead pass the FQIR pickle
(:code:`.pt`) of your own model, as in the commented-out line. Additionally, when creating the femto_runner object
you will need to specify the :code:`hardware="evk2"` parameter to use the evk2 cable-attached dev kit.
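
The sketch below shows where per-frame pre- and post-processing could be hooked into such a step loop. To keep it runnable without hardware it uses :code:`EchoRunner`, a hypothetical stand-in that only mimics the :code:`reset()`, :code:`step()` and :code:`finish()` calls used above; the :code:`preprocess` and :code:`postprocess` functions are likewise placeholders.

.. code-block:: python

  import numpy as np


  class EchoRunner:
      """Hypothetical stand-in: returns its inputs as outputs."""

      def reset(self):
          self.steps = 0

      def step(self, step_inputs):
          self.steps += 1
          outputs = {name: vals.copy() for name, vals in step_inputs.items()}
          return outputs, {}  # (outputs, internals)

      def finish(self):
          pass


  def preprocess(frame):
      # Example pre-processing: clip samples to a fixed range
      return np.clip(frame, -512, 512)


  def postprocess(outputs):
      # Example post-processing: reduce each output frame to its peak magnitude
      return {name: int(np.abs(vals).max()) for name, vals in outputs.items()}


  model_inputs = {"inputname": np.array([[100, -900, 30], [300, 20, -5]])}
  runner = EchoRunner()

  runner.reset()
  peaks = []
  for i in range(model_inputs["inputname"].shape[0]):
      step_inputs = {name: preprocess(vals[i]) for name, vals in model_inputs.items()}
      output_vals, internal_vals = runner.step(step_inputs)
      peaks.append(postprocess(output_vals))
  runner.finish()

  print(peaks)  # [{'inputname': 512}, {'inputname': 300}]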

We currently have the following limitations:

* Running in cable-attached mode doesn't give accurate latency measurements because USB communication adds latency. We plan to support accurate latency measurements in the future using a batched send.


List of Femtodriver Calls
-------------------------

For reference, here is a list of the Python API calls, shown without arguments for discoverability.

.. code-block:: python

  fd.compile()
  fd.simulate()
  fd.compare()
  fd.execute_runner()
  fd.generate_inputs()
  fd.generate_program_files()
  fd.write_metadata_to_disk()
  fd.write_metrics_to_disk()
  fd.cleanup_docker_containers()
  fd.get_docker_logs()


The Full list of arguments to run()
-----------------------------------

The full list of arguments to run is shown below.

.. code-block:: text

  Required params:
  model:                          Model to run.

  Optional:
  model_options_file:             .yaml with run options for different models (e.g., compiler options).
                                  Default is femtodriver/femtodriver/models/options.yaml
  output_dir:                     Directory where to write fasmir, fqir, programming images,
                                  programming streams, etc.
  n_frames:                       Number of random sim inputs to drive in.
  input_file:                     File with inputs to drive in. Expects .npy from numpy.save.
                                  Expecting single 2D array of values, indices are (timestep, vector_dim)
  input_file_sample_indices:      lo, hi indices to run from input_file.
  force_femtocrux_compile:        Force femtocrux as the compiler, even if FS internal packages present.
  force_femtocrux_sim:            Force femtocrux as the simulator, even if FS internal packages present.
  hardware:                       Primary runner to use: (options: zynq, fakezynq, redis).
  runners:                        Which runners to execute. If there are multiple, compare each of them
                                  to the first, comma-separated. Options: hw, fasmir, fqir, fmir, fakehw.
  debug_vars:                     Debug variables to collect and compare values for, comma-separated
                                  (no spaces), or 'all'.
  debug_vars_fname:               File with a debug variable name on each line.
  debug:                          Set debug log level.
  noencrypt:                      Don't encrypt programming files.
  input_period:                   Simulator input period for energy estimation. No impact on runtime.
                                  Floating point seconds.
  dummy_output_file:              For fakezynq, the values that the runner should reply with.
                                  Specify a .npy for a single variable.

Misc
-----

Note that many :code:`femtodriver` options pertain to running an attached SPU-001 directly. As of 9/24, an EVK has not been made available that allows external use of these features.