Instrumentation

Instrumentation is a prototype feature in onnx-mlir that can be used to debug runtime issues.

Compile for instrumentation

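By default, instrumentation is turned off. Use the command-line options below to turn it on.

The --instrument-stage option selects the stage at which the instrumentation pass is inserted. For example, when you specify Onnx, instrumentation is inserted after the onnx-to-onnx conversion, which gives onnx-level profiling.

The --instrument-ops option specifies the operations to be instrumented. For example, use onnx.Conv for the onnx Conv operation. You can also use an asterisk, as in onnx.* for all onnx operations, and a comma to combine two expressions, as in onnx.Conv,onnx.Add for both Conv and Add operations.

The --InstrumentBeforeOp and --InstrumentAfterOp options insert instrumentation before and/or after the specified operations. With --instrument-ops=onnx.* --InstrumentBeforeOp --InstrumentAfterOp, instrumentation is inserted before and after every onnx operation.

For NNPA, the additional stages ZHigh and ZLow are provided. Use --instrument-stage=ZHigh with --instrument-ops=onnx.*,zhigh.* to profile onnx and zhigh ops, and --instrument-stage=ZLow with --instrument-ops=zlow.* to profile zlow ops.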

  --instrument-stage=<value>                        - Specify stage to be instrumented:
    =Onnx                                             -   Profile for onnx ops. For NNPA, profile onnx ops before lowering to zhigh.
    =ZHigh                                            -   NNPA profiling for onnx and zhigh ops.
    =ZLow                                             -   NNPA profiling for zlow ops.

  --instrument-ops=<string>                         - Specify operations to be instrumented:
                                                      "NONE" or "" for no instrument,
                                                      "ops1,ops2, ..." for the multiple ops.
                                                      e.g. "onnx.Conv,onnx.Add" for Conv and Add ops.
                                                      Asterisk is also available.
                                                      e.g. "onnx.*" for all onnx operations.

  Specify what instrumentation actions at runtime:
      --InstrumentBeforeOp                          - insert instrument before op,
      --InstrumentAfterOp                           - insert instrument after op,
      --InstrumentReportTime                        - instrument runtime reports time usage,
      --InstrumentReportMemory                      - instrument runtime reports memory usage.
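
The options above can be assembled programmatically. The following is a minimal sketch, not part of onnx-mlir: the helper name build_instrument_command and its parameters are hypothetical, and only the flags listed above are used. Building the command as an argument list also avoids a shell expanding the asterisk in patterns such as onnx.*.

```python
from typing import List, Sequence

# Stages documented above; ZHigh and ZLow apply to NNPA only.
VALID_STAGES = ("Onnx", "ZHigh", "ZLow")


def build_instrument_command(
    model: str,
    stage: str = "Onnx",
    ops: Sequence[str] = ("onnx.*",),
    before: bool = True,
    after: bool = True,
    report_time: bool = True,
    report_memory: bool = False,
    extra_args: Sequence[str] = (),
) -> List[str]:
    """Return an onnx-mlir argument list (hypothetical helper)."""
    if stage not in VALID_STAGES:
        raise ValueError(f"unknown instrument stage: {stage!r}")
    cmd = ["onnx-mlir", *extra_args]
    cmd.append(f"--instrument-stage={stage}")
    # Multiple op patterns are joined with commas, e.g. onnx.Conv,onnx.Add.
    cmd.append(f"--instrument-ops={','.join(ops)}")
    if before:
        cmd.append("--InstrumentBeforeOp")
    if after:
        cmd.append("--InstrumentAfterOp")
    if report_time:
        cmd.append("--InstrumentReportTime")
    if report_memory:
        cmd.append("--InstrumentReportMemory")
    cmd.append(model)
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_instrument_command("mymodel.onnx")))
```

The resulting list can be passed to subprocess.run as is. When typing the command in a shell by hand, quoting the pattern (for example "--instrument-ops=onnx.*") is a reasonable precaution.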

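Currently, the call to the initialization function OMInstrumentInit must be added before the dynamic library is loaded. Having the compiler add this call at the beginning of main_graph is under consideration.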

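Run with instrumentation

Run the model in the usual way. The instrumentation library prints the time and memory usage at each instrumentation point. For example, suppose a model, mymodel.onnx, is compiled with the command onnx-mlir --instrument-stage=Onnx --instrument-ops=onnx.* --InstrumentAfterOp --InstrumentReportMemory --InstrumentReportTime mymodel.onnx. Its runtime output looks like this: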

==PERF-REPORT==, onnx.Cast, bert/encoder/Reshape__27, before, 0.000001, 1692654182.738546
==PERF-REPORT==, onnx.Cast, bert/encoder/Reshape__27, after, 0.000001, 1692654182.738547
==PERF-REPORT==, onnx.Concat, bert/encoder/Reshape__27, before, 0.000000, 1692654182.738547
==PERF-REPORT==, onnx.Concat, bert/encoder/Reshape__27, after, 0.000001, 1692654182.738548
==PERF-REPORT==, onnx.Reshape, bert/encoder/Reshape, before, 0.000001, 1692654182.738549
==PERF-REPORT==, onnx.Reshape, bert/encoder/Reshape, after, 0.000001, 1692654182.738550

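The columns of the time measurement output are as follows.

The first column is a string that identifies the kind of performance information being collected, here PERF-REPORT.
The second column is the name of the op.
The third column is the node name of the op. It is shown when the op has an onnx_node_name attribute.
The fourth column indicates whether the time is reported before or after the onnx op being profiled.
The fifth column indicates the time elapsed since the previous instrumentation point.
The sixth column indicates the accumulated time since instrumentationInit, in seconds.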
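
The column layout above lends itself to simple post-processing. The sketch below is an illustration only, with hypothetical names (PerfRecord, parse_perf_report, time_per_op). It splits each line on commas, so it assumes node names contain no commas, and it sums the fifth column of the "after" records per op, which approximates time spent in each op only when instrumentation is inserted both before and after the op.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class PerfRecord(NamedTuple):
    op: str             # second column: op name
    node: str           # third column: node name (onnx_node_name)
    position: str       # fourth column: "before" or "after"
    elapsed: float      # fifth column: time since the previous point
    accumulated: float  # sixth column, as printed by the runtime


def parse_perf_report(lines: Iterable[str]) -> List[PerfRecord]:
    """Parse ==PERF-REPORT== lines; other lines are ignored."""
    records = []
    for line in lines:
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 6 or fields[0] != "==PERF-REPORT==":
            continue
        records.append(
            PerfRecord(fields[1], fields[2], fields[3],
                       float(fields[4]), float(fields[5]))
        )
    return records


def time_per_op(records: Iterable[PerfRecord]) -> Dict[str, float]:
    """Sum the elapsed time of 'after' records, grouped by op name."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        if record.position == "after":
            totals[record.op] += record.elapsed
    return dict(totals)


# Lines copied from the sample output shown earlier in this section.
SAMPLE = """\
==PERF-REPORT==, onnx.Cast, bert/encoder/Reshape__27, before, 0.000001, 1692654182.738546
==PERF-REPORT==, onnx.Cast, bert/encoder/Reshape__27, after, 0.000001, 1692654182.738547
==PERF-REPORT==, onnx.Concat, bert/encoder/Reshape__27, before, 0.000000, 1692654182.738547
==PERF-REPORT==, onnx.Concat, bert/encoder/Reshape__27, after, 0.000001, 1692654182.738548
==PERF-REPORT==, onnx.Reshape, bert/encoder/Reshape, before, 0.000001, 1692654182.738549
==PERF-REPORT==, onnx.Reshape, bert/encoder/Reshape, after, 0.000001, 1692654182.738550
"""

if __name__ == "__main__":
    for op, seconds in time_per_op(parse_perf_report(SAMPLE.splitlines())).items():
        print(f"{op}: {seconds:.6f}")
```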

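The columns of the memory measurement output are as follows.

The first column is a string that identifies the kind of performance information being collected, here MEM-REPORT.
The second and third columns are defined as above.
The fourth column indicates VMem, the virtual memory size (in KB) used by this process.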
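
Memory reports can be post-processed in the same spirit. The sketch below rests on assumptions that this page does not confirm: it supposes that a memory line starts with the marker ==MEM-REPORT== (by analogy with the ==PERF-REPORT== marker in the time output) and is comma-separated with VMem in the fourth column, as described above. The names MemRecord, parse_mem_report, and peak_vmem_kb are hypothetical; check the actual output of your build before relying on it.

```python
from typing import Iterable, List, NamedTuple, Optional


class MemRecord(NamedTuple):
    op: str       # second column: op name
    node: str     # third column: node name
    vmem_kb: int  # fourth column: virtual memory size in KB


def parse_mem_report(lines: Iterable[str]) -> List[MemRecord]:
    """Parse memory report lines; lines that do not fit are skipped.

    Assumption: the marker is '==MEM-REPORT==' and fields are
    comma-separated, mirroring the time report format.
    """
    records = []
    for line in lines:
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 4 or fields[0] != "==MEM-REPORT==":
            continue
        try:
            vmem_kb = int(fields[3])
        except ValueError:
            # The fourth column is not an integer: layout differs from the assumption.
            continue
        records.append(MemRecord(fields[1], fields[2], vmem_kb))
    return records


def peak_vmem_kb(records: Iterable[MemRecord]) -> Optional[int]:
    """Return the largest VMem value seen, or None if there are no records."""
    values = [record.vmem_kb for record in records]
    return max(values) if values else None
```

Skipping lines that do not match keeps the parser harmless if the real layout differs from the assumed one; in that case it simply returns no records.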

Other examples for NNPA

Profile onnx ops before lowering to zhigh ops: onnx-mlir --march=z16 --maccel=NNPA --instrument-stage=Onnx --instrument-ops=onnx.* --InstrumentBeforeOp --InstrumentAfterOp --InstrumentReportTime mymodel.onnx
Profile onnx and zhigh ops: onnx-mlir --march=z16 --maccel=NNPA --instrument-stage=ZHigh --instrument-ops=onnx.*,zhigh.* --InstrumentBeforeOp --InstrumentAfterOp --InstrumentReportTime mymodel.onnx
Profile zlow ops: onnx-mlir --march=z16 --maccel=NNPA --instrument-stage=ZLow --instrument-ops=zlow.* --InstrumentBeforeOp --InstrumentAfterOp --InstrumentReportTime mymodel.onnx

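Control instrumentation at runtime

The report output of the instrumentation library can be disabled at runtime by setting certain environment variables.

If the environment variable ONNX_MLIR_NO_INSTRUMENT is set, no report is output at all.
If the environment variable ONNX_MLIR_NO_INSTRUMENT_TIME is set, the time usage report is disabled.
If the environment variable ONNX_MLIR_NO_INSTRUMENT_MEMORY is set, the memory usage report is disabled.
If the environment variable ONNX_MLIR_INSTRUMENT_FILE is set, its value is the name of the file in which the instrumentation data is saved.

Note that the only way to enable instrumentation is to request it at compile time. If no detailed report (such as the time and memory reports described so far) is enabled at runtime, the progress of the instrumentation points is still printed. This behavior is considered useful as a progress indicator. To fully disable any output requested at compile time, you must set ONNX_MLIR_NO_INSTRUMENT.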
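
When the compiled model is launched from a script, these variables can be prepared in one place. The following is a minimal sketch with a hypothetical helper, instrument_env. The text above only says the variables need to be set, so the value "1" used for the switches is an assumption, and the executable name in the usage comment is a placeholder.

```python
import os
from typing import Dict, Mapping, Optional

# Environment variables documented above.
_SWITCHES = (
    "ONNX_MLIR_NO_INSTRUMENT",
    "ONNX_MLIR_NO_INSTRUMENT_TIME",
    "ONNX_MLIR_NO_INSTRUMENT_MEMORY",
    "ONNX_MLIR_INSTRUMENT_FILE",
)


def instrument_env(
    enabled: bool = True,
    report_time: bool = True,
    report_memory: bool = True,
    output_file: Optional[str] = None,
    base_env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return an environment dict for running an instrumented model."""
    env = dict(os.environ if base_env is None else base_env)
    # Drop inherited settings so the result reflects only the arguments.
    for name in _SWITCHES:
        env.pop(name, None)
    if not enabled:
        env["ONNX_MLIR_NO_INSTRUMENT"] = "1"  # assumed value; any set value may do
    if not report_time:
        env["ONNX_MLIR_NO_INSTRUMENT_TIME"] = "1"
    if not report_memory:
        env["ONNX_MLIR_NO_INSTRUMENT_MEMORY"] = "1"
    if output_file is not None:
        env["ONNX_MLIR_INSTRUMENT_FILE"] = output_file
    return env


# Usage sketch (the executable name is a placeholder, not from this page):
#   import subprocess
#   env = instrument_env(report_memory=False, output_file="profile.txt")
#   subprocess.run(["./run_mymodel"], env=env, check=True)
```

Building a fresh dict rather than mutating os.environ keeps the settings local to the child process.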

Use in gdb

The function for an instrumentation point is named OMInstrumentPoint. A breakpoint can be set inside this function to step through onnx ops one at a time.
