Probabilities as a vector or as a ZipMap¶
A classifier usually returns a matrix of probabilities. By default, sklearn-onnx converts that matrix into a list of dictionaries where each probability is mapped to its class id or name. That mechanism keeps the class names, but the conversion increases the prediction time and is not always needed. Let's see how to deactivate this behaviour on the Iris example.
Train a model and convert it¶
from timeit import repeat
import numpy
import sklearn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import onnxruntime as rt
import onnx
import skl2onnx
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx import convert_sklearn
from sklearn.linear_model import LogisticRegression
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y)
clr = LogisticRegression(max_iter=500)
clr.fit(X_train, y_train)
print(clr)
initial_type = [("float_input", FloatTensorType([None, 4]))]
onx = convert_sklearn(clr, initial_types=initial_type, target_opset=12)
LogisticRegression(max_iter=500)
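For comparison, scikit-learn itself returns the probabilities as a plain NumPy array. The short sketch below (not part of the original example) prints it so the difference with the default ONNX output shown in the next section is easy to see.
# scikit-learn returns a (n_samples, n_classes) NumPy array.
clr_proba = clr.predict_proba(X_test[:2])
print(clr_proba)
print("scikit-learn probabilities type:", type(clr_proba))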
Output type¶
Let's confirm with onnxruntime that the output type of the probabilities is a list of dictionaries.
sess = rt.InferenceSession(onx.SerializeToString(), providers=["CPUExecutionProvider"])
res = sess.run(None, {"float_input": X_test.astype(numpy.float32)})
print(res[1][:2])
print("probabilities type:", type(res[1]))
print("type for the first observations:", type(res[1][0]))
[{0: 0.07156547158956528, 1: 0.9160813689231873, 2: 0.012353206984698772}, {0: 8.799823262961581e-05, 1: 0.10934431850910187, 2: 0.8905677199363708}]
probabilities type: <class 'list'>
type for the first observations: <class 'dict'>
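If a plain matrix is needed from this default output, one possible way (a minimal sketch, assuming the dictionary keys are the class ids 0..n_classes-1) is to rebuild it in Python:
# Rebuild a (n_samples, n_classes) array from the list of dictionaries.
proba_matrix = numpy.array([[row[k] for k in sorted(row)] for row in res[1]])
print(proba_matrix[:2])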
Without ZipMap¶
Let's remove the ZipMap operator.
initial_type = [("float_input", FloatTensorType([None, 4]))]
options = {id(clr): {"zipmap": False}}
onx2 = convert_sklearn(
clr, initial_types=initial_type, options=options, target_opset=12
)
sess2 = rt.InferenceSession(
onx2.SerializeToString(), providers=["CPUExecutionProvider"]
)
res2 = sess2.run(None, {"float_input": X_test.astype(numpy.float32)})
print(res2[1][:2])
print("probabilities type:", type(res2[1]))
print("type for the first observations:", type(res2[1][0]))
[[7.1565472e-02 9.1608137e-01 1.2353207e-02]
[8.7998233e-05 1.0934432e-01 8.9056772e-01]]
probabilities type: <class 'numpy.ndarray'>
type for the first observations: <class 'numpy.ndarray'>
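As a quick sanity check (a sketch, with tolerances chosen for float32 inputs), the probabilities computed by onnxruntime can be compared to scikit-learn's predict_proba:
# The ONNX graph runs in float32, so allow a small tolerance.
numpy.testing.assert_allclose(
    clr.predict_proba(X_test), res2[1], rtol=1e-3, atol=1e-4
)
print("ONNX probabilities match predict_proba within float32 tolerance.")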
One output per class¶
This option removes the final ZipMap operator and splits the probabilities into columns. The final model produces one output for the label and one output per class.
options = {id(clr): {"zipmap": "columns"}}
onx3 = convert_sklearn(
clr, initial_types=initial_type, options=options, target_opset=12
)
sess3 = rt.InferenceSession(
onx3.SerializeToString(), providers=["CPUExecutionProvider"]
)
res3 = sess3.run(None, {"float_input": X_test.astype(numpy.float32)})
for i, out in enumerate(sess3.get_outputs()):
print(
"output: '{}' shape={} values={}...".format(
out.name, res3[i].shape, res3[i][:2]
)
)
output: 'output_label' shape=(38,) values=[1 2]...
output: 'i0' shape=(38,) values=[7.156547e-02 8.799823e-05]...
output: 'i1' shape=(38,) values=[0.91608137 0.10934432]...
output: 'i2' shape=(38,) values=[0.01235321 0.8905677 ]...
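The per-class columns can be stacked back into a single matrix if needed. The sketch below uses numpy.column_stack on every output except the label:
# Stack the per-class outputs (everything after the label) into one matrix.
stacked = numpy.column_stack(res3[1:])
print(stacked[:2])  # same values as the matrix returned without ZipMap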
Let's compare prediction time¶
X32 = X_test.astype(numpy.float32)
print("Time with ZipMap:")
print(repeat(lambda: sess.run(None, {"float_input": X32}), number=100, repeat=10))
print("Time without ZipMap:")
print(repeat(lambda: sess2.run(None, {"float_input": X32}), number=100, repeat=10))
print("Time without ZipMap but with columns:")
print(repeat(lambda: sess3.run(None, {"float_input": X32}), number=100, repeat=10))
# The prediction is much faster without ZipMap
# on this example.
# The gain is even bigger when the classes are
# described with strings instead of integers,
# as the final result (a list of dictionaries) may
# copy the same strings many times with onnxruntime.
Time with ZipMap:
[0.010032097000021167, 0.01245842299999822, 0.009457742999984475, 0.008190661999833537, 0.008566311000095084, 0.008487106000075073, 0.008385362000126406, 0.008503606999965996, 0.008555368000088492, 0.00829078199990363]
Time without ZipMap:
[0.004322446000060154, 0.004467199000146138, 0.00676410399978522, 0.006740812000089136, 0.005239173000063602, 0.00527031799992983, 0.004541967000022851, 0.004477042999951664, 0.004469333999850278, 0.004470136999998431]
Time without ZipMap but with columns:
[0.008028657000068051, 0.008175955000069735, 0.008919250999952055, 0.010066999999935433, 0.009950227999979688, 0.009849316000099861, 0.00963781900009053, 0.009465580000096452, 0.00968110799999522, 0.008902125999838972]
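The comment above mentions that the gap widens when classes are strings. A minimal sketch (retraining on the Iris class names, an assumption not made in the original example) shows that the ZipMap output then carries the class names as dictionary keys, which is the information onnxruntime has to copy for every row:
# Train the same model on string labels instead of integer ids.
clr_str = LogisticRegression(max_iter=500)
clr_str.fit(X_train, iris.target_names[y_train])
onx_str = convert_sklearn(clr_str, initial_types=initial_type, target_opset=12)
sess_str = rt.InferenceSession(
    onx_str.SerializeToString(), providers=["CPUExecutionProvider"]
)
res_str = sess_str.run(None, {"float_input": X32})
print(res_str[1][:1])  # dictionaries keyed by class name strings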
Versions used for this example
print("numpy:", numpy.__version__)
print("scikit-learn:", sklearn.__version__)
print("onnx: ", onnx.__version__)
print("onnxruntime: ", rt.__version__)
print("skl2onnx: ", skl2onnx.__version__)
numpy: 2.3.1
scikit-learn: 1.6.1
onnx: 1.19.0
onnxruntime: 1.23.0
skl2onnx: 1.19.1
Total running time of the script: (0 minutes 0.368 seconds)