Convert a pipeline with a LightGBM classifier
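sklearn-onnx only converts scikit-learn models into ONNX, but many libraries implement the scikit-learn API so that their models can be included in a scikit-learn pipeline. This example considers a pipeline that includes a LightGBM model. sklearn-onnx can convert the whole pipeline as long as it knows the converter associated with LGBMClassifier. Let's see how to do it.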
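The idea behind "knowing the converter" can be illustrated with a toy registry. Everything in this sketch (`ToyRegistry`, `Scaler`, `Booster`, the node names) is hypothetical and is not how skl2onnx is implemented internally; it only shows why one unregistered step type blocks conversion of the whole pipeline, while registering it lets every step through.

```python
from typing import Callable, Dict, List

Converter = Callable[[object], str]


class ToyRegistry:
    """Hypothetical registry mapping a model class to a converter function."""

    def __init__(self) -> None:
        self._converters: Dict[type, Converter] = {}

    def register(self, model_type: type, converter: Converter) -> None:
        self._converters[model_type] = converter

    def missing(self, steps: List[object]) -> List[str]:
        # Names of the step types that have no registered converter.
        return [type(s).__name__ for s in steps if type(s) not in self._converters]

    def convert_pipeline(self, steps: List[object]) -> List[str]:
        # A single unknown step is enough to make the whole conversion fail.
        unknown = self.missing(steps)
        if unknown:
            raise RuntimeError(f"no converter registered for {unknown}")
        return [self._converters[type(s)](s) for s in steps]


class Scaler:  # hypothetical stand-in for StandardScaler
    pass


class Booster:  # hypothetical stand-in for LGBMClassifier
    pass


registry = ToyRegistry()
registry.register(Scaler, lambda model: "ScalerNode")
steps = [Scaler(), Booster()]
print("missing before registration:", registry.missing(steps))
registry.register(Booster, lambda model: "TreeEnsembleClassifierNode")
print("converted:", registry.convert_pipeline(steps))
```

In this example the real counterpart of `register` is `update_registered_converter`, which additionally takes a shape calculator and the supported options.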
Train a LightGBM classifier
import onnxruntime as rt
from skl2onnx import convert_sklearn, update_registered_converter
from skl2onnx.common.shape_calculator import (
calculate_linear_classifier_output_shapes,
)
from onnxmltools.convert.lightgbm.operator_converters.LightGbm import (
convert_lightgbm,
)
from skl2onnx.common.data_types import FloatTensorType
import numpy
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from lightgbm import LGBMClassifier
data = load_iris()
X = data.data[:, :2]  # keep only the first two features
y = data.target
# Shuffle the rows, applying the same permutation to X and y.
ind = numpy.arange(X.shape[0])
numpy.random.shuffle(ind)
X = X[ind, :].copy()
y = y[ind].copy()
pipe = Pipeline(
[("scaler", StandardScaler()), ("lgbm", LGBMClassifier(n_estimators=3))]
)
pipe.fit(X, y)
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000041 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 47
[LightGBM] [Info] Number of data points in the train set: 150, number of used features: 2
[LightGBM] [Info] Start training from score -1.098612
[LightGBM] [Info] Start training from score -1.098612
[LightGBM] [Info] Start training from score -1.098612
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
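The three `Start training from score -1.098612` lines are consistent with LightGBM's multiclass objective starting each class from the log of its prior frequency: iris has 150 samples split evenly over 3 classes, so each class starts from ln(50/150) = ln(1/3). The nine warnings also plausibly correspond to the 3 estimators times 3 classes, since multiclass boosting grows one tree per class per iteration. A quick check of that arithmetic, assuming the balanced 50/50/50 class counts of the iris dataset:

```python
import math
from collections import Counter

# Assumption: iris has 50 samples of each of its 3 classes (150 in total).
labels = [0] * 50 + [1] * 50 + [2] * 50
counts = Counter(labels)
n = len(labels)

# Initial raw score per class: log of the class frequency.
init_scores = {cls: math.log(c / n) for cls, c in sorted(counts.items())}
for cls, score in init_scores.items():
    print(f"class {cls}: {score:.6f}")

# One tree per class per boosting iteration.
n_estimators, n_classes = 3, len(counts)
print("trees grown:", n_estimators * n_classes)
```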
Register the converter for LGBMClassifier
The converter is implemented in onnxmltools: onnxmltools…LightGbm.py. The shape calculator is in onnxmltools…Classifier.py.
update_registered_converter(
LGBMClassifier,
"LightGbmLGBMClassifier",
calculate_linear_classifier_output_shapes,
convert_lightgbm,
options={"nocl": [True, False], "zipmap": [True, False, "columns"]},
)
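The `zipmap` option registered above controls the layout of the probability output. With ZipMap enabled, which is what the onnxruntime output at the end of this example shows, probabilities come back as one `{class: probability}` dictionary per row instead of a 2-D array. The plain-Python sketch below only mimics that layout; `zipmap_like` is a hypothetical helper, not the ONNX ZipMap operator and not part of skl2onnx.

```python
from typing import Dict, List, Sequence


def zipmap_like(probabilities: Sequence[Sequence[float]],
                classes: Sequence[int]) -> List[Dict[int, float]]:
    """Hypothetical helper: turn an N x C probability matrix into N dicts."""
    rows = []
    for row in probabilities:
        if len(row) != len(classes):
            raise ValueError("each row needs one probability per class")
        rows.append(dict(zip(classes, row)))
    return rows


# First row of the LightGBM predict_proba output printed in this example.
proba = [[0.51995794, 0.24549283, 0.23454923]]
print(zipmap_like(proba, classes=[0, 1, 2]))
```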
Convert again
model_onnx = convert_sklearn(
pipe,
"pipeline_lightgbm",
[("input", FloatTensorType([None, 2]))],
target_opset={"": 12, "ai.onnx.ml": 2},
)
# And save.
with open("pipeline_lightgbm.onnx", "wb") as f:
    f.write(model_onnx.SerializeToString())
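The input was declared as `FloatTensorType([None, 2])`: a float32 tensor with any number of rows (`None`) and exactly two columns. `load_iris` returns float64 data, which is why the onnxruntime call below casts `X` with `.astype(numpy.float32)`. The sketch shows that check and cast with NumPy; `to_onnx_input` is a hypothetical helper, not part of skl2onnx or onnxruntime.

```python
import numpy


def to_onnx_input(X, n_features: int = 2) -> numpy.ndarray:
    """Hypothetical helper: check the column count and cast to float32."""
    X = numpy.asarray(X)
    if X.ndim != 2 or X.shape[1] != n_features:
        raise ValueError(f"expected shape (n, {n_features}), got {X.shape}")
    # The declared input type is 32-bit float; NumPy defaults to float64.
    return X.astype(numpy.float32)


X64 = numpy.array([[5.1, 3.5], [4.9, 3.0], [6.3, 3.3]])  # float64, like load_iris
X32 = to_onnx_input(X64)
print(X64.dtype, "->", X32.dtype, X32.shape)
```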
Compare the predictions
Predictions with LightGBM.
print("predict", pipe.predict(X[:5]))
print("predict_proba", pipe.predict_proba(X[:1]))
/home/xadupre/vv/this312/lib/python3.12/site-packages/sklearn/utils/validation.py:2735: UserWarning: X does not have valid feature names, but LGBMClassifier was fitted with feature names
warnings.warn(
predict [0 2 2 2 1]
/home/xadupre/vv/this312/lib/python3.12/site-packages/sklearn/utils/validation.py:2735: UserWarning: X does not have valid feature names, but LGBMClassifier was fitted with feature names
warnings.warn(
predict_proba [[0.51995794 0.24549283 0.23454923]]
Predictions with onnxruntime.
sess = rt.InferenceSession("pipeline_lightgbm.onnx", providers=["CPUExecutionProvider"])
pred_onx = sess.run(None, {"input": X[:5].astype(numpy.float32)})
print("predict", pred_onx[0])
print("predict_proba", pred_onx[1][:1])
predict [0 2 2 2 1]
predict_proba [{0: 0.519957959651947, 1: 0.2454928159713745, 2: 0.23454922437667847}]
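The labels match and the probabilities agree to about seven decimal places; the small differences most likely come from the ONNX graph producing float32 outputs. One way to check this programmatically is to convert the ZipMap output back into an array and compare it with `numpy.allclose`. The sketch uses the values printed above instead of rerunning the models, and the tolerance `atol=1e-5` is an arbitrary choice.

```python
import numpy

# Values copied from the printed outputs above.
lgbm_proba = numpy.array([[0.51995794, 0.24549283, 0.23454923]])
onnx_proba = [{0: 0.519957959651947, 1: 0.2454928159713745, 2: 0.23454922437667847}]
lgbm_labels = numpy.array([0, 2, 2, 2, 1])
onnx_labels = numpy.array([0, 2, 2, 2, 1])

# Undo the ZipMap layout: one row per sample, columns in class order.
classes = sorted(onnx_proba[0])
onnx_array = numpy.array([[row[c] for c in classes] for row in onnx_proba])

print("labels equal:", numpy.array_equal(lgbm_labels, onnx_labels))
print("max abs diff:", numpy.abs(lgbm_proba - onnx_array).max())
print("close:", numpy.allclose(lgbm_proba, onnx_array, atol=1e-5))
```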
Total running time of the script: (0 minutes 0.038 seconds)