# Tests for basic XGBoost Booster training, prediction, serialization and slicing.
import json
from pathlib import Path
import numpy as np
import pytest
import xgboost as xgb
from xgboost import testing as tm
from xgboost.core import Integer
from xgboost.testing.basic_models import run_custom_objective
from xgboost.testing.updater import get_basescore
class TestModels:
def test_glm(self):
param = {
"objective": "binary:logistic",
"booster": "gblinear",
"alpha": 0.0001,
"lambda": 1,
"nthread": 1,
}
dtrain, dtest = tm.load_agaricus(__file__)
watchlist = [(dtest, "eval"), (dtrain, "train")]
2016-04-24 16:34:46 +09:00
num_round = 4
bst = xgb.train(param, dtrain, num_round, watchlist)
assert isinstance(bst, xgb.core.Booster)
preds = bst.predict(dtest)
labels = dtest.get_label()
err = sum(
1 for i in range(len(preds)) if int(preds[i] > 0.5) != labels[i]
) / float(len(preds))
assert err < 0.2
2016-04-24 16:34:46 +09:00
def test_dart(self, tmp_path: Path) -> None:
dtrain, dtest = tm.load_agaricus(__file__)
param = {
"max_depth": 5,
"objective": "binary:logistic",
"eval_metric": "logloss",
"booster": "dart",
"verbosity": 1,
}
2016-06-09 06:04:01 +09:00
# specify validations set to watch performance
watchlist = [(dtest, "eval"), (dtrain, "train")]
2016-06-09 06:04:01 +09:00
num_round = 2
bst = xgb.train(param, dtrain, num_round, watchlist)
# this is prediction
preds = bst.predict(dtest, iteration_range=(0, num_round))
2016-06-09 06:04:01 +09:00
labels = dtest.get_label()
err = sum(
1 for i in range(len(preds)) if int(preds[i] > 0.5) != labels[i]
) / float(len(preds))
2016-06-09 06:04:01 +09:00
# error must be smaller than 10%
assert err < 0.1
dtest_path = tmp_path / "dtest.dmatrix"
model_path = tmp_path / "xgboost.model.dart.ubj"
# save dmatrix into binary buffer
dtest.save_binary(dtest_path)
# save model
bst.save_model(model_path)
# load model and data in
bst2 = xgb.Booster(params=param, model_file=model_path)
dtest2 = xgb.DMatrix(dtest_path)
preds2 = bst2.predict(dtest2, iteration_range=(0, num_round))
2016-06-09 06:04:01 +09:00
# assert they are the same
assert np.sum(np.abs(preds2 - preds)) == 0
def my_logloss(preds, dtrain):
labels = dtrain.get_label()
return "logloss", np.sum(np.log(np.where(labels, preds, 1 - preds)))
# check whether custom evaluation metrics work
bst = xgb.train(
param, dtrain, num_round, evals=watchlist, custom_metric=my_logloss
)
preds3 = bst.predict(dtest, iteration_range=(0, num_round))
assert all(preds3 == preds)
2016-06-09 06:04:01 +09:00
# check whether sample_type and normalize_type work
num_round = 50
param["learning_rate"] = 0.1
param["rate_drop"] = 0.1
2016-06-09 06:04:01 +09:00
preds_list = []
for p in [
[p0, p1] for p0 in ["uniform", "weighted"] for p1 in ["tree", "forest"]
]:
param["sample_type"] = p[0]
param["normalize_type"] = p[1]
bst = xgb.train(param, dtrain, num_round, evals=watchlist)
preds = bst.predict(dtest, iteration_range=(0, num_round))
err = sum(
1 for i in range(len(preds)) if int(preds[i] > 0.5) != labels[i]
) / float(len(preds))
2016-06-09 06:04:01 +09:00
assert err < 0.1
preds_list.append(preds)
for ii in range(len(preds_list)):
for jj in range(ii + 1, len(preds_list)):
assert np.sum(np.abs(preds_list[ii] - preds_list[jj])) > 0
def test_boost_from_prediction(self):
# Re-construct dtrain here to avoid modification
margined, _ = tm.load_agaricus(__file__)
bst = xgb.train({"tree_method": "hist"}, margined, 1)
predt_0 = bst.predict(margined, output_margin=True)
margined.set_base_margin(predt_0)
bst = xgb.train({"tree_method": "hist"}, margined, 1)
predt_1 = bst.predict(margined)
assert np.any(np.abs(predt_1 - predt_0) > 1e-6)
dtrain, _ = tm.load_agaricus(__file__)
bst = xgb.train({"tree_method": "hist"}, dtrain, 2)
predt_2 = bst.predict(dtrain)
assert np.all(np.abs(predt_2 - predt_1) < 1e-6)
def test_boost_from_existing_model(self) -> None:
X, _ = tm.load_agaricus(__file__)
booster = xgb.train({"tree_method": "hist"}, X, num_boost_round=4)
assert booster.num_boosted_rounds() == 4
booster.set_param({"tree_method": "approx"})
assert booster.num_boosted_rounds() == 4
booster = xgb.train(
{"tree_method": "hist"}, X, num_boost_round=4, xgb_model=booster
)
assert booster.num_boosted_rounds() == 8
with pytest.warns(UserWarning, match="`updater`"):
booster = xgb.train(
{"updater": "prune", "process_type": "update"},
X,
num_boost_round=4,
xgb_model=booster,
)
# Trees are moved for update, the rounds is reduced. This test is
# written for being compatible with current code (1.0.0). If the
# behaviour is considered sub-optimal, feel free to change.
assert booster.num_boosted_rounds() == 4
booster = xgb.train({"booster": "gblinear"}, X, num_boost_round=4)
assert booster.num_boosted_rounds() == 4
booster.set_param({"updater": "coord_descent"})
assert booster.num_boosted_rounds() == 4
booster.set_param({"updater": "shotgun"})
assert booster.num_boosted_rounds() == 4
booster = xgb.train(
{"booster": "gblinear"}, X, num_boost_round=4, xgb_model=booster
)
assert booster.num_boosted_rounds() == 8
def test_custom_objective(self) -> None:
dtrain, dtest = tm.load_agaricus(__file__)
run_custom_objective("hist", "cpu", dtrain, dtest)
def test_multi_eval_metric(self) -> None:
dtrain, dtest = tm.load_agaricus(__file__)
watchlist = [(dtest, "eval"), (dtrain, "train")]
param = {
"max_depth": 2,
"eta": 0.2,
"verbosity": 1,
"objective": "binary:logistic",
}
param["eval_metric"] = ["auc", "logloss", "error"]
evals_result = {}
bst = xgb.train(param, dtrain, 4, evals=watchlist, evals_result=evals_result)
assert isinstance(bst, xgb.core.Booster)
assert len(evals_result["eval"]) == 3
assert set(evals_result["eval"].keys()) == {"auc", "error", "logloss"}
2016-04-24 16:34:46 +09:00
def test_fpreproc(self):
param = {"max_depth": 2, "eta": 1, "objective": "binary:logistic"}
2016-04-24 16:34:46 +09:00
num_round = 2
def fpreproc(dtrain, dtest, param):
label = dtrain.get_label()
ratio = float(np.sum(label == 0)) / np.sum(label == 1)
param["scale_pos_weight"] = ratio
2016-04-24 16:34:46 +09:00
return (dtrain, dtest, param)
dtrain, _ = tm.load_agaricus(__file__)
xgb.cv(
param,
dtrain,
num_round,
nfold=5,
metrics={"auc"},
seed=0,
fpreproc=fpreproc,
)
2016-04-24 16:34:46 +09:00
def test_show_stdv(self):
param = {"max_depth": 2, "eta": 1, "objective": "binary:logistic"}
2016-04-24 16:34:46 +09:00
num_round = 2
dtrain, _ = tm.load_agaricus(__file__)
xgb.cv(
param,
dtrain,
num_round,
nfold=5,
metrics={"error"},
seed=0,
show_stdv=False,
)
def test_prediction_cache(self, tmp_path: Path) -> None:
X, y = tm.make_sparse_regression(512, 4, 0.5, as_dense=False)
Xy = xgb.DMatrix(X, y)
param = {"max_depth": 8}
booster = xgb.train(param, Xy, num_boost_round=1)
path = tmp_path / "model.json"
booster.save_model(path)
predt_0 = booster.predict(Xy)
param["max_depth"] = 2
booster = xgb.train(param, Xy, num_boost_round=1)
predt_1 = booster.predict(Xy)
assert not np.isclose(predt_0, predt_1).all()
booster.load_model(path)
predt_2 = booster.predict(Xy)
np.testing.assert_allclose(predt_0, predt_2)
def test_feature_names_validation(self):
X = np.random.random((10, 3))
y = np.random.randint(2, size=(10,))
dm1 = xgb.DMatrix(X, y, feature_names=("a", "b", "c"))
dm2 = xgb.DMatrix(X, y)
bst = xgb.train([], dm1)
bst.predict(dm1) # success
with pytest.raises(ValueError):
bst.predict(dm2)
bst.predict(dm1) # success
bst = xgb.train([], dm2)
bst.predict(dm2) # success
2019-12-11 11:20:40 +08:00
def test_special_model_dump_characters(self) -> None:
params = {"objective": "reg:squarederror", "max_depth": 3}
feature_names = ['"feature 0"', "\tfeature\n1", """feature "2"."""]
X, y, w = tm.make_regression(n_samples=128, n_features=3, use_cupy=False)
Xy = xgb.DMatrix(X, label=y, feature_names=feature_names)
booster = xgb.train(params, Xy, num_boost_round=3)
json_dump = booster.get_dump(dump_format="json")
assert len(json_dump) == 3
def validate_json(obj: dict) -> None:
for k, v in obj.items():
if k == "split":
assert v in feature_names
elif isinstance(v, dict):
validate_json(v)
for j_tree in json_dump:
loaded = json.loads(j_tree)
validate_json(loaded)
dot_dump = booster.get_dump(dump_format="dot")
for d in dot_dump:
assert d.find(r"feature \"2\"") != -1
text_dump = booster.get_dump(dump_format="text")
for d in text_dump:
assert d.find(r"feature \"2\"") != -1
    def run_slice(
        self,
        booster: xgb.Booster,
        dtrain: xgb.DMatrix,
        num_parallel_tree: int,
        num_classes: int,
        num_boost_round: int,
        use_np_type: bool,
    ) -> None:
        """Shared checks for Booster slicing.

        Verifies that ``booster[beg:end(:step)]`` yields the expected number of
        trees, that invalid slices raise, and that margins of disjoint slices
        add up to the margin of the combined slice (minus the base score).

        Args:
            booster: Model trained for ``num_boost_round`` rounds.
            dtrain: Data used for the prediction checks.
            num_parallel_tree: Trees per round per class in ``booster``.
            num_classes: Number of classes ``booster`` was trained with.
            num_boost_round: Number of boosting rounds in ``booster``.
            use_np_type: Use a NumPy integer as the slice upper bound.
        """
        beg = 3
        if use_np_type:
            # Slice bounds may be NumPy integers, not just Python ints.
            end: Integer = np.int32(7)
        else:
            end = 7
        sliced: xgb.Booster = booster[beg:end]
        # Feature metadata must be carried over to the sliced model.
        assert sliced.feature_types == booster.feature_types

        # Each boosting round contributes num_parallel_tree * num_classes trees.
        sliced_trees = (end - beg) * num_parallel_tree * num_classes
        assert sliced_trees == len(sliced.get_dump())

        # A step of 2 keeps every other round, i.e. half the trees.
        sliced_trees = sliced_trees // 2
        sliced = booster[beg:end:2]
        assert sliced_trees == len(sliced.get_dump())

        # Open-ended stop; repeated twice to confirm slicing is repeatable.
        sliced = booster[beg:]
        sliced_trees = (num_boost_round - beg) * num_parallel_tree * num_classes
        assert sliced_trees == len(sliced.get_dump())

        sliced = booster[beg:]
        sliced_trees = (num_boost_round - beg) * num_parallel_tree * num_classes
        assert sliced_trees == len(sliced.get_dump())

        # Open-ended start; also repeated twice.
        sliced = booster[:end]
        sliced_trees = end * num_parallel_tree * num_classes
        assert sliced_trees == len(sliced.get_dump())

        sliced = booster[:end]
        sliced_trees = end * num_parallel_tree * num_classes
        assert sliced_trees == len(sliced.get_dump())

        # Negative bounds are rejected.
        with pytest.raises(ValueError, match=r">= 0"):
            booster[-1:0]
        # we do not accept empty slice.
        with pytest.raises(ValueError, match="Empty slice"):
            booster[1:1]
        # stop can not be smaller than begin
        with pytest.raises(ValueError, match=r"Invalid.*"):
            booster[3:0]
        with pytest.raises(ValueError, match=r"Invalid.*"):
            booster[3:-1]
        # negative step is not supported.
        with pytest.raises(ValueError, match=r".*>= 1.*"):
            booster[0:2:-1]
        # step can not be 0.
        with pytest.raises(ValueError, match=r".*>= 1.*"):
            booster[0:2:0]

        # Iterating a booster yields one sub-model per boosting round.
        trees = [_ for _ in booster]
        assert len(trees) == num_boost_round

        with pytest.raises(TypeError):
            booster["wrong type"]  # type: ignore
        with pytest.raises(IndexError):
            booster[: num_boost_round + 1]
        with pytest.raises(ValueError):
            booster[1, 2]  # too many dims
        # setitem is not implemented as model is immutable during slicing.
        with pytest.raises(TypeError):
            booster[:end] = booster  # type: ignore

        # Sliced-model predictions must match iteration_range predictions.
        sliced_0 = booster[1:3]
        np.testing.assert_allclose(
            booster.predict(dtrain, iteration_range=(1, 3)), sliced_0.predict(dtrain)
        )
        sliced_1 = booster[3:7]
        np.testing.assert_allclose(
            booster.predict(dtrain, iteration_range=(3, 7)), sliced_1.predict(dtrain)
        )

        predt_0 = sliced_0.predict(dtrain, output_margin=True)
        predt_1 = sliced_1.predict(dtrain, output_margin=True)

        # Margins of adjacent slices sum to the combined slice's margin once
        # the base score (counted once in each slice) is subtracted.
        intercept = np.broadcast_to(np.array(get_basescore(booster)), predt_0.shape)
        merged = predt_0 + predt_1 - intercept
        single = booster[1:7].predict(dtrain, output_margin=True)
        np.testing.assert_allclose(merged, single, atol=1e-6)

        # Interleaved strided slices must also sum to the full [1, 7) range.
        sliced_0 = booster[1:7:2]  # 1,3,5
        sliced_1 = booster[2:8:2]  # 2,4,6
        predt_0 = sliced_0.predict(dtrain, output_margin=True)
        predt_1 = sliced_1.predict(dtrain, output_margin=True)
        merged = predt_0 + predt_1 - intercept
        single = booster[1:7].predict(dtrain, output_margin=True)
        np.testing.assert_allclose(merged, single, atol=1e-6)
@pytest.mark.skipif(**tm.no_sklearn())
@pytest.mark.parametrize("booster_name", ["gbtree", "dart"])
def test_slice(self, booster_name: str) -> None:
from sklearn.datasets import make_classification
num_classes = 3
X, y = make_classification(
n_samples=1000, n_informative=5, n_classes=num_classes
)
dtrain = xgb.DMatrix(data=X, label=y)
num_parallel_tree = 4
num_boost_round = 16
total_trees = num_parallel_tree * num_classes * num_boost_round
booster = xgb.train(
{
"num_parallel_tree": num_parallel_tree,
"subsample": 0.5,
"num_class": num_classes,
"booster": booster_name,
"objective": "multi:softprob",
},
num_boost_round=num_boost_round,
dtrain=dtrain,
)
booster.feature_types = ["q"] * X.shape[1]
assert len(booster.get_dump()) == total_trees
assert booster[...].num_boosted_rounds() == num_boost_round
self.run_slice(
booster, dtrain, num_parallel_tree, num_classes, num_boost_round, False
)
bytesarray = booster.save_raw(raw_format="ubj")
booster = xgb.Booster(model_file=bytesarray)
self.run_slice(
booster, dtrain, num_parallel_tree, num_classes, num_boost_round, False
)
@pytest.mark.skipif(**tm.no_pandas())
@pytest.mark.parametrize("ext", ["json", "ubj"])
def test_feature_info(self, ext: str, tmp_path: Path) -> None:
import pandas as pd
# make data
rows = 100
cols = 10
rng = np.random.RandomState(1994)
X = rng.randn(rows, cols)
y = rng.randn(rows)
# Test with pandas, which has feature info.
feature_names = ["test_feature_" + str(i) for i in range(cols)]
X_pd = pd.DataFrame(X, columns=feature_names)
X_pd[f"test_feature_{3}"] = X_pd.iloc[:, 3].astype(np.int32)
Xy = xgb.DMatrix(X_pd, y)
assert Xy.feature_types is not None
assert Xy.feature_types[3] == "int"
booster = xgb.train({}, dtrain=Xy, num_boost_round=1)
assert booster.feature_names == Xy.feature_names
assert booster.feature_names == feature_names
assert booster.feature_types == Xy.feature_types
path = tmp_path / f"model.{ext}"
booster.save_model(path)
booster = xgb.Booster()
booster.load_model(path)
assert booster.feature_names == Xy.feature_names
assert booster.feature_types == Xy.feature_types
# Test with numpy, no feature info is set
Xy = xgb.DMatrix(X, y)
assert Xy.feature_names is None
assert Xy.feature_types is None
booster = xgb.train({}, dtrain=Xy, num_boost_round=1)
assert booster.feature_names is None
assert booster.feature_types is None
# test explicitly set
fns = [str(i) for i in range(cols)]
booster.feature_names = fns
assert booster.feature_names == fns
path = tmp_path / f"model2.{ext}"
booster.save_model(path)
booster = xgb.Booster(model_file=path)
assert booster.feature_names == fns