2019-12-11 11:20:40 +08:00
import json
2020-02-17 11:31:13 +08:00
import locale
2022-10-26 16:56:11 +08:00
import os
2020-08-22 13:18:48 +08:00
import tempfile
2015-09-08 09:47:48 -04:00
2022-10-26 16:56:11 +08:00
import numpy as np
import pytest
import xgboost as xgb
from xgboost import testing as tm
# Directory holding the bundled demo datasets used throughout these tests.
dpath = tm.data_dir(__file__)

# Fixed-seed RNG so randomly generated test data is reproducible across runs.
rng = np.random.RandomState(1994)
2016-04-24 16:34:46 +09:00
2022-01-16 02:11:53 +08:00
def json_model(model_path: str, parameters: dict) -> dict:
    """Train a small classifier, save it to ``model_path``, and return the
    parsed model document (UBJSON or JSON depending on the file extension).

    ``parameters`` is mutated in place when a multi-class objective requires
    ``num_class`` to be set.
    """
    datasets = pytest.importorskip("sklearn.datasets")
    X, y = datasets.make_classification(64, n_features=8, n_classes=3, n_informative=6)
    if parameters.get("objective", None) == "multi:softmax":
        parameters["num_class"] = 3

    booster = xgb.train(parameters, xgb.DMatrix(X, y))
    booster.save_model(model_path)

    # Decode the on-disk document with the codec matching the extension.
    if model_path.endswith("ubj"):
        import ubjson

        with open(model_path, "rb") as ubjfd:
            model = ubjson.load(ubjfd)
    else:
        with open(model_path, "r") as fd:
            model = json.load(fd)

    return model
2020-11-03 02:27:39 -05:00
class TestModels :
2016-04-24 16:34:46 +09:00
def test_glm ( self ) :
2019-03-17 17:55:04 +08:00
param = { ' verbosity ' : 0 , ' objective ' : ' binary:logistic ' ,
2020-01-30 16:00:18 +08:00
' booster ' : ' gblinear ' , ' alpha ' : 0.0001 , ' lambda ' : 1 ,
' nthread ' : 1 }
2022-10-26 16:56:11 +08:00
dtrain = xgb . DMatrix ( os . path . join ( dpath , " agaricus.txt.train " ) )
dtest = xgb . DMatrix ( os . path . join ( dpath , " agaricus.txt.test " ) )
2016-04-24 16:34:46 +09:00
watchlist = [ ( dtest , ' eval ' ) , ( dtrain , ' train ' ) ]
num_round = 4
bst = xgb . train ( param , dtrain , num_round , watchlist )
assert isinstance ( bst , xgb . core . Booster )
preds = bst . predict ( dtest )
labels = dtest . get_label ( )
err = sum ( 1 for i in range ( len ( preds ) )
if int ( preds [ i ] > 0.5 ) != labels [ i ] ) / float ( len ( preds ) )
2016-11-20 18:23:19 -06:00
assert err < 0.2
2016-04-24 16:34:46 +09:00
2016-06-09 06:04:01 +09:00
def test_dart ( self ) :
2022-10-26 16:56:11 +08:00
dtrain = xgb . DMatrix ( os . path . join ( dpath , " agaricus.txt.train " ) )
dtest = xgb . DMatrix ( os . path . join ( dpath , " agaricus.txt.test " ) )
2020-01-13 08:48:30 -05:00
param = { ' max_depth ' : 5 , ' objective ' : ' binary:logistic ' ,
' eval_metric ' : ' logloss ' , ' booster ' : ' dart ' , ' verbosity ' : 1 }
2016-06-09 06:04:01 +09:00
# specify validations set to watch performance
watchlist = [ ( dtest , ' eval ' ) , ( dtrain , ' train ' ) ]
num_round = 2
bst = xgb . train ( param , dtrain , num_round , watchlist )
# this is prediction
preds = bst . predict ( dtest , ntree_limit = num_round )
labels = dtest . get_label ( )
2020-01-13 08:48:30 -05:00
err = sum ( 1 for i in range ( len ( preds ) )
if int ( preds [ i ] > 0.5 ) != labels [ i ] ) / float ( len ( preds ) )
2016-06-09 06:04:01 +09:00
# error must be smaller than 10%
assert err < 0.1
2020-08-22 13:18:48 +08:00
with tempfile . TemporaryDirectory ( ) as tmpdir :
dtest_path = os . path . join ( tmpdir , ' dtest.dmatrix ' )
model_path = os . path . join ( tmpdir , ' xgboost.model.dart ' )
# save dmatrix into binary buffer
dtest . save_binary ( dtest_path )
model_path = model_path
# save model
bst . save_model ( model_path )
# load model and data in
bst2 = xgb . Booster ( params = param , model_file = model_path )
dtest2 = xgb . DMatrix ( dtest_path )
2016-06-09 06:04:01 +09:00
preds2 = bst2 . predict ( dtest2 , ntree_limit = num_round )
2020-08-22 13:18:48 +08:00
2016-06-09 06:04:01 +09:00
# assert they are the same
assert np . sum ( np . abs ( preds2 - preds ) ) == 0
2020-01-13 08:48:30 -05:00
def my_logloss ( preds , dtrain ) :
labels = dtrain . get_label ( )
return ' logloss ' , np . sum (
np . log ( np . where ( labels , preds , 1 - preds ) ) )
# check whether custom evaluation metrics work
bst = xgb . train ( param , dtrain , num_round , watchlist ,
feval = my_logloss )
preds3 = bst . predict ( dtest , ntree_limit = num_round )
assert all ( preds3 == preds )
2016-06-09 06:04:01 +09:00
# check whether sample_type and normalize_type work
num_round = 50
2019-03-17 17:55:04 +08:00
param [ ' verbosity ' ] = 0
2016-06-09 06:04:01 +09:00
param [ ' learning_rate ' ] = 0.1
param [ ' rate_drop ' ] = 0.1
preds_list = [ ]
2020-01-13 08:48:30 -05:00
for p in [ [ p0 , p1 ] for p0 in [ ' uniform ' , ' weighted ' ]
for p1 in [ ' tree ' , ' forest ' ] ] :
2016-06-09 06:04:01 +09:00
param [ ' sample_type ' ] = p [ 0 ]
param [ ' normalize_type ' ] = p [ 1 ]
bst = xgb . train ( param , dtrain , num_round , watchlist )
preds = bst . predict ( dtest , ntree_limit = num_round )
2020-01-13 08:48:30 -05:00
err = sum ( 1 for i in range ( len ( preds ) )
if int ( preds [ i ] > 0.5 ) != labels [ i ] ) / float ( len ( preds ) )
2016-06-09 06:04:01 +09:00
assert err < 0.1
preds_list . append ( preds )
for ii in range ( len ( preds_list ) ) :
for jj in range ( ii + 1 , len ( preds_list ) ) :
assert np . sum ( np . abs ( preds_list [ ii ] - preds_list [ jj ] ) ) > 0
2019-12-24 09:43:41 +08:00
def test_boost_from_prediction ( self ) :
# Re-construct dtrain here to avoid modification
2022-10-26 16:56:11 +08:00
margined = xgb . DMatrix ( os . path . join ( dpath , " agaricus.txt.train " ) )
2019-12-24 09:43:41 +08:00
bst = xgb . train ( { ' tree_method ' : ' hist ' } , margined , 1 )
predt_0 = bst . predict ( margined , output_margin = True )
margined . set_base_margin ( predt_0 )
bst = xgb . train ( { ' tree_method ' : ' hist ' } , margined , 1 )
predt_1 = bst . predict ( margined )
assert np . any ( np . abs ( predt_1 - predt_0 ) > 1e-6 )
2022-10-26 16:56:11 +08:00
dtrain = xgb . DMatrix ( os . path . join ( dpath , " agaricus.txt.train " ) )
2019-12-24 09:43:41 +08:00
bst = xgb . train ( { ' tree_method ' : ' hist ' } , dtrain , 2 )
predt_2 = bst . predict ( dtrain )
assert np . all ( np . abs ( predt_2 - predt_1 ) < 1e-6 )
2020-12-17 19:59:19 +08:00
def test_boost_from_existing_model ( self ) :
2022-10-26 16:56:11 +08:00
X = xgb . DMatrix ( os . path . join ( dpath , " agaricus.txt.train " ) )
2020-12-17 19:59:19 +08:00
booster = xgb . train ( { ' tree_method ' : ' hist ' } , X , num_boost_round = 4 )
assert booster . num_boosted_rounds ( ) == 4
booster = xgb . train ( { ' tree_method ' : ' hist ' } , X , num_boost_round = 4 ,
xgb_model = booster )
assert booster . num_boosted_rounds ( ) == 8
booster = xgb . train ( { ' updater ' : ' prune ' , ' process_type ' : ' update ' } , X ,
num_boost_round = 4 , xgb_model = booster )
# Trees are moved for update, the rounds is reduced. This test is
# written for being compatible with current code (1.0.0). If the
# behaviour is considered sub-optimal, feel free to change.
assert booster . num_boosted_rounds ( ) == 4
2021-06-09 14:51:17 +08:00
def run_custom_objective ( self , tree_method = None ) :
param = {
' max_depth ' : 2 ,
' eta ' : 1 ,
' objective ' : ' reg:logistic ' ,
" tree_method " : tree_method
}
2022-10-26 16:56:11 +08:00
dtrain = xgb . DMatrix ( os . path . join ( dpath , " agaricus.txt.train " ) )
dtest = xgb . DMatrix ( os . path . join ( dpath , " agaricus.txt.test " ) )
2016-04-24 16:34:46 +09:00
watchlist = [ ( dtest , ' eval ' ) , ( dtrain , ' train ' ) ]
2020-08-05 12:27:19 +08:00
num_round = 10
2016-04-24 16:34:46 +09:00
def logregobj ( preds , dtrain ) :
labels = dtrain . get_label ( )
preds = 1.0 / ( 1.0 + np . exp ( - preds ) )
grad = preds - labels
hess = preds * ( 1.0 - preds )
return grad , hess
def evalerror ( preds , dtrain ) :
labels = dtrain . get_label ( )
2020-08-05 12:27:19 +08:00
preds = 1.0 / ( 1.0 + np . exp ( - preds ) )
2020-07-23 03:28:17 +08:00
return ' error ' , float ( sum ( labels != ( preds > 0.5 ) ) ) / len ( labels )
2016-04-24 16:34:46 +09:00
# test custom_objective in training
2020-08-05 12:27:19 +08:00
bst = xgb . train ( param , dtrain , num_round , watchlist , obj = logregobj ,
feval = evalerror )
2016-04-24 16:34:46 +09:00
assert isinstance ( bst , xgb . core . Booster )
preds = bst . predict ( dtest )
labels = dtest . get_label ( )
err = sum ( 1 for i in range ( len ( preds ) )
if int ( preds [ i ] > 0.5 ) != labels [ i ] ) / float ( len ( preds ) )
assert err < 0.1
# test custom_objective in cross-validation
xgb . cv ( param , dtrain , num_round , nfold = 5 , seed = 0 ,
obj = logregobj , feval = evalerror )
# test maximize parameter
def neg_evalerror ( preds , dtrain ) :
labels = dtrain . get_label ( )
return ' error ' , float ( sum ( labels == ( preds > 0.0 ) ) ) / len ( labels )
2020-08-05 12:27:19 +08:00
bst2 = xgb . train ( param , dtrain , num_round , watchlist , logregobj ,
neg_evalerror , maximize = True )
2016-04-24 16:34:46 +09:00
preds2 = bst2 . predict ( dtest )
err2 = sum ( 1 for i in range ( len ( preds2 ) )
if int ( preds2 [ i ] > 0.5 ) != labels [ i ] ) / float ( len ( preds2 ) )
assert err == err2
2021-06-09 14:51:17 +08:00
def test_custom_objective ( self ) :
self . run_custom_objective ( )
2016-06-05 00:17:35 -05:00
def test_multi_eval_metric ( self ) :
2022-10-26 16:56:11 +08:00
dtrain = xgb . DMatrix ( os . path . join ( dpath , " agaricus.txt.train " ) )
dtest = xgb . DMatrix ( os . path . join ( dpath , " agaricus.txt.test " ) )
2016-06-05 00:17:35 -05:00
watchlist = [ ( dtest , ' eval ' ) , ( dtrain , ' train ' ) ]
2019-07-20 08:34:56 -04:00
param = { ' max_depth ' : 2 , ' eta ' : 0.2 , ' verbosity ' : 1 ,
2019-03-17 17:55:04 +08:00
' objective ' : ' binary:logistic ' }
2016-06-05 00:17:35 -05:00
param [ ' eval_metric ' ] = [ " auc " , " logloss " , ' error ' ]
evals_result = { }
bst = xgb . train ( param , dtrain , 4 , watchlist , evals_result = evals_result )
assert isinstance ( bst , xgb . core . Booster )
assert len ( evals_result [ ' eval ' ] ) == 3
assert set ( evals_result [ ' eval ' ] . keys ( ) ) == { ' auc ' , ' error ' , ' logloss ' }
2016-04-24 16:34:46 +09:00
def test_fpreproc ( self ) :
2019-03-17 17:55:04 +08:00
param = { ' max_depth ' : 2 , ' eta ' : 1 , ' verbosity ' : 0 ,
2016-04-24 16:34:46 +09:00
' objective ' : ' binary:logistic ' }
num_round = 2
def fpreproc ( dtrain , dtest , param ) :
label = dtrain . get_label ( )
ratio = float ( np . sum ( label == 0 ) ) / np . sum ( label == 1 )
param [ ' scale_pos_weight ' ] = ratio
return ( dtrain , dtest , param )
2022-10-26 16:56:11 +08:00
dtrain = xgb . DMatrix ( os . path . join ( dpath , " agaricus.txt.train " ) )
2016-04-24 16:34:46 +09:00
xgb . cv ( param , dtrain , num_round , nfold = 5 ,
metrics = { ' auc ' } , seed = 0 , fpreproc = fpreproc )
def test_show_stdv ( self ) :
2019-03-17 17:55:04 +08:00
param = { ' max_depth ' : 2 , ' eta ' : 1 , ' verbosity ' : 0 ,
2016-04-24 16:34:46 +09:00
' objective ' : ' binary:logistic ' }
num_round = 2
2022-10-26 16:56:11 +08:00
dtrain = xgb . DMatrix ( os . path . join ( dpath , " agaricus.txt.train " ) )
2016-04-24 16:34:46 +09:00
xgb . cv ( param , dtrain , num_round , nfold = 5 ,
metrics = { ' error ' } , seed = 0 , show_stdv = False )
2016-04-29 13:51:34 +09:00
2023-03-14 22:09:36 +08:00
def test_prediction_cache ( self ) - > None :
X , y = tm . make_sparse_regression ( 512 , 4 , 0.5 , as_dense = False )
Xy = xgb . DMatrix ( X , y )
param = { " max_depth " : 8 }
booster = xgb . train ( param , Xy , num_boost_round = 1 )
with tempfile . TemporaryDirectory ( ) as tmpdir :
path = os . path . join ( tmpdir , " model.json " )
booster . save_model ( path )
predt_0 = booster . predict ( Xy )
param [ " max_depth " ] = 2
booster = xgb . train ( param , Xy , num_boost_round = 1 )
predt_1 = booster . predict ( Xy )
assert not np . isclose ( predt_0 , predt_1 ) . all ( )
booster . load_model ( path )
predt_2 = booster . predict ( Xy )
np . testing . assert_allclose ( predt_0 , predt_2 )
2016-04-29 13:51:34 +09:00
def test_feature_names_validation ( self ) :
X = np . random . random ( ( 10 , 3 ) )
y = np . random . randint ( 2 , size = ( 10 , ) )
2021-02-25 18:54:16 +08:00
dm1 = xgb . DMatrix ( X , y , feature_names = ( " a " , " b " , " c " ) )
dm2 = xgb . DMatrix ( X , y )
2016-04-29 13:51:34 +09:00
bst = xgb . train ( [ ] , dm1 )
bst . predict ( dm1 ) # success
2020-11-03 02:27:39 -05:00
with pytest . raises ( ValueError ) :
bst . predict ( dm2 )
2016-04-29 13:51:34 +09:00
bst . predict ( dm1 ) # success
bst = xgb . train ( [ ] , dm2 )
bst . predict ( dm2 ) # success
2019-12-11 11:20:40 +08:00
2020-02-13 20:41:59 +08:00
def test_model_binary_io ( self ) :
model_path = ' test_model_binary_io.bin '
parameters = { ' tree_method ' : ' hist ' , ' booster ' : ' gbtree ' ,
' scale_pos_weight ' : ' 0.5 ' }
X = np . random . random ( ( 10 , 3 ) )
y = np . random . random ( ( 10 , ) )
dtrain = xgb . DMatrix ( X , y )
bst = xgb . train ( parameters , dtrain , num_boost_round = 2 )
bst . save_model ( model_path )
bst = xgb . Booster ( model_file = model_path )
os . remove ( model_path )
config = json . loads ( bst . save_config ( ) )
assert float ( config [ ' learner ' ] [ ' objective ' ] [
' reg_loss_param ' ] [ ' scale_pos_weight ' ] ) == 0.5
2020-02-26 11:30:13 +08:00
buf = bst . save_raw ( )
from_raw = xgb . Booster ( )
from_raw . load_model ( buf )
buf_from_raw = from_raw . save_raw ( )
assert buf == buf_from_raw
2022-01-16 02:11:53 +08:00
def run_model_json_io ( self , parameters : dict , ext : str ) - > None :
if ext == " ubj " and tm . no_ubjson ( ) [ " condition " ] :
pytest . skip ( tm . no_ubjson ( ) [ " reason " ] )
2020-02-17 11:31:13 +08:00
loc = locale . getpreferredencoding ( False )
2022-01-16 02:11:53 +08:00
model_path = ' test_model_json_io. ' + ext
2020-01-28 13:29:09 +08:00
j_model = json_model ( model_path , parameters )
2019-12-11 19:49:01 +08:00
assert isinstance ( j_model [ ' learner ' ] , dict )
2019-12-11 11:20:40 +08:00
2020-02-13 20:41:59 +08:00
bst = xgb . Booster ( model_file = model_path )
2019-12-11 11:20:40 +08:00
2019-12-23 19:47:35 +08:00
bst . save_model ( fname = model_path )
2022-01-16 02:11:53 +08:00
if ext == " ubj " :
import ubjson
with open ( model_path , " rb " ) as ubjfd :
j_model = ubjson . load ( ubjfd )
else :
with open ( model_path , ' r ' ) as fd :
j_model = json . load ( fd )
2019-12-11 19:49:01 +08:00
assert isinstance ( j_model [ ' learner ' ] , dict )
2019-12-11 11:20:40 +08:00
2019-12-23 19:47:35 +08:00
os . remove ( model_path )
2020-02-17 11:31:13 +08:00
assert locale . getpreferredencoding ( False ) == loc
2019-12-23 19:47:35 +08:00
2022-01-19 02:27:51 +08:00
json_raw = bst . save_raw ( raw_format = " json " )
from_jraw = xgb . Booster ( )
from_jraw . load_model ( json_raw )
ubj_raw = bst . save_raw ( raw_format = " ubj " )
from_ubjraw = xgb . Booster ( )
from_ubjraw . load_model ( ubj_raw )
2023-03-22 23:49:56 +08:00
if parameters . get ( " multi_strategy " , None ) != " multi_output_tree " :
# old binary model is not supported.
old_from_json = from_jraw . save_raw ( raw_format = " deprecated " )
old_from_ubj = from_ubjraw . save_raw ( raw_format = " deprecated " )
2022-01-19 02:27:51 +08:00
2023-03-22 23:49:56 +08:00
assert old_from_json == old_from_ubj
2022-01-19 02:27:51 +08:00
2022-06-01 16:20:58 +08:00
raw_json = bst . save_raw ( raw_format = " json " )
pretty = json . dumps ( json . loads ( raw_json ) , indent = 2 ) + " \n \n "
bst . load_model ( bytearray ( pretty , encoding = " ascii " ) )
2023-03-22 23:49:56 +08:00
if parameters . get ( " multi_strategy " , None ) != " multi_output_tree " :
# old binary model is not supported.
old_from_json = from_jraw . save_raw ( raw_format = " deprecated " )
old_from_ubj = from_ubjraw . save_raw ( raw_format = " deprecated " )
assert old_from_json == old_from_ubj
2022-06-01 16:20:58 +08:00
2023-03-22 23:49:56 +08:00
rng = np . random . default_rng ( )
X = rng . random ( size = from_jraw . num_features ( ) * 10 ) . reshape (
( 10 , from_jraw . num_features ( ) )
)
predt_from_jraw = from_jraw . predict ( xgb . DMatrix ( X ) )
predt_from_bst = bst . predict ( xgb . DMatrix ( X ) )
np . testing . assert_allclose ( predt_from_jraw , predt_from_bst )
2022-06-01 16:20:58 +08:00
2022-01-16 02:11:53 +08:00
@pytest.mark.parametrize ( " ext " , [ " json " , " ubj " ] )
def test_model_json_io ( self , ext : str ) - > None :
parameters = { " booster " : " gbtree " , " tree_method " : " hist " }
self . run_model_json_io ( parameters , ext )
2023-03-22 23:49:56 +08:00
parameters = {
" booster " : " gbtree " ,
" tree_method " : " hist " ,
" multi_strategy " : " multi_output_tree " ,
" objective " : " multi:softmax " ,
}
self . run_model_json_io ( parameters , ext )
2022-01-16 02:11:53 +08:00
parameters = { " booster " : " gblinear " }
self . run_model_json_io ( parameters , ext )
parameters = { " booster " : " dart " , " tree_method " : " hist " }
self . run_model_json_io ( parameters , ext )
2019-12-23 19:47:35 +08:00
@pytest.mark.skipif ( * * tm . no_json_schema ( ) )
2020-05-15 10:18:43 +08:00
def test_json_io_schema ( self ) :
2019-12-23 19:47:35 +08:00
import jsonschema
2020-02-13 20:41:59 +08:00
model_path = ' test_json_schema.json '
2019-12-23 19:47:35 +08:00
path = os . path . dirname (
os . path . dirname ( os . path . dirname ( os . path . abspath ( __file__ ) ) ) )
doc = os . path . join ( path , ' doc ' , ' model.schema ' )
with open ( doc , ' r ' ) as fd :
schema = json . load ( fd )
2020-01-28 13:29:09 +08:00
parameters = { ' tree_method ' : ' hist ' , ' booster ' : ' gbtree ' }
jsonschema . validate ( instance = json_model ( model_path , parameters ) ,
schema = schema )
os . remove ( model_path )
parameters = { ' tree_method ' : ' hist ' , ' booster ' : ' dart ' }
jsonschema . validate ( instance = json_model ( model_path , parameters ) ,
schema = schema )
2019-12-23 19:47:35 +08:00
os . remove ( model_path )
2020-05-15 10:18:43 +08:00
2020-08-05 15:21:11 +08:00
try :
2022-10-26 16:56:11 +08:00
dtrain = xgb . DMatrix ( os . path . join ( dpath , " agaricus.txt.train " ) )
2020-08-05 15:21:11 +08:00
xgb . train ( { ' objective ' : ' foo ' } , dtrain , num_boost_round = 1 )
except ValueError as e :
e_str = str ( e )
beg = e_str . find ( ' Objective candidate ' )
end = e_str . find ( ' Stack trace ' )
e_str = e_str [ beg : end ]
e_str = e_str . strip ( )
splited = e_str . splitlines ( )
objectives = [ s . split ( ' : ' ) [ 1 ] for s in splited ]
j_objectives = schema [ ' properties ' ] [ ' learner ' ] [ ' properties ' ] [
' objective ' ] [ ' oneOf ' ]
objectives_from_schema = set ( )
for j_obj in j_objectives :
objectives_from_schema . add (
j_obj [ ' properties ' ] [ ' name ' ] [ ' const ' ] )
objectives = set ( objectives )
assert objectives == objectives_from_schema
2020-05-15 10:18:43 +08:00
@pytest.mark.skipif ( * * tm . no_json_schema ( ) )
def test_json_dump_schema ( self ) :
import jsonschema
def validate_model ( parameters ) :
X = np . random . random ( ( 100 , 30 ) )
y = np . random . randint ( 0 , 4 , size = ( 100 , ) )
parameters [ ' num_class ' ] = 4
m = xgb . DMatrix ( X , y )
booster = xgb . train ( parameters , m )
dump = booster . get_dump ( dump_format = ' json ' )
for i in range ( len ( dump ) ) :
jsonschema . validate ( instance = json . loads ( dump [ i ] ) ,
schema = schema )
path = os . path . dirname (
os . path . dirname ( os . path . dirname ( os . path . abspath ( __file__ ) ) ) )
doc = os . path . join ( path , ' doc ' , ' dump.schema ' )
with open ( doc , ' r ' ) as fd :
schema = json . load ( fd )
parameters = { ' tree_method ' : ' hist ' , ' booster ' : ' gbtree ' ,
' objective ' : ' multi:softmax ' }
validate_model ( parameters )
parameters = { ' tree_method ' : ' hist ' , ' booster ' : ' dart ' ,
' objective ' : ' multi:softmax ' }
validate_model ( parameters )
2020-11-03 02:27:39 -05:00
2022-02-19 08:05:28 +08:00
def test_categorical_model_io ( self ) :
X , y = tm . make_categorical ( 256 , 16 , 71 , False )
Xy = xgb . DMatrix ( X , y , enable_categorical = True )
booster = xgb . train ( { " tree_method " : " approx " } , Xy , num_boost_round = 16 )
predt_0 = booster . predict ( Xy )
with tempfile . TemporaryDirectory ( ) as tempdir :
path = os . path . join ( tempdir , " model.binary " )
with pytest . raises ( ValueError , match = r " .*JSON/UBJSON.* " ) :
booster . save_model ( path )
path = os . path . join ( tempdir , " model.json " )
booster . save_model ( path )
booster = xgb . Booster ( model_file = path )
predt_1 = booster . predict ( Xy )
np . testing . assert_allclose ( predt_0 , predt_1 )
path = os . path . join ( tempdir , " model.ubj " )
booster . save_model ( path )
booster = xgb . Booster ( model_file = path )
predt_1 = booster . predict ( Xy )
np . testing . assert_allclose ( predt_0 , predt_1 )
2021-01-13 16:56:49 +08:00
@pytest.mark.skipif ( * * tm . no_sklearn ( ) )
def test_attributes ( self ) :
from sklearn . datasets import load_iris
X , y = load_iris ( return_X_y = True )
cls = xgb . XGBClassifier ( n_estimators = 2 )
cls . fit ( X , y , early_stopping_rounds = 1 , eval_set = [ ( X , y ) ] )
2021-01-19 23:51:16 +08:00
assert cls . get_booster ( ) . best_ntree_limit == 2
2021-01-13 16:56:49 +08:00
assert cls . best_ntree_limit == cls . get_booster ( ) . best_ntree_limit
with tempfile . TemporaryDirectory ( ) as tmpdir :
path = os . path . join ( tmpdir , " cls.json " )
cls . save_model ( path )
cls = xgb . XGBClassifier ( n_estimators = 2 )
cls . load_model ( path )
2021-01-19 23:51:16 +08:00
assert cls . get_booster ( ) . best_ntree_limit == 2
2021-01-13 16:56:49 +08:00
assert cls . best_ntree_limit == cls . get_booster ( ) . best_ntree_limit
2022-03-29 02:32:42 +08:00
def run_slice (
self ,
booster : xgb . Booster ,
dtrain : xgb . DMatrix ,
num_parallel_tree : int ,
num_classes : int ,
num_boost_round : int
) :
2020-11-03 02:27:39 -05:00
beg = 3
end = 7
2022-03-29 02:32:42 +08:00
sliced : xgb . Booster = booster [ beg : end ]
2021-07-06 11:47:49 +08:00
assert sliced . feature_types == booster . feature_types
2020-11-03 02:27:39 -05:00
sliced_trees = ( end - beg ) * num_parallel_tree * num_classes
assert sliced_trees == len ( sliced . get_dump ( ) )
sliced_trees = sliced_trees / / 2
2022-03-29 02:32:42 +08:00
sliced = booster [ beg : end : 2 ]
2020-11-03 02:27:39 -05:00
assert sliced_trees == len ( sliced . get_dump ( ) )
2022-03-29 02:32:42 +08:00
sliced = booster [ beg : . . . ]
2020-11-03 02:27:39 -05:00
sliced_trees = ( num_boost_round - beg ) * num_parallel_tree * num_classes
assert sliced_trees == len ( sliced . get_dump ( ) )
2022-03-29 02:32:42 +08:00
sliced = booster [ beg : ]
2020-11-03 02:27:39 -05:00
sliced_trees = ( num_boost_round - beg ) * num_parallel_tree * num_classes
assert sliced_trees == len ( sliced . get_dump ( ) )
2022-03-29 02:32:42 +08:00
sliced = booster [ : end ]
2020-11-03 02:27:39 -05:00
sliced_trees = end * num_parallel_tree * num_classes
assert sliced_trees == len ( sliced . get_dump ( ) )
2022-03-29 02:32:42 +08:00
sliced = booster [ . . . : end ]
2020-11-03 02:27:39 -05:00
sliced_trees = end * num_parallel_tree * num_classes
assert sliced_trees == len ( sliced . get_dump ( ) )
2022-03-29 02:32:42 +08:00
with pytest . raises ( ValueError , match = r " >= 0 " ) :
booster [ - 1 : 0 ]
2020-11-03 02:27:39 -05:00
# we do not accept empty slice.
2023-03-27 23:10:54 +08:00
with pytest . raises ( ValueError , match = " Empty slice " ) :
2020-11-03 02:27:39 -05:00
booster [ 1 : 1 ]
# stop can not be smaller than begin
2022-03-29 02:32:42 +08:00
with pytest . raises ( ValueError , match = r " Invalid.* " ) :
2020-11-03 02:27:39 -05:00
booster [ 3 : 0 ]
2022-03-29 02:32:42 +08:00
with pytest . raises ( ValueError , match = r " Invalid.* " ) :
2020-11-03 02:27:39 -05:00
booster [ 3 : - 1 ]
# negative step is not supported.
2022-03-29 02:32:42 +08:00
with pytest . raises ( ValueError , match = r " .*>= 1.* " ) :
2020-11-03 02:27:39 -05:00
booster [ 0 : 2 : - 1 ]
# step can not be 0.
2022-03-29 02:32:42 +08:00
with pytest . raises ( ValueError , match = r " .*>= 1.* " ) :
2020-11-03 02:27:39 -05:00
booster [ 0 : 2 : 0 ]
trees = [ _ for _ in booster ]
assert len ( trees ) == num_boost_round
with pytest . raises ( TypeError ) :
booster [ " wrong type " ]
with pytest . raises ( IndexError ) :
2022-03-29 02:32:42 +08:00
booster [ : num_boost_round + 1 ]
2020-11-03 02:27:39 -05:00
with pytest . raises ( ValueError ) :
2022-03-29 02:32:42 +08:00
booster [ 1 , 2 ] # too many dims
2020-11-03 02:27:39 -05:00
# setitem is not implemented as model is immutable during slicing.
with pytest . raises ( TypeError ) :
2022-03-29 02:32:42 +08:00
booster [ . . . : end ] = booster
2020-11-03 02:27:39 -05:00
sliced_0 = booster [ 1 : 3 ]
2021-02-08 18:26:32 +08:00
np . testing . assert_allclose (
booster . predict ( dtrain , iteration_range = ( 1 , 3 ) ) , sliced_0 . predict ( dtrain )
)
2020-11-03 02:27:39 -05:00
sliced_1 = booster [ 3 : 7 ]
2021-02-08 18:26:32 +08:00
np . testing . assert_allclose (
booster . predict ( dtrain , iteration_range = ( 3 , 7 ) ) , sliced_1 . predict ( dtrain )
)
2020-11-03 02:27:39 -05:00
predt_0 = sliced_0 . predict ( dtrain , output_margin = True )
predt_1 = sliced_1 . predict ( dtrain , output_margin = True )
merged = predt_0 + predt_1 - 0.5 # base score.
single = booster [ 1 : 7 ] . predict ( dtrain , output_margin = True )
np . testing . assert_allclose ( merged , single , atol = 1e-6 )
sliced_0 = booster [ 1 : 7 : 2 ] # 1,3,5
sliced_1 = booster [ 2 : 8 : 2 ] # 2,4,6
predt_0 = sliced_0 . predict ( dtrain , output_margin = True )
predt_1 = sliced_1 . predict ( dtrain , output_margin = True )
merged = predt_0 + predt_1 - 0.5
single = booster [ 1 : 7 ] . predict ( dtrain , output_margin = True )
np . testing . assert_allclose ( merged , single , atol = 1e-6 )
2021-02-25 18:54:16 +08:00
2022-03-29 02:32:42 +08:00
@pytest.mark.skipif ( * * tm . no_sklearn ( ) )
@pytest.mark.parametrize ( " booster " , [ " gbtree " , " dart " ] )
def test_slice ( self , booster ) :
from sklearn . datasets import make_classification
num_classes = 3
X , y = make_classification (
n_samples = 1000 , n_informative = 5 , n_classes = num_classes
)
dtrain = xgb . DMatrix ( data = X , label = y )
num_parallel_tree = 4
num_boost_round = 16
total_trees = num_parallel_tree * num_classes * num_boost_round
booster = xgb . train (
{
" num_parallel_tree " : num_parallel_tree ,
" subsample " : 0.5 ,
" num_class " : num_classes ,
" booster " : booster ,
" objective " : " multi:softprob " ,
} ,
num_boost_round = num_boost_round ,
dtrain = dtrain ,
)
booster . feature_types = [ " q " ] * X . shape [ 1 ]
assert len ( booster . get_dump ( ) ) == total_trees
self . run_slice ( booster , dtrain , num_parallel_tree , num_classes , num_boost_round )
bytesarray = booster . save_raw ( raw_format = " ubj " )
booster = xgb . Booster ( model_file = bytesarray )
self . run_slice ( booster , dtrain , num_parallel_tree , num_classes , num_boost_round )
bytesarray = booster . save_raw ( raw_format = " deprecated " )
booster = xgb . Booster ( model_file = bytesarray )
self . run_slice ( booster , dtrain , num_parallel_tree , num_classes , num_boost_round )
2023-03-27 23:10:54 +08:00
def test_slice_multi ( self ) - > None :
from sklearn . datasets import make_classification
num_classes = 3
X , y = make_classification (
n_samples = 1000 , n_informative = 5 , n_classes = num_classes
)
Xy = xgb . DMatrix ( data = X , label = y )
num_parallel_tree = 4
num_boost_round = 16
class ResetStrategy ( xgb . callback . TrainingCallback ) :
def after_iteration ( self , model , epoch : int , evals_log ) - > bool :
model . set_param ( { " multi_strategy " : " multi_output_tree " } )
return False
booster = xgb . train (
{
" num_parallel_tree " : num_parallel_tree ,
" num_class " : num_classes ,
" booster " : " gbtree " ,
" objective " : " multi:softprob " ,
" multi_strategy " : " multi_output_tree " ,
" tree_method " : " hist " ,
" base_score " : 0 ,
} ,
num_boost_round = num_boost_round ,
dtrain = Xy ,
callbacks = [ ResetStrategy ( ) ]
)
sliced = [ t for t in booster ]
assert len ( sliced ) == 16
predt0 = booster . predict ( Xy , output_margin = True )
predt1 = np . zeros ( predt0 . shape )
for t in booster :
predt1 + = t . predict ( Xy , output_margin = True )
np . testing . assert_allclose ( predt0 , predt1 , atol = 1e-5 )
2021-02-25 18:54:16 +08:00
@pytest.mark.skipif ( * * tm . no_pandas ( ) )
def test_feature_info ( self ) :
import pandas as pd
rows = 100
cols = 10
X = rng . randn ( rows , cols )
y = rng . randn ( rows )
feature_names = [ " test_feature_ " + str ( i ) for i in range ( cols ) ]
X_pd = pd . DataFrame ( X , columns = feature_names )
2022-04-23 02:07:01 +08:00
X_pd . iloc [ : , 3 ] = X_pd . iloc [ : , 3 ] . astype ( np . int32 )
2021-02-25 18:54:16 +08:00
Xy = xgb . DMatrix ( X_pd , y )
assert Xy . feature_types [ 3 ] == " int "
booster = xgb . train ( { } , dtrain = Xy , num_boost_round = 1 )
assert booster . feature_names == Xy . feature_names
assert booster . feature_names == feature_names
assert booster . feature_types == Xy . feature_types
with tempfile . TemporaryDirectory ( ) as tmpdir :
path = tmpdir + " model.json "
booster . save_model ( path )
booster = xgb . Booster ( )
booster . load_model ( path )
assert booster . feature_names == Xy . feature_names
assert booster . feature_types == Xy . feature_types