Blame: tests/python/test_basic.py - dmlc/xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow

Add dump_format=json option (#1726) * Add format to the params accepted by DumpModel Currently, only the test format is supported when trying to dump a model. The plan is to add more such formats like JSON which are easy to read and/or parse by machines. And to make the interface for this even more generic to allow other formats to be added. Hence, we make some modifications to make these function generic and accept a new parameter "format" which signifies the format of the dump to be created. * Fix typos and errors in docs * plugin: Mention all the register macros available Document the register macros currently available to the plugin writers so they know what exactly can be extended using hooks. * sparce_page_source: Use same arg name in .h and .cc * gbm: Add JSON dump The dump_format argument can be used to specify what type of dump file should be created. Add functionality to dump gblinear and gbtree into a JSON file. The JSON file has an array, each item is a JSON object for the tree. For gblinear: - The item is the bias and weights vectors For gbtree: - The item is the root node. The root node has a attribute "children" which holds the children nodes. This happens recursively. * core.py: Add arg dump_format for get_dump() 2016-11-04 22:25:25 +05:30			`import json`
Move Python testing utilities into xgboost module. (#8379) - Add typehints. - Fixes for pylint. Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu> 2022-10-26 16:56:11 +08:00			`import os`
Handle UTF-8 paths correctly on Windows platform (#9443) * Fix round-trip serialization with UTF-8 paths * Add compiler version check * Add comment to C API functions * Add Python tests * [CI] Update MacOS deployment target * Use std::filesystem instead of dmlc::TemporaryDirectory 2023-08-07 23:27:25 -07:00			`import pathlib`
Simplify the data backends. (#5893) 2020-07-16 15:17:31 +08:00			`import tempfile`
Move Python testing utilities into xgboost module. (#8379) - Add typehints. - Fixes for pylint. Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu> 2022-10-26 16:56:11 +08:00			`from pathlib import Path`

			`import numpy as np`
			`import pytest`

			`import xgboost as xgb`
			`from xgboost import testing as tm`
Support doc link for the sklearn module. (#10287) 2024-08-06 02:35:32 +08:00			`from xgboost.core import _parse_version`
BUG: incorrect model_file results in segfault 2015-09-16 21:53:51 +09:00
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`dpath = "demo/data/"`
Added fixed random seed for tests (+1 squashed commit) Squashed commits: [76e3664] Added fixed random seed for tests 2015-10-21 23:24:37 -05:00			`rng = np.random.RandomState(1994)`
last check 2015-07-03 21:27:29 -07:00
BUG: incorrect model_file results in segfault 2015-09-16 21:53:51 +09:00
Use pytest conventions consistently (#6337) * Do not derive from unittest.TestCase (not needed for pytest) * assertRaises -> pytest.raises * Simplify test_empty_dmatrix with test parametrization * setUpClass -> setup_class, tearDownClass -> teardown_class * Don't import unittest; import pytest * Use plain assert * Use parametrized tests in more places * Fix test_gpu_with_sklearn.py * Put back run_empty_dmatrix_reg / run_empty_dmatrix_cls * Fix test_eta_decay_gpu_hist * Add parametrized tests for monotone constraints * Fix test names * Remove test parametrization * Revise test_slice to be not flaky 2020-11-19 17:00:15 -08:00			`class TestBasic:`
Define lazy isinstance for Python compat. (#5364) * Avoid importing datatable. * Fix #5363. 2020-02-26 14:23:33 +08:00			`def test_compat(self):`
			`from xgboost.compat import lazy_isinstance`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00
Define lazy isinstance for Python compat. (#5364) * Avoid importing datatable. * Fix #5363. 2020-02-26 14:23:33 +08:00			`a = np.array([1, 2, 3])`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`assert lazy_isinstance(a, "numpy", "ndarray")`
			`assert not lazy_isinstance(a, "numpy", "dataframe")`
Enable flake8 2016-04-24 16:34:46 +09:00
Cleanup str roundtrip using ctypes 2015-09-16 20:37:19 +09:00			`def test_basic(self):`
[Breaking] Require format to be specified in input URI. (#9077) Previously, we use `libsvm` as default when format is not specified. However, the dmlc data parser is not particularly robust against errors, and the most common type of error is undefined format. Along with which, we will recommend users to use other data loader instead. We will continue the maintenance of the parsers as it's currently used for many internal tests including federated learning. 2023-04-28 19:45:15 +08:00			`dtrain, dtest = tm.load_agaricus(__file__)`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`param = {"max_depth": 2, "eta": 1, "objective": "binary:logistic"}`
Cleanup str roundtrip using ctypes 2015-09-16 20:37:19 +09:00			`# specify validation set to watch performance`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`watchlist = [(dtrain, "train")]`
Cleanup str roundtrip using ctypes 2015-09-16 20:37:19 +09:00			`num_round = 2`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`bst = xgb.train(param, dtrain, num_round, evals=watchlist, verbose_eval=True)`
Move prediction cache to Learner. (#5220) * Move prediction cache into Learner. * Clean-ups - Remove duplicated cache in Learner and GBM. - Remove ad-hoc fix of invalid cache. - Remove `PredictFromCache` in predictors. - Remove prediction cache for linear altogether, as it's only moving the prediction into training process but doesn't provide any actual overall speed gain. - The cache is now unique to Learner, which means the ownership is no longer shared by any other components. * Changes - Add version to prediction cache. - Use weak ptr to check expired DMatrix. - Pass shared pointer instead of raw pointer. 2020-02-14 13:04:23 +08:00
			`preds = bst.predict(dtrain)`
			`labels = dtrain.get_label()`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`err = sum(`
			`1 for i in range(len(preds)) if int(preds[i] > 0.5) != labels[i]`
			`) / float(len(preds))`
Move prediction cache to Learner. (#5220) * Move prediction cache into Learner. * Clean-ups - Remove duplicated cache in Learner and GBM. - Remove ad-hoc fix of invalid cache. - Remove `PredictFromCache` in predictors. - Remove prediction cache for linear altogether, as it's only moving the prediction into training process but doesn't provide any actual overall speed gain. - The cache is now unique to Learner, which means the ownership is no longer shared by any other components. * Changes - Add version to prediction cache. - Use weak ptr to check expired DMatrix. - Pass shared pointer instead of raw pointer. 2020-02-14 13:04:23 +08:00			`# error must be smaller than 10%`
			`assert err < 0.1`

Cleanup str roundtrip using ctypes 2015-09-16 20:37:19 +09:00			`preds = bst.predict(dtest)`
			`labels = dtest.get_label()`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`err = sum(`
			`1 for i in range(len(preds)) if int(preds[i] > 0.5) != labels[i]`
			`) / float(len(preds))`
Cleanup str roundtrip using ctypes 2015-09-16 20:37:19 +09:00			`# error must be smaller than 10%`
			`assert err < 0.1`

Simplify the data backends. (#5893) 2020-07-16 15:17:31 +08:00			`with tempfile.TemporaryDirectory() as tmpdir:`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`dtest_path = os.path.join(tmpdir, "dtest.dmatrix")`
Simplify the data backends. (#5893) 2020-07-16 15:17:31 +08:00			`# save dmatrix into binary buffer`
			`dtest.save_binary(dtest_path)`
			`# save model`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`model_path = os.path.join(tmpdir, "model.ubj")`
Simplify the data backends. (#5893) 2020-07-16 15:17:31 +08:00			`bst.save_model(model_path)`
			`# load model and data in`
			`bst2 = xgb.Booster(model_file=model_path)`
			`dtest2 = xgb.DMatrix(dtest_path)`
			`preds2 = bst2.predict(dtest2)`
			`# assert they are the same`
			`assert np.sum(np.abs(preds2 - preds)) == 0`
Cleanup str roundtrip using ctypes 2015-09-16 20:37:19 +09:00
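
The accuracy checks in `test_basic` threshold each predicted probability at 0.5 and count how many of the resulting classes disagree with the labels. A minimal standalone sketch of that calculation follows, using plain Python lists in place of real booster output. The helper name `error_rate` and the sample values are illustrative assumptions and are not part of the test suite.

```python
from typing import Sequence


def error_rate(
    preds: Sequence[float], labels: Sequence[float], threshold: float = 0.5
) -> float:
    """Fraction of predictions whose thresholded class differs from the label."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    if len(preds) == 0:
        raise ValueError("at least one prediction is required")
    # int(p > threshold) maps a probability to class 0 or 1.
    wrong = sum(1 for p, y in zip(preds, labels) if int(p > threshold) != y)
    return wrong / len(preds)


# Made-up probabilities and labels, only to exercise the helper.
sample_preds = [0.9, 0.2, 0.6, 0.4]
sample_labels = [1.0, 0.0, 0.0, 0.0]
sample_err = error_rate(sample_preds, sample_labels)
```

Only the third sample is misclassified here, so the helper reports one error out of four.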
Move metric configuration into booster. (#6504) 2020-12-16 05:35:04 +08:00			`def test_metric_config(self):`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`# Make sure that the metric configuration happens in booster so the string`
			``# `['error', 'auc']` doesn't get passed down to core.``
[Breaking] Require format to be specified in input URI. (#9077) Previously, we use `libsvm` as default when format is not specified. However, the dmlc data parser is not particularly robust against errors, and the most common type of error is undefined format. Along with which, we will recommend users to use other data loader instead. We will continue the maintenance of the parsers as it's currently used for many internal tests including federated learning. 2023-04-28 19:45:15 +08:00			`dtrain, dtest = tm.load_agaricus(__file__)`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`param = {`
			`"max_depth": 2,`
			`"eta": 1,`
			`"objective": "binary:logistic",`
			`"eval_metric": ["error", "auc"],`
			`}`
			`watchlist = [(dtest, "eval"), (dtrain, "train")]`
Move metric configuration into booster. (#6504) 2020-12-16 05:35:04 +08:00			`num_round = 2`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`booster = xgb.train(param, dtrain, num_round, evals=watchlist)`
Move metric configuration into booster. (#6504) 2020-12-16 05:35:04 +08:00			`predt_0 = booster.predict(dtrain)`
			`with tempfile.TemporaryDirectory() as tmpdir:`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`path = os.path.join(tmpdir, "model.json")`
Move metric configuration into booster. (#6504) 2020-12-16 05:35:04 +08:00			`booster.save_model(path)`

			`booster = xgb.Booster(params=param, model_file=path)`
			`predt_1 = booster.predict(dtrain)`
			`np.testing.assert_allclose(predt_0, predt_1)`

Fix multi-class loading 2016-03-10 19:21:29 -08:00			`def test_multiclass(self):`
[Breaking] Require format to be specified in input URI. (#9077) Previously, we use `libsvm` as default when format is not specified. However, the dmlc data parser is not particularly robust against errors, and the most common type of error is undefined format. Along with which, we will recommend users to use other data loader instead. We will continue the maintenance of the parsers as it's currently used for many internal tests including federated learning. 2023-04-28 19:45:15 +08:00			`dtrain, dtest = tm.load_agaricus(__file__)`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`param = {"max_depth": 2, "eta": 1, "num_class": 2}`
Fix multi-class loading 2016-03-10 19:21:29 -08:00			`# specify validation set to watch performance`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`watchlist = [(dtest, "eval"), (dtrain, "train")]`
Fix multi-class loading 2016-03-10 19:21:29 -08:00			`num_round = 2`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`bst = xgb.train(param, dtrain, num_round, evals=watchlist)`
Fix multi-class loading 2016-03-10 19:21:29 -08:00			`# this is prediction`
			`preds = bst.predict(dtest)`
			`labels = dtest.get_label()`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`err = sum(1 for i in range(len(preds)) if preds[i] != labels[i]) / float(`
			`len(preds)`
			`)`
Fix multi-class loading 2016-03-10 19:21:29 -08:00			`# error must be smaller than 10%`
			`assert err < 0.1`

Fix plotting test. (#6040) Previously the test loads a model generated by `test_basic.py`, now we generate the model explicitly. * Cleanup saved files for basic tests. 2020-08-22 13:18:48 +08:00			`with tempfile.TemporaryDirectory() as tmpdir:`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`dtest_path = os.path.join(tmpdir, "dtest.buffer")`
			`model_path = os.path.join(tmpdir, "model.ubj")`
Fix plotting test. (#6040) Previously the test loads a model generated by `test_basic.py`, now we generate the model explicitly. * Cleanup saved files for basic tests. 2020-08-22 13:18:48 +08:00			`# save dmatrix into binary buffer`
			`dtest.save_binary(dtest_path)`
			`# save model`
			`bst.save_model(model_path)`
			`# load model and data in`
			`bst2 = xgb.Booster(model_file=model_path)`
			`dtest2 = xgb.DMatrix(dtest_path)`
			`preds2 = bst2.predict(dtest2)`
			`# assert they are the same`
			`assert np.sum(np.abs(preds2 - preds)) == 0`
Fix multi-class loading 2016-03-10 19:21:29 -08:00
Add dump_format=json option (#1726) * Add format to the params accepted by DumpModel Currently, only the test format is supported when trying to dump a model. The plan is to add more such formats like JSON which are easy to read and/or parse by machines. And to make the interface for this even more generic to allow other formats to be added. Hence, we make some modifications to make these function generic and accept a new parameter "format" which signifies the format of the dump to be created. * Fix typos and errors in docs * plugin: Mention all the register macros available Document the register macros currently available to the plugin writers so they know what exactly can be extended using hooks. * sparce_page_source: Use same arg name in .h and .cc * gbm: Add JSON dump The dump_format argument can be used to specify what type of dump file should be created. Add functionality to dump gblinear and gbtree into a JSON file. The JSON file has an array, each item is a JSON object for the tree. For gblinear: - The item is the bias and weights vectors For gbtree: - The item is the root node. The root node has a attribute "children" which holds the children nodes. This happens recursively. * core.py: Add arg dump_format for get_dump() 2016-11-04 22:25:25 +05:30			`def test_dump(self):`
			`data = np.random.randn(100, 2)`
			`target = np.array([0, 1] * 50)`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`features = ["Feature1", "Feature2"]`
Add dump_format=json option (#1726) * Add format to the params accepted by DumpModel Currently, only the test format is supported when trying to dump a model. The plan is to add more such formats like JSON which are easy to read and/or parse by machines. And to make the interface for this even more generic to allow other formats to be added. Hence, we make some modifications to make these function generic and accept a new parameter "format" which signifies the format of the dump to be created. * Fix typos and errors in docs * plugin: Mention all the register macros available Document the register macros currently available to the plugin writers so they know what exactly can be extended using hooks. * sparce_page_source: Use same arg name in .h and .cc * gbm: Add JSON dump The dump_format argument can be used to specify what type of dump file should be created. Add functionality to dump gblinear and gbtree into a JSON file. The JSON file has an array, each item is a JSON object for the tree. For gblinear: - The item is the bias and weights vectors For gbtree: - The item is the root node. The root node has a attribute "children" which holds the children nodes. This happens recursively. * core.py: Add arg dump_format for get_dump() 2016-11-04 22:25:25 +05:30
			`dm = xgb.DMatrix(data, label=target, feature_names=features)`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`params = {`
			`"objective": "binary:logistic",`
			`"eval_metric": "logloss",`
			`"eta": 0.3,`
			`"max_depth": 1,`
			`}`
Add dump_format=json option (#1726) * Add format to the params accepted by DumpModel Currently, only the test format is supported when trying to dump a model. The plan is to add more such formats like JSON which are easy to read and/or parse by machines. And to make the interface for this even more generic to allow other formats to be added. Hence, we make some modifications to make these function generic and accept a new parameter "format" which signifies the format of the dump to be created. * Fix typos and errors in docs * plugin: Mention all the register macros available Document the register macros currently available to the plugin writers so they know what exactly can be extended using hooks. * sparce_page_source: Use same arg name in .h and .cc * gbm: Add JSON dump The dump_format argument can be used to specify what type of dump file should be created. Add functionality to dump gblinear and gbtree into a JSON file. The JSON file has an array, each item is a JSON object for the tree. For gblinear: - The item is the bias and weights vectors For gbtree: - The item is the root node. The root node has a attribute "children" which holds the children nodes. This happens recursively. * core.py: Add arg dump_format for get_dump() 2016-11-04 22:25:25 +05:30
			`bst = xgb.train(params, dm, num_boost_round=1)`

			`# the default text dump should contain exactly one tree`
			`dump1 = bst.get_dump()`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`assert len(dump1) == 1, "Expected only 1 tree to be dumped."`
			`assert len(`
			`dump1[0].splitlines()`
			`) == 3, "Expected 1 root and 2 leaves - 3 lines in dump."`
Add dump_format=json option (#1726) * Add format to the params accepted by DumpModel Currently, only the test format is supported when trying to dump a model. The plan is to add more such formats like JSON which are easy to read and/or parse by machines. And to make the interface for this even more generic to allow other formats to be added. Hence, we make some modifications to make these function generic and accept a new parameter "format" which signifies the format of the dump to be created. * Fix typos and errors in docs * plugin: Mention all the register macros available Document the register macros currently available to the plugin writers so they know what exactly can be extended using hooks. * sparce_page_source: Use same arg name in .h and .cc * gbm: Add JSON dump The dump_format argument can be used to specify what type of dump file should be created. Add functionality to dump gblinear and gbtree into a JSON file. The JSON file has an array, each item is a JSON object for the tree. For gblinear: - The item is the bias and weights vectors For gbtree: - The item is the root node. The root node has a attribute "children" which holds the children nodes. This happens recursively. * core.py: Add arg dump_format for get_dump() 2016-11-04 22:25:25 +05:30
			`dump2 = bst.get_dump(with_stats=True)`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`assert (`
			`dump2[0].count("\n") == 3`
			`), "Expected 1 root and 2 leaves - 3 lines in dump."`
			`msg = "Expected more info when with_stats=True is given."`
			`assert dump2[0].find("\n") > dump1[0].find("\n"), msg`
Add dump_format=json option (#1726) * Add format to the params accepted by DumpModel Currently, only the test format is supported when trying to dump a model. The plan is to add more such formats like JSON which are easy to read and/or parse by machines. And to make the interface for this even more generic to allow other formats to be added. Hence, we make some modifications to make these function generic and accept a new parameter "format" which signifies the format of the dump to be created. * Fix typos and errors in docs * plugin: Mention all the register macros available Document the register macros currently available to the plugin writers so they know what exactly can be extended using hooks. * sparce_page_source: Use same arg name in .h and .cc * gbm: Add JSON dump The dump_format argument can be used to specify what type of dump file should be created. Add functionality to dump gblinear and gbtree into a JSON file. The JSON file has an array, each item is a JSON object for the tree. For gblinear: - The item is the bias and weights vectors For gbtree: - The item is the root node. The root node has a attribute "children" which holds the children nodes. This happens recursively. * core.py: Add arg dump_format for get_dump() 2016-11-04 22:25:25 +05:30
			`dump3 = bst.get_dump(dump_format="json")`
			`dump3j = json.loads(dump3[0])`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`assert dump3j["nodeid"] == 0, "Expected the root node on top."`
Add dump_format=json option (#1726) * Add format to the params accepted by DumpModel Currently, only the test format is supported when trying to dump a model. The plan is to add more such formats like JSON which are easy to read and/or parse by machines. And to make the interface for this even more generic to allow other formats to be added. Hence, we make some modifications to make these function generic and accept a new parameter "format" which signifies the format of the dump to be created. * Fix typos and errors in docs * plugin: Mention all the register macros available Document the register macros currently available to the plugin writers so they know what exactly can be extended using hooks. * sparce_page_source: Use same arg name in .h and .cc * gbm: Add JSON dump The dump_format argument can be used to specify what type of dump file should be created. Add functionality to dump gblinear and gbtree into a JSON file. The JSON file has an array, each item is a JSON object for the tree. For gblinear: - The item is the bias and weights vectors For gbtree: - The item is the root node. The root node has a attribute "children" which holds the children nodes. This happens recursively. * core.py: Add arg dump_format for get_dump() 2016-11-04 22:25:25 +05:30
			`dump4 = bst.get_dump(dump_format="json", with_stats=True)`
			`dump4j = json.loads(dump4[0])`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`assert "gain" in dump4j, "Expected 'gain' to be dumped in JSON."`
Add dump_format=json option (#1726) * Add format to the params accepted by DumpModel Currently, only the test format is supported when trying to dump a model. The plan is to add more such formats like JSON which are easy to read and/or parse by machines. And to make the interface for this even more generic to allow other formats to be added. Hence, we make some modifications to make these function generic and accept a new parameter "format" which signifies the format of the dump to be created. * Fix typos and errors in docs * plugin: Mention all the register macros available Document the register macros currently available to the plugin writers so they know what exactly can be extended using hooks. * sparce_page_source: Use same arg name in .h and .cc * gbm: Add JSON dump The dump_format argument can be used to specify what type of dump file should be created. Add functionality to dump gblinear and gbtree into a JSON file. The JSON file has an array, each item is a JSON object for the tree. For gblinear: - The item is the bias and weights vectors For gbtree: - The item is the root node. The root node has a attribute "children" which holds the children nodes. This happens recursively. * core.py: Add arg dump_format for get_dump() 2016-11-04 22:25:25 +05:30
Implement feature score for linear model. (#7048) * Add feature score support for linear model. * Port R interface to the new implementation. * Add linear model support in Python. Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu> 2021-06-25 14:34:02 +08:00			`with pytest.raises(ValueError):`
			`bst.get_dump(fmap="foo")`
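
The commit message for `dump_format="json"` describes each dumped tree as a root node whose `"children"` attribute holds the child nodes, recursively, and the test checks that the root carries `"nodeid" == 0`. A small sketch of walking such a structure is shown here. The sample string is hand-written rather than produced by a booster: the `"leaf"`, `"split"`, `"yes"`, `"no"` and `"missing"` keys and all numeric values are assumptions for illustration, and `count_nodes` and `collect_leaves` are hypothetical helpers.

```python
import json

# Hand-written sample shaped like one entry of a JSON dump for a depth-1 tree.
# Keys other than "nodeid" and "children" and all values are assumptions.
SAMPLE_TREE = """
{
  "nodeid": 0,
  "split": "Feature1",
  "split_condition": 0.5,
  "yes": 1,
  "no": 2,
  "missing": 1,
  "children": [
    {"nodeid": 1, "leaf": 0.1},
    {"nodeid": 2, "leaf": -0.1}
  ]
}
"""


def count_nodes(node: dict) -> int:
    """Count a node plus all of its descendants."""
    return 1 + sum(count_nodes(child) for child in node.get("children", []))


def collect_leaves(node: dict) -> list:
    """Return the leaf values in left-to-right order."""
    if "leaf" in node:
        return [node["leaf"]]
    leaves = []
    for child in node.get("children", []):
        leaves.extend(collect_leaves(child))
    return leaves


root = json.loads(SAMPLE_TREE)
total_nodes = count_nodes(root)
leaf_values = collect_leaves(root)
```

For a depth-1 tree this yields one root and two leaves, which matches the three-line expectation in the text-format assertions.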

Implement feature score in GBTree. (#7041) * Categorical data support. * Eliminate text parsing during feature score computation. 2021-06-18 11:53:16 +08:00			`def test_feature_score(self):`
			`rng = np.random.RandomState(0)`
			`data = rng.randn(100, 2)`
			`target = np.array([0, 1] * 50)`
			`features = ["F0"]`
			`with pytest.raises(ValueError):`
			`xgb.DMatrix(data, label=target, feature_names=features)`

			`params = {"objective": "binary:logistic"}`
			`dm = xgb.DMatrix(data, label=target, feature_names=["F0", "F1"])`
			`booster = xgb.train(params, dm, num_boost_round=1)`
			`# no error since feature names might be assigned before the booster sees data`
			`# and the booster doesn't know the actual number of features.`
			`booster.feature_names = ["F0"]`
			`with pytest.raises(ValueError):`
			`booster.get_fscore()`

Convert numpy float to Python float in feat score. (#7047) 2021-06-21 20:58:43 +08:00			`booster.feature_names = None`
			`# Use JSON to make sure the output has native Python type`
			`scores = json.loads(json.dumps(booster.get_fscore()))`
			`np.testing.assert_allclose(scores["f0"], 6.0)`

BUG: incorrect model_file results in segfault 2015-09-16 21:53:51 +09:00			`def test_load_file_invalid(self):`
Use pytest conventions consistently (#6337) * Do not derive from unittest.TestCase (not needed for pytest) * assertRaises -> pytest.raises * Simplify test_empty_dmatrix with test parametrization * setUpClass -> setup_class, tearDownClass -> teardown_class * Don't import unittest; import pytest * Use plain assert * Use parametrized tests in more places * Fix test_gpu_with_sklearn.py * Put back run_empty_dmatrix_reg / run_empty_dmatrix_cls * Fix test_eta_decay_gpu_hist * Add parametrized tests for monotone constraints * Fix test names * Remove test parametrization * Revise test_slice to be not flaky 2020-11-19 17:00:15 -08:00			`with pytest.raises(xgb.core.XGBoostError):`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`xgb.Booster(model_file="incorrect_path")`
BUG: incorrect model_file results in segfault 2015-09-16 21:53:51 +09:00
Use pytest conventions consistently (#6337) * Do not derive from unittest.TestCase (not needed for pytest) * assertRaises -> pytest.raises * Simplify test_empty_dmatrix with test parametrization * setUpClass -> setup_class, tearDownClass -> teardown_class * Don't import unittest; import pytest * Use plain assert * Use parametrized tests in more places * Fix test_gpu_with_sklearn.py * Put back run_empty_dmatrix_reg / run_empty_dmatrix_cls * Fix test_eta_decay_gpu_hist * Add parametrized tests for monotone constraints * Fix test names * Remove test parametrization * Revise test_slice to be not flaky 2020-11-19 17:00:15 -08:00			`with pytest.raises(xgb.core.XGBoostError):`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`xgb.Booster(model_file="不正なパス")`
Fix numpy array check logic 2015-09-16 20:47:37 +09:00
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`@pytest.mark.parametrize(`
			`"path", ["모델.ubj", "がうる・ぐら.json"], ids=["path-0", "path-1"]`
			`)`
Handle UTF-8 paths correctly on Windows platform (#9443) * Fix round-trip serialization with UTF-8 paths * Add compiler version check * Add comment to C API functions * Add Python tests * [CI] Update MacOS deployment target * Use std::filesystem instead of dmlc::TemporaryDirectory 2023-08-07 23:27:25 -07:00			`def test_unicode_path(self, tmpdir, path):`
			`model_path = pathlib.Path(tmpdir) / path`
			`dtrain, _ = tm.load_agaricus(__file__)`
			`param = {"max_depth": 2, "eta": 1, "objective": "binary:logistic"}`
			`bst = xgb.train(param, dtrain, num_boost_round=2)`
			`bst.save_model(model_path)`

			`bst2 = xgb.Booster(model_file=model_path)`
			`assert bst.get_dump(dump_format="text") == bst2.get_dump(dump_format="text")`

Multi-threaded XGDMatrixCreateFromMat for faster DMatrix creation (#2530) * Multi-threaded XGDMatrixCreateFromMat for faster DMatrix creation from numpy arrays for python interface. 2017-07-20 19:43:17 -07:00			`def test_dmatrix_numpy_init_omp(self):`
			`rows = [1000, 11326, 15000]`
			`cols = 50`
			`for row in rows:`
			`X = np.random.randn(row, cols)`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`y = np.random.randn(row).astype("f")`
Multi-threaded XGDMatrixCreateFromMat for faster DMatrix creation (#2530) * Multi-threaded XGDMatrixCreateFromMat for faster DMatrix creation from numpy arrays for python interface. 2017-07-20 19:43:17 -07:00			`dm = xgb.DMatrix(X, y, nthread=0)`
			`np.testing.assert_array_equal(dm.get_label(), y)`
			`assert dm.num_row() == row`
			`assert dm.num_col() == cols`

			`dm = xgb.DMatrix(X, y, nthread=10)`
			`np.testing.assert_array_equal(dm.get_label(), y)`
			`assert dm.num_row() == row`
			`assert dm.num_col() == cols`

CV returns ndarray or DataFrame 2015-10-02 21:56:35 +09:00			`def test_cv(self):`
[Breaking] Require format to be specified in input URI. (#9077) Previously, we use `libsvm` as default when format is not specified. However, the dmlc data parser is not particularly robust against errors, and the most common type of error is undefined format. Along with which, we will recommend users to use other data loader instead. We will continue the maintenance of the parsers as it's currently used for many internal tests including federated learning. 2023-04-28 19:45:15 +08:00			`dm, _ = tm.load_agaricus(__file__)`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`params = {"max_depth": 2, "eta": 1, "objective": "binary:logistic"}`
CV returns ndarray or DataFrame 2015-10-02 21:56:35 +09:00
			`# return np.ndarray`
			`cv = xgb.cv(params, dm, num_boost_round=10, nfold=10, as_pandas=False)`
[PYTHON] Refactor training API to use callback 2016-05-19 17:47:11 -07:00			`assert isinstance(cv, dict)`
			`assert len(cv) == (4)`
option to shuffle data in mknfolds (#1459) * option to shuffle data in mknfolds * removed possibility to run as stand alone test * split function def in 2 lines for lint * option to shuffle data in mknfolds * removed possibility to run as stand alone test * split function def in 2 lines for lint 2016-12-22 17:53:30 -06:00
			`def test_cv_no_shuffle(self):`
[Breaking] Require format to be specified in input URI. (#9077) Previously, we use `libsvm` as default when format is not specified. However, the dmlc data parser is not particularly robust against errors, and the most common type of error is undefined format. Along with which, we will recommend users to use other data loader instead. We will continue the maintenance of the parsers as it's currently used for many internal tests including federated learning. 2023-04-28 19:45:15 +08:00			`dm, _ = tm.load_agaricus(__file__)`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`params = {"max_depth": 2, "eta": 1, "objective": "binary:logistic"}`
option to shuffle data in mknfolds (#1459) * option to shuffle data in mknfolds * removed possibility to run as stand alone test * split function def in 2 lines for lint * option to shuffle data in mknfolds * removed possibility to run as stand alone test * split function def in 2 lines for lint 2016-12-22 17:53:30 -06:00
			`# return np.ndarray`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`cv = xgb.cv(`
			`params, dm, num_boost_round=10, shuffle=False, nfold=10, as_pandas=False`
			`)`
option to shuffle data in mknfolds (#1459) * option to shuffle data in mknfolds * removed possibility to run as stand alone test * split function def in 2 lines for lint * option to shuffle data in mknfolds * removed possibility to run as stand alone test * split function def in 2 lines for lint 2016-12-22 17:53:30 -06:00			`assert isinstance(cv, dict)`
			`assert len(cv) == (4)`
allow arbitrary cross validation fold indices (#3353) * allow arbitrary cross validation fold indices - use training indices passed to `folds` parameter in `training.cv` - update doc string * add tests for arbitrary fold indices 2018-06-30 20:23:49 +01:00
			`def test_cv_explicit_fold_indices(self):`
[Breaking] Require format to be specified in input URI. (#9077) Previously, we use `libsvm` as default when format is not specified. However, the dmlc data parser is not particularly robust against errors, and the most common type of error is undefined format. Along with which, we will recommend users to use other data loader instead. We will continue the maintenance of the parsers as it's currently used for many internal tests including federated learning. 2023-04-28 19:45:15 +08:00			`dm, _ = tm.load_agaricus(__file__)`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`params = {"max_depth": 2, "eta": 1, "objective": "binary:logistic"}`
allow arbitrary cross validation fold indices (#3353) * allow arbitrary cross validation fold indices - use training indices passed to `folds` parameter in `training.cv` - update doc string * add tests for arbitrary fold indices 2018-06-30 20:23:49 +01:00			`folds = [`
			`# (train indices, test indices)`
			`([1, 3], [5, 8]),`
			`([7, 9], [23, 43]),`
			`]`

			`# with as_pandas=False, xgb.cv returns a dict`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`cv = xgb.cv(params, dm, num_boost_round=10, folds=folds, as_pandas=False)`
allow arbitrary cross validation fold indices (#3353) * allow arbitrary cross validation fold indices - use training indices passed to `folds` parameter in `training.cv` - update doc string * add tests for arbitrary fold indices 2018-06-30 20:23:49 +01:00			`assert isinstance(cv, dict)`
			`assert len(cv) == 4`

			`def test_cv_explicit_fold_indices_labels(self):`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`params = {"max_depth": 2, "eta": 1, "objective": "reg:squarederror"}`
allow arbitrary cross validation fold indices (#3353) * allow arbitrary cross validation fold indices - use training indices passed to `folds` parameter in `training.cv` - update doc string * add tests for arbitrary fold indices 2018-06-30 20:23:49 +01:00			`N = 100`
			`F = 3`
			`dm = xgb.DMatrix(data=np.random.randn(N, F), label=np.arange(N))`
			`folds = [`
			`# (train indices, test indices)`
			`([1, 3], [5, 8]),`
			`([7, 9], [23, 43, 11]),`
			`]`

			`# Use callback to log the test labels in each fold`
Remove old callback deprecated in 1.3. (#7280) 2021-10-08 17:24:59 +08:00			`class Callback(xgb.callback.TrainingCallback):`
			`def __init__(self) -> None:`
			`super().__init__()`

			`def after_iteration(`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`self,`
			`model,`
Remove old callback deprecated in 1.3. (#7280) 2021-10-08 17:24:59 +08:00			`epoch: int,`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`evals_log: xgb.callback.TrainingCallback.EvalsLog,`
Remove old callback deprecated in 1.3. (#7280) 2021-10-08 17:24:59 +08:00			`):`
			`print([fold.dtest.get_label() for fold in model.cvfolds])`

			`cb = Callback()`
allow arbitrary cross validation fold indices (#3353) * allow arbitrary cross validation fold indices - use training indices passed to `folds` parameter in `training.cv` - update doc string * add tests for arbitrary fold indices 2018-06-30 20:23:49 +01:00
			`# Run cross validation and capture standard out to test callback result`
Add period to evaluation monitor. (#6348) 2020-11-10 07:47:48 +08:00			`with tm.captured_output() as (out, err):`
allow arbitrary cross validation fold indices (#3353) * allow arbitrary cross validation fold indices - use training indices passed to `folds` parameter in `training.cv` - update doc string * add tests for arbitrary fold indices 2018-06-30 20:23:49 +01:00			`xgb.cv(`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`params,`
			`dm,`
			`num_boost_round=1,`
			`folds=folds,`
			`callbacks=[cb],`
			`as_pandas=False,`
allow arbitrary cross validation fold indices (#3353) * allow arbitrary cross validation fold indices - use training indices passed to `folds` parameter in `training.cv` - update doc string * add tests for arbitrary fold indices 2018-06-30 20:23:49 +01:00			`)`
			`output = out.getvalue().strip()`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`solution = (`
			`"[array([5., 8.], dtype=float32), array([23., 43., 11.],"`
			`+ " dtype=float32)]"`
			`)`
Refactor Python tests. (#3897) * Deprecate nose tests. * Format python tests. 2018-11-15 13:56:33 +13:00			`assert output == solution`
Fix get_uint_info() (#3442) * Add regression test 2018-07-05 20:06:59 -07:00
add os.PathLike support for file paths to DMatrix and Booster Python classes (#4757) 2019-08-15 04:46:25 -04:00
Use pytest conventions consistently (#6337) * Do not derive from unittest.TestCase (not needed for pytest) * assertRaises -> pytest.raises * Simplify test_empty_dmatrix with test parametrization * setUpClass -> setup_class, tearDownClass -> teardown_class * Don't import unittest; import pytest * Use plain assert * Use parametrized tests in more places * Fix test_gpu_with_sklearn.py * Put back run_empty_dmatrix_reg / run_empty_dmatrix_cls * Fix test_eta_decay_gpu_hist * Add parametrized tests for monotone constraints * Fix test names * Remove test parametrization * Revise test_slice to be not flaky 2020-11-19 17:00:15 -08:00			`class TestBasicPathLike:`
Cleanup Python code. (#6223) * Remove pathlike as XGBoost 1.2 requires Python 3.6. * Move conditional import of dask/distributed into dask module. 2020-10-12 15:44:41 +08:00			`"""Unit tests using pathlib.Path for file interaction."""`
add os.PathLike support for file paths to DMatrix and Booster Python classes (#4757) 2019-08-15 04:46:25 -04:00
			`def test_DMatrix_init_from_path(self):`
			`"""Initialization from the data path."""`
[Breaking] Require format to be specified in input URI. (#9077) Previously, we use `libsvm` as default when format is not specified. However, the dmlc data parser is not particularly robust against errors, and the most common type of error is undefined format. Along with which, we will recommend users to use other data loader instead. We will continue the maintenance of the parsers as it's currently used for many internal tests including federated learning. 2023-04-28 19:45:15 +08:00			`dtrain, _ = tm.load_agaricus(__file__)`
add os.PathLike support for file paths to DMatrix and Booster Python classes (#4757) 2019-08-15 04:46:25 -04:00			`assert dtrain.num_row() == 6513`
			`assert dtrain.num_col() == 127`

			`def test_DMatrix_save_to_path(self):`
			`"""Saving to a binary file using pathlib from a DMatrix."""`
			`data = np.random.randn(100, 2)`
			`target = np.array([0, 1] * 50)`
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00			`features = ["Feature1", "Feature2"]`
add os.PathLike support for file paths to DMatrix and Booster Python classes (#4757) 2019-08-15 04:46:25 -04:00
			`dm = xgb.DMatrix(data, label=target, feature_names=features)`

			`# save, assert exists, remove file`
			`binary_path = Path("dtrain.bin")`
			`dm.save_binary(binary_path)`
			`assert binary_path.exists()`
			`binary_path.unlink()`

			`def test_Booster_init_invalid_path(self):`
			`"""An invalid model_file path should raise XGBoostError."""`
Use pytest conventions consistently (#6337) * Do not derive from unittest.TestCase (not needed for pytest) * assertRaises -> pytest.raises * Simplify test_empty_dmatrix with test parametrization * setUpClass -> setup_class, tearDownClass -> teardown_class * Don't import unittest; import pytest * Use plain assert * Use parametrized tests in more places * Fix test_gpu_with_sklearn.py * Put back run_empty_dmatrix_reg / run_empty_dmatrix_cls * Fix test_eta_decay_gpu_hist * Add parametrized tests for monotone constraints * Fix test names * Remove test parametrization * Revise test_slice to be not flaky 2020-11-19 17:00:15 -08:00			`with pytest.raises(xgb.core.XGBoostError):`
			`xgb.Booster(model_file=Path("invalidpath"))`
Support doc link for the sklearn module. (#10287) 2024-08-06 02:35:32 +08:00

			`def test_parse_ver() -> None:`
			`(major, minor, patch), post = _parse_version("2.1.0")`
			`assert post == ""`
			`(major, minor, patch), post = _parse_version("2.1.0-dev")`
			`assert post == "dev"`
			`(major, minor, patch), post = _parse_version("2.1.0rc1")`
			`assert post == "rc1"`
			`(major, minor, patch), post = _parse_version("2.1.0.post1")`
			`assert post == "post1"`