{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Intro\n", "\n", "TPOT gives the user a lot of options for customizing the search space, from hyperparameter ranges to model selection to pipeline configuration. TPOT is able to select models, optimize their hyperparameters, and build a complex pipeline structure. Each level of detail has multiple customization options. This tutorial will first explore how to set up a hyperparameter search space for a single method. Next, we will describe how to set up simultaneous model selection and hyperparameter tuning. Finally, we will cover how to utilize these steps to configure a search space for a fixed pipeline of multiple steps, as well as having TPOT optimize the pipeline structure itself.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Hyperparameter Search Spaces with ConfigSpace\n", "\n", "Hyperparameter search spaces are defined using the [ConfigSpace package found here](https://github.com/automl/ConfigSpace). More information on how to set up a hyperparameter space can be found in their [documentation here](https://automl.github.io/ConfigSpace/main/guide.html).\n", "\n", "TPOT uses `ConfigSpace.ConfigurationSpace` objects to define the hyperparameter search space for individual models. This object can be used to keep track of the desired hyperparameters as well as provide functions for random sampling from this space.\n", "\n", "In short, you can use the `Integer`, `Float`, and `Categorical` functions of `ConfigSpace` to define a range of values used for each param. Alternatively, a tuple with (min,max) ints or floats can be used to specify an int/float search space and a list can be used to specify a categorical search space. A fixed value can also be provided for parameters that are not tunned. 
The `space` parameter of `ConfigurationSpace` takes a dictionary mapping parameter names to these ranges.\n", "\n", "Note: If you want reproducible results, you need to set a fixed random_state in the search space.\n", "\n", "Here is an example of a hyperparameter search space for RandomForestClassifier:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "sampled hyperparameters\n", "{'bootstrap': True, 'criterion': 'gini', 'max_features': 0.8874647037836, 'min_samples_leaf': 2, 'min_samples_split': 5, 'n_estimators': 128}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", " from .autonotebook import tqdm as notebook_tqdm\n" ] }, { "data": { "text/html": [ "
RandomForestClassifier(max_features=0.8874647037836, min_samples_leaf=2,
                       min_samples_split=5, n_estimators=128)
RandomForestClassifier(bootstrap=False, criterion='entropy',
                       max_features=0.8418685817308, min_samples_leaf=5,
                       n_estimators=128)
KNeighborsClassifier(n_jobs=1, n_neighbors=3, weights='distance')
KNeighborsClassifier(n_neighbors=10)
KNeighborsClassifier(n_jobs=1, n_neighbors=3)
KNeighborsClassifier(metric='euclidean', n_jobs=1, n_neighbors=9)
KNeighborsClassifier(n_jobs=1, n_neighbors=15, p=1, weights='distance')
LogisticRegression(C=5.9018435257131, max_iter=1000, n_jobs=1, solver='saga')
SGDClassifier(alpha=0.0007786971309, class_weight='balanced',
              eta0=0.0209976430718, l1_ratio=0.8571538017043,
              learning_rate='constant', loss='modified_huber', n_jobs=1,
              penalty='elasticnet')
BernoulliNB(alpha=0.0667141454883, fit_prior=False)
RandomForestClassifier(bootstrap=False, max_features=0.0234127070363,
                       min_samples_leaf=3, min_samples_split=8,
                       n_estimators=128, n_jobs=1, random_state=1)
Pipeline(steps=[('variancethreshold',
                 VarianceThreshold(threshold=0.00023551581)),
                ('pca', PCA(n_components=0.9764631370244)),
                ('logisticregression',
                 LogisticRegression(C=1.9396611393109, max_iter=1000, n_jobs=1,
                                    penalty='l1', solver='saga'))])
Pipeline(steps=[('variancethreshold',
                 VarianceThreshold(threshold=0.0004317798946)),
                ('kbinsdiscretizer',
                 KBinsDiscretizer(encode='onehot-dense', n_bins=77)),
                ('lgbmclassifier',
                 LGBMClassifier(boosting_type='dart', max_depth=5,
                                n_estimators=76, n_jobs=1, num_leaves=192,
                                verbose=-1))])
Pipeline(steps=[('selectpercentile',
                 SelectPercentile(percentile=4.5788544361168)),
                ('columnonehotencoder', ColumnOneHotEncoder()),
                ('decisiontreeclassifier',
                 DecisionTreeClassifier(criterion='entropy', max_depth=10,
                                        min_samples_split=13))])
Pipeline(steps=[('pca-1', PCA(n_components=0.6376571946485)),
                ('pca-2', PCA(n_components=0.7836827180307)),
                ('quantiletransformer',
                 QuantileTransformer(n_quantiles=334,
                                     output_distribution='normal'))])
Pipeline(steps=[('selectfwe', SelectFwe(alpha=0.0004164619371)),
                ('binarizer', Binarizer(threshold=0.2392693027442)),
                ('rbfsampler',
                 RBFSampler(gamma=0.3669672326084, n_components=35))])
Pipeline(steps=[('pipeline',
                 Pipeline(steps=[('binarizer',
                                  Binarizer(threshold=0.2150677779496)),
                                 ('maxabsscaler', MaxAbsScaler()),
                                 ('columnonehotencoder',
                                  ColumnOneHotEncoder())])),
                ('gaussiannb', GaussianNB())])
Pipeline(steps=[('pipeline',
                 Pipeline(steps=[('zerocount', ZeroCount()),
                                 ('selectfrommodel',
                                  SelectFromModel(estimator=ExtraTreesClassifier(class_weight='balanced',
                                                                                 max_features=0.1619832293406,
                                                                                 min_samples_leaf=7,
                                                                                 min_samples_split=7,
                                                                                 n_jobs=1),
                                                  threshold=0.6414209870839)),
                                 ('variancethreshold',
                                  VarianceThreshold(threshold=0.0113542845765))])),
                ('multinomialnb', MultinomialNB(alpha=0.0815128367119))])
FeatureUnion(transformer_list=[('pca', PCA(n_components=0.7674007136568)),
                               ('passthrough', Passthrough())])
Pipeline(steps=[('selectpercentile',
                 SelectPercentile(percentile=29.1049436421441)),
                ('featureunion',
                 FeatureUnion(transformer_list=[('powertransformer',
                                                 PowerTransformer()),
                                                ('passthrough',
                                                 Passthrough())])),
                ('extratreesclassifier',
                 ExtraTreesClassifier(max_features=0.8376611419015,
                                      min_samples_leaf=9, min_samples_split=17,
                                      n_jobs=1))])
Pipeline(steps=[('featureunion',
                 FeatureUnion(transformer_list=[('pipeline-1',
                                                 Pipeline(steps=[('selectfwe',
                                                                  SelectFwe(alpha=0.0080564930162)),
                                                                 ('quantiletransformer',
                                                                  QuantileTransformer(n_quantiles=450,
                                                                                      output_distribution='normal'))])),
                                                ('pipeline-2',
                                                 Pipeline(steps=[('variancethreshold',
                                                                  VarianceThreshold(threshold=0.155443085484)),
                                                                 ('columnonehotencoder...
                               feature_types=None, gamma=14.5866790094856,
                               grow_policy=None, importance_type=None,
                               interaction_constraints=None,
                               learning_rate=0.2226908938347, max_bin=None,
                               max_cat_threshold=None, max_cat_to_onehot=None,
                               max_delta_step=None, max_depth=11,
                               max_leaves=None, min_child_weight=3, missing=nan,
                               monotone_constraints=None, multi_strategy=None,
                               n_estimators=100, n_jobs=1, nthread=1,
                               num_parallel_tree=None, ...))])
FeatureUnion(transformer_list=[('pipeline-1',
                                Pipeline(steps=[('selectfwe',
                                                 SelectFwe(alpha=0.0080564930162)),
                                                ('quantiletransformer',
                                                 QuantileTransformer(n_quantiles=450,
                                                                     output_distribution='normal'))])),
                               ('pipeline-2',
                                Pipeline(steps=[('variancethreshold',
                                                 VarianceThreshold(threshold=0.155443085484)),
                                                ('columnonehotencoder',
                                                 ColumnOneHotEncoder())]))])
XGBClassifier(base_score=None, booster=None, callbacks=None,
              colsample_bylevel=None, colsample_bynode=None,
              colsample_bytree=None, device=None, early_stopping_rounds=None,
              enable_categorical=False, eval_metric=None, feature_types=None,
              gamma=14.5866790094856, grow_policy=None, importance_type=None,
              interaction_constraints=None, learning_rate=0.2226908938347,
              max_bin=None, max_cat_threshold=None, max_cat_to_onehot=None,
              max_delta_step=None, max_depth=11, max_leaves=None,
              min_child_weight=3, missing=nan, monotone_constraints=None,
              multi_strategy=None, n_estimators=100, n_jobs=1, nthread=1,
              num_parallel_tree=None, ...)
FeatureUnion(transformer_list=[('fastica', FastICA(n_components=4))])
FeatureUnion(transformer_list=[('featureunion',
                                FeatureUnion(transformer_list=[('pca',
                                                                PCA(n_components=0.9386236966835)),
                                                               ('zerocount',
                                                                ZeroCount()),
                                                               ('featureagglomeration',
                                                                FeatureAgglomeration(n_clusters=94,
                                                                                     pooling_func=<function max at 0x1048f3470>))])),
                               ('passthrough', Passthrough())])
Pipeline(steps=[('variancethreshold',
                 VarianceThreshold(threshold=0.0003352949622)),
                ('featureunion',
                 FeatureUnion(transformer_list=[('featureunion',
                                                 FeatureUnion(transformer_list=[('featureagglomeration',
                                                                                 FeatureAgglomeration(linkage='complete',
                                                                                                      metric='cosine',
                                                                                                      n_clusters=25)),
                                                                                ('columnordinalencoder',
                                                                                 ColumnOrdinalEncoder())])),
                                                ('passthrough',
                                                 Passthrough())])),
                ('mlpclassifier',
                 MLPClassifier(activation='identity', alpha=0.000256185492,
                               early_stopping=True,
                               hidden_layer_sizes=[146, 146, 146],
                               learning_rate='invscaling',
                               learning_rate_init=0.0006442167601,
                               n_iter_no_change=32))])
ExtraTreesClassifier(class_weight='balanced', max_features=0.9851993193336,
                     min_samples_leaf=5, min_samples_split=6, n_jobs=1)
SelectFromModel(estimator=ExtraTreesClassifier(max_features=0.277440186742,
                                               min_samples_leaf=9,
                                               min_samples_split=17, n_jobs=1),
                threshold=0.0032005860778)
EstimatorTransformer(estimator=MLPClassifier(alpha=0.000648285661,
                                             hidden_layer_sizes=[380],
                                             learning_rate='invscaling',
                                             learning_rate_init=0.0008851810314,
                                             n_iter_no_change=32))
Pipeline(steps=[('robustscaler',
                 RobustScaler(quantile_range=(0.2632669052042,
                                              0.892009308738))),
                ('featureunion-1',
                 FeatureUnion(transformer_list=[('featureunion',
                                                 FeatureUnion(transformer_list=[('columnonehotencoder',
                                                                                 ColumnOneHotEncoder()),
                                                                                ('kbinsdiscretizer',
                                                                                 KBinsDiscretizer(encode='onehot-dense',
                                                                                                  n_bins=58,
                                                                                                  strategy='kmeans'))])),
                                                ('passthrough',
                                                 Passth...
                                      estimator=LogisticRegression(C=334.8557628287718,
                                                                   max_iter=1000,
                                                                   n_jobs=1,
                                                                   solver='saga'),
                                      method='predict')),
                ('estimatortransformer-2',
                 EstimatorTransformer(cross_val_predict_cv=10,
                                      estimator=QuadraticDiscriminantAnalysis(reg_param=0.0011738914966),
                                      method='predict'))])),
                ('passthrough',
                 Passthrough())])),
                ('lineardiscriminantanalysis', LinearDiscriminantAnalysis())])
FeatureUnion(transformer_list=[('featureunion',
                                FeatureUnion(transformer_list=[('estimatortransformer-1',
                                                                EstimatorTransformer(cross_val_predict_cv=10,
                                                                                     estimator=LogisticRegression(C=334.8557628287718,
                                                                                                                  max_iter=1000,
                                                                                                                  n_jobs=1,
                                                                                                                  solver='saga'),
                                                                                     method='predict')),
                                                               ('estimatortransformer-2',
                                                                EstimatorTransformer(cross_val_predict_cv=10,
                                                                                     estimator=QuadraticDiscriminantAnalysis(reg_param=0.0011738914966),
                                                                                     method='predict'))])),
                               ('passthrough', Passthrough())])
FeatureUnion(transformer_list=[('featureunion',
                                FeatureUnion(transformer_list=[('kbinsdiscretizer',
                                                                KBinsDiscretizer(encode='onehot-dense',
                                                                                 n_bins=80,
                                                                                 strategy='kmeans')),
                                                               ('fastica',
                                                                FastICA(algorithm='deflation',
                                                                        n_components=91))])),
                               ('passthrough', Passthrough())])
FeatureUnion(transformer_list=[('skiptransformer', SkipTransformer()),
                               ('passthrough', Passthrough())])
FeatureUnion(transformer_list=[('skiptransformer', SkipTransformer()),
                               ('passthrough', Passthrough())])
Pipeline(steps=[('normalizer', Normalizer(norm='l1')),
                ('featureunion-1',
                 FeatureUnion(transformer_list=[('skiptransformer',
                                                 SkipTransformer()),
                                                ('passthrough',
                                                 Passthrough())])),
                ('featureunion-2',
                 FeatureUnion(transformer_list=[('skiptransformer',
                                                 SkipTransformer()),
                                                ('passthrough',
                                                 Passthrough())])),
                ('bernoullinb',
                 BernoulliNB(alpha=5.0573782838899, fit_prior=False))])
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('rfe',
                 RFE(estimator=ExtraTreesClassifier(max_features=0.8009842720563,
                                                    min_samples_leaf=4,
                                                    min_samples_split=9,
                                                    n_jobs=1),
                     step=0.4315847507401)),
                ('featureunion-1',
                 FeatureUnion(transformer_list=[('skiptransformer',
                                                 SkipTransformer()),
                                                ('passthrough',
                                                 Passthrough())])),
                ('featureunion-2',
                 FeatureUnion(tran...
                                                    max_features='sqrt',
                                                    min_samples_leaf=17,
                                                    min_samples_split=8))),
                ('estimatortransformer-2',
                 EstimatorTransformer(cross_val_predict_cv=5,
                                      estimator=LGBMClassifier(max_depth=3,
                                                               n_estimators=84,
                                                               n_jobs=1,
                                                               num_leaves=244,
                                                               verbose=-1)))])),
                ('passthrough',
                 Passthrough())])),
                ('lineardiscriminantanalysis',
                 LinearDiscriminantAnalysis(shrinkage=0.369619691802,
                                            solver='eigen'))])
FeatureUnion(transformer_list=[('featureunion',
                                FeatureUnion(transformer_list=[('estimatortransformer-1',
                                                                EstimatorTransformer(cross_val_predict_cv=5,
                                                                                     estimator=DecisionTreeClassifier(criterion='entropy',
                                                                                                                      max_depth=1,
                                                                                                                      max_features='sqrt',
                                                                                                                      min_samples_leaf=17,
                                                                                                                      min_samples_split=8))),
                                                               ('estimatortransformer-2',
                                                                EstimatorTransformer(cross_val_predict_cv=5,
                                                                                     estimator=LGBMClassifier(max_depth=3,
                                                                                                              n_estimators=84,
                                                                                                              n_jobs=1,
                                                                                                              num_leaves=244,
                                                                                                              verbose=-1)))])),
                               ('passthrough', Passthrough())])
TPOTEstimator(classification=True, cv=5, early_stop=2, max_time_mins=10,\n",
" n_jobs=4,\n",
" scorers=['roc_auc_ovr',\n",
" <function complexity_scorer at 0x32f4e0550>],\n",
" scorers_weights=[1.0, -1.0],\n",
" search_space=<tpot.search_spaces.pipelines.sequential.SequentialPipeline object at 0x32f7692d0>,\n",
"              verbose=2)
Pipeline(steps=[('selectfwe', SelectFwe(alpha=0.0001569023321)),\n",
" ('powertransformer', PowerTransformer()),\n",
" ('mlpclassifier',\n",
" MLPClassifier(activation='identity', alpha=0.0008696190619,\n",
" hidden_layer_sizes=[203, 203],\n",
" learning_rate_init=0.0135276110446,\n",
"                             n_iter_no_change=32))])
SimpleImputer()
TPOTEstimator(classification=True, early_stop=2, max_eval_time_mins=300,\n",
" max_time_mins=10, n_jobs=20, objective_function_names=['rmse'],\n",
" other_objective_functions=[functools.partial(<function rmse_obective at 0x33edfd480>, X=array([[ 0.03807591, 0.05068012, 0.06169621, ..., -0.00259226,\n",
" 0.01990749, -0.01764613],\n",
" [-0.00188202, -0.04464164, -0.05147406, ..., -0.03949338,\n",
" -...\n",
" -0.04688253, 0.01549073],\n",
" [-0.04547248, -0.04464164, 0.03906215, ..., 0.02655962,\n",
" 0.04452873, -0.02593034],\n",
" [-0.04547248, -0.04464164, -0.0730303 , ..., -0.03949338,\n",
" -0.00422151, 0.00306441]]), missing_add=0.2)],\n",
" other_objective_functions_weights=[-1], scorers=[],\n",
" scorers_weights=[],\n",
" search_space=<tpot.search_spaces.pipelines.choice.ChoicePipeline object at 0x36ff3e770>,\n",
"              verbose=3)
IterativeImputer(estimator=ExtraTreesRegressor(criterion='friedman_mse',\n",
" max_features=0.6404215718013,\n",
" min_samples_leaf=2,\n",
" min_samples_split=10, n_jobs=1),\n",
"                 imputation_order='arabic', n_nearest_features=9)
TPOTEstimator(classification=True, cv=5, max_eval_time_mins=300,\n",
" max_time_mins=10, n_jobs=20, scorers=['roc_auc'],\n",
" scorers_weights=[1],\n",
" search_space=<tpot.search_spaces.pipelines.sequential.SequentialPipeline object at 0x35798c880>,\n",
"              verbose=2)
Pipeline(steps=[('maxabsscaler', MaxAbsScaler()),\n",
" ('selectfwe', SelectFwe(alpha=0.0004883916878)),\n",
" ('featureunion-1',\n",
" FeatureUnion(transformer_list=[('featureunion',\n",
" FeatureUnion(transformer_list=[('powertransformer',\n",
" PowerTransformer())])),\n",
" ('passthrough',\n",
" Passthrough())])),\n",
" ('featureunion-2',\n",
" FeatureUnion(transformer_list=[('featureunion',\n",
" FeatureUnion(transformer_list=[('estimatortransformer',\n",
" EstimatorTransformer(estimator=LinearDiscriminantAnalysis(shrinkage=0.5801392483719,\n",
" solver='lsqr')))])),\n",
" ('passthrough',\n",
" Passthrough())])),\n",
" ('mlpclassifier',\n",
" MLPClassifier(activation='identity', alpha=0.0310773820788,\n",
" hidden_layer_sizes=[54, 54, 54],\n",
" learning_rate_init=0.0017701050157,\n",
"                             n_iter_no_change=32))])