{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Genetic Feature Selection nodes in TPOT\n", "\n", "TPOT can use evolutionary algorithms to optimize feature selection simultaneously with pipeline optimization. It includes two node search spaces with different feature selection strategies: FSSNode and GeneticFeatureSelectorNode. \n", "\n", "1. FSSNode - (Feature Set Selector) This node is useful if you have a list of predefined feature sets you want to select from. Each FeatureSetSelector Node will select a single group of features to be passed to the next step in the pipeline. Note that FSSNode does not create its own subset of features and does not mix/match multiple predefined feature sets.\n", "\n", "2. GeneticFeatureSelectorNode - Whereas the FSSNode selects from a predefined list of subsets of features, this node uses evolutionary algorithms to optimize a novel subset of features from scratch. This is useful where there is no predefined grouping of features. \n", "\n", "This tutorial focuses on FSSNode. See Tutorial 5 for more information on GeneticFeatureSelectorNode.\n", "\n", "It may also be beneficial to pair these search spaces with a secondary objective function to minimize complexity. That would encourage TPOT to try to produce the simplest pipeline with the fewest features.\n", "\n", "tpot.objectives.number_of_nodes_objective - This can be used as an other_objective_function that counts the number of nodes.\n", "\n", "tpot.objectives.complexity_scorer - This is a scorer that tries to count the total number of learned parameters (number of coefficients, number of nodes in decision trees, etc.).\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Feature Set Selector\n", "\n", "The FeatureSetSelector is a subclass of sklearn.feature_selection.SelectorMixin that simply returns the manually specified columns. The parameter sel_subset specifies the names or indexes of the columns that it selects. 
The transform function then simply indexes and returns the selected columns. You can also optionally name the group with the name parameter, though this is only for bookkeeping and is not used by the class.\n", "\n", "\n", " sel_subset: list or int\n", " If X is a dataframe, items in sel_subset list must correspond to column names\n", " If X is a numpy array, items in sel_subset list must correspond to column indexes\n", " int: index of a single column\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "original DataFrame\n", " a b c d e f\n", "0 0 1 2 3 4 5\n", "1 0 1 2 3 4 5\n", "2 0 1 2 3 4 5\n", "3 0 1 2 3 4 5\n", "4 0 1 2 3 4 5\n", "5 0 1 2 3 4 5\n", "6 0 1 2 3 4 5\n", "7 0 1 2 3 4 5\n", "8 0 1 2 3 4 5\n", "9 0 1 2 3 4 5\n", "Transformed Data\n", "[[0 1 2]\n", " [0 1 2]\n", " [0 1 2]\n", " [0 1 2]\n", " [0 1 2]\n", " [0 1 2]\n", " [0 1 2]\n", " [0 1 2]\n", " [0 1 2]\n", " [0 1 2]]\n" ] } ],
"source": [ "import tpot\n", "import pandas as pd\n", "import numpy as np\n", "#make a dataframe with columns a,b,c,d,e,f\n", "\n", "#numpy array with 10 identical rows, each row is 0,1,2,3,4,5\n", "data = np.repeat([np.arange(6)],10,0)\n", "\n", "df = pd.DataFrame(data,columns=['a','b','c','d','e','f'])\n", "fss = tpot.builtin_modules.FeatureSetSelector(name='test',sel_subset=['a','b','c'])\n", "\n", "print(\"original DataFrame\")\n", "print(df)\n", "print(\"Transformed Data\")\n", "print(fss.fit_transform(df))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# FSSNode\n", "\n", "The `FSSNode` is a node search space that simply selects one feature set from a list of feature sets. This works identically to the EstimatorNode, but provides an easier interface for defining the feature sets.\n", "\n", "Note that the FSS is only well defined when used as the first step in a pipeline. This is because downstream nodes will receive different transformations of the data such that the original indexes no longer correspond to the same columns in the transformed data.\n", "\n", "The `FSSNode` takes in a single parameter `subsets` which defines the groups of features. There are several ways of defining the subsets. \n", "\n", " subsets : str or list, default=None\n", " Sets the subsets that the FeatureSetSelector will select from if set as an option in one of the configuration dictionaries. \n", " Features are defined by column names if using a Pandas data frame, or ints corresponding to indexes if using numpy arrays.\n", " - str : If a string, it is assumed to be a path to a csv file with the subsets. 
\n", " The first column is assumed to be the name of the subset and the remaining columns are the features in the subset.\n", " - list or np.ndarray : If a list or np.ndarray, it is assumed to be a list of subsets (i.e., a list of lists).\n", " - dict : A dictionary where keys are the names of the subsets and the values are the lists of features.\n", " - int : If an int, it is assumed to be the number of subsets to generate. Each subset will contain one feature.\n", " - None : If None, each column will be treated as a subset. One column will be selected per subset.\n", "\n", "\n", "Let's say you want to have three groups of features, each with three columns. The following examples are equivalent:\n", "\n", "### str\n", "\n", "sel_subsets = \"simple_fss.csv\"\n", "\n", "\n", " \\# simple_fss.csv\n", " group_one, 1,2,3\n", " group_two, 4,5,6\n", " group_three, 7,8,9\n", "\n", "\n", "### dict\n", "\n", "\n", "sel_subsets = { \"group_one\" : [1,2,3],\n", " \"group_two\" : [4,5,6],\n", " \"group_three\" : [7,8,9],\n", " }\n", "\n", "\n", "### list\n", "\n", "\n", "sel_subsets = [[1,2,3],\n", "[4,5,6],\n", "[7,8,9]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Examples\n", "\n", "For these examples, we create a dummy dataset where the first six columns are informative and the rest are uninformative." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|   | a | b | c | d | e | f | g | h | i | j | k | l |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2.315814 | -3.427720 | -1.314654 | -1.508737 | -0.300932 | 0.089448 | 0.327651 | 0.329022 | 0.857495 | 0.734238 | 0.257218 | 0.652350 |
| 1 | -0.191001 | -1.396922 | 0.149488 | -1.730145 | -0.394932 | 0.519712 | 0.807762 | 0.509823 | 0.876159 | 0.002806 | 0.449828 | 0.671350 |
| 2 | 0.661264 | -0.981737 | 0.703879 | 0.730321 | -2.750405 | 0.396581 | 0.380302 | 0.532604 | 0.877129 | 0.610919 | 0.780108 | 0.625689 |
| 3 | 1.445936 | 0.354237 | 0.779040 | 1.288014 | 2.397133 | 0.186324 | 0.544191 | 0.465419 | 0.588535 | 0.919575 | 0.513460 | 0.831546 |
| 4 | -0.989027 | -1.824787 | -1.448234 | 1.546442 | 1.643775 | 0.167975 | 0.188238 | 0.024149 | 0.544878 | 0.834503 | 0.877869 | 0.278330 |
FeatureSetSelector(name='group_two', sel_subset=['d', 'e', 'f'])
|   | d | e | f |
|---|---|---|---|
| 162 | 1.315442 | -1.039258 | 0.194516 |
| 168 | -1.908995 | -0.953551 | -1.430472 |
| 214 | 0.181162 | 1.022858 | -2.289700 |
| 895 | 2.825765 | -1.205520 | 1.147791 |
| 154 | -2.300481 | 1.023173 | 0.449162 |
| ... | ... | ... | ... |
| 32 | -1.793062 | 2.209649 | -0.045031 |
| 829 | -0.221409 | 1.688750 | 0.069356 |
| 176 | 0.141471 | -1.880294 | 1.984397 |
| 124 | -0.359952 | 1.141758 | 2.019301 |
| 35 | 0.171312 | 0.079332 | 0.178522 |

750 rows × 3 columns
FeatureSetSelector(name='group_two', sel_subset=['d', 'e', 'f'])
FeatureSetSelector(name='group_four', sel_subset=['j', 'k', 'l'])
Pipeline(steps=[('featuresetselector',
                 FeatureSetSelector(name='group_one',
                                    sel_subset=['a', 'b', 'c'])),
                ('randomforestclassifier',
                 RandomForestClassifier(max_features=0.30141491087,
                                        min_samples_leaf=4,
                                        min_samples_split=17, n_estimators=128,
                                        n_jobs=1))])
FeatureUnion(transformer_list=[('featuresetselector-1',
                                FeatureSetSelector(name='group_two',
                                                   sel_subset=['d', 'e', 'f'])),
                               ('featuresetselector-2',
                                FeatureSetSelector(name='group_three',
                                                   sel_subset=['g', 'h',
                                                               'i']))])
|   | d | e | f | g | h | i |
|---|---|---|---|---|---|---|
| 162 | 1.315442 | -1.039258 | 0.194516 | 0.751175 | 0.411340 | 0.824754 |
| 168 | -1.908995 | -0.953551 | -1.430472 | 0.072697 | 0.875766 | 0.953255 |
| 214 | 0.181162 | 1.022858 | -2.289700 | 0.135222 | 0.395847 | 0.232638 |
| 895 | 2.825765 | -1.205520 | 1.147791 | 0.925905 | 0.486645 | 0.710991 |
| 154 | -2.300481 | 1.023173 | 0.449162 | 0.645161 | 0.131657 | 0.863514 |
| ... | ... | ... | ... | ... | ... | ... |
| 32 | -1.793062 | 2.209649 | -0.045031 | 0.502947 | 0.994603 | 0.280062 |
| 829 | -0.221409 | 1.688750 | 0.069356 | 0.328066 | 0.102381 | 0.492280 |
| 176 | 0.141471 | -1.880294 | 1.984397 | 0.365550 | 0.465859 | 0.974601 |
| 124 | -0.359952 | 1.141758 | 2.019301 | 0.329380 | 0.718647 | 0.365507 |
| 35 | 0.171312 | 0.079332 | 0.178522 | 0.215759 | 0.546279 | 0.662928 |

750 rows × 6 columns
FeatureUnion(transformer_list=[('featuresetselector',
                                FeatureSetSelector(name='group_three',
                                                   sel_subset=['g', 'h',
                                                               'i']))])
FeatureUnion(transformer_list=[('featuresetselector-1',
                                FeatureSetSelector(name='group_one',
                                                   sel_subset=['a', 'b', 'c'])),
                               ('featuresetselector-2',
                                FeatureSetSelector(name='group_four',
                                                   sel_subset=['j', 'k',
                                                               'l']))])
Pipeline(steps=[('featureunion',
                 FeatureUnion(transformer_list=[('featuresetselector-1',
                                                 FeatureSetSelector(name='group_two',
                                                                    sel_subset=['d',
                                                                                'e',
                                                                                'f'])),
                                                ('featuresetselector-2',
                                                 FeatureSetSelector(name='group_one',
                                                                    sel_subset=['a',
                                                                                'b',
                                                                                'c']))])),
                ('randomforestclassifier',
                 RandomForestClassifier(max_features=0.0530704381152,
                                        min_samples_leaf=2, min_samples_split=5,
                                        n_estimators=128, n_jobs=1))])
Pipeline(steps=[('featuresetselector',
                 FeatureSetSelector(name='group_two',
                                    sel_subset=['d', 'e', 'f'])),
                ('pipeline',
                 Pipeline(steps=[('maxabsscaler', MaxAbsScaler()),
                                 ('rfe',
                                  RFE(estimator=ExtraTreesClassifier(max_features=0.0390676831531,
                                                                     min_samples_leaf=8,
                                                                     min_samples_split=14,
                                                                     n_jobs=1),
                                      step=0.753983388654)),
                                 ('featureunion-1',
                                  FeatureUnion(transformer_lis...
                                  FeatureUnion(transformer_list=[('skiptransformer',
                                                                  SkipTransformer()),
                                                                 ('passthrough',
                                                                  Passthrough())])),
                                 ('histgradientboostingclassifier',
                                  HistGradientBoostingClassifier(early_stopping=True,
                                                                 l2_regularization=9.1304e-09,
                                                                 learning_rate=0.0036310282582,
                                                                 max_features=0.238877814721,
                                                                 max_leaf_nodes=1696,
                                                                 min_samples_leaf=59,
                                                                 n_iter_no_change=14,
                                                                 tol=0.0001,
                                                                 validation_fraction=None))]))])

FeatureUnion(transformer_list=[('featureunion',
                                FeatureUnion(transformer_list=[('columnordinalencoder',
                                                                ColumnOrdinalEncoder()),
                                                               ('pca',
                                                                PCA(n_components=0.9286371732844))])),
                               ('passthrough', Passthrough())])

FeatureUnion(transformer_list=[('skiptransformer', SkipTransformer()),
                               ('passthrough', Passthrough())])
Pipeline(steps=[('featureunion',
                 FeatureUnion(transformer_list=[('pipeline-1',
                                                 Pipeline(steps=[('featuresetselector',
                                                                  FeatureSetSelector(name='group_one',
                                                                                     sel_subset=['a',
                                                                                                 'b',
                                                                                                 'c'])),
                                                                 ('pipeline',
                                                                  Pipeline(steps=[('featureunion',
                                                                                   FeatureUnion(transformer_list=[('featureunion',
                                                                                                                   FeatureUnion(transformer_list=[('zerocount',
                                                                                                                                                   ZeroCount())])),
                                                                                                                  ('passthrough',
                                                                                                                   Passth...
                                                                                                                                                   KBinsDiscretizer(encode='onehot-dense',
                                                                                                                                                                    n_bins=11)),
                                                                                                                                                  ('rbfsampler',
                                                                                                                                                   RBFSampler(gamma=0.0925899621466,
                                                                                                                                                              n_components=17)),
                                                                                                                                                  ('maxabsscaler',
                                                                                                                                                   MaxAbsScaler())])),
                                                                                                                  ('passthrough',
                                                                                                                   Passthrough())]))]))]))])),
                ('randomforestclassifier',
                 RandomForestClassifier(bootstrap=False,
                                        class_weight='balanced',
                                        max_features=0.8205760841606,
                                        min_samples_leaf=16,
                                        min_samples_split=11, n_estimators=128,
                                        n_jobs=1))])

FeatureUnion(transformer_list=[('pipeline-1',
                                Pipeline(steps=[('featuresetselector',
                                                 FeatureSetSelector(name='group_one',
                                                                    sel_subset=['a',
                                                                                'b',
                                                                                'c'])),
                                                ('pipeline',
                                                 Pipeline(steps=[('featureunion',
                                                                  FeatureUnion(transformer_list=[('featureunion',
                                                                                                  FeatureUnion(transformer_list=[('zerocount',
                                                                                                                                  ZeroCount())])),
                                                                                                 ('passthrough',
                                                                                                  Passthrough())]))]))])),
                               ('pipeline-2',...
                                                                                                                                  PCA(n_components=0.9470333477868))])),
                                                                                                 ('passthrough',
                                                                                                  Passthrough())])),
                                                                 ('featureunion-3',
                                                                  FeatureUnion(transformer_list=[('featureunion',
                                                                                                  FeatureUnion(transformer_list=[('kbinsdiscretizer',
                                                                                                                                  KBinsDiscretizer(encode='onehot-dense',
                                                                                                                                                   n_bins=11)),
                                                                                                                                 ('rbfsampler',
                                                                                                                                  RBFSampler(gamma=0.0925899621466,
                                                                                                                                             n_components=17)),
                                                                                                                                 ('maxabsscaler',
                                                                                                                                  MaxAbsScaler())])),
                                                                                                 ('passthrough',
                                                                                                  Passthrough())]))]))]))])

FeatureSetSelector(name='group_four', sel_subset=['j', 'k', 'l'])

Pipeline(steps=[('featureunion-1',
                 FeatureUnion(transformer_list=[('featureunion',
                                                 FeatureUnion(transformer_list=[('kbinsdiscretizer',
                                                                                 KBinsDiscretizer(encode='onehot-dense',
                                                                                                  n_bins=37,
                                                                                                  strategy='kmeans')),
                                                                                ('featureagglomeration',
                                                                                 FeatureAgglomeration(n_clusters=31))])),
                                                ('passthrough',
                                                 Passthrough())])),
                ('featureunion-2',
                 FeatureUnion(transformer_list=[('f...
                                                                                 PCA(n_components=0.9470333477868))])),
                                                ('passthrough',
                                                 Passthrough())])),
                ('featureunion-3',
                 FeatureUnion(transformer_list=[('featureunion',
                                                 FeatureUnion(transformer_list=[('kbinsdiscretizer',
                                                                                 KBinsDiscretizer(encode='onehot-dense',
                                                                                                  n_bins=11)),
                                                                                ('rbfsampler',
                                                                                 RBFSampler(gamma=0.0925899621466,
                                                                                            n_components=17)),
                                                                                ('maxabsscaler',
                                                                                 MaxAbsScaler())])),
                                                ('passthrough',
                                                 Passthrough())]))])

FeatureUnion(transformer_list=[('featureunion',
                                FeatureUnion(transformer_list=[('quantiletransformer',
                                                                QuantileTransformer(n_quantiles=840,
                                                                                    output_distribution='normal')),
                                                               ('pca',
                                                                PCA(n_components=0.9470333477868))])),
                               ('passthrough', Passthrough())])
|   | d | e | f |
|---|---|---|---|
| 162 | 1.315442 | -1.039258 | 0.194516 |
| 168 | -1.908995 | -0.953551 | -1.430472 |
| 214 | 0.181162 | 1.022858 | -2.289700 |
| 895 | 2.825765 | -1.205520 | 1.147791 |
| 154 | -2.300481 | 1.023173 | 0.449162 |
| ... | ... | ... | ... |
| 32 | -1.793062 | 2.209649 | -0.045031 |
| 829 | -0.221409 | 1.688750 | 0.069356 |
| 176 | 0.141471 | -1.880294 | 1.984397 |
| 124 | -0.359952 | 1.141758 | 2.019301 |
| 35 | 0.171312 | 0.079332 | 0.178522 |

750 rows × 3 columns