EpistasisLab / tpot

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

2024-09-20 14:48:56 -07:00
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# What to expect from AutoML software\n",
"Automated machine learning (AutoML) takes a higher-level approach to machine learning than most practitioners are used to, so we've gathered a handful of guidelines on what to expect when running AutoML software such as TPOT.\n",
"\n",
"#### AUTOML ALGORITHMS AREN'T INTENDED TO RUN FOR ONLY A FEW MINUTES\n",
"Of course, you can run TPOT for only a few minutes, and it may find a reasonably good pipeline for your dataset. However, if you don't run TPOT for long enough, it may not find the best possible pipeline for your dataset. It may not even find any suitable pipeline at all, in which case a RuntimeError('A pipeline has not yet been optimized. Please call fit() first.') will be raised. Often it is worthwhile to run multiple instances of TPOT in parallel for a long time (hours to days) to allow TPOT to thoroughly search the pipeline space for your dataset.\n",
"\n",
"#### AUTOML ALGORITHMS CAN TAKE A LONG TIME TO FINISH THEIR SEARCH\n",
"AutoML algorithms aren't as simple as fitting one model on the dataset; they consider multiple machine learning algorithms (random forests, linear models, SVMs, etc.) in a pipeline with multiple preprocessing steps (missing value imputation, scaling, PCA, feature selection, etc.), the hyperparameters for all of the models and preprocessing steps, as well as multiple ways to ensemble or stack the algorithms within the pipeline.\n",
"\n",
"As such, TPOT will take a while to run on larger datasets, and it's important to understand why. Consider a run of 100 generations with a population size of 100: TPOT will evaluate 10,000 pipeline configurations before finishing. To put this number into context, think about a grid search of 10,000 hyperparameter combinations for a machine learning algorithm and how long that grid search would take. That is 10,000 model configurations to evaluate with 10-fold cross-validation, which means that roughly 100,000 models are fit and evaluated on the training data in one grid search. That's a time-consuming procedure, even for simpler models like decision trees.\n",
"\n",
"Typical TPOT runs will take hours to days to finish (unless it's a small dataset), but you can always interrupt the run partway through and see the best results so far. TPOT also provides a warm_start parameter that lets you restart a TPOT run from where it left off.\n",
"\n",
"#### AUTOML ALGORITHMS CAN RECOMMEND DIFFERENT SOLUTIONS FOR THE SAME DATASET\n",
"If you're working with a reasonably complex dataset or run TPOT for a short amount of time, different TPOT runs may result in different pipeline recommendations. TPOT's optimization algorithm is stochastic, which means that it uses randomness (in part) to search the space of possible pipelines. When two TPOT runs recommend different pipelines, this usually means either that the runs didn't converge due to lack of time or that multiple pipelines perform more or less the same on your dataset.\n",
"\n",
"This is actually an advantage over fixed grid search techniques: TPOT is meant to be an assistant that gives you ideas on how to solve a particular machine learning problem by exploring pipeline configurations that you might have never considered, then leaves the fine-tuning to more constrained parameter tuning techniques such as grid search."
]
},
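The 100,000 figure above is simple multiplication. As a back-of-the-envelope sketch (the helper `count_model_fits` is hypothetical, not part of TPOT), assuming every pipeline in every generation is evaluated with k-fold cross-validation, i.e. one fit per fold:

```python
def count_model_fits(generations: int, population_size: int, cv_folds: int) -> int:
    """Rough number of pipeline fits in an evolutionary search (hypothetical helper).

    Assumes every generation evaluates `population_size` pipelines and that each
    evaluation uses `cv_folds`-fold cross-validation, i.e. one fit per fold.
    Ignores caching, failed or timed-out evaluations, and the final refit.
    """
    pipelines_evaluated = generations * population_size
    return pipelines_evaluated * cv_folds


# 100 generations x 100 pipelines = 10,000 pipelines; x 10 folds = 100,000 fits
print(count_model_fits(generations=100, population_size=100, cv_folds=10))  # 100000
```

Each "fit" counted here is a whole pipeline, which may itself contain several preprocessing steps and models, so the real amount of work can be larger still.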
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# TPOT with code\n",
"\n",
"We've taken care to design the TPOT interface to be as similar as possible to scikit-learn.\n",
"\n",
"TPOT can be imported just like any regular Python module. To import TPOT, type:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from tpot2 import TPOTClassifier"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"then create an instance of TPOT as follows:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"classification_optimizer = TPOTClassifier()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It's also possible to use TPOT for regression problems with the TPOTRegressor class. Other than the class name, a TPOTRegressor is used the same way as a TPOTClassifier. You can read more about the TPOTClassifier and TPOTRegressor classes in the API documentation."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"from tpot2 import TPOTRegressor\n",
"regression_optimizer = TPOTRegressor()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Fitting a TPOT model works exactly like any other sklearn estimator. Some example code with custom TPOT parameters might look like:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Generation: : 3it [00:31, 10.38s/it]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"auroc_score: 0.9904100529100529\n"
]
}
],
"source": [
"import sklearn\n",
"import sklearn.datasets\n",
"import sklearn.metrics\n",
"import sklearn.model_selection\n",
"\n",
"# max_time_mins=30/60 gives a 30-second search, just for a quick demo\n",
"classification_optimizer = TPOTClassifier(search_space=\"linear-light\", max_time_mins=30/60, n_jobs=30, cv=5)\n",
"\n",
"X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)\n",
"X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, random_state=1, test_size=0.2)\n",
"\n",
"classification_optimizer.fit(X_train, y_train)\n",
"\n",
"# ROC AUC on the held-out test set, using the predicted probability of the positive class\n",
"auroc_score = sklearn.metrics.roc_auc_score(y_test, classification_optimizer.predict_proba(X_test)[:,1])\n",
"print(\"auroc_score:\", auroc_score)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Scorers, Objective Functions, and Multi-Objective Optimization\n",
"\n",
"There are two ways of passing objectives into TPOT2.\n",
"\n",
"1. `scorers`: Scorers are functions with the signature (estimator, X_test, y_test) that take in estimators already fitted to the training data. They can be produced with the [sklearn.metrics.make_scorer](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html) function. Scorers are used to evaluate the test folds during cross-validation (defined by the `cv` parameter) and are passed into TPOT2 via the `scorers` parameter, either as the scorer itself or as the string name of a scoring function ([as listed here](https://scikit-learn.org/stable/modules/model_evaluation.html)). TPOT2 also supports passing in a list of several scorers for multi-objective optimization. For each fold of CV, TPOT fits the estimator only once, then evaluates all provided scorers in a loop.\n",
"\n",
"2. `other_objective_functions`: Other objective functions in TPOT2 have the signature (estimator) and return a float or a list of floats. They are passed a single unfitted estimator once, outside of cross-validation. The user may choose to fit the pipeline within this objective function as well.\n",
"\n",
"Each scorer and objective function must be accompanied by a weight, supplied as lists that correspond to the lists of objectives: `scorers_weights` and `other_objective_functions_weights`, respectively. By default, TPOT2 maximizes objective functions (this can be changed with `bigger_is_better=False`). Positive weights mean that TPOT2 will seek to maximize that objective, and negative weights correspond to minimization. For most selectors (including the default), only the sign matters; the magnitude of a weight may matter if you use a custom selection function for the optimization algorithm. A zero weight means that the score will not have an impact on the selection algorithm.\n",
"\n",
"Here is an example of using two scorers:\n",
"\n",
" scorers=['roc_auc_ovr',tpot2.objectives.complexity_scorer],\n",
" scorers_weights=[1,-1],\n",
"\n",
"\n",
"Here is an example with a scorer and a secondary objective function:\n",
"\n",
" scorers=['roc_auc_ovr'],\n",
" scorers_weights=[1],\n",
" other_objective_functions=[tpot2.objectives.number_of_leaves_objective],\n",
" other_objective_functions_weights=[-1],\n",
"\n",
"\n",
"TPOT automatically names the columns of the final results dataframe after the scorers' function names, and it likewise uses the function name as the column name for each entry in `other_objective_functions`. If you would like custom column names, you can set `objective_function_names` to a list of names (str), one for each value returned by the functions in `other_objective_functions`. This is useful when your additional functions return more than one value per function.\n",
"\n",
"Either a scorer or an entry in `other_objective_functions` may return multiple values. In that case, just make sure that `scorers_weights` and `other_objective_functions_weights` contain one weight per returned score.\n",
"\n",
"\n",
"TPOT comes with a few additional built-in objective functions you can use. The first table lists objectives applied to fitted pipelines, which are passed into the `scorers` parameter. The second table lists objective functions for the `other_objective_functions` parameter.\n",
"\n",
"Scorers:\n",
"\n",
"| Function | Description |\n",
"| :--- | :----: |\n",
"| tpot2.objectives.complexity_scorer | Estimates the number of learned parameters across all classifiers and regressors in the pipeline. Currently, transformers add 1 point and selectors add 0 points (since they don't affect the complexity of the \"final\" predictive pipeline). |\n",
"\n",
"Other Objective Functions:\n",
"\n",
"| Function | Description |\n",
"| :--- | :----: |\n",
"| tpot2.objectives.average_path_length | Computes the average shortest path from all nodes to the root/final estimator (only supported for GraphPipeline) |\n",
"| tpot2.objectives.number_of_leaves_objective | Calculates the number of leaves (input nodes) in a GraphPipeline |\n",
"| tpot2.objectives.number_of_nodes_objective | Calculates the number of nodes in a pipeline (whether it is a scikit-learn Pipeline, GraphPipeline, FeatureUnion, or any of these nested within one another) |"
]
},
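The difference between the two objective signatures, and the "fit once per fold, then loop over every scorer" behavior described above, can be illustrated with a small stdlib-only sketch. This is not TPOT2's implementation: `MajorityClassifier`, `kfold_indices`, `evaluate`, and the toy scorers are hypothetical stand-ins, and `copy.deepcopy` plays the role that `sklearn.base.clone` and a real CV splitter would play in practice.

```python
import copy
from statistics import mean


class MajorityClassifier:
    """Toy estimator that always predicts the most frequent training label."""

    fit_calls = 0  # class-level counter, only here to show how often fit() runs

    def fit(self, X, y):
        type(self).fit_calls += 1
        self.majority_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]


# Scorer signature: (fitted estimator, X_test, y_test) -> float
def accuracy_scorer(estimator, X_test, y_test):
    predictions = estimator.predict(X_test)
    return sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)


def distinct_predictions_scorer(estimator, X_test, y_test):
    # A second toy scorer: how many different labels the model predicts
    return len(set(estimator.predict(X_test)))


# Other objective function signature: (unfitted estimator) -> float
def n_steps_objective(estimator):
    # Hypothetical: a bare estimator counts as a one-step pipeline
    return 1.0


def kfold_indices(n_samples, n_folds):
    """Yield (train, test) index lists for contiguous, unshuffled folds."""
    start = 0
    for fold in range(n_folds):
        size = n_samples // n_folds + (1 if fold < n_samples % n_folds else 0)
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if not (start <= i < start + size)]
        yield train, test
        start += size


def evaluate(unfitted, X, y, scorers, other_objective_functions, cv=5):
    per_fold = []
    for train, test in kfold_indices(len(X), cv):
        est = copy.deepcopy(unfitted)  # fresh unfitted copy for this fold
        est.fit([X[i] for i in train], [y[i] for i in train])  # one fit per fold
        X_test, y_test = [X[i] for i in test], [y[i] for i in test]
        # Every scorer reuses the same fitted estimator for this fold
        per_fold.append([score(est, X_test, y_test) for score in scorers])
    cv_scores = [mean(column) for column in zip(*per_fold)]  # average each scorer over folds
    # Other objective functions get one unfitted estimator, once, outside of CV
    other_scores = [f(copy.deepcopy(unfitted)) for f in other_objective_functions]
    return cv_scores + other_scores


X = [[i] for i in range(10)]
y = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
scores = evaluate(MajorityClassifier(), X, y,
                  scorers=[accuracy_scorer, distinct_predictions_scorer],
                  other_objective_functions=[n_steps_objective], cv=5)
print(scores, MajorityClassifier.fit_calls)
```

With two scorers and five folds, the estimator is still fit only five times, and the returned list has one entry per scorer followed by one entry per other objective function, which is why each of those lists needs a matching list of weights.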
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Measuring Model Complexity\n",
"\n",
"When running TPOT, it can sometimes be beneficial to include a secondary objective that measures model complexity. More complex models can yield higher performance, but this comes at the cost of interpretability. Simpler models may be more interpretable but often have lower predictive performance. Sometimes, however, vast increases in complexity only marginally improve predictive performance. There may be simpler, more interpretable pipelines whose small decrease in performance is an acceptable price for the increased interpretability, but these pipelines are often missed when optimizing purely for performance. By including both performance and complexity as objective functions, TPOT will attempt to find the best pipeline at every complexity level simultaneously. After optimization, the user can inspect the complexity vs. performance tradeoff and decide which pipeline best suits their needs.\n",
"\n",
"Two methods of measuring complexity to consider are `tpot2.objectives.number_of_nodes_objective` and `tpot2.objectives.complexity_scorer`. The number of nodes objective simply counts the number of steps within a pipeline. This is a simple metric, but it does not differentiate between the complexity of different model types: for example, a simple LogisticRegression counts the same as the much more complex XGBoost. The complexity scorer instead tries to estimate the number of learned parameters in the classifiers and regressors of the pipeline. Exactly how to quantify and compare complexity between different classes of models is challenging and potentially subjective. However, this function provides a reasonable heuristic for the evolutionary algorithm that at least separates qualitatively more complex algorithms from less complex ones. While it may be hard to precisely compare the relative complexities of LogisticRegression and XGBoost, for example, the two will always fall on opposite ends of the complexity values returned by this function. This allows for Pareto fronts with LogisticRegression on one side and XGBoost on the other.\n",
"\n",
"An example of this analysis is demonstrated in a following section."
]
},
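To make the complexity vs. performance Pareto front concrete, here is a small sketch of Pareto dominance using the weight convention from the scorers section (positive weight = maximize, negative weight = minimize). The candidate scores are invented for illustration, and `pareto_front` is a hypothetical helper rather than a TPOT2 function; TPOT2's own selection code may rank and select pipelines differently.

```python
def apply_weights(scores, weights):
    # After weighting, larger is better for every objective (negative weight = minimize)
    return [s * w for s, w in zip(scores, weights)]


def dominates(a, b):
    # a dominates b: at least as good on every objective and strictly better on one
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates, weights):
    weighted = {name: apply_weights(s, weights) for name, s in candidates.items()}
    return [
        name
        for name, score in weighted.items()
        if not any(dominates(other, score) for o, other in weighted.items() if o != name)
    ]


# (mean CV ROC AUC, complexity): made-up numbers for illustration only
candidates = {
    "LogisticRegression": (0.950, 31),
    "DecisionTree": (0.930, 40),
    "RandomForest": (0.970, 5000),
    "XGBoost": (0.975, 20000),
    "KNeighbors": (0.960, 6000),
}

# Maximize ROC AUC (+1) and minimize complexity (-1), as with scorers_weights=[1.0, -1.0]
print(pareto_front(candidates, [1.0, -1.0]))
```

DecisionTree and KNeighbors drop out because another candidate is both more accurate and simpler. Scaling a weight by a positive factor (say `[2.0, -0.5]`) leaves every comparison unchanged, which is one way to see why only the sign of a weight matters for dominance-based selection.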
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Built-In Configurations\n",
"TPOT can be used to optimize hyperparameters, select models, and optimize pipelines of models, including determining the sequence of steps. Tutorial 2 goes into more detail on how to customize search spaces with custom hyperparameter ranges, model types, and possible pipeline configurations. TPOT also comes with a handful of default operators and parameter configurations that we believe work well for optimizing machine learning pipelines. Below is a list of the current built-in configurations that come with TPOT. These can be passed in as strings to the `search_space` parameter of any of the TPOT estimators.\n",
"\n",
"| String | Description |\n",
"| :--- | :----: |\n",
"| linear | A linear pipeline with the structure of \"Selector->(transformers+Passthrough)->(classifiers/regressors+Passthrough)->final classifier/regressor.\" For both the transformer and inner estimator layers, TPOT may choose one or more transformers/classifiers, or it may choose none. The inner classifier/regressor layer is optional. |\n",
"| linear-light | Same search space as linear, but without the inner classifier/regressor layer and with a reduced set of faster-running estimators. |\n",
"| graph | TPOT will optimize a pipeline in the shape of a directed acyclic graph. The nodes of the graph can include selectors, scalers, transformers, or classifiers/regressors (inner classifiers/regressors are optional). This will return a custom GraphPipeline rather than an sklearn Pipeline. More details in Tutorial 6. |\n",
"| graph-light | Same as the graph search space, but without the inner classifiers/regressors and with a reduced set of faster-running estimators. |\n",
"| mdr | TPOT will search over a series of feature selectors and Multifactor Dimensionality Reduction models to find a series of operators that maximize prediction accuracy. The TPOT MDR configuration is specialized for genome-wide association studies (GWAS) and is described in detail online. Note that TPOT MDR may be slow to run because the feature selection routines are computationally expensive, especially on large datasets. |\n",
"\n",
"Note: by default, the `linear` and `graph` configurations allow additional stacked classifiers/regressors within the pipeline, in addition to the final classifier/regressor. To disable this, you can manually get a search space without inner classifiers/regressors through the function `tpot2.config.template_search_spaces.get_template_search_spaces` with `inner_predictors=False`, and then pass the resulting search space into the `search_space` param.\n",
"\n",
"The specific hyperparameter ranges used by TPOT can be found in files in the tpot2/config folder. The template search spaces listed above are defined in tpot2/config/template_search_spaces.py. Search spaces for individual models can be obtained from the tpot2/config/get_configspace.py file (`tpot2.config.get_search_space`). More details in Tutorial 2."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Terminating Optimization (Early Stopping)\n",
"\n",
"Note that we use a short time duration for a quick example, but in practice you may need to run TPOT for longer. By default, TPOT sets a time limit of 1 hour, with a limit of 5 minutes per pipeline evaluation. In practice, you may want to increase these values.\n",
"\n",
"There are three methods of terminating a TPOT run and ending the optimization process. TPOT will always terminate as soon as one of the conditions is met.\n",
"* `max_time_mins`: (default: 1 hour) After this many minutes, TPOT will terminate and return the best pipeline it has found so far.\n",
"* `early_stop`: If set to an integer, TPOT terminates early once it goes that many generations without an improvement in performance. Generally, a value of around 5 to 20 is sufficient to be reasonably confident that performance has converged.\n",
"* `generations`: The total number of generations of the evolutionary algorithm to run.\n",
"\n",
"By default, TPOT will run until the time limit is up, with no generation or early stop limits."
]
},
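How the three stopping conditions interact can be sketched with a tiny simulation. This is a simplified model, not TPOT's actual scheduler: `run_until_stopped` is a hypothetical helper, time is simulated and only checked between generations, and "improvement" is assumed to mean a strictly higher best score.

```python
def run_until_stopped(best_score_per_generation, minutes_per_generation,
                      max_time_mins=60.0, early_stop=None, generations=None):
    """Simulate when a run would stop; returns (generations_run, reason).

    Hypothetical, simplified model: time is only checked between generations,
    and an "improvement" means a strictly higher best score.
    """
    best_so_far = float("-inf")
    stale_generations = 0
    elapsed = 0.0
    for gen, best_score in enumerate(best_score_per_generation, start=1):
        elapsed += minutes_per_generation
        if best_score > best_so_far:
            best_so_far, stale_generations = best_score, 0
        else:
            stale_generations += 1
        # Whichever condition is met first ends the run
        if elapsed >= max_time_mins:
            return gen, "max_time_mins"
        if early_stop is not None and stale_generations >= early_stop:
            return gen, "early_stop"
        if generations is not None and gen >= generations:
            return gen, "generations"
    return len(best_score_per_generation), "ran out of simulated generations"


history = [0.90, 0.93, 0.95, 0.95, 0.95, 0.95, 0.96, 0.96]
print(run_until_stopped(history, minutes_per_generation=5, early_stop=3))      # (6, 'early_stop')
print(run_until_stopped(history, minutes_per_generation=5, max_time_mins=12))  # (3, 'max_time_mins')
```

With `early_stop=3`, the plateau at 0.95 ends the run at generation 6, even though a later generation would have improved to 0.96; this is the tradeoff behind recommending a value of around 5 to 20 rather than a very small one.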
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Best Practices and Tips\n",
"\n",
"* You can use the `early_stop` parameter to have TPOT terminate early.\n",
"* When running TPOT from a .py script, it is important to protect your code with `if __name__==\"__main__\":`. This is because of how TPOT handles parallelization with Python and Dask: worker processes may re-import your script, and without the guard they would re-run its top-level code, including the TPOT run itself."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Generation: : 5it [03:23, 40.66s/it]\n",
"/home/perib/miniconda3/envs/myenv/lib/python3.10/site-packages/sklearn/feature_selection/_univariate_selection.py:112: UserWarning: Features [ 0 32 39] are constant.\n",
" warnings.warn(\"Features %s are constant.\" % constant_features_idx, UserWarning)\n",
"/home/perib/miniconda3/envs/myenv/lib/python3.10/site-packages/sklearn/feature_selection/_univariate_selection.py:113: RuntimeWarning: invalid value encountered in divide\n",
" f = msb / msw\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.9985192724823948\n"
]
}
],
"source": [
"# my_analysis.py\n",
"\n",
"import tpot2\n",
"import sklearn\n",
"import sklearn.datasets\n",
"import sklearn.metrics\n",
"import sklearn.model_selection\n",
"\n",
"# Guard the entry point: Dask worker processes may re-import this file,\n",
"# and the guard keeps them from starting another TPOT run.\n",
"if __name__ == \"__main__\":\n",
"    scorer = sklearn.metrics.get_scorer('roc_auc_ovo')\n",
"    X, y = sklearn.datasets.load_digits(return_X_y=True)\n",
"    X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, train_size=0.75, test_size=0.25)\n",
"\n",
"    est = tpot2.TPOTClassifier(n_jobs=4, max_time_mins=60, verbose=2)\n",
"    est.fit(X_train, y_train)\n",
"\n",
"    print(scorer(est, X_test, y_test))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Example analysis and the Estimator class\n",
"\n",
"Here we use a toy example dataset included in scikit-learn. We will use the `linear-light` configuration and the `complexity_scorer` to estimate complexity.\n",
"\n",
"Note that for this toy example, we set `early_stop=2` so that the run finishes quickly. In practice, we would recommend running TPOT for a longer duration with an `early_stop` value of around 5 to 20 (see the section on early stopping above)."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Generation: : 4it [01:39, 24.97s/it]\n",
"/home/perib/miniconda3/envs/myenv/lib/python3.10/site-packages/sklearn/linear_model/_sag.py:349: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge\n",
" warnings.warn(\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.9948227797690163\n"
]
}
],
"source": [
"# my_analysis.py\n",
"\n",
"import tpot2\n",
"import tpot2.objectives\n",
"import sklearn\n",
"import sklearn.datasets\n",
"import sklearn.metrics\n",
"import sklearn.model_selection\n",
"\n",
"# Primary objective: ROC AUC on the held-out CV folds\n",
"scorer = sklearn.metrics.get_scorer('roc_auc_ovr')\n",
"\n",
"X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)\n",
"X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, train_size=0.75, test_size=0.25)\n",
"\n",
"est = tpot2.TPOTClassifier(\n",
"    scorers=[scorer, tpot2.objectives.complexity_scorer],\n",
"    scorers_weights=[1.0, -1.0],  # maximize ROC AUC, minimize complexity\n",
"    search_space=\"linear-light\",\n",
"    n_jobs=4,\n",
"    max_time_mins=60,\n",
"    max_eval_time_mins=10,\n",
"    early_stop=2,\n",
"    verbose=2,\n",
")\n",
"est.fit(X_train, y_train)\n",
"\n",
"print(scorer(est, X_test, y_test))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can access the best pipeline selected by TPOT through the `fitted_pipeline_` attribute. This is the pipeline with the highest cross-validation score on the first scorer (or on the first objective function, if no scorer is provided)."
]
},
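If you would rather trade a little accuracy for a simpler model, you can apply your own selection rule to the evaluated pipelines instead of relying on `fitted_pipeline_` alone. The sketch below works on plain tuples with invented numbers; in a real run you would take the corresponding columns from TPOT's results dataframe. The tolerance rule in `simplest_within` is just one possible choice (a hypothetical helper, not TPOT2 behavior).

```python
# (pipeline, mean CV ROC AUC, complexity): made-up rows for illustration only
results = [
    ("LogisticRegression", 0.991, 31),
    ("StandardScaler -> SVC", 0.993, 450),
    ("RandomForest", 0.994, 5000),
]


def best_by_first_objective(rows):
    # Mirrors the rule described above: highest score on the first objective wins
    return max(rows, key=lambda row: row[1])


def simplest_within(rows, tolerance):
    # Alternative rule (hypothetical): the least complex pipeline whose score
    # is within `tolerance` of the best first-objective score
    best_score = best_by_first_objective(rows)[1]
    close_enough = [row for row in rows if row[1] >= best_score - tolerance]
    return min(close_enough, key=lambda row: row[2])


print(best_by_first_objective(results)[0])  # RandomForest
print(simplest_within(results, 0.005)[0])    # LogisticRegression
```

A tolerance of 0 reproduces the "highest score wins" choice, while larger tolerances move along the Pareto front toward simpler pipelines.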
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style>#sk-container-id-1 {\n",
" /* Definition of color scheme common for light and dark mode */\n",
" --sklearn-color-text: black;\n",
" --sklearn-color-line: gray;\n",
" /* Definition of color scheme for unfitted estimators */\n",
" --sklearn-color-unfitted-level-0: #fff5e6;\n",
" --sklearn-color-unfitted-level-1: #f6e4d2;\n",
" --sklearn-color-unfitted-level-2: #ffe0b3;\n",
" --sklearn-color-unfitted-level-3: chocolate;\n",
" /* Definition of color scheme for fitted estimators */\n",
" --sklearn-color-fitted-level-0: #f0f8ff;\n",
" --sklearn-color-fitted-level-1: #d4ebff;\n",
" --sklearn-color-fitted-level-2: #b3dbfd;\n",
" --sklearn-color-fitted-level-3: cornflowerblue;\n",
"\n",
" /* Specific color for light theme */\n",
" --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));\n",
" --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, white)));\n",
" --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));\n",
" --sklearn-color-icon: #696969;\n",
"\n",
" @media (prefers-color-scheme: dark) {\n",
" /* Redefinition of color scheme for dark theme */\n",
" --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n",
" --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, #111)));\n",
" --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n",
" --sklearn-color-icon: #878787;\n",
" }\n",
"}\n",
"\n",
"#sk-container-id-1 {\n",
" color: var(--sklearn-color-text);\n",
"}\n",
"\n",
"#sk-container-id-1 pre {\n",
" padding: 0;\n",
"}\n",
"\n",
"#sk-container-id-1 input.sk-hidden--visually {\n",
" border: 0;\n",
" clip: rect(1px 1px 1px 1px);\n",
" clip: rect(1px, 1px, 1px, 1px);\n",
" height: 1px;\n",
" margin: -1px;\n",
" overflow: hidden;\n",
" padding: 0;\n",
" position: absolute;\n",
" width: 1px;\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-dashed-wrapped {\n",
" border: 1px dashed var(--sklearn-color-line);\n",
" margin: 0 0.4em 0.5em 0.4em;\n",
" box-sizing: border-box;\n",
" padding-bottom: 0.4em;\n",
" background-color: var(--sklearn-color-background);\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-container {\n",
" /* jupyter's `normalize.less` sets `[hidden] { display: none; }`\n",
" but bootstrap.min.css set `[hidden] { display: none !important; }`\n",
" so we also need the `!important` here to be able to override the\n",
" default hidden behavior on the sphinx rendered scikit-learn.org.\n",
" See: https://github.com/scikit-learn/scikit-learn/issues/21755 */\n",
" display: inline-block !important;\n",
" position: relative;\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-text-repr-fallback {\n",
" display: none;\n",
"}\n",
"\n",
"div.sk-parallel-item,\n",
"div.sk-serial,\n",
"div.sk-item {\n",
" /* draw centered vertical line to link estimators */\n",
" background-image: linear-gradient(var(--sklearn-color-text-on-default-background), var(--sklearn-color-text-on-default-background));\n",
" background-size: 2px 100%;\n",
" background-repeat: no-repeat;\n",
" background-position: center center;\n",
"}\n",
"\n",
"/* Parallel-specific style estimator block */\n",
"\n",
"#sk-container-id-1 div.sk-parallel-item::after {\n",
" content: \"\";\n",
" width: 100%;\n",
" border-bottom: 2px solid var(--sklearn-color-text-on-default-background);\n",
" flex-grow: 1;\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-parallel {\n",
" display: flex;\n",
" align-items: stretch;\n",
" justify-content: center;\n",
" background-color: var(--sklearn-color-background);\n",
" position: relative;\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-parallel-item {\n",
" display: flex;\n",
" flex-direction: column;\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-parallel-item:first-child::after {\n",
" align-self: flex-end;\n",
" width: 50%;\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-parallel-item:last-child::after {\n",
" align-self: flex-start;\n",
" width: 50%;\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-parallel-item:only-child::after {\n",
" width: 0;\n",
"}\n",
"\n",
"/* Serial-specific style estimator block */\n",
"\n",
"#sk-container-id-1 div.sk-serial {\n",
" display: flex;\n",
" flex-direction: column;\n",
" align-items: center;\n",
" background-color: var(--sklearn-color-background);\n",
" padding-right: 1em;\n",
" padding-left: 1em;\n",
"}\n",
"\n",
"\n",
"/* Toggleable style: style used for estimator/Pipeline/ColumnTransformer box that is\n",
"clickable and can be expanded/collapsed.\n",
"- Pipeline and ColumnTransformer use this feature and define the default style\n",
"- Estimators will overwrite some part of the style using the `sk-estimator` class\n",
"*/\n",
"\n",
"/* Pipeline and ColumnTransformer style (default) */\n",
"\n",
"#sk-container-id-1 div.sk-toggleable {\n",
" /* Default theme specific background. It is overwritten whether we have a\n",
" specific estimator or a Pipeline/ColumnTransformer */\n",
" background-color: var(--sklearn-color-background);\n",
"}\n",
"\n",
"/* Toggleable label */\n",
"#sk-container-id-1 label.sk-toggleable__label {\n",
" cursor: pointer;\n",
" display: block;\n",
" width: 100%;\n",
" margin-bottom: 0;\n",
" padding: 0.5em;\n",
" box-sizing: border-box;\n",
" text-align: center;\n",
"}\n",
"\n",
"#sk-container-id-1 label.sk-toggleable__label-arrow:before {\n",
" /* Arrow on the left of the label */\n",
" content: \"▸\";\n",
" float: left;\n",
" margin-right: 0.25em;\n",
" color: var(--sklearn-color-icon);\n",
"}\n",
"\n",
"#sk-container-id-1 label.sk-toggleable__label-arrow:hover:before {\n",
" color: var(--sklearn-color-text);\n",
"}\n",
"\n",
"/* Toggleable content - dropdown */\n",
"\n",
"#sk-container-id-1 div.sk-toggleable__content {\n",
" max-height: 0;\n",
" max-width: 0;\n",
" overflow: hidden;\n",
" text-align: left;\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-0);\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-toggleable__content.fitted {\n",
" /* fitted */\n",
" background-color: var(--sklearn-color-fitted-level-0);\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-toggleable__content pre {\n",
" margin: 0.2em;\n",
" border-radius: 0.25em;\n",
" color: var(--sklearn-color-text);\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-0);\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-toggleable__content.fitted pre {\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-fitted-level-0);\n",
"}\n",
"\n",
"#sk-container-id-1 input.sk-toggleable__control:checked~div.sk-toggleable__content {\n",
" /* Expand drop-down */\n",
" max-height: 200px;\n",
" max-width: 100%;\n",
" overflow: auto;\n",
"}\n",
"\n",
"#sk-container-id-1 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {\n",
" content: \"▾\";\n",
"}\n",
"\n",
"/* Pipeline/ColumnTransformer-specific style */\n",
"\n",
"#sk-container-id-1 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
" color: var(--sklearn-color-text);\n",
" background-color: var(--sklearn-color-unfitted-level-2);\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-label.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
" background-color: var(--sklearn-color-fitted-level-2);\n",
"}\n",
"\n",
"/* Estimator-specific style */\n",
"\n",
"/* Colorize estimator box */\n",
"#sk-container-id-1 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-2);\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-estimator.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
" /* fitted */\n",
" background-color: var(--sklearn-color-fitted-level-2);\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-label label.sk-toggleable__label,\n",
"#sk-container-id-1 div.sk-label label {\n",
" /* The background is the default theme color */\n",
" color: var(--sklearn-color-text-on-default-background);\n",
"}\n",
"\n",
"/* On hover, darken the color of the background */\n",
"#sk-container-id-1 div.sk-label:hover label.sk-toggleable__label {\n",
" color: var(--sklearn-color-text);\n",
" background-color: var(--sklearn-color-unfitted-level-2);\n",
"}\n",
"\n",
"/* Label box, darken color on hover, fitted */\n",
"#sk-container-id-1 div.sk-label.fitted:hover label.sk-toggleable__label.fitted {\n",
" color: var(--sklearn-color-text);\n",
" background-color: var(--sklearn-color-fitted-level-2);\n",
"}\n",
"\n",
"/* Estimator label */\n",
"\n",
"#sk-container-id-1 div.sk-label label {\n",
" font-family: monospace;\n",
" font-weight: bold;\n",
" display: inline-block;\n",
" line-height: 1.2em;\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-label-container {\n",
" text-align: center;\n",
"}\n",
"\n",
"/* Estimator-specific */\n",
"#sk-container-id-1 div.sk-estimator {\n",
" font-family: monospace;\n",
" border: 1px dotted var(--sklearn-color-border-box);\n",
" border-radius: 0.25em;\n",
" box-sizing: border-box;\n",
" margin-bottom: 0.5em;\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-0);\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-estimator.fitted {\n",
" /* fitted */\n",
" background-color: var(--sklearn-color-fitted-level-0);\n",
"}\n",
"\n",
"/* on hover */\n",
"#sk-container-id-1 div.sk-estimator:hover {\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-2);\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-estimator.fitted:hover {\n",
" /* fitted */\n",
" background-color: var(--sklearn-color-fitted-level-2);\n",
"}\n",
"\n",
"/* Specification for estimator info (e.g. \"i\" and \"?\") */\n",
"\n",
"/* Common style for \"i\" and \"?\" */\n",
"\n",
".sk-estimator-doc-link,\n",
"a:link.sk-estimator-doc-link,\n",
"a:visited.sk-estimator-doc-link {\n",
" float: right;\n",
" font-size: smaller;\n",
" line-height: 1em;\n",
" font-family: monospace;\n",
" background-color: var(--sklearn-color-background);\n",
" border-radius: 1em;\n",
" height: 1em;\n",
" width: 1em;\n",
" text-decoration: none !important;\n",
" margin-left: 1ex;\n",
" /* unfitted */\n",
" border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n",
" color: var(--sklearn-color-unfitted-level-1);\n",
"}\n",
"\n",
".sk-estimator-doc-link.fitted,\n",
"a:link.sk-estimator-doc-link.fitted,\n",
"a:visited.sk-estimator-doc-link.fitted {\n",
" /* fitted */\n",
" border: var(--sklearn-color-fitted-level-1) 1pt solid;\n",
" color: var(--sklearn-color-fitted-level-1);\n",
"}\n",
"\n",
"/* On hover */\n",
"div.sk-estimator:hover .sk-estimator-doc-link:hover,\n",
".sk-estimator-doc-link:hover,\n",
"div.sk-label-container:hover .sk-estimator-doc-link:hover,\n",
".sk-estimator-doc-link:hover {\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-3);\n",
" color: var(--sklearn-color-background);\n",
" text-decoration: none;\n",
"}\n",
"\n",
"div.sk-estimator.fitted:hover .sk-estimator-doc-link.fitted:hover,\n",
".sk-estimator-doc-link.fitted:hover,\n",
"div.sk-label-container:hover .sk-estimator-doc-link.fitted:hover,\n",
".sk-estimator-doc-link.fitted:hover {\n",
" /* fitted */\n",
" background-color: var(--sklearn-color-fitted-level-3);\n",
" color: var(--sklearn-color-background);\n",
" text-decoration: none;\n",
"}\n",
"\n",
"/* Span, style for the box shown on hovering the info icon */\n",
".sk-estimator-doc-link span {\n",
" display: none;\n",
" z-index: 9999;\n",
" position: relative;\n",
" font-weight: normal;\n",
" right: .2ex;\n",
" padding: .5ex;\n",
" margin: .5ex;\n",
" width: min-content;\n",
" min-width: 20ex;\n",
" max-width: 50ex;\n",
" color: var(--sklearn-color-text);\n",
" box-shadow: 2pt 2pt 4pt #999;\n",
" /* unfitted */\n",
" background: var(--sklearn-color-unfitted-level-0);\n",
" border: .5pt solid var(--sklearn-color-unfitted-level-3);\n",
"}\n",
"\n",
".sk-estimator-doc-link.fitted span {\n",
" /* fitted */\n",
" background: var(--sklearn-color-fitted-level-0);\n",
" border: var(--sklearn-color-fitted-level-3);\n",
"}\n",
"\n",
".sk-estimator-doc-link:hover span {\n",
" display: block;\n",
"}\n",
"\n",
"/* \"?\"-specific style due to the `<a>` HTML tag */\n",
"\n",
"#sk-container-id-1 a.estimator_doc_link {\n",
" float: right;\n",
" font-size: 1rem;\n",
" line-height: 1em;\n",
" font-family: monospace;\n",
" background-color: var(--sklearn-color-background);\n",
" border-radius: 1rem;\n",
" height: 1rem;\n",
" width: 1rem;\n",
" text-decoration: none;\n",
" /* unfitted */\n",
" color: var(--sklearn-color-unfitted-level-1);\n",
" border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n",
"}\n",
"\n",
"#sk-container-id-1 a.estimator_doc_link.fitted {\n",
" /* fitted */\n",
" border: var(--sklearn-color-fitted-level-1) 1pt solid;\n",
" color: var(--sklearn-color-fitted-level-1);\n",
"}\n",
"\n",
"/* On hover */\n",
"#sk-container-id-1 a.estimator_doc_link:hover {\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-3);\n",
" color: var(--sklearn-color-background);\n",
" text-decoration: none;\n",
"}\n",
"\n",
"#sk-container-id-1 a.estimator_doc_link.fitted:hover {\n",
" /* fitted */\n",
" background-color: var(--sklearn-color-fitted-level-3);\n",
"}\n",
"</style><div id=\"sk-container-id-1\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>Pipeline(steps=[(&#x27;standardscaler&#x27;, StandardScaler()),\n",
" (&#x27;selectfwe&#x27;, SelectFwe(alpha=0.0001226579434)),\n",
" (&#x27;featureunion-1&#x27;,\n",
" FeatureUnion(transformer_list=[(&#x27;skiptransformer&#x27;,\n",
" SkipTransformer()),\n",
" (&#x27;passthrough&#x27;,\n",
" Passthrough())])),\n",
" (&#x27;featureunion-2&#x27;,\n",
" FeatureUnion(transformer_list=[(&#x27;skiptransformer&#x27;,\n",
" SkipTransformer()),\n",
" (&#x27;passthrough&#x27;,\n",
" Passthrough())])),\n",
" (&#x27;logisticregression&#x27;,\n",
" LogisticRegression(C=31921.0176296069, max_iter=1000, n_jobs=1,\n",
" solver=&#x27;saga&#x27;))])</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-label-container\"><div class=\"sk-label fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-1\" type=\"checkbox\" ><label for=\"sk-estimator-id-1\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">&nbsp;&nbsp;Pipeline<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.pipeline.Pipeline.html\">?<span>Documentation for Pipeline</span></a><span class=\"sk-estimator-doc-link fitted\">i<span>Fitted</span></span></label><div class=\"sk-toggleable__content fitted\"><pre>Pipeline(steps=[(&#x27;standardscaler&#x27;, StandardScaler()),\n",
" (&#x27;selectfwe&#x27;, SelectFwe(alpha=0.0001226579434)),\n",
" (&#x27;featureunion-1&#x27;,\n",
" FeatureUnion(transformer_list=[(&#x27;skiptransformer&#x27;,\n",
" SkipTransformer()),\n",
" (&#x27;passthrough&#x27;,\n",
" Passthrough())])),\n",
" (&#x27;featureunion-2&#x27;,\n",
" FeatureUnion(transformer_list=[(&#x27;skiptransformer&#x27;,\n",
" SkipTransformer()),\n",
" (&#x27;passthrough&#x27;,\n",
" Passthrough())])),\n",
" (&#x27;logisticregression&#x27;,\n",
" LogisticRegression(C=31921.0176296069, max_iter=1000, n_jobs=1,\n",
" solver=&#x27;saga&#x27;))])</pre></div> </div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-2\" type=\"checkbox\" ><label for=\"sk-estimator-id-2\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">&nbsp;StandardScaler<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.preprocessing.StandardScaler.html\">?<span>Documentation for StandardScaler</span></a></label><div class=\"sk-toggleable__content fitted\"><pre>StandardScaler()</pre></div> </div></div><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-3\" type=\"checkbox\" ><label for=\"sk-estimator-id-3\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">&nbsp;SelectFwe<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.feature_selection.SelectFwe.html\">?<span>Documentation for SelectFwe</span></a></label><div class=\"sk-toggleable__content fitted\"><pre>SelectFwe(alpha=0.0001226579434)</pre></div> </div></div><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-label-container\"><div class=\"sk-label fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-4\" type=\"checkbox\" ><label for=\"sk-estimator-id-4\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">&nbsp;featureunion-1: FeatureUnion<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.pipeline.FeatureUnion.html\">?<span>Documentation for featureunion-1: FeatureUnion</span></a></label><div class=\"sk-toggleable__content 
fitted\"><pre>FeatureUnion(transformer_list=[(&#x27;skiptransformer&#x27;, SkipTransformer()),\n",
" (&#x27;passthrough&#x27;, Passthrough())])</pre></div> </div></div><div class=\"sk-parallel\"><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label fitted sk-toggleable\"><label>skiptransformer</label></div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-5\" type=\"checkbox\" ><label for=\"sk-estimator-id-5\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">SkipTransformer</label><div class=\"sk-toggleable__content fitted\"><pre>SkipTransformer()</pre></div> </div></div></div></div></div><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label fitted sk-toggleable\"><label>passthrough</label></div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-6\" type=\"checkbox\" ><label for=\"sk-estimator-id-6\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">Passthrough</label><div class=\"sk-toggleable__content fitted\"><pre>Passthrough()</pre></div> </div></div></div></div></div></div></div><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-label-container\"><div class=\"sk-label fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-7\" type=\"checkbox\" ><label for=\"sk-estimator-id-7\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">&nbsp;featureunion-2: FeatureUnion<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.pipeline.FeatureUnion.html\">?<span>Documentation for featureunion-2: FeatureUnion</span></a></label><div class=\"sk-toggleable__content 
fitted\"><pre>FeatureUnion(transformer_list=[(&#x27;skiptransformer&#x27;, SkipTransformer()),\n",
" (&#x27;passthrough&#x27;, Passthrough())])</pre></div> </div></div><div class=\"sk-parallel\"><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label fitted sk-toggleable\"><label>skiptransformer</label></div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-8\" type=\"checkbox\" ><label for=\"sk-estimator-id-8\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">SkipTransformer</label><div class=\"sk-toggleable__content fitted\"><pre>SkipTransformer()</pre></div> </div></div></div></div></div><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label fitted sk-toggleable\"><label>passthrough</label></div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-9\" type=\"checkbox\" ><label for=\"sk-estimator-id-9\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">Passthrough</label><div class=\"sk-toggleable__content fitted\"><pre>Passthrough()</pre></div> </div></div></div></div></div></div></div><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-10\" type=\"checkbox\" ><label for=\"sk-estimator-id-10\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">&nbsp;LogisticRegression<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.linear_model.LogisticRegression.html\">?<span>Documentation for LogisticRegression</span></a></label><div class=\"sk-toggleable__content fitted\"><pre>LogisticRegression(C=31921.0176296069, max_iter=1000, n_jobs=1, 
solver=&#x27;saga&#x27;)</pre></div> </div></div></div></div></div></div>"
],
"text/plain": [
"Pipeline(steps=[('standardscaler', StandardScaler()),\n",
" ('selectfwe', SelectFwe(alpha=0.0001226579434)),\n",
" ('featureunion-1',\n",
" FeatureUnion(transformer_list=[('skiptransformer',\n",
" SkipTransformer()),\n",
" ('passthrough',\n",
" Passthrough())])),\n",
" ('featureunion-2',\n",
" FeatureUnion(transformer_list=[('skiptransformer',\n",
" SkipTransformer()),\n",
" ('passthrough',\n",
" Passthrough())])),\n",
" ('logisticregression',\n",
" LogisticRegression(C=31921.0176296069, max_iter=1000, n_jobs=1,\n",
" solver='saga'))])"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"best_pipeline = est.fitted_pipeline_\n",
"best_pipeline"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1,\n",
" 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1,\n",
" 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0,\n",
" 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1,\n",
" 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0,\n",
" 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0,\n",
" 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1])"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"best_pipeline.predict(X_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Saving the Pipeline\n",
"\n",
"We recommend using `dill` or `pickle` to save the fitted pipeline stored in the `fitted_pipeline_` attribute. We do not recommend pickling the TPOT estimator object itself."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"import dill as pickle\n",
"\n",
"# save the fitted pipeline to disk\n",
"with open(\"best_pipeline.pkl\", \"wb\") as f:\n",
"    pickle.dump(best_pipeline, f)\n",
"\n",
"# load the pipeline (this can also be done in a new session)\n",
"import dill as pickle\n",
"with open(\"best_pipeline.pkl\", \"rb\") as f:\n",
"    my_loaded_best_pipeline = pickle.load(f)"
]
},
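{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sanity check after saving is to confirm that the reloaded pipeline reproduces the original predictions. The next cell is a minimal, self-contained sketch under stated assumptions: it fits a small scikit-learn pipeline on synthetic data as a stand-in for `best_pipeline` (it is not the TPOT result above), and it uses the standard-library `pickle` module. `dill` offers the same `dump`/`load` interface and can additionally serialize objects such as lambdas that plain `pickle` rejects."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import pickle\n",
"import tempfile\n",
"\n",
"import numpy as np\n",
"from sklearn.datasets import make_classification\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.pipeline import make_pipeline\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"# Stand-in for best_pipeline (assumption): a small pipeline fitted on synthetic data\n",
"X, y = make_classification(n_samples=200, n_features=10, random_state=0)\n",
"pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))\n",
"pipe.fit(X, y)\n",
"\n",
"# Save the fitted pipeline to a temporary file, then load it back\n",
"path = os.path.join(tempfile.mkdtemp(), 'pipeline.pkl')\n",
"with open(path, 'wb') as f:\n",
"    pickle.dump(pipe, f)\n",
"with open(path, 'rb') as f:\n",
"    loaded = pickle.load(f)\n",
"\n",
"# The reloaded pipeline should make exactly the same predictions\n",
"same = np.array_equal(pipe.predict(X), loaded.predict(X))\n",
"print(same)"
]
},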
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The evaluated_individuals DataFrame: further analysis of results\n",
"\n",
"The `evaluated_individuals` attribute of the TPOT estimator is a pandas DataFrame containing information about a run. Each row corresponds to an individual pipeline explored by TPOT. The DataFrame contains the following columns:\n",
"\n",
"| Column | Description |\n",
"| :--- | :----: |\n",
"| *one column per objective* | The first columns hold the score for each objective function. They are either named automatically by TPOT or passed in by the user; their names are available from `est.objective_names`. |\n",
"| Parents | A tuple of the indexes of the 'parents' used to generate this pipeline. For example, (29, 42) means that the pipelines at indexes 29 and 42 were used to generate it. In this run, mutated pipelines list their single parent twice, as in (176, 176). |\n",
"| Variation_Function | The function applied to the parents to generate the new pipeline, such as `ind_mutate` or `ind_crossover`. |\n",
"| Individual | The individual object representing a specific pipeline and hyperparameter configuration. Its class also implements mutation and crossover. To get the scikit-learn estimator or pipeline from an individual, call its `export_pipeline()` method, as in `pipe = ind.export_pipeline()`. |\n",
"| Generation | The generation in which the individual was created. (Higher-performing pipelines from earlier generations may still be present in the current \"population\" of a given generation if they are selected.) |\n",
"| Submitted Timestamp | Time, in seconds, at which the pipeline was sent for evaluation. This is the output of `time.time()`, i.e. the number of seconds since the epoch as a floating-point number. |\n",
"| Completed Timestamp | Time at which the pipeline evaluation completed, in the same units as Submitted Timestamp. |\n",
"| Eval Error | Whether the evaluation failed. In this run, failed pipelines show `INVALID` and have `NaN` objective scores, while successful evaluations show `None`. |\n",
"| Pareto_Front | With multiple objectives, marks the pipelines that lie on the Pareto front: the set of pipelines not dominated by any other, where one pipeline dominates another if it is at least as good on every objective and strictly better on at least one. |\n",
"| Instance | The unfitted pipeline evaluated for this row, i.e. the pipeline returned by the individual's `export_pipeline()` method. |\n"
]
},
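{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because `evaluated_individuals` is an ordinary pandas DataFrame, standard pandas operations apply. As an illustration, the next cell drops failed evaluations and recomputes the Pareto front for two objectives, assuming `roc_auc_score` is maximized and `complexity_scorer` is minimized. It is a sketch, not TPOT's own implementation: the DataFrame holds a few objective values rounded from this notebook's `evaluated_individuals` output rather than the real attribute, and `pareto_mask` is a hypothetical helper written for this example, not part of the TPOT API."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical stand-in for est.evaluated_individuals (objective columns only)\n",
"df_demo = pd.DataFrame({\n",
"    'roc_auc_score': [np.nan, 0.990, 0.962, 0.956, 0.981, 0.896],\n",
"    'complexity_scorer': [np.nan, 10.0, 80.0, 8.0, 11.0, 6.0],\n",
"})\n",
"\n",
"\n",
"def pareto_mask(scores, maximize):\n",
"    # scores: (n, k) array of objective values; maximize: one bool per column\n",
"    signs = np.where(maximize, 1.0, -1.0)\n",
"    s = scores * signs  # flip minimized objectives so that larger is always better\n",
"    mask = np.ones(len(s), dtype=bool)\n",
"    for i in range(len(s)):\n",
"        # row i is dominated if some row is >= on every objective and > on at least one\n",
"        dominated = np.all(s >= s[i], axis=1) & np.any(s > s[i], axis=1)\n",
"        mask[i] = not dominated.any()\n",
"    return mask\n",
"\n",
"\n",
"# Failed evaluations have NaN scores, so drop them first\n",
"valid = df_demo.dropna(subset=['roc_auc_score', 'complexity_scorer'])\n",
"front = valid[pareto_mask(valid.to_numpy(), maximize=[True, False])]\n",
"print(front.sort_values('complexity_scorer'))"
]
},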
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['roc_auc_score', 'complexity_scorer']"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# get the objective (score) column names generated by TPOT\n",
"est.objective_names"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>roc_auc_score</th>\n",
" <th>complexity_scorer</th>\n",
" <th>Parents</th>\n",
" <th>Variation_Function</th>\n",
" <th>Individual</th>\n",
" <th>Generation</th>\n",
" <th>Submitted Timestamp</th>\n",
" <th>Completed Timestamp</th>\n",
" <th>Eval Error</th>\n",
" <th>Pareto_Front</th>\n",
" <th>Instance</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>&lt;tpot2.search_spaces.pipelines.sequential.Sequ...</td>\n",
" <td>0.0</td>\n",
" <td>1.727136e+09</td>\n",
" <td>1.727136e+09</td>\n",
" <td>INVALID</td>\n",
" <td>NaN</td>\n",
" <td>(StandardScaler(), VarianceThreshold(threshold...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.990146</td>\n",
" <td>10.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>&lt;tpot2.search_spaces.pipelines.sequential.Sequ...</td>\n",
" <td>0.0</td>\n",
" <td>1.727136e+09</td>\n",
" <td>1.727136e+09</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>(MaxAbsScaler(), VarianceThreshold(threshold=0...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>&lt;tpot2.search_spaces.pipelines.sequential.Sequ...</td>\n",
" <td>0.0</td>\n",
" <td>1.727136e+09</td>\n",
" <td>1.727136e+09</td>\n",
" <td>INVALID</td>\n",
" <td>NaN</td>\n",
" <td>(StandardScaler(), VarianceThreshold(threshold...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.961892</td>\n",
" <td>80.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>&lt;tpot2.search_spaces.pipelines.sequential.Sequ...</td>\n",
" <td>0.0</td>\n",
" <td>1.727136e+09</td>\n",
" <td>1.727136e+09</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>(RobustScaler(quantile_range=(0.2009793033711,...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0.955582</td>\n",
" <td>8.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>&lt;tpot2.search_spaces.pipelines.sequential.Sequ...</td>\n",
" <td>0.0</td>\n",
" <td>1.727136e+09</td>\n",
" <td>1.727136e+09</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>(MaxAbsScaler(), Passthrough(), FeatureUnion(t...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>245</th>\n",
" <td>0.981354</td>\n",
" <td>11.0</td>\n",
" <td>(176, 176)</td>\n",
" <td>ind_mutate</td>\n",
" <td>&lt;tpot2.search_spaces.pipelines.sequential.Sequ...</td>\n",
" <td>4.0</td>\n",
" <td>1.727136e+09</td>\n",
" <td>1.727136e+09</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>(MinMaxScaler(), SelectFwe(alpha=0.00036066272...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>246</th>\n",
" <td>0.972795</td>\n",
" <td>10.0</td>\n",
" <td>(145, 58)</td>\n",
" <td>ind_crossover</td>\n",
" <td>&lt;tpot2.search_spaces.pipelines.sequential.Sequ...</td>\n",
" <td>4.0</td>\n",
" <td>1.727136e+09</td>\n",
" <td>1.727136e+09</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>(MaxAbsScaler(), Passthrough(), FeatureUnion(t...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>247</th>\n",
" <td>0.895754</td>\n",
" <td>6.0</td>\n",
" <td>(195, 195)</td>\n",
" <td>ind_mutate</td>\n",
" <td>&lt;tpot2.search_spaces.pipelines.sequential.Sequ...</td>\n",
" <td>4.0</td>\n",
" <td>1.727136e+09</td>\n",
" <td>1.727136e+09</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>(StandardScaler(), VarianceThreshold(threshold...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>248</th>\n",
" <td>0.978311</td>\n",
" <td>7.0</td>\n",
" <td>(32, 32)</td>\n",
" <td>ind_mutate</td>\n",
" <td>&lt;tpot2.search_spaces.pipelines.sequential.Sequ...</td>\n",
" <td>4.0</td>\n",
" <td>1.727136e+09</td>\n",
" <td>1.727136e+09</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>(MaxAbsScaler(), SelectFwe(alpha=0.00140487405...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>249</th>\n",
" <td>0.983915</td>\n",
" <td>9.0</td>\n",
" <td>(99, 99)</td>\n",
" <td>ind_mutate</td>\n",
" <td>&lt;tpot2.search_spaces.pipelines.sequential.Sequ...</td>\n",
" <td>4.0</td>\n",
" <td>1.727136e+09</td>\n",
" <td>1.727136e+09</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>(StandardScaler(), VarianceThreshold(threshold...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>250 rows × 11 columns</p>\n",
"</div>"
],
"text/plain": [
" roc_auc_score complexity_scorer Parents Variation_Function \\\n",
"0 NaN NaN NaN NaN \n",
"1 0.990146 10.0 NaN NaN \n",
"2 NaN NaN NaN NaN \n",
"3 0.961892 80.0 NaN NaN \n",
"4 0.955582 8.0 NaN NaN \n",
".. ... ... ... ... \n",
"245 0.981354 11.0 (176, 176) ind_mutate \n",
"246 0.972795 10.0 (145, 58) ind_crossover \n",
"247 0.895754 6.0 (195, 195) ind_mutate \n",
"248 0.978311 7.0 (32, 32) ind_mutate \n",
"249 0.983915 9.0 (99, 99) ind_mutate \n",
"\n",
" Individual Generation \\\n",
"0 <tpot2.search_spaces.pipelines.sequential.Sequ... 0.0 \n",
"1 <tpot2.search_spaces.pipelines.sequential.Sequ... 0.0 \n",
"2 <tpot2.search_spaces.pipelines.sequential.Sequ... 0.0 \n",
"3 <tpot2.search_spaces.pipelines.sequential.Sequ... 0.0 \n",
"4 <tpot2.search_spaces.pipelines.sequential.Sequ... 0.0 \n",
".. ... ... \n",
"245 <tpot2.search_spaces.pipelines.sequential.Sequ... 4.0 \n",
"246 <tpot2.search_spaces.pipelines.sequential.Sequ... 4.0 \n",
"247 <tpot2.search_spaces.pipelines.sequential.Sequ... 4.0 \n",
"248 <tpot2.search_spaces.pipelines.sequential.Sequ... 4.0 \n",
"249 <tpot2.search_spaces.pipelines.sequential.Sequ... 4.0 \n",
"\n",
" Submitted Timestamp Completed Timestamp Eval Error Pareto_Front \\\n",
"0 1.727136e+09 1.727136e+09 INVALID NaN \n",
"1 1.727136e+09 1.727136e+09 None NaN \n",
"2 1.727136e+09 1.727136e+09 INVALID NaN \n",
"3 1.727136e+09 1.727136e+09 None NaN \n",
"4 1.727136e+09 1.727136e+09 None NaN \n",
".. ... ... ... ... \n",
"245 1.727136e+09 1.727136e+09 None NaN \n",
"246 1.727136e+09 1.727136e+09 None NaN \n",
"247 1.727136e+09 1.727136e+09 None NaN \n",
"248 1.727136e+09 1.727136e+09 None NaN \n",
"249 1.727136e+09 1.727136e+09 None NaN \n",
"\n",
" Instance \n",
"0 (StandardScaler(), VarianceThreshold(threshold... \n",
"1 (MaxAbsScaler(), VarianceThreshold(threshold=0... \n",
"2 (StandardScaler(), VarianceThreshold(threshold... \n",
"3 (RobustScaler(quantile_range=(0.2009793033711,... \n",
"4 (MaxAbsScaler(), Passthrough(), FeatureUnion(t... \n",
".. ... \n",
"245 (MinMaxScaler(), SelectFwe(alpha=0.00036066272... \n",
"246 (MaxAbsScaler(), Passthrough(), FeatureUnion(t... \n",
"247 (StandardScaler(), VarianceThreshold(threshold... \n",
"248 (MaxAbsScaler(), SelectFwe(alpha=0.00140487405... \n",
"249 (StandardScaler(), VarianceThreshold(threshold... \n",
"\n",
"[250 rows x 11 columns]"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = est.evaluated_individuals\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Let's plot the performance of the different pipelines, including the Pareto front\n",
"\n",
"Plotting the performance of multiple objectives in a scatterplot is a useful way to visualize the trade-off between model complexity and predictive performance. The trade-off is clearest when the Pareto front pipelines are highlighted, since they are the best-performing pipelines at each level of complexity. More complex pipelines may score higher, but they are generally harder to interpret."
]
},
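{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal plotting sketch, assuming matplotlib is installed: pipelines off the front are drawn as gray points, and the Pareto front is drawn as a connected line ordered by complexity. The scores and the `on_front` flags below are hypothetical (rounded from a few rows of this run) and stand in for the objective columns of `est.evaluated_individuals`; lower `complexity_scorer` values are assumed to mean simpler pipelines."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"\n",
"# Hypothetical objective scores and Pareto-front flags (stand-in for real results)\n",
"scores = pd.DataFrame({\n",
"    'roc_auc_score': [0.990, 0.962, 0.956, 0.981, 0.896],\n",
"    'complexity_scorer': [10.0, 80.0, 8.0, 11.0, 6.0],\n",
"    'on_front': [True, False, True, False, True],\n",
"})\n",
"rest = scores[~scores['on_front']]\n",
"front_pts = scores[scores['on_front']].sort_values('complexity_scorer')\n",
"\n",
"fig, ax = plt.subplots()\n",
"# All other pipelines as gray points\n",
"ax.scatter(rest['complexity_scorer'], rest['roc_auc_score'], color='lightgray', label='other pipelines')\n",
"# Pareto front as a line through its points, ordered by complexity\n",
"ax.plot(front_pts['complexity_scorer'], front_pts['roc_auc_score'], 'o-', color='tab:red', label='Pareto front')\n",
"ax.set_xlabel('complexity_scorer (lower is simpler)')\n",
"ax.set_ylabel('roc_auc_score')\n",
"ax.legend()"
]
},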
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAc0AAAHUCAYAAABYnHNOAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8fJSN1AAAACXBIWXMAAA9hAAAPYQGoP6dpAABn3klEQVR4nO3deVxU9foH8M8wMBu7IAiKiIILLomppYBr6rVM7VrX1EzUVjU1K5fMJTU1KysNbbku9avMeyWtzBZzF72lhpm5oqiomIKyDjAw8/39QTMxzDAMw8wwwOf9evGqOXOWZ444j+ec7/d5JEIIASIiIqqSW20HQEREVFcwaRIREVmJSZOIiMhKTJpERERWYtIkIiKyEpMmERGRlZg0iYiIrMSkSUREZCUmTSIiIisxaZLNNm7cCIlEYvhxd3dHs2bNMH78eFy7ds2ux9JoNHjmmWcQEhICqVSKzp0723X/VLnNmzejffv2UCqVkEgkOH78eI33KZFIsHDhQsPrvXv3QiKRYO/evTXet15CQgJatGhht/1VxhmfhVyHe20HQHXfhg0b0LZtWxQWFmL//v1YtmwZ9u3bh99//x2enp52OcbatWvxwQcfYPXq1bj77rvh5eVll/2SZbdu3cLYsWPxj3/8A2vWrIFcLkfr1q1rOyyrzJs3D9OmTXP6cbt06YLDhw8jOjra6ccmx2PSpBrr0KEDunbtCgDo27cvtFotFi9ejG3btmHMmDE12rdarYZKpcLJkyehVCoxZcoUe4QMACgsLIRSqbTb/uqjc+fOoaSkBI899hh69+5d2+FUS6tWrWrluD4+Prj33ntr5djkeLw9S3an/8K4fPkyAEAIgTVr1qBz585QKpXw9/fHww8/jIsXLxpt16dPH3To0AH79+9Hz549oVKpMGHCBEgkEvz73/9GYWGh4Vbwxo0bAQBFRUWYM2cOIiIiIJPJ0LRpU0yePBnZ2dlG+27RogWGDBmCL7/8EjExMVAoFHj11VcNt9I+//xzzJo1CyEhIfDy8sKDDz6IP//8E3l5eXjqqacQGBiIwMBAjB8/Hvn5+Ub7TkxMRK9evRAUFARPT0907NgRK1asQElJidnPd+TIEcTHx0OlUqFly5ZYvnw5dDqd0brZ2dl44YUX0LJlS8jlcgQFBeH+++/HmTNnDOtoNBosWbIEbdu2hVwuR+PGjTF+/HjcunXLqj+nr7/+Gj169IBKpYK3tzcGDBiAw4cPG95PSEhAXFwcAGDkyJGQSCTo06dPpfu7desWJk2ahOjoaHh5eSEoKAj9+vXDgQMHrIrHGvpHAjt37sT48ePRqFEjeHp64sEHHzT5fTJ3e1YikWDKlCn44IMP0Lp1a8jlckRHR+OLL74wOdaNGzfw9NNPo1mzZpDJZIiIiMCrr76K0tJSizGauz2bkJAALy8vpKam4v7774eXlxfCwsLwwgsvoLi42Gh7a/9cd+/ejT59+iAgIABKpRLNmzfHiBEjoFarrTiTZCteaZLdpaamAgAaN24MAHj66aexceNGTJ06Fa+//jpu376NRYsWoWfPnvjtt98QHBxs2DYjIwOPPfYYZs6ciaVLl8LNzQ3Tp0/H4sWLsWfPHuzevRtA2VWEEALDhw/Hrl27MGfOHMTHx+PEiRNYsGABDh8+jMOHD0Mulxv2/euvv+L06dN45ZVXEBERAU9PTxQUFAAAXn75ZfTt2xcbN27EpUuX8OKLL2LUqFFwd3fHXXfdhU2bNiElJQUvv/wyvL29sWrVKsN+L1y4gNGjRxsS92+//YbXXnsNZ86cwfr1643OzY0bNzBmzBi88MILWLBgAbZu3Yo5c+YgNDQUjz/+OAAgLy8PcXFxuHTpEmbNmoV77rkH+fn52L9/PzIyMtC2bVvodDoMGzYMBw4cwMyZM9GzZ09cvnwZCxYsQJ8+fXD06FGLV9Gff/45xowZg4EDB2LTpk0oLi7GihUr0KdPH+zatQtxcX
GYN28eunfvjsmTJ2Pp0qXo27cvfHx8Kt3n7du3AQALFixAkyZNkJ+fj61btxr2aSnhVtfEiRMxYMAAfP7550hPT8crr7yCPn364MSJE/Dz87O47ddff409e/Zg0aJF8PT0xJo1awx/1g8//DCAsj+n7t27w83NDfPnz0erVq1w+PBhLFmyBJcuXcKGDRuqHXNJSQmGDh2KiRMn4oUXXsD+/fuxePFi+Pr6Yv78+QBg9Z/rpUuX8MADDyA+Ph7r16+Hn58frl27hu+//x4ajQYqlara8ZGVBJGNNmzYIACI//3vf6KkpETk5eWJ7du3i8aNGwtvb29x48YNcfjwYQFAvPXWW0bbpqenC6VSKWbOnGlY1rt3bwFA7Nq1y+RY48aNE56enkbLvv/+ewFArFixwmj55s2bBQDx4YcfGpaFh4cLqVQqzp49a7Tunj17BADx4IMPGi2fPn26ACCmTp1qtHz48OGiUaNGlZ4TrVYrSkpKxCeffCKkUqm4ffu2yef7+eefjbaJjo4WgwYNMrxetGiRACB27txZ6XE2bdokAIikpCSj5UeOHBEAxJo1ayzGGBoaKjp27Ci0Wq1heV5enggKChI9e/Y0LNOfn//+97+V7q8ypaWloqSkRPTv31889NBDRu8BEAsWLDA5zp49eyzuU/87V3F/ycnJAoBYsmSJYdm4ceNEeHi4yXGVSqW4ceOGUZxt27YVkZGRhmVPP/208PLyEpcvXzba/s033xQAxB9//FGtzzJu3DgBQPznP/8x2t/9998v2rRpY3ht7Z/rli1bBABx/Phxc6eJHIi3Z6nG7r33Xnh4eMDb2xtDhgxBkyZN8N133yE4OBjbt2+HRCLBY489htLSUsNPkyZNcNddd5mMMPT390e/fv2sOq7+qjMhIcFo+SOPPAJPT0/s2rXLaHmnTp0qHcQyZMgQo9ft2rUDADzwwAMmy2/fvm10izYlJQVDhw5FQEAApFIpPDw88Pjjj0Or1eLcuXNG2zdp0gTdu3c3iUt/KxsAvvvuO7Ru3Rr33XdfZR8d27dvh5+fHx588EGj89q5c2c0adLE4sjNs2fP4vr16xg7dizc3P7+CvDy8sKIESPwv//9z+ZbfO+//z66dOkChUIBd3d3eHh4YNeuXTh9+rRN+6tMxWflPXv2RHh4OPbs2VPltv379ze6uyGVSjFy5Eikpqbi6tWrAMrOb9++fREaGmp0fgcPHgwA2LdvX7VjlkgkePDBB42WVfyzt/bPtXPnzpDJZHjqqafw8ccfm9yaJsfh7VmqsU8++QTt2rWDu7s7goODERISYnjvzz//hBDC6EuqvJYtWxq9Lr9tVbKysuDu7m64DawnkUjQpEkTZGVlWb3vRo0aGb2WyWQWlxcVFcHLywtXrlxBfHw82rRpg3fffRctWrSAQqHAL7/8gsmTJ6OwsNBo+4CAAJNjy+Vyo/Vu3bqF5s2bVxorUHZes7OzDfFUlJmZWem2+vNi7nyEhoZCp9Phzp071b7Ft3LlSrzwwgt45plnsHjxYgQGBkIqlWLevHl2T5pNmjQxu6zin3l1tgXKzk2zZs3w559/4ptvvoGHh4fZfVg6v5VRqVRQKBRGy+RyOYqKigyvrf1zbdWqFX766SesWLECkydPRkFBAVq2bImpU6fWyojhhoRJk2qsXbt2htGzFQUGBkIikeDAgQNGzxf1Ki6TSCRWHzcgIAClpaW4deuWUeIUQuDGjRvo1q2bzfu21rZt21BQUIAvv/wS4eHhhuU1mcvYuHFjwxVPZQIDAxEQEIDvv//e7Pve3t6VbqtP3BkZGSbvXb9+HW5ubvD3969GxGU+/fRT9OnTB2vXrjVanpeXV+19VeXGjRtml0VGRtq8LfD3uQkMDESnTp3w2muvmd1HaGhodcK1WnX+XOPj4xEfHw+tVoujR49i9erVmD59OoKDg/Hoo486JD7i6FlysCFDhkAIgWvXrqFr164mPx07drR53/379wdQ9m
VdXlJSEgoKCgzvO5I+EZdP/kIIfPTRRzbvc/DgwTh37pzh9rM5Q4YMQVZWFrRardnz2qZNm0q3bdOmDZo2bYrPP/8
"text/plain": [
"<Figure size 500x500 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA0oAAAHUCAYAAAAEKdj3AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8fJSN1AAAACXBIWXMAAA9hAAAPYQGoP6dpAABOzElEQVR4nO3deXjM5/7/8deIZLJHiWxEBFG7qqW2WqqhlGqrv1YtR7TVBVVtHctxFKUoR3uU0t1yVGlLHYfW0iIoWmtba5UoJUosiSQkkdy/P3plvmYSkYwkEzwf1zXXZe77s7w/447m1fvzucdijDECAAAAANiUcnUBAAAAAFDSEJQAAAAAwAFBCQAAAAAcEJQAAAAAwAFBCQAAAAAcEJQAAAAAwAFBCQAAAAAcEJQAAAAAwAFBCQAAAAAcEJQAlChz5syRxWKxvUqXLq2KFSuqb9++OnHiRKGeKz09Xc8//7xCQ0Pl5uamu+66q1CPj2tbtGiRateuLS8vL1ksFu3evbvYzh0TE6PKlSsX2vFmzpypOXPm5Ghfv369LBaLvvzyy0I7V26OHj1q9zNTqlQplStXTp06ddKWLVuK9NzXcq3P5Ea1adPG7lqvfu3Zs6fQz5eXzZs3a8yYMbpw4UKxnhdA8Snt6gIAIDezZ89WjRo1dOnSJW3YsEETJ05UbGysfvnlF/n4+BTKOWbNmqX3339f06dPV8OGDeXr61sox0Xezpw5o969e+uBBx7QzJkzZbVaVb16dVeX5bSZM2cqMDBQMTExLq3jxRdfVI8ePZSZmam9e/dq7Nixatu2rbZs2aIGDRoUay1F+ZlUqVJFn376aY72qlWrFvq58rJ582aNHTtWMTExKlOmTLGeG0DxICgBKJHq1KmjRo0aSZLatm2rzMxMjRs3TkuXLlXPnj1v6Nipqany9vbWnj175OXlpYEDBxZGyZKkS5cuycvLq9COdyv69ddflZGRoV69eql169auLueWUalSJTVt2lSS1KJFC1WrVk3t2rXTzJkz9eGHH97QsbN/ZkoCLy8v23XmR0mqHcDNhVvvANwUsn8x+v333yVJxhjNnDlTd911l7y8vHTHHXfoscce05EjR+z2a9OmjerUqaMNGzaoefPm8vb21lNPPSWLxaKPPvpIly5dst26k32r0OXLlzVixAhFRkbKw8NDFSpU0IABA3LcYlO5cmV17txZS5YsUYMGDeTp6amxY8fabrlasGCBhg0bptDQUPn6+qpLly76888/dfHiRT377LMKDAxUYGCg+vbtq+TkZLtjv/vuu2rVqpWCgoLk4+OjunXravLkycrIyMj1+rZt26Z7771X3t7eqlKliiZNmqSsrCy7bS9cuKBXX31VVapUkdVqVVBQkDp16qQDBw7YtklPT9f48eNVo0YNWa1WlS9fXn379tWZM2fy9fe0bNkyNWvWTN7e3vLz81N0dLTd7V8xMTFq2bKlJOmJJ56QxWJRmzZt8jzmnj171LVrV91xxx3y9PTUXXfdpblz59ptk/2Zf/bZZxo5cqTCwsLk7++v+++/XwcPHszz+O3atVONGjVkjLFrN8aoWrVqevDBB6+5b+XKlbV3717FxsbaxpHjbX0ZGRn5qunbb79Vu3bt5O/vL29vb7Vo0ULfffddnrXnxfFnZtGiRWrfvr1CQ0Pl5eWlmjVravjw4UpJSbHbLyYmRr6+vvrll1/Uvn17+fn5qV27dpLyNz6u95kcO3ZMvXr1UlBQkKxWq2rWrKmpU6fmGK/OyKv2c+fOqX///qpQoYI8PDxUpUoVjRw5UmlpaXbHsFgsGjhwoP7zn/+oZs2a8vb2Vv369bV8+XLbNmPGjNHf//53SVJkZKTtOtevX3/D1wCgBDEAUILMnj3bSDLbtm2za582bZqRZD744ANjjDH9+vUz7u7u5tVXXzUrV640CxYsMDVq1DDBwcHm1KlTtv1at25typYta8LDw8306dPNunXrTG
xsrNmyZYvp1KmT8fLyMlu2bDFbtmwxp0+fNllZWaZDhw6mdOnSZtSoUWb16tXmX//6l/Hx8TENGjQwly9fth07IiLChIaGmipVqphPPvnErFu3zvz4449m3bp1RpKJiIgwMTExZuXKlea9994zvr6+pm3btiY6OtoMGTLErF692rz55pvGzc3NvPjii3bX+/LLL5tZs2aZlStXmrVr15q3337bBAYGmr59+9pt17p1a1OuXDkTFRVl3nvvPbNmzRrTv39/I8nMnTvXtl1SUpKpXbu28fHxMa+//rpZtWqVWbx4sXnppZfM2rVrjTHGZGZmmgceeMD4+PiYsWPHmjVr1piPPvrIVKhQwdSqVcukpqbm+Xf36aefGkmmffv2ZunSpWbRokWmYcOGxsPDw2zcuNEYY8xvv/1m3n33XSPJTJgwwWzZssXs3bv3msc8cOCA8fPzM1WrVjXz5s0zK1asME8++aSRZN58803bdtmfeeXKlU3Pnj3NihUrzGeffWYqVapkoqKizJUrV2zb9unTx0RERNje//e//zWSzJo1a+zOvWLFCiPJrFix4pr17dy501SpUsU0aNDANo527txZ4Jr+85//GIvFYh5++GGzZMkS87///c907tzZuLm5mW+//TbPzz0uLs5IMlOmTLFr/+mnn4wk06NHD2OMMePGjTNvv/22WbFihVm/fr157733TGRkpGnbtq3dfn369DHu7u6mcuXKZuLEiea7774zq1atyvf4yOszOX36tKlQoYIpX768ee+998zKlSvNwIEDjSTzwgsv5Hmdxvw13mvXrm0yMjLsXpmZmXnWfunSJVOvXj3j4+Nj/vWvf5nVq1ebUaNGmdKlS5tOnTrZnSP776xJkybm888/N19//bVp06aNKV26tDl8+LAxxpjjx4+bF1980UgyS5YssV1nYmLida8BwM2DoASgRMkOSlu3bjUZGRnm4sWLZvny5aZ8+fLGz8/PnDp1ymzZssVIMlOnTrXb9/jx48bLy8sMHTrU1ta6dWsjyXz33Xc5ztWnTx/j4+Nj17Zy5UojyUyePNmufdGiRXZBzZi/gpKbm5s5ePCg3bbZvyB36dLFrn3w4MFGkhk0aJBd+8MPP2zKli17zc8kMzPTZGRkmHnz5hk3Nzdz7ty5HNf3ww8/2O1Tq1Yt06FDB9v7119/PdcwcLXPPvvMSDKLFy+2a9+2bZuRZGbOnJlnjWFhYaZu3bq2X1qNMebixYsmKCjING/e3NaW/fl88cUX1zxetu7duxur1WqOHTtm196xY0fj7e1tLly4YHdMx196P//8cyPJbNmyxdbmGJQyMzNNlSpVTNeuXXOco2rVqiYrKyvPGmvXrm1at26doz2/NaWkpJiyZcvmGC+ZmZmmfv36pkmTJnmePzsovfnmmyYjI8NcvnzZ7NixwzRu3PiaQS8rK8tkZGSY2NhYI8n89NNPtr4+ffoYSeaTTz6x26cg4+Nan8nw4cNzHa8vvPCCsVgsOX6WHGWPd8dXz54986z9vffeM5LM559/btf+5ptvGklm9erVtjZJJjg42CQlJdnaTp06ZUqVKmUmTpxoa5syZYqRZOLi4vKsGcDNi1vvAJRITZs2lbu7u/z8/NS5c2eFhITom2++UXBwsJYvXy6LxaJevXrpypUrtldISIjq16+f4/aXO+64Q/fdd1++zrt27VpJyvEQ+v/7f/9PPj4+OW6Fqlev3jUXIujcubPd+5o1a0pSjlu5atasqXPnztndfrdr1y499NBDKleunNzc3OTu7q6//e1vyszM1K+//mq3f0hIiJo0aZKjruxbriTpm2++UfXq1XX//fdf69K1fPlylSlTRl26dLH7XO+66y6FhITkeVvRwYMHdfLkSfXu3VulSv3ff1p8fX3VrVs3bd26Vampqdfc/1rWrl2rdu3aKTw83K49JiZGqampOVZ1e+ihh+ze16tXT5LsPgtHpUqV0sCBA7V8+XIdO3ZMknT48G
GtXLlS/fv3l8ViKXDdBalp8+bNOnfunPr06WP3uWdlZemBBx7Qtm3bctwel5thw4bJ3d1dnp6eatiwoY4dO6b3339
"text/plain": [
"<Figure size 1000x500 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"# plot every evaluated pipeline; rows not on the Pareto front\n",
"# (Pareto_Front != 1, which also matches NaN) are labelled 'other'\n",
"fig, ax = plt.subplots(figsize=(5,5))\n",
"sns.scatterplot(data=df[df['Pareto_Front']!=1], x='roc_auc_score', y='complexity_scorer', label='other', ax=ax)\n",
"sns.scatterplot(data=df[df['Pareto_Front']==1], x='roc_auc_score', y='complexity_scorer', label='Pareto Front', ax=ax)\n",
"ax.title.set_text('Performance of all pipelines')\n",
"# use a log scale for the complexity (y) axis\n",
"ax.set_yscale('log')\n",
"plt.show()\n",
"\n",
"# plot only the Pareto front pipelines\n",
"fig, ax = plt.subplots(figsize=(10,5))\n",
"sns.scatterplot(data=df[df['Pareto_Front']==1], x='roc_auc_score', y='complexity_scorer', label='Pareto Front', ax=ax)\n",
"ax.title.set_text('Performance of only the Pareto Front')\n",
"# optionally log-scale the y axis here as well\n",
"# ax.set_yscale('log')\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>roc_auc_score</th>\n",
" <th>complexity_scorer</th>\n",
" <th>Parents</th>\n",
" <th>Variation_Function</th>\n",
" <th>Individual</th>\n",
" <th>Generation</th>\n",
" <th>Submitted Timestamp</th>\n",
" <th>Completed Timestamp</th>\n",
" <th>Eval Error</th>\n",
" <th>Pareto_Front</th>\n",
" <th>Instance</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>58</th>\n",
" <td>0.997081</td>\n",
" <td>30.4</td>\n",
" <td>(17, 17)</td>\n",
" <td>ind_mutate</td>\n",
" <td>&lt;tpot2.search_spaces.pipelines.sequential.Sequ...</td>\n",
" <td>1.0</td>\n",
" <td>1.727136e+09</td>\n",
" <td>1.727136e+09</td>\n",
" <td>None</td>\n",
" <td>1.0</td>\n",
" <td>(StandardScaler(), SelectFwe(alpha=0.000122657...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>141</th>\n",
" <td>0.994594</td>\n",
" <td>11.0</td>\n",
" <td>(46, 46)</td>\n",
" <td>ind_mutate</td>\n",
" <td>&lt;tpot2.search_spaces.pipelines.sequential.Sequ...</td>\n",
" <td>2.0</td>\n",
" <td>1.727136e+09</td>\n",
" <td>1.727136e+09</td>\n",
" <td>None</td>\n",
" <td>1.0</td>\n",
" <td>(MaxAbsScaler(), Passthrough(), FeatureUnion(t...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>204</th>\n",
" <td>0.993951</td>\n",
" <td>9.0</td>\n",
" <td>(90, 90)</td>\n",
" <td>ind_mutate</td>\n",
" <td>&lt;tpot2.search_spaces.pipelines.sequential.Sequ...</td>\n",
" <td>4.0</td>\n",
" <td>1.727136e+09</td>\n",
" <td>1.727136e+09</td>\n",
" <td>None</td>\n",
" <td>1.0</td>\n",
" <td>(MinMaxScaler(), Passthrough(), FeatureUnion(t...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>149</th>\n",
" <td>0.985930</td>\n",
" <td>8.0</td>\n",
" <td>(88, 88)</td>\n",
" <td>ind_mutate</td>\n",
" <td>&lt;tpot2.search_spaces.pipelines.sequential.Sequ...</td>\n",
" <td>2.0</td>\n",
" <td>1.727136e+09</td>\n",
" <td>1.727136e+09</td>\n",
" <td>None</td>\n",
" <td>1.0</td>\n",
" <td>(MinMaxScaler(), SelectPercentile(percentile=5...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>178</th>\n",
" <td>0.980084</td>\n",
" <td>7.0</td>\n",
" <td>(131, 131)</td>\n",
" <td>ind_mutate</td>\n",
" <td>&lt;tpot2.search_spaces.pipelines.sequential.Sequ...</td>\n",
" <td>3.0</td>\n",
" <td>1.727136e+09</td>\n",
" <td>1.727136e+09</td>\n",
" <td>None</td>\n",
" <td>1.0</td>\n",
" <td>(MaxAbsScaler(), SelectFwe(alpha=0.00020686984...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>201</th>\n",
" <td>0.949153</td>\n",
" <td>6.0</td>\n",
" <td>(176, 32)</td>\n",
" <td>ind_mutate , ind_mutate , ind_crossover</td>\n",
" <td>&lt;tpot2.search_spaces.pipelines.sequential.Sequ...</td>\n",
" <td>4.0</td>\n",
" <td>1.727136e+09</td>\n",
" <td>1.727136e+09</td>\n",
" <td>None</td>\n",
" <td>1.0</td>\n",
" <td>(MinMaxScaler(), SelectFwe(alpha=0.00140487405...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" roc_auc_score complexity_scorer Parents \\\n",
"58 0.997081 30.4 (17, 17) \n",
"141 0.994594 11.0 (46, 46) \n",
"204 0.993951 9.0 (90, 90) \n",
"149 0.985930 8.0 (88, 88) \n",
"178 0.980084 7.0 (131, 131) \n",
"201 0.949153 6.0 (176, 32) \n",
"\n",
" Variation_Function \\\n",
"58 ind_mutate \n",
"141 ind_mutate \n",
"204 ind_mutate \n",
"149 ind_mutate \n",
"178 ind_mutate \n",
"201 ind_mutate , ind_mutate , ind_crossover \n",
"\n",
" Individual Generation \\\n",
"58 <tpot2.search_spaces.pipelines.sequential.Sequ... 1.0 \n",
"141 <tpot2.search_spaces.pipelines.sequential.Sequ... 2.0 \n",
"204 <tpot2.search_spaces.pipelines.sequential.Sequ... 4.0 \n",
"149 <tpot2.search_spaces.pipelines.sequential.Sequ... 2.0 \n",
"178 <tpot2.search_spaces.pipelines.sequential.Sequ... 3.0 \n",
"201 <tpot2.search_spaces.pipelines.sequential.Sequ... 4.0 \n",
"\n",
" Submitted Timestamp Completed Timestamp Eval Error Pareto_Front \\\n",
"58 1.727136e+09 1.727136e+09 None 1.0 \n",
"141 1.727136e+09 1.727136e+09 None 1.0 \n",
"204 1.727136e+09 1.727136e+09 None 1.0 \n",
"149 1.727136e+09 1.727136e+09 None 1.0 \n",
"178 1.727136e+09 1.727136e+09 None 1.0 \n",
"201 1.727136e+09 1.727136e+09 None 1.0 \n",
"\n",
" Instance \n",
"58 (StandardScaler(), SelectFwe(alpha=0.000122657... \n",
"141 (MaxAbsScaler(), Passthrough(), FeatureUnion(t... \n",
"204 (MinMaxScaler(), Passthrough(), FeatureUnion(t... \n",
"149 (MinMaxScaler(), SelectPercentile(percentile=5... \n",
"178 (MaxAbsScaler(), SelectFwe(alpha=0.00020686984... \n",
"201 (MinMaxScaler(), SelectFwe(alpha=0.00140487405... "
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# show only the Pareto front pipelines, sorted by ROC AUC (best first)\n",
"sorted_pareto_front = df[df['Pareto_Front']==1].sort_values('roc_auc_score', ascending=False)\n",
"sorted_pareto_front"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style>#sk-container-id-2 {\n",
" /* Definition of color scheme common for light and dark mode */\n",
" --sklearn-color-text: black;\n",
" --sklearn-color-line: gray;\n",
" /* Definition of color scheme for unfitted estimators */\n",
" --sklearn-color-unfitted-level-0: #fff5e6;\n",
" --sklearn-color-unfitted-level-1: #f6e4d2;\n",
" --sklearn-color-unfitted-level-2: #ffe0b3;\n",
" --sklearn-color-unfitted-level-3: chocolate;\n",
" /* Definition of color scheme for fitted estimators */\n",
" --sklearn-color-fitted-level-0: #f0f8ff;\n",
" --sklearn-color-fitted-level-1: #d4ebff;\n",
" --sklearn-color-fitted-level-2: #b3dbfd;\n",
" --sklearn-color-fitted-level-3: cornflowerblue;\n",
"\n",
" /* Specific color for light theme */\n",
" --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));\n",
" --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, white)));\n",
" --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));\n",
" --sklearn-color-icon: #696969;\n",
"\n",
" @media (prefers-color-scheme: dark) {\n",
" /* Redefinition of color scheme for dark theme */\n",
" --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n",
" --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, #111)));\n",
" --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n",
" --sklearn-color-icon: #878787;\n",
" }\n",
"}\n",
"\n",
"#sk-container-id-2 {\n",
" color: var(--sklearn-color-text);\n",
"}\n",
"\n",
"#sk-container-id-2 pre {\n",
" padding: 0;\n",
"}\n",
"\n",
"#sk-container-id-2 input.sk-hidden--visually {\n",
" border: 0;\n",
" clip: rect(1px 1px 1px 1px);\n",
" clip: rect(1px, 1px, 1px, 1px);\n",
" height: 1px;\n",
" margin: -1px;\n",
" overflow: hidden;\n",
" padding: 0;\n",
" position: absolute;\n",
" width: 1px;\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-dashed-wrapped {\n",
" border: 1px dashed var(--sklearn-color-line);\n",
" margin: 0 0.4em 0.5em 0.4em;\n",
" box-sizing: border-box;\n",
" padding-bottom: 0.4em;\n",
" background-color: var(--sklearn-color-background);\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-container {\n",
" /* jupyter's `normalize.less` sets `[hidden] { display: none; }`\n",
" but bootstrap.min.css set `[hidden] { display: none !important; }`\n",
" so we also need the `!important` here to be able to override the\n",
" default hidden behavior on the sphinx rendered scikit-learn.org.\n",
" See: https://github.com/scikit-learn/scikit-learn/issues/21755 */\n",
" display: inline-block !important;\n",
" position: relative;\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-text-repr-fallback {\n",
" display: none;\n",
"}\n",
"\n",
"div.sk-parallel-item,\n",
"div.sk-serial,\n",
"div.sk-item {\n",
" /* draw centered vertical line to link estimators */\n",
" background-image: linear-gradient(var(--sklearn-color-text-on-default-background), var(--sklearn-color-text-on-default-background));\n",
" background-size: 2px 100%;\n",
" background-repeat: no-repeat;\n",
" background-position: center center;\n",
"}\n",
"\n",
"/* Parallel-specific style estimator block */\n",
"\n",
"#sk-container-id-2 div.sk-parallel-item::after {\n",
" content: \"\";\n",
" width: 100%;\n",
" border-bottom: 2px solid var(--sklearn-color-text-on-default-background);\n",
" flex-grow: 1;\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-parallel {\n",
" display: flex;\n",
" align-items: stretch;\n",
" justify-content: center;\n",
" background-color: var(--sklearn-color-background);\n",
" position: relative;\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-parallel-item {\n",
" display: flex;\n",
" flex-direction: column;\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-parallel-item:first-child::after {\n",
" align-self: flex-end;\n",
" width: 50%;\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-parallel-item:last-child::after {\n",
" align-self: flex-start;\n",
" width: 50%;\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-parallel-item:only-child::after {\n",
" width: 0;\n",
"}\n",
"\n",
"/* Serial-specific style estimator block */\n",
"\n",
"#sk-container-id-2 div.sk-serial {\n",
" display: flex;\n",
" flex-direction: column;\n",
" align-items: center;\n",
" background-color: var(--sklearn-color-background);\n",
" padding-right: 1em;\n",
" padding-left: 1em;\n",
"}\n",
"\n",
"\n",
"/* Toggleable style: style used for estimator/Pipeline/ColumnTransformer box that is\n",
"clickable and can be expanded/collapsed.\n",
"- Pipeline and ColumnTransformer use this feature and define the default style\n",
"- Estimators will overwrite some part of the style using the `sk-estimator` class\n",
"*/\n",
"\n",
"/* Pipeline and ColumnTransformer style (default) */\n",
"\n",
"#sk-container-id-2 div.sk-toggleable {\n",
" /* Default theme specific background. It is overwritten whether we have a\n",
" specific estimator or a Pipeline/ColumnTransformer */\n",
" background-color: var(--sklearn-color-background);\n",
"}\n",
"\n",
"/* Toggleable label */\n",
"#sk-container-id-2 label.sk-toggleable__label {\n",
" cursor: pointer;\n",
" display: block;\n",
" width: 100%;\n",
" margin-bottom: 0;\n",
" padding: 0.5em;\n",
" box-sizing: border-box;\n",
" text-align: center;\n",
"}\n",
"\n",
"#sk-container-id-2 label.sk-toggleable__label-arrow:before {\n",
" /* Arrow on the left of the label */\n",
" content: \"▸\";\n",
" float: left;\n",
" margin-right: 0.25em;\n",
" color: var(--sklearn-color-icon);\n",
"}\n",
"\n",
"#sk-container-id-2 label.sk-toggleable__label-arrow:hover:before {\n",
" color: var(--sklearn-color-text);\n",
"}\n",
"\n",
"/* Toggleable content - dropdown */\n",
"\n",
"#sk-container-id-2 div.sk-toggleable__content {\n",
" max-height: 0;\n",
" max-width: 0;\n",
" overflow: hidden;\n",
" text-align: left;\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-0);\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-toggleable__content.fitted {\n",
" /* fitted */\n",
" background-color: var(--sklearn-color-fitted-level-0);\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-toggleable__content pre {\n",
" margin: 0.2em;\n",
" border-radius: 0.25em;\n",
" color: var(--sklearn-color-text);\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-0);\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-toggleable__content.fitted pre {\n",
" /* fitted */\n",
" background-color: var(--sklearn-color-fitted-level-0);\n",
"}\n",
"\n",
"#sk-container-id-2 input.sk-toggleable__control:checked~div.sk-toggleable__content {\n",
" /* Expand drop-down */\n",
" max-height: 200px;\n",
" max-width: 100%;\n",
" overflow: auto;\n",
"}\n",
"\n",
"#sk-container-id-2 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {\n",
" content: \"▾\";\n",
"}\n",
"\n",
"/* Pipeline/ColumnTransformer-specific style */\n",
"\n",
"#sk-container-id-2 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
" color: var(--sklearn-color-text);\n",
" background-color: var(--sklearn-color-unfitted-level-2);\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-label.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
" background-color: var(--sklearn-color-fitted-level-2);\n",
"}\n",
"\n",
"/* Estimator-specific style */\n",
"\n",
"/* Colorize estimator box */\n",
"#sk-container-id-2 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-2);\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-estimator.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
" /* fitted */\n",
" background-color: var(--sklearn-color-fitted-level-2);\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-label label.sk-toggleable__label,\n",
"#sk-container-id-2 div.sk-label label {\n",
" /* The background is the default theme color */\n",
" color: var(--sklearn-color-text-on-default-background);\n",
"}\n",
"\n",
"/* On hover, darken the color of the background */\n",
"#sk-container-id-2 div.sk-label:hover label.sk-toggleable__label {\n",
" color: var(--sklearn-color-text);\n",
" background-color: var(--sklearn-color-unfitted-level-2);\n",
"}\n",
"\n",
"/* Label box, darken color on hover, fitted */\n",
"#sk-container-id-2 div.sk-label.fitted:hover label.sk-toggleable__label.fitted {\n",
" color: var(--sklearn-color-text);\n",
" background-color: var(--sklearn-color-fitted-level-2);\n",
"}\n",
"\n",
"/* Estimator label */\n",
"\n",
"#sk-container-id-2 div.sk-label label {\n",
" font-family: monospace;\n",
" font-weight: bold;\n",
" display: inline-block;\n",
" line-height: 1.2em;\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-label-container {\n",
" text-align: center;\n",
"}\n",
"\n",
"/* Estimator-specific */\n",
"#sk-container-id-2 div.sk-estimator {\n",
" font-family: monospace;\n",
" border: 1px dotted var(--sklearn-color-border-box);\n",
" border-radius: 0.25em;\n",
" box-sizing: border-box;\n",
" margin-bottom: 0.5em;\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-0);\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-estimator.fitted {\n",
" /* fitted */\n",
" background-color: var(--sklearn-color-fitted-level-0);\n",
"}\n",
"\n",
"/* on hover */\n",
"#sk-container-id-2 div.sk-estimator:hover {\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-2);\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-estimator.fitted:hover {\n",
" /* fitted */\n",
" background-color: var(--sklearn-color-fitted-level-2);\n",
"}\n",
"\n",
"/* Specification for estimator info (e.g. \"i\" and \"?\") */\n",
"\n",
"/* Common style for \"i\" and \"?\" */\n",
"\n",
".sk-estimator-doc-link,\n",
"a:link.sk-estimator-doc-link,\n",
"a:visited.sk-estimator-doc-link {\n",
" float: right;\n",
" font-size: smaller;\n",
" line-height: 1em;\n",
" font-family: monospace;\n",
" background-color: var(--sklearn-color-background);\n",
" border-radius: 1em;\n",
" height: 1em;\n",
" width: 1em;\n",
" text-decoration: none !important;\n",
" margin-left: 1ex;\n",
" /* unfitted */\n",
" border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n",
" color: var(--sklearn-color-unfitted-level-1);\n",
"}\n",
"\n",
".sk-estimator-doc-link.fitted,\n",
"a:link.sk-estimator-doc-link.fitted,\n",
"a:visited.sk-estimator-doc-link.fitted {\n",
" /* fitted */\n",
" border: var(--sklearn-color-fitted-level-1) 1pt solid;\n",
" color: var(--sklearn-color-fitted-level-1);\n",
"}\n",
"\n",
"/* On hover */\n",
"div.sk-estimator:hover .sk-estimator-doc-link:hover,\n",
".sk-estimator-doc-link:hover,\n",
"div.sk-label-container:hover .sk-estimator-doc-link:hover,\n",
".sk-estimator-doc-link:hover {\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-3);\n",
" color: var(--sklearn-color-background);\n",
" text-decoration: none;\n",
"}\n",
"\n",
"div.sk-estimator.fitted:hover .sk-estimator-doc-link.fitted:hover,\n",
".sk-estimator-doc-link.fitted:hover,\n",
"div.sk-label-container:hover .sk-estimator-doc-link.fitted:hover,\n",
".sk-estimator-doc-link.fitted:hover {\n",
" /* fitted */\n",
" background-color: var(--sklearn-color-fitted-level-3);\n",
" color: var(--sklearn-color-background);\n",
" text-decoration: none;\n",
"}\n",
"\n",
"/* Span, style for the box shown on hovering the info icon */\n",
".sk-estimator-doc-link span {\n",
" display: none;\n",
" z-index: 9999;\n",
" position: relative;\n",
" font-weight: normal;\n",
" right: .2ex;\n",
" padding: .5ex;\n",
" margin: .5ex;\n",
" width: min-content;\n",
" min-width: 20ex;\n",
" max-width: 50ex;\n",
" color: var(--sklearn-color-text);\n",
" box-shadow: 2pt 2pt 4pt #999;\n",
" /* unfitted */\n",
" background: var(--sklearn-color-unfitted-level-0);\n",
" border: .5pt solid var(--sklearn-color-unfitted-level-3);\n",
"}\n",
"\n",
".sk-estimator-doc-link.fitted span {\n",
" /* fitted */\n",
" background: var(--sklearn-color-fitted-level-0);\n",
" border: var(--sklearn-color-fitted-level-3);\n",
"}\n",
"\n",
".sk-estimator-doc-link:hover span {\n",
" display: block;\n",
"}\n",
"\n",
"/* \"?\"-specific style due to the `<a>` HTML tag */\n",
"\n",
"#sk-container-id-2 a.estimator_doc_link {\n",
" float: right;\n",
" font-size: 1rem;\n",
" line-height: 1em;\n",
" font-family: monospace;\n",
" background-color: var(--sklearn-color-background);\n",
" border-radius: 1rem;\n",
" height: 1rem;\n",
" width: 1rem;\n",
" text-decoration: none;\n",
" /* unfitted */\n",
" color: var(--sklearn-color-unfitted-level-1);\n",
" border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n",
"}\n",
"\n",
"#sk-container-id-2 a.estimator_doc_link.fitted {\n",
" /* fitted */\n",
" border: var(--sklearn-color-fitted-level-1) 1pt solid;\n",
" color: var(--sklearn-color-fitted-level-1);\n",
"}\n",
"\n",
"/* On hover */\n",
"#sk-container-id-2 a.estimator_doc_link:hover {\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-3);\n",
" color: var(--sklearn-color-background);\n",
" text-decoration: none;\n",
"}\n",
"\n",
"#sk-container-id-2 a.estimator_doc_link.fitted:hover {\n",
" /* fitted */\n",
" background-color: var(--sklearn-color-fitted-level-3);\n",
"}\n",
"</style><div id=\"sk-container-id-2\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>Pipeline(steps=[(&#x27;minmaxscaler&#x27;, MinMaxScaler()),\n",
" (&#x27;selectfwe&#x27;, SelectFwe(alpha=0.0014048740592)),\n",
" (&#x27;featureunion-1&#x27;,\n",
" FeatureUnion(transformer_list=[(&#x27;featureunion&#x27;,\n",
" FeatureUnion(transformer_list=[(&#x27;zerocount&#x27;,\n",
" ZeroCount())])),\n",
" (&#x27;passthrough&#x27;,\n",
" Passthrough())])),\n",
" (&#x27;featureunion-2&#x27;,\n",
" FeatureUnion(transformer_list=[(&#x27;featureunion&#x27;,\n",
" FeatureUnion(transformer_list=[(&#x27;estimatortransformer&#x27;,\n",
" EstimatorTransformer(estimator=BernoulliNB(alpha=76.5761838773666,\n",
" fit_prior=False)))])),\n",
" (&#x27;passthrough&#x27;,\n",
" Passthrough())])),\n",
" (&#x27;kneighborsclassifier&#x27;,\n",
" KNeighborsClassifier(n_jobs=1, n_neighbors=1, p=3))])</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-label-container\"><div class=\"sk-label sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-11\" type=\"checkbox\" ><label for=\"sk-estimator-id-11\" class=\"sk-toggleable__label sk-toggleable__label-arrow \">&nbsp;&nbsp;Pipeline<a class=\"sk-estimator-doc-link \" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.pipeline.Pipeline.html\">?<span>Documentation for Pipeline</span></a><span class=\"sk-estimator-doc-link \">i<span>Not fitted</span></span></label><div class=\"sk-toggleable__content \"><pre>Pipeline(steps=[(&#x27;minmaxscaler&#x27;, MinMaxScaler()),\n",
" (&#x27;selectfwe&#x27;, SelectFwe(alpha=0.0014048740592)),\n",
" (&#x27;featureunion-1&#x27;,\n",
" FeatureUnion(transformer_list=[(&#x27;featureunion&#x27;,\n",
" FeatureUnion(transformer_list=[(&#x27;zerocount&#x27;,\n",
" ZeroCount())])),\n",
" (&#x27;passthrough&#x27;,\n",
" Passthrough())])),\n",
" (&#x27;featureunion-2&#x27;,\n",
" FeatureUnion(transformer_list=[(&#x27;featureunion&#x27;,\n",
" FeatureUnion(transformer_list=[(&#x27;estimatortransformer&#x27;,\n",
" EstimatorTransformer(estimator=BernoulliNB(alpha=76.5761838773666,\n",
" fit_prior=False)))])),\n",
" (&#x27;passthrough&#x27;,\n",
" Passthrough())])),\n",
" (&#x27;kneighborsclassifier&#x27;,\n",
" KNeighborsClassifier(n_jobs=1, n_neighbors=1, p=3))])</pre></div> </div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-12\" type=\"checkbox\" ><label for=\"sk-estimator-id-12\" class=\"sk-toggleable__label sk-toggleable__label-arrow \">&nbsp;MinMaxScaler<a class=\"sk-estimator-doc-link \" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.preprocessing.MinMaxScaler.html\">?<span>Documentation for MinMaxScaler</span></a></label><div class=\"sk-toggleable__content \"><pre>MinMaxScaler()</pre></div> </div></div><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-13\" type=\"checkbox\" ><label for=\"sk-estimator-id-13\" class=\"sk-toggleable__label sk-toggleable__label-arrow \">&nbsp;SelectFwe<a class=\"sk-estimator-doc-link \" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.feature_selection.SelectFwe.html\">?<span>Documentation for SelectFwe</span></a></label><div class=\"sk-toggleable__content \"><pre>SelectFwe(alpha=0.0014048740592)</pre></div> </div></div><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-label-container\"><div class=\"sk-label sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-14\" type=\"checkbox\" ><label for=\"sk-estimator-id-14\" class=\"sk-toggleable__label sk-toggleable__label-arrow \">&nbsp;featureunion-1: FeatureUnion<a class=\"sk-estimator-doc-link \" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.pipeline.FeatureUnion.html\">?<span>Documentation for featureunion-1: FeatureUnion</span></a></label><div class=\"sk-toggleable__content \"><pre>FeatureUnion(transformer_list=[(&#x27;featureunion&#x27;,\n",
" FeatureUnion(transformer_list=[(&#x27;zerocount&#x27;,\n",
" ZeroCount())])),\n",
" (&#x27;passthrough&#x27;, Passthrough())])</pre></div> </div></div><div class=\"sk-parallel\"><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label sk-toggleable\"><label>featureunion</label></div></div><div class=\"sk-serial\"><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-parallel\"><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label sk-toggleable\"><label>zerocount</label></div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-15\" type=\"checkbox\" ><label for=\"sk-estimator-id-15\" class=\"sk-toggleable__label sk-toggleable__label-arrow \">ZeroCount</label><div class=\"sk-toggleable__content \"><pre>ZeroCount()</pre></div> </div></div></div></div></div></div></div></div></div></div><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label sk-toggleable\"><label>passthrough</label></div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-16\" type=\"checkbox\" ><label for=\"sk-estimator-id-16\" class=\"sk-toggleable__label sk-toggleable__label-arrow \">Passthrough</label><div class=\"sk-toggleable__content \"><pre>Passthrough()</pre></div> </div></div></div></div></div></div></div><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-label-container\"><div class=\"sk-label sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-17\" type=\"checkbox\" ><label for=\"sk-estimator-id-17\" class=\"sk-toggleable__label sk-toggleable__label-arrow \">&nbsp;featureunion-2: FeatureUnion<a class=\"sk-estimator-doc-link \" rel=\"noreferrer\" target=\"_blank\" 
href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.pipeline.FeatureUnion.html\">?<span>Documentation for featureunion-2: FeatureUnion</span></a></label><div class=\"sk-toggleable__content \"><pre>FeatureUnion(transformer_list=[(&#x27;featureunion&#x27;,\n",
" FeatureUnion(transformer_list=[(&#x27;estimatortransformer&#x27;,\n",
" EstimatorTransformer(estimator=BernoulliNB(alpha=76.5761838773666,\n",
" fit_prior=False)))])),\n",
" (&#x27;passthrough&#x27;, Passthrough())])</pre></div> </div></div><div class=\"sk-parallel\"><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label sk-toggleable\"><label>featureunion</label></div></div><div class=\"sk-serial\"><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-parallel\"><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label sk-toggleable\"><label>estimatortransformer</label></div></div><div class=\"sk-serial\"><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-parallel\"><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-18\" type=\"checkbox\" ><label for=\"sk-estimator-id-18\" class=\"sk-toggleable__label sk-toggleable__label-arrow \">estimator: BernoulliNB</label><div class=\"sk-toggleable__content \"><pre>BernoulliNB(alpha=76.5761838773666, fit_prior=False)</pre></div> </div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-19\" type=\"checkbox\" ><label for=\"sk-estimator-id-19\" class=\"sk-toggleable__label sk-toggleable__label-arrow \">&nbsp;BernoulliNB<a class=\"sk-estimator-doc-link \" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.naive_bayes.BernoulliNB.html\">?<span>Documentation for BernoulliNB</span></a></label><div class=\"sk-toggleable__content \"><pre>BernoulliNB(alpha=76.5761838773666, fit_prior=False)</pre></div> </div></div></div></div></div></div></div></div></div></div></div></div></div></div></div><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label sk-toggleable\"><label>passthrough</label></div></div><div 
class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-20\" type=\"checkbox\" ><label for=\"sk-estimator-id-20\" class=\"sk-toggleable__label sk-toggleable__label-arrow \">Passthrough</label><div class=\"sk-toggleable__content \"><pre>Passthrough()</pre></div> </div></div></div></div></div></div></div><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-21\" type=\"checkbox\" ><label for=\"sk-estimator-id-21\" class=\"sk-toggleable__label sk-toggleable__label-arrow \">&nbsp;KNeighborsClassifier<a class=\"sk-estimator-doc-link \" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.neighbors.KNeighborsClassifier.html\">?<span>Documentation for KNeighborsClassifier</span></a></label><div class=\"sk-toggleable__content \"><pre>KNeighborsClassifier(n_jobs=1, n_neighbors=1, p=3)</pre></div> </div></div></div></div></div></div>"
],
"text/plain": [
"Pipeline(steps=[('minmaxscaler', MinMaxScaler()),\n",
" ('selectfwe', SelectFwe(alpha=0.0014048740592)),\n",
" ('featureunion-1',\n",
" FeatureUnion(transformer_list=[('featureunion',\n",
" FeatureUnion(transformer_list=[('zerocount',\n",
" ZeroCount())])),\n",
" ('passthrough',\n",
" Passthrough())])),\n",
" ('featureunion-2',\n",
" FeatureUnion(transformer_list=[('featureunion',\n",
" FeatureUnion(transformer_list=[('estimatortransformer',\n",
" EstimatorTransformer(estimator=BernoulliNB(alpha=76.5761838773666,\n",
" fit_prior=False)))])),\n",
" ('passthrough',\n",
" Passthrough())])),\n",
" ('kneighborsclassifier',\n",
" KNeighborsClassifier(n_jobs=1, n_neighbors=1, p=3))])"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#access the best performing pipeline with the lowest complexity\n",
"\n",
"best_pipeline_lowest_complexity = sorted_pareto_front.iloc[-1]['Instance']\n",
"best_pipeline_lowest_complexity"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plot performance over time + Continuing a run from where it left off\n",
"\n",
"Plotting the best score over time is a good way to assess whether the TPOT search has converged. If performance levels off over time, there is probably little to gain from running longer. If the curve is still actively improving, it may be worth running TPOT for a longer duration.\n",
"\n",
"There are two ways to resume TPOT. If the `warm_start` parameter is set to True, subsequent calls to `fit` will continue training where the previous call left off (the conventional scikit-learn behavior is to retrain from scratch on subsequent calls to `fit`). Additionally, if `periodic_checkpoint_folder` is set, TPOT will periodically save its current state, so whether it terminates normally, is interrupted (job canceled, PC shut off), or crashes (e.g., memory issues), it can resume training from where it left off. **NOTE: If `periodic_checkpoint_folder` is set, TPOT will always resume from the last saved checkpoint in that folder, even if the input data has changed.**\n",
"\n",
"In this case we can see that performance is near optimal and has slowed, so more time is likely unnecessary."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA2AAAAHACAYAAADEEQtjAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8fJSN1AAAACXBIWXMAAA9hAAAPYQGoP6dpAABAm0lEQVR4nO3de3hU1b3/8c9kcoWQCAQSkUsiByGFFDFoIMELisEgCOjxAFUEqlZOUYloLRT4gQhE5ZDqqYJcFbyBrWi1UjEqIphiJIISoYBFSAjh0AASBMlMJvv3R5yNYwJmyGV2tu/X88zzOGvWzF4b9kO/3661vsthGIYhAAAAAECDCwr0AAAAAADg54IEDAAAAAAaCQkYAAAAADQSEjAAAAAAaCQkYAAAAADQSEjAAAAAAKCRkIABAAAAQCMhAQMAAACARhIc6AE0VZWVlTp48KBatGghh8MR6OEAAAAACBDDMHTixAm1a9dOQUHnnuMiATtPBw8eVIcOHQI9DAAAAAAWUVRUpPbt25+zDwnYeWrRooWkqj/kqKioAI8GAAAAQKCUlZWpQ4cOZo5wLiRg58m77DAqKooEDAAAAECttiZRhAMAAAAAGgkJGAAAAAA0EhIwAAAAAGgkJGAAAAAA0EhIwAAAAACgkZCAAQAAAEAjIQEDAAAAgEZCAgYAAAAAjYQEDAAAAAAaCQkYAAAAADQSEjAAAAAAaCQkYAAAAADQSEjAAAAAAKCRBAd6AADwQ3/fXqKn138lT6UR6KEAAACLC3EG6a37+gV6GH4hAQNgKSv+sU9fHiwL9DAAAEATEBrc9Bb0kYABsBS3p2rmK3NAF/Xu1CrAowEAAFYW5Aj0CPxHAgbAUio8lZKkpIui1a9LTIBHAwAAUL+a3pwdAFur+H7vl7Mp/l9aAAAAP4EEDIClVHy/BDHEyT9PAADAfohwAFhKRWXVEkRmwAAAgB2RgAGwFO8SxBAnCRgAALAfEjAAluJdgugM4p8nAABgP0Q4ACzFuwQxmCWIAADAhkjAAFiK5/sliMEsQQQAADZEAgbAUrwHMTMDBgAA7IgEDIClmDNg7AEDAAA2RIQDwFIoQw8AAOyMBAyApXAQMwAAsDMiHACWYRiGeQ4YM2AAAMCOSMAAWIZ3/5fEQcwAAMCeSMAAWEbFDxIwZsAAAIAdkYABsIwKnxkw/nkCAAD2Q4QDwDI8HmbAAACAvZGAAbAM9/cl6CUOYgYAAPZEAgbAMjw/qIDocJCAAQAA+wkO9AAAfy3f9LW27D8a6GGgAXzn8khi+SEAALAvEjA0KV+XntSsv+0I9DDQwNpEhgV6CAAAAA2CBAxNyt8LSiRJPS6K0ojeHQI8GjSUPhe3DvQQAAAAGgQJGJqUdwoOSZJ+dUUn/SqlY4BHAwAAAPiHIhxoMoq/+U5fHDiuIIeU3j020MMBAAAA/EYChibDO/t1eXwrxbBHCAAAAE0QCRiajHe+3/+V0SMuwCMBAAAAzg8JGJqEwydOa8v+Y5KkgSRgAAAAaKJIwNAkrPvy/2QY0qUdLtCF0RGBHg4AAABwXgKegC1YsEAJCQkKDw9XcnKyNm7ceM7+zzzzjBITExUREaGuXbtq5cqVPp+73W7NmjVLnTt3Vnh4uHr27Kl33nnHp8/MmTPlcDh8XnFxzKpYGcsPAQAAYAcBLUO/evVqZWZmasGCBUpLS9OiRYuUkZGhHTt2qGPH6iXGFy5cqClTpmjJkiW6/PLLlZeXp7vvvlstW7bUkCFDJEnTpk3Tiy++qCVLlqhbt25at26dhg8frtzcXPXq1cv8re7du+u9994z3zudzoa/YZyXYydd2rz3qCQpo8eFAR4NAAAAcP4chmEYgbp4SkqKLrvsMi1cuNBsS0xM1LBhw5SVlVWtf2pqqtLS0jRv3jyzLTMzU1u2bNGmTZskSe3atdPUqVM1YcIEs8+wYcMUGRmpF198UVLVDNgbb7yhbdu2nffYy8rKFB
0drePHjysqKuq8fwc/7dVPi/Twa1/oFxdGae3EKwM9HAAAAMCHP7lBwJYgulwu5efnKz093ac9PT1dubm5NX6nvLxc4eHhPm0RERHKy8uT2+0+Zx9vgua1Z88etWvXTgkJCRo5cqT27t17zvGWl5errKzM54XG8XeWHwIAAMAmApaAlZaWyuPxKDbW90Dd2NhYHTp0qMbvDBw4UEuXLlV+fr4Mw9CWLVu0fPlyud1ulZaWmn2ys7O1Z88eVVZWKicnR3/9619VUlJi/k5KSopWrlypdevWacmSJTp06JBSU1N15MiRs443KytL0dHR5qtDhw718KeAn1J22q2Pv6r6e8lIIgEDAABA0xbwIhwOh8PnvWEY1dq8pk+froyMDPXp00chISEaOnSoxo4dK+nMHq6nnnpKXbp0Ubdu3RQaGqp7771X48aN89njlZGRoVtuuUVJSUkaMGCA3n77bUnSihUrzjrOKVOm6Pjx4+arqKioLreNWlr/z8NyeSr1H20j9R9tWwR6OAAAAECdBCwBi4mJkdPprDbbdfjw4WqzYl4RERFavny5Tp06pX379qmwsFDx8fFq0aKFYmJiJElt2rTRG2+8oZMnT2r//v365z//qcjISCUkJJx1LM2bN1dSUpL27Nlz1j5hYWGKioryeaHh/X171fNxQ3dmvwAAAND0BSwBCw0NVXJysnJycnzac3JylJqaes7vhoSEqH379nI6nVq1apUGDx6soCDfWwkPD9dFF12kiooKvfbaaxo6dOhZf6+8vFw7d+7UhRdSYc9KTrkq9OHuw5KkG9j/BQAAABsIaBn6SZMmafTo0erdu7f69u2rxYsXq7CwUOPHj5dUteyvuLjYPOtr9+7dysvLU0pKio4dO6bs7GwVFBT4LB385JNPVFxcrEsvvVTFxcWaOXOmKisr9fDDD5t9HnroIQ0ZMkQdO3bU4cOHNXv2bJWVlWnMmDGN+wdgU9+5PPq2vKLOv7Nh97912l2pDq0i1L0dM44AAABo+gKagI0YMUJHjhzRrFmzVFJSoh49emjt2rXq1KmTJKmkpESFhYVmf4/Ho/nz52vXrl0KCQlR//79lZubq/j4eLPP6dOnNW3aNO3du1eRkZEaNGiQXnjhBV1wwQVmnwMHDmjUqFEqLS1VmzZt1KdPH23evNm8Ls7fzpIyDV/wsU67K+vtNzN6XHjWfYEAAABAUxLQc8CaMs4Bq9mqvEJNXrO93n6vdfNQvTq+rzq3iay33wQAAADqkz+5QUBnwGA/J10eSdLQS9vpqZG9AjwaAAAAwFoCXoYe9nLq+71fzULJ7QEAAIAfIwFDvfLOgDUPdf5ETwAAAODnhwQM9eqU6/sZsDBmwAAAAIAfIwFDvTpZzgwYAAAAcDYkYKhXzIABAAAAZ0cChnrlPYCZGTAAAACgOhIw1KtT3xfhoAoiAAAAUB0JGOrVSe8MWBgzYAAAAMCPkYChXjEDBgAAAJwdCRjqlbcIBzNgAAAAQHUkYKhXZ8rQMwMGAAAA/BgJGOqNp9LQd27vEkRmwAAAAIAfIwFDvfEmX5LUnHPAAAAAgGpIwFBvTn1fATHIIYUF82gBAAAAP0aUjHpz5hDmYDkcjgCPBgAAALAe1onZXOm35Vr9aZFO/2B5YENeS5KaUQERAAAAqBEJmM0t2bhXizbsbdRrtmoe1qjXAwAAAJoKEjCbO3bSJUm6rOMFSroousGv53A4NKRnuwa/DgAAANAUkYDZnNtjSJIyelyou6+6OMCjAQAAAH7eKMJhcy5PpSQpxElRDAAAACDQSMBszl1RlYCFBlMYAwAAAAg0EjCbYwYMAAAAsA4SMJtze7wzYPxVAwAAAIFGVG5z7oqqIhwhTv6qAQAAgEAjKre5M0sQ+asGAAAAAo2o3OZYgggAAABYB1G5zbkqKMIBAAAAWAUJmM2ZM2AsQQQAAAACjqjc5tweinAAAAAAVkFUbnMU4QAAAACsg6jc5rx7wCjCAQAAAA
QeUbnNsQcMAAAAsA6icpvzJmAhwVRBBAAAAAKNBMzGDMOgCAcAAABgIUTlNuZNviT2gAEAAABWQFRuY94KiBJ7wAA
"text/plain": [
"<Figure size 1000x500 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# keep only the rows (evaluated pipelines) where roc_auc_score is not NaN\n",
"scores_and_times = df[df['roc_auc_score'].notna()][['roc_auc_score', 'Completed Timestamp']].sort_values('Completed Timestamp', ascending=True).to_numpy()\n",
"\n",
"#get best score at a given time\n",
"best_scores = np.maximum.accumulate(scores_and_times[:,0])\n",
"times = scores_and_times[:,1]\n",
"times = times - df['Submitted Timestamp'].min()\n",
"\n",
"fig, ax = plt.subplots(figsize=(10,5))\n",
"ax.plot(times, best_scores)\n",
"ax.set_xlabel('Time (seconds)')\n",
"ax.set_ylabel('Best Score')\n",
"plt.show()\n"
]
},
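{
"cell_type": "markdown",
"metadata": {},
"source": [
"A plot is easy to eyeball, but the same convergence check can be expressed numerically: compare the running best score at the end of the run with its value some time earlier. The sketch below is not part of TPOT; `has_plateaued`, `window_seconds`, and `tol` are hypothetical names, and it assumes you pass in the sorted `times` and `best_scores` arrays computed in the cell above. The toy data in the example is made up for illustration.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def has_plateaued(times, best_scores, window_seconds, tol=1e-3):\n",
"    # times: elapsed seconds, sorted ascending; best_scores: running best score at each time.\n",
"    times = np.asarray(times, dtype=float)\n",
"    best_scores = np.asarray(best_scores, dtype=float)\n",
"    cutoff = times[-1] - window_seconds\n",
"    # Best scores already reached before the trailing window began.\n",
"    earlier = best_scores[times <= cutoff]\n",
"    if earlier.size == 0:\n",
"        return False  # run is shorter than the window: not enough history to judge\n",
"    # Plateaued if the trailing window improved the best score by less than tol.\n",
"    return bool(best_scores[-1] - earlier[-1] < tol)\n",
"\n",
"# Toy example: the score stops improving after t=300 s.\n",
"toy_times = [0, 100, 200, 300, 400, 500, 600]\n",
"toy_best = np.maximum.accumulate([0.80, 0.90, 0.95, 0.97, 0.965, 0.97, 0.969])\n",
"print(has_plateaued(toy_times, toy_best, window_seconds=250))\n",
"# With the real data: has_plateaued(times, best_scores, window_seconds=600)\n",
"```\n",
"\n",
"The window length and tolerance are judgment calls; a window that is short relative to typical pipeline evaluation times will report a plateau too eagerly."
]
},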
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Common parameters\n",
"\n",
"Here is a subset of the most commonly customized parameters and what they do. See the docs for `TPOTEstimator` or `TPOTEstimatorSteadyState` for the full documentation of all parameters.\n",
"\n",
"| Parameter | Type | Description |\n",
"|--------------------------------|-----------------------|-----------------------------------------------------------------------------|\n",
"| scorers | list, scorer | List of scorers for cross-validation; see |\n",
"| scorers_weights | list | Weights applied to scorers during optimization |\n",
"| classification | bool | Problem type: True for classification, False for regression |\n",
"| cv | int, cross-validator | Cross-validation strategy: int for folds or custom cross-validator |\n",
"| max_depth | int | Maximum pipeline depth |\n",
"| other_objective_functions | list | Additional objective functions; default: [average_path_length_objective] |\n",
"| other_objective_functions_weights | list | Weights for additional objective functions; default: [-1] |\n",
"| objective_function_names | list | Names for objective functions; default: None (uses function names) |\n",
"| bigger_is_better | bool | Optimization direction: True for maximize, False for minimize |\n",
"| generations | int | Number of optimization generations; default: 50 |\n",
"| max_time_mins | float | Maximum optimization time (minutes); default: infinite |\n",
"| max_eval_time_mins | float | Maximum evaluation time per individual (minutes); default: 300 |\n",
"| n_jobs | int | Number of parallel processes; default: 1 |\n",
"| memory_limit | str | Memory limit per job; default: \"4GB\" |\n",
"| verbose | int | Optimization process verbosity: 0 (none), 1 (progress), 3 (best individual), 4 (warnings), 5+ (full warnings) |\n",
"| memory | str, memory object | If supplied, pipeline will cache each transformer after calling fit with joblib.Memory. |\n",
"| periodic_checkpoint_folder | str | Folder to save the population to periodically. If None, no periodic saving will be done. If provided, training will resume from this checkpoint.|\n",
" \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Pipeline caching in TPOT (joblib.Memory)\n",
"\n",
"With the `memory` parameter, pipelines can cache the results of each transformer after fitting it. This avoids repeated computation by transformers within a pipeline when the parameters and input data are identical to those of another pipeline fitted earlier in the optimization process. TPOT allows users to specify a custom directory path or a `joblib.Memory` object in case they want to reuse the memory cache in future TPOT runs (or a `warm_start` run).\n",
"\n",
"There are three methods for enabling memory caching in TPOT:"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"from tpot2 import TPOTClassifier\n",
"from tempfile import mkdtemp\n",
"from joblib import Memory\n",
"from shutil import rmtree\n",
"\n",
"# Method 1, auto mode: TPOT uses memory caching with a temporary directory and cleans it up upon shutdown\n",
"est = TPOTClassifier(memory='auto')\n",
"\n",
"# Method 2, with a custom directory for memory caching\n",
"est = TPOTClassifier(memory='/to/your/path')\n",
"\n",
"# Method 3, with a Memory object\n",
"memory = Memory(location='./to/your/path', verbose=0)\n",
"est = TPOTClassifier(memory=memory)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note: TPOT does NOT clean up memory caches if users set a custom directory path or Memory object. We recommend that you clean up the memory caches when you no longer need them.**"
]
},
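{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to follow this advice with the `mkdtemp` and `rmtree` functions imported earlier is to create a throwaway cache directory, pass it to TPOT, and delete it in a `finally` block so it is removed even if the run fails. A minimal sketch; the TPOT calls are left as comments and assume `TPOTClassifier`, `X_train`, and `y_train` from earlier cells.\n",
"\n",
"```python\n",
"import os\n",
"from shutil import rmtree\n",
"from tempfile import mkdtemp\n",
"\n",
"# Create a fresh, uniquely named cache directory for this run.\n",
"cache_dir = mkdtemp(prefix='tpot_cache_')\n",
"try:\n",
"    # est = TPOTClassifier(memory=cache_dir)\n",
"    # est.fit(X_train, y_train)\n",
"    print('cache exists during the run:', os.path.isdir(cache_dir))\n",
"finally:\n",
"    # TPOT leaves user-supplied caches in place, so remove this one explicitly.\n",
"    rmtree(cache_dir, ignore_errors=True)\n",
"\n",
"print('cache exists after cleanup:', os.path.exists(cache_dir))\n",
"```\n",
"\n",
"If you passed a `joblib.Memory` object instead, calling its `clear()` method empties the cache location."
]
},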
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Checkpointing\n",
"\n",
"TPOT can checkpoint its progress to disk and resume from that point later if the `periodic_checkpoint_folder` parameter is used. TPOT will save its internal dataframe of pipelines and their performance to disk every generation, allowing you to interrupt TPOT's execution and resume it later on the same or a different machine.\n",
"\n",
"This feature is useful in several scenarios:\n",
"\n",
"* Interrupting TPOT's execution and resuming it later on the same or a different machine.\n",
"* Handling unexpected terminations, such as power outages, cluster job cancellations, bugs, errors, or out-of-memory issues. The checkpointed dataframe can be loaded and inspected to help diagnose problems.\n",
"* Running TPOT on a cluster and periodically saving its progress to disk.\n",
"\n",
"**Note: TPOT does not clean up the checkpoint files. If the `periodic_checkpoint_folder` parameter is set, it will always continue training from the last saved point, even if the input data has changed. A common issue is forgetting to change this folder between experiments, and TPOT continuing training from pipelines optimized for another dataset. If you intend to start a run from scratch, you must either remove the parameter, supply an empty folder, or delete the original checkpoint folder.**"
]
},
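{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because a non-empty `periodic_checkpoint_folder` makes TPOT resume the earlier run, it can help to give every experiment its own folder and refuse to reuse a populated one unless resuming is intended. The helper below is a hypothetical, standard-library-only sketch; `checkpoint_folder_for` and its arguments are not part of TPOT, and the commented usage assumes `tpot2.TPOTClassifier` accepts `periodic_checkpoint_folder` as listed in the parameter table above.\n",
"\n",
"```python\n",
"import os\n",
"\n",
"def checkpoint_folder_for(base_dir, experiment_name, resume=False):\n",
"    # One sub-folder per experiment, e.g. checkpoints/breast_cancer_seed42.\n",
"    path = os.path.join(base_dir, experiment_name)\n",
"    os.makedirs(path, exist_ok=True)\n",
"    if os.listdir(path) and not resume:\n",
"        # Existing files mean TPOT would continue the old run instead of starting fresh.\n",
"        raise FileExistsError(f'{path} already contains checkpoint files; pass resume=True to continue that run')\n",
"    return path\n",
"\n",
"# Hypothetical usage:\n",
"# folder = checkpoint_folder_for('checkpoints', 'breast_cancer_seed42')\n",
"# est = tpot2.TPOTClassifier(periodic_checkpoint_folder=folder)\n",
"```\n",
"\n",
"Encoding the dataset name and random seed in `experiment_name` makes it harder to accidentally resume a run that was optimized for different data."
]
},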
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Parallelization\n",
"\n",
"See Tutorial 7 for more details on parallelization with Dask, including information on using multiple nodes."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# FAQ and Debugging\n",
"\n",
"If you are experiencing issues with TPOT, here are some common issues and how to address them.\n",
"\n",
"* Performance is lower than expected. What can I do?\n",
" * TPOT may have to be run for a longer duration, increase `max_time_mins`, `early_stop`, or `generations`.\n",
" * Individual pipelines may need more time to complete fitting; increase `max_eval_time_mins`.\n",
" * The configuration may not include the optimal model types or hyperparameter ranges, explore other included templates or customize your own search space (see Tutorial 2!)\n",
" * Check that `periodic_checkpoint_folder` is set correctly. A common issue is forgetting to change this folder between experiments, and TPOT continuing training from pipelines optimized for another dataset.\n",
"* TPOT is too slow! It is running forever and never terminating\n",
" * Check that at least one of the three termination conditions is set to a reasonable level. These are `max_time_mins`, `early_stop`, or `generations`. Additionally, check that `max_eval_time_mins` gives most models enough time to train without being overly long. (Some estimators may take an unreasonably long time to fit; this parameter is intended to prevent them from slowing everything to a halt. In my experience, SVC and SVR tend to be the culprits, so removing them from the search space may also improve run time.)\n",
" * Set the `memory` parameter to allow TPOT to prevent repeated work when using either scikit-learn pipelines or TPOT GraphPipelines.\n",
" * Increase n_jobs to use more processes/CPU power. See Tutorial 7 for advanced Dask usage, including parallelizing across multiple nodes on an HPC.\n",
" * Use feature selection, either the built-in configuration of sklearn methods (see Tutorial 2), or genetic feature selection (see Tutorials 3 and 5 for two different strategies).\n",
" * Use successive halving to reduce computational load (see Tutorial 8).\n",
"* Many pipelines in the evaluated_individuals dataframe have crashed or turned up invalid!\n",
" * This may actually be normal, expected behavior for TPOT. In some cases, TPOT may attempt an invalid hyperparameter combination which results in the pipeline not working. Other times, the pipeline configuration itself may be invalid. For example, a selector may not select any features due to its hyperparameter. Another common example is `MultinomialNB` throwing an error because it expects non-negative values, but a prior transformation yielded a negative value.\n",
" * If you used custom search spaces, you can use `ConfigSpace` conditionals to prevent invalid hyperparameters (this may still occur due to how TPOT uses crossover).\n",
" * Setting `verbose=5` will print out the full error message for all failed pipelines. This can be useful for debugging whether or not there is something misconfigured in your pipeline, custom search space modules, or something else.\n",
"* TPOT is crashing due to memory issues\n",
" * Set the `memory_limit` parameter so that `n_jobs * memory_limit` is less than the available RAM on your machine, with some headroom to spare. This should prevent crashes caused by running out of memory.\n",
" * Using feature selection may also improve memory usage as described above.\n",
" * Remove modules that create high RAM usage (e.g. multiple PolynomialFeatures or one with high degree).\n",
"* Why are my TPOT runs not reproducible when random_state is set?\n",
" * Check that `periodic_checkpoint_folder` is set correctly. If this is set to a non-empty folder, TPOT will continue training from the checkpoint rather than start a new run from scratch. For TPOT runs to be reproducible, they have to have the same starting points.\n",
" * If using custom search spaces, make sure to pass in a fixed `random_state` value into the configspace of the scikit-learn modules that utilize them. TPOT does not check whether estimators do or do not take in a random state value (See Tutorial 2).\n",
" * If using the pre-built search spaces provided by TPOT, make sure to pass in `random_state` to `tpot2.config.get_configspace` or `tpot2.config.template_search_spaces.get_template_search_spaces`. This ensures all estimators that support it get a fixed random_state value. (See Tutorial 2).\n",
" * If using custom Node and Pipeline types, make sure that all random decisions utilize the rng parameter passed into the mutation/crossover functions.\n",
" * If `max_eval_time_mins` is set, TPOT will terminate pipelines that go over this time limit. If a pipeline's evaluation time happens to be very close to the limit, small random fluctuations in CPU allocation may cause that pipeline to finish in one run but not in another. This slightly different result would throw off the random number generator throughout the rest of the run. Setting `max_eval_time_mins` to None or a higher value may prevent this edge case.\n",
" * If using `TPOTEstimatorSteadyState` with `n_jobs`>1, it is also possible that random fluctuations in CPU allocation slightly change the order in which pipelines are evaluated, which will affect the downstream results. `TPOTEstimatorSteadyState` is more reliably reproducible when `n_jobs=1` (this is not an issue for the default `TPOTEstimator`, `TPOTClassifier`, and `TPOTRegressor`, as they use a batched generational approach where execution order does not impact results).\n",
"* TPOT is not using all the CPU cores I expected given my `n_jobs` setting.\n",
" * The default TPOT algorithm uses a generational approach. This means TPOT needs to fully evaluate `population_size` (default 50) pipelines before starting the next batch. Often, TPOT will be waiting for the last few pipelines to finish evaluating, which may be fewer than `n_jobs`, leaving some cores idle. Some estimators or pipelines can be significantly slower to evaluate than others. This can be addressed in a few ways:\n",
" * Decrease `max_eval_time_mins` to cut long running pipeline evaluations early.\n",
" * Remove estimators or hyperparameter configurations that are prone to very slow convergence (which is very often `SVC` or `SVR`).\n",
" * Alternatively, `TPOTEstimatorSteadyState` uses a slightly different backend for the evolutionary algorithm that does not utilize the generational approach. Instead, new pipelines are generated and evaluated as soon as the previous one finishes. With this estimator, all cores should be utilized at all times. \n",
"\n",
"\n",
"\n",
"## Other things to be aware of:\n",
"\n",
"* **Overfitting:** On small datasets, TPOT can overfit the cross-validation score itself. This can lead to lower than expected performance on held-out datasets. TPOT always returns the model with the highest CV score as its final fitted_pipeline. However, if that model's high CV score is mostly the result of overfitting to the CV folds, it may actually perform worse than other models on the Pareto front.\n",
" * Using a secondary complexity objective and evaluating the entire Pareto front may be beneficial. In some cases a pipeline with a lower CV score but lower complexity can actually perform better on held-out sets. These pipelines can be evaluated and compared on a held-out validation set or, if data is very limited, simply by using a different seed for splitting the CV folds.\n",
" * TPOT can do this automatically. The `validation_strategy` parameter selects between re-testing the final Pareto front on either a held-out validation set (the fraction of data set by `validation_fraction`) or on a different seed for splitting the CV folds. These can be selected by setting `validation_strategy` to \"split\" or \"reshuffled\", respectively.\n",
" * Increasing the number of folds of cross validation can mitigate this. \n",
" * Nested cross validation can also be used to estimate the performance of the TPOT optimization algorithm itself.\n",
" * Removing more complex methods from the search space can reduce the chances of overfitting."
]
},
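{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `memory_limit` rule of thumb in the FAQ above is simple arithmetic: reserve some headroom for the operating system and the main process, then divide the remaining RAM by `n_jobs`. The helper below is a hypothetical sketch, not a TPOT function; the default 2 GB headroom is an arbitrary assumption, and it returns a string such as `3.5GB` in the same style as the table's `4GB` default.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"def per_job_memory_limit(total_ram_gb, n_jobs, headroom_gb=2.0):\n",
"    # RAM left for the worker processes after reserving some headroom.\n",
"    usable = total_ram_gb - headroom_gb\n",
"    if n_jobs < 1 or usable <= 0:\n",
"        raise ValueError('need n_jobs >= 1 and total_ram_gb > headroom_gb')\n",
"    # Round down to 0.1 GB so that n_jobs * limit never exceeds the usable RAM.\n",
"    limit = math.floor(usable / n_jobs * 10) / 10\n",
"    if limit <= 0:\n",
"        raise ValueError('not enough RAM for this many jobs')\n",
"    return f'{limit:.1f}GB'\n",
"\n",
"print(per_job_memory_limit(32, 8, headroom_gb=4))  # (32 - 4) / 8 = 3.5\n",
"print(per_job_memory_limit(16, 3))  # (16 - 2) / 3 = 4.67, rounded down to 4.6\n",
"# Hypothetical usage: TPOTClassifier(n_jobs=8, memory_limit=per_job_memory_limit(32, 8, headroom_gb=4))\n",
"```\n",
"\n",
"Rounding down rather than to the nearest value keeps the total strictly within budget, which matters more than squeezing out the last fraction of a gigabyte per job."
]
},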
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# More Options\n",
"\n",
"`tpot2.TPOTClassifier` and `tpot2.TPOTRegressor` have a simplified set of hyperparameters with default values set for classification and regression problems. Currently, both of these use the standard evolutionary algorithm in the `tpot2.TPOTEstimator` class. If you want more control you can look into either the `tpot2.TPOTEstimator` or `tpot2.TPOTEstimatorSteadyState` class.\n",
"\n",
"There are two evolutionary algorithms built into TPOT2, which correspond to two different estimator classes.\n",
"\n",
"1. The `tpot2.TPOTEstimator` uses a standard evolutionary algorithm that evaluates exactly population_size individuals each generation. This is similar to the algorithm in TPOT1. The next generation does not start until the previous is completely finished evaluating. This leads to underutilized CPU time as the cores are waiting for the last individuals to finish training, but may preserve diversity in the population. \n",
"\n",
"2. The `tpot2.TPOTEstimatorSteadyState` differs in that it will generate and evaluate the next individual as soon as an individual finishes evaluation. The number of individuals being evaluated is determined by the n_jobs parameter. There is no longer a concept of generations. The population_size parameter now refers to the size of the list of evaluated parents. When an individual is evaluated, the selection method updates the list of parents. This allows more efficient utilization when using multiple cores.\n"
]
},
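{
"cell_type": "markdown",
"metadata": {},
"source": [
"The CPU-utilization difference between the two algorithms can be illustrated with a deliberately simplified scheduling model. The sketch below is not TPOT code: it ignores the evolutionary logic entirely, assumes every evaluation's duration is known up front, and assigns each evaluation to whichever core becomes free first. `batch_size` stands in for `population_size` in the generational case, and the durations are made up (most pipelines take 1 s, one per batch takes 10 s).\n",
"\n",
"```python\n",
"import heapq\n",
"\n",
"def _run_on_cores(durations, n_jobs):\n",
"    # Greedy list scheduling: each job goes to the core that becomes free first.\n",
"    cores = [0.0] * n_jobs  # time at which each core becomes free (a valid min-heap)\n",
"    for d in durations:\n",
"        free_at = heapq.heappop(cores)\n",
"        heapq.heappush(cores, free_at + d)\n",
"    return max(cores)\n",
"\n",
"def generational_makespan(durations, n_jobs, batch_size):\n",
"    # A new batch (generation) starts only after every job in the previous one has finished.\n",
"    return sum(_run_on_cores(durations[i:i + batch_size], n_jobs)\n",
"               for i in range(0, len(durations), batch_size))\n",
"\n",
"def steady_state_makespan(durations, n_jobs):\n",
"    # A new job starts as soon as any core is free; there are no batch boundaries.\n",
"    return _run_on_cores(durations, n_jobs)\n",
"\n",
"durations = [1, 1, 1, 10, 1, 1, 1, 10]\n",
"busy = sum(durations)\n",
"for name, span in [('generational', generational_makespan(durations, 4, 4)),\n",
"                   ('steady-state', steady_state_makespan(durations, 4))]:\n",
"    # Utilization = fraction of available core-seconds spent doing work.\n",
"    print(name, 'makespan:', span, 'utilization:', round(busy / (4 * span), 2))\n",
"```\n",
"\n",
"In this toy case the generational schedule takes 20 s because each batch waits on its single 10 s pipeline while three cores sit idle, whereas the steady-state schedule finishes the same work in 12 s."
]
},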
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### tpot2.TPOTEstimatorSteadyState"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Evaluations: : 21it [00:13, 1.61it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.9786392405063291\n"
]
}
],
"source": [
"import tpot2\n",
"import sklearn\n",
"import sklearn.datasets\n",
"import sklearn.metrics\n",
"import sklearn.model_selection\n",
"\n",
"\n",
"graph_search_space = tpot2.search_spaces.pipelines.GraphSearchPipeline(\n",
" root_search_space= tpot2.config.get_search_space([\"KNeighborsClassifier\", \"LogisticRegression\", \"DecisionTreeClassifier\"]),\n",
" leaf_search_space = tpot2.config.get_search_space(\"selectors\"), \n",
" inner_search_space = tpot2.config.get_search_space([\"transformers\"]),\n",
" max_size = 10,\n",
")\n",
"\n",
"est = tpot2.TPOTEstimatorSteadyState( \n",
" search_space = graph_search_space,\n",
" scorers=['roc_auc_ovr',tpot2.objectives.complexity_scorer],\n",
" scorers_weights=[1,-1],\n",
"\n",
2024-09-23 19:45:04 -07:00
"\n",
2024-09-20 14:48:56 -07:00
" classification=True,\n",
"\n",
" max_eval_time_mins=15,\n",
" max_time_mins=30,\n",
2024-09-23 19:45:04 -07:00
" early_stop=10, #In TPOTEstimatorSteadyState, since there are no generations, early_stop is the number of pipelines to evaluate before stopping.\n",
2024-09-20 14:48:56 -07:00
" verbose=2)\n",
"\n",
"\n",
"scorer = sklearn.metrics.get_scorer('roc_auc_ovo')\n",
2024-09-23 19:45:04 -07:00
"X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)\n",
2024-09-20 14:48:56 -07:00
"X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, train_size=0.75, test_size=0.25)\n",
"est.fit(X_train, y_train)\n",
2024-09-23 19:45:04 -07:00
"print(scorer(est, X_test, y_test))"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"fitted_pipeline = est.fitted_pipeline_  # access the best pipeline directly\n",
"fitted_pipeline.plot()"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>roc_auc_score</th>\n",
" <th>complexity_scorer</th>\n",
" <th>Parents</th>\n",
" <th>Variation_Function</th>\n",
" <th>Individual</th>\n",
" <th>Submitted Timestamp</th>\n",
" <th>Completed Timestamp</th>\n",
" <th>Eval Error</th>\n",
" <th>Pareto_Front</th>\n",
" <th>Instance</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0.474805</td>\n",
" <td>20.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>&lt;tpot2.search_spaces.pipelines.graph.GraphPipe...</td>\n",
" <td>1.727144e+09</td>\n",
" <td>1.727144e+09</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>[('LogisticRegression_1', 'FastICA_1'), ('Fast...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.962983</td>\n",
" <td>78.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>&lt;tpot2.search_spaces.pipelines.graph.GraphPipe...</td>\n",
" <td>1.727144e+09</td>\n",
" <td>1.727144e+09</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>[('DecisionTreeClassifier_1', 'PCA_1'), ('Quan...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.962310</td>\n",
" <td>57.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>&lt;tpot2.search_spaces.pipelines.graph.GraphPipe...</td>\n",
" <td>1.727144e+09</td>\n",
" <td>1.727144e+09</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>[('DecisionTreeClassifier_1', 'SelectFwe_1')]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.956908</td>\n",
" <td>66.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>&lt;tpot2.search_spaces.pipelines.graph.GraphPipe...</td>\n",
" <td>1.727144e+09</td>\n",
" <td>1.727144e+09</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>[('DecisionTreeClassifier_1', 'SelectFwe_2'), ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0.879195</td>\n",
" <td>15.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>&lt;tpot2.search_spaces.pipelines.graph.GraphPipe...</td>\n",
" <td>1.727144e+09</td>\n",
" <td>1.727144e+09</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>[('DecisionTreeClassifier_1', 'SelectFwe_1')]</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" roc_auc_score complexity_scorer Parents Variation_Function \\\n",
"0 0.474805 20.0 NaN NaN \n",
"1 0.962983 78.0 NaN NaN \n",
"2 0.962310 57.0 NaN NaN \n",
"3 0.956908 66.0 NaN NaN \n",
"4 0.879195 15.0 NaN NaN \n",
"\n",
" Individual Submitted Timestamp \\\n",
"0 <tpot2.search_spaces.pipelines.graph.GraphPipe... 1.727144e+09 \n",
"1 <tpot2.search_spaces.pipelines.graph.GraphPipe... 1.727144e+09 \n",
"2 <tpot2.search_spaces.pipelines.graph.GraphPipe... 1.727144e+09 \n",
"3 <tpot2.search_spaces.pipelines.graph.GraphPipe... 1.727144e+09 \n",
"4 <tpot2.search_spaces.pipelines.graph.GraphPipe... 1.727144e+09 \n",
"\n",
" Completed Timestamp Eval Error Pareto_Front \\\n",
"0 1.727144e+09 None NaN \n",
"1 1.727144e+09 None NaN \n",
"2 1.727144e+09 None NaN \n",
"3 1.727144e+09 None NaN \n",
"4 1.727144e+09 None NaN \n",
"\n",
" Instance \n",
"0 [('LogisticRegression_1', 'FastICA_1'), ('Fast... \n",
"1 [('DecisionTreeClassifier_1', 'PCA_1'), ('Quan... \n",
"2 [('DecisionTreeClassifier_1', 'SelectFwe_1')] \n",
"3 [('DecisionTreeClassifier_1', 'SelectFwe_2'), ... \n",
"4 [('DecisionTreeClassifier_1', 'SelectFwe_1')] "
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# View a summary of all evaluated individuals as a pandas DataFrame\n",
"est.evaluated_individuals.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### tpot2.TPOTEstimator"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Generation: 100%|██████████| 5/5 [01:43<00:00, 20.78s/it]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.9944903581267218\n"
]
}
],
"source": [
"import tpot2\n",
"import sklearn\n",
"import sklearn.datasets\n",
"import sklearn.metrics\n",
"import sklearn.model_selection\n",
"\n",
"# Reuses graph_search_space from the TPOTEstimatorSteadyState example above.\n",
"est = tpot2.TPOTEstimator(\n",
"    search_space=graph_search_space,\n",
"    population_size=30,\n",
"    generations=5,\n",
"    scorers=['roc_auc_ovr'],  # a list of scorer names or scorer objects; evaluated during cross-validation\n",
"    scorers_weights=[1],\n",
"    classification=True,\n",
"    n_jobs=1,\n",
"    early_stop=5,  # stop after this many generations without improvement\n",
"\n",
"    # Other objective functions. Each takes an untrained GraphPipeline and returns a score or a list of scores.\n",
"    other_objective_functions=[],\n",
"\n",
"    # Weights for the other objective functions; must be the same length as other_objective_functions.\n",
"    # Positive weights are maximized and negative weights are minimized.\n",
"    other_objective_functions_weights=[],\n",
"    verbose=2)\n",
"\n",
"scorer = sklearn.metrics.get_scorer('roc_auc_ovo')\n",
"X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)\n",
"X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, train_size=0.75, test_size=0.25)\n",
"est.fit(X_train, y_train)\n",
"print(scorer(est, X_test, y_test))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Regression Example"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import tpot2\n",
"import sklearn\n",
"import sklearn.metrics\n",
"import sklearn.datasets\n",
"import sklearn.model_selection\n",
"\n",
"scorer = sklearn.metrics.get_scorer('neg_mean_squared_error')\n",
"X, y = sklearn.datasets.load_diabetes(return_X_y=True)\n",
"X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, train_size=0.75, test_size=0.25)\n",
"\n",
"est = tpot2.tpot_estimator.templates.TPOTRegressor(n_jobs=4, max_time_mins=30, verbose=2, cv=5)\n",
"est.fit(X_train, y_train)\n",
"\n",
"print(scorer(est, X_test, y_test))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "tpot_dev",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "7fe1fe9ef32cd5efd76326a08046147513534f0dd2318301a1a96ae9071c1c4e"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}