EpistasisLab / tpot UNCLAIMED

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

10048 0 1 Jupyter Notebook
2024-09-20 14:48:56 -07:00
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# What to expect from AutoML software\n",
"Automated machine learning (AutoML) takes a higher-level approach to machine learning than most practitioners are used to, so we've gathered a handful of guidelines on what to expect when running AutoML software such as TPOT.\n",
"\n",
"#### AUTOML ALGORITHMS AREN'T INTENDED TO RUN FOR ONLY A FEW MINUTES\n",
"Of course, you can run TPOT for only a few minutes, and it will find a reasonably good pipeline for your dataset. However, if you don't run TPOT for long enough, it may not find the best possible pipeline for your dataset. It may not even find any suitable pipeline at all, in which case a RuntimeError('A pipeline has not yet been optimized. Please call fit() first.') will be raised. Often it is worthwhile to run multiple instances of TPOT in parallel for a long time (hours to days) to allow TPOT to thoroughly search the pipeline space for your dataset.\n",
"\n",
"#### AUTOML ALGORITHMS CAN TAKE A LONG TIME TO FINISH THEIR SEARCH\n",
"AutoML algorithms aren't as simple as fitting one model on the dataset; they consider multiple machine learning algorithms (random forests, linear models, SVMs, etc.) in a pipeline with multiple preprocessing steps (missing value imputation, scaling, PCA, feature selection, etc.), the hyperparameters for all of the models and preprocessing steps, and multiple ways to ensemble or stack the algorithms within the pipeline.\n",
"\n",
"As such, TPOT will take a while to run on larger datasets, but it's important to realize why. With the default TPOT settings (100 generations with 100 population size), TPOT will evaluate 10,000 pipeline configurations before finishing. To put this number into context, think about a grid search of 10,000 hyperparameter combinations for a machine learning algorithm and how long that grid search will take. That is 10,000 model configurations to evaluate with 10-fold cross-validation, which means that roughly 100,000 models are fit and evaluated on the training data in one grid search. That's a time-consuming procedure, even for simpler models like decision trees.\n",
"\n",
"Typical TPOT runs will take hours to days to finish (unless it's a small dataset), but you can always interrupt the run partway through and see the best results so far. TPOT also provides a warm_start and a periodic_checkpoint_folder parameter that lets you restart a TPOT run from where it left off.\n",
"\n",
"#### AUTOML ALGORITHMS CAN RECOMMEND DIFFERENT SOLUTIONS FOR THE SAME DATASET\n",
"If you're working with a reasonably complex dataset or run TPOT for a short amount of time, different TPOT runs may result in different pipeline recommendations. TPOT's optimization algorithm is stochastic, which means that it uses randomness (in part) to search the possible pipeline space. When two TPOT runs recommend different pipelines, this means that the TPOT runs didn't converge due to lack of time or that multiple pipelines perform more-or-less the same on your dataset.\n",
"\n",
"This is actually an advantage over fixed grid search techniques: TPOT is meant to be an assistant that gives you ideas on how to solve a particular machine learning problem by exploring pipeline configurations that you might have never considered, then leaves the fine-tuning to more constrained parameter tuning techniques such as grid search or bayesian optimization."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# TPOT with code\n",
"\n",
"We've designed the TPOT interface to be as similar as possible to scikit-learn.\n",
"\n",
"TPOT can be imported just like any regular Python module. To import TPOT, type:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Matplotlib is building the font cache; this may take a moment.\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
" from .autonotebook import tqdm as notebook_tqdm\n"
]
}
],
"source": [
"import tpot\n",
"from tpot import TPOTClassifier"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"then create an instance of TPOT as follows:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"classification_optimizer = TPOTClassifier()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It's also possible to use TPOT for regression problems with the TPOTRegressor class. Other than the class name, a TPOTRegressor is used the same way as a TPOTClassifier. You can read more about the TPOTClassifier and TPOTRegressor classes in the API documentation."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"from tpot import TPOTRegressor\n",
"regression_optimizer = TPOTRegressor()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Fitting a TPOT model works exactly like any other sklearn estimator. Some example code with custom TPOT parameters might look like:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Generation: : 5it [00:32, 6.57s/it]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"auroc_score: 0.9950396825396826\n"
]
}
],
"source": [
"import sklearn\n",
"import sklearn.datasets\n",
"import sklearn.metrics\n",
"import tpot\n",
"\n",
"classification_optimizer = TPOTClassifier(search_space=\"linear-light\", max_time_mins=30/60, n_jobs=30, cv=5)\n",
"\n",
"X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)\n",
"X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, random_state=1, test_size=0.2)\n",
"\n",
"classification_optimizer.fit(X_train, y_train)\n",
"\n",
"auroc_score = sklearn.metrics.roc_auc_score(y_test, classification_optimizer.predict_proba(X_test)[:,1])\n",
"print(\"auroc_score: \", auroc_score)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Scorers, Objective Functions, and multi objective optimization.\n",
"\n",
"There are two ways of passing objectives into TPOT. \n",
"\n",
"1. `scorers`: Scorers are functions that have the signature (estimator, X_test, y_test) and take in estimators that are expected to be fitted to training data. These can be produced with the [sklearn.metrics.make_scorer](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html) function. This function is used to evaluate the test folds during cross validation (defined in the `cv` parameter). These are passed into TPOT via the scorers parameter. This can take in the scorer itself or the string corresponding to a scoring function ([as listed here](https://scikit-learn.org/stable/modules/model_evaluation.html)). TPOT also supports passing in a list of several scorers for multi-objective optimization. For each fold of CV, TPOT only fits the estimator once, then evaluates all provided scorers in a loop.\n",
"\n",
"2. `other_objective_functions` : Other objective functions in TPOT have the signature (estimator) and returns a float or list of floats. These get passed a single unfitted estimator once, outside of cross validation. The user may choose to fit the pipeline within this objective function as well.\n",
"\n",
"\n",
"\n",
"Each scorer and objective function must be accompanied by a list of weights corresponding to the list of objectives, these are `scorers_weights` and `other_objective_function_weights`, respectively. By default, TPOT maximizes objective functions (this can be changed by `bigger_is_better=False`). Positive weights means that TPOT will seek to maximize that objective, and negative weights correspond to minimization. For most selectors (and the default), only the sign matters. The scale of the weight may matter if using a custom selection function for the optimization algorithm. A zero weight means that the score will not have an impact on the selection algorithm.\n",
"\n",
"Here is an example of using two scorers\n",
"\n",
" scorers=['roc_auc_ovr',tpot.objectives.complexity_scorer],\n",
" scorers_weights=[1,-1],\n",
"\n",
"\n",
"Here is an example with a scorer and a secondary objective function\n",
"\n",
" scorers=['roc_auc_ovr'],\n",
" scorers_weights=[1],\n",
" other_objective_functions=[tpot.objectives.number_of_leaves_objective],\n",
" other_objective_functions_weights=[-1],\n",
"\n",
"\n",
"TPOT will always automatically name the scorers based on the function name for the columns in the final results dataframe. TPOT will use the function name as the column name for `other_objective_functions`. However, if you would like to specify custom column names, you can set the `objective_function_names` to be a list of names (str) for each value returned by the function in `other_objective_functions`. This can be useful if your additional functions return more than one value per function.\n",
"\n",
"It is possible to have either the scorer or other_objective_function to return multiple values. In that case, just make sure that the `scorers_weights` and `other_objective_function_weights` are the same length as the number of returned scores.\n",
"\n",
"\n",
"TPOT comes with a few additional built in objective functions you can use. The first table are objectives applied to fitted pipelines, and thus are passee into the `scorers` parameter. The second table are objective functions for the `other_objective_functions` param.\n",
"\n",
"Scorers:\n",
"| Function | Description |\n",
"| :--- | :----: |\n",
"| tpot.objectives.complexity_scorer | Estimates the number of learned parameters across all classifiers and regressors in the pipelines. Additionally, currently transformers add 1 point and selectors add 0 points (since they don't affect the complexity of the \"final\" predictive pipeline.) |\n",
"\n",
"Other Objective Functions.\n",
"\n",
"| Function | Description |\n",
"| :--- | :----: |\n",
"| tpot.objectives.average_path_length | Computes the average shortest path from all nodes to the root/final estimator (only supported for GraphPipeline) |\n",
"| tpot.objectives.number_of_leaves_objective | Calculates the number of leaves (input nodes) in a GraphPipeline |\n",
"| tpot.objectives.number_of_nodes_objective | Calculates the number of nodes in a pipeline (whether it is an scikit-learn Pipeline, GraphPipeline, Feature Union, or the previous nested within each other) |"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Measuring Model Complexity\n",
"\n",
"When running TPOT, including a secondary objective that measures model complexity can sometimes be beneficial. More complex models can yield higher performance, but this comes at the cost of interpretability. Simpler models may be more interpretable but often have lower predictive performance. Sometimes, however, vast increases in complexity only marginally improve predictive performance. There may be other simpler and more interpretable pipelines with marginal performance decreases that could be acceptable for the increased interpretability. However, these pipelines are often missed when optimizing purely for performance. By including both performance and complexity as objective functions, TPOT will attempt to optimize the best pipeline for all complexity levels simultaneously. After optimization, the user will be able to see the complexity vs performance tradeoff and decide which pipeline best suits their needs. \n",
"\n",
"Two methods of measuring complexity to consider would be `tpot.objectives.number_of_nodes_objective` or `tpot.objectives.complexity_scorer`. The number of nodes objective simply calculates the number of steps within a pipeline. This is a simple metric, however it does not differentiate between the complexity of different model types. For example, a simple LogisticRegression counts the same as the much more complex XGBoost. The complexity scorer tries to estimate the number of learned parameters included in the classifiers and regressors of the pipeline. It is challenging and potentially subjective how to exactly quantify and compare complexity between different classes of models. However, this function provides a reasonable heuristic for the evolutionary algorithm that at least separates out qualitatively more or less complex algorithms from one another. While it may be hard to compare the relative complexities of LogisticRegression and XGBoost exactly, for example, both will always be on opposite ends of the complexity values returned by this function. This allows for pareto fronts with LogisticRegression on one side, and XGBoost on the other.\n",
"\n",
"An example of this analysis is demonstrated in a following section."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Built In Configurations\n",
"TPOT can be used to optimize hyperparameters, select models, and optimize pipelines of models including determining the sequence of steps. **Tutorial 2** goes into more detail on how to customize search spaces with custom hyperparameter ranges, model types, and possible pipeline configurations. TPOT also comes with a handful of default operators and parameter configurations that we believe work well for optimizing machine learning pipelines. Below is a list of the current built-in configurations that come with TPOT. These can be passed in as strings to the `search space` parameter of any of the TPOT estimators.\n",
"\n",
"| String | Description |\n",
"| :--- | :----: |\n",
"| linear | A linear pipeline with the structure of \"Selector->(transformers+Passthrough)->(classifiers/regressors+Passthrough)->final classifier/regressor.\" For both the transformer and inner estimator layers, TPOT may choose one or more transformers/classifiers, or it may choose none. The inner classifier/regressor layer is optional. |\n",
"| linear-light | Same search space as linear, but without the inner classifier/regressor layer and with a reduced set of faster running estimators. |\n",
"| graph | TPOT will optimize a pipeline in the shape of a directed acyclic graph. The nodes of the graph can include selectors, scalers, transformers, or classifiers/regressors (inner classifiers/regressors can optionally be not included). This will return a custom GraphPipeline rather than an sklearn Pipeline. More details in Tutorial 6. |\n",
"| graph-light | Same as graph search space, but without the inner classifier/regressors and with a reduced set of faster running estimators. |\n",
"| mdr |TPOT will search over a series of feature selectors and Multifactor Dimensionality Reduction models to find a series of operators that maximize prediction accuracy. The TPOT MDR configuration is specialized for genome-wide association studies (GWAS), and is described in detail online here.\n",
"\n",
"Note that TPOT MDR may be slow to run because the feature selection routines are computationally expensive, especially on large datasets. |\n",
"\n",
"The `linear` and `graph` configurations by default allow for additional stacked classifiers/regressors within the pipeline in addition to the final classifier/regressor. If you would like to disable this, you can manually get the search space without inner classifier/regressors through the function `tpot.config.template_search_spaces.get_template_search_spaces` with `inner_predictios=False`. You can pass the resulting search space into the `search space` param. "
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"import tpot\n",
"from tpot.search_spaces.pipelines import SequentialPipeline\n",
"from tpot.config import get_search_space\n",
"\n",
"stc_search_space = SequentialPipeline([\n",
" get_search_space(\"selectors\"),\n",
" get_search_space(\"all_transformers\"),\n",
" get_search_space(\"classifiers\"),\n",
"])\n",
"\n",
"est = tpot.TPOTEstimator(\n",
" search_space = stc_search_space,\n",
" scorers=[\"roc_auc_ovr\", tpot.objectives.complexity_scorer],\n",
" scorers_weights=[1.0, -1.0],\n",
" classification = True,\n",
" cv = 5,\n",
" max_eval_time_mins = 10,\n",
" early_stop = 2,\n",
" verbose = 2,\n",
" n_jobs=4,\n",
")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using a built in method"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"est = tpot.TPOTEstimator(\n",
" search_space = \"linear\",\n",
" scorers=[\"roc_auc_ovr\", tpot.objectives.complexity_scorer],\n",
" scorers_weights=[1.0, -1.0],\n",
" classification = True,\n",
" cv = 5,\n",
" max_eval_time_mins = 10,\n",
" early_stop = 2,\n",
" verbose = 2,\n",
" n_jobs=4,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The specific hyperparameter ranges used by TPOT can be found in files in the tpot/config folder. The template search spaces listed above are defined in tpot/config/template_search_spaces.py. Search spaces for individual models can be acquired in the tpot/config/get_configspace.py file (`tpot.config.get_search_space`). More details on customizing search spaces can be found in Tutorial 2.\n",
"\n",
"\n",
" `tpot.config.template_search_spaces.get_template_search_spaces`\n",
" Returns a search space which can be optimized by TPOT.\n",
"\n",
" Parameters\n",
" ----------\n",
" search_space: str or SearchSpace\n",
" The default search space to use. If a string, it should be one of the following:\n",
" - 'linear': A search space for linear pipelines\n",
" - 'linear-light': A search space for linear pipelines with a smaller, faster search space\n",
" - 'graph': A search space for graph pipelines\n",
" - 'graph-light': A search space for graph pipelines with a smaller, faster search space\n",
" - 'mdr': A search space for MDR pipelines\n",
" If a SearchSpace object, it should be a valid search space object for TPOT.\n",
" \n",
" classification: bool, default=True\n",
" Whether the problem is a classification problem or a regression problem.\n",
"\n",
" inner_predictors: bool, default=None\n",
" Whether to include additional classifiers/regressors before the final classifier/regressor (allowing for ensembles). \n",
" Defaults to False for 'linear-light' and 'graph-light' search spaces, and True otherwise. (Not used for 'mdr' search space)\n",
" \n",
" cross_val_predict_cv: int, default=None\n",
" The number of folds to use for cross_val_predict. \n",
" Defaults to 0 for 'linear-light' and 'graph-light' search spaces, and 5 otherwise. (Not used for 'mdr' search space)\n",
"\n",
" get_search_space_params: dict\n",
" Additional parameters to pass to the get_search_space function.\n",
"\n",
"### cross_val_predict_cv\n",
"\n",
"Additionally, utilizing `cross_val_predict_cv` may increase performance when training models with inner classifiers/regressors. If this parameter is set, during model training any classifiers or regressors that is not the final predictor will use `sklearn.model_selection.cross_val_predict` to pass out of sample predictions into the following steps of the model. The model will still be fit to the full data which will be used for predictions after training. Training downstream models on out of sample predictions can often prevent overfitting and increase performance. The reason is that this gives downstream models a estimate of how upstream models compare on unseen data. Otherwise, if an upsteam model heavily overfits the data, downsteam models may simply learn to blindly trust the seemingly well-predicting model, propagating the over-fitting through to the end result.\n",
"\n",
"The downside is that cross_val_predict_cv is significantly more computationally demanding, and may not be necessary for your given dataset. \n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"linear_with_cross_val_predict_sp = tpot.config.template_search_spaces.get_template_search_spaces(search_space=\"linear\", classification=True, inner_predictors=True, cross_val_predict_cv=5)\n",
"classification_optimizer = TPOTClassifier(search_space=linear_with_cross_val_predict_sp, max_time_mins=30/60, n_jobs=30, cv=5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Terminating Optimization (Early Stopping)\n",
"\n",
"Note that we use a short time duration for a quick example, but in practice, you may need to run TPOT for a longer duration. By default, TPOT sets a time limit of 1 hour with a max limit of 5 minutes per pipeline. In practice, you may want to increase these values.\n",
"\n",
"There are three methods of terminating a TPOT run and ending the optimization process. TPOT will terminate as soon as one of the conditions is met.\n",
"* `max_time_mins` : (Default, 60 minutes) After this many minutes, TPOT will terminate and return the best pipeline it found so far.\n",
"* `early_stop` : The number of generations without seeing an improvement in performance, after which TPOT terminates. Generally, a value of around 5 to 20 is sufficient to be reasonably sure that performance has converged.\n",
"* `generations`: The total number of generations of the evolutionary algorithm to run.\n",
"\n",
"By default, TPOT will run until the time limit is up, with no generation or early stop limits."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Best Practices and tips:\n",
"\n",
"* When running tpot from an .py script, it is important to protect code with `if __name__==\"__main__\":` . This is because of how TPOT handles parallelization with Python and Dask."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Generation: : 1it [03:13, 193.20s/it]\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 0 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 1 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 2 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 3 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 4 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 5 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 6 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 7 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 8 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 9 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 10 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 11 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 12 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 13 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 14 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 15 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 16 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 17 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 18 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 19 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 20 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 21 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 22 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 23 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 24 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 25 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 26 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 27 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 28 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 29 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 30 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 31 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 32 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 33 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 34 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 35 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 36 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 37 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 38 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 39 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 40 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 41 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 42 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 43 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 44 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 45 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 46 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 47 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 48 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 49 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 50 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 51 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 52 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 53 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 54 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 55 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 56 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 57 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 58 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:307: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 59 are removed. Consider decreasing the number of bins.\n",
" warnings.warn(\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.999621947852182\n"
]
}
],
"source": [
"from dask.distributed import Client, LocalCluster\n",
"import tpot\n",
"import sklearn\n",
"import sklearn.datasets\n",
"import sklearn.metrics\n",
"import sklearn.model_selection\n",
"import numpy as np\n",
"\n",
"if __name__ == \"__main__\":\n",
"    # Multiclass one-vs-one ROC AUC, used to score the fitted estimator on the held-out split\n",
"    scorer = sklearn.metrics.get_scorer('roc_auc_ovo')\n",
"    X, y = sklearn.datasets.load_digits(return_X_y=True)\n",
"    X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, train_size=0.75, test_size=0.25)\n",
"\n",
"    # Short demonstration run: at most 3 minutes, stopping early if the search stops improving\n",
"    est = tpot.TPOTClassifier(n_jobs=4, max_time_mins=3, verbose=2, early_stop=3)\n",
"    est.fit(X_train, y_train)\n",
"\n",
"    print(scorer(est, X_test, y_test))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Example analysis and the Estimator class\n",
"\n",
"Here we use a toy example dataset included in scikit-learn. We will use the `linear` search space and the `complexity_scorer` to estimate pipeline complexity.\n",
"\n",
"Note that for this toy example, we set a relatively short run time. In practice, we recommend running TPOT for a longer duration with an `early_stop` value of around 5 to 20 (more details below)."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Generation: : 4it [02:34, 38.64s/it]\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/neural_network/_multilayer_perceptron.py:690: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.\n",
" warnings.warn(\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.9978289188015632\n"
]
}
],
"source": [
"from dask.distributed import Client, LocalCluster\n",
"import tpot\n",
"import tpot.objectives\n",
"import sklearn\n",
"import sklearn.datasets\n",
"import sklearn.metrics\n",
"import sklearn.model_selection\n",
"import numpy as np\n",
"\n",
"\n",
"scorer = sklearn.metrics.get_scorer('roc_auc_ovr')\n",
"\n",
"X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)\n",
"X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, train_size=0.75, test_size=0.25)\n",
"\n",
"\n",
"est = tpot.TPOTClassifier(\n",
"    # Two objectives: ROC AUC (weight 1.0) and pipeline complexity (weight -1.0)\n",
"    scorers=[scorer, tpot.objectives.complexity_scorer],\n",
"    scorers_weights=[1.0, -1.0],\n",
"\n",
"    search_space=\"linear\",\n",
"    n_jobs=4,\n",
"    max_time_mins=60,\n",
"    max_eval_time_mins=10,\n",
"    early_stop=2,\n",
"    verbose=2,\n",
")\n",
"est.fit(X_train, y_train)\n",
"\n",
"print(scorer(est, X_test, y_test))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can access the best pipeline selected by TPOT with the `fitted_pipeline_` attribute. This is the pipeline with the highest cross-validation score on the first scorer (or on the first objective function if no scorer is provided)."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style>#sk-container-id-1 {\n",
" /* Definition of color scheme common for light and dark mode */\n",
" --sklearn-color-text: black;\n",
" --sklearn-color-line: gray;\n",
" /* Definition of color scheme for unfitted estimators */\n",
" --sklearn-color-unfitted-level-0: #fff5e6;\n",
" --sklearn-color-unfitted-level-1: #f6e4d2;\n",
" --sklearn-color-unfitted-level-2: #ffe0b3;\n",
" --sklearn-color-unfitted-level-3: chocolate;\n",
" /* Definition of color scheme for fitted estimators */\n",
" --sklearn-color-fitted-level-0: #f0f8ff;\n",
" --sklearn-color-fitted-level-1: #d4ebff;\n",
" --sklearn-color-fitted-level-2: #b3dbfd;\n",
" --sklearn-color-fitted-level-3: cornflowerblue;\n",
"\n",
" /* Specific color for light theme */\n",
" --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));\n",
" --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, white)));\n",
" --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));\n",
" --sklearn-color-icon: #696969;\n",
"\n",
" @media (prefers-color-scheme: dark) {\n",
" /* Redefinition of color scheme for dark theme */\n",
" --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n",
" --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, #111)));\n",
" --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n",
" --sklearn-color-icon: #878787;\n",
" }\n",
"}\n",
"\n",
"#sk-container-id-1 {\n",
" color: var(--sklearn-color-text);\n",
"}\n",
"\n",
"#sk-container-id-1 pre {\n",
" padding: 0;\n",
"}\n",
"\n",
"#sk-container-id-1 input.sk-hidden--visually {\n",
" border: 0;\n",
" clip: rect(1px 1px 1px 1px);\n",
" clip: rect(1px, 1px, 1px, 1px);\n",
" height: 1px;\n",
" margin: -1px;\n",
" overflow: hidden;\n",
" padding: 0;\n",
" position: absolute;\n",
" width: 1px;\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-dashed-wrapped {\n",
" border: 1px dashed var(--sklearn-color-line);\n",
" margin: 0 0.4em 0.5em 0.4em;\n",
" box-sizing: border-box;\n",
" padding-bottom: 0.4em;\n",
" background-color: var(--sklearn-color-background);\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-container {\n",
" /* jupyter's `normalize.less` sets `[hidden] { display: none; }`\n",
" but bootstrap.min.css set `[hidden] { display: none !important; }`\n",
" so we also need the `!important` here to be able to override the\n",
" default hidden behavior on the sphinx rendered scikit-learn.org.\n",
" See: https://github.com/scikit-learn/scikit-learn/issues/21755 */\n",
" display: inline-block !important;\n",
" position: relative;\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-text-repr-fallback {\n",
" display: none;\n",
"}\n",
"\n",
"div.sk-parallel-item,\n",
"div.sk-serial,\n",
"div.sk-item {\n",
" /* draw centered vertical line to link estimators */\n",
" background-image: linear-gradient(var(--sklearn-color-text-on-default-background), var(--sklearn-color-text-on-default-background));\n",
" background-size: 2px 100%;\n",
" background-repeat: no-repeat;\n",
" background-position: center center;\n",
"}\n",
"\n",
"/* Parallel-specific style estimator block */\n",
"\n",
"#sk-container-id-1 div.sk-parallel-item::after {\n",
" content: \"\";\n",
" width: 100%;\n",
" border-bottom: 2px solid var(--sklearn-color-text-on-default-background);\n",
" flex-grow: 1;\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-parallel {\n",
" display: flex;\n",
" align-items: stretch;\n",
" justify-content: center;\n",
" background-color: var(--sklearn-color-background);\n",
" position: relative;\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-parallel-item {\n",
" display: flex;\n",
" flex-direction: column;\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-parallel-item:first-child::after {\n",
" align-self: flex-end;\n",
" width: 50%;\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-parallel-item:last-child::after {\n",
" align-self: flex-start;\n",
" width: 50%;\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-parallel-item:only-child::after {\n",
" width: 0;\n",
"}\n",
"\n",
"/* Serial-specific style estimator block */\n",
"\n",
"#sk-container-id-1 div.sk-serial {\n",
" display: flex;\n",
" flex-direction: column;\n",
" align-items: center;\n",
" background-color: var(--sklearn-color-background);\n",
" padding-right: 1em;\n",
" padding-left: 1em;\n",
"}\n",
"\n",
"\n",
"/* Toggleable style: style used for estimator/Pipeline/ColumnTransformer box that is\n",
"clickable and can be expanded/collapsed.\n",
"- Pipeline and ColumnTransformer use this feature and define the default style\n",
"- Estimators will overwrite some part of the style using the `sk-estimator` class\n",
"*/\n",
"\n",
"/* Pipeline and ColumnTransformer style (default) */\n",
"\n",
"#sk-container-id-1 div.sk-toggleable {\n",
" /* Default theme specific background. It is overwritten whether we have a\n",
" specific estimator or a Pipeline/ColumnTransformer */\n",
" background-color: var(--sklearn-color-background);\n",
"}\n",
"\n",
"/* Toggleable label */\n",
"#sk-container-id-1 label.sk-toggleable__label {\n",
" cursor: pointer;\n",
" display: block;\n",
" width: 100%;\n",
" margin-bottom: 0;\n",
" padding: 0.5em;\n",
" box-sizing: border-box;\n",
" text-align: center;\n",
"}\n",
"\n",
"#sk-container-id-1 label.sk-toggleable__label-arrow:before {\n",
" /* Arrow on the left of the label */\n",
" content: \"▸\";\n",
" float: left;\n",
" margin-right: 0.25em;\n",
" color: var(--sklearn-color-icon);\n",
"}\n",
"\n",
"#sk-container-id-1 label.sk-toggleable__label-arrow:hover:before {\n",
" color: var(--sklearn-color-text);\n",
"}\n",
"\n",
"/* Toggleable content - dropdown */\n",
"\n",
"#sk-container-id-1 div.sk-toggleable__content {\n",
" max-height: 0;\n",
" max-width: 0;\n",
" overflow: hidden;\n",
" text-align: left;\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-0);\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-toggleable__content.fitted {\n",
" /* fitted */\n",
" background-color: var(--sklearn-color-fitted-level-0);\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-toggleable__content pre {\n",
" margin: 0.2em;\n",
" border-radius: 0.25em;\n",
" color: var(--sklearn-color-text);\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-0);\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-toggleable__content.fitted pre {\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-fitted-level-0);\n",
"}\n",
"\n",
"#sk-container-id-1 input.sk-toggleable__control:checked~div.sk-toggleable__content {\n",
" /* Expand drop-down */\n",
" max-height: 200px;\n",
" max-width: 100%;\n",
" overflow: auto;\n",
"}\n",
"\n",
"#sk-container-id-1 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {\n",
" content: \"▾\";\n",
"}\n",
"\n",
"/* Pipeline/ColumnTransformer-specific style */\n",
"\n",
"#sk-container-id-1 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
" color: var(--sklearn-color-text);\n",
" background-color: var(--sklearn-color-unfitted-level-2);\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-label.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
" background-color: var(--sklearn-color-fitted-level-2);\n",
"}\n",
"\n",
"/* Estimator-specific style */\n",
"\n",
"/* Colorize estimator box */\n",
"#sk-container-id-1 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-2);\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-estimator.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
" /* fitted */\n",
" background-color: var(--sklearn-color-fitted-level-2);\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-label label.sk-toggleable__label,\n",
"#sk-container-id-1 div.sk-label label {\n",
" /* The background is the default theme color */\n",
" color: var(--sklearn-color-text-on-default-background);\n",
"}\n",
"\n",
"/* On hover, darken the color of the background */\n",
"#sk-container-id-1 div.sk-label:hover label.sk-toggleable__label {\n",
" color: var(--sklearn-color-text);\n",
" background-color: var(--sklearn-color-unfitted-level-2);\n",
"}\n",
"\n",
"/* Label box, darken color on hover, fitted */\n",
"#sk-container-id-1 div.sk-label.fitted:hover label.sk-toggleable__label.fitted {\n",
" color: var(--sklearn-color-text);\n",
" background-color: var(--sklearn-color-fitted-level-2);\n",
"}\n",
"\n",
"/* Estimator label */\n",
"\n",
"#sk-container-id-1 div.sk-label label {\n",
" font-family: monospace;\n",
" font-weight: bold;\n",
" display: inline-block;\n",
" line-height: 1.2em;\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-label-container {\n",
" text-align: center;\n",
"}\n",
"\n",
"/* Estimator-specific */\n",
"#sk-container-id-1 div.sk-estimator {\n",
" font-family: monospace;\n",
" border: 1px dotted var(--sklearn-color-border-box);\n",
" border-radius: 0.25em;\n",
" box-sizing: border-box;\n",
" margin-bottom: 0.5em;\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-0);\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-estimator.fitted {\n",
" /* fitted */\n",
" background-color: var(--sklearn-color-fitted-level-0);\n",
"}\n",
"\n",
"/* on hover */\n",
"#sk-container-id-1 div.sk-estimator:hover {\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-2);\n",
"}\n",
"\n",
"#sk-container-id-1 div.sk-estimator.fitted:hover {\n",
" /* fitted */\n",
" background-color: var(--sklearn-color-fitted-level-2);\n",
"}\n",
"\n",
"/* Specification for estimator info (e.g. \"i\" and \"?\") */\n",
"\n",
"/* Common style for \"i\" and \"?\" */\n",
"\n",
".sk-estimator-doc-link,\n",
"a:link.sk-estimator-doc-link,\n",
"a:visited.sk-estimator-doc-link {\n",
" float: right;\n",
" font-size: smaller;\n",
" line-height: 1em;\n",
" font-family: monospace;\n",
" background-color: var(--sklearn-color-background);\n",
" border-radius: 1em;\n",
" height: 1em;\n",
" width: 1em;\n",
" text-decoration: none !important;\n",
" margin-left: 1ex;\n",
" /* unfitted */\n",
" border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n",
" color: var(--sklearn-color-unfitted-level-1);\n",
"}\n",
"\n",
".sk-estimator-doc-link.fitted,\n",
"a:link.sk-estimator-doc-link.fitted,\n",
"a:visited.sk-estimator-doc-link.fitted {\n",
" /* fitted */\n",
" border: var(--sklearn-color-fitted-level-1) 1pt solid;\n",
" color: var(--sklearn-color-fitted-level-1);\n",
"}\n",
"\n",
"/* On hover */\n",
"div.sk-estimator:hover .sk-estimator-doc-link:hover,\n",
".sk-estimator-doc-link:hover,\n",
"div.sk-label-container:hover .sk-estimator-doc-link:hover,\n",
".sk-estimator-doc-link:hover {\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-3);\n",
" color: var(--sklearn-color-background);\n",
" text-decoration: none;\n",
"}\n",
"\n",
"div.sk-estimator.fitted:hover .sk-estimator-doc-link.fitted:hover,\n",
".sk-estimator-doc-link.fitted:hover,\n",
"div.sk-label-container:hover .sk-estimator-doc-link.fitted:hover,\n",
".sk-estimator-doc-link.fitted:hover {\n",
" /* fitted */\n",
" background-color: var(--sklearn-color-fitted-level-3);\n",
" color: var(--sklearn-color-background);\n",
" text-decoration: none;\n",
"}\n",
"\n",
"/* Span, style for the box shown on hovering the info icon */\n",
".sk-estimator-doc-link span {\n",
" display: none;\n",
" z-index: 9999;\n",
" position: relative;\n",
" font-weight: normal;\n",
" right: .2ex;\n",
" padding: .5ex;\n",
" margin: .5ex;\n",
" width: min-content;\n",
" min-width: 20ex;\n",
" max-width: 50ex;\n",
" color: var(--sklearn-color-text);\n",
" box-shadow: 2pt 2pt 4pt #999;\n",
" /* unfitted */\n",
" background: var(--sklearn-color-unfitted-level-0);\n",
" border: .5pt solid var(--sklearn-color-unfitted-level-3);\n",
"}\n",
"\n",
".sk-estimator-doc-link.fitted span {\n",
" /* fitted */\n",
" background: var(--sklearn-color-fitted-level-0);\n",
" border: var(--sklearn-color-fitted-level-3);\n",
"}\n",
"\n",
".sk-estimator-doc-link:hover span {\n",
" display: block;\n",
"}\n",
"\n",
"/* \"?\"-specific style due to the `<a>` HTML tag */\n",
"\n",
"#sk-container-id-1 a.estimator_doc_link {\n",
" float: right;\n",
" font-size: 1rem;\n",
" line-height: 1em;\n",
" font-family: monospace;\n",
" background-color: var(--sklearn-color-background);\n",
" border-radius: 1rem;\n",
" height: 1rem;\n",
" width: 1rem;\n",
" text-decoration: none;\n",
" /* unfitted */\n",
" color: var(--sklearn-color-unfitted-level-1);\n",
" border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n",
"}\n",
"\n",
"#sk-container-id-1 a.estimator_doc_link.fitted {\n",
" /* fitted */\n",
" border: var(--sklearn-color-fitted-level-1) 1pt solid;\n",
" color: var(--sklearn-color-fitted-level-1);\n",
"}\n",
"\n",
"/* On hover */\n",
"#sk-container-id-1 a.estimator_doc_link:hover {\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-3);\n",
" color: var(--sklearn-color-background);\n",
" text-decoration: none;\n",
"}\n",
"\n",
"#sk-container-id-1 a.estimator_doc_link.fitted:hover {\n",
" /* fitted */\n",
" background-color: var(--sklearn-color-fitted-level-3);\n",
"}\n",
"</style><div id=\"sk-container-id-1\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>Pipeline(steps=[(&#x27;minmaxscaler&#x27;, MinMaxScaler()),\n",
" (&#x27;selectpercentile&#x27;,\n",
" SelectPercentile(percentile=68.60012151662)),\n",
" (&#x27;featureunion-1&#x27;,\n",
" FeatureUnion(transformer_list=[(&#x27;skiptransformer&#x27;,\n",
" SkipTransformer()),\n",
" (&#x27;passthrough&#x27;,\n",
" Passthrough())])),\n",
" (&#x27;featureunion-2&#x27;,\n",
" FeatureUnion(transformer_list=[(&#x27;skiptransformer&#x27;,\n",
" SkipTransformer()),\n",
" (&#x27;passthrough&#x27;,\n",
" Passthrough())])),\n",
" (&#x27;mlpclassifier&#x27;,\n",
" MLPClassifier(activation=&#x27;identity&#x27;, alpha=0.0023692590029,\n",
" hidden_layer_sizes=[139, 139],\n",
" learning_rate=&#x27;invscaling&#x27;,\n",
" learning_rate_init=0.0004707733364,\n",
" n_iter_no_change=32))])</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-label-container\"><div class=\"sk-label fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-1\" type=\"checkbox\" ><label for=\"sk-estimator-id-1\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">&nbsp;&nbsp;Pipeline<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.pipeline.Pipeline.html\">?<span>Documentation for Pipeline</span></a><span class=\"sk-estimator-doc-link fitted\">i<span>Fitted</span></span></label><div class=\"sk-toggleable__content fitted\"><pre>Pipeline(steps=[(&#x27;minmaxscaler&#x27;, MinMaxScaler()),\n",
" (&#x27;selectpercentile&#x27;,\n",
" SelectPercentile(percentile=68.60012151662)),\n",
" (&#x27;featureunion-1&#x27;,\n",
" FeatureUnion(transformer_list=[(&#x27;skiptransformer&#x27;,\n",
" SkipTransformer()),\n",
" (&#x27;passthrough&#x27;,\n",
" Passthrough())])),\n",
" (&#x27;featureunion-2&#x27;,\n",
" FeatureUnion(transformer_list=[(&#x27;skiptransformer&#x27;,\n",
" SkipTransformer()),\n",
" (&#x27;passthrough&#x27;,\n",
" Passthrough())])),\n",
" (&#x27;mlpclassifier&#x27;,\n",
" MLPClassifier(activation=&#x27;identity&#x27;, alpha=0.0023692590029,\n",
" hidden_layer_sizes=[139, 139],\n",
" learning_rate=&#x27;invscaling&#x27;,\n",
" learning_rate_init=0.0004707733364,\n",
" n_iter_no_change=32))])</pre></div> </div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-2\" type=\"checkbox\" ><label for=\"sk-estimator-id-2\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">&nbsp;MinMaxScaler<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.preprocessing.MinMaxScaler.html\">?<span>Documentation for MinMaxScaler</span></a></label><div class=\"sk-toggleable__content fitted\"><pre>MinMaxScaler()</pre></div> </div></div><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-3\" type=\"checkbox\" ><label for=\"sk-estimator-id-3\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">&nbsp;SelectPercentile<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.feature_selection.SelectPercentile.html\">?<span>Documentation for SelectPercentile</span></a></label><div class=\"sk-toggleable__content fitted\"><pre>SelectPercentile(percentile=68.60012151662)</pre></div> </div></div><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-label-container\"><div class=\"sk-label fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-4\" type=\"checkbox\" ><label for=\"sk-estimator-id-4\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">&nbsp;featureunion-1: FeatureUnion<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.pipeline.FeatureUnion.html\">?<span>Documentation for featureunion-1: FeatureUnion</span></a></label><div class=\"sk-toggleable__content 
fitted\"><pre>FeatureUnion(transformer_list=[(&#x27;skiptransformer&#x27;, SkipTransformer()),\n",
" (&#x27;passthrough&#x27;, Passthrough())])</pre></div> </div></div><div class=\"sk-parallel\"><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label fitted sk-toggleable\"><label>skiptransformer</label></div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-5\" type=\"checkbox\" ><label for=\"sk-estimator-id-5\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">SkipTransformer</label><div class=\"sk-toggleable__content fitted\"><pre>SkipTransformer()</pre></div> </div></div></div></div></div><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label fitted sk-toggleable\"><label>passthrough</label></div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-6\" type=\"checkbox\" ><label for=\"sk-estimator-id-6\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">Passthrough</label><div class=\"sk-toggleable__content fitted\"><pre>Passthrough()</pre></div> </div></div></div></div></div></div></div><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-label-container\"><div class=\"sk-label fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-7\" type=\"checkbox\" ><label for=\"sk-estimator-id-7\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">&nbsp;featureunion-2: FeatureUnion<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.pipeline.FeatureUnion.html\">?<span>Documentation for featureunion-2: FeatureUnion</span></a></label><div class=\"sk-toggleable__content 
fitted\"><pre>FeatureUnion(transformer_list=[(&#x27;skiptransformer&#x27;, SkipTransformer()),\n",
" (&#x27;passthrough&#x27;, Passthrough())])</pre></div> </div></div><div class=\"sk-parallel\"><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label fitted sk-toggleable\"><label>skiptransformer</label></div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-8\" type=\"checkbox\" ><label for=\"sk-estimator-id-8\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">SkipTransformer</label><div class=\"sk-toggleable__content fitted\"><pre>SkipTransformer()</pre></div> </div></div></div></div></div><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label fitted sk-toggleable\"><label>passthrough</label></div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-9\" type=\"checkbox\" ><label for=\"sk-estimator-id-9\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">Passthrough</label><div class=\"sk-toggleable__content fitted\"><pre>Passthrough()</pre></div> </div></div></div></div></div></div></div><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-10\" type=\"checkbox\" ><label for=\"sk-estimator-id-10\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">&nbsp;MLPClassifier<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.neural_network.MLPClassifier.html\">?<span>Documentation for MLPClassifier</span></a></label><div class=\"sk-toggleable__content fitted\"><pre>MLPClassifier(activation=&#x27;identity&#x27;, alpha=0.0023692590029,\n",
" hidden_layer_sizes=[139, 139], learning_rate=&#x27;invscaling&#x27;,\n",
" learning_rate_init=0.0004707733364, n_iter_no_change=32)</pre></div> </div></div></div></div></div></div>"
],
"text/plain": [
"Pipeline(steps=[('minmaxscaler', MinMaxScaler()),\n",
" ('selectpercentile',\n",
" SelectPercentile(percentile=68.60012151662)),\n",
" ('featureunion-1',\n",
" FeatureUnion(transformer_list=[('skiptransformer',\n",
" SkipTransformer()),\n",
" ('passthrough',\n",
" Passthrough())])),\n",
" ('featureunion-2',\n",
" FeatureUnion(transformer_list=[('skiptransformer',\n",
" SkipTransformer()),\n",
" ('passthrough',\n",
" Passthrough())])),\n",
" ('mlpclassifier',\n",
" MLPClassifier(activation='identity', alpha=0.0023692590029,\n",
" hidden_layer_sizes=[139, 139],\n",
" learning_rate='invscaling',\n",
" learning_rate_init=0.0004707733364,\n",
" n_iter_no_change=32))])"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"best_pipeline = est.fitted_pipeline_\n",
"best_pipeline"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0,\n",
" 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0,\n",
" 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1,\n",
" 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1,\n",
" 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0,\n",
" 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1,\n",
" 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1])"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"best_pipeline.predict(X_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Saving the Pipeline\n",
"\n",
"We recommend using dill or pickle to save the fitted pipeline stored in `fitted_pipeline_`. We do not recommend pickling the TPOT estimator object itself."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"import dill as pickle\n",
"\n",
"# save the fitted pipeline to disk\n",
"with open(\"best_pipeline.pkl\", \"wb\") as f:\n",
"    pickle.dump(best_pipeline, f)\n",
"\n",
"# load the pipeline back from disk\n",
"with open(\"best_pipeline.pkl\", \"rb\") as f:\n",
"    my_loaded_best_pipeline = pickle.load(f)"
]
},
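{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, the reloaded pipeline should reproduce the predictions of the original fitted pipeline. The cell below is a minimal sketch rather than part of the TPOT API; it assumes `best_pipeline`, `my_loaded_best_pipeline`, and `X_test` from the cells above are still in memory."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Compare predictions from the in-memory pipeline and the copy reloaded from disk.\n",
"original_predictions = best_pipeline.predict(X_test)\n",
"reloaded_predictions = my_loaded_best_pipeline.predict(X_test)\n",
"print('Predictions match:', np.array_equal(original_predictions, reloaded_predictions))"
]
},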
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The evaluated_individuals DataFrame - Further analysis of results\n",
"\n",
"The `evaluated_individuals` attribute of the TPOT estimator object is a pandas DataFrame containing information about a run. Each row corresponds to an individual pipeline explored by TPOT. The DataFrame contains the following columns:\n",
"\n",
"| Column | Description |\n",
"| :--- | :----: |\n",
"| \\<n objective function columns\\> | The first set of columns corresponds to the objective functions, one column per objective. The column names are either generated automatically by TPOT or passed in by the user. |\n",
"| Parents | A tuple containing the indexes of the 'parents' of the current pipeline. For example, (29, 42) means that the pipelines at indexes 29 and 42 were used to generate that pipeline. |\n",
"| Variation_Function | The function applied to the parents to generate the new pipeline. |\n",
"| Individual | The individual object that represents a specific pipeline and hyperparameter configuration. This class also contains functions for mutation and crossover. To get the sklearn estimator/pipeline object from the individual, call its `export_pipeline()` function (as in `pipe = ind.export_pipeline()`). |\n",
"| Generation | The generation in which the individual was created. (Note that higher-performing pipelines from previous generations may still be present in the current \"population\" of a given generation if selected.) |\n",
"| Submitted Timestamp | Timestamp, in seconds, at which the pipeline was sent to be evaluated. This is the output of `time.time()`, which the Python documentation describes as \"Return the time in seconds since the epoch as a floating-point number.\" |\n",
"| Completed Timestamp | Timestamp at which the pipeline evaluation completed, in the same units as Submitted Timestamp. |\n",
"| Pareto_Front | If you have multiple objectives, this column is 1.0 if the pipeline's performance falls on the Pareto front, and NaN otherwise. The Pareto front is the set of pipelines for which no other pipeline is at least as good on every objective and strictly better on at least one; pipelines on the front are not strictly better than one another. |\n",
"| Instance | The unfitted pipeline evaluated for this row. (This is the pipeline returned by calling the `export_pipeline()` function of the individual.) |\n"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['roc_auc_score', 'complexity_scorer']"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#get the score/objective column names generated by TPOT\n",
"est.objective_names"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>roc_auc_score</th>\n",
" <th>complexity_scorer</th>\n",
" <th>Parents</th>\n",
" <th>Variation_Function</th>\n",
" <th>Individual</th>\n",
" <th>Generation</th>\n",
" <th>Submitted Timestamp</th>\n",
" <th>Completed Timestamp</th>\n",
" <th>Eval Error</th>\n",
" <th>Pareto_Front</th>\n",
" <th>Instance</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>&lt;tpot.search_spaces.pipelines.sequential.Seque...</td>\n",
" <td>0.0</td>\n",
" <td>1.740178e+09</td>\n",
" <td>1.740178e+09</td>\n",
" <td>INVALID</td>\n",
" <td>NaN</td>\n",
" <td>(MaxAbsScaler(), RFE(estimator=ExtraTreesClass...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>&lt;tpot.search_spaces.pipelines.sequential.Seque...</td>\n",
" <td>0.0</td>\n",
" <td>1.740178e+09</td>\n",
" <td>1.740179e+09</td>\n",
" <td>INVALID</td>\n",
" <td>NaN</td>\n",
" <td>(RobustScaler(quantile_range=(0.1386847479391,...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>&lt;tpot.search_spaces.pipelines.sequential.Seque...</td>\n",
" <td>0.0</td>\n",
" <td>1.740178e+09</td>\n",
" <td>1.740178e+09</td>\n",
" <td>INVALID</td>\n",
" <td>NaN</td>\n",
" <td>(RobustScaler(quantile_range=(0.0087917518794,...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>&lt;tpot.search_spaces.pipelines.sequential.Seque...</td>\n",
" <td>0.0</td>\n",
" <td>1.740178e+09</td>\n",
" <td>1.740178e+09</td>\n",
" <td>INVALID</td>\n",
" <td>NaN</td>\n",
" <td>(Passthrough(), Passthrough(), FeatureUnion(tr...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0.969262</td>\n",
" <td>241.2</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>&lt;tpot.search_spaces.pipelines.sequential.Seque...</td>\n",
" <td>0.0</td>\n",
" <td>1.740178e+09</td>\n",
" <td>1.740178e+09</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>(RobustScaler(quantile_range=(0.0359502923061,...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>245</th>\n",
" <td>0.986280</td>\n",
" <td>44.0</td>\n",
" <td>(184, 184)</td>\n",
" <td>ind_crossover</td>\n",
" <td>&lt;tpot.search_spaces.pipelines.sequential.Seque...</td>\n",
" <td>4.0</td>\n",
" <td>1.740179e+09</td>\n",
" <td>1.740179e+09</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>(RobustScaler(quantile_range=(0.1428289713161,...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>246</th>\n",
" <td>0.902845</td>\n",
" <td>9.0</td>\n",
" <td>(145, 148)</td>\n",
" <td>ind_mutate , ind_mutate , ind_crossover</td>\n",
" <td>&lt;tpot.search_spaces.pipelines.sequential.Seque...</td>\n",
" <td>4.0</td>\n",
" <td>1.740179e+09</td>\n",
" <td>1.740179e+09</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>(MinMaxScaler(), SelectFwe(alpha=0.00184795618...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>247</th>\n",
" <td>0.992851</td>\n",
" <td>5301.0</td>\n",
" <td>(155, 133)</td>\n",
" <td>ind_mutate , ind_mutate , ind_crossover</td>\n",
" <td>&lt;tpot.search_spaces.pipelines.sequential.Seque...</td>\n",
" <td>4.0</td>\n",
" <td>1.740179e+09</td>\n",
" <td>1.740179e+09</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>(MaxAbsScaler(), SelectFwe(alpha=0.00212090942...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>248</th>\n",
" <td>0.992349</td>\n",
" <td>7749.0</td>\n",
" <td>(152, 152)</td>\n",
" <td>ind_mutate</td>\n",
" <td>&lt;tpot.search_spaces.pipelines.sequential.Seque...</td>\n",
" <td>4.0</td>\n",
" <td>1.740179e+09</td>\n",
" <td>1.740179e+09</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>(MinMaxScaler(), SelectFromModel(estimator=Ext...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>249</th>\n",
" <td>0.515242</td>\n",
" <td>9.0</td>\n",
" <td>(182, 182)</td>\n",
" <td>ind_mutate</td>\n",
" <td>&lt;tpot.search_spaces.pipelines.sequential.Seque...</td>\n",
" <td>4.0</td>\n",
" <td>1.740179e+09</td>\n",
" <td>1.740179e+09</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>(MaxAbsScaler(), VarianceThreshold(threshold=0...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>250 rows × 11 columns</p>\n",
"</div>"
],
"text/plain": [
" roc_auc_score complexity_scorer Parents \\\n",
"0 NaN NaN NaN \n",
"1 NaN NaN NaN \n",
"2 NaN NaN NaN \n",
"3 NaN NaN NaN \n",
"4 0.969262 241.2 NaN \n",
".. ... ... ... \n",
"245 0.986280 44.0 (184, 184) \n",
"246 0.902845 9.0 (145, 148) \n",
"247 0.992851 5301.0 (155, 133) \n",
"248 0.992349 7749.0 (152, 152) \n",
"249 0.515242 9.0 (182, 182) \n",
"\n",
" Variation_Function \\\n",
"0 NaN \n",
"1 NaN \n",
"2 NaN \n",
"3 NaN \n",
"4 NaN \n",
".. ... \n",
"245 ind_crossover \n",
"246 ind_mutate , ind_mutate , ind_crossover \n",
"247 ind_mutate , ind_mutate , ind_crossover \n",
"248 ind_mutate \n",
"249 ind_mutate \n",
"\n",
" Individual Generation \\\n",
"0 <tpot.search_spaces.pipelines.sequential.Seque... 0.0 \n",
"1 <tpot.search_spaces.pipelines.sequential.Seque... 0.0 \n",
"2 <tpot.search_spaces.pipelines.sequential.Seque... 0.0 \n",
"3 <tpot.search_spaces.pipelines.sequential.Seque... 0.0 \n",
"4 <tpot.search_spaces.pipelines.sequential.Seque... 0.0 \n",
".. ... ... \n",
"245 <tpot.search_spaces.pipelines.sequential.Seque... 4.0 \n",
"246 <tpot.search_spaces.pipelines.sequential.Seque... 4.0 \n",
"247 <tpot.search_spaces.pipelines.sequential.Seque... 4.0 \n",
"248 <tpot.search_spaces.pipelines.sequential.Seque... 4.0 \n",
"249 <tpot.search_spaces.pipelines.sequential.Seque... 4.0 \n",
"\n",
" Submitted Timestamp Completed Timestamp Eval Error Pareto_Front \\\n",
"0 1.740178e+09 1.740178e+09 INVALID NaN \n",
"1 1.740178e+09 1.740179e+09 INVALID NaN \n",
"2 1.740178e+09 1.740178e+09 INVALID NaN \n",
"3 1.740178e+09 1.740178e+09 INVALID NaN \n",
"4 1.740178e+09 1.740178e+09 None NaN \n",
".. ... ... ... ... \n",
"245 1.740179e+09 1.740179e+09 None NaN \n",
"246 1.740179e+09 1.740179e+09 None NaN \n",
"247 1.740179e+09 1.740179e+09 None NaN \n",
"248 1.740179e+09 1.740179e+09 None NaN \n",
"249 1.740179e+09 1.740179e+09 None NaN \n",
"\n",
" Instance \n",
"0 (MaxAbsScaler(), RFE(estimator=ExtraTreesClass... \n",
"1 (RobustScaler(quantile_range=(0.1386847479391,... \n",
"2 (RobustScaler(quantile_range=(0.0087917518794,... \n",
"3 (Passthrough(), Passthrough(), FeatureUnion(tr... \n",
"4 (RobustScaler(quantile_range=(0.0359502923061,... \n",
".. ... \n",
"245 (RobustScaler(quantile_range=(0.1428289713161,... \n",
"246 (MinMaxScaler(), SelectFwe(alpha=0.00184795618... \n",
"247 (MaxAbsScaler(), SelectFwe(alpha=0.00212090942... \n",
"248 (MinMaxScaler(), SelectFromModel(estimator=Ext... \n",
"249 (MaxAbsScaler(), VarianceThreshold(threshold=0... \n",
"\n",
"[250 rows x 11 columns]"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = est.evaluated_individuals\n",
"df"
]
},
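{
"cell_type": "markdown",
"metadata": {},
"source": [
"The timestamp and error columns make it easy to see how long evaluations took and how many of them failed. The cell below is a minimal sketch that assumes `df` is the DataFrame from the previous cell and relies only on the column names shown in its output."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Wall-clock evaluation time per pipeline, in seconds (both timestamps come from time.time()).\n",
"eval_seconds = df['Completed Timestamp'] - df['Submitted Timestamp']\n",
"print(eval_seconds.describe())\n",
"\n",
"# Tally the 'Eval Error' column; in the output above, rows with NaN scores are marked 'INVALID'.\n",
"print(df['Eval Error'].value_counts(dropna=False))"
]
},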
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Let's plot the performances of the different pipelines, including the Pareto front\n",
"\n",
"Plotting the performance of multiple objectives in a scatterplot is a helpful way to visualize the tradeoff between model complexity and predictive performance. The tradeoff is clearest when plotting only the Pareto front pipelines, which are the best-performing pipelines at each level of complexity. Generally, more complex models may yield higher performance but are more difficult to interpret."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 500x500 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"<Figure size 1000x500 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Pareto_Front is 1 for pipelines on the front and NaN otherwise,\n",
"# so `!= 1` selects every pipeline that is not on the front\n",
"fig, ax = plt.subplots(figsize=(5,5))\n",
"sns.scatterplot(data=df[df['Pareto_Front']!=1], x='roc_auc_score', y='complexity_scorer', label='other', ax=ax)\n",
"sns.scatterplot(data=df[df['Pareto_Front']==1], x='roc_auc_score', y='complexity_scorer', label='Pareto Front', ax=ax)\n",
"ax.title.set_text('Performance of all pipelines')\n",
"# log scale y\n",
"ax.set_yscale('log')\n",
"plt.show()\n",
"\n",
"# plot only the Pareto front pipelines\n",
"fig, ax = plt.subplots(figsize=(10,5))\n",
"sns.scatterplot(data=df[df['Pareto_Front']==1], x='roc_auc_score', y='complexity_scorer', label='Pareto Front', ax=ax)\n",
"ax.title.set_text('Performance of only the Pareto Front')\n",
"# linear y axis here; uncomment the next line for a log scale\n",
"# ax.set_yscale('log')\n",
"plt.show()"
]
},
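{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the idea of a Pareto front concrete, the cell below works through a small hand-made example. It is an illustrative sketch, not TPOT's implementation: the numbers are made up, and it assumes that a higher score is better and a lower complexity is better. A pipeline is on the front if no other pipeline is at least as good on both objectives and strictly better on at least one."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Toy (score, complexity) pairs; these numbers are made up for illustration.\n",
"scores = np.array([0.99, 0.98, 0.97, 0.95, 0.90])\n",
"complexity = np.array([500.0, 30.0, 40.0, 9.0, 9.0])\n",
"\n",
"def is_dominated(i):\n",
"    # Pipeline i is dominated if some pipeline is at least as good on both\n",
"    # objectives (higher score, lower complexity) and strictly better on one.\n",
"    at_least_as_good = (scores >= scores[i]) & (complexity <= complexity[i])\n",
"    strictly_better = (scores > scores[i]) | (complexity < complexity[i])\n",
"    return bool(np.any(at_least_as_good & strictly_better))\n",
"\n",
"# Indices 0, 1, and 3 are non-dominated; 2 is beaten by 1, and 4 is beaten by 3.\n",
"toy_pareto_front = [i for i in range(len(scores)) if not is_dominated(i)]\n",
"print(toy_pareto_front)"
]
},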
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>roc_auc_score</th>\n",
" <th>complexity_scorer</th>\n",
" <th>Parents</th>\n",
" <th>Variation_Function</th>\n",
" <th>Individual</th>\n",
" <th>Generation</th>\n",
" <th>Submitted Timestamp</th>\n",
" <th>Completed Timestamp</th>\n",
" <th>Eval Error</th>\n",
" <th>Pareto_Front</th>\n",
" <th>Instance</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>51</th>\n",
" <td>0.996818</td>\n",
" <td>582.0</td>\n",
" <td>(13, 13)</td>\n",
" <td>ind_mutate</td>\n",
" <td>&lt;tpot.search_spaces.pipelines.sequential.Seque...</td>\n",
" <td>1.0</td>\n",
" <td>1.740179e+09</td>\n",
" <td>1.740179e+09</td>\n",
" <td>None</td>\n",
" <td>1.0</td>\n",
" <td>(MinMaxScaler(), SelectPercentile(percentile=6...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>133</th>\n",
" <td>0.996239</td>\n",
" <td>31.0</td>\n",
" <td>(65, 65)</td>\n",
" <td>ind_mutate</td>\n",
" <td>&lt;tpot.search_spaces.pipelines.sequential.Seque...</td>\n",
" <td>2.0</td>\n",
" <td>1.740179e+09</td>\n",
" <td>1.740179e+09</td>\n",
" <td>None</td>\n",
" <td>1.0</td>\n",
" <td>(StandardScaler(), SelectFwe(alpha=0.002276474...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>185</th>\n",
" <td>0.995843</td>\n",
" <td>30.9</td>\n",
" <td>(133, 133)</td>\n",
" <td>ind_mutate</td>\n",
" <td>&lt;tpot.search_spaces.pipelines.sequential.Seque...</td>\n",
" <td>3.0</td>\n",
" <td>1.740179e+09</td>\n",
" <td>1.740179e+09</td>\n",
" <td>None</td>\n",
" <td>1.0</td>\n",
" <td>(StandardScaler(), SelectFwe(alpha=0.000234016...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>233</th>\n",
" <td>0.995115</td>\n",
" <td>30.7</td>\n",
" <td>(185, 185)</td>\n",
" <td>ind_mutate</td>\n",
" <td>&lt;tpot.search_spaces.pipelines.sequential.Seque...</td>\n",
" <td>4.0</td>\n",
" <td>1.740179e+09</td>\n",
" <td>1.740179e+09</td>\n",
" <td>None</td>\n",
" <td>1.0</td>\n",
" <td>(StandardScaler(), SelectFwe(alpha=0.000234016...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>85</th>\n",
" <td>0.990894</td>\n",
" <td>26.0</td>\n",
" <td>(6, 23)</td>\n",
" <td>ind_crossover</td>\n",
" <td>&lt;tpot.search_spaces.pipelines.sequential.Seque...</td>\n",
" <td>1.0</td>\n",
" <td>1.740179e+09</td>\n",
" <td>1.740179e+09</td>\n",
" <td>None</td>\n",
" <td>1.0</td>\n",
" <td>(MaxAbsScaler(), SelectFwe(alpha=0.00114277554...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>228</th>\n",
" <td>0.990081</td>\n",
" <td>19.0</td>\n",
" <td>(162, 162)</td>\n",
" <td>ind_mutate</td>\n",
" <td>&lt;tpot.search_spaces.pipelines.sequential.Seque...</td>\n",
" <td>4.0</td>\n",
" <td>1.740179e+09</td>\n",
" <td>1.740179e+09</td>\n",
" <td>None</td>\n",
" <td>1.0</td>\n",
" <td>(MaxAbsScaler(), VarianceThreshold(threshold=0...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>215</th>\n",
" <td>0.988614</td>\n",
" <td>9.0</td>\n",
" <td>(162, 162)</td>\n",
" <td>ind_mutate</td>\n",
" <td>&lt;tpot.search_spaces.pipelines.sequential.Seque...</td>\n",
" <td>4.0</td>\n",
" <td>1.740179e+09</td>\n",
" <td>1.740179e+09</td>\n",
" <td>None</td>\n",
" <td>1.0</td>\n",
" <td>(MaxAbsScaler(), VarianceThreshold(threshold=0...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>121</th>\n",
" <td>0.982524</td>\n",
" <td>7.0</td>\n",
" <td>(10, 10)</td>\n",
" <td>ind_mutate</td>\n",
" <td>&lt;tpot.search_spaces.pipelines.sequential.Seque...</td>\n",
" <td>2.0</td>\n",
" <td>1.740179e+09</td>\n",
" <td>1.740179e+09</td>\n",
" <td>None</td>\n",
" <td>1.0</td>\n",
" <td>(MaxAbsScaler(), SelectFwe(alpha=0.03019980124...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" roc_auc_score complexity_scorer Parents Variation_Function \\\n",
"51 0.996818 582.0 (13, 13) ind_mutate \n",
"133 0.996239 31.0 (65, 65) ind_mutate \n",
"185 0.995843 30.9 (133, 133) ind_mutate \n",
"233 0.995115 30.7 (185, 185) ind_mutate \n",
"85 0.990894 26.0 (6, 23) ind_crossover \n",
"228 0.990081 19.0 (162, 162) ind_mutate \n",
"215 0.988614 9.0 (162, 162) ind_mutate \n",
"121 0.982524 7.0 (10, 10) ind_mutate \n",
"\n",
" Individual Generation \\\n",
"51 <tpot.search_spaces.pipelines.sequential.Seque... 1.0 \n",
"133 <tpot.search_spaces.pipelines.sequential.Seque... 2.0 \n",
"185 <tpot.search_spaces.pipelines.sequential.Seque... 3.0 \n",
"233 <tpot.search_spaces.pipelines.sequential.Seque... 4.0 \n",
"85 <tpot.search_spaces.pipelines.sequential.Seque... 1.0 \n",
"228 <tpot.search_spaces.pipelines.sequential.Seque... 4.0 \n",
"215 <tpot.search_spaces.pipelines.sequential.Seque... 4.0 \n",
"121 <tpot.search_spaces.pipelines.sequential.Seque... 2.0 \n",
"\n",
" Submitted Timestamp Completed Timestamp Eval Error Pareto_Front \\\n",
"51 1.740179e+09 1.740179e+09 None 1.0 \n",
"133 1.740179e+09 1.740179e+09 None 1.0 \n",
"185 1.740179e+09 1.740179e+09 None 1.0 \n",
"233 1.740179e+09 1.740179e+09 None 1.0 \n",
"85 1.740179e+09 1.740179e+09 None 1.0 \n",
"228 1.740179e+09 1.740179e+09 None 1.0 \n",
"215 1.740179e+09 1.740179e+09 None 1.0 \n",
"121 1.740179e+09 1.740179e+09 None 1.0 \n",
"\n",
" Instance \n",
"51 (MinMaxScaler(), SelectPercentile(percentile=6... \n",
"133 (StandardScaler(), SelectFwe(alpha=0.002276474... \n",
"185 (StandardScaler(), SelectFwe(alpha=0.000234016... \n",
"233 (StandardScaler(), SelectFwe(alpha=0.000234016... \n",
"85 (MaxAbsScaler(), SelectFwe(alpha=0.00114277554... \n",
"228 (MaxAbsScaler(), VarianceThreshold(threshold=0... \n",
"215 (MaxAbsScaler(), VarianceThreshold(threshold=0... \n",
"121 (MaxAbsScaler(), SelectFwe(alpha=0.03019980124... "
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
 "#show only the Pareto front pipelines, sorted from highest to lowest score\n",
"sorted_pareto_front = df[df['Pareto_Front']==1].sort_values('roc_auc_score', ascending=False)\n",
"sorted_pareto_front"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
 "In some cases, you may want to select a slightly lower-performing pipeline that is significantly less complex."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style>#sk-container-id-2 {\n",
" /* Definition of color scheme common for light and dark mode */\n",
" --sklearn-color-text: black;\n",
" --sklearn-color-line: gray;\n",
" /* Definition of color scheme for unfitted estimators */\n",
" --sklearn-color-unfitted-level-0: #fff5e6;\n",
" --sklearn-color-unfitted-level-1: #f6e4d2;\n",
" --sklearn-color-unfitted-level-2: #ffe0b3;\n",
" --sklearn-color-unfitted-level-3: chocolate;\n",
" /* Definition of color scheme for fitted estimators */\n",
" --sklearn-color-fitted-level-0: #f0f8ff;\n",
" --sklearn-color-fitted-level-1: #d4ebff;\n",
" --sklearn-color-fitted-level-2: #b3dbfd;\n",
" --sklearn-color-fitted-level-3: cornflowerblue;\n",
"\n",
" /* Specific color for light theme */\n",
" --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));\n",
" --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, white)));\n",
" --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));\n",
" --sklearn-color-icon: #696969;\n",
"\n",
" @media (prefers-color-scheme: dark) {\n",
" /* Redefinition of color scheme for dark theme */\n",
" --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n",
" --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, #111)));\n",
" --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n",
" --sklearn-color-icon: #878787;\n",
" }\n",
"}\n",
"\n",
"#sk-container-id-2 {\n",
" color: var(--sklearn-color-text);\n",
"}\n",
"\n",
"#sk-container-id-2 pre {\n",
" padding: 0;\n",
"}\n",
"\n",
"#sk-container-id-2 input.sk-hidden--visually {\n",
" border: 0;\n",
" clip: rect(1px 1px 1px 1px);\n",
" clip: rect(1px, 1px, 1px, 1px);\n",
" height: 1px;\n",
" margin: -1px;\n",
" overflow: hidden;\n",
" padding: 0;\n",
" position: absolute;\n",
" width: 1px;\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-dashed-wrapped {\n",
" border: 1px dashed var(--sklearn-color-line);\n",
" margin: 0 0.4em 0.5em 0.4em;\n",
" box-sizing: border-box;\n",
" padding-bottom: 0.4em;\n",
" background-color: var(--sklearn-color-background);\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-container {\n",
" /* jupyter's `normalize.less` sets `[hidden] { display: none; }`\n",
" but bootstrap.min.css set `[hidden] { display: none !important; }`\n",
" so we also need the `!important` here to be able to override the\n",
" default hidden behavior on the sphinx rendered scikit-learn.org.\n",
" See: https://github.com/scikit-learn/scikit-learn/issues/21755 */\n",
" display: inline-block !important;\n",
" position: relative;\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-text-repr-fallback {\n",
" display: none;\n",
"}\n",
"\n",
"div.sk-parallel-item,\n",
"div.sk-serial,\n",
"div.sk-item {\n",
" /* draw centered vertical line to link estimators */\n",
" background-image: linear-gradient(var(--sklearn-color-text-on-default-background), var(--sklearn-color-text-on-default-background));\n",
" background-size: 2px 100%;\n",
" background-repeat: no-repeat;\n",
" background-position: center center;\n",
"}\n",
"\n",
"/* Parallel-specific style estimator block */\n",
"\n",
"#sk-container-id-2 div.sk-parallel-item::after {\n",
" content: \"\";\n",
" width: 100%;\n",
" border-bottom: 2px solid var(--sklearn-color-text-on-default-background);\n",
" flex-grow: 1;\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-parallel {\n",
" display: flex;\n",
" align-items: stretch;\n",
" justify-content: center;\n",
" background-color: var(--sklearn-color-background);\n",
" position: relative;\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-parallel-item {\n",
" display: flex;\n",
" flex-direction: column;\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-parallel-item:first-child::after {\n",
" align-self: flex-end;\n",
" width: 50%;\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-parallel-item:last-child::after {\n",
" align-self: flex-start;\n",
" width: 50%;\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-parallel-item:only-child::after {\n",
" width: 0;\n",
"}\n",
"\n",
"/* Serial-specific style estimator block */\n",
"\n",
"#sk-container-id-2 div.sk-serial {\n",
" display: flex;\n",
" flex-direction: column;\n",
" align-items: center;\n",
" background-color: var(--sklearn-color-background);\n",
" padding-right: 1em;\n",
" padding-left: 1em;\n",
"}\n",
"\n",
"\n",
"/* Toggleable style: style used for estimator/Pipeline/ColumnTransformer box that is\n",
"clickable and can be expanded/collapsed.\n",
"- Pipeline and ColumnTransformer use this feature and define the default style\n",
"- Estimators will overwrite some part of the style using the `sk-estimator` class\n",
"*/\n",
"\n",
"/* Pipeline and ColumnTransformer style (default) */\n",
"\n",
"#sk-container-id-2 div.sk-toggleable {\n",
" /* Default theme specific background. It is overwritten whether we have a\n",
" specific estimator or a Pipeline/ColumnTransformer */\n",
" background-color: var(--sklearn-color-background);\n",
"}\n",
"\n",
"/* Toggleable label */\n",
"#sk-container-id-2 label.sk-toggleable__label {\n",
" cursor: pointer;\n",
" display: block;\n",
" width: 100%;\n",
" margin-bottom: 0;\n",
" padding: 0.5em;\n",
" box-sizing: border-box;\n",
" text-align: center;\n",
"}\n",
"\n",
"#sk-container-id-2 label.sk-toggleable__label-arrow:before {\n",
" /* Arrow on the left of the label */\n",
" content: \"▸\";\n",
" float: left;\n",
" margin-right: 0.25em;\n",
" color: var(--sklearn-color-icon);\n",
"}\n",
"\n",
"#sk-container-id-2 label.sk-toggleable__label-arrow:hover:before {\n",
" color: var(--sklearn-color-text);\n",
"}\n",
"\n",
"/* Toggleable content - dropdown */\n",
"\n",
"#sk-container-id-2 div.sk-toggleable__content {\n",
" max-height: 0;\n",
" max-width: 0;\n",
" overflow: hidden;\n",
" text-align: left;\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-0);\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-toggleable__content.fitted {\n",
" /* fitted */\n",
" background-color: var(--sklearn-color-fitted-level-0);\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-toggleable__content pre {\n",
" margin: 0.2em;\n",
" border-radius: 0.25em;\n",
" color: var(--sklearn-color-text);\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-0);\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-toggleable__content.fitted pre {\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-fitted-level-0);\n",
"}\n",
"\n",
"#sk-container-id-2 input.sk-toggleable__control:checked~div.sk-toggleable__content {\n",
" /* Expand drop-down */\n",
" max-height: 200px;\n",
" max-width: 100%;\n",
" overflow: auto;\n",
"}\n",
"\n",
"#sk-container-id-2 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {\n",
" content: \"▾\";\n",
"}\n",
"\n",
"/* Pipeline/ColumnTransformer-specific style */\n",
"\n",
"#sk-container-id-2 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
" color: var(--sklearn-color-text);\n",
" background-color: var(--sklearn-color-unfitted-level-2);\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-label.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
" background-color: var(--sklearn-color-fitted-level-2);\n",
"}\n",
"\n",
"/* Estimator-specific style */\n",
"\n",
"/* Colorize estimator box */\n",
"#sk-container-id-2 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-2);\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-estimator.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
" /* fitted */\n",
" background-color: var(--sklearn-color-fitted-level-2);\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-label label.sk-toggleable__label,\n",
"#sk-container-id-2 div.sk-label label {\n",
" /* The background is the default theme color */\n",
" color: var(--sklearn-color-text-on-default-background);\n",
"}\n",
"\n",
"/* On hover, darken the color of the background */\n",
"#sk-container-id-2 div.sk-label:hover label.sk-toggleable__label {\n",
" color: var(--sklearn-color-text);\n",
" background-color: var(--sklearn-color-unfitted-level-2);\n",
"}\n",
"\n",
"/* Label box, darken color on hover, fitted */\n",
"#sk-container-id-2 div.sk-label.fitted:hover label.sk-toggleable__label.fitted {\n",
" color: var(--sklearn-color-text);\n",
" background-color: var(--sklearn-color-fitted-level-2);\n",
"}\n",
"\n",
"/* Estimator label */\n",
"\n",
"#sk-container-id-2 div.sk-label label {\n",
" font-family: monospace;\n",
" font-weight: bold;\n",
" display: inline-block;\n",
" line-height: 1.2em;\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-label-container {\n",
" text-align: center;\n",
"}\n",
"\n",
"/* Estimator-specific */\n",
"#sk-container-id-2 div.sk-estimator {\n",
" font-family: monospace;\n",
" border: 1px dotted var(--sklearn-color-border-box);\n",
" border-radius: 0.25em;\n",
" box-sizing: border-box;\n",
" margin-bottom: 0.5em;\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-0);\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-estimator.fitted {\n",
" /* fitted */\n",
" background-color: var(--sklearn-color-fitted-level-0);\n",
"}\n",
"\n",
"/* on hover */\n",
"#sk-container-id-2 div.sk-estimator:hover {\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-2);\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-estimator.fitted:hover {\n",
" /* fitted */\n",
" background-color: var(--sklearn-color-fitted-level-2);\n",
"}\n",
"\n",
"/* Specification for estimator info (e.g. \"i\" and \"?\") */\n",
"\n",
"/* Common style for \"i\" and \"?\" */\n",
"\n",
".sk-estimator-doc-link,\n",
"a:link.sk-estimator-doc-link,\n",
"a:visited.sk-estimator-doc-link {\n",
" float: right;\n",
" font-size: smaller;\n",
" line-height: 1em;\n",
" font-family: monospace;\n",
" background-color: var(--sklearn-color-background);\n",
" border-radius: 1em;\n",
" height: 1em;\n",
" width: 1em;\n",
" text-decoration: none !important;\n",
" margin-left: 1ex;\n",
" /* unfitted */\n",
" border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n",
" color: var(--sklearn-color-unfitted-level-1);\n",
"}\n",
"\n",
".sk-estimator-doc-link.fitted,\n",
"a:link.sk-estimator-doc-link.fitted,\n",
"a:visited.sk-estimator-doc-link.fitted {\n",
" /* fitted */\n",
" border: var(--sklearn-color-fitted-level-1) 1pt solid;\n",
" color: var(--sklearn-color-fitted-level-1);\n",
"}\n",
"\n",
"/* On hover */\n",
"div.sk-estimator:hover .sk-estimator-doc-link:hover,\n",
".sk-estimator-doc-link:hover,\n",
"div.sk-label-container:hover .sk-estimator-doc-link:hover,\n",
".sk-estimator-doc-link:hover {\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-3);\n",
" color: var(--sklearn-color-background);\n",
" text-decoration: none;\n",
"}\n",
"\n",
"div.sk-estimator.fitted:hover .sk-estimator-doc-link.fitted:hover,\n",
".sk-estimator-doc-link.fitted:hover,\n",
"div.sk-label-container:hover .sk-estimator-doc-link.fitted:hover,\n",
".sk-estimator-doc-link.fitted:hover {\n",
" /* fitted */\n",
" background-color: var(--sklearn-color-fitted-level-3);\n",
" color: var(--sklearn-color-background);\n",
" text-decoration: none;\n",
"}\n",
"\n",
"/* Span, style for the box shown on hovering the info icon */\n",
".sk-estimator-doc-link span {\n",
" display: none;\n",
" z-index: 9999;\n",
" position: relative;\n",
" font-weight: normal;\n",
" right: .2ex;\n",
" padding: .5ex;\n",
" margin: .5ex;\n",
" width: min-content;\n",
" min-width: 20ex;\n",
" max-width: 50ex;\n",
" color: var(--sklearn-color-text);\n",
" box-shadow: 2pt 2pt 4pt #999;\n",
" /* unfitted */\n",
" background: var(--sklearn-color-unfitted-level-0);\n",
" border: .5pt solid var(--sklearn-color-unfitted-level-3);\n",
"}\n",
"\n",
".sk-estimator-doc-link.fitted span {\n",
" /* fitted */\n",
" background: var(--sklearn-color-fitted-level-0);\n",
" border: var(--sklearn-color-fitted-level-3);\n",
"}\n",
"\n",
".sk-estimator-doc-link:hover span {\n",
" display: block;\n",
"}\n",
"\n",
"/* \"?\"-specific style due to the `<a>` HTML tag */\n",
"\n",
"#sk-container-id-2 a.estimator_doc_link {\n",
" float: right;\n",
" font-size: 1rem;\n",
" line-height: 1em;\n",
" font-family: monospace;\n",
" background-color: var(--sklearn-color-background);\n",
" border-radius: 1rem;\n",
" height: 1rem;\n",
" width: 1rem;\n",
" text-decoration: none;\n",
" /* unfitted */\n",
" color: var(--sklearn-color-unfitted-level-1);\n",
" border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n",
"}\n",
"\n",
"#sk-container-id-2 a.estimator_doc_link.fitted {\n",
" /* fitted */\n",
" border: var(--sklearn-color-fitted-level-1) 1pt solid;\n",
" color: var(--sklearn-color-fitted-level-1);\n",
"}\n",
"\n",
"/* On hover */\n",
"#sk-container-id-2 a.estimator_doc_link:hover {\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-3);\n",
" color: var(--sklearn-color-background);\n",
" text-decoration: none;\n",
"}\n",
"\n",
"#sk-container-id-2 a.estimator_doc_link.fitted:hover {\n",
" /* fitted */\n",
" background-color: var(--sklearn-color-fitted-level-3);\n",
"}\n",
"</style><div id=\"sk-container-id-2\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>Pipeline(steps=[(&#x27;maxabsscaler&#x27;, MaxAbsScaler()),\n",
" (&#x27;selectfwe&#x27;, SelectFwe(alpha=0.0301998012478)),\n",
" (&#x27;featureunion-1&#x27;,\n",
" FeatureUnion(transformer_list=[(&#x27;skiptransformer&#x27;,\n",
" SkipTransformer()),\n",
" (&#x27;passthrough&#x27;,\n",
" Passthrough())])),\n",
" (&#x27;featureunion-2&#x27;,\n",
" FeatureUnion(transformer_list=[(&#x27;skiptransformer&#x27;,\n",
" SkipTransformer()),\n",
" (&#x27;passthrough&#x27;,\n",
" Passthrough())])),\n",
" (&#x27;kneighborsclassifier&#x27;,\n",
" KNeighborsClassifier(n_jobs=1, n_neighbors=2))])</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-label-container\"><div class=\"sk-label sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-11\" type=\"checkbox\" ><label for=\"sk-estimator-id-11\" class=\"sk-toggleable__label sk-toggleable__label-arrow \">&nbsp;&nbsp;Pipeline<a class=\"sk-estimator-doc-link \" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.pipeline.Pipeline.html\">?<span>Documentation for Pipeline</span></a><span class=\"sk-estimator-doc-link \">i<span>Not fitted</span></span></label><div class=\"sk-toggleable__content \"><pre>Pipeline(steps=[(&#x27;maxabsscaler&#x27;, MaxAbsScaler()),\n",
" (&#x27;selectfwe&#x27;, SelectFwe(alpha=0.0301998012478)),\n",
" (&#x27;featureunion-1&#x27;,\n",
" FeatureUnion(transformer_list=[(&#x27;skiptransformer&#x27;,\n",
" SkipTransformer()),\n",
" (&#x27;passthrough&#x27;,\n",
" Passthrough())])),\n",
" (&#x27;featureunion-2&#x27;,\n",
" FeatureUnion(transformer_list=[(&#x27;skiptransformer&#x27;,\n",
" SkipTransformer()),\n",
" (&#x27;passthrough&#x27;,\n",
" Passthrough())])),\n",
" (&#x27;kneighborsclassifier&#x27;,\n",
" KNeighborsClassifier(n_jobs=1, n_neighbors=2))])</pre></div> </div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-12\" type=\"checkbox\" ><label for=\"sk-estimator-id-12\" class=\"sk-toggleable__label sk-toggleable__label-arrow \">&nbsp;MaxAbsScaler<a class=\"sk-estimator-doc-link \" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.preprocessing.MaxAbsScaler.html\">?<span>Documentation for MaxAbsScaler</span></a></label><div class=\"sk-toggleable__content \"><pre>MaxAbsScaler()</pre></div> </div></div><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-13\" type=\"checkbox\" ><label for=\"sk-estimator-id-13\" class=\"sk-toggleable__label sk-toggleable__label-arrow \">&nbsp;SelectFwe<a class=\"sk-estimator-doc-link \" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.feature_selection.SelectFwe.html\">?<span>Documentation for SelectFwe</span></a></label><div class=\"sk-toggleable__content \"><pre>SelectFwe(alpha=0.0301998012478)</pre></div> </div></div><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-label-container\"><div class=\"sk-label sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-14\" type=\"checkbox\" ><label for=\"sk-estimator-id-14\" class=\"sk-toggleable__label sk-toggleable__label-arrow \">&nbsp;featureunion-1: FeatureUnion<a class=\"sk-estimator-doc-link \" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.pipeline.FeatureUnion.html\">?<span>Documentation for featureunion-1: FeatureUnion</span></a></label><div class=\"sk-toggleable__content \"><pre>FeatureUnion(transformer_list=[(&#x27;skiptransformer&#x27;, SkipTransformer()),\n",
" (&#x27;passthrough&#x27;, Passthrough())])</pre></div> </div></div><div class=\"sk-parallel\"><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label sk-toggleable\"><label>skiptransformer</label></div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-15\" type=\"checkbox\" ><label for=\"sk-estimator-id-15\" class=\"sk-toggleable__label sk-toggleable__label-arrow \">SkipTransformer</label><div class=\"sk-toggleable__content \"><pre>SkipTransformer()</pre></div> </div></div></div></div></div><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label sk-toggleable\"><label>passthrough</label></div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-16\" type=\"checkbox\" ><label for=\"sk-estimator-id-16\" class=\"sk-toggleable__label sk-toggleable__label-arrow \">Passthrough</label><div class=\"sk-toggleable__content \"><pre>Passthrough()</pre></div> </div></div></div></div></div></div></div><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-label-container\"><div class=\"sk-label sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-17\" type=\"checkbox\" ><label for=\"sk-estimator-id-17\" class=\"sk-toggleable__label sk-toggleable__label-arrow \">&nbsp;featureunion-2: FeatureUnion<a class=\"sk-estimator-doc-link \" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.pipeline.FeatureUnion.html\">?<span>Documentation for featureunion-2: FeatureUnion</span></a></label><div class=\"sk-toggleable__content \"><pre>FeatureUnion(transformer_list=[(&#x27;skiptransformer&#x27;, SkipTransformer()),\n",
" (&#x27;passthrough&#x27;, Passthrough())])</pre></div> </div></div><div class=\"sk-parallel\"><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label sk-toggleable\"><label>skiptransformer</label></div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-18\" type=\"checkbox\" ><label for=\"sk-estimator-id-18\" class=\"sk-toggleable__label sk-toggleable__label-arrow \">SkipTransformer</label><div class=\"sk-toggleable__content \"><pre>SkipTransformer()</pre></div> </div></div></div></div></div><div class=\"sk-parallel-item\"><div class=\"sk-item\"><div class=\"sk-label-container\"><div class=\"sk-label sk-toggleable\"><label>passthrough</label></div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-19\" type=\"checkbox\" ><label for=\"sk-estimator-id-19\" class=\"sk-toggleable__label sk-toggleable__label-arrow \">Passthrough</label><div class=\"sk-toggleable__content \"><pre>Passthrough()</pre></div> </div></div></div></div></div></div></div><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-20\" type=\"checkbox\" ><label for=\"sk-estimator-id-20\" class=\"sk-toggleable__label sk-toggleable__label-arrow \">&nbsp;KNeighborsClassifier<a class=\"sk-estimator-doc-link \" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.neighbors.KNeighborsClassifier.html\">?<span>Documentation for KNeighborsClassifier</span></a></label><div class=\"sk-toggleable__content \"><pre>KNeighborsClassifier(n_jobs=1, n_neighbors=2)</pre></div> </div></div></div></div></div></div>"
],
"text/plain": [
"Pipeline(steps=[('maxabsscaler', MaxAbsScaler()),\n",
" ('selectfwe', SelectFwe(alpha=0.0301998012478)),\n",
" ('featureunion-1',\n",
" FeatureUnion(transformer_list=[('skiptransformer',\n",
" SkipTransformer()),\n",
" ('passthrough',\n",
" Passthrough())])),\n",
" ('featureunion-2',\n",
" FeatureUnion(transformer_list=[('skiptransformer',\n",
" SkipTransformer()),\n",
" ('passthrough',\n",
" Passthrough())])),\n",
" ('kneighborsclassifier',\n",
" KNeighborsClassifier(n_jobs=1, n_neighbors=2))])"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
 "#access the lowest-complexity pipeline on the Pareto front (the last row, since the front is sorted by descending score)\n",
"\n",
"best_pipeline_lowest_complexity = sorted_pareto_front.iloc[-1]['Instance']\n",
"best_pipeline_lowest_complexity"
]
},
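Taking the last row always picks the simplest pipeline, however much score it gives up. One way to make the trade-off explicit is to choose the least complex Pareto-front pipeline whose score stays within a tolerance of the best score. The sketch below is illustrative only: `simplest_within_tolerance` and the `tolerance` value are assumptions and not part of TPOT, and the `(index, score, complexity)` tuples are copied from the table shown above.

```python
# Hypothetical helper, not part of TPOT: pick the least complex Pareto-front
# entry whose score is within `tolerance` of the best score.


def simplest_within_tolerance(front, tolerance):
    """front: iterable of (index, score, complexity); higher score is better."""
    front = list(front)
    best_score = max(score for _, score, _ in front)
    # Keep only entries that give up at most `tolerance` in score.
    candidates = [row for row in front if row[1] >= best_score - tolerance]
    # Among those, prefer the lowest complexity.
    return min(candidates, key=lambda row: row[2])


# (index, roc_auc_score, complexity_scorer) rows from the Pareto front above.
pareto_front = [
    (51, 0.996818, 582.0),
    (133, 0.996239, 31.0),
    (185, 0.995843, 30.9),
    (233, 0.995115, 30.7),
    (85, 0.990894, 26.0),
    (228, 0.990081, 19.0),
    (215, 0.988614, 9.0),
    (121, 0.982524, 7.0),
]

chosen = simplest_within_tolerance(pareto_front, tolerance=0.001)
print(chosen)
```

With a tolerance of 0.001 this selects row 185: it gives up about 0.001 of ROC AUC relative to row 51 while reducing the complexity score from 582.0 to 30.9. The chosen index could then be used to look up the matching `Instance` entry in the data frame.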
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plot performance over time + Continuing a run from where it left off\n",
"\n",
"Plotting performance over time is a good way to assess whether or not the TPOT model has converged. If performance asymptotes over time, there may not be much more performance to be gained by running for a longer period. If the plot looks like it is still improving, it may be worth running TPOT for a longer duration. \n",
"\n",
"In this case, we can see that performance is near optimal and has slowed, so more time is likely unnecessary."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1000x500 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
 "#get rows where roc_auc_score is not NaN\n",
"scores_and_times = df[df['roc_auc_score'].notna()][['roc_auc_score', 'Completed Timestamp']].sort_values('Completed Timestamp', ascending=True).to_numpy()\n",
"\n",
"#get best score at a given time\n",
"best_scores = np.maximum.accumulate(scores_and_times[:,0])\n",
"times = scores_and_times[:,1]\n",
"times = times - df['Submitted Timestamp'].min()\n",
"\n",
"fig, ax = plt.subplots(figsize=(10,5))\n",
"ax.plot(times, best_scores)\n",
"ax.set_xlabel('Time (seconds)')\n",
"ax.set_ylabel('Best Score')\n",
"plt.show()\n"
]
},
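Judging convergence by eye from the plot can also be backed by a simple numeric check. The sketch below uses only the standard library: `running_best` mirrors the cumulative maximum computed with `np.maximum.accumulate` above, and `has_plateaued` reports whether the best score gained less than a threshold over the most recent evaluations. Both function names, the `window` and `min_gain` parameters, and the sample scores are made up for illustration and are not part of TPOT.

```python
from itertools import accumulate


def running_best(scores):
    """Best score seen so far at each evaluation (higher is better)."""
    return list(accumulate(scores, max))


def has_plateaued(best_scores, window, min_gain):
    """True if the best score improved by less than `min_gain`
    over the last `window` evaluations."""
    if len(best_scores) <= window:
        return False  # not enough history to judge
    return best_scores[-1] - best_scores[-1 - window] < min_gain


# Made-up scores, ordered by completion time.
scores = [0.90, 0.95, 0.93, 0.99, 0.98, 0.99, 0.985, 0.99]
best = running_best(scores)

print(best)
print(has_plateaued(best, window=4, min_gain=0.001))
```

A suitable `window` and `min_gain` depend on how noisy the scores are and how expensive each evaluation is, so treat any threshold as a starting point and not as a rule.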
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Checkpointing\n",
"\n",
"There are two ways to resume TPOT. \n",
 "* If the `warm_start` parameter is set to True, subsequent calls to `fit` will continue training from where the previous call left off (the conventional scikit-learn default is to retrain from scratch on each call to `fit`). \n",
"* If `periodic_checkpoint_folder` is set, TPOT will periodically save its current state to disk. If TPOT is interrupted (job canceled, PC shut off, crashes), you can resume training from where it left off. The checkpoint folder stores a data frame of all evaluated pipelines. This data frame can be loaded and inspected to help diagnose problems when debugging.\n",
"\n",
"\n",
"**Note: TPOT does not clean up the checkpoint files. If the `periodic_checkpoint_folder` parameter is set, training from the last saved point will always continue, even if the input data has changed. A common issue is forgetting to change this folder between experiments and TPOT continuing training from pipelines optimized for another dataset. If you intend to start a run from scratch, you must either remove the parameter, supply an empty folder, or delete the original checkpoint folder.**"
]
},
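Because stale checkpoints are easy to pick up by accident, it can help to make the choice between resuming and starting fresh explicit before the folder is handed to TPOT. The following is a minimal sketch using only the standard library. `prepare_checkpoint_folder` and its `resume` flag are hypothetical names introduced here and not part of TPOT, and the file written in the demo is a placeholder that does not reflect TPOT's real checkpoint format.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_checkpoint_folder(folder, resume):
    """Return a usable checkpoint folder path.

    Hypothetical helper: when `resume` is False, anything already in the
    folder is deleted so a new run cannot continue from old checkpoints.
    """
    path = Path(folder)
    if not resume and path.exists():
        shutil.rmtree(path)  # drop checkpoints left over from an earlier run
    path.mkdir(parents=True, exist_ok=True)
    return str(path)


# Demo in a temporary directory with a placeholder "old checkpoint" file.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "tpot_checkpoints"
    folder.mkdir()
    (folder / "old_state.txt").write_text("state from a previous experiment")

    resumed = prepare_checkpoint_folder(folder, resume=True)
    kept = sorted(p.name for p in Path(resumed).iterdir())

    fresh = prepare_checkpoint_folder(folder, resume=False)
    remaining = sorted(p.name for p in Path(fresh).iterdir())

print(kept)       # files preserved when resuming
print(remaining)  # files left after requesting a fresh start
```

The returned path could then be passed as the `periodic_checkpoint_folder` parameter. Using a separate folder per experiment is an equally valid way to avoid the problem.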
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Common parameters\n",
"\n",
"Here is a subset of the most common parameters to customize and what they do. See the docs for `TPOTEstimator` or `TPOTEstimatorSteadyState` for full documentation of all parameters. \n",
"\n",
"| Parameter | Type | Description |\n",
"|--------------------------------|-----------------------|-----------------------------------------------------------------------------|\n",
"| scorers | list, scorer | List of scorers for cross-validation; see |\n",
"| scorers_weights | list | Weights applied to scorers during optimization |\n",
"| classification | bool | Problem type: True for classification, False for regression |\n",
"| cv | int, cross-validator | Cross-validation strategy: int for folds or custom cross-validator |\n",
"| max_depth | int | Maximum pipeline depth |\n",
"| other_objective_functions | list | Additional objective functions; default: [average_path_length_objective] |\n",
"| other_objective_functions_weights | list | Weights for additional objective functions; default: [-1] |\n",
"| objective_function_names | list | Names for objective functions; default: None (uses function names) |\n",
"| bigger_is_better | bool | Optimization direction: True for maximize, False for minimize |\n",
"| generations | int | Number of optimization generations; default: 50 |\n",
"| max_time_mins | float | Maximum optimization time (minutes); default: infinite |\n",
"| max_eval_time_mins | float | Maximum evaluation time per individual (minutes); default: 300 |\n",
"| n_jobs | int | Number of parallel processes; default: 1 |\n",
"| memory_limit | str | Memory limit per job; default: \"4GB\" |\n",
"| verbose | int | Optimization process verbosity: 0 (none), 1 (progress), 3 (best individual), 4 (warnings), 5+ (full warnings) |\n",
"| memory | str, memory object | If supplied, pipeline will cache each transformer after calling fit with joblib.Memory. |\n",
"| periodic_checkpoint_folder | str | Folder to save the population to periodically. If None, no periodic saving will be done. If provided, training will resume from this checkpoint.|\n",
" \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Preventing Overfitting\n",
"\n",
"On small datasets, TPOT can overfit the cross-validation score itself, which can lead to lower-than-expected performance on held-out data. TPOT always returns the model with the highest CV score as its final `fitted_pipeline_`. However, if that model was merely overfit to the CV score, it may perform worse on new data than other models on the Pareto front.\n",
"  * Using a secondary complexity objective and evaluating the entire Pareto front may be beneficial. In some cases, a pipeline with a lower CV score and lower complexity performs better on held-out sets. Candidates can be compared on a held-out validation set or, when data is very limited, by re-running cross-validation with a different seed for splitting the folds.\n",
"    * TPOT can do this automatically. The `validation_strategy` parameter can be set to re-test the final Pareto front on either a held-out validation set (percent of data set by `validation_fraction`) or a different seed for splitting the CV folds. These can be selected by setting `validation_strategy` to \"split\" or \"reshuffled\", respectively.\n",
"  * Increasing the number of folds of cross-validation can mitigate this. \n",
"  * Nested cross-validation can also be used to estimate the performance of the TPOT optimization algorithm itself.\n",
"  * Removing more complex methods from the search space can reduce the chances of overfitting"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tips and tricks for speeding up TPOT\n",
"\n",
"TPOT can be a computationally demanding algorithm as it fits thousands of complex machine learning pipelines on potentially large datasets. There are several strategies available for improving run time by reducing the compute needed. \n",
"\n",
"There are three main strategies implemented in TPOT to reduce redundant work and/or prevent wasting compute on poorly performing pipelines.\n",
"\n",
"1. TPOT pipelines will often have the exact same components doing the exact same computation (e.g. the first steps of the pipeline remain the same and only the parameters of the final classifier changed.) In these cases, that The first strategy is to simply cache these repeat computations so that they only happen once. More info in the next subsection.\n",
"2. Successive Halving. This idea was first tested with TPOT by Parmentier et al. in [\"TPOT-SH: a Faster Optimization Algorithm to Solve the AutoML Problem on Large Datasets\"](https://www.researchgate.net/profile/Laurent-Parmentier-4/publication/339263193_TPOT-SH_A_Faster_Optimization_Algorithm_to_Solve_the_AutoML_Problem_on_Large_Datasets/links/5e5fd8b8a6fdccbeba1c6a56/TPOT-SH-A-Faster-Optimization-Algorithm-to-Solve-the-AutoML-Problem-on-Large-Datasets.pdf). The algorithm operates in two stages. Initially, it trains early generations using a small data subset and a large population size. Later generations then evaluate a smaller set of promising pipelines on larger, or even full, data portions. This approach rapidly identifies top-performing pipeline configurations through initial rough evaluations, followed by more comprehensive assessments. More information on this strategy in Tutorial 8.\n",
"3. Most often, we will be evaluating pipelines using cross validation. However, we can often tell within the first few folds whether or not the pipeline is going have a reasonable change of outperforming the previous best pipelines. For example, if the best score so far is .92 AUROC and the average score of the first five folds of our current pipeline is only around .61, we can be reasonably confident that the next five folds are unlikely to this pipeline ahead of the others. We can save a significant amount of compute by not computing the rest of the folds. There are two strategies that TPOT can use to accomplish this (More information on these strategies in Tutorial 8).\n",
" 1. Threshold Pruning: Pipelines must achieve a score above a predefined percentile threshold (based on previous pipeline scores) to proceed in each cross-validation (CV) fold.\n",
" 2. Selection Pruning: Within each population, only the top N% of pipelines (ranked by performance in the previous CV fold) are selected to evaluate in the next fold.\"\n",
" \n",
"\n",
"## Pipeline caching in TPOT (joblib.Memory)\n",
"\n",
"With the memory parameter, pipelines can cache the results of each transformer after fitting them. This feature is used to avoid repeated computation by transformers within a pipeline if the parameters and input data are identical to another fitted pipeline during the optimization process. TPOT allows users to specify a custom directory path or joblib.Memory in case they want to re-use the memory cache in future TPOT runs (or a warm_start run).\n",
"\n",
"There are three methods for enabling memory caching in TPOT:"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"from tpot import TPOTClassifier\n",
"from tempfile import mkdtemp\n",
"from joblib import Memory\n",
"from shutil import rmtree\n",
"\n",
"# Method 1, auto mode: TPOT uses memory caching with a temporary directory and cleans it up upon shutdown\n",
"est = TPOTClassifier(memory='auto')\n",
"\n",
"# Method 2, with a custom directory for memory caching\n",
"est = TPOTClassifier(memory='/to/your/path')\n",
"\n",
"# Method 3, with a Memory object\n",
"memory = Memory(location='./to/your/path', verbose=0)\n",
"est = TPOTClassifier(memory=memory)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note: TPOT does NOT clean up memory caches if users set a custom directory path or Memory object. We recommend that you clean up the memory caches when you don't need it anymore.**"
]
},
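{
"cell_type": "markdown",
"metadata": {},
"source": [
"The threshold pruning idea described under \"Tips and tricks for speeding up TPOT\" can be illustrated with a small standard-library sketch. It is a simplification for intuition, not TPOT's implementation: the function names are hypothetical, the per-fold scores are supplied directly instead of fitting a pipeline, and the running mean is compared against a nearest-rank percentile of earlier pipelines' scores.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"from statistics import mean\n",
"\n",
"def percentile_nearest_rank(values, percentile):\n",
"    # Nearest-rank percentile of a non-empty list (an illustrative choice of definition).\n",
"    ordered = sorted(values)\n",
"    rank = max(1, math.ceil(percentile / 100 * len(ordered)))\n",
"    return ordered[rank - 1]\n",
"\n",
"def cv_with_threshold_pruning(fold_scores, previous_scores, percentile=50):\n",
"    # fold_scores stands in for fitting and scoring the pipeline on each CV fold in turn.\n",
"    threshold = percentile_nearest_rank(previous_scores, percentile)\n",
"    seen = []\n",
"    for score in fold_scores:\n",
"        seen.append(score)\n",
"        # Stop as soon as the running mean falls below the threshold.\n",
"        if mean(seen) < threshold:\n",
"            return None, len(seen)  # pruned: no final score, number of folds actually used\n",
"    return mean(seen), len(seen)\n",
"\n",
"previous = [0.88, 0.89, 0.90, 0.91, 0.92]\n",
"print(cv_with_threshold_pruning([0.61, 0.60, 0.63], previous))  # pruned after the first fold\n",
"print(cv_with_threshold_pruning([0.93] * 10, previous))  # all ten folds are evaluated\n"
]
},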
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## \n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Advanced Parallelization (HPC and multi-node training)\n",
"\n",
"See Tutorial 7 for more details on parallelization with Dask, including information of using multiple nodes."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# FAQ and Debugging\n",
"\n",
"If you are experiencing issues with TPOT, here are some common issues and how to address them.\n",
"\n",
"* Performance is lower than expected. What can I do?\n",
" * TPOT may need to run for longer; increase `max_time_mins`, `early_stop`, or `generations`.\n",
" * Individual pipelines may need more time to complete fitting; increase `max_eval_time_mins`.\n",
" * The configuration may not include the optimal model types or hyperparameter ranges; explore other included templates or customize your own search space (see Tutorial 2).\n",
" * Check that `periodic_checkpoint_folder` is set correctly. A common issue is forgetting to change this folder between experiments and TPOT continuing training from pipelines optimized for another dataset.\n",
"* TPOT is too slow! It is running forever and never terminating.\n",
" * Check that at least one of the three termination conditions is set to a reasonable level. These are `max_time_mins`, `early_stop`, and `generations`. Additionally, check that `max_eval_time_mins` gives enough time for most models to train without being overly long. (Some estimators may take an unreasonably long time to fit; this parameter is intended to prevent them from slowing everything to a halt. In my experience, SVC and SVR tend to be the culprits, so removing them from the search space may also improve run time.)\n",
" * Set the `memory` parameter so that TPOT can avoid repeated work when using either scikit-learn pipelines or TPOT GraphPipelines.\n",
" * Increase `n_jobs` to use more processes/CPU power. See Tutorial 7 for advanced Dask usage, including parallelizing across multiple nodes on an HPC.\n",
" * Use feature selection, either the built-in configuration of sklearn methods (see Tutorial 2) or genetic feature selection (see Tutorials 3 and 5 for two different strategies).\n",
" * Use successive halving to reduce computational load (see Tutorial 8).\n",
"* Many pipelines in the evaluated_individuals data frame have crashed or turned up invalid!\n",
" * This is normal and is expected behavior for TPOT. In some cases, TPOT may attempt an invalid hyperparameter combination, resulting in the pipeline not working. Other times, the pipeline configuration itself may be invalid. For example, a selector may not select any features due to its hyperparameters. Another common example is `MultinomialNB` throwing an error because it expects non-negative values, but a prior transformation yielded a negative value. \n",
" * If you used custom search spaces, you can use `ConfigSpace` conditionals to prevent invalid hyperparameters (this may still occur due to how TPOT uses crossover).\n",
" * Setting `verbose=5` will print out the full error message for all failed pipelines. This can be useful for debugging whether or not there is something misconfigured in your pipeline, custom search space modules, or something else.\n",
"* TPOT is crashing due to memory issues\n",
" * Set the `memory_limit` parameter so that `n_jobs * memory_limit` is less than the available RAM on your machine, leaving some headroom. This should prevent crashes caused by running out of memory.\n",
" * Using feature selection may also improve memory usage, as described above.\n",
" * Remove modules that create high RAM usage (e.g. multiple PolynomialFeatures or one with high degree).\n",
"* Why are my TPOT runs not reproducible when random_state is set?\n",
" * Check that `periodic_checkpoint_folder` is set correctly. If this is set to a non-empty folder, TPOT will continue training from the checkpoint rather than start a new run from scratch. For TPOT runs to be reproducible, they have to have the same starting points.\n",
" * If using custom search spaces, pass in a fixed `random_state` value into the configspace of the scikit-learn modules that utilize them. TPOT does not check whether estimators do or do not take in a random state value (See Tutorial 2).\n",
" * If using the pre-built search spaces provided by TPOT, make sure to pass in `random_state` to `tpot.config.get_configspace` or `tpot.config.template_search_spaces.get_template_search_spaces`. This ensures all estimators that support it get a fixed random_state value. (See Tutorial 2).\n",
" * If using custom Node and Pipeline types, ensure all random decisions utilize the rng parameter passed into the mutation/crossover functions.\n",
" * If `max_eval_time_mins` is set, TPOT will terminate pipelines that exceed this time limit. If the pipeline evaluation happens to be very similar to the time limit, small random fluctuations in CPU allocation may cause a given pipeline to be evaluated in one run but not another. This slightly different result would throw off the random number generator throughout the rest of the run. Setting `max_eval_time_mins` to None or a higher value may prevent this edge case.\n",
" * If using `TPOTEstimatorSteadyState` with `n_jobs`>1, it is also possible that random fluctuations in CPU allocation slightly change the order in which pipelines are evaluated, which will affect the downstream results. `TPOTEstimatorSteadyState` is more reliably reproducible when `n_jobs=1`. (This is not an issue for the default `TPOTEstimator`, `TPOTClassifier`, and `TPOTRegressor`, as they use a batched generational approach where execution order does not impact results.)\n",
"* TPOT is not using all the CPU cores I expected, given my `n_jobs` setting.\n",
" * The default TPOT algorithm uses a generational approach. This means that TPOT needs to evaluate `population_size` (default 50) pipelines before starting the next batch. At the end of each generation, TPOT may leave threads unused while it waits for the last few pipelines to finish evaluating. Some estimators or pipelines can be significantly slower to evaluate than others. This can be addressed in a few ways:\n",
" * Decrease `max_eval_time_mins` to cut long-running pipeline evaluations early.\n",
" * Remove estimators or hyperparameter configurations that are prone to very slow convergence (which is very often `SVC` or `SVR`).\n",
" * Alternatively, `TPOTEstimatorSteadyState` uses a slightly different backend for the evolutionary algorithm that does not utilize the generational approach. Instead, new pipelines are generated and evaluated as soon as the previous one finishes. With this estimator, all cores should be utilized at all times. \n",
" * Sometimes, setting n_jobs to a multiple of the number of threads can help minimize the chances of threads being idle while waiting for others to finish"
]
},
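{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the memory question in the FAQ above, the rule of thumb that `n_jobs` times `memory_limit` should stay below the available RAM can be checked with a few lines of plain Python. This is only a sketch: the helper names are hypothetical, the unit parsing assumes simple strings such as \"4GB\" with decimal units (how TPOT itself interprets the string is not shown here), and the 10% headroom is an arbitrary illustrative value.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def memory_limit_to_bytes(limit):\n",
"    # Assumes strings such as '4GB' with decimal units (an assumption of this sketch).\n",
"    units = {'KB': 10**3, 'MB': 10**6, 'GB': 10**9}\n",
"    text = limit.strip().upper()\n",
"    for suffix, factor in units.items():\n",
"        if text.endswith(suffix):\n",
"            return float(text[:-len(suffix)]) * factor\n",
"    raise ValueError(f'unrecognized memory limit: {limit!r}')\n",
"\n",
"def fits_in_ram(n_jobs, memory_limit, available_bytes, headroom=0.1):\n",
"    # True when all workers at their limit still leave the requested fraction of RAM free.\n",
"    return n_jobs * memory_limit_to_bytes(memory_limit) <= available_bytes * (1 - headroom)\n",
"\n",
"available = 32 * 10**9  # e.g. a machine with 32 GB of RAM\n",
"print(fits_in_ram(4, '4GB', available))  # 16 GB requested in total\n",
"print(fits_in_ram(8, '4GB', available))  # 32 GB requested in total\n"
]
},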
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# More Options\n",
"\n",
"`tpot.TPOTClassifier` and `tpot.TPOTRegressor` have a simplified set of hyperparameters with default values set for classification and regression problems. Currently, both of these use the standard evolutionary algorithm in the `tpot.TPOTEstimator` class. If you want more control, you can look into either the `tpot.TPOTEstimator` or `tpot.TPOTEstimatorSteadyState` class.\n",
"\n",
"There are two evolutionary algorithms built into TPOT, which correspond to two different estimator classes.\n",
"\n",
"1. The `tpot.TPOTEstimator` uses a standard evolutionary algorithm that evaluates exactly `population_size` individuals each generation. This is similar to the algorithm in TPOT1. The next generation does not start until the previous one has completely finished evaluating. This can leave CPU cores idle while they wait for the last individuals to finish training, but it may preserve diversity in the population. \n",
"\n",
"2. The `tpot.TPOTEstimatorSteadyState` differs in that it will generate and evaluate the next individual as soon as an individual finishes the evaluation. The number of individuals being evaluated is determined by the n_jobs parameter. There is no longer a concept of generations. The population_size parameter now refers to the size of the list of evaluated parents. When an individual is evaluated, the selection method updates the list of parents. This allows more efficient utilization when using multiple cores.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### tpot.TPOTEstimatorSteadyState"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Evaluations: : 119it [00:37, 3.21it/s]\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/preprocessing/_data.py:2785: UserWarning: n_quantiles (688) is greater than the total number of samples (426). n_quantiles is set to n_samples.\n",
" warnings.warn(\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.9816225907664725\n"
]
}
],
"source": [
"import tpot\n",
"import sklearn\n",
"import sklearn.datasets\n",
"import sklearn.metrics\n",
"import sklearn.model_selection\n",
"\n",
"\n",
"graph_search_space = tpot.search_spaces.pipelines.GraphSearchPipeline(\n",
" root_search_space= tpot.config.get_search_space([\"KNeighborsClassifier\", \"LogisticRegression\", \"DecisionTreeClassifier\"]),\n",
" leaf_search_space = tpot.config.get_search_space(\"selectors\"), \n",
" inner_search_space = tpot.config.get_search_space([\"transformers\"]),\n",
" max_size = 10,\n",
")\n",
"\n",
"est = tpot.TPOTEstimatorSteadyState( \n",
" search_space = graph_search_space,\n",
" scorers=['roc_auc_ovr',tpot.objectives.complexity_scorer],\n",
" scorers_weights=[1,-1],\n",
"\n",
"\n",
" classification=True,\n",
"\n",
" max_eval_time_mins=15,\n",
" max_time_mins=30,\n",
" early_stop=10, #In TPOTEstimatorSteadyState, since there are no generations, early_stop is the number of evaluated pipelines without improvement before stopping.\n",
" n_jobs=30,\n",
" verbose=2)\n",
"\n",
"\n",
"scorer = sklearn.metrics.get_scorer('roc_auc_ovo')\n",
"X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)\n",
"X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, train_size=0.75, test_size=0.25)\n",
"est.fit(X_train, y_train)\n",
"print(scorer(est, X_test, y_test))"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAnYAAAHWCAYAAAD6oMSKAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjAsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvlHJYcgAAAAlwSFlzAAAPYQAAD2EBqD+naQAAb19JREFUeJzt3QdY1dUbB/AvU5YsUUEUB+GeuEXL0rQyybRclWk2bVhaNsw0rX/Ttk2ztNKsbJhmWmYWbsG9wAmylC1ckPl/3qPQHagoF353fD/Pw5OdH9x7HPf+3vuec97XoaysrAxEREREZPUctZ4AEREREZkHAzsiIiIiG8HAjoiIiMhGMLAjIiIishEM7IiIiIhsBAM7IiIiIhvBwI6IiIjIRjCwIyIiIrIRDOyIiIiIbAQDOyIiIiIbwcCOiIiIyEYwsCMiIiKyEQzsiIiIiGwEAzsiIiIiG8HAjoiIiMhGMLAjIiIishEM7IiIiIhsBAM7IiIiIhvBwI6IiIjIRjCwIyIiIrIRDOyIiIiIbAQDOyIiIiIbwcCOiIiIyEYwsCMiIiKyEQzsiIiIiGwEAzsiIiIiG8HAjoiIiMhGMLAjIiIishHOWk+AiMicSkpKkJGRgdTUVPV1OiUFZ/PzUVpSAkcnJ9Rxd0f9wEA0bNhQffn7+8PJyUnraRMRmYVDWVlZmXkeiohIO5mZmdi1axf2xMSgIC8PZcXF8MrPh09GBlyKi+FYVoZSBwcUOTsj298fue7ucHB2hpunJzqEh6NTp07w8/PT+rdBRFQtDOyIyKolJSVhY1QUjsXFwUWnQ0h8AoIyMuCTlweXkpIL/lyRkxOyPT2R7O+P+JAmKPLwQPOwMET064egoKBa/T0QEZkLAzsiskrFxcXYsGEDtm3YAK+0NFx1Ih6N09LgVFp62Y9V4uiIkwEBONw0BLkBAegeEYGIiAg4O3O3ChFZFwZ2RGR1UlJSsHL5cmSeTETruDiEJSaqpdbqkqXauOBgHAwLg3/jYNwUGYnAwECzzJmIqDYwsCMiq3LixAn8tHQpPJKS0fXAAXjrdGZ/jhwPD0S3aQNdo0a4ddRING3a1OzPQURUExjYEZFVBXXLlixBvRPx6LF/P5yvYNm1qoodHbGlXVtkhIRgxJgxDO6IyCqwjh0RWc3yq2Tq/E/Eo9e+fTUa1Al5/N5798E/Ph4/Lf1OPT8RkaVjYEdEVnFQQvbUyfJrz/37zbKfrirkeXru2w/35CT8tny5mgcRkSVjYEdEFk9Ov8pBCdlTV9OZOmPyfF33H0BGYiI2btxYq89NRHS5GNgRkcXXqZOSJnL6tSYOSlSFj06HVrFx2BoVheTkZE3mQERUFQzsiMiiSfFhqVMnJU201DIxUc1jQ1SUpvMgIroYBnZEZNFtwqSjhBQfrq19dRcizx96Ih7HYmPVvIiILBEDOyKyWNL7VdqESUcJS9AkLQ3OOh12796t9VSIiCrFwI6ILFJJSQn2xMSo3q9X0iasJsg8miYkYHd0tJofEZGlYWBHRBYpIyMDBXl5CMrIgCUJSj83L5kfEZGlYWBHRLVu9uzZaNeuHTp06IBu3brh2LFjJt+TmpqKsuJi3LRm9RU9x5eJiSjUy/Rdu20rhsZEI3JHjPqKz8+/osf1yctT85L56du6dav6vbi4uGDFihVX9NhERNXlXO1HICK6DFILbt26ddi5c6cKgk6ePAlPT0+T75PAyesKgy+xMCkRtwcGwlVv7NtOneHp5ITqcCkpUfOS+bVv375ivFGjRvj8888xd+7caj0+EVF1MLAjololrbkCAgJUUCcaN26s/rt69WrMmjULBQUFKpt3w/XXw8doufPTkwn4PS0NRaWlGNagISae/9mPEuKx8vRpOAAY3jAQLg4OOFVYiNG7diLYzQ0ft21X6Vwm7N2DmaFXoZm7O/pu3YInmzVTj/vwgf2Y1CQErT098fqxY9iWk42i0jLc17gxIhs0gH
dGJk4btRiT34d8OTpyIYSItMPAjohq1fXXX4+ZM2eibdu26td33XUXmjVrhjfeeAN//fUX3N3d8cILL2D1mjUYej74E1GZmUg5exbLOnVG6fmgrJ+fH5LOnsWmrCz82LkLXB0dkVVUBF8XF3yeeNIkQyeBnoODAxq4umJ+u/bo6u2N6JxsODoA9V1cEZ2TowK7Q3l5Kqj7PjVFfa88dkFJCW7ftUs9p2txsQpAiYgsDQM7IqpVdevWxY4dO9Ry7Nq1a1Vwt2jRIlVCpHfv3up7zp49i6aS/WrUqOLnorIy8XdGJrbn7FD/n1dSgmP5+SoYG9EwUAV1QoK6CzEO9Lp6++DX06fgCAeMDAxUvz6Wr0NjNzc4OThgQ2YmYnU6/HL6lPr+3JJiJBQUwLGsFCXsG0tEFoiBHRHVOmdnZxXQyZcsy06ePBlDhgzBF198UfE9C+fPR6lel4fSMuCRkBAMb9jQ4LEksLtSnevWxctHj6ggblxQI/yTmYG/0jMQXtf73HMCmHPVVejh42vwczscHOHkzLdPIrI83AxCRLXq0KFDOHLkiPp1WVkZ9u7diwceeEBl8E6cOKHGc3JykH3mDIr0gqe+fr5qaTT/fP24kwUFOFNcjD6+vliWmlJxAlaWYoVk5iSrdzHuTk5wc3TCjpwcXOXhgS7e3urQRVefc4FdX18/fJOcjJLzXS9i8/LUrwudneHq5lYjfz5ERNXBj5xEVKtyc3PxyCOPqOBNdO3aFY899hjCw8MxYsQIFBYWqgMIsvcu29+/4ueu9vPHYZ0OI3ftVJm0us7O+KB1G/T398e+3FwM27kDzg4OGNGgIe4ODlZLq3ft2Y3m7u4XPDwhwr291fKr7L3r5u2Dt48fR+fzGTt5DAkgh+2IUc9Z//zevBx/P7QKDDR4HFlKvummm1S7MSl3EhYWhk2bNtXYnyMRUWUcyuQjMxGRhZFM3m/ff4+b1/+jSoxYiiInJ6y45mrcdPvtBuVOiIgsAZdiicgiNWzYEA7OzsiupMadlmQ+Mi+ZHxGRpeFSLBFZJH9/f7h5eiLZ3x8B1TggYW6/nS3AFx9/jG+WLasYi4iIwLx58zSdFxGRYGBHRBbJyckJHcLDsTM9HW3j4+Gk1x5MKyWOjvDv2RPfzJiBa665RuvpEBGZ4FIsEVmsTp06ocjDAycDAmrk8XPO5CApORmnTp9CURXq0iUEBKDYwwMdO3askfkQEVUXAzsislh+fn5oHhaGw01DUOogDcPMRwI5OaELlKG4uBgZGRkovchZMnn+I01D0LxlSzUvIiJLxMCOiCxaRL9+yA0IQFxwcI0+T0lJcUUJlsrEBgereUT07Vuj8yAiqg4GdkRk0YKCgtA9IgIHw8KQ4+Fhtsd1kSLDrnUMxnS6PBScPWvyvdkeHjjUMgw9+vZV8yEislQM7IjI4smpU7/GwYhu0wbF53vCmoOvry8cHAwfLzsry2BJVp4vum0b+AcHo0+fPmZ7biKimsDAjoisorfskMhI6Bo1wpZ2bc22387ZyQne3ue6TJQrKS1Bdna2+rU8jzxfflAj3BQZqeZBRGTJGNgRkVUIDAzEraNGIiMkBJvatzNb5s7TwwN16hj2fc3P1yGvsFA9jzyfPK88PxGRpWNLMSKyKidOnMBPS7+DR1ISuh44AG+drtqPWVJSglOnT6Os7FytvDxvbxzq1h1lLZpjxJgxaNq0qRlmTkRU8xjYEZHVSUlJwcrly5F5MhGt4+IQlpgIx2q+leny85GRnYWkli0R17o1EjMykF9UhK+++goOZi61QkRUUxjYEZFVktpzGzZswLYNG+CVlobQE/FokpZ2RR0qpKOEFB/e17ABUt3dsWHbNmzcuFFl8pYsWYLRo0fXyO+BiMjcGNgRkVVLSkrCxg0bcCw2Fs46HZomJCAoPQM+eXlwKSm54M8VOTkhW3rR1vPHiSZNVEeJoCZNMOfllxEbG1vxfVKMeN++fSxzQkRWgYEdEdmEzM
xM7N69G7ujo1GQl4ey4mJ45efDOyMTrsXFcCwrRamDIwqdnZHj74dcd3c4ODvDzdMTHbt2VW3CJIj77rvvMGrUKIP
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"fitted_pipeline = est.fitted_pipeline_ # access best pipeline directly\n",
"fitted_pipeline.plot()"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>roc_auc_score</th>\n",
" <th>complexity_scorer</th>\n",
" <th>Parents</th>\n",
" <th>Variation_Function</th>\n",
" <th>Individual</th>\n",
" <th>Submitted Timestamp</th>\n",
" <th>Completed Timestamp</th>\n",
" <th>Eval Error</th>\n",
" <th>Pareto_Front</th>\n",
" <th>Instance</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0.841954</td>\n",
" <td>95.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>&lt;tpot.search_spaces.pipelines.graph.GraphPipel...</td>\n",
" <td>1.740179e+09</td>\n",
" <td>1.740179e+09</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>[('DecisionTreeClassifier_1', 'SelectPercentil...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.967781</td>\n",
" <td>89.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>&lt;tpot.search_spaces.pipelines.graph.GraphPipel...</td>\n",
" <td>1.740179e+09</td>\n",
" <td>1.740179e+09</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>[('DecisionTreeClassifier_1', 'SelectFwe_1'), ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.972412</td>\n",
" <td>22.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>&lt;tpot.search_spaces.pipelines.graph.GraphPipel...</td>\n",
" <td>1.740179e+09</td>\n",
" <td>1.740179e+09</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>[('KNeighborsClassifier_1', 'ColumnOneHotEncod...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.975926</td>\n",
" <td>54.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>&lt;tpot.search_spaces.pipelines.graph.GraphPipel...</td>\n",
" <td>1.740179e+09</td>\n",
" <td>1.740179e+09</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>[('KNeighborsClassifier_1', 'SelectFwe_1')]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0.964352</td>\n",
" <td>84.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>&lt;tpot.search_spaces.pipelines.graph.GraphPipel...</td>\n",
" <td>1.740179e+09</td>\n",
" <td>1.740179e+09</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>[('DecisionTreeClassifier_1', 'ZeroCount_1'), ...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" roc_auc_score complexity_scorer Parents Variation_Function \\\n",
"0 0.841954 95.0 NaN NaN \n",
"1 0.967781 89.0 NaN NaN \n",
"2 0.972412 22.0 NaN NaN \n",
"3 0.975926 54.0 NaN NaN \n",
"4 0.964352 84.0 NaN NaN \n",
"\n",
" Individual Submitted Timestamp \\\n",
"0 <tpot.search_spaces.pipelines.graph.GraphPipel... 1.740179e+09 \n",
"1 <tpot.search_spaces.pipelines.graph.GraphPipel... 1.740179e+09 \n",
"2 <tpot.search_spaces.pipelines.graph.GraphPipel... 1.740179e+09 \n",
"3 <tpot.search_spaces.pipelines.graph.GraphPipel... 1.740179e+09 \n",
"4 <tpot.search_spaces.pipelines.graph.GraphPipel... 1.740179e+09 \n",
"\n",
" Completed Timestamp Eval Error Pareto_Front \\\n",
"0 1.740179e+09 None NaN \n",
"1 1.740179e+09 None NaN \n",
"2 1.740179e+09 None NaN \n",
"3 1.740179e+09 None NaN \n",
"4 1.740179e+09 None NaN \n",
"\n",
" Instance \n",
"0 [('DecisionTreeClassifier_1', 'SelectPercentil... \n",
"1 [('DecisionTreeClassifier_1', 'SelectFwe_1'), ... \n",
"2 [('KNeighborsClassifier_1', 'ColumnOneHotEncod... \n",
"3 [('KNeighborsClassifier_1', 'SelectFwe_1')] \n",
"4 [('DecisionTreeClassifier_1', 'ZeroCount_1'), ... "
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#view the summary of all evaluated individuals as a pandas dataframe\n",
"est.evaluated_individuals.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### tpot.TPOTEstimator"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Generation: : 5it [10:06, 121.38s/it]\n",
"/opt/anaconda3/envs/tpotenv/lib/python3.10/site-packages/sklearn/linear_model/_sag.py:349: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge\n",
" warnings.warn(\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.996046608406159\n"
]
}
],
"source": [
"import tpot\n",
"import sklearn\n",
"import sklearn.datasets\n",
"import sklearn.metrics\n",
"import sklearn.model_selection\n",
"\n",
"est = tpot.TPOTEstimator( \n",
" search_space = graph_search_space,\n",
" max_time_mins=10,\n",
" scorers=['roc_auc_ovr'], #scorers can be a list of strings or a list of scorers. These get evaluated during cross validation. \n",
" scorers_weights=[1],\n",
" classification=True,\n",
" n_jobs=1, \n",
" early_stop=5, #how many generations with no improvement to stop after\n",
" \n",
" #List of other objective functions. All objective functions take in an untrained GraphPipeline and return a score or a list of scores\n",
" other_objective_functions= [ ],\n",
" \n",
" #List of weights for the other objective functions. Must be the same length as other_objective_functions. By default, bigger is better is set to True. \n",
" other_objective_functions_weights=[],\n",
" verbose=2)\n",
"\n",
"scorer = sklearn.metrics.get_scorer('roc_auc_ovo')\n",
"X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)\n",
"X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, train_size=0.75, test_size=0.25)\n",
"est.fit(X_train, y_train)\n",
"print(scorer(est, X_test, y_test))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Regression Example"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Generation: : 24it [18:15, 45.63s/it]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"-2968.0005982574958\n"
]
}
],
"source": [
"import tpot\n",
"import sklearn\n",
"import sklearn.metrics\n",
"import sklearn.datasets\n",
"import sklearn.model_selection\n",
"\n",
"scorer = sklearn.metrics.get_scorer('neg_mean_squared_error')\n",
"X, y = sklearn.datasets.load_diabetes(return_X_y=True)\n",
"X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, train_size=0.75, test_size=0.25)\n",
"\n",
"est = tpot.tpot_estimator.templates.TPOTRegressor(n_jobs=4, max_time_mins=30, verbose=2, cv=5, early_stop=5)\n",
"est.fit(X_train, y_train)\n",
"\n",
"print(scorer(est, X_test, y_test))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "tpotenv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.16"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}