Modin: Scale your Pandas workflows by changing a single line of code
Modin 0.37.1

This release includes a bug fix and a test fix.

Key Features and Updates Since 0.37.0
-------------------------------------
* Stability and Bugfixes
  * FIX-#7684: When we exceed max_cost for all available Backends an error may occur (#7685)
* Update testing suite
  * TEST-#7686: Fix comparisons in caster tests to check the backend instead of type (#7687)

Contributors
------------
@sfc-gh-jkew @sfc-gh-mvashishtha

Modin 0.37.0

This release includes bugfixes for `Series.to_json`, `DataFrame.rename`, and `eval`, plus performance improvements for joins with `AutoSwitchBackend` enabled.

Key Features and Updates Since 0.36.0
-------------------------------------
* Stability and Bugfixes
  * FIX-#7624: Add proper implementation for `Series.to_json` (#7673)
  * FIX-#7664: Add typing_extensions dependency (#7665)
  * FIX-#7667: Fix `axis=None` case for `DataFrame.rename` (#7674)
  * FIX-#7669: Respect eval(inplace=False). (#7670)
  * FIX-#7671: Fix transfer message truncating on larger sizes (#7672)
  * FIX-#7675: Allow backend switching to backends other than provided arguments (#7679)
* Update testing suite
  * TEST-#7681: Interop Tests should all use Backend.get instead of "Ray" (#7680)
* New Features
  * FEAT-#7676: in-place casting between DataFrame engines (#7666)

Contributors
------------
@sfc-gh-dpetersohn @sfc-gh-joshi @sfc-gh-mvashishtha

Modin 0.36.0

This release includes a bug fix, a performance improvement for `query()` and `eval()`, and changes to the testing suite.

Key Features and Updates Since 0.35.0
-------------------------------------
* Stability and Bugfixes
  * FIX-#7653: Respect `AutoSwitchBackend` for `DataFrame.T`/`Series.T` (#7654)
* Performance enhancements
  * PERF-#7657: Fork pandas eval and query implementation to improve performance. (#7658)
* Update testing suite
  * TEST-#7659: Ignore ray.init() warning about accelerators environment variable. (#7660)
  * TEST-#7661: Run push-to-main ray tests in parallel. (#7662)

Contributors
------------
@sfc-gh-dpetersohn @sfc-gh-joshi @sfc-gh-mvashishtha

Modin 0.35.0

This release includes various bug fixes and improvements, and adds support for pandas 2.3.

Key Features and Updates Since 0.34.0
-------------------------------------
* Stability and Bugfixes
  * FIX-#7622: Fall back to printing backend switching progress when tqdm is not available. (#7623)
  * FIX-#7638: Suppress default to pandas warnings on native pandas backend (#7639)
  * FIX-#7640: Respect AutoSwitchBackend.disable() in init. (#7641)
  * FIX-#7645: Stop raising an error for applying numpy ufuncs. (#7646)
* Performance enhancements
  * PERF-#7435: Use shallow copies in native pandas mode (#7634)
* Update testing suite
  * TEST-#7629: Update code for mypy 1.17 (#7630)
  * TEST-#7643: Fix residual failures from pandas 2.3 (#7644)
* New Features
  * FEAT-#7604: Support pandas 2.3 (#7635)
  * FEAT-#7627: Define move_to and move_from methods (#7628)
  * FEAT-#7636: Make AutoSwitchBackend False by default. (#7637)
  * FEAT-#7647: Shorten the hybrid progress bar text (#7648)

Contributors
------------
@sfc-gh-joshi @sfc-gh-mvashishtha @sfc-gh-vrpatel

Modin 0.34.1

This release includes a bug fix.

Key Features and Updates Since 0.34.0
-------------------------------------
* Stability and Bugfixes
  * FIX-#7622: Fall back to printing backend switching progress when tqdm is not available. (#7623)

Contributors
------------
@sfc-gh-mvashishtha

Modin 0.34.0

This release includes various bug fixes and improvements.

Key Features and Updates Since 0.33.0
-------------------------------------
* Stability and Bugfixes
  * FIX-#5961: Preserve dtypes when inserting column to empty frame. (#7601)
  * FIX-#7551: Fix name ambiguity for `value_counts()` on Pandas backend (#7585)
  * FIX-#7582: Add copy parameter to __array__ methods. (#7584)
  * FIX-#7595: Log backend switching information with the modin logger. (#7597)
  * FIX-#7611: Display 'modin.pandas' instead of 'None' in backend switching information. (#7612)
  * FIX-#7616: Implement __array_function__ stub (#7617)
* Update testing suite
  * TEST-#7451: Use https for modin-datasets.intel.com (#7596)
  * TEST-#7587: Stop calling np.array(copy=None) for numpy<2 (#7588)
  * TEST-#7598: Allow xgboost to log to root. (#7599)
  * TEST-#7602: Fix test_pickle by correctly using fixtures. (#7603)
  * TEST-#7611: Cap mpi4py<4.1 in CI. (#7614)
* New Features
  * FEAT-#7606: Consider self_cost in hybrid casting calculator (#7605)
  * FEAT-#7607: Support pinning groupby objects in place. (#7608)
  * FEAT-#7618, FEAT-#7544: Support set_backend() for groupby objects. (#7619)
  * FEAT-#7620: Support pin_backend(inplace=False) for groupby objects. (#7621)

Contributors
------------
@sfc-gh-vrpatel @sfc-gh-joshi @sfc-gh-mvashishtha @sfc-gh-jkew

Modin 0.33.2

This patch release includes some bug fixes.

Key Features and Updates Since 0.33.1
-------------------------------------
* Stability and Bugfixes
  * FIX-#5961: Preserve dtypes when inserting column to empty frame. (#7601)
  * FIX-#7551: Fix name ambiguity for `value_counts()` on Pandas backend (#7585)
  * FIX-#7595: Log backend switching information with the modin logger. (#7597)
* Update testing suite
  * TEST-#7598: Allow xgboost to log to root. (#7599)
  * TEST-#7602: Fix test_pickle by correctly using fixtures. (#7603)
* Uncategorized improvements

Contributors
------------
@sfc-gh-vrpatel @sfc-gh-mvashishtha

Modin 0.33.1

This patch release fixes a regression introduced in Modin 0.33.0.

Key Features and Updates Since 0.33.0
-------------------------------------
* Stability and Bugfixes
  * FIX-#7582: Add copy parameter to __array__ methods. (#7584)

Contributors
------------
@sfc-gh-mvashishtha

Modin 0.33.0

This release introduces a set of features for switching Modin execution between multiple backends (e.g. Ray and local pandas), either manually or automatically. It also includes several bug fixes.

Key Features and Updates Since 0.32.0
-------------------------------------
* Stability and Bugfixes
  * FIX-#7327: Use sort parameter of DataFrame.stack (#7396)
  * FIX-#7346: Handle execution on Dask workers to avoid creating conflicting clients (#7347)
  * FIX-#7375: Fix Series.duplicated dropping name (#7395)
  * FIX-#7381: Fix Series binary operators ignoring fill_value (#7394)
  * FIX-#7383: Avoid broadcast issue in partition manager with custom NPartitions (#7399)
  * FIX-#7404: Implement interchange protocol for datetime columns (#7434)
  * FIX-#7405: Internally sort indices for loc/iloc set (#7440)
  * FIX-#7413: Always use positional index before computing argmin/argmax (#7463)
  * FIX-#7461: Set backend correctly with environment variables. (#7462)
  * FIX-#7465: Properly implement Series.rename_axis (#7466)
  * FIX-#7486: Add support for `.astype(pandas.CategoricalDtype(…))` (#7487)
  * FIX-#7490: Exclude move_to and _update_inplace from casting. (#7491)
  * FIX-#7495: Separate extensions for aliases. (#7496)
  * FIX-#7521: Fix wrong extension being used when backend is pinned (#7546)
  * FIX-#7528: Dispatch module-level extensions to the correct backend (#7529)
  * FIX-#7532: Display choices in error message of environment vars (#7533)
  * FIX-#7536: setuptools / ray version conflict in pkg_resources._vendor (#7537)
  * FIX-#7538: set_backend should exit early if there is nothing to do (#7539)
  * FIX-#7547: native qc move_to_me_cost does not work with non-subclasses (#7548)
  * FIX-#7553: Fix groupby when AutoSwitchBackend is disabled. (#7554)
  * FIX-#7555: Get the correct extension when AutoSwitchBackend is False. (#7556)
  * FIX-#7559: Create the dummy query compiler just once per backend. (#7560)
  * FIX-#7562: Raise AttributeError for missing extension properties. (#7563)
  * FIX-#7569: Fix handling of pyarrow dtype and empty dataframes (#7570)
  * FIX-#7576: Fix ambiguous AttributeError message (#7577)
  * FIX-#7578: Change groupby extension allow list and fix cached_property extensions (#7579)
* Performance enhancements
  * PERF-#7397: Avoid materializing index/columns in shape checks (#7398)
* Refactor Codebase
  * REFACTOR-#7315: Refactor axis checks in squeeze (#7400)
  * REFACTOR-#7418: Rename internal interchange protocol methods. (#7422)
  * REFACTOR-#7427: Require query compilers to expose engine and storage format. (#7430)
  * REFACTOR-#7470: Combine backend casting and extension code at the API layer. (#7485)
  * REFACTOR-#7493: Improve the clarity of the costing functions (#7494)
  * REFACTOR-#7527: Add more costing logic to the base query compiler. (#7530)
  * REFACTOR-#7534: Provide internal, overridable method for max_shape (#7535)
  * REFACTOR-#7564: Fix docstrings for transfer thresholds. (#7565)
* Update testing suite
  * TEST-#7419: Fix a few errors in CI (#7420)
  * TEST-#7421: Fix unidist with APT-installed MPI (#7423)
  * TEST-#7431: Fix formatting for isort 6 and black 25 (#7432)
  * TEST-#7437: Check execution-filter outputs correctly in CI. (#7438)
  * TEST-#7441: Correctly skip sanity tests if we don't need them. (#7442)
  * TEST-#7457: Fix SSL certificate error in notebooks by using http. (#7458)
  * TEST-#7497: Skip tests requiring lxml on windows. (#7500)
  * TEST-#7571: xfail test_read_csv_s3_issue4658 due to missing s3 bucket (#7572)
* Documentation improvements
  * DOCS-#7566: Add pandas on snowflake + backend pinning to documentation page (#7567)
* New Features
  * FEAT-#7433: Replace NativeDataFrameMode with a complete "native" execution. (#7436)
  * FEAT-#7445: Add metrics interface so third-parties can collect metrics from the modin frontend (#7444)
  * FEAT-#7448: Allow QueryCompilerCaster to apply cost-optimization on automatic casting (#7464)
  * FEAT-#7455: Add Backend config variable as an alias for execution. (#7456)
  * FEAT-#7459: Add methods to get and set backend. (#7460)
  * FEAT-#7468: Add progress bar for engine switch (#7469)
  * FEAT-#7472: Add an option to register dataframe and series accessors with a particular backend. (#7473)
  * FEAT-#7474: Register general functions with a particular backend. (#7489)
  * FEAT-#7475: Choose the correct __init__ method from extensions and apply casting to __init__. (#7488)
  * FEAT-#7477: Move the query compiler calculator so it can be used in more places (#7478)
  * FEAT-#7480: Implement max_cost interface (#7481)
  * FEAT-#7482: Add "from_qc" API to QueryCompiler and BackendCostCalculator to handle asymmetric information scenarios (#7483)
  * FEAT-#7492: Allow I/O function accessors. (#7502)
  * FEAT-#7505: Support post-operation automatic backend switch. (#7506)
  * FEAT-#7507: Support pre-operation automatic backend switch. (#7512)
  * FEAT-#7509: Add AutoSwitchBackend configuration variable (#7510)
  * FEAT-#7511: Support pre-operation switch for init by passing arguments to cost functions. (#7531)
  * FEAT-#7521: Support pinning objects to a backend (#7522)
  * FEAT-#7523: Improve formal definition of the automatic switching algorithm (#7524)
  * FEAT-#7540: Ability to configure NativeQueryCompiler AutoSwitch Settings (#7561)
  * FEAT-#7542: Support post-operation backend switch for groupby. (#7545)
  * FEAT-#7543: Let plugins register groupby accessors. (#7575)
  * FEAT-#7549: Emit metrics on auto-switch and casting behavior (#7550)
  * FEAT-#7557: Add operation and size information to backend switch progress (#7558)
  * FEAT-#7573: Dispatch __array_ufunc__ to query compilers (#7574)

Contributors
------------
@CRiddler @YarShev @anmyachev @data-makerman @devin-petersohn @emmanuel-ferdman @mpeleshenko @noloerino @sfc-gh-dpetersohn @sfc-gh-jkew @sfc-gh-joshi @sfc-gh-mvashishtha

Modin 0.32.0

This release introduces support for the Polars API, a new query compiler for small data, more functions that can use dynamic partitioning, and several bug fixes.

Key Features and Updates Since 0.31.0
-------------------------------------
* Stability and Bugfixes
  * FIX-#0000: Fix type hint (#7343)
  * FIX-#7113: Fix docstring overrides for subclasses. (#7354)
  * FIX-#7134: Use a separate docstring class for BasePandasDataset. (#7353)
  * FIX-#7329: Do not sort columns on df.update (#7330)
  * FIX-#7351: Add ipython method calls to non-lookup list (#7352)
  * FIX-#7355: Cpu count would be set incorrectly on a cluster (#7356)
  * FIX-#7357: Fix `NoAttributeError` on `DataFrame.copy` (#7358)
  * FIX-#7371: Fix inserting datelike values into a DataFrame (#7372)
  * FIX-#7373: Try a previous version of `motoserver/moto` service, pin to 5.0.13 (#7374)
  * FIX-#7379: Fix __imul__ performing addition instead of multiplication (#7380)
  * FIX-#7387: Limit the number of pytest workers for tests with Ray engine on Windows (#7388)
  * FIX-#7389: Fix uploading artifacts (#7390)
* Refactor Codebase
  * REFACTOR-#0000: Update copyright date (#7333)
* Documentation improvements
  * DOCS-#0000: Update RunLLM Ask AI widget script path (#7345)
  * DOCS-#7335: Fix broken links in Modin Usage Examples page (#7336)
  * DOCS-#7382: Add documentation on how to use Modin Native query compiler (#7386)
* New Features
  * FEAT-#4605: Add native query compiler (#7259)
  * FEAT-#7308: Interoperability between query compilers (#7376)
  * FEAT-#7331: Initial Polars API (#7332)
  * FEAT-#7337: Using dynamic partitioning in `broadcast_apply` (#7338)
  * FEAT-#7340: Add more granular lazy flags to query compiler (#7348)
  * FEAT-#7368: Add a new environment variable for using dynamic partitioning (#7369)

Contributors
------------
@MortalHappiness @Retribution98 @YarShev @ZhipengXue97 @anmyachev @arunjose696 @devin-petersohn @likawind @sfc-gh-joshi @sfc-gh-mvashishtha

Modin 0.31.0

First release compatible with NumPy 2.0.

Key Features and Updates Since 0.30.0
-------------------------------------
* Stability and Bugfixes
  * FIX-#7138: Stop reloading modules for custom docstrings. (#7307)
  * FIX-#7263: Empty docstrings should not be inherited (#7264)
  * FIX-#7272: Remove HDK engine (#7275)
  * FIX-#7277: Remove Cudf storage format as unmaintained (#7290)
  * FIX-#7278: Make sure `enable_logging` decorator preserve type hints (#7279)
  * FIX-#7292: Prepare Modin code to NumPy 2.0 (#7293)
  * FIX-#7295: Unpin numexpr to allow versions >= 2.8.4 to match pandas (#7296)
  * FIX-#7309: Update versioneer with `versioneer install --vendor` (#7311)
  * FIX-#7320: Bump the github-actions group with 3 updates (#7319)
  * FIX-#7321: Using 'C' engine instead of 'pyarrow' for getting metadata in 'read_csv' (#7322)
* Performance enhancements
  * PERF-#7299: Avoid using `synchronize_labels` for `combine` function (#7300)
* Refactor Codebase
  * REFACTOR-#7271: Remove `instance_type` attribute of axis partitions (#7268)
  * REFACTOR-#7273: Remove deprecated functions from utils.py, accessor.py and io.py (#7274)
  * REFACTOR-#7285: Remove deprecated configs (#7286)
  * REFACTOR-#7294: Reduce access of methods `_modin_frame` methods from `_query_compiler` (#7297)
  * REFACTOR-#7313: Add similar methods as in #7294 for operating on columns (#7314)
* Update testing suite
  * TEST-#0000: Add a Dependabot config to auto-update GitHub action versions (#7318)
  * TEST-#7316: Run a subset of CI tests with python 3.10 and 3.11 on a scheduled basis (#7289)
* Documentation improvements
  * DOCS-#0000: Adds RunLLM widget to docs (#7326)
  * DOCS-#7287: Update Modin on Dask documentation (#7288)
* New Features
  * FEAT-#6574: UserWarning no longer displayed when Series/DataFrames are small (#7323)
  * FEAT-#7249: Add `reload_modin` feature (#7280)
  * FEAT-#7265: Automatic publication of Modin wheel to PyPI (#7262)
  * FEAT-#7283: Introduce MinRowPartitionSize and MinColumnPartitionSize (#7284)
  * FEAT-#7310: NumPy 2.0 support (#7312)

Contributors
------------
@Jayson729 @Retribution98 @YarShev @anmyachev @arunjose696 @kurtmckee @sfc-gh-dpetersohn @vsreekanti

Modin 0.27.1

This release pins numpy<2.

Key Features and Updates Since 0.27.0
-------------------------------------
* Stability and Bugfixes
  * FIX-#6968: Align API with pandas (#6969)
  * FIX-#7302: Pin numpy<2 (072453b)
* New Features
  * FEAT-#7265: Automatic publication of Modin wheel to PyPI (#7262)

Contributors
------------
@anmyachev @dchigarev @sfc-gh-dpetersohn

Modin 0.28.3

This release pins numpy<2.

Key Features and Updates Since 0.28.2
-------------------------------------
* Stability and Bugfixes
  * FIX-#7302: Pin numpy<2 (072453b)
* New Features
  * FEAT-#7265: Automatic publication of Modin wheel to PyPI (#7262)

Contributors
------------
@anmyachev @sfc-gh-dpetersohn

Modin 0.29.1

This release pins numpy<2.

Key Features and Updates Since 0.29.0
-------------------------------------
* Stability and Bugfixes
  * FIX-#7302: Pin numpy<2 (072453b)
* New Features
  * FEAT-#7265: Automatic publication of Modin wheel to PyPI (#7262)

Contributors
------------
@anmyachev @sfc-gh-dpetersohn

Modin 0.30.1

This release pins numpy<2.

Key Features and Updates Since 0.30.0
-------------------------------------
* Stability and Bugfixes
  * FIX-#7302: Pin numpy<2 (072453b)

Contributors
------------
@anmyachev

Modin 0.30.0

This release introduces support for the DataFrame API standard, a distributed implementation of right merge/join, more efficient implementations of internal operators (which give a performance boost to almost all distributed Modin functions), improved compatibility with pyarrow-backed pandas, and type hints for the pandas API to improve UX.

Key Features and Updates Since 0.29.0
-------------------------------------
* Stability and Bugfixes
  * FIX-#0000: Fix badge in README.md (#7213)
  * FIX-#0000: Make merge tests more stable by sorting results (#7266)
  * FIX-#6967: Remove read_pickle_distributed/to_pickle_distributed functions as deprecated (#7258)
  * FIX-#7093: Make sure 'idxmax' and 'idxmin' can work with string columns (#7193)
  * FIX-#7102: Remove `enable_api_only` mode in modin logging (#7194)
  * FIX-#7103: Move lower-level functionality logging to debug (#7184)
  * FIX-#7143: Constructing a DataFrame from a Modin Series with tuple name should produce MultiIndex columns (#7214)
  * FIX-#7185: Add extra check for some config classes (#7189)
  * FIX-#7201: Update docs on how to enable Modin logs for high-level API and low-level API (#7209)
  * FIX-#7206: Make sure df.melt handle duplicate value_vars correctly (#7208)
  * FIX-#7219: Pin dataframe-api-compat>=0.2.7 (#7220)
  * FIX-#7221: Don't use 'use_legacy_dataset=False' for 'ParquetDataset' (#7222)
  * FIX-#7224: Importing modin.pandas.api.extensions overwrites re-export of pandas.api submodules (#7225)
  * FIX-#7233: Display property name in default_to_pandas error messages (#7269)
  * FIX-#7234: Deprecate HDK engine (#7235)
  * FIX-#7238: Fix docstring inheritance for `cached_property` and use it (#7239)
  * FIX-#7240: Allow `doc_checker.py` works with `functools.cached_property` (#7241)
  * FIX-#7246: Pin pyarrow>=10.0.1 as pandas 2.2.* does (#7247)
  * FIX-#7248: Make sure '_validate_dtypes_sum_prod_mean' works correctly with datetime types (#7237)
  * FIX-#7250: Revert "PERF-#6666: Avoid internal reset_index for left merge" (#7251)
* Performance enhancements
  * PERF-#7227: Call 'modin_frame.combine()' for merge and join only when necessary (#7228)
  * PERF-#7230: Don't preserve bad partition for 'merge' (#7229)
* Refactor Codebase
  * REFACTOR-#7242: Add type hints for `modin/core/dataframe/algebra/` (#7243)
  * REFACTOR-#7260: Use `extract_dtype` internal function in more places (#7261)
* Update testing suite
  * TEST-#7049: Add some sanity tests with pyarrow-backed pandas dataframes (#7199)
  * TEST-#7191: Fix ASV after changing default branch (#7190)
* Documentation improvements
  * DOCS-#0000: Fix a typo with MODIN_CPUS number (#7198)
  * DOCS-#0000: Supplement Optimization Notes with a link to configs (#7197)
  * DOCS-#7217: Update docs as to when Modin operators work best (#7218)
  * DOCS-#7255: Update docs as to from_* functions (#7256)
* New Features
  * FEAT-#5394: Reduce amount of remote calls for Map operator (#7136)
  * FEAT-#5394: Reduce amount of remote calls for TreeReduce and GroupByReduce operators (#7245)
  * FEAT-#6492: Add `from_map` feature to create dataframe (#7215)
  * FEAT-#6498: Make Fold operator more flexible (#7257)
  * FEAT-#6808: Implement '__arrow_array__' for Series (#7200)
  * FEAT-#6890: Modin implementation of DataFrame API standard (#7216)
  * FEAT-#7139: Use ray-core instead of ray-default (#6955)
  * FEAT-#7187: Change "master" branch to "main" (#7188)
  * FEAT-#7202: Use custom resources for Ray (#7205)
  * FEAT-#7203: Make sure Modin works correctly with pandas, which uses pyarrow as a backend (#7204)
  * FEAT-#7207: Add the ability to assign a df to a columns selection without d2p (#7210)
  * FEAT-#7252: Add type hints for `base.py` (#7253)
  * FEAT-#7254: Support right merge/join (#7226)

Contributors
------------
@Retribution98 @YarShev @anmyachev @arunjose696 @noloerino @sfc-gh-jkew

Modin 0.29.0

This release introduces the `modin.pandas.testing` and `modin.pandas.arrays` modules, a faster (range-partitioning) implementation for the `pivot_table`, `unique`, `drop_duplicates`, `nunique`, and `df.resample` functions, new functions to interact with Dask (`to/from_dask`), a distributed implementation for `Series.case_when`, and an optimization for the `astype` function with a scalar dtype.

Key Features and Updates Since 0.28.0
-------------------------------------
* Stability and Bugfixes
  * FIX-#6227: Make sure `Series.unique()` with pyarrow dtype returns `ArrowExtensionArray` (#7042)
  * FIX-#6793: Use 'pandas_dtype' instead of 'np.dtype' for some more places in Modin code (#6794)
  * FIX-#7039: Pass scalar dtype as is to astype query compiler (#7152)
  * FIX-#7051: Update exception message for 'astype' function (#7052)
  * FIX-#7054: Update exception message for `shift` function (#7055)
  * FIX-#7056: Update exception message for `iloc/loc` functions (#7057)
  * FIX-#7058: Update exception message for `insert` function (#7059)
  * FIX-#7060: Fix 'pivot' when index or columns are of Index type (#7061)
  * FIX-#7062: Update exception message for `aggregate` function (#7063)
  * FIX-#7072: Replace MaterializationHook with the materialized object on serialization. (#7075)
  * FIX-#7088: Make sure `rank` raises `No axis named None...` exception (#7089)
  * FIX-#7115: Exclude Ray 2.10.0 from deps installation (#7116)
  * FIX-#7135: Fix appending a new row (#7172)
  * FIX-#7153: Fix 'Series.corr' with 'method != pearson' (#7158)
  * FIX-#7157: Make sure `quantile` function works with `numeric_only=True` (#7160)
  * FIX-#7170: Don't use `MinPartitionSize` configuration variable in remote context (#7177)
* Performance enhancements
  * PERF-#5296: Partition parquet file if it has too few row groups (#7016)
  * PERF-#7068: Provide shape_hint="column" for some more operations with Series (#7069)
  * PERF-#7123: Preserve shape_hint for dropna (#7124)
  * PERF-#7130: Preserve partition lengths in apply_full_axis with keep_partitioning=True (#7131)
  * PERF-#7132: Preserve partition lengths in apply_full_axis with keep_partitioning=False (#7133)
  * PERF-#7150: Reduce peak memory consumption (#7149)
* Refactor Codebase
  * REFACTOR-#3257: Move logging and caching to the `gen_data` internal function (#7046)
  * REFACTOR-#7105: Deprecate 'cfg.RangePartitioningGroupby' (#7161)
  * REFACTOR-#7106: Rename from/to_ray_dataset to from/to_ray (#7107)
  * REFACTOR-#7109: Remove the outdated aws_example.yaml file. (#7110)
* Update testing suite
  * TEST-#3622: Centralize tests in Modin (#7137)
  * TEST-#6016: Make sure `eval_general` doesn't expect exceptions by default (#6954)
  * TEST-#7064: Explicitly check for exceptions in `test_groupby.py` (#7065)
  * TEST-#7066: Explicitly check for exceptions in `test_io.py` (#7067)
  * TEST-#7073: Explicitly check for exceptions in `test_default.py` (#7074)
  * TEST-#7076: Explicitly check for exceptions in `test_map_metadata.py` (#7077)
  * TEST-#7082: Explicitly check for exceptions in 'test_series.py' (#7083)
  * TEST-#7084: Explicitly check for exceptions in 'test_indexing.py' (#7085)
  * TEST-#7086: Explicitly check for exceptions in `test_reduce.py` (#7087)
  * TEST-#7094: Rename 'raising_exceptions' argument of 'eval_general' testing function (#7095)
  * TEST-#7125: Explicitly install modin in ci tests (#7126)
  * TEST-#7165: Add codecov token to fix CI on master (#7175)
  * TEST-#7166: Fix HDF tests in CI (#7167)
  * TEST-#7173: Update github actions (#7168)
* Documentation improvements
  * DOCS-#2434: Clarify the use of '--signoff' option (#7145)
  * DOCS-#6987: Rework range-partitioning docs (#7169)
  * DOCS-#7144: Add information about logging from user defined function (#7155)
* New Features
  * FEAT-#4527: Add Modin logging to AxisPartition and BlockPartition classes (#7079)
  * FEAT-#6783: Implement `modin.pandas.testing` module (#7045)
  * FEAT-#6929: Implement Series.case_when in a distributed way (#6972)
  * FEAT-#7004: Use generators when returning from _deploy_ray_func remote function. (#7005)
  * FEAT-#7021: Implement to/from_dask functions (#7022)
  * FEAT-#7047: Add range-partitioning implementation for '.pivot_table()' (#7048)
  * FEAT-#7070: Add `modin.pandas.arrays` module (#7071)
  * FEAT-#7078: Add modin_layer names to classes that inherit ClassLogger (#7099)
  * FEAT-#7090: Add range-partitioning implementation for '.unique()' and '.drop_duplicates()' (#7091)
  * FEAT-#7100: Add range-partitioning impl for 'nunique()' (#7101)
  * FEAT-#7102: Deprecate enable_api_only mode in modin logging (#7114)
  * FEAT-#7111: Implemented `@remote_function` decorator with cache (#7112)
  * FEAT-#7117: Support building range-partitioning from an index level (#7120)
  * FEAT-#7118: Add range-partitioning impl for 'df.resample()' (#7140)
  * FEAT-#7128: Update minimal supported version of Ray up to 2.1.0 (#7129)
  * FEAT-#7141: Add an ability to use config variables with a context manager (#7142)
  * FEAT-#7146: Use BaseQueryCompiler, BasePandasDataset, DataFrame or Series type hints at a high level (#7147)
  * FEAT-#7156: Add type hints for Series (#7154)
  * FEAT-#7178: Add type hints for DataFrame (#7179)
  * FEAT-#7180: Add type hints for modin.pandas.[functions] (#7181)

Contributors
------------
@AndreyPavlenko @Retribution98 @YarShev @anmyachev @arunjose696 @dchigarev @sfc-gh-mvashishtha

Modin 0.28.2

This release reverts the pandas requirement from 2.2.1 to >=2.2,<2.3.

Key Features and Updates Since 0.28.1
-------------------------------------
* New Features
  * FEAT-#7162: Revert pandas version to >=2.2,<2.3 (67e2541)

Contributors
------------
@sfc-gh-mvashishtha

Modin 0.28.1

This release pins pandas to 2.2.1. This pin will be removed in a subsequent release.

Key Features and Updates Since 0.28.0
-------------------------------------
* New Features
  * FEAT-#7162: Pin pandas to 2.2.1 (87d147f)

Contributors
------------
@sfc-gh-dpetersohn

Modin 0.28.0

This release introduces the `modin.pandas.api.extensions` module, faster implementations of the `merge` and `groupby.rolling` functions, and new functions for working with Ray Datasets: `to/from_ray_dataset`. It also includes other new features, performance optimizations, and bug fixes.

Key Features and Updates Since 0.27.0
-------------------------------------
* Stability and Bugfixes
  * FIX-#6935: Fix Merge failed when right operand is an empty dataframe (#6941)
  * FIX-#6936: Fix 'read_parquet' when dataset is created with 'to_parquet' and 'index=False' (#6937)
  * FIX-#6944: Apply 'isort' formatting for scripts from tutorials (#6945)
  * FIX-#6946: Remove 'needs: [lint-black-isort, ...]' (#6947)
  * FIX-#6948: Fix groupby when Modin dataframe has several column partitions (#6951)
  * FIX-#6952: use `render_as_string` to get sqlalchemy engine url (#6953)
  * FIX-#6968: Align API with pandas (#6969)
  * FIX-#6974: Always use actual pandas version in 'test_all_urls_exist' (#6975)
  * FIX-#6982: Updating data in notebooks from yellow taxi to green taxi dataset (#6993)
  * FIX-#6984: Ensure the results of inplace operations materialize (for tests) (#6985)
* Performance enhancements
  * PERF-#6976: Do not trigger unnecessary computations on `._propagate_index_objs()` (#6977)
  * PERF-#6979: Do not trigger `._copartition()` for identical indices on binary operations (#6980)
* Refactor Codebase
  * REFACTOR-#6856: Rename read_pickle_distributed/to_pickle_distributed to read_pickle_glob/to_pickle_glob (#6957)
  * REFACTOR-#6939: Make 'modin.pandas.DataFrame._to_pandas' a public method (#6940)
  * REFACTOR-#6958: Remove DataFrame.to_pickle_distributed in favour of DataFrame.modin.to_pickle_distributed (#6959)
  * REFACTOR-#7002: get more information about exceptions from `eval_general` utility (#7003)
  * REFACTOR-#7008: Remove `check_exception_type` argument of `eval_general` function (#7009)
  * REFACTOR-#7013: Move `to_pandas` and `to_ray_dataset` into modin namespace (#7014)
  * REFACTOR-#7017: Align 'to_hdf' and 'hist' signatures to pandas (#7018)
* Update testing suite
  * TEST-#6932: don't use deprecated 'pandas._testing.makeStringIndex' (#6933)
  * TEST-#6994: Update tests in `test_series.py` (#6995)
  * TEST-#6996: Update tests in `test_io.py` (#6997)
* Documentation improvements
  * DOCS-#6871: Update Modin on Ray cluster tutorial (#6872)
  * DOCS-#6949: Create Modin on Dask cluster tutorial (#6950)
  * DOCS-#6962: Remove links to https://discuss.modin.org (#6963)
* New Features
  * FEAT-#3044: Create Extensions Module in Modin (#6961)
  * FEAT-#4622: Unify data type of log_level in logging module (#6992)
  * FEAT-#6913: Support sqlalchemy connectables in `read_sql` by getting connection url (#6956)
  * FEAT-#6934: Support 'include_groups=False' parameter in 'groupby.apply()' (#6938)
  * FEAT-#6942: Enable range-partitioning impl for 'groupby().rolling()' by default (#6943)
  * FEAT-#6965: Implement `.merge()` using range-partitioning implementation (#6966)
  * FEAT-#6970: Implement to/from_ray_dataset functions (#6971)
  * FEAT-#6983: Add Pluggable Documentation Module Support (#6986)
  * FEAT-#7001: Do not force materialization in MetaList.__getitem__() (#7006)

Contributors
------------
@AndreyPavlenko @Retribution98 @YarShev @anmyachev @arunjose696 @dchigarev @sfc-gh-dpetersohn @tochigiv
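
Entries in these notes follow the pattern `KIND-#issue: title (#PR)`, with a short commit hash occasionally standing in for the PR number (for example `FIX-#7302: Pin numpy<2 (072453b)`). As a minimal sketch for anyone who wants to process the notes programmatically, the snippet below extracts those fields with a regular expression. The names `Entry`, `ENTRY_RE`, and `parse_entry` are illustrative assumptions and are not part of Modin.

```python
import re
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    """One changelog entry (hypothetical helper type, not a Modin API)."""
    kind: str    # e.g. "FIX", "FEAT", "PERF", "TEST", "DOCS", "REFACTOR"
    issue: int   # number of the first issue tag
    title: str   # free-text description
    ref: str     # "#1234" for a PR, or a short commit hash


ENTRY_RE = re.compile(
    r"^(?P<kind>[A-Z]+)-#(?P<issue>\d+)"   # KIND-#issue
    r"(?:,\s*[A-Z]+-#\d+)*"                # optional extra tags, e.g. ", FEAT-#7544"
    r":\s*(?P<title>.*?)\s*"               # title (lazy, so the final parens are the ref)
    r"\((?P<ref>#\d+|[0-9a-f]{7,40})\)$"   # (#PR) or (commit hash)
)


def parse_entry(line: str) -> Optional[Entry]:
    """Parse one bullet line; return None for headings and other text."""
    text = line.strip().lstrip("* ").strip()
    match = ENTRY_RE.match(text)
    if match is None:
        return None
    return Entry(
        kind=match["kind"],
        issue=int(match["issue"]),
        title=match["title"],
        ref=match["ref"],
    )


if __name__ == "__main__":
    print(parse_entry("* FIX-#7669: Respect eval(inplace=False). (#7670)"))
    print(parse_entry("* FIX-#7302: Pin numpy<2 (072453b)"))
    print(parse_entry("* Stability and Bugfixes"))
```

Category lines such as `* Stability and Bugfixes` do not match the pattern, so the sketch returns `None` for them and they can be used to group the entries that follow.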