mpi4py 4.1 is incompatible with the mpich versions that we install on Ubuntu in CI.
Resolves #7611
---------
Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
Add an option to associate each base, dataframe, and series extension with a particular
backend. Also, add `register_base_extension()`.
Wrap each base, dataframe, and series method in a dispatcher that checks
the backend of its first argument and then dispatches to the
corresponding backend's extension, if one exists. For non-method
extensions, including properties and scalars (like ints), we have to
extend `__getattribute__`, `__getattr__`, `__delattr__`, and
`__setattr__` to implement the extension.
In the future, the dispatcher could choose an appropriate backend when
it has inputs from multiple backends. It could then convert all
arguments to the correct backend and then choose the extension matching
that backend.
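The dispatch mechanism described above can be sketched roughly as follows. This is an illustrative toy, not Modin's actual internals: `register_extension`, `dispatching_method`, and `Frame` are simplified stand-ins for the real registration and dispatch machinery.

```python
# Toy sketch of per-backend method dispatch (simplified names, not Modin's API).
_extensions = {}  # (backend, method_name) -> callable


def register_extension(backend, name):
    """Register the decorated function as `name`'s implementation for `backend`."""
    def decorator(func):
        _extensions[(backend, name)] = func
        return func
    return decorator


def dispatching_method(name, default_impl):
    """Build a method that routes to a backend-specific extension if registered."""
    def method(self, *args, **kwargs):
        # Check the backend of the first argument (self), then dispatch to
        # that backend's extension, falling back to the default implementation.
        impl = _extensions.get((self.backend, name), default_impl)
        return impl(self, *args, **kwargs)
    return method


class Frame:
    def __init__(self, backend):
        self.backend = backend

    sum = dispatching_method("sum", lambda self: "default sum")


@register_extension("Fast", "sum")
def _fast_sum(self):
    return "fast sum"
```

With this sketch, `Frame("Fast").sum()` routes to `_fast_sum`, while any other backend falls through to the default implementation.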
- [x] first commit message and PR title follow format outlined
[here](https://modin.readthedocs.io/en/latest/development/contributing.html#commit-message-formatting)
> **_NOTE:_** If you edit the PR title to match this format, you need to
add another commit (even if it's empty) or amend your last commit for
the CI job that checks the PR title to pick up the new PR title.
- [x] passes `flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py`
- [x] passes `black --check modin/ asv_bench/benchmarks
scripts/doc_checker.py`
- [x] signed commit with `git commit -s` <!-- you can amend your commit
with a signature via `git commit --amend -s` -->
- [x] Resolves #7472
- [x] tests added and passing
- [x] module layout described at `docs/development/architecture.rst` is
up-to-date <!-- if you have added, renamed or removed files or
directories please update the documentation accordingly -->
---------
Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
Co-authored-by: Jonathan Shi <149419494+sfc-gh-joshi@users.noreply.github.com>
Add `get_backend()` to get the backend for a dataframe or series. Add `set_backend()`, and its alias `move_to()`, to set the backend of a dataframe or series.
To implement `set_backend()`, extend `FactoryDispatcher` so that it can dispatch I/O operations to the backend that the user chooses instead of always using `modin.config.Backend`. `set_backend()` can then use `FactoryDispatcher.from_pandas(backend=new_backend)` to get a query compiler with the given backend.
This commit also updates the documentation for "native" execution mode to reflect the updated guidance of using `Backend` to control execution. It also adds examples of using `get_backend()` and `set_backend()`.
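The dispatch idea behind `set_backend()` can be sketched as below. All class and function names here are simplified stand-ins for illustration, not Modin's real implementation: a registry maps each backend to a factory, and `from_pandas(backend=...)` routes to the chosen backend instead of a global default.

```python
# Toy sketch of backend-parameterized I/O dispatch (simplified names).
class FactoryDispatcher:
    _factories = {}

    @classmethod
    def register(cls, backend, factory):
        cls._factories[backend] = factory

    @classmethod
    def from_pandas(cls, pandas_obj, backend):
        # Dispatch to the backend the caller chose, not a global config default.
        return cls._factories[backend](pandas_obj)


class Frame:
    def __init__(self, data, backend):
        self._data, self._backend = data, backend

    def get_backend(self):
        return self._backend

    def set_backend(self, backend):
        # Materialize to pandas-like data, then rebuild on the new backend.
        return FactoryDispatcher.from_pandas(self._data, backend)

    move_to = set_backend  # alias


FactoryDispatcher.register("Native", lambda data: Frame(data, "Native"))
FactoryDispatcher.register("Ray", lambda data: Frame(data, "Ray"))
```

Here `frame.set_backend("Native")` produces a new frame whose `get_backend()` reports `"Native"`, mirroring the user-facing behavior described above.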
Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
# User-facing changes
Prior to this commit, users had to switch the config variable `NativeDataFrameMode` from the default of `"Default"` to `"Pandas"` to use native execution. Now native execution is another Modin execution mode, with a `StorageFormat` of `"Native"` and an `Engine` of `"Native"`.
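Assuming the standard `modin.config` interface, enabling the new native mode would look something like this config fragment (shown for illustration; requires Modin installed):

```python
import modin.config as cfg

# Select native (single-process, pandas-based) execution via the new mode.
cfg.StorageFormat.put("Native")
cfg.Engine.put("Native")
```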
# Integration tests and CI
Prior to this commit, we ran 1) a [set of tests](8a832de870/modin/tests/pandas/native_df_mode) checking that native Modin dataframes could interoperate with non-native dataframes, and 2) a [subset](8a832de870/.github/workflows/ci.yml (L710-L719)) of tests in native dataframe mode.
Now, we still run the interoperability test suite, but we also run the entire rest of the test suite (except for some partitioned-execution-only tests) in native execution mode via the `test-all` job matrix. This commit also renames the interoperability test suite from `modin/tests/pandas/native_df_mode/` to `modin/tests/pandas/native_df_interoperability/`.
# Deleting most of the NativeQueryCompiler implementation
`NativeQueryCompiler` had a long implementation that was mostly the same as the `BaseQueryCompiler` implementation. However, `NativeQueryCompiler` had some bugs, including correctness bugs related to copying the underlying pandas dataframe (see #7435). This commit deletes most of the `NativeQueryCompiler` implementation, so that the native query compiler mostly works just like `BaseQueryCompiler`. The main difference is that while `BaseQueryCompiler` uses a partitioned pandas dataframe (under `Python` execution, so all in a single process), the native query compiler does not use partitions.
# Warning messages about default to pandas
While `BaseQueryCompiler` and `BaseIO` warn when they default to pandas, they should not do so under native execution. We add class-level fields to these classes that control whether to warn when defaulting to pandas.
By default, we treat warnings as errors in our test suite, so in many places we have to look for the default-to-pandas warning only if we are not in native execution mode. For convenience, this PR adds testing utility methods to 1) detect the global native execution mode, 2) detect whether a dataframe or series is using native execution, and 3) conditionally expect a warning about defaulting to pandas.
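The class-level warning flag and the conditional-warning test utility could be sketched as below. The names are simplified stand-ins for illustration, not the real Modin classes or helpers:

```python
import warnings
from contextlib import contextmanager


# Toy sketch: a class-level flag controls whether a query compiler warns
# when it falls back to pandas (simplified names, not Modin's real classes).
class BaseQueryCompiler:
    warn_on_default_to_pandas = True

    def default_to_pandas(self):
        if self.warn_on_default_to_pandas:
            warnings.warn("Defaulting to pandas", UserWarning)
        return "pandas result"


class NativeQueryCompiler(BaseQueryCompiler):
    # Native execution is already pandas-based, so falling back is expected.
    warn_on_default_to_pandas = False


@contextmanager
def expect_default_to_pandas_warning(is_native):
    """Expect the warning under partitioned execution; forbid it under native."""
    if is_native:
        with warnings.catch_warnings():
            warnings.simplefilter("error")  # any warning becomes an error
            yield
    else:
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            yield
            assert any(
                "Defaulting to pandas" in str(w.message) for w in caught
            ), "expected a default-to-pandas warning"
```

A test can then wrap the operation in `expect_default_to_pandas_warning(...)` and pass under both execution modes without special-casing the warning check inline.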
Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
- Use Miniforge3 instead of Mambaforge in conda-incubator/setup-miniconda action, per https://github.com/conda-incubator/setup-miniconda/issues/383
- Skip test that tries to use Modin with unidist installed via APT. See #7421 for details.
- Remove manual conda package caching, an optimization that was added to speed up CI but now causes an error complaining that `CONDA_PKGS_DIR` is empty.
- Fix some mypy errors in modin/__init__.py