314 Commits

Author SHA1 Message Date
Mahesh Vashishtha
c046a92afc TEST-#7661: Run push-to-main ray tests in parallel. (#7662)
Resolves #7661

---------

Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
2025-09-09 16:35:44 -07:00
Mahesh Vashishtha
eb2bbb4d7e TEST-#7611: Cap mpi4py<4.1 in CI. (#7614)
mpi4py 4.1 is incompatible with the mpich versions that we install on ubuntu in CI.

Resolves #7611

---------

Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
2025-06-27 13:38:17 -07:00
sfc-gh-mvashishtha
bf5f344074 Revert "FIX-#7582: [DRAFT] Test fix for array copy parameter on python 3.10"
This reverts commit b3481645d3, which @sfc-gh-mvashishtha accidentally force-pushed.
2025-05-27 15:42:56 -07:00
sfc-gh-mvashishtha
b3481645d3 FIX-#7582: [DRAFT] Test fix for array copy parameter on python 3.10
Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
2025-05-27 15:40:33 -07:00
Mahesh Vashishtha
21117c6cc2 FEAT-#7472: Add an option to register dataframe and series accessors with a particular backend. (#7473)
Add an option to associate each base, dataframe, and series extension with a particular
backend. Also, add `register_base_extension()`.

Wrap each base, dataframe and series method in a dispatcher that checks
the backend of its first argument and then dispatches to the
corresponding backend's extension, if it exists. For non-method
extensions, including properties and scalars (like ints), we have to
extend `__getattribute__`, `__getattr__`, `__delattr__`, and
`__setattr__` to implement the extension.

In the future, the dispatcher could choose an appropriate backend when
it has inputs from multiple backends. It could then convert all
arguments to the correct backend and then choose the extension matching
that backend.
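
The dispatch idea described above can be sketched as follows. This is a minimal illustration, not Modin's implementation: `register_extension`, `dispatching`, the `Frame` class, and the `_backend` attribute are hypothetical names chosen for the example.

```python
from functools import wraps

# Hypothetical registry: (method name, backend name) -> extension function.
_EXTENSIONS = {}


def register_extension(name, backend):
    """Register the decorated function as the extension for `name` on one backend."""

    def decorator(func):
        _EXTENSIONS[(name, backend)] = func
        return func

    return decorator


def dispatching(method):
    """Wrap `method` so it defers to a backend-specific extension when one exists."""

    @wraps(method)
    def wrapper(self, *args, **kwargs):
        # `_backend` is an assumed attribute naming the backend of the first argument.
        extension = _EXTENSIONS.get((method.__name__, self._backend))
        if extension is not None:
            return extension(self, *args, **kwargs)
        # No extension registered for this backend: use the default method.
        return method(self, *args, **kwargs)

    return wrapper


class Frame:
    def __init__(self, data, backend):
        self.data = data
        self._backend = backend

    @dispatching
    def total(self):
        return sum(self.data)


@register_extension("total", backend="Fast")
def _fast_total(frame):
    # Stand-in for a backend-specific implementation.
    return ("fast", sum(frame.data))
```

Because the registry is consulted at call time, an extension registered after the class is defined still takes effect. Properties and scalar attributes cannot be wrapped this way, which is why the commit extends the attribute-access hooks instead.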

- [x] first commit message and PR title follow format outlined
[here](https://modin.readthedocs.io/en/latest/development/contributing.html#commit-message-formatting)
> **_NOTE:_** If you edit the PR title to match this format, you need to
add another commit (even if it's empty) or amend your last commit for
the CI job that checks the PR title to pick up the new PR title.
- [x] passes `flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py`
- [x] passes `black --check modin/ asv_bench/benchmarks
scripts/doc_checker.py`
- [x] signed commit with `git commit -s` <!-- you can amend your commit
with a signature via `git commit --amend -s` -->
- [x] Resolves #7472
- [x] tests added and passing
- [x] module layout described at `docs/development/architecture.rst` is
up-to-date <!-- if you have added, renamed or removed files or
directories please update the documentation accordingly -->

---------

Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
Co-authored-by: Jonathan Shi <149419494+sfc-gh-joshi@users.noreply.github.com>
2025-03-24 17:12:48 -07:00
John Kew
6252ebde19 FEAT-#7445: Add metrics interface so third-parties can collect metrics from the modin frontend (#7444)
Adds an interface for collecting frontend API statistics by third party systems. Specifically this will be used by Snowflake's pandas engine for Modin to collect data on interactive use cases.
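
As a rough illustration of what such an interface can look like, the sketch below shows a handler registry that a third party could hook into. The names `add_handler` and `emit` and the metric name are hypothetical and are not taken from the Modin API.

```python
from typing import Callable, List, Union

# A handler receives a metric name and a numeric value.
MetricHandler = Callable[[str, Union[int, float]], None]

_handlers: List[MetricHandler] = []


def add_handler(handler: MetricHandler) -> None:
    """Register a third-party callback that receives every emitted metric."""
    _handlers.append(handler)


def emit(name: str, value: Union[int, float]) -> None:
    """Send one metric to every registered handler."""
    for handler in list(_handlers):
        try:
            handler(name, value)
        except Exception:
            # Design choice for this sketch: a failing collector is skipped
            # so that it cannot break the user's workload.
            continue


# Usage: collect metrics into a list.
collected = []
add_handler(lambda name, value: collected.append((name, value)))
emit("dataframe.sum.duration_seconds", 0.25)
```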

Signed-off-by: John Kew <john.kew@snowflake.com>
Co-authored-by: Jonathan Shi <149419494+sfc-gh-joshi@users.noreply.github.com>
Co-authored-by: Mahesh Vashishtha <mahesh.vashishtha@snowflake.com>
Co-authored-by: Devin Petersohn <devin.petersohn@snowflake.com>
2025-02-27 13:50:39 -08:00
Mahesh Vashishtha
8c7799fdbb TEST-#7441: Correctly skip sanity tests if we don't need them. (#7442)
The "sanity" tests for ray, dask, and unidist are supposed to run a subset of the "test-all" suite. If we detect a change to a particular engine, we're supposed to run the `test-all` job using that engine. However, if we don't detect a change, we're supposed to run the "test-sanity" job using that engine.

Prior to this commit, we always ran the sanity jobs because the `if` key-value pairs [here](4152c95e0b/.github/workflows/ci.yml (L563-L575)) had no effect. This commit uses [test matrix exclusions](https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/running-variations-of-jobs-in-a-workflow) to skip executions that we don't need to run.

## Testing

I have validated that we only run the expected sanity tests in the following scenarios, which are older commits on the branch I am trying to merge here:

- change ray, dask, and unidist => [skip all sanity tests](https://github.com/modin-project/modin/actions/runs/13320322721/job/37203660664?pr=7442)
- change ray and unidist only => [run dask sanity test only](https://github.com/modin-project/modin/actions/runs/13335886632/job/37250965225?pr=7442)
- change dask and ray only => [run unidist sanity tests only](https://github.com/modin-project/modin/actions/runs/13336232376/job/37252017095?pr=7442)
- change dask and unidist only => [run ray sanity tests only](https://github.com/modin-project/modin/actions/runs/13336496671/job/37252848023?pr=7442)
- change unidist only => [run ray and dask sanity tests only](https://github.com/modin-project/modin/actions/runs/13335936087/job/37251081720?pr=7442)
- change ray only => [run unidist and dask sanity tests only](https://github.com/modin-project/modin/actions/runs/13336041245/job/37251429105?pr=7442)
- change dask only => [run ray and unidist sanity tests only](https://github.com/modin-project/modin/actions/runs/13336182248/job/37251864934?pr=7442)
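
The rule that these scenarios exercise can be modeled in a few lines. This is a Python sketch of the selection logic only, not the workflow YAML; `ENGINES` and `sanity_engines` are hypothetical names.

```python
# Engines that have both a "test-all" job and a "test-sanity" job.
ENGINES = ("ray", "dask", "unidist")


def sanity_engines(changed):
    """Return the engines whose sanity job should run.

    An engine with detected changes runs the full test-all job instead,
    so its sanity job is excluded from the matrix.
    """
    return [engine for engine in ENGINES if engine not in changed]
```

For example, `sanity_engines({"ray", "unidist"})` leaves only the dask sanity job, matching the second scenario in the list.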
2025-02-14 15:22:57 -08:00
Mahesh Vashishtha
4152c95e0b FEAT-#7433: Replace NativeDataFrameMode with a complete "native" execution. (#7436)
# User-facing changes

Prior to this commit, users had to switch the config variable `NativeDataFrameMode` from the default of `"Default"` to `"Pandas"` to use native execution. Now native execution is another modin execution mode with `StorageFormat` of `"Native"` and `Engine` of `"Native"`.
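
One way to select this mode is through environment variables, sketched below. It assumes that the `Engine` and `StorageFormat` config variables are read from `MODIN_ENGINE` and `MODIN_STORAGE_FORMAT`, and that the variables are set before Modin is first imported; `select_native_execution` is a hypothetical helper.

```python
import os
from typing import MutableMapping


def select_native_execution(
    environ: MutableMapping[str, str] = os.environ,
) -> MutableMapping[str, str]:
    """Set the (assumed) Modin environment variables for native execution."""
    environ["MODIN_ENGINE"] = "Native"
    environ["MODIN_STORAGE_FORMAT"] = "Native"
    return environ
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to check without touching the process environment.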

# Integration tests and CI

Prior to this commit, we ran 1) a [set of tests](8a832de870/modin/tests/pandas/native_df_mode) checking that native Modin dataframes could interoperate with non-native dataframes and 2) a [subset](8a832de870/.github/workflows/ci.yml (L710-L719)) of tests in native dataframe mode.

Now, we run the interoperability test suite, but also run the entire rest of the test suite (except for some partitioned-execution-only tests) in native execution mode via the `test-all` job matrix. This commit also renames the interoperability test suite from modin/tests/pandas/native_df_mode to modin/tests/pandas/native_df_interoperability/.

# Deleting most of the NativeQueryCompiler implementation

NativeQueryCompiler had a long implementation which was mostly the same as the BaseQueryCompiler implementation. However, there were some bugs in NativeQueryCompiler, including some correctness bugs related to copying the underlying pandas dataframe (see #7435). This commit deletes most of the NativeQueryCompiler implementation, so that the native query compiler mostly works just like the BaseQueryCompiler. The main difference is that while `BaseQueryCompiler` uses a partitioned pandas dataframe (under the `Python` execution, so all in a single process), the native query compiler does not use partitions.

# Warning messages about default to pandas

While BaseQueryCompiler and BaseIO warn when they default to pandas, they should not do so when using native execution. We add class-level fields to these classes that tell whether to warn on default to pandas.

By default, we treat warnings as errors in our test suite, so in many places we have to look for the default to pandas warning only if we are not in native execution mode. For convenience, this PR adds testing utility methods to 1) detect the global native execution mode, 2) detect whether a dataframe or series is using native execution, and 3) conditionally expect a warning about defaulting to pandas.
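
A conditional warning check of this kind might look like the sketch below. It uses only the standard library; `warns_if`, `noisy`, and `quiet` are hypothetical names and are not the utilities added by this PR.

```python
import contextlib
import warnings


@contextlib.contextmanager
def warns_if(condition, category=UserWarning):
    """Require a warning of `category` if `condition` is true, and none otherwise."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        yield caught
    seen = any(issubclass(w.category, category) for w in caught)
    if condition and not seen:
        raise AssertionError(f"expected a {category.__name__}, but none was raised")
    if not condition and seen:
        raise AssertionError(f"unexpected {category.__name__} was raised")


def noisy():
    # Stand-in for an operation that defaults to pandas.
    warnings.warn("defaulting to pandas", UserWarning)
    return 1


def quiet():
    # Stand-in for the same operation under native execution.
    return 1
```

A test would then write `with warns_if(not using_native_execution): ...`, where `using_native_execution` stands for whatever flag the test suite uses to detect native mode.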

Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
2025-02-13 09:10:48 -08:00
Mahesh Vashishtha
cce5325b1f TEST-#7437: Check execution-filter outputs correctly in CI. (#7438)
Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
2025-02-10 11:01:27 -08:00
Anatoly Myachev
02515bb8e6 TEST-#7421: Fix unidist with APT-installed MPI (#7423)
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
2025-01-21 10:18:55 -08:00
Mahesh Vashishtha
caa6116d64 TEST-#7419: Fix a few errors in CI (#7420)
- Use Miniforge3 instead of Mambaforge in conda-incubator/setup-miniconda action, per https://github.com/conda-incubator/setup-miniconda/issues/383
- Skip test that tries to use Modin with unidist installed via APT. See #7421 for details.
- Remove manual conda packaging caching, an optimization which was added to speed up CI but now causes an error complaining that `CONDA_PKGS_DIR` is empty.
- Fix some mypy errors in modin/__init__.py
2025-01-15 20:46:37 -08:00
Iaroslav Igoshev
33577098af FIX-#7389: Fix uploading artifacts (#7390)
Signed-off-by: Igoshev, Iaroslav <iaroslav.igoshev@intel.com>
2024-09-09 16:32:37 +02:00
Iaroslav Igoshev
f3c0a63579 FIX-#7387: Limit the number of pytest workers for tests with Ray engine on Windows (#7388)
Signed-off-by: Igoshev, Iaroslav <iaroslav.igoshev@intel.com>
2024-09-06 18:34:43 +02:00
Arun Jose
cf5d638ec7 FEAT-#7308: Interoperability between query compilers (#7376)
Co-authored-by: Anatoly Myachev <anatoliimyachev@mail.com>
Co-authored-by: Igoshev, Iaroslav <iaroslav.igoshev@intel.com>
Signed-off-by: arunjose696 <arunjose696@gmail.com>
2024-09-02 14:29:23 +02:00
Arun Jose
da015711d9 FEAT-#4605: Add native query compiler (#7259)
Co-authored-by: Igoshev, Iaroslav <iaroslav.igoshev@intel.com>
Signed-off-by: arunjose696 <arunjose696@gmail.com>
2024-08-26 15:26:11 +02:00
Anatoly Myachev
8fc230a0a6 FIX-#7373: Try a previous version of motoserver/moto service, pin to 5.0.13 (#7374)
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
2024-08-26 13:34:37 +02:00
Devin Petersohn
7c1dde0716 FEAT-#7331: Initial Polars API (#7332)
* FEAT-#7331: Initial Polars API

This commit adds a polars namespace to Modin, and the DataFrame and
Series objects and their respective APIs. This doesn't include error
handling and is still missing several polars features:

* LazyFrame
* Expressions
* String, Temporal, Struct, and other Series accessors
* Several parameters
* Operators that we don't have query compiler methods for
   * e.g. sin, cos, tan, etc.

Those will be handled in a future PR.

Signed-off-by: Devin Petersohn <devin.petersohn@snowflake.com>

* Lint
* flake8
* isort
* headers
* forgot one
* Add test
* header
* isort
* Add to CI
* fix name
* Update modin/polars/base.py
* address comments
* polars 1
* Update for polars 1.x and fix some hacks
* Remove hax
* Black
* Address comments
* Lint
* Address comment

---------

Signed-off-by: Devin Petersohn <devin.petersohn@snowflake.com>
Co-authored-by: Devin Petersohn <devin.petersohn@snowflake.com>
Co-authored-by: Mahesh Vashishtha <mahesh.vashishtha@snowflake.com>
2024-07-24 15:39:20 -05:00
dependabot[bot]
c647cf4e19 FIX-#7320: Bump the github-actions group with 3 updates (#7319)
Bumps the github-actions group with 3 updates: [actions/cache](https://github.com/actions/cache), [Slashgear/action-check-pr-title](https://github.com/slashgear/action-check-pr-title) and [github/codeql-action](https://github.com/github/codeql-action).


Updates `actions/cache` from 2 to 4
- [Release notes](https://github.com/actions/cache/releases)
- [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md)
- [Commits](https://github.com/actions/cache/compare/v2...v4)

Updates `Slashgear/action-check-pr-title` from 3.0.0 to 4.3.0
- [Release notes](https://github.com/slashgear/action-check-pr-title/releases)
- [Commits](https://github.com/slashgear/action-check-pr-title/compare/v3.0.0...v4.3.0)

Updates `github/codeql-action` from 2 to 3
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/v2...v3)

---
updated-dependencies:
- dependency-name: actions/cache
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: github-actions
- dependency-name: Slashgear/action-check-pr-title
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: github-actions
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: github-actions
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-06-19 13:12:24 +02:00
Anatoly Myachev
d54f92790a TEST-#7316: Run a subset of CI tests with python 3.10 and 3.11 on a scheduled basis (#7289)
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
2024-06-17 13:12:58 +02:00
Anatoly Myachev
dea0003210 FEAT-#7265: Automatic publication of Modin wheel to PyPI (#7262)
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Co-authored-by: Devin Petersohn <devin.petersohn@snowflake.com>
2024-06-06 18:27:37 +02:00
Iaroslav Igoshev
685f7badf9 FIX-#7272: Remove HDK engine (#7275)
Signed-off-by: Igoshev, Iaroslav <iaroslav.igoshev@intel.com>
2024-05-16 16:44:52 +02:00
Anatoly Myachev
df81f3a37d FIX-#7240: Allow doc_checker.py to work with functools.cached_property (#7241)
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
2024-05-08 12:06:52 +02:00
Anatoly Myachev
9ca33b4107 FEAT-#6890: Modin implementation of DataFrame API standard (#7216)
Co-authored-by: Iaroslav Igoshev <Poolliver868@mail.ru>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
2024-04-25 15:30:24 +02:00
Iaroslav Igoshev
30d75d72ff FEAT-#7187: Change "master" branch to "main" (#7188)
Co-authored-by: Anatoly Myachev <anatoliimyachev@mail.com>
Signed-off-by: Igoshev, Iaroslav <iaroslav.igoshev@intel.com>
2024-04-17 09:49:46 +02:00
Anatoly Myachev
d1d2130ef7 TEST-#7191: Fix ASV after changing default branch (#7190)
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
2024-04-16 22:40:15 +02:00
Mahesh Vashishtha
2b046e4038 TEST-#7165: Add codecov token to fix CI on master (#7175)
Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
2024-04-12 00:02:01 +02:00
Anatoly Myachev
d239010080 TEST-#7173: Update github actions (#7168)
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
2024-04-11 13:46:51 +02:00
Anatoly Myachev
975b32ce16 TEST-#7166: Fix HDF tests in CI (#7167)
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
2024-04-11 10:17:19 +02:00
Anatoly Myachev
d8cd9ce85c TEST-#3622: Centralize tests in Modin (#7137)
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
2024-04-09 13:12:30 +02:00
Dmitry Chigarev
629bf9d72a FEAT-#7090: Add range-partitioning implementation for '.unique()' and '.drop_duplicates()' (#7091)
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
2024-03-19 13:44:29 +01:00
Dmitry Chigarev
a96639529a FEAT-#6965: Implement .merge() using range-partitioning implementation (#6966)
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
2024-03-01 12:10:09 +01:00
Devin Petersohn
cee9b98198 FEAT-#3044: Create Extensions Module in Modin (#6961)
* FEAT-#6960: Create Extensions Module in Modin

---------

Signed-off-by: Devin Petersohn <devin.petersohn@snowflake.com>
Co-authored-by: Iaroslav Igoshev <Poolliver868@mail.ru>
2024-03-01 09:45:37 +01:00
Dmitry Chigarev
b875991755 FIX-#6946: Remove 'needs: [lint-black-isort, ...]' (#6947)
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
2024-02-19 18:21:50 +01:00
Dmitry Chigarev
5ec075dbf8 FIX-#6944: Apply 'isort' formatting for scripts from tutorials (#6945)
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
2024-02-19 16:33:30 +01:00
Iaroslav Igoshev
e55e6a0cc2 TEST-#6920: Remove testing for Ray client (#6921)
Signed-off-by: Igoshev, Iaroslav <iaroslav.igoshev@intel.com>
2024-02-06 18:07:03 +01:00
Anatoly Myachev
128b286e05 TEST-#6885: Switch to black>=24.1.0 (#6887)
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
2024-01-26 16:59:40 +01:00
Anatoly Myachev
097ea527c8 FIX-#6830: Pass AWS related env vars to mpiexec (#6867)
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
2024-01-19 14:49:41 +01:00
Anatoly Myachev
ff13d6f6fc TEST-#6777: Make to_csv tests on Unidist more stable (for test-all-unidist CI job) (#6851)
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
2024-01-11 14:59:38 +01:00
Anatoly Myachev
31f8bd0e69 REFACTOR-#6812: Remove 'PyarrowOnRay' execution in favour of pyarrow-backed pandas dataframes (#6848)
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
2024-01-10 15:27:42 +01:00
Anatoly Myachev
c3a4f781ee FEAT-#6767: Provide the ability to use experimental functionality when experimental mode is not enabled globally via an environment variable (#6764)
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
2023-12-08 17:31:14 +01:00
Anatoly Myachev
76d741bec2 TEST-#6777: Make to_csv tests on Unidist more stable (#6776)
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
2023-11-29 14:35:35 +01:00
Iaroslav Igoshev
97d88b2109 FEAT-#6735: Make Modin on MPI through unidist component more obvious (#6736)
Signed-off-by: Igoshev, Iaroslav <iaroslav.igoshev@intel.com>
2023-11-17 20:49:32 +01:00
Iaroslav Igoshev
7eeb9b782f FIX-#6587: Use different env files for unidist engine for windows and linux (#6588)
Signed-off-by: Igoshev, Iaroslav <iaroslav.igoshev@intel.com>
2023-09-21 13:20:35 +02:00
Anatoly Myachev
5febff397d TEST-#4270: revert disabling time_groupby_agg_nunique ASV bench (#6564)
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
2023-09-18 13:57:20 +02:00
Anatoly Myachev
0e9cfdb257 REFACTOR-#4902: use isort (#6551)
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
2023-09-15 18:10:42 +02:00
Anatoly Myachev
9886c0124b FIX-#6549: remove usage of dfsql module (#6550)
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Co-authored-by: Vasily Litvinov <fam1ly.n4me@yandex.ru>
2023-09-12 15:44:27 +02:00
Anatoly Myachev
56638d2c70 TEST-#6477: Update ASV to 0.5.1 (#6432)
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
2023-09-05 09:55:14 +03:00
Anatoly Myachev
de65e832e9 TEST-#0000: download ray wheel for python 3.9 (#6513)
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
2023-08-28 08:11:35 -05:00
Anatoly Myachev
cfac1d7723 FEAT-#6511: update the minimum supported python version up to 3.9 (#6508)
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
2023-08-28 11:11:41 +03:00
Anatoly Myachev
a9869a77c5 TEST-#6505: update python version for ASV benchmarks on HDK (#6504)
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
2023-08-24 17:56:08 +02:00