mpi4py 4.1 is incompatible with the mpich versions that we install on Ubuntu in CI.
Resolves #7611
---------
Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
Add an option to associate each base, dataframe, and series extension with a particular
backend. Also, add `register_base_extension()`.
Wrap each base, dataframe, and series method in a dispatcher that checks
the backend of its first argument and then dispatches to the
corresponding backend's extension, if one exists. For non-method
extensions, including properties and scalars (like ints), we have to
extend `__getattribute__`, `__getattr__`, `__delattr__`, and
`__setattr__` to implement the extension.
In the future, the dispatcher could choose an appropriate backend when
it has inputs from multiple backends. It could then convert all
arguments to the correct backend and then choose the extension matching
that backend.
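The dispatch mechanism described above can be sketched roughly as follows. This is an illustrative toy, not Modin's actual internals: `register_extension`, `dispatching_method`, and `Frame` are simplified stand-ins for the real registration and dispatch machinery.

```python
# Toy sketch of per-backend method dispatch (simplified names, not Modin's API).
_extensions = {}  # (backend, method_name) -> callable


def register_extension(backend, name):
    """Register the decorated function as `name`'s implementation for `backend`."""
    def decorator(func):
        _extensions[(backend, name)] = func
        return func
    return decorator


def dispatching_method(name, default_impl):
    """Build a method that routes to a backend-specific extension if registered."""
    def method(self, *args, **kwargs):
        # Check the backend of the first argument (self), then dispatch to
        # that backend's extension, falling back to the default implementation.
        impl = _extensions.get((self.backend, name), default_impl)
        return impl(self, *args, **kwargs)
    return method


class Frame:
    def __init__(self, backend):
        self.backend = backend

    sum = dispatching_method("sum", lambda self: "default sum")


@register_extension("Fast", "sum")
def _fast_sum(self):
    return "fast sum"
```

With this sketch, `Frame("Fast").sum()` routes to `_fast_sum`, while any other backend falls through to the default implementation.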
- [x] first commit message and PR title follow format outlined
[here](https://modin.readthedocs.io/en/latest/development/contributing.html#commit-message-formatting)
> **_NOTE:_** If you edit the PR title to match this format, you need to
add another commit (even if it's empty) or amend your last commit for
the CI job that checks the PR title to pick up the new PR title.
- [x] passes `flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py`
- [x] passes `black --check modin/ asv_bench/benchmarks
scripts/doc_checker.py`
- [x] signed commit with `git commit -s` <!-- you can amend your commit
with a signature via `git commit --amend -s` -->
- [x] Resolves #7472
- [x] tests added and passing
- [x] module layout described at `docs/development/architecture.rst` is
up-to-date <!-- if you have added, renamed or removed files or
directories please update the documentation accordingly -->
---------
Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
Co-authored-by: Jonathan Shi <149419494+sfc-gh-joshi@users.noreply.github.com>
Add `get_backend()` to get the backend for a dataframe or series. Add `set_backend()`, and its alias `move_to()`, to set the backend of a dataframe or series.
To implement `set_backend()`, extend `FactoryDispatcher` so that it can dispatch I/O operations to the backend that the user chooses instead of always using `modin.config.Backend`. `set_backend()` can then use `FactoryDispatcher.from_pandas(backend=new_backend)` to get a query compiler with the given backend.
This commit also updates the documentation for "native" execution mode to reflect the updated guidance of using `Backend` to control execution. It also adds examples of using `get_backend()` and `set_backend()`.
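The dispatch idea behind `set_backend()` can be sketched as below. All class and function names here are simplified stand-ins for illustration, not Modin's real implementation: a registry maps each backend to a factory, and `from_pandas(backend=...)` routes to the chosen backend instead of a global default.

```python
# Toy sketch of backend-parameterized I/O dispatch (simplified names).
class FactoryDispatcher:
    _factories = {}

    @classmethod
    def register(cls, backend, factory):
        cls._factories[backend] = factory

    @classmethod
    def from_pandas(cls, pandas_obj, backend):
        # Dispatch to the backend the caller chose, not a global config default.
        return cls._factories[backend](pandas_obj)


class Frame:
    def __init__(self, data, backend):
        self._data, self._backend = data, backend

    def get_backend(self):
        return self._backend

    def set_backend(self, backend):
        # Materialize to pandas-like data, then rebuild on the new backend.
        return FactoryDispatcher.from_pandas(self._data, backend)

    move_to = set_backend  # alias


FactoryDispatcher.register("Native", lambda data: Frame(data, "Native"))
FactoryDispatcher.register("Ray", lambda data: Frame(data, "Ray"))
```

Here `frame.set_backend("Native")` produces a new frame whose `get_backend()` reports `"Native"`, mirroring the user-facing behavior described above.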
Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
# User-facing changes
Prior to this commit, users had to switch the config variable `NativeDataFrameMode` from the default of `"Default"` to `"Pandas"` to use native execution. Now native execution is another Modin execution mode, with a `StorageFormat` of `"Native"` and an `Engine` of `"Native"`.
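Assuming the standard `modin.config` interface, enabling the new native mode would look something like this config fragment (shown for illustration; requires Modin installed):

```python
import modin.config as cfg

# Select native (single-process, pandas-based) execution via the new mode.
cfg.StorageFormat.put("Native")
cfg.Engine.put("Native")
```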
# Integration tests and CI
Prior to this commit, we ran 1) a [set of tests](8a832de870/modin/tests/pandas/native_df_mode) checking that native Modin dataframes could interoperate with non-native dataframes, and 2) a [subset](8a832de870/.github/workflows/ci.yml (L710-L719)) of tests in native dataframe mode.
Now, we still run the interoperability test suite, but we also run the entire rest of the test suite (except for some partitioned-execution-only tests) in native execution mode via the `test-all` job matrix. This commit also renames the interoperability test suite from `modin/tests/pandas/native_df_mode/` to `modin/tests/pandas/native_df_interoperability/`.
# Deleting most of the NativeQueryCompiler implementation
`NativeQueryCompiler` had a long implementation that was mostly the same as the `BaseQueryCompiler` implementation. However, `NativeQueryCompiler` had some bugs, including correctness bugs related to copying the underlying pandas dataframe (see #7435). This commit deletes most of the `NativeQueryCompiler` implementation, so that the native query compiler mostly works just like `BaseQueryCompiler`. The main difference is that while `BaseQueryCompiler` uses a partitioned pandas dataframe (under `Python` execution, so all in a single process), the native query compiler does not use partitions.
# Warning messages about default to pandas
While `BaseQueryCompiler` and `BaseIO` warn when they default to pandas, they should not do so under native execution. We add class-level fields to these classes that control whether to warn when defaulting to pandas.
By default, we treat warnings as errors in our test suite, so in many places we have to look for the default-to-pandas warning only if we are not in native execution mode. For convenience, this PR adds testing utility methods to 1) detect the global native execution mode, 2) detect whether a dataframe or series is using native execution, and 3) conditionally expect a warning about defaulting to pandas.
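The class-level warning flag and the conditional-warning test utility could be sketched as below. The names are simplified stand-ins for illustration, not the real Modin classes or helpers:

```python
import warnings
from contextlib import contextmanager


# Toy sketch: a class-level flag controls whether a query compiler warns
# when it falls back to pandas (simplified names, not Modin's real classes).
class BaseQueryCompiler:
    warn_on_default_to_pandas = True

    def default_to_pandas(self):
        if self.warn_on_default_to_pandas:
            warnings.warn("Defaulting to pandas", UserWarning)
        return "pandas result"


class NativeQueryCompiler(BaseQueryCompiler):
    # Native execution is already pandas-based, so falling back is expected.
    warn_on_default_to_pandas = False


@contextmanager
def expect_default_to_pandas_warning(is_native):
    """Expect the warning under partitioned execution; forbid it under native."""
    if is_native:
        with warnings.catch_warnings():
            warnings.simplefilter("error")  # any warning becomes an error
            yield
    else:
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            yield
            assert any(
                "Defaulting to pandas" in str(w.message) for w in caught
            ), "expected a default-to-pandas warning"
```

A test can then wrap the operation in `expect_default_to_pandas_warning(...)` and pass under both execution modes without special-casing the warning check inline.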
Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
- Use Miniforge3 instead of Mambaforge in conda-incubator/setup-miniconda action, per https://github.com/conda-incubator/setup-miniconda/issues/383
- Skip test that tries to use Modin with unidist installed via APT. See #7421 for details.
- Remove manual conda package caching, an optimization that was added to speed up CI but now causes an error complaining that `CONDA_PKGS_DIR` is empty.
- Fix some mypy errors in modin/__init__.py