Modin: Scale your Pandas workflows by changing a single line of code
FEAT-#7433: Replace NativeDataFrameMode with a complete "native" execution. (#7436)
# User-facing changes

Prior to this commit, users had to switch the config variable `NativeDataFrameMode` from the default of `"Default"` to `"Pandas"` to use native execution. Now native execution is another Modin execution mode, with a `StorageFormat` of `"Native"` and an `Engine` of `"Native"`.

# Integration tests and CI

Prior to this commit, we ran:

1) a [set of tests](https://github.com/modin-project/modin/tree/8a832de870243294c407dee6300d993647205ff3/modin/tests/pandas/native_df_mode) checking that native Modin dataframes could interoperate with non-native dataframes
2) a [subset](https://github.com/modin-project/modin/blob/8a832de870243294c407dee6300d993647205ff3/.github/workflows/ci.yml#L710-L719) of tests in native dataframe mode

Now we still run the interoperability test suite, but we also run the entire rest of the test suite (except for some partitioned-execution-only tests) in native execution mode via the `test-all` job matrix. This commit also renames the interoperability test suite from `modin/tests/pandas/native_df_mode` to `modin/tests/pandas/native_df_interoperability/`.

# Deleting most of the NativeQueryCompiler implementation

NativeQueryCompiler had a long implementation that was mostly the same as the BaseQueryCompiler implementation. However, there were some bugs in NativeQueryCompiler, including some correctness bugs related to copying the underlying pandas dataframe (see #7435). This commit deletes most of the NativeQueryCompiler implementation, so that the native query compiler mostly works just like the BaseQueryCompiler. The main difference is that while `BaseQueryCompiler` uses a partitioned pandas dataframe (under the `Python` execution, so all in a single process), the native query compiler does not use partitions.

# Warning messages about default to pandas

While BaseQueryCompiler and BaseIO warn when they default to pandas, they should not do so when using native execution. We add class-level fields to these classes that tell whether to warn on default to pandas. By default, we treat warnings as errors in our test suite, so in many places we have to look for the default to pandas warning only if we are not in native execution mode. For convenience, this PR adds testing utility methods to 1) detect the global native execution mode, 2) detect whether a dataframe or series is using native execution, and 3) conditionally expect a warning about defaulting to pandas.

Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
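
For illustration only, the warning behavior described under "Warning messages about default to pandas" might be sketched roughly as below. This is a standalone sketch using only the standard library, not Modin's actual code: the class names `BaseCompilerSketch` and `NativeCompilerSketch`, the field `warn_on_default_to_pandas`, and the helper `expect_default_to_pandas_warning` are all hypothetical names chosen for this example.

```python
import contextlib
import warnings


class BaseCompilerSketch:
    # Hypothetical class-level flag; subclasses override it to silence the warning.
    warn_on_default_to_pandas = True

    def default_to_pandas(self, op_name):
        # Warn only when the class-level flag asks for it.
        if self.warn_on_default_to_pandas:
            warnings.warn(f"{op_name} is defaulting to pandas", UserWarning)
        return f"ran {op_name} with pandas"


class NativeCompilerSketch(BaseCompilerSketch):
    # Native execution already runs on pandas, so no warning is emitted.
    warn_on_default_to_pandas = False


@contextlib.contextmanager
def expect_default_to_pandas_warning(compiler):
    """Require the warning for non-native compilers and forbid it for native ones."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        yield
    warned = any("defaulting to pandas" in str(w.message) for w in caught)
    if warned != compiler.warn_on_default_to_pandas:
        raise AssertionError(
            f"expected warning={compiler.warn_on_default_to_pandas}, got {warned}"
        )


# The same test body works for both compilers.
for compiler in (BaseCompilerSketch(), NativeCompilerSketch()):
    with expect_default_to_pandas_warning(compiler):
        compiler.default_to_pandas("sum")
```

Keeping the flag at class level means a test can decide whether to expect the warning by inspecting the object it is given, without consulting global configuration.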
Mahesh Vashishtha committed
4152c95e0b94205015afd4e44563524b2d11f432
Parent: cce5325
Committed by GitHub <noreply@github.com>
on 2/13/2025, 5:10:48 PM