apache / spark

Apache Spark - A unified analytics engine for large-scale data processing


[SPARK-46386][PYTHON] Improve assertions of observation (pyspark.sql.observation)

### What changes were proposed in this pull request?
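Improve the assertions in `pyspark.sql.observation` so that misuse of `Observation` raises `PySparkAssertionError` with a dedicated error class, and add tests for these cases.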

### Why are the changes needed?
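Better error handling: misusing an `Observation` now fails with a `PySparkAssertionError` that carries an error class and a descriptive message.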
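The gain is that each failure carries a stable, machine-readable error class next to a human-readable message, rendered as `[ERROR_CLASS] message` in the tracebacks shown in this PR. The following is a minimal, hypothetical sketch of that pattern in plain Python: `ERROR_CLASSES` and `SketchAssertionError` are illustrative names, not PySpark internals, and only the two message templates are copied from this PR.

```py
from typing import Any, Dict

# Hypothetical registry; only these two templates are taken from this PR.
ERROR_CLASSES: Dict[str, str] = {
    "NO_OBSERVE_BEFORE_GET": "Should observe by calling `DataFrame.observe` before `get`.",
    "REUSE_OBSERVATION": "An Observation can be used with a DataFrame only once.",
}


class SketchAssertionError(AssertionError):
    """Hypothetical error carrying a machine-readable class and a rendered message."""

    def __init__(self, error_class: str, message_parameters: Dict[str, Any]) -> None:
        self.error_class = error_class
        self.message_parameters = message_parameters
        # Fill any {placeholders} in the template, then prefix with the class name.
        message = ERROR_CLASSES[error_class].format(**message_parameters)
        super().__init__(f"[{error_class}] {message}")


try:
    raise SketchAssertionError("REUSE_OBSERVATION", {})
except AssertionError as e:  # still catchable as a plain AssertionError
    print(e.error_class)  # REUSE_OBSERVATION
    print(e)  # [REUSE_OBSERVATION] An Observation can be used with a DataFrame only once.
```

Subclassing `AssertionError` in the sketch keeps existing `except AssertionError` handlers working, while callers that care can branch on `error_class` instead of parsing message text.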

### Does this PR introduce _any_ user-facing change?
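Yes. Given an existing DataFrame `df`, `PySparkAssertionError` is now raised when `get` is accessed before `DataFrame.observe` has been called, and when an `Observation` is used with a DataFrame a second time: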

```py
>>> from pyspark.sql import Observation
>>> from pyspark.sql.functions import count, lit
>>> observation = Observation()
>>> observation.get
Traceback (most recent call last):
...
pyspark.errors.exceptions.base.PySparkAssertionError: [NO_OBSERVE_BEFORE_GET] Should observe by calling `DataFrame.observe` before `get`.
>>> df.observe(observation, count(lit(1)))
DataFrame[id: bigint, val: double, label: string]
>>> df.observe(observation, count(lit(1)))
Traceback (most recent call last):
...
    raise PySparkAssertionError(error_class="REUSE_OBSERVATION", message_parameters={})
pyspark.errors.exceptions.base.PySparkAssertionError: [REUSE_OBSERVATION] An Observation can be used with a DataFrame only once.
```
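Both checks amount to a small state machine: an observation is either unattached or attached to exactly one DataFrame. The following is a minimal sketch under that assumption, runnable without Spark. `SketchObservation`, its `attach` method (a stand-in for `DataFrame.observe`), and `SketchAssertionError` are hypothetical names; `get` is modeled as a property, as in PySpark, but unlike PySpark the sketch does not wait for an action to produce the metrics.

```py
from typing import Any, Dict, Optional


class SketchAssertionError(AssertionError):
    """Hypothetical stand-in for PySparkAssertionError."""

    def __init__(self, error_class: str, message: str) -> None:
        self.error_class = error_class
        super().__init__(f"[{error_class}] {message}")


class SketchObservation:
    """Hypothetical model of the Observation lifecycle; no Spark involved."""

    def __init__(self) -> None:
        # None means "not yet attached to any DataFrame".
        self._metrics: Optional[Dict[str, Any]] = None

    def attach(self, metrics: Dict[str, Any]) -> None:
        # Stand-in for DataFrame.observe(observation, ...): only one attach allowed.
        if self._metrics is not None:
            raise SketchAssertionError(
                "REUSE_OBSERVATION",
                "An Observation can be used with a DataFrame only once.",
            )
        self._metrics = dict(metrics)

    @property
    def get(self) -> Dict[str, Any]:
        # Reading metrics before any attach is a usage error, not a None result.
        if self._metrics is None:
            raise SketchAssertionError(
                "NO_OBSERVE_BEFORE_GET",
                "Should observe by calling `DataFrame.observe` before `get`.",
            )
        return dict(self._metrics)


obs = SketchObservation()
try:
    obs.get
except SketchAssertionError as e:
    print(e.error_class)  # NO_OBSERVE_BEFORE_GET

obs.attach({"count": 2})
try:
    obs.attach({"count": 3})
except SketchAssertionError as e:
    print(e.error_class)  # REUSE_OBSERVATION

print(obs.get)  # {'count': 2}
```

Note that the failed second `attach` leaves the first attachment intact, so the original metrics remain readable.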

### How was this patch tested?
Through the tests added in this PR.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #44324 from xinrong-meng/test_observe.

Authored-by: Xinrong Meng <xinrong@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Author: Xinrong Meng
Commit: 62d4bab3f2b30cbd5c87f0bb475f8b57e230e02e
Parent: 7004f9e
Committed by Hyukjin Kwon <gurwls223@apache.org> on 12/16/2023, 5:45:50 PM