I had a crazy issue where a Databricks job was taking over 13 hours and then failing, even though the same job had previously taken just over 1 hour on another workspace.

Turned out, after a bit of staring at logs, I had a facepalm moment because yet again I got bitten by spot instances/spot workers.

In the Spark UI I found these weird errors, “Executor 1 removed”, “Executor 2 removed”, and so on. At first I was thinking memory issues or asymmetric shuffle issues, but then I hovered my mouse over one and saw:

Executor 1 Removed at 2026/03/10 19:17:48
Reason: {"cause": "spot instance preemption","detectionMechanism": null}
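If there are a lot of these, hovering over each one gets old fast. Here is a rough sketch of how you could tally the causes from pasted tooltip text. It assumes every removal has a "Reason: {...}" line in the same JSON shape as the one above. The function name removal_causes and the regex are my own, not anything Databricks or Spark provides.

```python
import json
import re
from collections import Counter

# Matches the "Reason: {...}" line as it appears in the executor tooltip.
# Assumption: the JSON object sits on a single line.
REASON_RE = re.compile(r"Reason:\s*(\{.*\})")

def removal_causes(text):
    """Count executor removal causes found in pasted tooltip/log text."""
    causes = Counter()
    for match in REASON_RE.finditer(text):
        try:
            reason = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip anything that is not valid JSON
        causes[reason.get("cause", "unknown")] += 1
    return causes

# The single removal message quoted above.
sample = """Executor 1 Removed at 2026/03/10 19:17:48
Reason: {"cause": "spot instance preemption","detectionMechanism": null}"""

for cause, count in removal_causes(sample).items():
    print(f"{cause}: {count}")
```

If nearly every removal comes back as spot instance preemption, that points at the cluster's instance type rather than at memory or shuffle problems.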

Before that I had tried many solutions, including skew detection, since I suspected a skew issue:

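from pyspark.sql.functions import col, spark_partition_id

# Count rows per Spark partition, largest first, to spot skew
(df.groupBy(spark_partition_id().alias("partition"))
   .count()
   .orderBy(col("count").desc())
   .display())  # display() is Databricks-specific; use .show() elsewhere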