Unlock Spark's full potential on Google Cloud. Choose serverless ease or cluster control, boosted by high-speed processing, AI assistance, and seamless open lakehouse connectivity.
Benefits
Operational simplicity with serverless Spark
Google Cloud Serverless for Apache Spark offers instant autoscaling and near-zero configuration. Get a 3.6x query performance boost* with Lightning Engine (Preview). Dataplex Universal Catalog unifies metadata, simplifying operations.
Run Spark your preferred way
One size does not fit all. Google Cloud gives you the flexibility to choose between serverless, managed clusters, and compute clusters for your Spark workloads.
Key features
Use Google Cloud Serverless for Apache Spark to boost productivity and performance with Lightning Engine* and Gemini. This deeply integrated environment lets you run Apache Spark and SQL workloads directly from BigQuery. It provides unified security, runtime metadata through BigLake metastore, and governance through Dataplex Universal Catalog. Maximize productivity with integrated CI/CD and Gemini in notebooks, and eliminate Apache Spark cluster management.
* The queries are derived from the TPC-DS standard and TPC-H standard and as such are not comparable to published TPC-DS standard and TPC-H standard results, as these runs do not comply with all requirements of the TPC-DS standard and TPC-H standard specification.
Dataproc is a fully managed, highly scalable service for deploying and operating dedicated Spark and Hadoop clusters, along with an ecosystem of 30+ open source tools. Its integration with other Google Cloud products and services, including Lightning Engine for Dataproc on Google Compute Engine (premium tier), makes it ideal for data lake modernization, efficient ETL pipelines, and secure, large-scale data science initiatives where cluster control is paramount.
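As a rough illustration of the serverless versus cluster choice described above, the sketch below builds the gcloud command line for each path. It assumes the gcloud CLI is installed and authenticated; the script name, region, and cluster name are hypothetical placeholders, and real workloads typically need additional flags (dependencies, service accounts, and so on).

```python
from typing import List, Optional


def build_submit_command(
    main_py: str,
    region: str,
    cluster: Optional[str] = None,
) -> List[str]:
    """Return a gcloud argument list for submitting a PySpark script.

    With no cluster name, the script is submitted as a serverless batch.
    With a cluster name, it is submitted as a job on an existing
    Dataproc cluster.
    """
    if cluster is None:
        # Serverless path: there is no cluster to name.
        return [
            "gcloud", "dataproc", "batches", "submit", "pyspark",
            main_py, f"--region={region}",
        ]
    # Cluster path: the job runs on a cluster you created and manage.
    return [
        "gcloud", "dataproc", "jobs", "submit", "pyspark",
        main_py, f"--cluster={cluster}", f"--region={region}",
    ]


# Hypothetical values, for illustration only.
serverless_cmd = build_submit_command("etl_job.py", "us-central1")
cluster_cmd = build_submit_command("etl_job.py", "us-central1", cluster="my-cluster")
```

The same script is passed in both cases; only the submission target changes, which is the flexibility the two offerings are meant to provide.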
Whether you prefer the zero-ops simplicity of Google Cloud Serverless for Apache Spark or the control of managed Dataproc clusters, you can accelerate your entire machine learning life cycle.
Develop and operationalize Spark for data science seamlessly with Vertex AI. Use Spark from Vertex AI Workbench for interactive development with built-in security and Gemini assistance. Integrate Spark processing into Vertex AI Pipelines for robust MLOps.
Google Cloud's Spark offerings provide robust compatibility with open source formats like Apache Iceberg, Delta Lake, and Hudi. Leverage BigLake metastore or Dataproc Metastore for unified metadata management across formats, enabling an open lakehouse architecture where you can process data with your choice of Spark engine.
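To make the open lakehouse idea concrete, here is a minimal sketch of the Spark properties that register an Apache Iceberg catalog backed by a Hive-compatible metastore. It assumes the Iceberg Spark runtime is on the classpath and that the metastore is reachable over a Thrift endpoint; the catalog name, endpoint, and bucket path are hypothetical, and BigLake metastore uses a different catalog configuration that is not shown here.

```python
from typing import Dict


def iceberg_catalog_conf(catalog: str, metastore_uri: str, warehouse: str) -> Dict[str, str]:
    """Build Spark properties for an Iceberg catalog on a Hive-compatible metastore."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Enables Iceberg's SQL extensions (for example, MERGE INTO and CALL).
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers the named catalog as an Iceberg catalog.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Uses a Hive metastore for table metadata pointers.
        f"{prefix}.type": "hive",
        f"{prefix}.uri": metastore_uri,
        # Default location for table data and metadata files.
        f"{prefix}.warehouse": warehouse,
    }


# Hypothetical values, for illustration only.
conf = iceberg_catalog_conf(
    catalog="lakehouse",
    metastore_uri="thrift://metastore.example.internal:9083",
    warehouse="gs://example-bucket/warehouse",
)
```

Each key and value can be passed to SparkSession.builder.config(key, value) or supplied as job properties at submission time, so the same catalog definition can be reused across serverless and cluster workloads.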
Apache Spark is a trademark of The Apache Software Foundation.