Query Working
Types of queries
- Interactive query jobs. By default, BigQuery runs queries as interactive query jobs, which are intended to start executing as quickly as possible.
- Batch query jobs. Batch queries have lower priority than interactive queries. When a project or reservation is using all of its available compute resources, batch queries are more likely to be queued and remain in the queue. After a batch query starts running, the batch query runs the same as an interactive query. For more information, see query queues.
- Continuous query jobs (Preview). With these jobs, the query runs continuously, letting you analyze incoming data in BigQuery in real time and then write the results to a BigQuery table, or export the results to Bigtable or Pub/Sub. You can use this capability to perform time sensitive tasks, such as creating and immediately acting on insights, applying real time machine learning (ML) inference, and building event-driven data pipelines.
Run a query | BigQuery | Google Cloud
It is possible to get INTERACTIVE
job priority even if your query is scheduled as BATCH
priority. If the query has not started or queued within 24 hours, it will change to interactive priority which makes your query to be executed as soon as possible.
Query Queues
BigQuery automatically determines the number of queries that can run concurrently, called the dynamic concurrency. Additional queries are queued until processing resources become available.
The queue length is limited to 1,000 interactive queries and 20,000 batch queries per project per region, regardless of whether the project is on-demand or using a reservation.
Use query queues | BigQuery | Google Cloud
Example
SELECT * REPLACE
SELECT * EXCEPT
SQL comparison
BigQuery standard SQL supports compliance with the SQL 2011 standard and has extensions that support querying nested and repeated data. Redshift SQL is based on PostgreSQL but has several differences which are detailed in the Redshift documentation. For a detailed comparison between Redshift and BigQuery SQL syntax and functions, see the Redshift to BigQuery SQL translation reference.
Query execution using slots
When BigQuery executes a query job, it converts the SQL statement into an execution plan, broken up into a series of query stages, which themselves are composed of more granular sets of execution steps. BigQuery uses a heavily distributed parallel architecture to run these queries, and the stages model the units of work that many potential workers may execute in parallel. Data is passed between stages by using a fast distributed shuffle architecture, which is discussed in more detail on the Google Cloud blog.
BigQuery query execution is dynamic, which means that the query plan can be modified while a query is in flight. Stages that are introduced while a query is running are often used to improve data distribution throughout query workers. In addition, query execution might be impacted by the changing amount of available capacity as other queries complete or begin execution, or slots are added to the reservation by the autoscaler.
BigQuery can run multiple stages concurrently, can use speculative execution to accelerate a query, and can dynamically repartition a stage to achieve optimal parallelization.
BigQuery slots execute individual units of work at each stage of the query. For example, if BigQuery determines that a stage's optimal parallelization factor is 10, it requests 10 slots to process that stage.
Slot resource economy
If a query requests more slots than are available, BigQuery queues up individual units of work and waits for slots to become available. As progress on query execution is made, and as slots free up, these queued up units of work get dynamically picked up for execution.
BigQuery can request any number of slots for a particular stage of a query. The number of slots requested is not related to the amount of capacity you purchase, but rather an indication of the most optimal parallelization factor chosen by BigQuery for that stage. Units of work queue up and get executed as slots become available.
Fair scheduling in BigQuery
BigQuery allocates slot capacity within a single reservation using an algorithm called fair scheduling.
The BigQuery scheduler enforces the equal sharing of slots among projects with running queries within a reservation, and then within jobs of a given project. The scheduler provides eventual fairness. During short periods, some jobs might get a disproportionate share of slots, but the scheduler eventually corrects this. The goal of the scheduler is to find a balance between aggressively evicting running tasks (which results in wasting slot time) and being too lenient (which results in jobs with long running tasks getting a disproportionate share of the slot time).
Excess slot usage
When a job holds onto slots for too long, it can receive an unfair share of slots. To prevent delays, BigQuery allows other jobs to borrow additional slots, resulting in periods of total slot use above your specified slot capacity. Any excess slot usage is attributed only to the jobs that receive more than their fair share.