Skip to main content

Indexing Questions

https://use-the-index-luke.com/3-minute-test/mysql

Question 1

Is the following index a good fit for the query?

CREATE INDEX tbl_idx ON tbl (date_column)
SELECT COUNT(*)
FROM tbl
WHERE EXTRACT(YEAR FROM date_column) = 2017

Your answer:

  • Good fit: No need to change anything
  • Bad fit: Changing the index or query could improve performance

Answer

Bad fit: Changing the index or query could improve performance

Wrapping the table column in a function renders this index mostly useless for this query.

Note that the database could still read the full index end to end. Although this can be faster than reading the full table end to end, it is still not very efficient and considered not solution to this problem.

Question 2

Is the following index a good fit for the query?

CREATE INDEX tbl_idx ON tbl (a, date_column)
SELECT *
FROM tbl
WHERE a = 12
ORDER BY date_column DESC
LIMIT 1

Your answer:

  • Good fit: No need to change anything
  • Bad fit: Changing the index or query could improve performance

Answer

Good fit: No need to change anything

The statement can run as an indexed top-N query. It performs just a B-tree traversal (log(n)) and a single table access.

The trick is that the index supports the where clause as well as the order by clause. The database uses the index to find the last entry that matches the where clause and takes it as result. Even though there is an order by clause, there is no need to sort any rows.

Question 3

Is the following index a good fit for both queries?

CREATE INDEX tbl_idx ON tbl (a, b)
SELECT *
FROM tbl
WHERE a = 38
AND b = 1
SELECT *
FROM tbl
WHERE b = 1

Your answer:

  • Good fit: No need to change anything
  • Bad fit: Changing the index or a query could improve performance

Answer

Bad fit: Changing the index or a query could improve performance

The index covers the first query only, the second query cannot use the index efficiently.

Note that the database could still read the full index end to end. Although this can be faster than reading the full table end to end, it is still not very efficient and considered not solution to this problem.

Changing the index column order makes the index suitable for both queries—without additional overhead. The index should therefore look like this (columns exchanged):

CREATE INDEX tbl_idx ON tbl (b, a)

Question 4

Is the following index a good fit for the query?

CREATE INDEX tbl_idx ON tbl (text)
SELECT *
FROM tbl
WHERE text LIKE 'TJ%'

Your answer:

  • Good fit: No need to change anything
  • Bad fit: Changing the index or query could improve performance

Answer

Good fit: No need to change anything

Although like expressions starting with a wild card character (% or _) cannot use this index efficiently, a pattern that has the wild card character at the very end can! Even if the wild card character is in the middle, the index is still useful.

Question 5

This question is different. First consider the following index and query:

CREATE INDEX tbl_idx ON tbl (a, date_column)
SELECT date_column, count(*)
FROM tbl
WHERE a = 38
GROUP BY date_column

Let’s say this query returns at least a few rows and that there is no other index on this table.

To implement a new functional requirement, another condition (b = 1) is added to the where clause:

SELECT date_column, count(*)
FROM tbl
WHERE a = 38
AND b = 1
GROUP BY date_column

How will the change affect performance:

  • Same: Query performance stays about the same
  • Not enough information: Definite answer cannot be given
  • Slower: Query takes more time
  • Faster: Query take less time

Answer

Same: Query performance stays about the same

Wrong! The query will be slower.

The index happened to have all required data (columns) for the original query. It can run as so-called index-only scan, which doesn’t need to access the actual table at all.

Accessing any column that is not part of the index prevents this optimization so that the database must look into the actual table for each row that qualifies the original where clause to see if it also satisfies the new filter. Even if the new filter removes all rows, it does so after incurring additional work. Although the grouping has fewer rows to aggregate, this cannot compensate for the additional table look-ups.