How to find missing values in a sequence with SQL
Sometimes it is important to know which values in a sequence are missing, either to find unused values so they can be used, or to find “holes” in the data. In this article I’ll show you how to find missing values, how to find the start and end of ranges of missing values, and how to optimize the queries for best performance.
Exclusion joins
Possibly the most efficient technique, depending upon the application, is to use an exclusion join against a list of all legal values (for example, an integers table). For instance, at my current employer we assign a unique tracking ID to certain bits of data. For reasons lost in the mist of time, we use three-character combinations of letters and numbers. It’s effectively a base-36 number system. It is not the most efficient thing to work with in SQL! Since there is no magical built-in way to get the database to assign the next unused value in the sequence, we keep a table with all 363 legal values, and do an exclusion join against the list of legal values. This is an acceptable way to find the next values, but of course it’s nowhere near optimal; when transactional consistency is needed, we have to lock tables up and do an expensive query. An identity (auto_increment) column would be preferable.
Putting three-character codes behind and assuming you want to analyze some existing data for holes without creating lists of legal values, it is possible to find missing values in a sequence by matching it against itself. For example, I am helping someone design a database to store information about gravestones. The original data was hand-entered into a spreadsheet, with a single column to keep track of gravestone numbers. There are duplicate and missing values in the sequence, both of which can indicate data problems, so it’s highly desirable to find and fix them. After importing the spreadsheets verbatim into a staging table, I ran a number of analyses to find data problems before transforming the data into the final tables.
The setup
If you want to follow along with the examples, you can run the following queries to create some sample data in MySQL:
create table sequence (
id int not null primary key
);
insert into sequence(id) values
(1), (2), (3), (4), (6), (7), (8), (9),
(10), (15), (16), (17), (18), (19), (20);
Notice the values 5, 11, 12, 13, and 14 are missing from the sequence. If you are using Microsoft SQL Server, change sequence to #sequence from here on:
create table #sequence (id int not null primary key)
insert into #sequence(id)
select 1
union all select 2
union all select 3
union all select 4
union all select 6
union all select 7
union all select 8
union all select 9
union all select 10
union all select 15
union all select 16
union all select 17
union all select 18
union all select 19
union all select 20
Finding duplicate and missing numbers
Finding duplicate numbers is easy:
select id, count(*) from sequence group by id having count(*) > 1;
In this case there are no duplicates, since I’m not concentrating on that in this post (finding duplicates is straightforward enough that I hope you can see how it’s done). I had to scratch my head for a second to find missing numbers in the sequence, though. Here is my first shot at it:
select l.id + 1 as start from sequence as l left outer join sequence as r on l.id + 1 = r.id where r.id is null;
The idea is to exclusion join against the same sequence, but shifted by one position. Any number with an adjacent number will join successfully, and the WHERE clause will eliminate successful matches, leaving the missing numbers. Here is the result:
| start |
|---|
| 5 |
| 11 |
| 21 |
Find ranges of missing values with subqueries
The above query identifies the start of ranges of missing numbers, but not the end. It also gives a false positive for 21, which is a missing number because it’s off the end of the whole sequence. I wanted to solve both problems. After squinting at it a while, I realized I could solve this problem with a correlated subquery, as follows:
select start, stop from (
select m.id + 1 as start,
(select min(id) - 1 from sequence as x where x.id > m.id) as stop
from sequence as m
left outer join sequence as r on m.id = r.id - 1
where r.id is null
) as x
where stop is not null;
The final WHERE clause makes sure the upper end of the sequence isn’t counted as a hole. This can be written several ways, some of them without wrapping the whole thing in a subquery (for example, where r.id is null and r < (select max(id) from sequence)), but it suits me fine as it is. Here is the result:
| start | stop |
|---|---|
| 5 | 5 |
| 11 | 14 |
If the sequence is a another data type, such as dates or letters of the alphabet, it is often possible to use some other functions to get the “next” value in the sequence in the join condition. For example, if the id column is CHAR(1), here is a query for MySQL:
insert into sequence(id) values
('a'), ('b'), ('c'), ('e'),
('f'), ('g'), ('l'), ('m'), ('n');
select start, stop from (
select char(ascii(m.id) + 1) as start,
(select char(min(ascii(id)) - 1) from sequence as x where x.id > m.id) as stop
from sequence as m
left outer join sequence as r on ascii(m.id) = ascii(r.id) - 1
where r.id is null
) as x
where stop <> '';
It’s necessary to change the final WHERE clause to stop <> '' because CHAR() skips NULLs, converting CHAR(NULL) to the empty string. Here is the result:
| start | stop |
|---|---|
| d | d |
| h | k |
Performance analysis: rewriting without subqueries
I don’t like correlated subqueries. In fact, I avoid subqueries if at all possible. Correlated subqueries are especially bad because, depending on the query optimizer, they may force the RDBMS to build a temporary table and probe into it for each value in the left-most table, which is O(n2). It dawned on me that the query could be written as left joins:
select l.id + 1 as start, min(fr.id) - 1 as stop
from sequence as l
left outer join sequence as r on l.id = r.id - 1
left outer join sequence as fr on l.id < fr.id
where r.id is null and fr.id is not null
group by l.id, r.id;
Of course, this is hardly, if at all, better. The < in the join condition makes the join essentially a CROSS JOIN, which is still an O(n2) join. Just to see how the query optimizer handles this, I ran it through both MySQL and Microsoft SQL server, and looked at the query plans. I filled the table with values up to 5000 (large enough that the DBMSs would create statistics on the index) and then made holes in the sequence as follows:
| start | stop |
|---|---|
| 5 | 5 |
| 11 | 14 |
| 100 | 100 |
| 855 | 855 |
| 1230 | 1230 |
| 2400 | 2500 |
SQL Server actually optimized the first query significantly better, which highlights one of my favorite principles: always measure performance, and never try to “optimize by eye!” Here are the numbers (see my article about using awk to sum up query statistics):
-- query one -- Scans: 9 Logical reads: 35 Physical reads: 0 Read-ahead reads: 0 CPU time: 10 ms Elapsed time: 49 ms -- query two -- Scans: 9 Logical reads: 77 Physical reads: 0 Read-ahead reads: 0 CPU time: 60 ms Elapsed time: 148 ms
MySQL’s results did not vary as much; the execution time was .8% faster on the first query, and the EXPLAIN result showed that the second query caused ‘Range checked for each record’. The two query plans were actually very different, according to EXPLAIN.

Works well, except that (as expected, I assume) seq 2,3,4 returns 5, where someone (i.e. me) would actually want 1.
Joe
24 Jan 06 at 4:25 am
Hi, I actually needed the opposite of the above. Instead of missing start and end, I need the non-missing start and end. How do I change the last query to do this?
Sebastian
22 Mar 06 at 5:50 am
Sebastian, I’m not completely clear on what you mean. Do you mean “instead of finding the start and end of missing ranges, find the start and end of contiguous ranges?”
Xaprb
22 Mar 06 at 10:05 am
Sebastian, please see Find contiguous ranges with SQL for an answer to the problem.
Xaprb
22 Mar 06 at 9:53 pm
Hi Xaprb,
Yes that’s what I needed. Thanks.
Sebastian
23 Mar 06 at 6:17 am
I was wondering if it would really be much more complicated, using the table structure above to simply find and return a list of missing numbers. Instead of ranges, simply list out like: 5, 11, 12, 13, 14, or somthing like that. Thank you.
Hess
28 Aug 06 at 7:10 pm
You could use some of the techniques I’ve outlined in How to simulate the GROUP_CONCAT function.
Xaprb
28 Aug 06 at 7:12 pm
Hi, I actually need to get the missing range from a column which will be in sequence on certain criterias in another column in that table. Can you please explain how this can be achieved. (My question may be silly but I don’t know how to do it. I am new to SQL).
Sara
22 Sep 06 at 8:55 am
Hi Sara, I’m afraid I don’t quite understand your question. Can you post an example data set and explain what the answer should be?
Xaprb
22 Sep 06 at 8:58 am
Hi, see below a table A with three columns the records will be as
Now i want to ge the missing seq in the column a1 where a3 is ‘aaa’. The result should be some what like 2-3 and 5-5. Basically I need to get the missing range based on a critiera in column a3.
Thanks in Advance
Sara
22 Sep 06 at 9:36 am
Hi Sara, I think this will do what you need:
select start, stop from( select m.a1 + 1 as start, (select min(a1)-1 from sequence as x where x.a3 = 'aaa' and x.a1 > m.a1) as stop from sequence as m left outer join sequence as r on m.a3 = r.a3 and m.a1 = r.a1 - 1 where r.a1 is null and m.a3 = 'aaa' ) as x where stop is not null; -- result: +-------+------+ | start | stop | +-------+------+ | 2 | 3 | | 5 | 5 | +-------+------+Xaprb
22 Sep 06 at 10:32 am
Thank you very much. Is there a way to do this without sub-queries(just out of curiosity).
Sara
22 Sep 06 at 11:17 am
You’re welcome. The query I gave is just an adaptation of the query earlier in the article, and I showed some ways to avoid subqueries towards the end of the article. It’s just a matter of figuring out where to put the extra WHERE clauses, to restrict your data set to those rows that have ‘aaa’ in the a3 column.
Xaprb
22 Sep 06 at 11:25 am
Just in case anyone might find this useful: In my current job we use Hypersonic in our testing environment. Apparently HSQL didn’t like the condition in the left outer join. I took a stab at rewriting it using “NOT EXISTS” and seems to work pretty well.
select start, stop from ( select m.id + 1 as start, (select min(id) - 1 from sequence as x where x.id > m.id) as stop from sequence as m where not exists (select id from sequence as r where m.id=r.id - 1 ) ) as x where stop is not nullRobert
30 Oct 06 at 1:24 pm
Thanks for this info, unfortunatly my seqs numbers to check are included in another sequence. i.e
Where I what to check the out-of-seq in col2 for each col1… thanks
bob
9 Apr 07 at 3:20 pm
Is it possible to query for the Minimum ID (missing if possible, or new if not) in a way that would work in both SQL and JET?
Vitaly
19 Apr 07 at 10:25 am
You could also try this much faster sollution (on a big dataset) using “not exists”:
It will only give the first value in a gap, but combining it with an union like this:
giving the result:
5
5
11
14
Then you can parse the result two by two (startvalue, endvalue in a gap), this is extremely fast on mysql 4.1 at least. giving a result in about a second or two on a table with over 200000 rows.
haakon
11 May 07 at 7:07 am
Any thoughts on how to make this query work with dates? I have a scenario where I need to find overlapping or gaps between ranges of dates. For example:
start_date end_date———- ———-
01/01/2007 01/31/2007
02/15/2007 02/28/2007
04/01/2007 04/20/2007
04/15/2007 04/30/2007
An ideal solution would return to me a recordset for the gaps:
gap_start gap_end——— ——–
02/01/2007 02/14/2007
03/01/2007 03/31/2007
and another query might return any overlaps:
gap_start gap_end——— ———
04/15/2007 04/20/2007
With a little bit of preliminary work, it looks like this algorithm could be adapted as long as there was a way to use dates in the sequence table.
Any thoughts or ideas?
Thanks,
Tim
Tim
16 May 07 at 4:47 pm
It will depend on your system, but as long as your RDBMS supports adding and subtracting dates, certainly. I referred to getting the ‘next’ value in the sequence — Given 2007-05-16, next is the 17th. In all systems I’ve worked with there is something equivalent to ADDATE() or similar.
Xaprb
16 May 07 at 6:45 pm
Thanks so much for this article! I was dreading having to make a table of legal values and do an exclusion join on them - I feel like that would be such a hack! Your solution is much more elegant.
I was able to adapt your idea to suit my purposes, but it got a little ugly. The data model I’m working with contains entries for inventory reports, and I wanted to figure out the dates for the ones we’re missing, which I could cross-reference with the shift schedules to figure out who to yell at. However, inventory entries are done at the beginning of the day AND at the end, so it’s not as simple as comparing dates. I want to know if we’re missing the opening entry on a given date, if we’re missing the closing entry on that date, or both. Figuring out both the date and the is_opening flag was a pain. Here’s an example dataset:
date | is_opening
———— ————
2007-05-01 | t
2007-05-01 | f
2007-05-02 | t
2007-05-02 | f
2007-05-03 | f
2007-05-05 | t
2007-05-05 | f
2007-05-06 | t
2007-05-06 | f
2007-05-07 | f
2007-05-08 | t
2007-05-08 | f
2007-05-09 | t
2007-05-09 | f
2007-05-10 | f
2007-05-11 | t
2007-05-11 | f
2007-05-12 | t
2007-05-12 | f
The answer I’m looking for is:
start | start_is_opening | stop | stop_is_opening
———— —————— ———— —————–
2007-05-03 | t | 2007-05-03 | t
2007-05-04 | t | 2007-05-04 | f
2007-05-07 | t | 2007-05-07 | t
2007-05-10 | t | 2007-05-10 | t
The query ended up being pretty horrendous, but rather straightforward if you think about it:
SELECT
CASE loj.start_is_opening
WHEN ‘t’ THEN loj.start
WHEN ‘f’ THEN DATE(loj.start INTERVAL ‘1 day’)
END AS start,
NOT loj.start_is_opening AS start_is_opening,
CASE roj.stop_is_opening
WHEN ‘t’ THEN DATE(roj.stop - INTERVAL ‘1 day’)
WHEN ‘f’ THEN roj.stop
END AS stop,
NOT roj.stop_is_opening AS stop_is_opening
FROM
(
SELECT ql.date AS start, ql.is_opening AS start_is_opening
FROM quisnomicon ql
WHERE NOT EXISTS (
SELECT qr.date
FROM quisnomicon qr
WHERE (
ql.date = qr.date - INTERVAL ‘1 day’
AND ql.is_opening = ‘f’
AND qr.is_opening = ‘t’
)
OR
(
ql.date = qr.date
AND ql.is_opening = ‘t’
AND qr.is_opening = ‘f’
)
)
) AS loj,
(
SELECT qr.date AS stop, qr.is_opening AS stop_is_opening
FROM quisnomicon qr
WHERE NOT EXISTS (
SELECT ql.date
FROM quisnomicon ql
WHERE (
ql.date = qr.date - INTERVAL ‘1 day’
AND ql.is_opening = ‘f’
AND qr.is_opening = ‘t’
)
OR
(
ql.date = qr.date
AND ql.is_opening = ‘t’
AND qr.is_opening = ‘f’
)
)
) AS roj
WHERE loj.start = (
SELECT MAX(q.date)
FROM quisnomicon q
WHERE q.date loj.start
);
Postgres reports that the query’s cost=5660.50..17660.63. Is there some clever way of making it less expensive? Am I doing something that’s obviously wrong?
Thanks again for the basic idea on how to accomplish this query in the first place!
Jack
1 Jul 07 at 6:34 am
I cant get this to work :(
It adds the values find, but when I tired and run:
**********************************
SELECT START ,
STOP FROM (
SELECT m.id 1 AS
START , (
SELECT min( id ) -1
FROM sequence AS x
WHERE x.id > m.id
) AS
STOP
FROM sequence AS m
LEFT OUTER JOIN sequence AS r ON m.id = r.id -1
WHERE r.id IS NULL
) AS x
WHERE STOP IS NOT NULL ;
**********************************
I get this:
**********************************
SQL query: Edit
SHOW KEYS FROM
MySQL said: Documentation
#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ” at line 1
**********************************
Any ideas?
Some1
13 Jul 07 at 5:05 am
You have to use a version of MySQL that supports subqueries. What version are you using? My guess is 4.0.x.
Xaprb
13 Jul 07 at 4:54 pm
Hello,
Your query works very well, and I am very pleased with it, however. My database has some entries in it’s identity column with characters entires… (i.e. 1, 2, 3, 3a, 3b, 4 …..).
And I was wondering if there was a way to execute this query with those entries present, either by ignoring them, or what not.
Thanks,
Whalen
Whalen
25 Jul 07 at 3:22 pm
I will appreciate it very much if you could suggest how to look for sequence patterns with SQL.
Thanks,
Joris
Joris
14 Sep 07 at 12:14 pm
hello
I have a problem ,
my table name is cbl ,it contains the following entries
seq country
1———-USA
2———-UK
2———-USA
2———-Canada
5———-USA
4———-UK
6———-Canada
i want the result to Find the missing sequence as the following based on the table
USA 3,4
UK 1,3
Canada 1,3,4,5
i am using sql builder which is used in access
please do help out , I have been trying to figure it out for some time now
thanks in advance
john
6 Nov 07 at 8:25 am
the fastest and more staight straightforward apporach is using a generic number table (pre-populate from 1 to 10 Million, for example)
Then left join your table with the number table to find missing rows.
You can fully benifit the index on 2 tables.
Especially for large tales, the merge join will only scan each table just one time.
– david
david wei
15 Nov 07 at 3:56 pm
What if I had a table like Sara’s example (from above)…
a1 a2 a3
——————
1 ddd aaa
1 ooo bbb
4 vvv aaa
5 qqq bbb
7 hhh bbb
6 www aaa
and I wanted to get the missing numbers in column a1 for each value in a3, like…
—– ——- ——
| a3 | start | stop |
—– ——- ——
| aaa | 2 | 3 |
| aaa | 5 | 5 |
| bbb | 2 | 4 |
| bbb | 6 | 6 |
—– ——- ——
Thank you for any help!
ryan
3 Dec 07 at 10:20 pm
One of your columns is extra, so I’ll just remove it to clarify the query:
select * from sara; +----+------+ | a1 | a2 | +----+------+ | 1 | aaa | | 2 | bbb | | 4 | aaa | | 5 | bbb | | 7 | bbb | | 6 | aaa | +----+------+ select a2, start, stop from( select m.a2, m.a1 + 1 as start, (select min(a1)-1 from sara as x where x.a1 > m.a1 and x.a2 = m.a2) as stop from sara as m left outer join sara as r on m.a2 = r.a2 and m.a1 = r.a1 - 1 where r.a1 is null group by m.a2, start ) as x where stop is not null; +------+-------+------+ | a2 | start | stop | +------+-------+------+ | aaa | 2 | 3 | | aaa | 5 | 5 | | bbb | 3 | 4 | | bbb | 6 | 6 | +------+-------+------+Xaprb
4 Dec 07 at 9:35 am
Using the example above:
select l.id 1 as start, min(fr.id) - 1 as stop
from sequence as l
left outer join sequence as r on l.id = r.id - 1
left outer join sequence as fr on l.id
Cody
14 Dec 07 at 11:38 am
Awesome. Thanks so much for publishing this method of checking for missing rows.
Bobby Breaux
16 Feb 08 at 3:04 pm
select l.id + 1 as start, min(fr.id) - 1 as stop
from sequence as l
left outer join sequence as r on l.id = r.id - 1
left outer join sequence as fr on l.id
Gabor Kiss
10 Mar 08 at 6:08 am
That is a nice solution. However I cannot understand
the semantics of “min(fr.id)”. There is no GROUP BY
constriction here. I replaced “MIN(fr.id)” with
simple “fr.id” and I got the same answer.
What did I miss? :-)
Gabor Kiss
10 Mar 08 at 6:09 am
Fantastic! Thanks for putting in the effort to share your work.
John
28 Mar 08 at 5:11 pm
I really enjoyed this article on “How to find missing values in a sequence with SQL”. I have a related problem involving finding a missing period of time from a datetime field. For example, I might need to go through a week and find all the times in a variable work day (maybe 8 hours one day and 10 the next) where there is a gap of 5 minutes or more, and keep a sum of the missed time for the day. For example, if two lines were 1/1/2008 01:00:00 AM and 1/1/2008 01:11:00 AM, then that would be a total of 11 minutes that would be added to the total for the day. I want to ignore any gaps less than 5 minutes. Thank you for any help you can give me on this!
Rob
18 Apr 08 at 5:04 pm
Thank you! Very helpful in solving a skip in a primary key that used to be auto-assigned.
Keith
9 May 08 at 2:19 pm
Greate soution! However I cannot find the gap if it starts with 1. How to solve this?
Daniel
27 May 08 at 8:53 am
Hey this is a great article… but my problem is more complicated
I want to find the missing sequences for a 3 column join like
Col1 Col2 Col3
1 1 1
1 2 2
1 2 4
2 2 3
2 2 5
3 3 6
3 5 8
And so on… I need to find out all unused permutations in this series.. A query to solve this will be of great help.. Thanks in advance
Jay
30 May 08 at 1:49 pm
if i want to find gap for example a01, a03, a05?
charlene
20 Aug 08 at 1:37 am
i have a table
FROM TO
1 5
6 10
15 20
29 30
i need a query which can give me missing range as
11-14
thanks in advance
arvind
24 Sep 08 at 2:05 am
This is a great article. It wasn’t exactly what it needed for my particular problem but it definitely got me in the right direction. Thanks for sharing your thoughts and wisdom.
Michael McSorely
10 Oct 08 at 2:54 am
Wonderful. Thank you very much. This article helped a lot in understanding how to solve the sql problem I was working on for days.
Satya
18 Oct 08 at 5:07 am
Great stuff. I just wrote a check writing app, and need to find gaps in check numbers. Works like a charm.
Thanks much.
RJB
18 Nov 08 at 4:24 pm
How to get?????????????????????????
ID
—
5
11
12
13
14
Serg
30 Nov 08 at 10:12 am
on the firist example how to find these values
5, 11, 12, 13, 14,
YASSER
3 Dec 08 at 3:19 am
Exclusion join against a numbers table. Links are in the article.
Xaprb
3 Dec 08 at 9:24 am
Thank you for the “Finding duplicate and missing numbers”. It was a big help.
happiness
16 Dec 08 at 7:50 am
hi Xaprb,
it’s nice to get missing range. but sometime i just need to get like this:
start end
1 4
6 10
16 20
what should i do?
ehm
16 Dec 08 at 12:48 pm