SQL Cheatsheet

Debugging Duplicates In A Dataset

Sometimes you are working on a query and you want to debug whether there are duplicates in your table.

select
  id
  count(*)
group by id
having count(*) > 1

The query will show all the rows (grouped by id) with counts greater than 1.

Unintentional duplicates can occur after a join. Instead of fixing this by using select distinct (which is a code smell), try to figure out why the duplication is happening.

Once you find the rows with duplicates, you can then select * for those records to see if there is any relationship between them that explains the duplication.

SQL, cheatsheet