Contains Duplicate I — Leetcode#217

SudoPurge
3 min readMay 27, 2023

Problem Description:

Given an integer array nums, return true if any value appears at least twice in the array, and return false if every element is distinct.

Example:

Input: nums = [1,2,3,1]
Output: true

Input: nums = [1,2,3,4]
Output: false

In this problem, the objective is to write a function that returns true if there are multiple entries of the same value (duplicates), or false otherwise.

WHY? This algorithm is essential when implementing data structures such as sets or dictionaries since they are not allowed to contain duplicate items (or keys in the case of dictionaries). These are also useful when you want to implement a function that returns all the unique values in an array.

Solution:

To build our intuition, lets approach the problem using the brute force method. The easiest way to go about this is to sequentially select each element from the array from the start, and for each element, parse the rest of the array on its right to check if they match.

Naive approach illustration

But this is the not the best approach as it takes O(n²) time because in the worst case scenario we would need to iterate through the entire array for every element.

A better approach is to first sort the array. Different sorting algorithms have different time and space complexities but the most common choice is MergeSort. Then we can simply iterate over the array once trying to match two neighbors, since if two values are indeed duplicates, they would be next to each other.

Using a presorted array

However, while this is faster than the naive approach, this is also not the fastest solution. Because we need to sort the array first, we need to account for the time required for that. MergeSort has a time complexity of O(n.log(n)), and iterating through the array once requires a O(n), resulting in overall O(n.log(n)).

The fastest solution to this problem uses a data structure called the hashset, which essentially keeps track of every time in the original array as we parse it from left to right. If a duplicate item is parsed, it will have already existed in the hashset. Note that this approach does not require sorting the array beforehand, and since we only need to parse throught the entire array just once in the worst case, its time complexity is O(n).

def containsDuplicate(arr):    
HASHSET = set()
for i in arr:
if (i in HASHSET):
return True
else:
HASHSET.add(i) # add to hashset if not duplicate item
return False

containsDuplicate([1, 2, 3, 4, 1])
=> True
containsDuplicate([1, 2, 3])
=> False

However, using the hashset means there is a bigger overhead in terms of space requirements. Unlike the other two approach which have a space complexity of O(1), this method has a space complexity of O(n) because in the worst case scenario, we would need to store a copy of the entire original array in the hashset. Space is generally cheap though. Below you can see a comparison of the time complexity of each of the three algorithms.

Red: Naive Approach, Purple: Using Presorted Array, Green: Using Hashset

--

--