Repeated DNA Sequences

Problem#

Return all 10-letter-long substrings of a DNA string s that appear more than once.

Examples#

"AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT" → ["AAAAACCCCC", "CCCCCAAAAA"]

Constraints#

1 <= s.length <= 10^5, characters A/C/G/T.

Approach#

Slide a fixed-size window of length 10. Use a seen set; when a 10-char substring is seen for the second time, record it in an output set (to dedupe further repeats).

Solution#

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        if (s.size() < 10) return {};
        unordered_set<string> seen, out;
        for (int i = 0; i + 10 <= (int)s.size(); ++i) {
            string sub = s.substr(i, 10);
            if (!seen.insert(sub).second) out.insert(sub);
        }
        return {out.begin(), out.end()};
    }
};

func findRepeatedDnaSequences(s string) []string {
    if len(s) < 10 {
        return []string{}
    }
    seen := map[string]bool{}
    out := map[string]bool{}
    for i := 0; i+10 <= len(s); i++ {
        sub := s[i : i+10]
        if seen[sub] {
            out[sub] = true
        } else {
            seen[sub] = true
        }
    }
    res := make([]string, 0, len(out))
    for k := range out {
        res = append(res, k)
    }
    return res
}

from typing import List

class Solution:
    def findRepeatedDnaSequences(self, s: str) -> List[str]:
        if len(s) < 10:
            return []
        seen: set[str] = set()
        out: set[str] = set()
        for i in range(len(s) - 9):
            sub = s[i:i + 10]
            if sub in seen:
                out.add(sub)
            else:
                seen.add(sub)
        return list(out)

function findRepeatedDnaSequences(s) {
    if (s.length < 10) return [];
    const seen = new Set();
    const out = new Set();
    for (let i = 0; i + 10 <= s.length; i++) {
        const sub = s.substring(i, i + 10);
        if (seen.has(sub)) out.add(sub);
        else seen.add(sub);
    }
    return Array.from(out);
}

class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        if (s.length() < 10) return new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Set<String> out = new HashSet<>();
        for (int i = 0; i + 10 <= s.length(); i++) {
            String sub = s.substring(i, i + 10);
            if (!seen.add(sub)) out.add(sub);
        }
        return new ArrayList<>(out);
    }
}

function findRepeatedDnaSequences(s: string): string[] {
    if (s.length < 10) return [];
    const seen = new Set<string>();
    const out = new Set<string>();
    for (let i = 0; i + 10 <= s.length; i++) {
        const sub = s.substring(i, i + 10);
        if (seen.has(sub)) out.add(sub);
        else seen.add(sub);
    }
    return Array.from(out);
}

Editorial#

The classic unordered_set::insert returns {iter, true} on insertion and {iter, false} on a hit — perfect for “first occurrence vs not”. The double-set pattern dedups outputs (a substring repeated 5 times only appears once in the result). For optimal performance, encode each 10-char substring as a 20-bit integer (2 bits per base) and use a single unordered_set<int>.

Complexity#

Time: O(n).
Space: O(n).

Problem#

Examples#

Constraints#

Approach#

Solution#

Editorial#

Complexity#

Concept revision#

Keyboard shortcuts

Problem#

Examples#

Constraints#

Approach#

Solution#

Editorial#

Complexity#

Concept revision#

Related#