Analysis: Necklace by Jakub Pachocki


As it is often the case when counting the number of subsequences fulfilling some criteria, the solution to this problem uses dynamic programming. Assume the initial string representing Bessie's necklace is S, and that the other cow's name is T. For every i, we would like to compute the number of subsequences of S[1..i] that do not contain T. This information is not enough to maintain by itself, so we need to split our state some more. One possibility would be remembering the last |T| - 1 letters of our chosen subsequence; that is something we would be able to update upon adding one letter, and it would also allow us to determine whether we contain T as a substring.

Unfortunately, the number of states in the proposed approach is exponential in |T|, and thus far too large for our needs. Maybe we do not really care what the last |T| - 1 letters are exactly? In fact, it would be enough to know how 'similar' they are to T. The correct measure of similarity here is the longest suffix of these letters that is also some prefix of T. This is also information that we can maintain upon state transition, and it also allows us to determine whether the chosen subsequence contains T as a substring.

To speed up the state transitions, it may be useful to precompute which prefix of T we contain after matching T[1..j] and adding any letter c. This gives us A * |T| state transitions, where A = 26 is the size of the alphabet. The total running of our solution is thus O(A * |T| * |T| + |S| * |T|), if we just precompute the transitions naively. Although that is fast enough, it is actually just as easy to precompute the transitions with DP, reducing the complexity to O(A * |T| + |T| * |S|). A sample implementation follows.

/*
LANG: C++
*/

#include <iostream>
#include <fstream>
#include <algorithm>
#include <vector>
#include <string>

using namespace std;

int main() {
    ifstream in;
    ofstream out;
    in.open("necklace.in");
    out.open("necklace.out");

    string text, pattern;
    in >> text >> pattern;

    int n = text.size();
    int m = pattern.size();

    for (int i = 0; i < n; ++i) {
        text[i] -= 'a';
    }
    
    for (int i = 0; i < m; ++i) {
        pattern[i] -= 'a';
    }

    vector<vector<int> > next(m, vector<int>(26, 0));

    for (int i = 1; i < m; ++i) {
        int prev = next[i - 1][pattern[i - 1]];
        next[i - 1][pattern[i - 1]] = i;
        for (int j = 0; j < 26; ++j) {
            next[i][j] = next[prev][j];
        }
    }
    next[m - 1][pattern[m - 1]] = m;

    vector<int> max_taken(m, -n - 1);
    max_taken[0] = 0;
    for (int i = 0; i < n; ++i) {
        vector<int> next_taken = max_taken;
        for (int j = 0; j < m; ++j) {
            int cur = next[j][text[i]];
            if (cur < m) {
                next_taken[cur] = max(next_taken[cur], max_taken[j] + 1);
            }
        }
        max_taken = next_taken;
    }

    int result = 0;
    for (int i = 0; i < m; ++i) {
        result = max(result, max_taken[i]);
    }

    out << n - result << endl;

    in.close();
    out.close();
}