Contest Results

by Nathan Pinsker

The first approach to this problem is to simply brute-force: first, compute all substring sequences of length 1 for both spotty and non-spotty cows. Put these sequences into two different sets, and make sure that the sets are completely disjoint -- if they are, then sequences of length 1 are sufficient to tell these types of cows apart, and we are finished. Otherwise, we consider all substring sequences of length 2, and continue to consider longer and longer substrings until our subsets are completely disjoint.

The problem with this approach is that string comparison will eventually become too slow. Since there may be as many $O(NM^2)$ strings in total and $M <= 500$, and comparing all of these strings will take too much time and memory. One way around this problem is to use a rolling hash function. The main benefit of a rolling hash is that you can hash a string in $O(n)$ time, and then find in $O(1)$ time the hash of any substring of that string. (If you don't know what a rolling hash function is, learning about it is really worth your time -- it pops up all the time in programming contests!) Thus, instead of comparing and storing all substring sequences, we can simply store the hashes of all substring sequences instead. To compare two substring sequences, we instead compare their hashes. While this does have a small probability of error, the hashes are large enough (and $M$ is small enough) that this shouldn't be an issue in practice.

Contestants can also use a binary search to quickly locate the minimum value -- if all substrings with length $k$ are unique to either spotted or non-spotted cows, then all substrings with length $k+1$ are unique to either spotted or non-spotted cows as well. Using either this approach or the hashing approach is sufficient to receive full credit.

Here's Brian Dean's code, which uses the hashing approach with a very simple hash function -- it hashes a substring by simply taking a dot product between that string and a random vector of integers.

#include <iostream>
#include <fstream>
#include <cmath>
#include <set>
#include <cstdlib>
using namespace std;

int N, M;
string spotty[500], plain[500];
unsigned long long hashes1[500], hashes2[500], R[500];

int main(void)
{
  ifstream fin ("cownomics.in");
  ofstream fout ("cownomics.out");
  fin >> N >> M;
  for (int i=0; i<N; i++) fin >> spotty[i];
  for (int i=0; i<N; i++) fin >> plain[i];
  for (int i=0; i<M; i++) R[i] = rand() % 1000000000;
  int i=0, j=0;
  int best = M, dups = N;
  while (j < M) {
    // There is (very small) but some false positive risk in 
    // using hashing here, so we could have explicitly verified
    // matches if desired just to be 100% certain of correctness
    if (dups == 0) best = min(best, j-i);
    if (dups>0) {
      set<int> H;
      dups = 0;
      for (int k=0; k<N; k++) H.insert(hashes1[k] += R[j] * spotty[k][j]);
      for (int k=0; k<N; k++) if (H.count(hashes2[k] += R[j] * plain[k][j])>0) dups++;
      j++;
    } else {
      dups = 0;
      set<int> H;
      for (int k=0; k<N; k++) H.insert(hashes1[k] -= R[i] * spotty[k][i]);
      for (int k=0; k<N; k++) if (H.count(hashes2[k] -= R[i] * plain[k][i])>0) dups++;
      i++;
    }
  }
  fout << best << "\n";
}