(Analysis by Chongtian Ma)

Observation: Visualize the strings as an $N$ by $M$ grid, with coordinate $(r, c)$ denoting the $r$'th row from the top and the $c$'th column from the left. We can swap a character from $(r_1, c_1)$ to $(r_2, c_2)$ by performing at most two operations. For example, we can perform operation $1$ with $(p,q) = (c_1, c_2)$ and operation $2$ with $(x,y) = (r_1, r_2)$.

Strategy: First, we can use type $1$ operations inside row $1$ to match as many columns as possible. Then, for each column $i$ that is not matched, we can bring character $t_i$ from some row $r \geq 2$ using $2$ operations. Thus, each column takes at most $2$ operations to match.

Subtask 1 $(N, M \leq 100)$ : For each unmatched column, we can traverse the whole rest of the grid, find a matching character, and simulate the two swaps. This takes $\mathcal{O}(NM^2)$ time, since we are doing $\mathcal{O}(NM)$ work for each of the $\mathcal{O}(M)$ unmatched characters.

Full Credit $(N, M \leq 1000)$ : We can cut down the amount of work done when searching for an occurrence of character $t_i$ in the entire grid by tracking whether each character exists in each row. After all, we do not need to search through the columns of a row if we know that it contains no occurrences of character $t_i$.

We can maintain a 2D array $\texttt{has}$ of length $N \cdot 26$, where $\texttt{has[i][c]}$ contains the counts of $c$ in row $i$. When searching for some character $\texttt{ch}$, we just need to check if $\texttt{has[r][ch]}$ is positive for each row $r$, and find its column. Using this, we have sped the finding part of the algorithm from $\mathcal{O}(NM)$ to $\mathcal{O}(N+M)$, since we are no longer iterating through the columns of rows where $\texttt{ch}$ doesnt occur in.

Make sure to maintain $\texttt{has}$ when swaps are done across rows or columns.

#include <bits/stdc++.h>
using namespace std;

void solve() {
    int n, m;
    cin >> n >> m;

    string t;
    cin >> t;

    vector<string> s(n);
    for (int i = 0; i < n; i++) cin >> s[i];

    vector<vector<int>> has(n, vector<int>(26, 0));

    for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
            has[i][s[i][j] - 'a']++;
        }
    }

    vector<array<int, 4>> ops;

    auto op = [&](int type, int a, int b, int c) {
        // store 1-indexed for printing
        ops.push_back({type, a + 1, b + 1, c + 1});

        if (type == 1) {
            swap(s[a][b], s[a][c]);
        } else {
            has[a][s[a][c] - 'a']--;
            has[b][s[b][c] - 'a']--;
            swap(s[a][c], s[b][c]);
            has[a][s[a][c] - 'a']++;
            has[b][s[b][c] - 'a']++;
        }
    };

    // Swap within s1 to fix as many positions as possible
    for (int i = 0; i < m; i++) {
        if (s[0][i] != t[i]) {
            for (int j = 0; j < m; j++) {
                if (s[0][j] == t[i] && s[0][j] != t[j]) {
                    op(1, 0, i, j);
                    break;
                }
            }
        }
    }

    // Import needed characters from other rows
    for (int i = 0; i < m; i++) {
        if (s[0][i] != t[i]) {
            for (int j = 1; j < n; j++) {
                if (has[j][t[i] - 'a']) {
                	// find the column
                    int idx = int(find(s[j].begin(), s[j].end(), t[i]) - s[j].begin());
                    op(1, j, i, idx);
                    op(2, 0, j, i);
                    break;
                }
            }
        }
    }

    printf("%d\n", (int)ops.size());
    for (auto &v : ops) {
        printf("%d %d %d %d\n", v[0], v[1], v[2], v[3]);
    }
}

int main() {
    cin.tie(0) -> sync_with_stdio(0);

    int T;
    cin >> T;
    while (T--) solve();
    return 0;
}