(Analysis by Akshaj Arora, Benjamin Qi)

Subtask 1: $O(N^3)$

Consider each cyclic shift independently. The minimum number of swaps needed to sort a permutation is the number of inversions, which is the number of pairs of indices $i<j$ where $p_i>p_j$. We can check all such pairs for each cyclic shift in $O(N^2)$ time, resulting in $O(N^3)$ overall complexity.

Subtask 2: $O(N^2)$

When performing a cyclic shift, only $N-1$ pairs are affected. Hence, we can calculate the number of inversions once in the original permutation and update for each cyclic shift in $O(N)$ time, resulting in a total complexity of $O(N^2)$.

Full Solution 1: $O(N\log{N})$

We can find the number of initial inversions in $O(N\log{N})$ time using a segment tree, ordered set, merge sort, or any other inversion counting algorithm. When the permutation is cyclically shifted to the right, moving the element with value $i$ from the back to the front, $N-i$ inversions are removed and $i-1$ new inversions are created, resulting in a net change of $2i-N-1$ inversions, which can be calculated in $O(1)$ time. The reasoning is the same as in this past USACO problem.

It remains to calculate $S=\sum|p_i-i|$ for each cyclic shift $k$ times to the right. To do this, iterate on each value $j$ in the permutation. For some range of $k$, $j$ will located at a position $i<j$, so the contribution $S$ will be $j-i$, and for the remaining values of $k$ the contribution will be $i-j$. Since $i$ depends piecewise linearly on $k$, the contribution from $j$ can be expressed as a piecewise linear function in terms of $k$. We can use prefix sums to compute the sum of linear functions for all $k$, allowing us to compute $S$ for all $k$ in $O(N)$ time.

Full Solution 2: $O(N)$

$p$ is good iff it avoids the subsequence 321 (i.e., no $i<j<k$ such that $p_i>p_j>p_k$).

Proof:

Now, for each element $j$ find the nearest greater element to the left $i$ and the nearest smaller element $k$ to the right (cyclically) using monotonic stacks. If the cyclic shift contains $i$, $j$, and $k$ in this order, then it is bad. We can use prefix sums to mark which intervals of shifts are bad.

Akshaj's C++ code:

#include <bits/stdc++.h>
using namespace std;
 
void solve() {
  int n;
  cin >> n;
  vector<int> p(n);
  for (auto &i : p)
    cin >> i;
 
  // solve
  stack<int> s;
  vector<int> nl(n, -1);
  for (int j = 0; j < 2; j++) {
    for (int i = 0; i < n; i++) {
      while (s.size() && p[i] >= p[s.top()])
        s.pop();
      if (j == 1 && s.size())
        nl[i] = s.top();
      s.push(i);
    }
  }
  while (s.size())
    s.pop();
  vector<int> nr(n, -1);
  for (int j = 0; j < 2; j++) {
    for (int i = n - 1; i >= 0; i--) {
      while (s.size() && p[i] <= p[s.top()])
        s.pop();
      if (j == 1 && s.size())
        nr[i] = s.top();
      s.push(i);
    }
  }
 
  vector<int> ps(2 * n + 1);
  for (int i = 0; i < n; i++) {
    int l = nl[i], r = nr[i];
    if (min(l, r) == -1)
      continue;
    if ((l < i) ^ (i < r) ^ (l < r)) {
      if (l < r)
        l += n;
      ps[r + 1]++;
      ps[l + 1]--;
    }
  }
  for (int i = 1; i <= 2 * n; i++)
    ps[i] += ps[i - 1];
  vector<int> ans;
  for (int i = 1; i <= n; i++) {
    if (max(ps[i], ps[i + n]) == 0)
      ans.push_back(n - i);
  }
  reverse(ans.begin(), ans.end());
 
  // ans
  int m = size(ans);
  cout << m << '\n';
  for (int i = 0; i < m; i++)
    cout << ans[i] << " \n"[i == m - 1];
  if (m == 0)
    cout << "\n";
}
 
signed main() {
  cin.tie(0)->sync_with_stdio(0);
  int t = 1;
  cin >> t;
  while (t--)
    solve();
}