Subtask 1: $O(N^3)$
Consider each cyclic shift independently. The minimum number of swaps needed to sort a permutation is the number of inversions, which is the number of pairs of indices $i<j$ where $p_i>p_j$. We can check all such pairs for each cyclic shift in $O(N^2)$ time, resulting in $O(N^3)$ overall complexity.
Subtask 2: $O(N^2)$
When performing a cyclic shift, only $N-1$ pairs are affected. Hence, we can calculate the number of inversions once in the original permutation and update for each cyclic shift in $O(N)$ time, resulting in a total complexity of $O(N^2)$.
Full Solution 1: $O(N\log{N})$
We can find the number of initial inversions in $O(N\log{N})$ time using a segment tree, ordered set, merge sort, or any other inversion counting algorithm. When the permutation is cyclically shifted to the right, moving the element with value $i$ from the back to the front, $N-i$ inversions are removed and $i-1$ new inversions are created, resulting in a net change of $2i-N-1$ inversions, which can be calculated in $O(1)$ time. The reasoning is the same as in this past USACO problem.
It remains to calculate $S=\sum|p_i-i|$ for each cyclic shift $k$ times to the right. To do this, iterate on each value $j$ in the permutation. For some range of $k$, $j$ will located at a position $i<j$, so the contribution $S$ will be $j-i$, and for the remaining values of $k$ the contribution will be $i-j$. Since $i$ depends piecewise linearly on $k$, the contribution from $j$ can be expressed as a piecewise linear function in terms of $k$. We can use prefix sums to compute the sum of linear functions for all $k$, allowing us to compute $S$ for all $k$ in $O(N)$ time.
Full Solution 2: $O(N)$
$p$ is good iff it avoids the subsequence 321 (i.e., no $i<j<k$ such that $p_i>p_j>p_k$).
Proof:
Now, for each element $j$ find the nearest greater element to the left $i$ and the nearest smaller element $k$ to the right (cyclically) using monotonic stacks. If the cyclic shift contains $i$, $j$, and $k$ in this order, then it is bad. We can use prefix sums to mark which intervals of shifts are bad.
Akshaj's C++ code:
#include <bits/stdc++.h>
using namespace std;
void solve() {
int n;
cin >> n;
vector<int> p(n);
for (auto &i : p)
cin >> i;
// solve
stack<int> s;
vector<int> nl(n, -1);
for (int j = 0; j < 2; j++) {
for (int i = 0; i < n; i++) {
while (s.size() && p[i] >= p[s.top()])
s.pop();
if (j == 1 && s.size())
nl[i] = s.top();
s.push(i);
}
}
while (s.size())
s.pop();
vector<int> nr(n, -1);
for (int j = 0; j < 2; j++) {
for (int i = n - 1; i >= 0; i--) {
while (s.size() && p[i] <= p[s.top()])
s.pop();
if (j == 1 && s.size())
nr[i] = s.top();
s.push(i);
}
}
vector<int> ps(2 * n + 1);
for (int i = 0; i < n; i++) {
int l = nl[i], r = nr[i];
if (min(l, r) == -1)
continue;
if ((l < i) ^ (i < r) ^ (l < r)) {
if (l < r)
l += n;
ps[r + 1]++;
ps[l + 1]--;
}
}
for (int i = 1; i <= 2 * n; i++)
ps[i] += ps[i - 1];
vector<int> ans;
for (int i = 1; i <= n; i++) {
if (max(ps[i], ps[i + n]) == 0)
ans.push_back(n - i);
}
reverse(ans.begin(), ans.end());
// ans
int m = size(ans);
cout << m << '\n';
for (int i = 0; i < m; i++)
cout << ans[i] << " \n"[i == m - 1];
if (m == 0)
cout << "\n";
}
signed main() {
cin.tie(0)->sync_with_stdio(0);
int t = 1;
cin >> t;
while (t--)
solve();
}