(Analysis by Franklyn Wang and Dhruv Rohatgi)

We first ask ourselves, if the answer is $k$, what does that entail? If the answer is $k$, this means that none of the final $n - k$ change their relative order, and that they must already be in order.

This gives a lower bound on $k$. It turns out that it also gives the answer, since we can insert the other $n - k$ numbers into their correct positions in the last numbers $k$.

For an example: $(3, 4, 5, 2, (1, 6)) \rightarrow (4, 5, 2, (1, 3, 6)) \rightarrow (5, 2, (1, 3, 4, 6)) \rightarrow (2, (1, 3, 4, 5, 6)) \rightarrow (1, 2, 3, 4, 5, 6)$

Now we need to find a sequence of instructions of length $n-k$. The first instruction is the number of unsorted cows minus one, plus the number of cows in the sorted suffix with indices smaller than the first cow's index. In the above example, the first cow needs to move $3 + 1$ spaces down the line.

After this instruction, the first cow will become part of the sorted suffix, and we recurse. Unfortunately, a naive implementation of this algorithm will take $O(N^2)$ time in the worst case.

We can speed it up with a data structure that maintains a set $S \subseteq \{1,\dots,n\}$ and performs the following operations efficiently: (1) For some $x \in \{1,\dots,n\}$, insert $x$ into $S$; (2) for some $y \in \{1,\dots,n\}$, count the number of elements of $S$ which are smaller than $y$.

There are a number of data structures which can solve this; perhaps the simplest is a Fenwick tree, which supports point updates (add $v$ to element $i$ of an array $A$) and prefix sums (given some $i$, compute $A_1 + \dots + A_i$). Inserting an element $x$ corresponds to incrementing $A_x$, and counting the elements smaller than $y$ corresponds to computing $A_1 + \dots + A_{y-1}$.

Both operations take logarithmic time, and the algorithm performs $O(N)$ such operations, for an overall time complexity of $O(N \log N)$.

#include <iostream>
#include <algorithm>
using namespace std;
#define MAXN 100100
 
int T[MAXN];
 
void inc(int i)
{
	for(i++;i<MAXN;i+=(i&-i))
		T[i]++;
}
 
int getSum(int i)
{
	int c = 0;
	for(i++;i>0;i-=(i&-i))
		c += T[i];
	return c;
}
 
 
int p[MAXN];
 
int main()
{
	int N;
	cin >> N;
	for(int i=0;i<N;i++)
	{
		cin >> p[i];
		p[i]--;
	}
	int j = N-1;
	while(j > 0 && p[j-1] < p[j])
		j--;
	cout << j << '\n';
	for(int i=j;i<N;i++)
		inc(p[i]);
	for(int i=0;i<j;i++)
	{
		cout << (j - 1 - i) + getSum(p[i]);
		if(i < j - 1) cout << ' ';
		inc(p[i]);
	}
	cout << '\n';
	return 0;
}