Processing math: 100%
(Analysis by Benjamin Qi)

Observe that the edges iai induce a directed graph where every vertex has out-degree 1. This is known as a functional graph. We can solve the problem for each connected component of the graph independently, so for the remainder of the analysis, we will assume the graph consists of a single connected component.

Call cow i inactive if it contributes 0 to the collective pleasure value rather than vi. From the sample case, among those cows on a simple cycle, it is easy to see that at least one of the cows must be inactive. Consider the cow c in the cycle that occurs latest in p. Then either ac either has not departed her farm already (in which case ac is inactive) or she has (in which case c is inactive).

As a connected component in a functional graph always contains exactly one simple cycle, the answer must be at most the sum of all vi minus the minimum vi among that cycle. Furthermore, we can always construct p that achieves this bound. The construction is as follows:

  1. Let cow c be the cow corresponding to the minimum vi along the cycle.
  2. Any permutation p such that i appears earlier than ai in p for all ic suffices.
  3. After removing edge cac from the component, the component no longer contains a cycle. Therefore, the remaining edges form a directed tree rooted at c. So it is always possible to construct p (ex. DFS backward through the tree and add each node to the front of p when it's traversed for the first time).

In the code below, for each connected component I use Floyd's algorithm to detect a vertex along the cycle. After that, I mark every vertex in the connected component as visited. As each connected component is processed in time proportional to its size, the runtime is O(N).

#include <bits/stdc++.h>
using namespace std;

template <class T> using V = vector<T>;
#define all(x) begin(x), end(x)

vector<int> a, v;
vector<vector<int>> child;
vector<bool> done;

void mark_as_done(int x) {
	if (done[x]) return;
	done[x] = true;
	for (int c : child[x]) mark_as_done(c);
}

int solve(int start) {
	int x = start, y = start;
	do {
		x = a[x], y = a[a[y]];
	} while (x != y);
	int min_along_cycle = INT_MAX;
	do {
		min_along_cycle = min(min_along_cycle, v[x]);
		x = a[x];
	} while (x != y);
	mark_as_done(x);
	return min_along_cycle;
}

int main() {
	ios::sync_with_stdio(false);
	cin.tie(nullptr);
	int N;
	cin >> N;
	a.resize(N + 1);
	v.resize(N + 1);
	child.resize(N + 1);
	int64_t ans = 0;
	for (int i = 1; i <= N; ++i) {
		cin >> a[i] >> v[i];
		ans += v[i];
		child[a[i]].push_back(i);
	}
	done.resize(N + 1);
	for (int i = 1; i <= N; ++i)
		if (!done[i]) ans -= solve(i);
	cout << ans << "\n";
}

Alternatively, if you are familiar with Gold topics, the answer is just the weight of a maximum spanning forest of the graph (treating the edges as undirected), which can be computed with Kruskal's algorithm.

#include <bits/stdc++.h>
using namespace std;

struct DSU {
	vector<int> e;
	void init(int N) { e = vector<int>(N, -1); }
	int get(int x) { return e[x] < 0 ? x : e[x] = get(e[x]); }
	bool unite(int x, int y) {
		x = get(x), y = get(y);
		if (x == y) return 0;
		if (e[x] > e[y]) swap(x, y);
		e[x] += e[y];
		e[y] = x;
		return 1;
	}
};

int main() {
	ios::sync_with_stdio(false);
	cin.tie(nullptr);
	int N;
	cin >> N;
	vector<tuple<int, int, int>> edges;
	for (int i = 1; i <= N; ++i) {
		int a, v;
		cin >> a >> v;
		edges.push_back({v, i, a});
	}
	sort(edges.rbegin(), edges.rend());
	DSU D;
	D.init(N + 1);
	int64_t ans = 0;
	for (auto [v, x, y] : edges)
		if (D.unite(x, y)) ans += v;
	cout << ans << "\n";
}

Bonus: Solve the problem when the vi can be negative.