Consider any magical configuration. Let $m$ be the minimum number of cows in any stack, and consider some stack $i$ achieving this minimum. Then the $m-1$ stacks before it will all "contribute": that is, one cow from each of these stacks will land on stack $i$. But of course stack $i$ contributes to itself, so we have already accounted for all $m$ cows which land on stack $i$. This means that stack $i-m$ (wrapping around the index if necessary) cannot contribute to stack $m$. So stack $i-m$ must also have only $m$ cows. Applying the above proof repeatedly, we see that if $j \equiv i \pmod{g}$, where $g = \gcd(N,m)$, then stack $j$ must have only $m$ cows.

We will show inductively that none of stacks $i-1, i-2, \dots, i-g+1$ can have more than $m+1$ cows. Start with stack $i-1$. If it had more than $m+1$ cows, then it would contribute to stack $i+m$. But we know that stack $i+m$ has only $m$ cows, and we know that these $m$ cows come from stacks $i+1, i+2, \dots, i+m$. So this is impossible.

More generally, consider stack $i-k$ for some $k>0$. There are two cases.

If stack $i-k+1$ has $m$ cows, then the logic we described for stack $i-1$ applies: stack $i-k$ cannot contribute to stack $i-k+m+1$, so it must have at most $m+1$ cows.

If on the other hand stack $i-k+1$ does not have $m$ cows, then by our inductive hypothesis it must have $m+1$ cows. This implies that every stack $j$, where $j \equiv i-k+1 \pmod{g}$, must have exactly $m+1$ cows: by a parallel induction, we know that every such stack can have at most $m+1$ cows, and by the previous periodicity fact we proved above, if any such stack had $m$ cows then stack $i-k+1$ would also have $m$ cows.

So in particular, stack $i-k+m+1$ has exactly $m+1$ cows. We know that each of the stacks $i-k+2, i-k+3, \dots, i-k+m+1$ contribute to stack $i-k+m+1$, simply because they all have at least $m$ cows. And we know that stack $i-k+1$ contributes, since it has $m+1$ cows. So stack $i-k$ must not contribute to stack $i-k+m+1$. We conclude that stack $i-k$ cannot have more than $m+1$ cows.

This argument shows that every stack has either $m$ or $m+1$ cows. Together with the periodicity fact, this means that for every $j$, stack $j$ has the same number of cows as stack $j+g$. So the configuration is periodic with period $g$. It is not hard to verify that any configuration satisfying these two properties is magical.

Now that we have characterized magical configurations, it remains to count them. Fix some $m$, and assume that $m < N$. Then by our characterization above, there are $2^{\gcd(N,m)} - 1$ magical configurations for which the minimum number of cows in any stack is $m$. Taking care of the case $m = N$, the total number of magical configurations is

$$2 - 2^N + \sum_{m=1}^N \left ( 2^{\gcd(m,N)} - 1 \right ).$$

Calculating
this sum directly is too slow, and only receives partial credit. To speed it up,
observe that for a fixed gcd $g$, the summand $2^g$ is fixed. Furthermore, the
number of times this summand appears in the sum is the number of $m$ with
$1 \leq m \leq N$ and $\gcd(m,N) = g$. Equivalently, it is the number of $m'$
with $1 \leq m' \leq \frac{N}{g}$ and $\gcd(m', \frac{N}{g}) = 1$. But this is
precisely $\varphi(\frac{N}{g})$, the Euler totient function of $\frac{N}{g}$.
Therefore the sum is equal to
$$2 - N - 2^N + \sum_{g \mid N} 2^g \varphi(\frac{N}{g}).$$

To efficiently compute this sum, we start by prime factorizing $N$ in $O(\sqrt{N})$ time: simply divide out all prime divisors of magnitude at most $\sqrt{N}$; the remaining number must be either $1$ or prime, since $N$ cannot have multiple prime factors of magnitude greater than $\sqrt{N}$.

Now the $O(\sqrt{N})$ divisors of $N$ can be enumerated quickly. For each divisor, we use fast exponentiation to compute $2^g$, and we compute the totient function using the formula

$$\varphi(p_1^{e_1}p_2^{e_2}\cdots p_i^{e_i}) = p_1^{e_1-1}p_2^{e_2-1}\cdots p_i^{e_i-1}(p_1-1)(p_2-1)\cdots(p_i-1).$$

A simple (though by no means tight) bound on the overall time complexity is $O(\sqrt{N}\log N)$. Below is an implementation of the algorithm described above. Note that depth-first search is used to iterate over the divisors of $N$, allowing the totient function to be computed with only constant overhead.

#include <iostream> #include <algorithm> #include <vector> using namespace std; #define MOD 1000000007 vector<long long> p; vector<int> e; int ans; long long origN; int fexp(int a,long long e) { if(e==0) return 1; int tmp = fexp(a,e/2); tmp = (tmp*((long long)tmp))%MOD; if(e&1) tmp = (tmp*((long long)a))%MOD; return tmp; } long long gcd(long long a,long long b) { if(b==0) return a; return gcd(b,a%b); } void dfs(int i,long long cdiv, long long sdiv, long long smult) { if(i == p.size()) { if(cdiv < origN) ans = (ans + fexp(2,cdiv)*((long long)((origN/(cdiv*sdiv))*smult)))%MOD; return; } for(int j=0;j<e[i];j++) { dfs(i+1,cdiv,sdiv*p[i],smult*(p[i]-1)); cdiv *= p[i]; } dfs(i+1,cdiv,sdiv,smult); } int main() { long long N; cin >> N; origN = N; int i = 2; long long bound = N; for(i=2;i*((long long)i) < bound;i++) if(N%i == 0) { int mult = 0; while(N%i == 0) { mult++; N /= i; } p.push_back(i); e.push_back(mult); } if(i*((long long)i) == bound && N%i == 0) { int mult = 0; while(N%i == 0) { mult++; N /= i; } p.push_back(i); e.push_back(mult); } if(N > 1) { p.push_back(N); e.push_back(1); } dfs(0,1,1,1); ans = (ans + MOD - (origN - 1)%MOD)%MOD; ans = (ans+1)%MOD; cout << ans << '\n'; }