Processing math: 100%
(Analysis by Dhruv Rohatgi )

Consider any magical configuration. Let m be the minimum number of cows in any stack, and consider some stack i achieving this minimum. Then the m1 stacks before it will all "contribute": that is, one cow from each of these stacks will land on stack i. But of course stack i contributes to itself, so we have already accounted for all m cows which land on stack i. This means that stack im (wrapping around the index if necessary) cannot contribute to stack m. So stack im must also have only m cows. Applying the above proof repeatedly, we see that if ji(modg), where g=gcd(N,m), then stack j must have only m cows.

We will show inductively that none of stacks i1,i2,,ig+1 can have more than m+1 cows. Start with stack i1. If it had more than m+1 cows, then it would contribute to stack i+m. But we know that stack i+m has only m cows, and we know that these m cows come from stacks i+1,i+2,,i+m. So this is impossible.

More generally, consider stack ik for some k>0. There are two cases.

If stack ik+1 has m cows, then the logic we described for stack i1 applies: stack ik cannot contribute to stack ik+m+1, so it must have at most m+1 cows.

If on the other hand stack ik+1 does not have m cows, then by our inductive hypothesis it must have m+1 cows. This implies that every stack j, where jik+1(modg), must have exactly m+1 cows: by a parallel induction, we know that every such stack can have at most m+1 cows, and by the previous periodicity fact we proved above, if any such stack had m cows then stack ik+1 would also have m cows.

So in particular, stack ik+m+1 has exactly m+1 cows. We know that each of the stacks ik+2,ik+3,,ik+m+1 contribute to stack ik+m+1, simply because they all have at least m cows. And we know that stack ik+1 contributes, since it has m+1 cows. So stack ik must not contribute to stack ik+m+1. We conclude that stack ik cannot have more than m+1 cows.

This argument shows that every stack has either m or m+1 cows. Together with the periodicity fact, this means that for every j, stack j has the same number of cows as stack j+g. So the configuration is periodic with period g. It is not hard to verify that any configuration satisfying these two properties is magical.

Now that we have characterized magical configurations, it remains to count them. Fix some m, and assume that m<N. Then by our characterization above, there are 2gcd(N,m)1 magical configurations for which the minimum number of cows in any stack is m. Taking care of the case m=N, the total number of magical configurations is

22N+Nm=1(2gcd(m,N)1).
Calculating this sum directly is too slow, and only receives partial credit. To speed it up, observe that for a fixed gcd g, the summand 2g is fixed. Furthermore, the number of times this summand appears in the sum is the number of m with 1mN and gcd(m,N)=g. Equivalently, it is the number of m with 1mNg and gcd(m,Ng)=1. But this is precisely φ(Ng), the Euler totient function of Ng. Therefore the sum is equal to
2N2N+gN2gφ(Ng).

To efficiently compute this sum, we start by prime factorizing N in O(N) time: simply divide out all prime divisors of magnitude at most N; the remaining number must be either 1 or prime, since N cannot have multiple prime factors of magnitude greater than N.

Now the O(N) divisors of N can be enumerated quickly. For each divisor, we use fast exponentiation to compute 2g, and we compute the totient function using the formula

φ(pe11pe22peii)=pe111pe212pei1i(p11)(p21)(pi1).

A simple (though by no means tight) bound on the overall time complexity is O(NlogN). Below is an implementation of the algorithm described above. Note that depth-first search is used to iterate over the divisors of N, allowing the totient function to be computed with only constant overhead.

#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;
#define MOD 1000000007
 
 
vector<long long> p;
vector<int> e;
int ans;
long long origN;
 
int fexp(int a,long long e)
{
	if(e==0) return 1;
	int tmp = fexp(a,e/2);
	tmp = (tmp*((long long)tmp))%MOD;
	if(e&1) tmp = (tmp*((long long)a))%MOD;
	return tmp;
}
 
long long gcd(long long a,long long b)
{
	if(b==0) return a;
	return gcd(b,a%b);
}
 
void dfs(int i,long long cdiv, long long sdiv, long long smult)
{
	if(i == p.size())
	{
		if(cdiv < origN)
			ans = (ans + fexp(2,cdiv)*((long long)((origN/(cdiv*sdiv))*smult)))%MOD;
		return;
	}
	for(int j=0;j<e[i];j++)
	{
		dfs(i+1,cdiv,sdiv*p[i],smult*(p[i]-1));
		cdiv *= p[i];
	}
	dfs(i+1,cdiv,sdiv,smult);
}
 
int main()
{
	long long N;
	cin >> N;
	origN = N;
	int i = 2;
	long long bound = N;
	for(i=2;i*((long long)i) < bound;i++)
		if(N%i == 0)
		{
			int mult = 0;
			while(N%i == 0)
			{
				mult++;
				N /= i;
			}
			p.push_back(i);
			e.push_back(mult);
		}
	if(i*((long long)i) == bound && N%i == 0)
	{
		int mult = 0;
		while(N%i == 0)
		{
			mult++;
			N /= i;
		}
		p.push_back(i);
		e.push_back(mult);
	}
	if(N > 1)
	{
		p.push_back(N);
		e.push_back(1);
	}
	dfs(0,1,1,1);
	ans = (ans + MOD - (origN - 1)%MOD)%MOD;
	ans = (ans+1)%MOD;
	cout << ans << '\n';
}