Entropy Makes the Collatz Series Go Down

Bradley Berg
November 21, 2018; edited October 21, 2021

ABSTRACT

An equivalent redefinition of the Collatz series is used that has three kinds of segments. Inputs to each segment have a string of zeros or ones in the low order bits, represented as powers of two. In the corresponding output of each segment these terms are replaced by powers of three or are deleted. The input terms have no entropy, and entropy is increased in the outputs. As the series progresses entropy increases, randomizing the sequence.

The low order two bits of the input to each segment determine which of the three kinds of transitions to apply. As these bits become more random, the rate at which each kind of segment occurs averages out. An average distribution causes the sequence to decrease.

A linear counterexample would consist of an infinitely long sequence whose transitions produce ever increasing values. Values would need to be highly skewed in favor of odd numbers to sustain the increase. Even when the starting value is contrived to produce mostly odd (increasing) transitions, the information encoded in the starting value is lost. Eventually even and odd values balance out and the sequence decreases. This makes it impossible to construct an infinitely long series.

I. Overview

A. Rewritten transitions for the Collatz sequence merge two or three steps:

    N is Odd:   N' = (3 * N + 1) / 2
    N % 4 = 0:  N' = N / 4
    N % 4 = 2:  N' = (3 * N + 2) / 4

All intermediate values in this sequence have N % 3 = 2. The low order two bits of N determine the type of the next transition.

B. The average occurrence and gain (output to input ratio) of each transition is:

                Occurrence   Gain Limit    Gain ^ Occurrence
    N is Odd:   50%          3/2 = 1.5     1.5^.5  = 1.22474
    N % 4 = 0:  25%          1/4 = .25     .25^.25 = 0.70711
    N % 4 = 2:  25%          3/4 = .75     .75^.25 = 0.9306

The gain of a sequence is the product of the gains for each transition.
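The dispatch on the low order two bits can be sketched in Python (a minimal illustration; the function name `step` is ours, and it assumes its input already satisfies N % 3 = 2):

```python
def step(n):
    """One transition of the rewritten Collatz series (assumes n % 3 == 2)."""
    if n & 1:                      # N is odd: low bit is 1
        return (3 * n + 1) // 2
    if n % 4 == 0:                 # low two bits are 00
        return n // 4
    return (3 * n + 2) // 4        # N % 4 = 2: low two bits are 10

# Values with n % 3 == 2 are exactly 2, 5, 8, ...; each maps back into the set.
assert all(step(n) % 3 == 2 for n in range(2, 100_000, 3))
```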
In the limit the average gain of a Collatz sequence is:

    Ga = 1.22474 * 0.70711 * 0.9306 = 0.80592

The average gain is under unity, so on average Collatz sequences steadily decrease until they terminate. Gains vary in actual sequences.

C. A Collatz sequence will have segments with consecutive runs of each transition. Each of these chains has length k with these input and output forms:

                Chain Input    Chain Output   Gain
    N is Odd:   j * 2^k - 1    j * 3^k - 1    1.5^k
    N % 4 = 0:  j * 4^k        j              .25^k
    N % 4 = 2:  j * 4^k + 2    j * 3^k + 2    .75^k

Each chain takes a value where the low order bits are a string of zeros or ones and replaces or removes them. Inputs have a 2^k term that transitions to 3^k, or a 4^k term that is removed. Strings of zeros or ones have no entropy, while the 3^k term has positive entropy. Entropy is added each time a chain of transitions is applied. In particular, each chain hashes the low order two bits that determine the type of the next chain. In a very long sequence each type of transition occurs closer to the average rate.

For a Collatz sequence to avoid eventual termination it needs to either be circular or produce values in ever greater ranges. In the latter case any infinite sequence requires a gain over one to keep growing. This cannot be sustained indefinitely. Entropy ensures that it will drift towards the average gain of 0.8059 and eventually terminate.

Over twice as many odd transitions as even are needed to achieve the minimum rate of eternal growth, which is far askew from an even split. In a long sequence the entropy of the low order bit would need to remain below 0.9183.

    Odd        N % 4 = 0   N % 4 = 2
    1.5^.674 * .25^.163  * .75^.163  = 1.00043    gain slightly over 1

The series self regulates: as more odd transitions are applied, the entropy goes up, which pushes the proportion of odd transitions down towards the 50% average. In turn this lowers the average gain until it descends below one, leading to eventual termination.
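The idealized average gain quoted above can be checked numerically; a quick sketch (combining the three powers over the common exponent 1/4 gives the closed form (27/64)^(1/4) used later in the paper):

```python
# Average gain in the limit, assuming the idealized transition rates:
# 50% Odd (gain 3/2), 25% By-Four (gain 1/4), 25% Deuce (gain 3/4).
Ga = 1.5 ** 0.5 * 0.25 ** 0.25 * 0.75 ** 0.25

# Same quantity as a single fourth root: (9/4 * 1/4 * 3/4)^(1/4) = (27/64)^(1/4).
print(round(Ga, 5))  # 0.80593
```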
After the first few chains in even the most contrived runs, entropy measures above .99. This is well above the 0.9183 minimum value required for termination. Here is an example run with metrics for each step.

II. Collatz Conjecture

    even N: N' = N / 2
    odd N:  N' = N * 3 + 1

The sequence eventually reduces to 1.

notation: %  modulus
          -> transition to the next value in a sequence

A. Rewritten Conjecture

The original series is rewritten to produce only values where N % 3 = 2. Two or three iterations are merged into each revised transition. Intermediate values that are skipped have N % 3 = 0 or 1. They account for 2/3 of the values in the original series, and skipping them simplifies analysis.

The transition out of an odd number is always even, so we can take two steps:

    N' = (N * 3 + 1) / 2

On an even number apply one of these two transitions:

    When N % 4 = 0 then: N' = N / 4
    When N % 4 = 2 then: N' = ([N / 2] * 3 + 1) / 2 = (N * 3 + 2) / 4

The first even case merges two steps and the second takes three steps. You can determine the type of the next transition to apply by looking at the low two bits of N.

Before starting the revised series the initial value may not meet the condition N % 3 = 2. In this case apply the original transitions until it does; this takes at most a few steps. After that the revised series can be run. Eventually it reduces to two, which is trivially reduced to one using the original series.

    original: 13 -> 40 -> 20 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1 terminate
    revised:  13 advance 20 -> 5 -> 8 -> 2 terminate -> 1

This Windows program prints the revised Collatz series: collatz.exe
Run from a command line: collatz 63_728_127

B. Transition Values Have N % 3 = 2

We briefly show that the revised transitions consume and produce values where N % 3 = 2.

    Odd: N' = (3 * N + 1) / 2
         N  = 3 * j + 2                    Follows from N % 3 = 2
            = 3 * [2 * k + 1] + 2          For N to be odd, j must be odd.
            = 6 * k + 5
         N' = (3 * [6 * k + 5] + 1) / 2    Substitute
            = ([18 * k + 15] + 1) / 2
            = 9 * k + 8
         N' % 3 = (9 * k + 8) % 3 = 2

    N % 4 = 0: N' = N / 4
         N  = 3 * j + 2                    Follows from N % 3 = 2
            = 3 * [4 * k + 2] + 2          For N % 4 = 0, j must be 4k + 2.
            = 12 * k + 8
         N' = [12 * k + 8] / 4             Substitute
            = 3 * k + 2
         N' % 3 = (3 * k + 2) % 3 = 2

    N % 4 = 2: N' = (3 * N + 2) / 4
         N  = 3 * j + 2                    Follows from N % 3 = 2
            = 3 * [4 * k] + 2              For N % 4 = 2, j must be 4k.
            = 12 * k + 2
         N' = (3 * [12 * k + 2] + 2) / 4   Substitute
            = (36 * k + 8) / 4
            = 9 * k + 2
         N' % 3 = (9 * k + 2) % 3 = 2

Values where N % 3 = 0 or 1 occur as intermediate values within the new transitions. The number of possible values between transitions is reduced by 2/3.

III. Advance the Initial Value

An arbitrary starting number may not have N % 3 = 2. To begin the revised sequence we first advance from the starting number until the condition is met.

notation: /\ bitwise and

A. When N % 3 = 0

    DO until N /\ 1:          DO until the number is odd,
        N = N / 2;                advance one even transition.
    N = (3 * N + 1) / 2;      Advance an odd then an even transition.

    Examples: 12 -> 6 -> 3 -> 5
              36 -> 18 -> 9 -> 14

B. When N % 3 = 1

    IF N /\ 1:                    IF N is odd,
        N = (3 * N + 1) / 2;          advance an odd then an even transition.
    ELSE:                         ELSE N is even,
        N = N / 2;                    advance one even transition.

    Examples: 7 -> 11
              10 -> 5

IV. Transition Chains

A sequence often contains consecutive iterations of the same kind of transition, forming a chain. If you were to graph a series you would see a tree whose branches consist of these chains.

notation: ^ exponentiation

A. By-Four Chain Where N % 4 = 0

A By-Four chain has inputs that are multiples of four. The length of the chain is determined by the low order bits of the input value: the number of pairs of zero bits gives the length of the chain. The first value of a By-Four chain can be written as:

    j * 4^k

The last value is then just j after k divisions by 4.

B.
Odd Chain Where N % 2 = 1

Odd chains consume an odd input and have odd intermediate values. The output is an even number. The number of consecutive low order 1 bits determines the chain length. For example an input of 19 is 10011 base 2, so the subsequent chain has two Odd transitions:

    19 -> 29 -> 44

A chain of length k takes an input of the form j * 2^k - 1. The output value is then j * 3^k - 1.

    N(1)   = (3 * N + 1) / 2                         First transition
           = (3 * [j * 2^k - 1] + 1) / 2             Substitute N = j * 2^k - 1
           = ([3 * j * 2^k - 3] + 1) / 2
           = [3 * j * 2^k - 2] / 2
           = 3^1 * j * 2^[k-1] - 1

    N(i+1) = (3 * {3^i * j * 2^[k-i] - 1} + 1) / 2   Subsequent transitions
           = ({3^[i+1] * j * 2^[k-i] - 3} + 1) / 2
           = {3^[i+1] * j * 2^[k-i] - 2} / 2
           = 3^[i+1] * j * 2^[k-i-1] - 1

    N(k)   = 3^k * j - 1                             Odd chain output

C. Deuce Chain Where N % 4 = 2

The two low order bits of the input are 10. The length of the chain is determined by counting the pairs of zeros above them; the number of transitions is that count plus one. For example:

    258 = 1_00_00_00_10 base 2

There are 3 pairs of zeros above the trailing 10, so there are 4 transitions in the chain and the output is 83. When there is an odd number of zeros the output next connects to a By-Four chain.

    386 = 110_00_00_10 base 2

There are 2 pairs of zeros, so there are 3 Deuce transitions before connecting to a By-Four chain starting with 164.

When the first value of a Deuce chain is written as j * 4^k + 2, the last value is j * 3^k + 2.

    N(1)   = (3 * N + 2) / 4                         First transition
           = {3 * [j * 4^k + 2] + 2} / 4             Substitute N = j * 4^k + 2
           = {[3 * j * 4^k + 6] + 2} / 4
           = [3 * j * 4^k + 8] / 4
           = 3^1 * j * 4^[k-1] + 2

    N(i+1) = {3 * (3^i * j * 4^[k-i] + 2) + 2} / 4   Subsequent transitions
           = {3^[i+1] * j * 4^[k-i] + 6 + 2} / 4
           = {3^[i+1] * j * 4^[k-i] + 8} / 4
           = 3^[i+1] * j * 4^[k-i-1] + 2

    N(k)   = 3^k * j + 2                             Deuce chain output

V. Transition Metrics

A statistical argument for termination considers that on average values along a path decrease and eventually reduce to 1.
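The chain formulas derived in the previous section can be spot checked by iterating the single-step transitions (a sketch; the helper names are ours):

```python
def odd_chain_out(j, k):
    """Apply k Odd transitions to j*2^k - 1; should yield j*3^k - 1."""
    n = j * 2**k - 1
    for _ in range(k):
        n = (3 * n + 1) // 2
    return n

def deuce_chain_out(j, k):
    """Apply k Deuce transitions to j*4^k + 2; should yield j*3^k + 2."""
    n = j * 4**k + 2
    for _ in range(k):
        n = (3 * n + 2) // 4
    return n

assert odd_chain_out(5, 2) == 5 * 3**2 - 1      # 19 -> 29 -> 44
assert deuce_chain_out(1, 4) == 1 * 3**4 + 2    # 258 -> 194 -> 146 -> 110 -> 83
assert all(odd_chain_out(j, k) == j * 3**k - 1
           for j in range(1, 40) for k in range(1, 9))
assert all(deuce_chain_out(j, k) == j * 3**k + 2
           for j in range(1, 40) for k in range(1, 9))
```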
The average change in each transition is represented using a gain factor. When the gain is under one, each transition is on average a reduction. This is similar to American Roulette, where betting on red each round has odds that ensure you lose an average of 5.26%, making the gain 0.9474 = (1 - 5.26%). Even after an initial lucky run, with a gain under unity you will eventually lose all your money.

A. Gain For Transitions

The gain of a transition is the ratio output / input. The gain for an individual transition is its formula divided by the input N. For large N the "+1" and "+2" terms become insignificant; they are dropped when discussing the limits of sequences.

                Output / Input             Limit
    N is Odd:   [(3 * N + 1) / 2] / N      1.5
    N % 4 = 0:  (N / 4) / N                .25
    N % 4 = 2:  [(3 * N + 2) / 4] / N      .75

The gain of a series of transitions is the product of the gains for each transition. A single kind of transition repeated k times has a final gain of gain ^ k. The gain of a sequence with different types of transitions depends on the number of times each type of transition was applied. For the revised sequences this is:

    Gt = 1.5 ^ (# of Odd) * .25 ^ (# of By4) * .75 ^ (# of Deuce)

More useful for analysis is the average gain. Instead of using the number of occurrences, the subtotals are divided by the total number of iterations. In other words we use the probability that each kind of transition occurred.

    Ga = 1.5 ^ p(Odd) * .25 ^ p(By4) * .75 ^ p(Deuce)

For a run of 100 iterations, if 50 were Odd, 20 were By-Four, and 30 were Deuce, then the average gain is:

    Ga = [1.5 ^ .5] * [.25 ^ .2] * [.75 ^ .3] = 0.85144

In an idealized Collatz sequence 50% would be Odd, with Deuce and By-Four transitions each occurring 25% of the time.
    Ga = (3/2)^.5 * (1/4)^.25 * (3/4)^.25 = (27/64) ^ (1/4) = 0.80593

Performing many runs, the actual average gains were:

    Range      (3n+1)/2   n/4      (3n+2)/4   Ga
    to 10^6    0.5068     0.2501   0.2431     0.80969
    to 10^9    0.5043     0.2502   0.2456     0.80815
    1B to 2B   0.5039     0.2501   0.2458     0.80800

The average gain for an individual sequence can vary quite a bit. When there are no or few odd chains it is low.

    n               (3n+1)/2   n/4      (3n+2)/4   1.5^pa * .25^pb * .75^pc
    1_267_611_776   0          0.8236   0.1764     0.303485

As the portion of odd chains increases, the average gain increases.

    n               (3n+1)/2   n/4      (3n+2)/4   1.5^pa * .25^pb * .75^pc
    1_005_925_919   0.5687     0.1400   0.2913     0.953995

B. Entropy

In this context we use Shannon entropy to measure the uniformity of the occurrence of each kind of transition. Since the kind of transition is determined by the two low order bits, we track the entropy of each. The entropy metric ranges from zero to one. For a sequence of bits the entropy is one if zeros and ones occur equally often and zero if the bits are all the same.

    p0 = (zero bits) / (total bits)
    p1 = (one bits) / (total bits) = 1 - p0
    H  = -p0 * log2( p0 ) - p1 * log2( p1 )

To get a sense of how this applies, as we run a single Collatz sequence the entropy of each of the low two bits is computed between each chain. Entropy goes up as the bits change and goes down when they stay the same. This tells us how well the transitions are blended.

In By-Four chains the type of chain that follows is determined by j % 4. Entropy is unchanged since the upper bits remain the same and only low order zeros with no entropy are shifted out. After Deuce and Odd chains the next transition depends on j and k. For these two cases the two low order bits of the output are hashed based on their inputs.

                Odd chain      Deuce chain
    Chain In    j * 2^k - 1    j * 4^k + 2
    Chain Out   j * 3^k - 1    j * 3^k + 2

The 2^k and 4^k terms are all zeros with zero entropy and are transformed into 3^k with positive entropy.
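The bit-stream entropy defined above is straightforward to compute (a sketch; the name `bit_entropy` is ours):

```python
import math

def bit_entropy(p0):
    """Shannon entropy of a bit stream whose zero-bit frequency is p0."""
    if p0 <= 0.0 or p0 >= 1.0:
        return 0.0               # all bits the same: no entropy
    p1 = 1.0 - p0
    return -p0 * math.log2(p0) - p1 * math.log2(p1)

print(bit_entropy(0.5))              # 1.0  (uniform bits)
print(round(bit_entropy(1/3), 4))    # 0.9183  (the threshold cited later)
```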
This adds entropy, in the form of randomness, to any individual Collatz sequence. Over time a very long sequence will move closer to the average rates and eventually terminate. A sequence has to run far askew from the averages to avoid convergence. For example, to skew the frequency of By-Four and Deuce chains you would need:

    50% Odd    6% By-Four    44% Deuce    Gain = 1.00396

A 6% occurrence is a very significant change, not a slight deviation. In this example entropy would be added in 94% of the chains, making such a run exceedingly rare and short lived.

It is well known that we only need to advance until the series reaches a number that is less than the starting point. After that, recursion shows it terminates. This implies that the effects of entropy need to take effect early on. Analyzing entropy beyond that point would be pointless; it would remain high and would be misleading.

Using truncated series, any even starting point goes down on the next step and we are done. Also, if the low bits of N are 01 (N % 4 = 1) we get:

    N = 4 * j + 1 -> 3 * (4 * j + 1) + 1 = 12 * j + 4 -> 3 * j + 1 < N = 4 * j + 1

(12j + 4 is divisible by 4, so two even steps give 3j + 1.) Now we only need to consider the remaining odd starting values for k > 2:

    j * 2^k - 1 -> j * 3^k - 1    where j is even

C. Observations

Here we give metrics for the first few orbits with various numbers of chains. There is a warm up period while enough entropy accumulates to dominate the initial state. If one bit of entropy is added per Odd or Deuce chain, the length of this period is 4/3 * log2( n ). During this period entropy is biased downward. Also, since the goal is to run the sequence until it is lower than the starting value, any subsequent values are not critical; they will also have higher entropy, giving an upward bias. Metrics are therefore only taken for the region after the warm up period up to the point where the sequence goes below the starting value.

The values listed are the initial sequence value, the number of chains, the average entropy, and the minimum entropy.
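The N % 4 = 1 reduction used in the truncated-series argument above can be verified mechanically with the original steps (a sketch under that argument's assumptions):

```python
# For N = 4j + 1 the original series drops below N within three steps:
# N -> 3N + 1 = 12j + 4 -> 6j + 2 -> 3j + 1 < N   (for j >= 1).
for j in range(1, 10_000):
    n = 4 * j + 1
    m = 3 * n + 1        # odd step
    m //= 2              # even step
    m //= 2              # even step (12j + 4 is divisible by 4)
    assert m == 3 * j + 1 and m < n
```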
In the first section metrics are calculated starting with the fiftieth chain. Only orbits with over 100 chains and the worst case (lowest) entropy are listed; orbits with average entropy over .98 are discarded.

    N                  Chains   Average   Minimum
    1_004_910_504_406  101      0.97855   0.96440
    1_006_847_123_698  102      0.97925   0.96764
    1_009_507_654_930  103      0.97812   0.96226
    1_016_869_791_785  104      0.97900   0.96511
    1_010_222_525_375  105      0.97798   0.96309
    1_018_973_403_257  106      0.97471   0.96171
    1_011_458_120_662  107      0.97171   0.95689
    1_019_225_482_855  108      0.97425   0.95522
    1_016_511_374_619  109      0.97520   0.96012
    1_009_097_630_686  110      0.97304   0.96162
    1_012_606_892_238  111      0.97971   0.96898
    1_016_251_181_895  112      0.97794   0.96485
    1_006_442_994_665  113      0.97962   0.96026
    1_009_074_973_979  114      0.97511   0.96435
    1_009_300_305_682  115      0.97969   0.96373
    1_004_039_432_955  121      0.97661   0.96880
    1_017_723_198_430  122      0.97518   0.96323
    1_004_753_675_742  123      0.97927   0.96378
    1_019_270_409_215  124      0.97997   0.96124
    1_012_163_792_146  125      0.97944   0.96564
    1_019_057_497_270  127      0.97831   0.95842
    1_009_934_311_788  142      0.97933   0.96564

This next chart tracks orbits in the same region with over 200 chains. It illustrates how longer orbits have substantially higher entropy.

    N                  Chains   Average   Minimum
    1_019_524_626_430  201      0.99810   0.99517
    1_011_050_942_290  202      0.99705   0.99342
    1_012_645_261_039  203      0.99184   0.98394
    1_019_445_805_308  204      0.99306   0.97947
    1_008_195_131_391  205      0.99358   0.98272
    1_015_571_878_206  206      0.99509   0.98566
    1_011_482_948_255  208      0.99471   0.99005
    1_014_934_024_239  209      0.98914   0.96991
    1_019_229_503_550  211      0.99248   0.97494
    1_012_087_192_295  213      0.99760   0.99186
    1_016_311_780_350  217      0.99659   0.99162
    1_005_443_653_055  219      0.99367   0.98477
    1_019_868_848_999  220      0.99796   0.99290
    1_003_500_978_358  231      0.99349   0.98741
    1_015_160_288_158  236      0.99463   0.98407
    1_009_542_611_052  253      0.99174   0.98403

D.
Disjoint Branch

If the graph tree of all orbits were not fully connected, there would be a path that never reaches the root. The branch could either be cyclic or infinitely long. Cyclic branches would have repeating values, whereas within acyclic branches all values are unique. To remain unique, an acyclic disjoint branch would have to contain values from a pool of increasingly larger values.

A characteristic of pseudo random number generators is that they can produce a sequence that repeats. If the Collatz sequence, acting as a generator, were to repeat, then the sequence would cycle and the branch would not be infinitely long. This raises the prospect of a cyclical counterexample with a very long period.

As with the gambler who does not quit while ahead, an infinite branch would succumb to the pressure of the house odds. To thwart them, Odd transitions would need to be applied over 2/3 of the time. An infinite disjoint branch would contain more than twice as many odd values as even.

    Ga = 1.5^.672 * .25^.164 * .75^.164 = 0.99794    Just under breaking even
    Ga = 1.5^.674 * .25^.163 * .75^.163 = 1.00043    Just over breaking even

To sustain these proportions in a long sequence, the entropy of the low order bit would need to average below 0.9183.

    H = -1/3 * log2( 1/3 ) - 2/3 * log2( 2/3 ) = 0.9183

Another way to beat the odds would be for By-Four chains to never occur. Even and odd values could then each appear half the time and the path would on average keep increasing.

    Ga = 1.5 ^ .5 * .25 ^ .0 * .75 ^ .5 = 1.06066

Paths without By-Four chains can be contrived to any length, but eventually a By-Four chain is produced. Actual paths can start off with very few By-Four chains (such as the path starting at 71), but eventually terminate. A small number of By-Four chains could occur while still keeping the gain over unity. A disjoint branch would have to meet this constraint.
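The break-even scenarios above can be reproduced numerically (a sketch; the helper name `average_gain` is ours):

```python
import math

def average_gain(p_odd, p_by4, p_deuce):
    """Ga = 1.5^p(Odd) * .25^p(By4) * .75^p(Deuce)."""
    return 1.5 ** p_odd * 0.25 ** p_by4 * 0.75 ** p_deuce

print(round(average_gain(0.672, 0.164, 0.164), 5))  # 0.99794  just under break even
print(round(average_gain(0.674, 0.163, 0.163), 5))  # 1.00043  just over break even
print(round(average_gain(0.5, 0.0, 0.5), 5))        # 1.06066  no By-Four chains

# Entropy of the low order bit when odd values occur 2/3 of the time.
H = -(1/3) * math.log2(1/3) - (2/3) * math.log2(2/3)
print(round(H, 4))  # 0.9183
```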
Allowing 4% By-Four chains (highly skewed from the 25% average) we get:

    Ga = 1.5 ^ .48 * .25 ^ .04 * .75 ^ .48 = 1.00108

A statistical argument alone is insufficient for a proof. Physical imperfections aside, the model for roulette assumes that each spin is an independent, purely random event. This differs from the Collatz sequence, where a given starting number produces a deterministic pseudo random sequence. The remaining issue concerns the distribution of even and odd values over an infinite disjoint branch. We need to show there is enough entropy that there are fewer than twice as many odd values as evens, so that the series converges.

A starting value can be contrived to create a sequence of any desired fixed length. The initial value contains a finite amount of information. As each chain is run, strings of ones or zeros with no entropy are replaced by a power of three with high entropy. After each chain the information in the starting value becomes more randomized, until all of the information in the initial state is lost. Consequently an infinitely long chain cannot be created.

The overall increase in entropy from odd chains forms a self-regulating mechanism. Any skew in the ratio of odd to even values moves towards 50:50 due to the increase in entropy. Collatz chains become sufficiently randomized after the first few chains, even for the most contrived runs. We have observed that orbits with over 100 chains sustain entropy above 0.975 with minimums over 0.95. This goes up in longer orbits and is over 0.995 with 200 chains. This is well above the 0.9183 level required for eternal growth. However, to be conclusive we would need to prove that entropy rises in longer chains and has a lower bound.

VI. Conclusion

A revised sequence combines two or three steps of the original.

    N is Odd:   N' = (3 * N + 1) / 2
    N % 4 = 0:  N' = N / 4
    N % 4 = 2:  N' = (3 * N + 2) / 4

Values between each step have N % 3 = 2. The other 2/3 of values are subsumed within each transition.
The low two bits of each value produced determine the next step. In the limit the gains of these transitions are 1.5, .25, and .75 respectively. The gain of a sequence of transitions is the product of the individual gains. The average gain with a uniform distribution of each type of transition is then:

    Ga = (3/2)^.5 * (1/4)^.25 * (3/4)^.25 = (27/64) ^ (1/4) = 0.80593

An orbit that never reduced and instead continuously increased would require more than twice as many odd transitions as even.

    Ga = 1.5^.672 * .25^.164 * .75^.164 = 0.99794    Just under breaking even

Repetitions of the revised steps form chains and combine to give:

    N is Odd:   N = 2^k * j - 1    N' = 3^k * j - 1
    N % 4 = 0:  N = 4^k * j        N' = j
    N % 4 = 2:  N = 4^k * j + 2    N' = 3^k * j + 2

Starting values of interest are those that do not immediately reduce:

    j * 2^k - 1    where j is even and k > 2

The entropy of a bit stream is:

    H = -p0 * log2( p0 ) - p1 * log2( p1 )

For an orbit that never reduced, the entropy of the low order bit would need to stay below:

    H = -1/3 * log2( 1/3 ) - 2/3 * log2( 2/3 ) = 0.9183

A disjoint branch needs to on average always increase. To do so it needs to consist primarily of Odd chains. However, Odd chains increase entropy, reducing the number of times they occur. A branch that started with a high proportion of Odd chains would eventually converge due to increasing entropy, making it impossible to form an infinite disjoint branch.

We measure the entropy of each of the low two bits between each chain. Higher entropy implies a more uniform distribution of the kinds of transitions. Observed entropy is over .995 in orbits with over 200 chains, well over the 0.9183 minimum. Such orbits will eventually reduce below their starting value, supporting the Collatz conjecture.

Entropy is more than a statistical metric. Here it acts as a force that in part determines the next transition to apply. It randomizes the lower two bits of the revised sequence, resulting in a more uniform blend of transitions.
The source of the entropy is the change in sequence values from 2^k to 3^k. Part of what makes the Collatz sequence intractable is the randomization of the values it produces. Ironically, randomization is the very feature that moves it towards termination.