How Prime Factorizations govern the Collatz Conjecture

By applying some optimizations to the Collatz Conjecture, we can reduce noise that obscures the patterns that produce the sequences. It becomes apparent that the sequences generated are related to prime factorizations: the prime factorization of the number itself for decending, whereby all 2's are removed; or the prime factorization of the next higher even number for ascending, whereby twos and threes are traded off.

Herein, I'll lay out the math, and show the interesting things it tells us about the Conjecture.

Contents

1. Get Rid of Noise

The Collatz is iterative. The usual rule to get the next value is, for any positive integer c :

Collatz ( c ) = { 3 c + 1 if c is odd c 2 if c is even

The conjecture is that if we repeat this enough times, we will always arrive at 1.

An odd c always produces an even number, so subsequently we must always execute the "even" half of the equation at least once. So, as any good engineer would do, let's optimize by getting rid of the unnecessary iteration and the noise it generates:

Collatz ( c ) = { 3 c + 1 2 if c is odd c 2 if c is even
With a bit of factoring, this becomes:
Collatz ( c ) = { 3 2 · c + 1 2 if c is odd c 2 if c is even

In this last form, the "odd" portion of the equation always produces fractional half, before adding ½. (An odd times an odd is always odd, so the division results in the fractional ½). We can get rid of this by using c + 1 as the input and making the appropriate adjustments. Note: since we're focusing on the the odd portion of the equation, I'm going to only show that portion through here.

odd(c) = 3 2 c + 1 2 = ( 3 2 ) · ( c + 1 ) + 1 2 3 2 = 3 2 ( c + 1 ) 1

Note that the last term is -1 . If the successive iteration is also odd, the first step of which is to add 1, these effectively cancel out.

odd(odd(c)) = 3 2 ( 3 2 ( c + 1 ) 1 + 1 ) 1 = 3 2 ( 3 2 ( c + 1 ) ) 1 = ( 3 2 ) 2 · ( c + 1 ) 1

That is, we can redefine "odd" algorithmically:

  1. Add 1.
  2. While this value is even, divide by 2 and multiply by 3.
  3. Subtract 1.

Or, to write it mathematically, if we knew how many times to repeat n:

odd(c) = ( 3 2 ) n · ( c + 1 ) 1
So how do we know the value of n?
  • c is odd → c + 1 is even, and thus the prime factorization of c + 1 contains at least one 2.
  • c + 1 is even → 3 2 ( c + 1 ) is whole.
  • Looking at the prime factorization if c + 1 , multiplying by 3 2 in effect replaces a 2 with a 3.
  • ( c + 1 ) mod 4 = 0 3 2 ( c + 1 ) is even, since there were two 2s in the prime factorization of c + 1 , leaving one remaining.
  • ( c + 1 ) mod 8 = 0 3 2 ( c + 1 ) mod 4 = 0 .
  • ( c + 1 ) mod 16 = 0 3 2 ( c + 1 ) mod 8 = 0 .
  • ...and so forth.

So, generally, n = # of 2s in prime factorization of c + 1 .

1.1. So what's this telling us

  • For any c the number c + 1 governs whether it can ascend, and how far. I will call this the "control number".
  • An ascending Collatz series is related by the prime factorizations of the corresponding control numbers, c + 1 . These prime factorizations will all be the same overall size, and will contain the same set of numbers, with only the numbers of 2s and 3s varying.
  • The combined number of 2s and 3s in the prime factorization c + 1 tells us the length of this "ascending chain".
  • The numbers of 2s in the prime factorization c + 1 tells us how many links are left to reach the upper end of the ascending chain. If there are no 2s, we are at the upper bound of the ascending chain.
  • If there are neither 2s nor 3s in c + 1 , then c is not part of an ascending chain.

1.2. Example

For clarity, let's review some examples:

15
Collatz     Control         Ctrl factors
  15:       16 = (start)    2^4   
  23:       24 = (Inc,3)    2^3   *  3^1   
  35:       36 = (Inc,3)    2^2   *  3^2   
  53:       54 = (Inc,3)    2^1   *  3^3   
  80:       81 = (Inc,3)    3^4   
  40:       41 = (dec  )   41^1   
  20:       21 = (Dec,3)    3^1   *  7^1   
  10:       11 = (dec  )   11^1   
   5:        6 = (Dec,3)    2^1   *  3^1   
   8:        9 = (Inc,3)    3^2   
   4:        5 = (dec  )    5^1   
   2:        3 = (Dec,3)    3^1   
   1:        2 = (dec  )    2^1   
With the "standard" Collatz, the sequence from 15 would be 15, 46, 23, 70, 35, 106, 53, 160, 80, 40, 20, 10, etc. Since we debounced it, we eliminate the unnecessary and confusing waypoints, leaving us with 15, 46, 23, 70, 35, 106, 53, 160, 80 in the first ascending sequence. If we look at the prime factorizations of the next highers numbers—16, 24, 36, 54, 81—we see they are identical, except for the gradual replacing of 2s by 3s.

Lest you think this is cherry-picked data, let's look at some output from a random number 5-digit number I picked. In this variation, we won't do a full prime factorization, instead just showing the 2s, 3s, and whatever is leftover. To see some long chains, look for 335, 319, and 911.

17835
     Collatz   Control
incr  17835     17836 = 2^2        * 4459^1   
      26753     26754 = 2^1 *  3^1 * 4459^1   
      40130     40131 =        3^2 * 4459^1   
decr  40130     40130 = 2^1        * 20065^1   
      20065     20065 =              20065^1   
incr  20065     20066 = 2^1        * 10033^1   
      30098     30099 =        3^1 * 10033^1   
decr  30098     30098 = 2^1        * 15049^1   
      15049     15049 =              15049^1   
incr  15049     15050 = 2^1        * 7525^1   
      22574     22575 =        3^1 * 7525^1   
decr  22574     22574 = 2^1        * 11287^1   
      11287     11287 =              11287^1   
incr  11287     11288 = 2^3        * 1411^1   
      16931     16932 = 2^2 *  3^1 * 1411^1   
      25397     25398 = 2^1 *  3^2 * 1411^1   
      38096     38097 =        3^3 * 1411^1   
decr  38096     38096 = 2^4        * 2381^1   
      19048     19048 = 2^3        * 2381^1   
      9524       9524 = 2^2        * 2381^1   
      4762       4762 = 2^1        * 2381^1   
      2381       2381 =              2381^1   
incr  2381       2382 = 2^1 *  3^1 * 397^1   
      3572       3573 =        3^2 * 397^1   
decr  3572       3572 = 2^2        * 893^1   
      1786       1786 = 2^1        * 893^1   
      893         893 =              893^1   
incr  893         894 = 2^1 *  3^1 * 149^1   
      1340       1341 =        3^2 * 149^1   
decr  1340       1340 = 2^2        * 335^1   
      670         670 = 2^1        * 335^1   
      335         335 =              335^1   
incr  335         336 = 2^4 *  3^1 *  7^1   
      503         504 = 2^3 *  3^2 *  7^1   
      755         756 = 2^2 *  3^3 *  7^1   
      1133       1134 = 2^1 *  3^4 *  7^1   
      1700       1701 =        3^5 *  7^1   
decr  1700       1700 = 2^2        * 425^1   
      850         850 = 2^1        * 425^1   
      425         425 =              425^1   
incr  425         426 = 2^1 *  3^1 * 71^1   
      638         639 =        3^2 * 71^1   
decr  638         638 = 2^1        * 319^1   
      319         319 =              319^1   
incr  319         320 = 2^6        *  5^1   
      479         480 = 2^5 *  3^1 *  5^1   
      719         720 = 2^4 *  3^2 *  5^1   
      1079       1080 = 2^3 *  3^3 *  5^1   
      1619       1620 = 2^2 *  3^4 *  5^1   
      2429       2430 = 2^1 *  3^5 *  5^1   
      3644       3645 =        3^6 *  5^1   
decr  3644       3644 = 2^2        * 911^1   
      1822       1822 = 2^1        * 911^1   
      911         911 =              911^1   
incr  911         912 = 2^4 *  3^1 * 19^1   
      1367       1368 = 2^3 *  3^2 * 19^1   
      2051       2052 = 2^2 *  3^3 * 19^1   
      3077       3078 = 2^1 *  3^4 * 19^1   
      4616       4617 =        3^5 * 19^1   
decr  4616       4616 = 2^3        * 577^1   
      2308       2308 = 2^2        * 577^1   
      1154       1154 = 2^1        * 577^1   
      577         577 =              577^1   
incr  577         578 = 2^1        * 289^1   
      866         867 =        3^1 * 289^1   
decr  866         866 = 2^1        * 433^1   
      433         433 =              433^1   
incr  433         434 = 2^1        * 217^1   
      650         651 =        3^1 * 217^1   
decr  650         650 = 2^1        * 325^1   
      325         325 =              325^1   
incr  325         326 = 2^1        * 163^1   
      488         489 =        3^1 * 163^1   
decr  488         488 = 2^3        * 61^1   
      244         244 = 2^2        * 61^1   
      122         122 = 2^1        * 61^1   
      61           61 =              61^1   
incr  61           62 = 2^1        * 31^1   
      92           93 =        3^1 * 31^1   
decr  92           92 = 2^2        * 23^1   
      46           46 = 2^1        * 23^1   
      23           23 =              23^1   
incr  23           24 = 2^3 *  3^1 
      35           36 = 2^2 *  3^2 
      53           54 = 2^1 *  3^3 
      80           81 =        3^4 
decr  80           80 = 2^4        *  5^1   
      40           40 = 2^3        *  5^1   
      20           20 = 2^2        *  5^1   
      10           10 = 2^1        *  5^1   
      5             5 =               5^1   
incr  5             6 = 2^1 *  3^1 
      8             9 =        3^2 
decr  8             8 = 2^3 
      4             4 = 2^2 
      2             2 = 2^1 
      1             1 =

1.3. Looking at this in reverse

I said earlier that the length of an ascending chain is equal to the combined 2s and 3s in the factorization of the control number c + 1 , and that the number of 2s indicates how many steps remain to ascend.

Conversely, the 3s represent the number of prior (i.e. lower) links in the chain.

Any number c of course "comes from" 2 c . Until now, however, there was no way to tell if c also "comes from" ( 2 c 1 ) ÷ 3 except by trial-and-error: that portion of the Collatz only went forward.

By looking at the factorization of control numbers c + 1 , it is evident whether there are earlier links, and how many of them, from which it is possible to "come from":

  • If the factorization of c + 1 contains no threes, then c is not arrived at from a lower number.
  • If the factorization of c + 1 does contain threes, then the number of threes indicates the number of lower values which ascend to this one.
  • You can find these lower control numbers by changing 3s in the factorization of c + 1 to twos and multiplying out; subtract 1 for the corresponding Collatz points.

So, again picking a number at random: 18935. Our "control" number will thus be 189361 , whose factorization is 2 3 · 3 2 · 263 .

Changing the 3s to 2s, we have 2 5 · 263 = 8416 , and changing the 2s to 3s, we have 3 5 · 263 = 63909 . This ascension chain will be length 5, starting at Collatz value 8415 and ending at 63908.

8415
     8415:   8416 = (start)  2^5 * 263^1   
    12623:  12624 = (Inc,3)  2^4 *  3^1 * 263^1   
    18935:  18936 = (Inc,3)  2^3 *  3^2 * 263^1   
    28403:  28404 = (Inc,3)  2^2 *  3^3 * 263^1   
    42605:  42606 = (Inc,3)  2^1 *  3^4 * 263^1   
    63908:  63909 = (Inc,3)  3^5 * 263^1   

2. Throw it away and start again

Now that we understand the problem better, let's look at it in a fresh light.

  • Elements of an ascending chain have prime factorizations that have identical contents for numbers >= 4, and have the same sum of 2s and 3s.
  • Elements of a descending chain have the identical prime factorization, except for the number of 2s.

What we really have is an interesting game of playing with prime factorizations. Instead of seeing all the individual steps, we should look at what happens when our modified Collatz switches between ascending and descending modes.

  1. If the starting number is odd, go to step 3.
  2. Apply a descending chain by taking the minimum of the related descending chain values: the one with no 2s in the prime factorization.
  3. If the number is 1, stop.
  4. Swap out the prime factors for new ones by adding 1.
  5. Apply an ascending chain by taking the maximum of the related ascending chain values, the one with no 2s in the prime factorization.
  6. Swap out the prime factors for new ones by subtracting 1.
  7. Go to step 2.
Here are some relevant observations about prime factorizations:
  1. For any number n , the closest number sharing any single value of its prime factorization will be n ± minimum-PF-member .
  2. Since 1 is never part of a prime factorization, the primary factorizations of n and n + 1 are mutually exclusive; they cannot contain any of the same numbers.
  3. Moreover, the prime factorizations of n and n + 1 cannot be modified in a consistent way that they will occur next to eachother ever again. Looking at 15:16, for example ([3,5]:[2^4]), if we make the same change---say, adding another 2--- the resulting values, 30:32, are no longer next to each other. 2 
Applying this to the chains:
  • There will be exactly one value in ascending chain with a prime factorization devoid of 3s, and similarly exactly one devoid of 2s.
  • There will be 1 or more values in ascending chains with a prime factorization containing 3s, and 1 or more containing 2s.
  • There are thus at least 2 values in ascending chains.
  • Since the end value of an ascending chain always contains at least one 3, the prime factorization of the subsequent descending chain will never contain 3.
Imagine we are in our loop at the end of step 2 (or step 5). Iterate once more, back to the same point: we are guaranteed not to return to the same value. In step 4 (or 6) we swap our prime factors. In step 5 (or 2) we move to the end of that chain. When we swap prime factors in step 6 (or 4), observation #3 guarantees we won't end up back in the original chain, and thus applying step 2 (or step 5) we won't return to the same point.

There might be a proof by induction that the Collatz is acyclic using this. It is above my mathematical skills to do so.

A. Programs

3.1. collatz.c

This program follows the optimized Collatz sequence and prints the factorizations of the next-higher values while both ascending and descending.

Usage: collatz <number>...

/* Collatz conjecture research.
   Perette Barella.  2018-03-22.
   Please ensure I'm credited if this leads anywhere. */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <assert.h>

void prime_factor (int v) {
        int n = 2;
        char multiply = ' ';
        for (int n = 2; v != 1; n += 1) {
                int count = 0;
                while (v % n == 0) {
                        count++;
                        v = v / n;
                }
                if (count) {
                        printf ("%c %2d^%-4d", multiply, n, count);
                        multiply = '*';
                }
        }
}


void collatz (int n) {
        printf ("%d\n", n);
        int c = n;
        const char *action = "start";
        while (1) {
                printf ("%9d: %9d = (%-5s) ", c, c+1, action);
                prime_factor (c + 1);
                printf ("\n");
                if (c == 1) break;
                if (c & 1) {
                        c = (c * 3 + 1) >> 1;
                        action = "inc";
                        if ((c + 1) % 3 == 0)
                                action = "Inc,3";
                        else
                                assert (!"This doesn't happen!");
                } else {
                        c = c >> 1;
                        action = "dec";
                        if ((c + 1) % 3 == 0)
                                action = "Dec,3";
                }
        }
}


int main (int argc, char **argv) {
        while (*++argv) {
                collatz (strtol (*argv, NULL, 10));
        }
        return 0;
}

3.2. coder.c

This program follows the optimized Collatz sequence. When ascending, for a given Collatz point c the factorization of c + 1 . When descending, the factorization of c is shown. For the factorizations, 2s, 3s are prime factored out, and the rest left as a non-prime factor (unless it just happens to be prime).

Usage: coder [-f] <number>...

The -f flag causes only peaks and troughs to be shown, skipping intermediate values in ascents and descents.

/* Collatz conjecture research.
   Perette Barella.  2018-03-22.
   Please ensure I'm credited if this leads anywhere. */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdbool.h>
#include <assert.h>


void factor (int v) {
        int n = 2;
        char multiply = ' ';
        for (int n = 2; v != 1; n += 1) {
                int count = 0;
                /* Only really interested in 2s, 3's, and "the rest". */
                if (n == 4)
                        n = v;
                while (v % n == 0) {
                        count++;
                        v = v / n;
                }
                if (count == 0) {
                        printf ("%c %2s %-4s", ' ', "", "");
                } else {
                        printf ("%c %2d^%-4d", multiply, n, count);
                        multiply = '*';
                }
        }
}



void print (const char *mode, int c, int r) {
        printf ("    %-5s %-10d %10d = ", mode, c, r);
        factor (r);
        printf ("\n");
}

int next_collatz (int n) {
        if (n & 1)
                return n * 3 + 1;
        return n >> 1;
}


void collatz (int n, bool filter_noise) {
        printf ("%d\n", n);
        int c = n;
        int r = n;
        bool show;
        while (r != 1) {
                assert (c == r);

                r++;
                print ("incr", c, r);
                while ((r & 1) == 0) {
                        c = next_collatz (next_collatz (c));
                        r += r >> 1;
                        assert (r - 1 == c);
                        if (!filter_noise)
                                print ("", c, r);
                }
                if (filter_noise)
                        print ("", c, r);
                r--;
                print ("decr", c, r);
                show = true;
                while ((r & 1) == 0) {
                        c = next_collatz (c);
                        r = r >> 1;
                        assert (r == c);
                        if (!filter_noise)
                                print ("", c, r);
                }
                if (filter_noise)
                        print ("", c, r);
        }
}


int main (int argc, char **argv) {
        bool filter_noise = false;
        if (argc > 1 && strcmp (argv [1], "-f") == 0) {
                filter_noise = true;
                argv++;
        }
        while (*++argv) {
                collatz (strtol (*argv, NULL, 10), filter_noise);
        }
        return 0;
}

Footnotes:

  • 1.   Quite a lucky first pick, too, occuring squarely in the middle of a sequence.
  • 2.   Prime factorizations can occur next to eachother containing the two subsets: say, 255:256 ([3,5,17]:[2^8]). However, the two prime factorizations need to be modified differently, making them unrelated.