How Prime Factorizations govern the Collatz Conjecture

By applying some optimizations to the Collatz Conjecture, we can reduce noise that obscures the patterns that produce the sequences. It becomes apparent that the sequences generated are related to prime factorizations: that of the number itself for decending, whereby all 2's are removed; and that of the next higher even number for ascending, whereby twos and threes are traded off.

Herein, I'll lay out the math, and show the interesting things it tells us about the Conjecture.

Contents

1. Get Rid of Noise

The Collatz is iterative. The usual rule to get the next value is, for any positive integer c :

Collatz ( c ) = { 3 c + 1 if c is odd c 2 if c is even

The conjecture is that if we repeat this enough times, we will always arrive at 1.

An odd c always produces an even number, so subsequently we must always execute the "even" half of the equation at least once. So, as any good engineer would do, let's optimize by getting rid of the unnecessary iteration and the noise it generates:

Collatz ( c ) = { 3 c + 1 2 if c is odd c 2 if c is even

This means we always end up with a fractional half, before adding ½. (An odd times an odd is always odd, so the division results in the fractional ½). We can get rid of this by using c + 1 as the input and making the appropriate adjustments:

odd(c) = 3 2 c + 1 2 = ( 3 2 ) · ( c + 1 ) + 1 2 3 2 = 3 2 ( c + 1 ) 1

Note that the last term is -1 . If the successive iteration is also odd, the first step of which is to add 1, these effectively cancel out.

odd(odd(c)) = 3 2 ( 3 2 ( c + 1 ) 1 + 1 ) 1 = 3 2 ( 3 2 ( c + 1 ) ) 1 = ( 3 2 ) 2 · ( c + 1 ) 1

That is, we can redefine "odd" algorithmically:

  1. Add 1.
  2. While this value is even, divide by 2 and multiply by 3.
  3. Subtract 1.

Or, to write it mathematically, if we knew how many times to repeat n:

odd(c) = ( 3 2 ) n · ( c + 1 ) 1
So how do we know the value of n?
  • c is odd → c + 1 is even, and thus its prime factorization contains at least one 2.
  • c + 1 is even → 3 2 ( c + 1 ) is whole.
  • Looking at the prime factorization if c + 1 , multiplying by 3 2 in effect replaces a 2 with a 3.
  • ( c + 1 ) mod 4 = 0 3 2 ( c + 1 ) is even, since there were two 2s in the prime factorization of c + 1 , leaving one remaining.
  • ( c + 1 ) mod 8 = 0 3 2 ( c + 1 ) mod 4 = 0 .
  • ( c + 1 ) mod 16 = 0 3 2 ( c + 1 ) mod 8 = 0 .
  • ...and so forth.

So, generally, n = # of 2s in prime factorization of c + 1 .

1.1. So what's this telling us

  • For any c the number c + 1 governs whether it can ascend, and how far. I will call this the "control number".
  • An ascending Collatz series is related by the prime factorizations of the corresponding control numbers. These prime factorizations will all be the same overall size, and will contain the same set of numbers, with only the numbers of 2s and 3s varying.
  • The combined number of 2s and 3s in the prime factorization tells us the length of this "ascending chain".
  • The numbers of 2s in the prime factorization tells us how many links are left to reach the upper end of the chain. If there are no 2s, we are at the upper bound of the chain.
  • If there are neither 2s nor 3s, the number is not part of a chain.
  • Reminder: these represent the control number. The corresponding Collatz points are 1 less.

1.2. Example

For clarity, let's review some examples:

15
Collatz     Control         Ctrl factors
  15:       16 = (start)    2^4   
  23:       24 = (Inc,3)    2^3   *  3^1   
  35:       36 = (Inc,3)    2^2   *  3^2   
  53:       54 = (Inc,3)    2^1   *  3^3   
  80:       81 = (Inc,3)    3^4   
  40:       41 = (dec  )   41^1   
  20:       21 = (Dec,3)    3^1   *  7^1   
  10:       11 = (dec  )   11^1   
   5:        6 = (Dec,3)    2^1   *  3^1   
   8:        9 = (Inc,3)    3^2   
   4:        5 = (dec  )    5^1   
   2:        3 = (Dec,3)    3^1   
   1:        2 = (dec  )    2^1   
With the "standard" Collatz, the sequence from 15 would be 15, 46, 23, 70, 35, 106, 53, 160, 80, 40, 20, 10, etc. Since we debounced it, we eliminate the unnecessary and confusing waypoints, leaving us with 15, 46, 23, 70, 35, 106, 53, 160, 80 in the first ascending sequence. If we look at the prime factorizations of the next highers numbers—16, 24, 36, 54, 81—we see they are identical, except for the gradual replacing of 2s by 3s.

Lest you think this is cherry-picked data, let's look at some output from a random number 5-digit number I picked. In this variation, we won't do a full prime factorization, instead just showing the 2s, 3s, and whatever is leftover. To see some long chains, look for 335, 319, and 911.

17835
     Collatz   Control
incr  17835     17836 = 2^2        * 4459^1   
      26753     26754 = 2^1 *  3^1 * 4459^1   
      40130     40131 =        3^2 * 4459^1   
decr  40130     40130 = 2^1        * 20065^1   
      20065     20065 =              20065^1   
incr  20065     20066 = 2^1        * 10033^1   
      30098     30099 =        3^1 * 10033^1   
decr  30098     30098 = 2^1        * 15049^1   
      15049     15049 =              15049^1   
incr  15049     15050 = 2^1        * 7525^1   
      22574     22575 =        3^1 * 7525^1   
decr  22574     22574 = 2^1        * 11287^1   
      11287     11287 =              11287^1   
incr  11287     11288 = 2^3        * 1411^1   
      16931     16932 = 2^2 *  3^1 * 1411^1   
      25397     25398 = 2^1 *  3^2 * 1411^1   
      38096     38097 =        3^3 * 1411^1   
decr  38096     38096 = 2^4        * 2381^1   
      19048     19048 = 2^3        * 2381^1   
      9524       9524 = 2^2        * 2381^1   
      4762       4762 = 2^1        * 2381^1   
      2381       2381 =              2381^1   
incr  2381       2382 = 2^1 *  3^1 * 397^1   
      3572       3573 =        3^2 * 397^1   
decr  3572       3572 = 2^2        * 893^1   
      1786       1786 = 2^1        * 893^1   
      893         893 =              893^1   
incr  893         894 = 2^1 *  3^1 * 149^1   
      1340       1341 =        3^2 * 149^1   
decr  1340       1340 = 2^2        * 335^1   
      670         670 = 2^1        * 335^1   
      335         335 =              335^1   
incr  335         336 = 2^4 *  3^1 *  7^1   
      503         504 = 2^3 *  3^2 *  7^1   
      755         756 = 2^2 *  3^3 *  7^1   
      1133       1134 = 2^1 *  3^4 *  7^1   
      1700       1701 =        3^5 *  7^1   
decr  1700       1700 = 2^2        * 425^1   
      850         850 = 2^1        * 425^1   
      425         425 =              425^1   
incr  425         426 = 2^1 *  3^1 * 71^1   
      638         639 =        3^2 * 71^1   
decr  638         638 = 2^1        * 319^1   
      319         319 =              319^1   
incr  319         320 = 2^6        *  5^1   
      479         480 = 2^5 *  3^1 *  5^1   
      719         720 = 2^4 *  3^2 *  5^1   
      1079       1080 = 2^3 *  3^3 *  5^1   
      1619       1620 = 2^2 *  3^4 *  5^1   
      2429       2430 = 2^1 *  3^5 *  5^1   
      3644       3645 =        3^6 *  5^1   
decr  3644       3644 = 2^2        * 911^1   
      1822       1822 = 2^1        * 911^1   
      911         911 =              911^1   
incr  911         912 = 2^4 *  3^1 * 19^1   
      1367       1368 = 2^3 *  3^2 * 19^1   
      2051       2052 = 2^2 *  3^3 * 19^1   
      3077       3078 = 2^1 *  3^4 * 19^1   
      4616       4617 =        3^5 * 19^1   
decr  4616       4616 = 2^3        * 577^1   
      2308       2308 = 2^2        * 577^1   
      1154       1154 = 2^1        * 577^1   
      577         577 =              577^1   
incr  577         578 = 2^1        * 289^1   
      866         867 =        3^1 * 289^1   
decr  866         866 = 2^1        * 433^1   
      433         433 =              433^1   
incr  433         434 = 2^1        * 217^1   
      650         651 =        3^1 * 217^1   
decr  650         650 = 2^1        * 325^1   
      325         325 =              325^1   
incr  325         326 = 2^1        * 163^1   
      488         489 =        3^1 * 163^1   
decr  488         488 = 2^3        * 61^1   
      244         244 = 2^2        * 61^1   
      122         122 = 2^1        * 61^1   
      61           61 =              61^1   
incr  61           62 = 2^1        * 31^1   
      92           93 =        3^1 * 31^1   
decr  92           92 = 2^2        * 23^1   
      46           46 = 2^1        * 23^1   
      23           23 =              23^1   
incr  23           24 = 2^3 *  3^1 
      35           36 = 2^2 *  3^2 
      53           54 = 2^1 *  3^3 
      80           81 =        3^4 
decr  80           80 = 2^4        *  5^1   
      40           40 = 2^3        *  5^1   
      20           20 = 2^2        *  5^1   
      10           10 = 2^1        *  5^1   
      5             5 =               5^1   
incr  5             6 = 2^1 *  3^1 
      8             9 =        3^2 
decr  8             8 = 2^3 
      4             4 = 2^2 
      2             2 = 2^1 
      1             1 =

1.3. Looking at this in reverse

I said earlier that the length of an ascending chain is equal to the combined 2s and 3s in the factorization of the control number, and that the number of 2s indicates how many steps remain to ascend.

Conversely, the 3s represent the number of prior (i.e. lower) links in the chain.

Any number c of course "comes from" 2 c . Until now, however, there was no way to tell if c also "comes from" ( 2 c 1 ) ÷ 3 except by trial-and-error: that portion of the Collatz only went forward.

By looking at the factorization of control numbers c + 1 , it is evident whether there are earlier links, and how many of them, from which it is possible to "come from":

  • If the factorization of c + 1 contains no threes, then c is not arrived at from a lower number.
  • If the factorization does contain threes, then the number of threes indicates the number of lower values which ascend to this one.
  • You can find these lower control numbers by changing 3s in the factorization to twos and multiplying out; subtract 1 for the corresponding Collatz points.

So, again picking a number at random: 18935. Our "control" number will thus be 189361 , whose factorization is 2 3 · 3 2 · 263 .

Changing the 3s to 2s, we have 2 5 · 263 = 8416 , and changing the 2s to 3s, we have 3 5 · 263 = 63909 . This ascension chain will be length 5, starting at Collatz value 8415 and ending at 63908.

8415
     8415:   8416 = (start)  2^5 * 263^1   
    12623:  12624 = (Inc,3)  2^4 *  3^1 * 263^1   
    18935:  18936 = (Inc,3)  2^3 *  3^2 * 263^1   
    28403:  28404 = (Inc,3)  2^2 *  3^3 * 263^1   
    42605:  42606 = (Inc,3)  2^1 *  3^4 * 263^1   
    63908:  63909 = (Inc,3)  3^5 * 263^1   

Programs

2.1. collatz.c

This program follows the optimized Collatz sequence and prints the factorizations of the next-higher values while both ascending and descending.

Usage: collatz <number>...

/* Collatz conjecture research.
   Perette Barella.  2018-03-22.
   Please ensure I'm credited if this leads anywhere. */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <assert.h>

void prime_factor (int v) {
        int n = 2;
        char multiply = ' ';
        for (int n = 2; v != 1; n += 1) {
                int count = 0;
                while (v % n == 0) {
                        count++;
                        v = v / n;
                }
                if (count) {
                        printf ("%c %2d^%-4d", multiply, n, count);
                        multiply = '*';
                }
        }
}


void collatz (int n) {
        printf ("%d\n", n);
        int c = n;
        const char *action = "start";
        while (1) {
                printf ("%9d: %9d = (%-5s) ", c, c+1, action);
                prime_factor (c + 1);
                printf ("\n");
                if (c == 1) break;
                if (c & 1) {
                        c = (c * 3 + 1) >> 1;
                        action = "inc";
                        if ((c + 1) % 3 == 0)
                                action = "Inc,3";
                        else
                                assert (!"This doesn't happen!");
                } else {
                        c = c >> 1;
                        action = "dec";
                        if ((c + 1) % 3 == 0)
                                action = "Dec,3";
                }
        }
}


int main (int argc, char **argv) {
        while (*++argv) {
                collatz (strtol (*argv, NULL, 10));
        }
        return 0;
}

2.2. coder.c

This program follows the optimized Collatz sequence. When ascending, for a given Collatz point c the factorization of c + 1 . When descending, the factorization of c is shown. For the factorizations, 2s, 3s are prime factored out, and the rest left as a non-prime factor (unless it just happens to be prime).

Usage: coder [-f] <number>...

The -f flag causes only peaks and troughs to be shown, skipping intermediate values in ascents and descents.

/* Collatz conjecture research.
   Perette Barella.  2018-03-22.
   Please ensure I'm credited if this leads anywhere. */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdbool.h>
#include <assert.h>


void factor (int v) {
        int n = 2;
        char multiply = ' ';
        for (int n = 2; v != 1; n += 1) {
                int count = 0;
                /* Only really interested in 2s, 3's, and "the rest". */
                if (n == 4)
                        n = v;
                while (v % n == 0) {
                        count++;
                        v = v / n;
                }
                if (count == 0) {
                        printf ("%c %2s %-4s", ' ', "", "");
                } else {
                        printf ("%c %2d^%-4d", multiply, n, count);
                        multiply = '*';
                }
        }
}



void print (const char *mode, int c, int r) {
        printf ("    %-5s %-10d %10d = ", mode, c, r);
        factor (r);
        printf ("\n");
}

int next_collatz (int n) {
        if (n & 1)
                return n * 3 + 1;
        return n >> 1;
}


void collatz (int n, bool filter_noise) {
        printf ("%d\n", n);
        int c = n;
        int r = n;
        bool show;
        while (r != 1) {
                assert (c == r);

                r++;
                print ("incr", c, r);
                while ((r & 1) == 0) {
                        c = next_collatz (next_collatz (c));
                        r += r >> 1;
                        assert (r - 1 == c);
                        if (!filter_noise)
                                print ("", c, r);
                }
                if (filter_noise)
                        print ("", c, r);
                r--;
                print ("decr", c, r);
                show = true;
                while ((r & 1) == 0) {
                        c = next_collatz (c);
                        r = r >> 1;
                        assert (r == c);
                        if (!filter_noise)
                                print ("", c, r);
                }
                if (filter_noise)
                        print ("", c, r);
        }
}


int main (int argc, char **argv) {
        bool filter_noise = false;
        if (argc > 1 && strcmp (argv [1], "-f") == 0) {
                filter_noise = true;
                argv++;
        }
        while (*++argv) {
                collatz (strtol (*argv, NULL, 10), filter_noise);
        }
        return 0;
}

Footnotes:

  • 1.   Quite a lucky first pick, too, occuring squarely in the middle of a sequence.