On Data Integrity

Digital Signatures Provide Security

In high school I read 1984 by George Orwell. You may remember that in this novel Winston Smith was employed to burn old newspapers. This was done to support the ruling party's changing interpretation of events, if not to change the historical record outright. George Orwell could not have known how information would come to be preserved digitally today.

Imagine what a present-day Winston Smith would look like. This hypothetical Winston Smith wouldn't really need to burn anything. Simply altering the stored copies of data could drastically change people's perceptions. Just last week a librarian thought it was odd that I was looking up something that wasn't digitally available. I could be very sure that the copy on the library shelf was the copy that the author intended. It would be nice if as much or more certainty could be provided for digital documents.

In fact, it is possible to have a high degree of certainty that data has not changed. A digital signature provides a mechanism to determine whether information has changed. Once a document is signed, changing the document invalidates the signature. It's also very difficult for anyone but the signing party to forge a signature. More difficult, in fact, than forging a handwritten signature.
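To make the idea concrete, here is a toy sketch in Python of the hash-then-sign pattern using textbook RSA. Everything in it is an assumption for illustration: the two primes, the names `sign` and `verify`, and the sample text. The key is far too small and has no padding, so this only shows why an altered document fails verification. It is not how gpg works internally.

```python
import hashlib

# Toy key built from two well-known Mersenne primes.
# ASSUMPTION: illustration only. Real keys are much larger and use padding.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest(document: bytes) -> int:
    """Hash the document and reduce it to a number below N."""
    return int.from_bytes(hashlib.sha256(document).digest(), "big") % N


def sign(document: bytes) -> int:
    """Only the holder of the private exponent D can compute this."""
    return pow(digest(document), D, N)


def verify(document: bytes, signature: int) -> bool:
    """Anyone holding the public values (N, E) can run this check."""
    return pow(signature, E, N) == digest(document)


original = b"The copy the author intended."
signature = sign(original)
print(verify(original, signature))                  # True
print(verify(b"An altered copy.", signature))       # False
```

Changing even one byte of the document changes its hash, so the old signature no longer matches.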

There are other ways of preserving the integrity of data. Satoshi Nakamoto's paper is perhaps one of the best digitally preserved documents. Blockchair provides a whole page that monitors this paper as hosted on several sites. They advertise the hash of the file and periodically check that it hasn't changed. There is also a copy included on the bitcoin blockchain. Embedding the document in the chain ensures the data haven't changed; however, it could be the case that a false copy was buried long ago. Digital signatures provide the advantage that the signer can first do their diligence and be reasonably certain that they're signing the correct copy.
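The hash-monitoring idea can be sketched in a few lines of Python. This is a minimal sketch under assumptions: I use SHA-256 because it is a common choice, not because I know which hash any particular site advertises, and the function names are made up for this example.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_unchanged(path: str, advertised_hash: str) -> bool:
    """Compare a local file against a hash that was published earlier."""
    return sha256_of_file(path) == advertised_hash.strip().lower()
```

A monitor would call `is_unchanged` on a schedule with the hash recorded when the file was first published. Note that a hash alone only shows the file is the same as before. It says nothing about who vouched for that first copy, which is the gap a signature fills.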

As I have high certainty that I am hosting the original version of Satoshi Nakamoto's paper, I have signed this document.

Use Linux to Check the Signature

If you don't have a linux computer you should get one. Most linux installs come with GNU Privacy Guard (gpg), which is software that implements OpenPGP, the open standard that grew out of PGP (Pretty Good Privacy). To check the signature we will need three files. You can right click the links to download them.

Download all three files and put them in the same directory. I made a fresh directory so that it's less cluttered. The command ls lists all the files.


That's good, all three files are there. Now I import Darren's public key.

gpg-import-bitcoin-sig.sh (Source)

$ gpg --import darren_tapp_public_key.pub

When you start typing the file name, most likely, you can press tab to auto-complete.


The picture above is the response to this command. I used a fresh gpg install and there was a warning about it being a fresh run. This warning is not serious. It's only important if we were expecting gpg to manage our trust. Finally, we can verify the signature.

gpg-verify-bitcoin-sig.sh (Source)

$ gpg --verify bitcoin.sig bitcoin.pdf

Use the --verify flag and type the file with the signature and the file that we want to verify.


In this example we have verified a detached signature. That's when the signature is separate from the file. It's also possible to have a signature as part of the file or message.

Again the display shows a warning. If we informed gpg that we trust my key, that warning would go away.
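If you want to run the same check from a script, the verify step can be wrapped in Python. This is a sketch under assumptions: the function names are hypothetical, and it relies on gpg exiting with status 0 when the signature is good and a nonzero status otherwise. As far as I know the trust warning alone does not change the exit status.

```python
import shutil
import subprocess


def verify_command(signature_path: str, document_path: str) -> list:
    """Build the same command that was typed at the prompt."""
    return ["gpg", "--verify", signature_path, document_path]


def signature_is_good(signature_path: str, document_path: str) -> bool:
    """Run gpg and report whether it accepted the detached signature."""
    if shutil.which("gpg") is None:
        raise RuntimeError("gpg is not installed or not on PATH")
    result = subprocess.run(
        verify_command(signature_path, document_path),
        capture_output=True,
        text=True,
    )
    # ASSUMPTION: exit status 0 means the signature verified.
    return result.returncode == 0
```

With the three files in the current directory and the public key already imported, `signature_is_good("bitcoin.sig", "bitcoin.pdf")` would perform the check shown above.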

10 Reasons Why You Should Install Linux

Number 9: Impress Your Friends and Colleagues

Time for a personal story. At Purdue my advisor said there was software that would compute what he asked me to calculate. I was able to look up the program that did the calculations and bang on the keyboard until it worked. I remembered that I ran the computer in run level 1 which didn't use a graphical interface. That way I was able to devote the whole computer to these calculations. These calculations could take days if they completed at all.

Number 8: Don't Waste Your Computer

It is generally true that proprietary operating systems use more and more of your computer with each release. Eventually they become so bloated that old hardware is no longer supported. Linux will generally run on almost anything. For extremely old hardware you might choose a version of linux that is less demanding.

Number 7: Tools Tools and More Tools

With linux you can really set up computers to do whatever computers do. Need a print server? Linux has you covered. Want to have a shared drive in your house? The tools to do that with linux are out there. These tools could be too costly otherwise.

Number 6: Set a Good Example

If you use linux, your children will naturally have questions about what you're doing. This can be a real opportunity to teach your children about computers. Being proficient at linux is a skill that jobs are based on. Also, young people can be given a device where internet access has been restricted. I don't know the best age to introduce the internet to children, but linux gives you the option of providing computer experience without the internet. That is assuming that your kids don't hack their own computer. If they do, then pivot that into a marketable skill.

Number 5: Generally Linux Doesn't Have Malware

Linux currently doesn't have a large market penetration. This means that it's not as attractive a target for hackers. That is, exploits of other operating systems reach more machines and therefore offer a larger payoff.

Number 4: Linux Has a More Secure Architecture

Linux generally operates with restricted privileges. This means that if a hacker does get control of a part of the computer, there's still another step before the hacker can completely control the computer. This architecture can prevent malware from spreading. Since the source is open, anyone can inspect the code for vulnerabilities, and anyone can fix them. With a proprietary operating system, the only people who can inspect the source for vulnerabilities are those paid to do so.

The command sudo is used to signal that expanded privileges are to be used.



Number 3: Herd Immunity

These two facts, that hackers have fewer targets and that those targets are hardened against attack, provide something like herd immunity, which is what happens when a sizable proportion of a population is vaccinated. Basically, a virus cannot spread through a population when a high proportion of individuals are immune.

Number 2: Linux is Really Geared for Productivity

Since Linux is a product of the open source community, the tools needed for open source development are included with most installs. Many programming languages that aren't included are just a command away. Most installs include an office suite such as LibreOffice, which can replace other office products.

Number 1: You Will Actually Own Your Computer

By installing linux you will have full access to all of your computer. Open source software generally has very permissive terms of use. When you use a proprietary product, the license terms are generally very restrictive as to what you are allowed to do with your computer and what surveillance you must accept. When the source is open, there are no such restrictions.

If your computer updates, despite your protest, who really owns that computer?

Security Analysis of ChainLocks complete

In Satoshi Nakamoto's bitcoin paper, section 11 has the title "Calculations". This section basically shows that bitcoin is very secure as long as more than half of the hashing power is behaving as expected.

Last year, the cryptocurrency Dash introduced a secondary consensus mechanism called ChainLocks. The technical specification of ChainLocks was explained in a document known as Dash Improvement Proposal 008 or DIP008. When this document was published I quickly went to calculate the security provided by the new specification. The large 400 node quorum meant that the spreadsheet I had for such calculations choked hard. So it was time to pull out the big guns, Python.

Within half an hour I was maxing out a processor of my old laptop with calculations. I was happy that minutes later I had an answer, and the answer was affirmative. ChainLocks did provide security from purposefully malicious actors. I went back to other tasks now that I was convinced of the security of ChainLocks. Then I received a few questions about how to perform these calculations, so I shared my Python script. Thephez on GitHub made this Python script much more efficient. It would run in seconds instead of minutes.

I then thought back to Satoshi's paper. It was really that Calculations section that was the convincing part. So in the open source spirit I formally wrote up the security analysis and submitted a pull request. Today, that pull request was merged. DIP008 now has a calculations section.

The calculations rely on binomial coefficients. As an example you can type "7 choose 5" in google and get 21. The number 7 choose 5 is written \[ _7 C_5 = {7 \choose 5} = 21\]
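The same value can be checked in Python. This is a small sketch that assumes Python 3.8 or later, where `math.comb` is available. The factorial line is only there to show the definition.

```python
from math import comb, factorial

n, r = 7, 5

# Built-in binomial coefficient.
print(comb(n, r))  # 21

# The same number from the definition n! / (r! (n - r)!).
print(factorial(n) // (factorial(r) * factorial(n - r)))  # 21
```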

My main complaint is that GitHub's markdown does not render all mathematics out of the box. I used the notation \( _n C_r \) for \(n\) choose \(r\) because GitHub's markdown supports that better. I generally think \( { n \choose r }\) is the more common notation.

My favorite quote is:

The attacker would have a less than one in 100 trillion chance of producing at least one malicious ChainLock in the next sextillion (10^21) years.

This is assuming that thirty percent or less of the Masternodes are controlled by an attacker.

The heart of the calculations is in the function pcalc below.

dip008functions.py (Source)

from math import factorial as fac
from math import log

def binom(x, y):
    # Binomial coefficient "x choose y".
    # fac raises ValueError for a negative argument (when y > x),
    # and in that case the coefficient is 0.
    try:
        result = fac(x) // fac(y) // fac(x - y)
    except ValueError:
        result = 0
    return result

# This function takes inputs and outputs the probability
# of success in one trial.
# pcalc is short for probability calculation.
def pcalc(masternodes, quorumsize, attacksuccess, Byznodes):
    SampleSpace = binom(masternodes, quorumsize)
    # Count the quorums that contain at least attacksuccess Byzantine nodes.
    pctemp = 0
    for x in range(attacksuccess, quorumsize + 1):
        pctemp = pctemp + binom(Byznodes, x) * binom(masternodes - Byznodes, quorumsize - x)
    # At this juncture the answer is pctemp/SampleSpace,
    # but that will produce an overflow error. We use logarithms to
    # calculate this value.
    return 10 ** (log(pctemp, 10) - log(SampleSpace, 10))
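For checking small cases by hand, here is an alternative sketch of the same count that uses exact fractions instead of logarithms. This is not the script from DIP008: the function name `attack_probability` and the toy numbers are assumptions for illustration, and it needs Python 3.8 or later for `math.comb`.

```python
from fractions import Fraction
from math import comb


def attack_probability(masternodes, quorum_size, attack_success, byz_nodes):
    """Exact chance that a random quorum holds at least attack_success
    attacker-controlled nodes (a hypergeometric tail)."""
    total = comb(masternodes, quorum_size)
    favorable = sum(
        comb(byz_nodes, x) * comb(masternodes - byz_nodes, quorum_size - x)
        for x in range(attack_success, quorum_size + 1)
    )
    return Fraction(favorable, total)


# Toy numbers (hypothetical): 10 nodes, 4 of them malicious,
# a quorum of 3, and the attack needs at least 2 seats.
print(attack_probability(10, 3, 2, 4))  # 1/3
```

In the toy case there are 6 * 6 + 4 * 1 = 40 bad quorums out of 120 possible, which gives 1/3. Exact fractions avoid the overflow that comes from dividing two huge integers as floats, at the cost of carrying very large numerators and denominators.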