Wednesday, October 6, 2010

Who do we trust

In the old days of the internet (10-15 years ago) there were about 10-20 trusted root CAs installed on my operating system. On my Windows XP machine at work I have hundreds.

For those of you who don't know what I'm talking about, here's a basic intro:

Root Certificate Authorities

When you connect to certain websites, you will see a "secure lock" icon appear in your browser next to an address that starts with https.
What this is supposed to mean is that the connection is "secure".
This security is provided by a number of protocols; most importantly, each website has a certificate that says "company X owns this address".
Now the problem is that you can't just take someone at their word - you need some trusted third party that verifies that this person is who they claim to be.

This is a similar idea to a personal ID: your government serves as a trusted authority that issues a certificate (an ID card) verifying that someone is who they say they are.
In the world of the internet, this authority is called a Certificate Authority (CA).

The difference between the government and the internet in this case is that we implicitly assume that the government is a trusted issuer of IDs.
In the world of the internet, there is no accepted, trusted authority that we can count on to issue these IDs. Several commercial organizations took on this role and have been accepted as "trustworthy".
This makes them what are known as Trusted Root Certificate Authorities.
What this means is that certain organizations were accepted as trusted and are allowed to vouch for the identity of others.
The root CA is responsible for the validity of the certificates it issues and holds the power to revoke them if they are misused or stolen.

The problem is that once a certificate authority is accepted as root, it holds tremendous power for abuse.
So when you find yourself with hundreds of them installed on your machine - then something is very wrong.
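If you are curious how many roots a piece of software trusts, here is a minimal Java sketch that asks the Java runtime for the issuers its default trust manager accepts. One assumption to state loudly: this lists the Java runtime's own trust store (typically its bundled `cacerts` file), which is not necessarily the same as the Windows certificate store, and the count will differ between installations. The class and method names are my own, for illustration only.

```java
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.X509Certificate;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

public class CountTrustedRoots {

    // Returns the root certificates accepted by the JVM's default trust manager.
    static X509Certificate[] defaultTrustedRoots() {
        try {
            TrustManagerFactory tmf =
                    TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            // Passing null makes the factory load the runtime's default trust store.
            tmf.init((KeyStore) null);
            for (TrustManager tm : tmf.getTrustManagers()) {
                if (tm instanceof X509TrustManager) {
                    return ((X509TrustManager) tm).getAcceptedIssuers();
                }
            }
            return new X509Certificate[0];
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not load the default trust store", e);
        }
    }

    public static void main(String[] args) {
        X509Certificate[] roots = defaultTrustedRoots();
        System.out.println("Trusted root CAs: " + roots.length);
        for (X509Certificate root : roots) {
            // Print the name of the organization each root certificate belongs to.
            System.out.println("  " + root.getSubjectX500Principal().getName());
        }
    }
}
```

Every name this prints is an organization that, as far as this Java runtime is concerned, can vouch for any website.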
What could a rogue root certificate authority do with its power?

First of all, a rogue root CA is in a sense unstoppable - once its certificate is accepted as trusted in your machine's local certificate store, there is no "higher authority" that can revoke it.

If someone gains control of a root certificate authority they can use it to fake the identity of anyone.
This opens up all "secure" traffic to an undetectable man-in-the-middle attack.
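To see why one bad root is enough, here is a deliberately oversimplified Java model of the trust decision. It is a sketch under heavy assumptions: real certificates carry public keys and cryptographic signatures, and real validation also checks expiry dates, intermediate certificates and more. Here the issuer is just a string, and the root names are hypothetical. The point it shows is that the check asks "was this issued by any root I trust?", not "was this issued by the right root?".

```java
import java.util.Set;

public class ToyTrustCheck {

    // A drastically simplified "certificate": the name it vouches for and who issued it.
    // No keys or signatures here; the issuer is just a string.
    static final class Certificate {
        final String subject;
        final String issuer;

        Certificate(String subject, String issuer) {
            this.subject = subject;
            this.issuer = issuer;
        }
    }

    // The roots this hypothetical machine trusts. Each one is trusted equally.
    static final Set<String> TRUSTED_ROOTS = Set.of("Example Root CA", "Rogue Root CA");

    // Accept a certificate if any trusted root issued it. Note what is NOT checked:
    // which root is supposed to vouch for which site.
    static boolean isTrusted(Certificate cert) {
        return TRUSTED_ROOTS.contains(cert.issuer);
    }

    public static void main(String[] args) {
        Certificate genuine = new Certificate("mail.example.com", "Example Root CA");
        Certificate forged = new Certificate("mail.example.com", "Rogue Root CA");
        Certificate unknown = new Certificate("mail.example.com", "Some Unknown CA");
        System.out.println("genuine accepted: " + isTrusted(genuine)); // true
        System.out.println("forged accepted:  " + isTrusted(forged));  // true
        System.out.println("unknown accepted: " + isTrusted(unknown)); // false
    }
}
```

The forged certificate for the same address passes exactly the same check as the genuine one, which is what makes the attack undetectable from the user's side.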

Man In The Middle

A man-in-the-middle attack works something like this:
Say that you and I talk to each other on the phone.
We both assume that you are listening to me when I say something, and that when you speak I am hearing you.
Now let's assume we've never met and neither of us knows what the other one sounds like.
I call you and assume you answered, but in fact someone else picked up, and that person is on a separate call with you.
They repeat most of what I am saying to you, and most of what you are saying to me.
But since they are in the middle of the line, they can change what is being said.
Neither one of us is even aware that something is wrong...

This is the essence of a man-in-the-middle attack. On the internet, the equivalent of you and me knowing what we sound like is the certificates issued by the root certificate authorities.

What could someone do if they were able to impersonate a root CA?
They could monitor any secure channel you have - your email, your Facebook account, basically anything you log into. They could also act on your behalf and you would not even suspect that something is wrong: no software would alert you, not even your antivirus.


My intention is not to alarm anyone about someone reading their email because of a compromised root CA, but to point out that the more root CAs are installed by default on our operating systems, the greater the chance that one of them is compromised.
And my computer has root CAs installed by certificate authorities from around the world, many of which I do not consider the least bit trustworthy.

Saturday, May 22, 2010

Introduction To Programming - What is Code?

In the most technical sense, computer code (or source code) is a series of instructions that can be translated into machine "language" through a process that ranges from trivial to complicated*.

While technically correct, the previous definition is pretty boring. It is also very far from what actual code is like. It certainly doesn't help us understand what it's for and why it works the way it works - which I think are the more interesting questions.

Before I talk more about what real-world code is like and some of its more interesting technical aspects, I would like to point out that there are many different programming languages and many things that can be considered computer code, from the markup language called HTML that describes the page you are reading to the assembly language that is used to write processor instructions.

It's hard to cover the huge variety of what can be considered "code" in a single, broad definition, so instead I will try to explain how code is used in some of the common programming languages, and the whys, hows and whats of this code.

Why do we need it?

Computer code is a tool we use to describe ideas, concepts and processes in a kind of functional way that can be converted into machine language.

We need this tool because there is a dividing line between what can be easily and efficiently done in hardware - the physical components of a computer, primarily in the processor - and what can be efficiently done in software. Code begins pretty much where the hardware leaves off.

Processors are good at things like adding one number to another, moving bits from one place to another, multiplying, subtracting and going through these instructions in sequence. They are also good at making simple logical decisions like "if this number is zero, move to the next instruction; if it's not zero, jump 5 instructions ahead".
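To make "simple instructions executed in sequence" a little more concrete, here is a toy processor simulator in Java. The instruction set (SET, ADD, MUL, JUMP_IF_NOT_ZERO, HALT) and the four registers are made up for illustration and do not correspond to any real processor. The point is that a loop can be built out of nothing more than arithmetic and a conditional jump.

```java
public class ToyProcessor {

    // A made-up instruction set (hypothetical, not a real processor's).
    enum Op { SET, ADD, MUL, JUMP_IF_NOT_ZERO, HALT }

    static final class Instruction {
        final Op op;
        final int reg; // the register this instruction works on
        final int arg; // a number, another register, or a jump target, depending on op

        Instruction(Op op, int reg, int arg) {
            this.op = op;
            this.reg = reg;
            this.arg = arg;
        }
    }

    // Executes instructions one by one and returns the final register values.
    static int[] run(Instruction[] program) {
        int[] regs = new int[4];
        int pc = 0; // "program counter": index of the next instruction to execute
        while (true) {
            Instruction in = program[pc];
            switch (in.op) {
                case SET: regs[in.reg] = in.arg; pc++; break;        // put a number in a register
                case ADD: regs[in.reg] += in.arg; pc++; break;       // add a number to a register
                case MUL: regs[in.reg] *= regs[in.arg]; pc++; break; // multiply by another register
                case JUMP_IF_NOT_ZERO:
                    // "if this number is zero, move to the next instruction; otherwise jump"
                    pc = (regs[in.reg] != 0) ? in.arg : pc + 1;
                    break;
                case HALT:
                    return regs;
            }
        }
    }

    // Computes 2 to the 5th power by looping: r0 = result, r1 = counter, r2 = base.
    static Instruction[] twoToTheFifth() {
        return new Instruction[] {
            new Instruction(Op.SET, 0, 1),              // 0: r0 = 1
            new Instruction(Op.SET, 1, 5),              // 1: r1 = 5
            new Instruction(Op.SET, 2, 2),              // 2: r2 = 2
            new Instruction(Op.MUL, 0, 2),              // 3: r0 = r0 * r2
            new Instruction(Op.ADD, 1, -1),             // 4: r1 = r1 - 1
            new Instruction(Op.JUMP_IF_NOT_ZERO, 1, 3), // 5: if r1 != 0, jump back to 3
            new Instruction(Op.HALT, 0, 0)              // 6: stop
        };
    }

    public static void main(String[] args) {
        System.out.println(run(twoToTheFifth())[0]); // 32
    }
}
```

Notice there is no "loop" instruction at all: instruction 5 jumping back to instruction 3 is what makes the loop.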

Machine language is made up of these basic instructions. A language called Assembly exists directly "above" machine code: it lets you write instructions in a form that translates almost one-to-one into machine instructions. It is rarely used for complicated things and is considered one of the toughest and most respected specializations in computer programming. Its proximity to the actual hardware gives the programmer more control over what goes on in the processor than any other language, and it can allow for some of the most efficient code that can be written.

This control comes at a cost - expressing complex ideas is very hard, writing even the simplest programs requires a lot of thought and training and even with training reading Assembly is a tedious job. A good knowledge of Assembly is one of the essential tools for hackers.

Assembly is rarely used in software development. Languages like Java, C and C++ are far more popular, as they allow describing more complicated structures and expressing complicated ideas in simpler ways than Assembly.

Two lines of Java, like these that count from 0 to 99 and print each number:

for (int i=0; i<100; i++)
    System.out.println("i = "+i);

will translate into hundreds of instructions in Assembly.

When programmers speak of computer code, they rarely talk about Assembly.
The thing is, using other, more "high level" languages like Java or C++ requires some kind of translation into machine code, as this is the only thing that can run on the physical hardware. This is why a process called "compilation" was introduced, which translates higher level languages into machine code. This abstraction allows us to create far more complicated software in exchange for giving up control of the specific instructions given to the hardware.

Higher and more specialized languages exist that are even further away from the processor.
What all programming languages have in common is that they trade some level of control over what really happens on the machine for clarity and ease of use.

The building blocks

We discussed the why, so now let's talk about what code is made out of.

The goal of any higher level programming language is to provide the syntax (and the meaning, or "semantics", behind it) to express complicated ideas in a coherent, manageable and extendable way.

Some concepts are shared by most languages:
variables, conditionals, loops, procedures (or methods/functions) and mathematical operations. Another important common concept is the Comment. Slightly less common are classes.

To better understand what code is let's look at the building blocks.

Variables - variables are a way of giving a name to some of the computer's memory in a way we can easily use in the code.
For example, we can define:

int myVariable = 100;

Now we can refer to "myVariable" in other places in the code: we could change its value, add to it, subtract from it or compare it to other values.
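Here is a small runnable sketch of exactly those operations (change, add, subtract, compare), wrapped in a class so it compiles on its own. The class name VariablesDemo and the numbers are only illustrative.

```java
public class VariablesDemo {
    public static void main(String[] args) {
        int myVariable = 100;               // name a piece of memory and store 100 in it
        myVariable = myVariable + 20;       // add to it: now 120
        myVariable = myVariable - 50;       // subtract from it: now 70
        boolean isSmall = myVariable < 100; // compare it to another value: true
        System.out.println("myVariable = " + myVariable + ", less than 100? " + isSmall);
    }
}
```

Running it prints `myVariable = 70, less than 100? true`.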

Conditionals - conditionals are a way of asking for one thing or another to happen, depending on some logical condition.
For example this rule:

if (myVariable<100)
    System.out.println("My variable is less than 100");
else
    System.out.println("My variable is equal to or larger than 100");

States that the line "My variable is less than 100" should be printed if the value within "myVariable" is less than 100, and the line "My variable is equal to or larger than 100" otherwise.
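The same rule can be wrapped in a small method so you can try it with different values. This is a sketch: the method name describe is my own, and it returns the message instead of printing it so the result is easy to check.

```java
public class ConditionalDemo {

    // Returns the message the if/else rule above would print for a given value.
    static String describe(int myVariable) {
        if (myVariable < 100) {
            return "My variable is less than 100";
        } else {
            return "My variable is equal to or larger than 100";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(42));  // My variable is less than 100
        System.out.println(describe(100)); // My variable is equal to or larger than 100
    }
}
```

Note that 100 itself falls into the "else" branch, because 100 is not less than 100.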

Loops - loops are a tool we use when we want to perform repeated operations. One of the useful properties of computers is that they can do certain things many times and very fast. Loops are a way to facilitate this property. A loop generally consists of one or more instructions we want to perform and some condition that will make the loop stop.
For example:

for (int i=0; i<100; i++)
    System.out.println("i = "+i);

This code says something like this:
"For as long as the variable "i" - which starts out at zero - is smaller than 100, perform the next instruction and then increase "i" by one."
Put another way - do the next line 100 times.
This next line says: write to the screen the text "i = " followed by the value of i.
The output would look like:
i = 0
i = 1
...(and so on, up to i = 99)
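To make the order of the steps explicit, here is the same loop written out with a while loop. It is an equivalent sketch rather than what the compiler literally produces, and it collects the lines in a list (a hypothetical helper, countToHundred) instead of printing them, so they are easy to count.

```java
import java.util.ArrayList;
import java.util.List;

public class LoopDemo {

    // The for loop above, spelled out step by step.
    static List<String> countToHundred() {
        List<String> lines = new ArrayList<>();
        int i = 0;                 // 1. start i at zero
        while (i < 100) {          // 2. check the condition before each pass
            lines.add("i = " + i); // 3. do the work
            i++;                   // 4. then increase i by one
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : countToHundred()) {
            System.out.println(line);
        }
    }
}
```

Because the condition is checked before the work and i is increased after it, the first line is "i = 0" and the last is "i = 99": 100 lines in total.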

Procedures/Functions/Methods - Procedures are one of the mechanisms that allow us to create new tools and break apart complicated logic into smaller, reusable and more manageable parts.

It is an essential mechanism when we want to collaborate with other programmers and logically separate different parts of our program.

One example of a method is the System.out.println("some text"); call that I was using in other examples.
In Java, this call hides all the complicated logic required to tell the computer to print out the text into a very simple, easy to use line of code.

Because someone else did the work of writing this method for us, we don't need to worry about how exactly it was done or how to do it - we only need to know what it's supposed to do and how to use it, which is far simpler than writing it in the first place.
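Writing a procedure of our own works the same way. In this sketch, sumUpTo is a made-up example method: it hides a loop behind a name that says what it does, so whoever calls it only needs to know what it returns.

```java
public class ProcedureDemo {

    // Adds up the numbers 1, 2, ..., n and returns the total.
    static int sumUpTo(int n) {
        int total = 0;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumUpTo(10));  // 55
        System.out.println(sumUpTo(100)); // 5050
    }
}
```

If we later find a faster way to compute the sum, we can change the inside of sumUpTo without touching any of the code that uses it.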

The Comment - a comment is just as it sounds, commenting about the code that was written.
Code files are no different from other text files - that is, they are simply text. The compiler takes our text and converts it into machine code. By marking certain lines as comments we can tell the compiler to ignore these lines. This allows us to communicate with the reader and provide more insight into the code.

Comments are a vital component of programming languages and are one of the few tools that are dedicated to communicating our intentions to other programmers.

A comment in code looks very much like other text, but it contains a marker that tells the compiler to ignore it. In Java, a line that starts with // is a comment:
//This is a comment, it can contain any text we want and many special characters and we can use it
//To help other programmers understand the code we are writing
//For example
//The following code computes 2 * 2 and sets the result in myVariable
//Assign the value 2 to myVariable
int myVariable = 2;
//Multiply myVariable by itself, and set it to myVariable
myVariable = myVariable*myVariable;
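Besides the // style used above, Java also supports block comments that start with /* and end with */, and a // comment can also follow code on the same line. A small sketch (the names CommentDemo and squareOfTwo are only illustrative):

```java
public class CommentDemo {

    /*
     * A block comment can span several lines.
     * Like a // comment, everything in here is ignored by the compiler.
     */
    static int squareOfTwo() {
        int myVariable = 2;                   // a comment after real code on the same line
        myVariable = myVariable * myVariable; // the compiler only sees the code before the //
        return myVariable;
    }

    public static void main(String[] args) {
        System.out.println(squareOfTwo()); // 4
    }
}
```

Deleting every comment from this file would not change what the program does; the comments exist only for the people reading it.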

Classes - These are more complicated elements and harder to explain without more technical background. In general terms, classes are a way of creating useful metaphors and using them in code. They are the foundation of what is called "Object Oriented Programming", which has been the dominant paradigm in software development for many years.
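As a taste of what that looks like, here is a minimal sketch of a class in Java. The Rectangle "metaphor" is my own choice for illustration: the class describes what a rectangle knows (its width and height) and what it can do (compute its area), and each object created from it is a separate rectangle.

```java
public class ClassDemo {

    // A class is a blueprint: what the thing knows (fields) and what it can do (methods).
    static class Rectangle {
        private final int width;
        private final int height;

        Rectangle(int width, int height) {
            this.width = width;
            this.height = height;
        }

        int area() {
            return width * height;
        }
    }

    public static void main(String[] args) {
        // Each "new Rectangle(...)" creates a separate object from the same blueprint.
        Rectangle window = new Rectangle(200, 200);
        Rectangle button = new Rectangle(80, 20);
        System.out.println(window.area()); // 40000
        System.out.println(button.area()); // 1600
    }
}
```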

What is code?

Computer code and its components serve a dual function:
The first is to help us describe our ideas in terms that allow us and other programmers to read and understand the ideas behind the code - the purpose of what we want to achieve with the code.
The second is that the code must translate into machine code that can perform the function for which it was written.

This duality is important to understand. All working code essentially complies with the second requirement: it will run and perform its function. The first and more important requirement - to provide a coherent picture of what the writer of the code intended - is an often overlooked** and vitally important role of code. This requirement to be clear and coherent is central to the job of a programmer. It is the hardest to achieve and is rarely taught.

The compiler imposes certain rules. For example, in Java it requires that statements be terminated with a semicolon. It requires that every parenthesis that is opened is also closed. It also requires the use of specific "keywords" to express specific ideas. A conditional must always take a certain form (as demonstrated above). Many compilers are "case sensitive", which means that MyVariable is a different variable than myVariable simply because one uses a different capitalization of the "m".
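Java is one of the case-sensitive languages, so a quick sketch can show the point directly (the class name CaseDemo is only illustrative):

```java
public class CaseDemo {
    public static void main(String[] args) {
        int myVariable = 1;
        int MyVariable = 2; // a completely separate variable: only the "M" differs
        System.out.println(myVariable + MyVariable); // 3
    }
}
```

This compiles without complaint, although giving two variables names this similar is a good way to confuse the next person who reads the code.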

Despite all of these constraints, the compiler doesn't particularly care about the "shape" of things or about the names of things. That is, it doesn't care what names you use, as long as you follow the rules of syntax.
In this sense a compiler is like a strangely strict teacher that will accept any answer as long as it is spelled and punctuated correctly.

The ability to name things, comment on them and use more coherent and reusable structures in code exists so that programmers can impart meaning to the code that would not exist otherwise. This meaning exists only in the minds of the programmers who read the code - the computer does not "understand" the intentions of the programmer. It does not even "see" the original code as written by the programmer.
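Here is a small sketch of that idea: two methods with exactly the same logic, one written only to satisfy the compiler and one written for a human reader. Both names (f and averageOf) are made up for the example, and the compiler treats the two identically.

```java
public class NamingDemo {

    // Version 1: the compiler is perfectly happy with this.
    static double f(int[] a){double b=0;for(int c:a)b+=c;return b/a.length;}

    // Version 2: the same logic, with names and layout chosen for a human reader.
    static double averageOf(int[] scores) {
        double total = 0;
        for (int score : scores) {
            total += score;
        }
        return total / scores.length;
    }

    public static void main(String[] args) {
        int[] scores = {70, 85, 90};
        // Both lines print the same value.
        System.out.println(f(scores));
        System.out.println(averageOf(scores));
    }
}
```

Only the second version tells you what the numbers are and what is being computed; that meaning lives entirely in the names and layout.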

The vast majority of the lines of code written, and much of the effort in writing them, is dedicated to conveying an idea. Of course, it's important that they actually perform the function for which they were written, but it's at least as important, and often far more important, that they do so in a way that describes ideas simply, coherently and on some rare occasions - beautifully.

* - The complexity of the translation depends primarily on the language, compiler and hardware involved.

** - When I was in high school, my teacher would often complain that my code was completely incoherent. I would argue that it worked. It was only in my second year of high school, when I tried to re-read some code I had written the previous year, that I really understood why writing well formatted, well commented and coherent code is so vitally important. Writing incoherent code means that to use it or change it, you must first spend a long time understanding it.

Monday, May 17, 2010

Introduction to programming - What is programming?

I've been wanting to write something like this for a long time.
My goal is to try and explain computer programming, and more general things about computers, in a way that I hope will be interesting even to those who are not particularly interested in learning programming.

So, let's get started.

What is computer programming?

Well, generally speaking, it's about making computers do whatever you want them to.
This is not to say that I or any other programmer CAN make them do anything we want; in fact, we work under pretty tight constraints. It is by using these rules, constraints and limitations, and manipulating them, that the richness of the software you see around you can be achieved.

In a way, practically any interaction with a computer system is about understanding, testing and manipulating its capabilities and its bounds. Every interaction with a computer system carries within it the essence of programming.

The basic job of a programmer is to use their understanding of the internal workings of computers and provide a more natural and intuitive interface so that other people can take their programs and make some use of them.

It's easier to use an example, so let's talk about MS Paint. I am sure pretty much everyone has seen or used MS Paint, or something that resembles it, at some point.
Let's try and take a look at the basics of MS Paint through the eyes of a developer.
We'll be working under an operating system (e.g. Windows).

To a programmer, an operating system is an environment that provides many, many services.
In the same way that when someone wants to draw simple drawings they can open paint, choose a color and click their mouse, programmers receive tools and services from the operating system.
We don't have to worry about controlling the movement of the mouse cursor, controlling the various complicated mechanisms required to draw something on the screen, managing the memory, hard disk or any one of the many components needed to make computers work.

So what do we have to do?
Well, the operating system, along with our programming environment, already covers many of the things we want to do. They will provide us with a window to work in and buttons to click, and they will let us know when the user clicks the mouse inside our window; they will also tell us where he clicked it, and which button he clicked.
We also get something that can be thought of as a canvas that we can draw things on. To draw something we change the color of a pixel (a pixel is the smallest possible dot we can draw on the screen) to whatever color we want.

A simple recipe for a program that lets you draw with the mouse might look something like this:
1. draw a window 200x200 pixels (there is literally something called a Window or a Form and you can set its size)
2. when the mouse is clicked, change the pixel in the same position as the mouse to the color Black. (you can "ask" the window to let you know whenever someone clicks on it, and where it happened - then you can tell the window to change the color of the pixel at that point)
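The two-step recipe above can be sketched in Java with Swing. This is a minimal sketch under assumptions of my own: the class and helper names (TinyPaint, newCanvas, paintPixel) are hypothetical, the "canvas" is a BufferedImage shown inside a JPanel, and the 200x200 size passed to the frame includes the window borders, so the area you can draw in is slightly smaller.

```java
import java.awt.Color;
import java.awt.Graphics;
import java.awt.event.MouseAdapter;
import java.awt.event.MouseEvent;
import java.awt.image.BufferedImage;
import javax.swing.JFrame;
import javax.swing.JPanel;
import javax.swing.SwingUtilities;

public class TinyPaint {

    // Creates a blank white canvas of the given size.
    static BufferedImage newCanvas(int width, int height) {
        BufferedImage canvas = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics g = canvas.getGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, width, height);
        g.dispose();
        return canvas;
    }

    // Step 2 of the recipe: color the pixel under the mouse black (ignoring clicks off the canvas).
    static void paintPixel(BufferedImage canvas, int x, int y) {
        if (x >= 0 && y >= 0 && x < canvas.getWidth() && y < canvas.getHeight()) {
            canvas.setRGB(x, y, Color.BLACK.getRGB());
        }
    }

    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            BufferedImage canvas = newCanvas(200, 200);
            JPanel panel = new JPanel() {
                @Override
                protected void paintComponent(Graphics g) {
                    super.paintComponent(g);
                    g.drawImage(canvas, 0, 0, null); // show the canvas inside the window
                }
            };
            // Ask the window to tell us whenever someone clicks on it, and where.
            panel.addMouseListener(new MouseAdapter() {
                @Override
                public void mouseClicked(MouseEvent e) {
                    paintPixel(canvas, e.getX(), e.getY());
                    panel.repaint();
                }
            });
            // Step 1 of the recipe: a window of 200x200 pixels.
            JFrame frame = new JFrame("Tiny Paint");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.add(panel);
            frame.setSize(200, 200);
            frame.setVisible(true);
        });
    }
}
```

Almost everything here (the window, the mouse events, getting pixels onto the screen) is provided by the operating system and the Swing library; our own logic is essentially the one line in paintPixel that calls setRGB.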

Doesn't look all that complicated, right? That's mostly because it's really not all that complicated. It's not so far from opening Paint and clicking the mouse.
Of course, I am oversimplifying certain aspects of what needs to be done - and there are languages and operating systems where doing what I just described is far more complicated.

But it's like saying that it's easier to dig a hole with a shovel than it is with your bare hands - true, but not very interesting :)

In its essence, computer programming is about creating tools for other people. I doubt that any of the people who wrote MS Paint ever really used it; personally, I rarely use (other than during development) any of the software I write. What we did in our little programming thought exercise was to take an idea - drawing pictures with your mouse - and use our imagination (albeit limited) and the tools at our disposal to create something for someone else to draw with.

Now that we know how to manipulate the pixels by changing their color, we don't need something like a mouse if we want to draw things on the screen, but by finding a simple interface other people can easily use we created a simple tool.

So what is computer programming? For me it is about taking our ideas, knowledge and understanding of what computers are and how they work, and using them to create tools for (mostly) others to use.
Most professional software developers writing code today do exactly this - they spend their days thinking and writing tools and solving problems so that others can become more productive.

I'd be happy to get your thoughts/questions/comments.