Bittorrent vs. Broadband providers: A proposal for peace

April 18, 2008

It’s time to stop spewing rhetoric and get real

I’m tired of reading the endless propaganda put forth by telecoms about “bandwidth hogs”. The issue of fair allocation of limited bandwidth needs to be dealt with.

The latest story in this sad saga is Bell deciding to throttle its customers. Bell says 5% of users account for a third of the traffic, resulting in poor performance for the other 95%. The Bell case differs from the widely publicized Comcast case in that Bell is applying its traffic shaping policy to business customers as well. Many other ISPs are actively reducing speeds of Bittorrent traffic. Bittorrent users have responded by evading the filters. Providers will just tweak their filters to compensate.

The telecoms have some fair gripes, not that they ever talk honestly about them in public. Read on to learn the actual issues.

As utilization approaches 100%, throughput approaches zero

A 10 megabit pipe cannot deliver 1 megabit of traffic to 10 customers at the same time. ISPs incur significant cost overhead because of this immutable property of computer networking. Once you factor in all the segments traffic needs to pass through in a citywide network, it can take 10 megabits of raw bandwidth to give you 5 megabits of actual throughput.
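One textbook way to see why a link cannot be run flat out is the M/M/1 queue, where mean delay grows as 1 / (1 - utilization). The sketch below is a simplification I am assuming for illustration, not a measurement of any real ISP; the function name and the 1 ms service time are made up.

```python
def mean_delay_ms(utilization, service_time_ms=1.0):
    """Mean time a packet spends in an M/M/1 queue, in milliseconds.

    utilization is the fraction of link capacity in use (0 <= u < 1).
    service_time_ms is a hypothetical time to transmit one packet on an idle link.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    # M/M/1 result: W = service_time / (1 - utilization)
    return service_time_ms / (1.0 - utilization)


if __name__ == "__main__":
    for u in (0.5, 0.9, 0.99):
        print(f"{u:.0%} utilized: about {mean_delay_ms(u):.0f} ms per packet")
```

Under this model the delay per packet doubles at 50% load and keeps climbing without limit as the link fills, which is why usable throughput falls well short of the raw number.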

ISPs cannot survive without overselling available bandwidth

There is not enough bandwidth to guarantee every user a dedicated amount. ISPs rely on the fact that not everyone uses shared resources at the same time, just like banks rely on the fact that not everyone shows up to get their money at the same time.

It’s not worth it for an ISP to sell bandwidth for $100 per month to a single customer when it could sell the same bandwidth for a combined $500 per month to five customers. Their best customers are the ones who barely turn their computers on.
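The arithmetic of overselling can be sketched in a few lines. All the numbers here (50 subscribers, 5 megabit plans, a 10 megabit uplink) are hypothetical, and an even split among active users is an assumption; real networks do not divide bandwidth this neatly.

```python
def oversubscription_ratio(subscribers, plan_mbps, uplink_mbps):
    """Bandwidth sold divided by bandwidth that actually exists."""
    return subscribers * plan_mbps / uplink_mbps


def per_user_mbps(active_users, plan_mbps, uplink_mbps):
    """Even share of the uplink per active user, capped at the plan speed."""
    if active_users <= 0:
        return plan_mbps
    return min(plan_mbps, uplink_mbps / active_users)


if __name__ == "__main__":
    # Hypothetical neighborhood: 50 customers on 5 megabit plans, 10 megabit uplink.
    print("oversold by a factor of", oversubscription_ratio(50, 5, 10))
    for active in (1, 2, 10, 50):
        print(active, "active users get", per_user_mbps(active, 5, 10), "megabits each")
```

As long as only a couple of customers are active, everyone gets what they paid for. The plan only falls apart when many of them show up at once.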

Network usage varies over time

All networks have high and low usage times. There is a pattern to it. The average office network is more active during office hours and less active at night and on weekends. The average residential ISP is more active when people are home and less active when they are at work. There is more traffic during the day than in the middle of the night for the simple reason that most people sleep at night.

Windows Update schedules installs at 3am for good reason. There is a lot of traffic on Patch Tuesday.

Remember when dial-up ISPs gave you a certain amount of peak hours and unlimited off-peak hours? Moving off dial-up hasn’t made this problem go away.

Each segment is a possible bottleneck

Everyone talks as if the internet literally has three tiers: The Backbone, The ISP, and The User. In a way this is true, but most end users misunderstand this simplification. The architecture of your ISP looks something like this:

Customers in a large building or a residential neighborhood are connected to a switch somewhere. An uplink connection of some kind connects it to a regional switch. The regional switch is connected to the ISP’s backbone network, which is connected to the internet.

Each switch or router can handle a limited number of connections. If too much load is placed on a device, all customers downstream of that device will suffer. The unacceptable performance of Shaw in my previous post has this exact cause. I’ve experienced this problem at my home (since corrected) with Novus. It’s a common problem because an ISP would have to be out of its mind to refuse a new customer just because the switch he is connected to needs to be upgraded. As with most things in life, cheap equipment generally handles less traffic than expensive equipment.

What telecoms don’t tell you: Not all routes cost the same money

Telecoms don’t like to discuss how they are connected to the internet. They’ll be offended if you even ask. They’ll only tell you that they’re connected straight to the backbone, with a bigger pipe than the competition, and certainly bigger than your insignificant needs. But the truth is there is no central internet “backbone” because nobody owns the internet. There are two ways to get on the net: Transit (buying your way) and peering.

Transit is something everyone is familiar with. You pay a company that is already connected to the internet to connect you. They agree to carry all traffic you send them and deliver to you all traffic sent to you from the rest of the internet. In such a relationship your traffic is not worthy of trading, so you pay the other party to carry it.

If you had a lot of users you would have the option of peering with other networks instead of paying them. Peering is an agreement in which two networks directly exchange traffic because each sees value in the other. It’s usually done in order to save costs. Peering agreements are arbitrary contracts. They could, for example, require the party which sent more traffic than it received to pay for that traffic. Or maybe the peers don’t bother billing each other at all. Essentially peering is an agreement to trade traffic.

Big ISPs like Comcast and Bell definitely engage in peering.

Obviously peering is a good thing. But there is one crucial limit to peering. Your peers don’t allow you to send them traffic for just any destination. You may only send them traffic for destinations they advertise. And of course, they’re only going to advertise their own network. So if you are peered with say, Microsoft, they’re not going to carry your traffic to Google for you. To send traffic to Google you have to peer with them directly, or send it to one of your transit providers.
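The rule above (a peer only takes traffic for prefixes it advertises, everything else goes to transit) can be sketched as a small lookup. The peer names and prefixes are made up, using documentation address ranges, and real routers make this decision with BGP rather than a Python dictionary.

```python
import ipaddress


def next_hop(destination, peer_routes, transit="transit"):
    """Choose where to send traffic for a destination IP address.

    peer_routes maps a (hypothetical) peer name to the prefixes it advertises.
    Anything no peer advertises falls back to the paid transit provider.
    """
    addr = ipaddress.ip_address(destination)
    best = None  # (prefix length, peer name) of the most specific match so far
    for peer, prefixes in peer_routes.items():
        for prefix in prefixes:
            net = ipaddress.ip_network(prefix)
            if addr in net and (best is None or net.prefixlen > best[0]):
                best = (net.prefixlen, peer)
    return best[1] if best else transit


if __name__ == "__main__":
    peers = {
        "peer-a": ["192.0.2.0/24"],
        "peer-b": ["198.51.100.0/24"],
    }
    print(next_hop("192.0.2.10", peers))   # advertised by peer-a, so no transit cost
    print(next_hop("203.0.113.5", peers))  # nobody advertises it, so it goes to transit
```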

What does this have to do with bandwidth hogs? Simple. A Bittorrent client often has choices. It can get the same data from many different hosts, and right now it has no information on peering agreements. If it could tell that talking to one of those hosts is free for the ISP, it could choose that host over hosts that cost the ISP money.
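Here is a rough sketch of what that choice could look like if a client did have the information. The list of "free" prefixes is purely hypothetical: no such feed exists today, and an ISP would have to publish one. The function name and addresses are also made up for illustration.

```python
import ipaddress


def rank_candidates(candidates, free_prefixes):
    """Order candidate hosts so those that cost the ISP nothing come first.

    candidates is a list of IP address strings offering the same data.
    free_prefixes is a hypothetical ISP-published list of prefixes reachable
    on the ISP's own network or through peering.
    """
    free_nets = [ipaddress.ip_network(p) for p in free_prefixes]

    def is_free(host):
        addr = ipaddress.ip_address(host)
        return any(addr in net for net in free_nets)

    # sorted() is stable, so hosts keep their original order within each group.
    return sorted(candidates, key=lambda host: not is_free(host))


if __name__ == "__main__":
    hosts = ["203.0.113.9", "192.0.2.40", "198.51.100.3", "192.0.2.7"]
    print(rank_candidates(hosts, ["192.0.2.0/24"]))
```

The client still gets its data either way. It just tries the cheap hosts first and falls back to the expensive ones when it has to.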

Conclusion

Telecoms can do much better than stupid, draconian solutions. But it requires them to be honest about what they sell.


OpenBSD gets layer 7 load balancing

September 16, 2007

Just when I thought it was impossible to make a good HTTP load balancer with OpenBSD I noticed hoststated.

This avoids the problem of losing the client IP address by giving you a mechanism for manipulating HTTP headers. Now we won’t be needing more expensive solutions.


Reliable web business

September 13, 2007

I have worked for 10 years in companies that live or die by the uptime and reliability of their web sites. In that time a lot of things have changed. The tools for building feature rich web applications are better than ever. Frameworks such as Rails and Catalyst make web development much, much less painful than it once was. The Model/View/Controller design used by these excellent frameworks has even influenced traditional client side GUI application development practices, as anyone who has used Windows Presentation Foundation can see.

Despite better tools, web startups run into performance trouble. With a single machine running everything, highly dynamic sites can’t handle many users. Adding a second machine dedicated to the database gives a little relief, but not much. Over a period of a few months the server goes from locking up every other week, to every week, to every few days, until finally it’s hard to go 24 hours without downtime. Owners should have taken action, but usually squirm a bit like a lobster in a slowly heating pot. They keep rebooting, maybe trying to go with a faster server, or more memory. Many never realize how dramatically performance problems reduce their chance of big success.

Downtime kills web business. If you are down, or even slow, those who would have become your customers will go elsewhere. They will find satisfaction at a competitor and they won’t be back.

If this sounds like your current situation, you need to do something about it before it’s too late.

If you can get your hands on faster hardware within 48 hours, do it. Don’t wait. You must act immediately to stop the bleeding.

The next step is to put a solution in place that enables redundancy and capacity planning. This not only allows you to keep sleeping next time your server crashes at 4am, it also makes it easy to predict when you will need more power so you can order new servers well in advance instead of settling for whatever is available right now.

This solution is supported by 3 components: Centralized Storage, Load Balancing, and Monitoring.

Load Balancing

Load balancing spreads incoming requests over multiple servers. I have used several different methods. Each method has advantages and drawbacks.

Round Robin DNS

When a browser goes to your domain, it asks its name server to fetch the IP address of your domain. It then connects. It is possible to have more than one IP address for a name. In such cases, the browser will connect to one of them at random.
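A minimal sketch of the client side, assuming the client really does pick at random (actual browsers and resolvers vary in how they order and choose addresses). The function names are mine; `socket.getaddrinfo` is the standard library call that returns every address for a name.

```python
import random
import socket


def resolve_all(hostname, port=80):
    """Return every distinct IP address the resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def pick_address(addresses, rng=random):
    """Pick one address at random, as a round robin DNS client might."""
    if not addresses:
        raise ValueError("no addresses to choose from")
    return rng.choice(addresses)


if __name__ == "__main__":
    # Hypothetical pool of three web servers behind one name.
    pool = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
    print(pick_address(pool))
```

Note what is missing: nothing here checks whether the chosen server is alive. If one address in the pool is down, some share of visitors will still be sent to it.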

Reverse NAT

It is possible to use reverse NAT such as that found in pf to redirect incoming connections to a pool of web servers. Using OpenBSD it is easy to set up a redundant pair of systems. The drawback to this layer 3 method is that your web servers can’t see the true IP address of the client. Everything comes from the reverse NAT device, so IP address based ACLs and GeoIP features are broken.

Reverse Proxy

The reverse proxy is an excellent solution because it works at the application (HTTP) layer. When the reverse proxy accepts an incoming connection it waits for the request to come in and then initiates its own request on behalf of the client. It can add headers to its request, so the back end web servers can know the real IP address of the client. Perlbal, Apache, and Big IP make excellent reverse proxies. I will write about each of these in more detail in a future post.
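The header step can be sketched like this. I am assuming the widely used X-Forwarded-For convention here; the post does not name a specific header, and each proxy has its own configuration for it. The function name is made up, and for simplicity the sketch treats header names as case-sensitive, which real HTTP does not.

```python
def upstream_headers(client_headers, client_ip):
    """Build the headers a reverse proxy would send to the back end server.

    Copies the client's headers and records the client's real IP address in
    X-Forwarded-For, appending to any value an earlier proxy already set.
    """
    headers = dict(client_headers)  # copy so the caller's dict is untouched
    prior = headers.get("X-Forwarded-For")
    headers["X-Forwarded-For"] = f"{prior}, {client_ip}" if prior else client_ip
    return headers


if __name__ == "__main__":
    incoming = {"Host": "www.example.com", "User-Agent": "demo"}
    print(upstream_headers(incoming, "203.0.113.50"))
```

The back end can then read the header for ACLs or GeoIP lookups, provided it only trusts the value when the request actually came from the proxy.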

Centralized Storage

Depending on what features your site has, you may already have centralized storage. If your site stores all state and dynamic content in a SQL database then you’ll be fine. If there is anything stored on the file system it will have to be moved into the database, or MogileFS, which I will discuss in another post.

Monitoring

One important aspect of operations is real time monitoring for problems. You need to know immediately if your site is down so you can take action. You also need to have resource usage history for troubleshooting and capacity planning.
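To show why usage history matters for capacity planning, here is a small sketch that fits a straight line to past measurements and estimates how long until a resource is full. Linear growth is an assumption, the function name is mine, and the sample numbers are invented; real traffic rarely grows this politely.

```python
def days_until_full(samples, capacity):
    """Estimate days after the last sample until usage reaches capacity.

    samples is a list of (day, usage) pairs taken from monitoring history.
    Returns None if usage is flat or shrinking, 0.0 if already at capacity.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must cover at least two distinct days")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx  # least-squares growth per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    full_day = (capacity - intercept) / slope
    last_day = max(x for x, _ in samples)
    return max(0.0, full_day - last_day)


if __name__ == "__main__":
    # Hypothetical disk usage in percent, sampled every ten days.
    history = [(0, 40), (10, 50), (20, 60)]
    print(days_until_full(history, capacity=100))
```

Even a crude estimate like this is enough to tell you whether to order hardware this week or next quarter.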

Nagios

Nagios will monitor your services for uptime, but it lacks performance graphs. It has some neat network maps, including a cool looking but useless 3D map.

Cacti

A better monitoring solution is Cacti. I recommend it over Nagios. There is excellent support for monitoring nearly every service you can think of. The graphs are very readable.

Throw in some smart power and you’ll have a very reliable and easy to manage site.