RAID

RAID is intended to keep your dedicated server or your virtual private server (VPS) alive and your data redundant when one (or more) disks fail – allowing you to replace the faulty hardware without losing data or uptime.

Our own opinion is that RAID is always worth the extra cost – it usually saves you a lot of trouble when things go wrong. There are two main options to choose between for a RAID setup: software RAID and hardware RAID. With software RAID, your main CPU and memory do the work of maintaining the desired RAID level; with hardware RAID, you have extra (costly) hardware to handle that part of your machine.

Software RAID has advantages such as being cheaper and not subjecting you to vendor lock-in, and – with today's fast CPUs – it can in some cases even outperform a hardware RAID. Nevertheless, hardware RAID offers features a software setup cannot match, for example hot-swapping disks, or write-back caching if the controller has a battery backup unit (BBU).

This post is not about the pros and cons of software vs. hardware RAID, however. Rather, we want to give a concise summary of the four most common setups for data redundancy and integrity – RAID 1, RAID 5, RAID 6, and RAID 10.

RAID 1 is all about disk mirroring. You team up two identical disks to form a mirror, so all your data is kept twice. You can lose one disk and still keep your server running. Of course, the storage efficiency is rather low – out of 2x2TB you only get 2TB of usable space.

RAID 5 is another very common setup. It needs at least 3 disks, and in a nutshell, you can lose one disk before things start getting sinister for your server. That gives you moderate storage efficiency – a 3x2TB setup yields 4TB of usable space, a 4x2TB setup yields 6TB.

RAID 6 can be seen, in layman's terms, as a further development of RAID 5. Here you need at least 4 disks, and you can afford 2 disks going down before your array suffers data loss. The storage efficiency is worse than with RAID 5, but typically better than with RAID 1, since both RAID 5 and RAID 6 allow for more than just 3 or 4 disks to be used.

And finally, RAID 10 combines RAID 0 (striping over several disks) with RAID 1 (mirroring). This gives the same storage efficiency and the same redundancy level as RAID 1, but it requires at least 4 disks and is generally more expensive than RAID 5 or 6 relative to usable capacity.
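
To make the capacity arithmetic above concrete, here is a minimal Python sketch – our own illustration, not tied to any particular RAID implementation, and it ignores filesystem overhead and hot spares:

    def usable_capacity(level, disks, disk_tb):
        """Rough usable capacity in TB for an array of equal-sized disks."""
        if level == "raid1":
            if disks != 2:
                raise ValueError("classic RAID 1 mirrors exactly 2 disks")
            return disk_tb                   # one disk's worth of data, kept twice
        if level == "raid5":
            if disks < 3:
                raise ValueError("RAID 5 needs at least 3 disks")
            return (disks - 1) * disk_tb     # one disk's worth of parity
        if level == "raid6":
            if disks < 4:
                raise ValueError("RAID 6 needs at least 4 disks")
            return (disks - 2) * disk_tb     # two disks' worth of parity
        if level == "raid10":
            if disks < 4 or disks % 2:
                raise ValueError("RAID 10 needs an even number of disks, at least 4")
            return disks // 2 * disk_tb      # everything is mirrored once
        raise ValueError("unknown RAID level: " + level)

    for level, disks in [("raid1", 2), ("raid5", 3), ("raid5", 4), ("raid6", 4), ("raid10", 4)]:
        print(level, disks, "x 2TB ->", usable_capacity(level, disks, 2), "TB usable")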

Performance-wise, RAID 10 generally offers the best write speed. The difference shrinks as more disks are added to the array, but in a typical 4-disk setup RAID 10 is the fastest for writes, and RAID 1 typically writes faster than RAID 5 or 6 as well. For reads, RAID 1 lags behind the other options, whereas RAID 5, 6, and 10 can be considered fairly similar, varying with the application and the I/O pattern.

Overall, if you don't need much storage and want a cheap redundant solution, choose RAID 1 – it offers enough performance for everyday applications as well. If you need more storage and redundancy, but do not run write-intensive applications, then RAID 5 or RAID 6 are fine. Write-intensive applications (heavily written databases, for example) should consider RAID 10, however. The increase in write performance is well worth the extra cost, but pay attention to the number of disks in the array – the more disks a RAID 5 or RAID 6 array has, the better its write performance becomes and the smaller RAID 10's advantage.
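
Purely as an illustration of that rule of thumb – the threshold and the wording below are ours and should be adapted to your actual workload – the decision could be sketched like this:

    def suggest_raid(need_tb, disk_tb=2, write_heavy=False):
        """Very rough RAID suggestion mirroring the recommendations above."""
        if write_heavy:
            return "RAID 10"          # pay the premium for write performance
        if need_tb <= disk_tb:
            return "RAID 1"           # cheap, simple, redundant
        return "RAID 6 (or RAID 5 if one-disk redundancy is enough)"

    print(suggest_raid(need_tb=1))                    # RAID 1
    print(suggest_raid(need_tb=6))                    # RAID 6 (or RAID 5 ...)
    print(suggest_raid(need_tb=4, write_heavy=True))  # RAID 10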

Cloud computing

(this post will also appear in our dedicated servers blog)

OK, this had to come sooner or later – cloud computing. In a nutshell, do we offer it? Yes – if the very specific advantages of the cloud warrant its use.

So, what is it, why are so many people crazy about it, and why is it so expensive compared to a virtual private server or a dedicated server? Essentially, it is simply a different concept of providing resources on demand that can seemingly scale ad infinitum – with contention disadvantages similar to those of a virtual private server, however. Why? Because eventually, cloud resources must also run on a physical machine. And typically, you won't be the only person using that machine for your cloud purposes, so you share its resources with others, and there will always be a level of contention subject to the ISP's discretion – even with very sophisticated virtualisation and isolation methods. Most ISPs sell you cloud computing as hype, when in fact it is very little else than a different version of a virtual private server.

Of course, cloud crusaders will tell you why you must have a cloud, and start explaining the additional layer of flexibility, redundancy, security, scalability, etc. In return, one can ask questions such as: do you really want to host your confidential and sensitive data in a cloud without being able to really pinpoint its exact location? Your data is "somewhere in the cloud". How secure does that make you feel? How redundant is it really? How can I determine redundancy if it is not even clear to me where exactly my data is stored? What different level of redundancy is there compared to normal RAID systems and backup/recovery solutions? My application's resource usage only varies by 25%, so why not go for a dedicated setup, or even a virtual private server, instead?

We still consider one of the cloud's main advantages to be its flexibility for sites and applications whose resource use varies a lot over time, often in very irregular patterns. While clouds can scale a long way (depending on what you agree upon with your ISP), there are still resource limits to be observed, so even in a cloud you should take care to estimate your expected peak resource usage.

This is a very basic and by no means comprehensive statement – there are a lot more complex issues to consider with clouds. Put the other way round: even for very complex, high-resource applications, you will normally only need a cloud if you can state its technical advantages over a virtual private server (VPS) or dedicated server. Otherwise, in 99 out of 100 cases you will be better off with the latter, outside the cloud.

A very good read is http://www.eucalyptus.com/resources/info/cloud-myths-dispelled – four important technical aspects of cloud computing.

Traffic and bandwidth, revisited

Today, I read a thread in a feedback forum:

http://www.webhostingtalk.com/archive/index.php/t-1052232.html

There is a lot of talk and fuss about what counts as legitimate use of these 150TB plans, whether download sites or CDNs are allowed, what constitutes a CDN and what does not, etc.

The entire thread is one single credo for our own traffic policy – as long as it is legal, use it for whatever you want; we reserve the traffic for you, end of story. Yes, this comes at a price, but there is no smallprint. You won't be capped to a 10mbps port if you exceed your bandwidth allowance, and you are not expected to spread your traffic perfectly evenly across the entire month – go have some spikes! This is what we call fair – we do not use freerider tactics to give a small group of heavy traffic users an advantage that the majority of low-traffic users end up paying for.

To put it the other way round: assume an ISP has 100gbps of bandwidth available and sells single servers with 100TB of monthly usage each. Roughly, that gives the following equation:

100 Gbps x 1,000 Mbps per Gbps x 300 GB per Mbps per month ≈ 30,000 TB per month => 30,000 TB / 100 TB per server = 300 servers
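
Spelled out in a few lines of Python (assuming a 30-day month; the post rounds 1 Gbps down to roughly 300 TB per month, the exact figure is a little higher):

    SECONDS_PER_MONTH = 30 * 24 * 3600          # ~2.59 million seconds

    def tb_per_month(port_gbps):
        """Maximum transfer on a fully saturated port, in TB per month."""
        bits = port_gbps * 1e9 * SECONDS_PER_MONTH
        return bits / 8 / 1e12                  # bits -> bytes -> TB

    print(round(tb_per_month(0.1)))             # 100 Mbps port: ~32 TB per month
    print(round(tb_per_month(1)))               # 1 Gbps port: ~324 TB per month
    print(round(tb_per_month(100) / 100))       # servers a 100 Gbps pipe honestly carries at 100 TB each: ~324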

A typical server on such a deal will cost you about GBP 100 per month; times 300 servers, that means the company takes in 30,000 GBP per month in turnover before it even starts overselling the bandwidth.

With 30,000 per month, they have to cover their infrastructure costs, all staff costs, and all opportunity costs. Even if the company had only a single employee (the owner), this would never pay. So how do they do it? Quite simple: they oversell, fully aware that 99% of users will never come anywhere close to these magic numbers of 100TB per month or more. And for that final per cent, they will (and do, as we see) apply their T&C smallprint, and make a couple of exceptions for those who shout too loud. In the end, there are two winners: the ISP using such practices, and the shouters. The rest, the majority, pays for them.

Often you will also find terms such as <insert large TB number here> OR 100mbps unmetered. A 100mbps unmetered port gives you roughly 30TB of traffic per month. Why, then, can you choose between options that are so unlike each other? 100 or 150 TB per month on a gbps port is supposed to cost the same as 100mbps unmetered? This simply doesn't add up.

Also, such contracts typically come with a nicely reduced fee for the first 6 months, after which you will be charged the full monthly price – for something you might not even need. If you know you are never going to use 150TB, why pay for it to cover the losses the ISP makes on the small number of customers who actually do use it? Usually, after the initial contract period, these machines could be had considerably cheaper if you only paid for the traffic you actually need, instead of dragging that cost around like a ball and chain on your ankle.

Bottom line: again, be careful and ask questions. These T&C are all legit – maybe not nice, but legit – and you need to wade through the smallprint in order to understand what you can expect from these ISPs, and what you cannot.

Linux flavours

(this post will appear in our dedicated servers blog as well)

Abstract/Summary – basics only

These days, the most prominent Linux flavours are Red Hat, CentOS, Debian, Fedora, SUSE/SLES, and Ubuntu. The number of variants of these flavours is legion; the main distinction here, however, is that Red Hat is a fully commercial branch, whereas the others are available free of charge.

Red Hat is closely related to CentOS and Fedora; avoiding too technical an explanation, in layman's terms CentOS can be seen as the "free" version of Red Hat, and Fedora as the "next generation" Red Hat. There are a lot of caveats with these metaphors, but they help to convey the overall idea. Debian and Ubuntu are independent (and to some extent similar) flavours with their own communities. Ubuntu has gained a lot of popularity recently due to its cloud abilities. SUSE was originally independent and started off in Germany, but was bought by Novell and has seen its community decline lately.

Red Hat and CentOS are more conservative in their package selection and do not rush to adopt the latest version of everything. That is not necessarily a bad approach at all – a huge number of commercial, high-performance, and mission-critical applications are specifically tuned for Red Hat. Based on our own Linux experience since 1992, starting with Slackware, we identified Red Hat and CentOS as the leading Linux flavours for our own server environment (this is not a strictly objective judgement per se, as we of course need to evaluate our own needs first, and we welcome differing opinions).

Red Hat and CentOS are ideal solutions for virtualisation, too. Both offer similar technologies, though we tend to go for KVM with Red Hat and OpenVZ with CentOS (OpenVZ will also run on Fedora, by the way). KVM (Kernel-based Virtual Machine) follows a different concept than OpenVZ and runs guests in such a way that the guest does not really know it is only a guest. That gives you the chance to, say, run a Windows server as a guest on a Red Hat (or rather, KVM) host – OpenVZ only supports Linux guests, and the ones we have the best experience with are CentOS, Debian, Fedora, Ubuntu, and to some extent SUSE.
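
If you are ever unsure which of the two you have been handed, a quick and admittedly crude check is to look for the markers each technology exposes – /dev/kvm on a KVM-capable host, /proc/vz on an OpenVZ kernel. A minimal sketch (rough heuristics only; dedicated tools such as virt-what or systemd-detect-virt are more reliable):

    import os

    def virt_hints():
        """Return rough hints about KVM/OpenVZ presence on this box."""
        hints = []
        if os.path.exists("/dev/kvm"):
            hints.append("KVM device present (hardware virtualisation usable)")
        if os.path.exists("/proc/vz"):
            hints.append("OpenVZ kernel detected (host or container)")
        if os.path.exists("/proc/user_beancounters"):
            hints.append("OpenVZ resource counters present (likely a container)")
        return hints or ["no obvious KVM/OpenVZ markers found"]

    for hint in virt_hints():
        print(hint)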

When it comes to control panels, we have excellent experience with CentOS + cPanel/Plesk and Debian + Plesk. These setups pretty much work out of the box and won't give you any hassle in a live environment.

Virtual Private Server – VPS – the basics

A virtual private server, or VPS, can be described as a mostly (we will get to the meaning of "mostly" further below) separated container inside a physical machine that pretends to be a machine of its own.

As opposed to a dedicated server, where you own all the resources of that particular machine, a VPS allocates you only a subset of resources, typically configurable via parameters such as the following (a small illustrative sketch follows the list):

  • memory;
  • disk space;
  • CPU cores/speed;
  • bandwidth.
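
As a purely illustrative sketch – the field names and numbers below are made up and not an actual product – such an allocation could be described like this:

    from dataclasses import dataclass

    @dataclass
    class VpsPlan:
        """Illustrative description of a VPS allocation (example values only)."""
        memory_gb: int
        disk_gb: int
        cpu_cores: int
        bandwidth_tb_per_month: float

    # A hypothetical small plan, not one of our actual offerings.
    small = VpsPlan(memory_gb=2, disk_gb=40, cpu_cores=2, bandwidth_tb_per_month=1.0)
    print(small)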

The advantage of a VPS is its compactness – you can pretty much do whatever you could also do with a dedicated physical machine, but at a much lower cost than paying for an entire server. The disadvantages of virtual private servers lie in their contention ratio and their scalability.

The more customers are hosted on a single physical machine – even a powerful one, and unless we are talking about workhorses such as IBM's P7 series, which can be virtualised with almost no contention – the more the guest systems compete for the machine's resources, such as I/O, memory, or CPU power, even with no oversubscription. Scalability is another issue – you cannot scale a VPS up without end. Today's commodity high-performance Intel or AMD hardware cannot be scaled ad infinitum, and a site requiring the kind of resources that only dedicated servers could provide a few years ago may well benefit from the additional performance (albeit at higher cost) of a dedicated machine with the same general specs for CPU, memory, and disk space.

In this blog we will focus on virtual private servers in general: how they compare to dedicated servers, where to draw the line, typical administrative caveats, and everything else that comes to mind which might help you run your VPS. We hope this blog will prove useful to you!