Low Cost SAN


1 Objective

This document describes how to build a low-cost SAN using FOSS tools. We set out to build a SAN with the following features:

•  Low cost and easily affordable

•  Ensured Scalability

•  High Reliability

•  Easily Manageable

•  High Performance

•  Ensured Security

•  High availability

2  Definitions, Acronyms and Abbreviations

This section provides a list of all definitions, acronyms and terms required to properly interpret this document as well as to understand SAN terms and terminology.

Abbreviation | Description
AoE | ATA over Ethernet, an open storage protocol
ATA | Advanced Technology Attachment
Targets | End point of communication (normally refers to the server side)
Initiators | A host that requests access to a storage device (client end)
RHCS | Red Hat Cluster Suite
Heartbeat | A signal periodically sent out by a hardware component to inform another component that it is working normally
iSCSI | Internet Small Computer System Interface
SATA | Serial ATA, a newer version of the ATA interface
GFS | Global File System, a cluster-aware file system for Linux
SAN | Storage Area Network
LVM | Logical Volume Manager
RAID | Redundant Array of Inexpensive Disks
DRBD | Distributed Replicated Block Device
NBD | Network Block Device
ENBD | Enhanced Network Block Device
GNBD | Global Network Block Device
HA | High Availability, a clustering solution for Linux which provides reliability, availability and serviceability
FOSS | Free and Open Source Software
DFS | Distributed File System for Windows
LVS | Linux Virtual Server

3  References

This section lists the references and URLs used to prepare this document.

SN | URL
1 | http://nbd.sourceforge.net/
2 | http://en.wikipedia.org/
3 | http://3ware.com/products/serial_ata2-9650.asp
4 | http://www.drbd.org/
5 | http://www.linux-ha.org/
6 | http://www.linuxjournal.com/article/8149
7 | http://www.redhat.com/docs/manuals/enterprise/RHEL-5-manual/Cluster Administration/

 

4  Layered Architecture of SAN

4.1  Brief Description and Layered Architecture

This is an investigation document that covers each aspect of building a low-cost SAN: hardware, OS and software. The layered architecture of our SAN is shown in the following diagrams.

Server Architecture:

[Figure: server architecture]

Client Architecture:

[Figure: client architecture]

In this diagram, the three boxes in red depict the solution for Windows. DFS is the Microsoft Distributed File System for Windows Server.

4.2 SAN Features and available options

With the above architecture in mind, we have tried to achieve all the features of a low-cost SAN in terms of speed, reliability, security, scalability and availability. The following table gives an overview of the features and the corresponding available options. All the software we have used to achieve these SAN features is available as FOSS:

SAN Features | Available Options on FOSS
Low Cost & Simplicity | The AoE protocol and the corresponding software are available as FOSS.
Security | AoE is not routable, which provides inherent security.
Speed of ATA disk | 1) Typical 7200-rpm SATA disk drive: 105 MB/s (sustained throughput); 2) Typical 7200-rpm PATA disk drive: 72 MB/s (sustained throughput)
Speed of Ethernet | 1) Gigabit Ethernet (1000baseT): 125 MB/s; 2) 10-Gigabit Ethernet: 1,250 MB/s
Data Packets | AoE adds only 48 bytes of overhead to the data it delivers.
Full Virtualization Support | Fully compatible with hypervisors such as Xen, VMware and Microsoft Virtual PC, which virtualize computers used as servers.
Virtualized disk | We can combine multiple 2 TB disks into a single RAID disk.
Device access through Internet | Remote access to an AoE device through the Internet can be achieved through tunneling: software at both ends of a link converts local packets into routable packets.
Easy management of AoE servers and nodes | AoE tools like CEC provide a terminal interface to AoE devices. All the clusters and nodes can also be managed by the RHCS cluster manager.
Connecting Multiple Disks | 24-port SATA controller PCI Express card with capacity >= 2 TB per disk
Theoretical limits of AoE devices | AoE has a limit of 65535 major x 255 minor addresses, so you are limited to approximately 16 million block devices on a single broadcast domain / SAN. For each individual block device, ATA LBA48 addressing restricts you to about 140 petabytes.
Diskless booting support | Diskless booting (PXE booting) is available with AoE for Windows as well as for Linux.
Fencing | The RHCS fencing daemon provides fencing for the corresponding failover domains.
Network load balancing | RHCS LVS and Piranha provide network load balancing.
Proper synchronization among all the nodes | RHCS GFS/GFS2 uses DLM to provide this feature.
Block Level Redundancy | DRBD is a FOSS tool that provides high availability in a SAN at the block level. Used with Heartbeat and RHCS, it is a very good solution for HA in storage networking.
Directory Level Redundancy | NFS failover and automounting are easily handled by RHCS.
Resource management and ensured communication among nodes | CMAN of RHCS and Heartbeat are good solutions for this.
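
The Ethernet speeds and the theoretical AoE limits quoted in the table can be checked with a little arithmetic. The following is a minimal sketch; the function names are hypothetical, and the 512-byte sector size is an assumption (the classic ATA sector size) that the table does not state.

```python
# Back-of-the-envelope check of the figures quoted in the feature table.
# Assumption: 512-byte sectors; the document itself does not state a sector size.
SECTOR_BYTES = 512


def ethernet_mb_per_s(bits_per_second: int) -> float:
    """Raw line rate in decimal megabytes per second, ignoring protocol overhead."""
    return bits_per_second / 8 / 1_000_000


def aoe_address_count(majors: int = 65535, minors: int = 255) -> int:
    """Number of major x minor address pairs, using the figures from the table."""
    return majors * minors


def lba48_capacity_bytes(sector_bytes: int = SECTOR_BYTES) -> int:
    """Largest device addressable with 48-bit logical block addresses."""
    return (2 ** 48) * sector_bytes


if __name__ == "__main__":
    print("Gigabit Ethernet MB/s:   ", ethernet_mb_per_s(1_000_000_000))
    print("10-Gigabit Ethernet MB/s:", ethernet_mb_per_s(10_000_000_000))
    print("AoE addresses:           ", aoe_address_count())
    print("LBA48 capacity (PB):     ", lba48_capacity_bytes() / 10 ** 15)
```

Under these assumptions the address count comes to roughly 16.7 million and the LBA48 capacity to roughly 144 decimal petabytes (128 PiB), in line with the "approximately 16 million" and "about 140 petabytes" quoted in the table.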

4.3  Overview of SAN with HA/Failover

The main challenge in building a reliable SAN is achieving high availability and zero downtime. Tools like LVS, RHCS, Heartbeat and DRBD let us easily restart applications and migrate services. The following diagram shows the failover of a node and the relocation of its services, so that users keep their applications running even if the corresponding node crashes.

[Figure: SAN with HA/failover]
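
The relocation of services after a node failure can be illustrated with a toy model. This is only a sketch of the idea, not how RHCS or Heartbeat implement it; the function name, node names and service names are all hypothetical.

```python
from typing import Dict, List, Set


def relocate_services(
    placement: Dict[str, str],
    failover_order: List[str],
    failed: Set[str],
) -> Dict[str, str]:
    """Return a new service -> node mapping with services moved off failed nodes.

    Services on healthy nodes stay where they are. Services on a failed node
    move to the first surviving node in failover_order.
    """
    survivors = [node for node in failover_order if node not in failed]
    if not survivors:
        raise RuntimeError("no surviving node to take over the services")
    return {
        service: (node if node not in failed else survivors[0])
        for service, node in placement.items()
    }


if __name__ == "__main__":
    before = {"nfs": "node1", "web": "node2"}
    after = relocate_services(before, ["node1", "node2", "node3"], {"node1"})
    print(after)
```

A real cluster manager also has to fence the failed node before restarting its services elsewhere, so that two nodes never write to the same storage at once.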

5  Low level Building Block of SAN

This section briefly describes the low level building blocks of a SAN. The first questions are which hard disks we are using and which protocols are available for reading from and writing to them over the network. The OS must also support the chosen protocol. The protocol has to be reliable and has to suit the hard disks in use.

[Figure: low level building blocks of SAN]

Fiber Channel is expensive and carries a lot of extra overhead in terms of cost and resources. That leaves three choices for exporting our block devices over the network: AoE, NBD (ENBD/GNBD), and iSCSI. Brief descriptions of these three follow:

•  AoE: ATA over Ethernet (AoE) is a network protocol developed by the Brantley Coile Company, designed for simple, high-performance access to SATA storage devices over Ethernet networks. It makes it possible to build SANs with low-cost, standard technologies. AoE does not rely on network layers above Ethernet, such as IP and TCP. In this regard it is more comparable to Fiber Channel over Ethernet than to iSCSI. While the non-routability means AoE cannot be accessed over the Internet or other IP networks, it also makes AoE more lightweight (with less load on the host), easier to implement and faster, and it provides a layer of inherent security. Support is available for Linux, Windows, Mac OS X, FreeBSD and Plan 9 from Bell Labs.

•  NBD: The Linux Network Block Device (NBD) is a device driver extension to the Linux kernel. With the NBD device driver you can create a TCP/IP network connection between your local Linux system and a server program on a remote (not necessarily Linux) computer. NBD has some limitations, however, in read/write operation and when used as a root file system.

•  iSCSI: iSCSI is Internet SCSI (Small Computer System Interface), an Internet Protocol (IP)-based storage networking standard for linking data storage facilities. By carrying SCSI commands over IP networks, iSCSI is used to facilitate data transfers over intranets and to manage storage over long distances. The protocol allows clients (called initiators) to send SCSI commands (CDBs) to SCSI storage devices (targets) on remote servers. iSCSI deployments often use a TCP offload engine and a host bus adapter to reduce the load on the host. It is supported on a large number of operating systems, including Linux, Windows, HP-UX, AIX and NetWare.

For a low-cost SAN with little overhead, AoE is a good choice. The advantage of AoE is that you do not have the overhead of translating ATA to SCSI and then back to ATA (if you are using ATA drives), so performance improves. For equivalent throughput, the server processing load for iSCSI is much higher than for AoE, so AoE spares processing cycles. iSCSI also requires TCP/IP with all its attendant complexity.

 

6  Low level Components (AoE)

This section briefly describes the low level components which are available as FOSS and on the market. Since we are focusing on a low-cost SAN and have to face all the storage networking challenges of clustering, CentOS is a good choice of OS. CentOS 5.2 is almost equivalent to RHEL 5 and includes RHCS (Red Hat Cluster Suite) and virtualization support. We need to attach a large number of high-capacity hard disks, and PCI / RAID controller cards are available with 24 ports and 2 TB per disk. This gives the following table of low level components:

Components | Description
Disks | SATA disks of >= 2 TB capacity
PCI Card / RAID Controller | 24-port SATA controller PCI Express card, SATA II RAID controller card
NIC / high quality switch | Gigabit multiport networking switch and gigabit multiport NIC (with jumbo frame support)
OS | Linux/Windows (preferably CentOS 5.2)
Protocol | AoE

 

7  High Level Building Block of SAN

This section briefly describes the high level building blocks of a SAN. To make our SAN more robust and able to meet the challenges of storage networking on the software side, three things are needed:

[Figure: high level building blocks of SAN]

•  Protocol: A reliable protocol is necessary. We can go for AoE. iSCSI and HyperSCSI are also options.

•  Drivers (Targets & Initiators): Client side and server side drivers are necessary to export the block devices over the network and to access them on the client side. On the client side there is generally a kernel module, while on the server side the driver can be either a kernel module or a user space application. These drivers are known as targets (on the server side) and initiators (on the client side).

•  HA Software: Red Hat Cluster Suite can be used in many configurations to provide high availability, scalability, load balancing, file sharing, and high performance in a SAN.

 

8  High level Components (AoE)

This section briefly describes the available high level AoE components: targets, initiators and HA software. The following table is a guideline for building a reliable high level building block of a SAN.

Components | Description
Protocol | AoE
Targets | Vblade, Ggaoed, Qaoed (GPL)
Initiators | AoE driver, WinAoE driver (GPL)
HA Software | RHCS suite, DRBD, Heartbeat (FOSS)

 

8.1  Targets (AoE)

This section briefly describes the available AoE targets. The most reliable and configurable AoE targets are Vblade, Ggaoed and Qaoed. Ggaoed and Qaoed are more configurable than Vblade, while Vblade is quite simple and can easily be ported to any platform. All three are user space targets. The following diagram shows a combined view of these targets:

[Figure: AoE targets]

Apart from the targets mentioned above, some other AoE targets are also available, all under the GPL. Kvblade, Vblade-kernel and Aoeserver are kernel modules, while the rest (Vblade, Ggaoed, Qaoed and Sqaoed) are user space targets. The following table describes these targets.

Targets | Description
Vblade | Vblade is a software-based AoE target, a virtual EtherDrive blade. It exports local block storage to hosts on an Ethernet local area network. Hosts with an ATA over Ethernet (AoE) initiator, like the aoe driver for Linux, can then access the storage over the Ethernet. It is available for Linux, FreeBSD and Plan 9 from Bell Labs.
Ggaoed | Ggaoed is an AoE (ATA over Ethernet) target implementation for Linux. It uses Linux kernel AIO, memory mapped sockets and other Linux features to provide the best performance. It requires Linux kernel 2.6.22 or greater and is currently available for Linux only.
Qaoed | Qaoed is a multithreaded ATA over Ethernet storage target that is easy to use and highly configurable. It is available for Linux.
Aoeserver | Aoeserver is an in-kernel ATA over Ethernet storage target driver used to emulate a Coraid EtherDrive blade. It is partly based on vblade and the aoe client from the Linux 2.6 kernel. It is controlled through procfs.
Kvblade | Kvblade is a kernel module implementing the target side of the AoE protocol. Users can command the module through sysfs to export block devices on specified network interfaces. The loopback device should be used as an intermediary for exporting regular files with kvblade.
Vblade-kernel | Vblade-kernel is an AoE target emulator implemented as a kernel module for Linux 2.6.* kernels.
Sqaoed | Sqaoed is a port of Qaoed to Solaris 10, done by Fubra. Sqaoed does not currently come with a configuration file and is operated from the command line.

8.2  Initiators (AoE)

This section describes the available AoE initiators. Client side AoE drivers are available as FOSS for Linux, Solaris, FreeBSD and Windows. A driver is also available for Mac OS X, but it is a paid product. The following diagram shows this more clearly:

[Figure: AoE initiators]

Brief descriptions of the available AoE initiators follow:

Initiators | Description
AoE driver | The AoE driver is a block driver which allows the Linux kernel to use the AoE network protocol. A Linux system can use AoE block devices like EtherDrive (R) storage blades. The block devices appear as local device nodes (e.g. /dev/etherd/e0.0). It is freely available for Linux, FreeBSD and Solaris.
WinAoE driver | WinAoE is an open source GPLv3 driver for using AoE (ATA over Ethernet) on Microsoft Windows(tm). It can be used for diskless booting of Windows 2000 through Vista 64 from an AoE device (virtual vblade or real Coraid device), or as a general AoE access driver.
2ºFrost AoE Driver | The 2ºFrost AoE Driver provides direct access to shared networked AoE (ATA over Ethernet) storage, transferring raw Ethernet packets using the fast, open AoE protocol rather than the more complex and slower TCP/IP.
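
Device nodes such as /dev/etherd/e0.0 encode the AoE address of the exported device: the first number is the major (shelf) address and the second is the minor (slot) address. The sketch below parses such a name. The function and class names are hypothetical, and the range check simply reuses the "65535 major x 255 minor" figures quoted earlier in this document as an assumption about the valid ranges. Partition nodes are deliberately not handled.

```python
import re
from typing import NamedTuple


class AoEAddress(NamedTuple):
    shelf: int  # AoE major address
    slot: int   # AoE minor address


# Matches whole-device nodes only, e.g. /dev/etherd/e0.0
_DEVICE_RE = re.compile(r"^/dev/etherd/e(\d+)\.(\d+)$")

# Assumption: limits taken from the figures quoted in this document.
MAX_SHELVES = 65535
MAX_SLOTS = 255


def parse_aoe_device(path: str) -> AoEAddress:
    """Split an AoE device node path into its shelf and slot numbers."""
    match = _DEVICE_RE.match(path)
    if match is None:
        raise ValueError(f"not an AoE device node: {path!r}")
    shelf, slot = int(match.group(1)), int(match.group(2))
    if shelf >= MAX_SHELVES or slot >= MAX_SLOTS:
        raise ValueError(f"address out of range: {path!r}")
    return AoEAddress(shelf, slot)


if __name__ == "__main__":
    print(parse_aoe_device("/dev/etherd/e1.0"))
```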

 

8.3  SAN diagram (based on AoE protocol)

This section describes the basic architecture of our SAN which is based on AoE protocol, AoE target (vblade) and AoE initiator (aoe driver).

[Figure: SAN based on the AoE protocol]

In the above diagram there are two servers: server0 and server1. Each of them exports two block devices on the network, and the client node accesses these block devices as a RAID device. From server0, /dev/hdb is exported as /dev/etherd/e0.0 and /dev/hdc is exported as /dev/etherd/e1.0. Similarly, from server1 we have exported two block devices: /dev/hdb as /dev/etherd/e0.1 and /dev/hdc as /dev/etherd/e1.1. We now have four block devices on the client side. These are as follows:

•  /dev/etherd/e0.0 (from server0)

•  /dev/etherd/e0.1 (from server1)

•  /dev/etherd/e1.0 (from server0)

•  /dev/etherd/e1.1 (from server1)

We have combined /dev/etherd/e0.0 and /dev/etherd/e0.1 into a RAID device (/dev/md0) at RAID level 1 (mirroring). Similarly, /dev/etherd/e1.0 and /dev/etherd/e1.1 are combined into a RAID device (/dev/md1) at RAID level 1 (mirroring). These two RAID devices are then combined into a single RAID device (/dev/md2) at RAID level 0 (striping). So we finally have a single RAID device, /dev/md2, built from the four exported block devices, on which we can make a file system and which we can use on the client side for further work. We can also do volume management before creating the RAID device.
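
The layout described above can be written down as the commands one might run on each machine. The sketch below only builds and prints the command strings; it does not execute anything. It assumes the usual vblade argument order (shelf, slot, network interface, device) and the standard mdadm --create options. The interface name eth0 and the Python function names are assumptions made for illustration.

```python
from typing import Dict, List

NETIF = "eth0"  # assumption: the interface facing the storage network


def vblade_command(shelf: int, slot: int, device: str, netif: str = NETIF) -> str:
    """Command that exports a local device as AoE address e<shelf>.<slot>."""
    return f"vblade {shelf} {slot} {netif} {device}"


def mdadm_create(md_device: str, level: int, members: List[str]) -> str:
    """Command that assembles the members into a new software RAID device."""
    return (
        f"mdadm --create {md_device} --level={level} "
        f"--raid-devices={len(members)} " + " ".join(members)
    )


def server_commands() -> Dict[str, List[str]]:
    """Exports per server: slot 0 is used on server0, slot 1 on server1."""
    return {
        "server0": [vblade_command(0, 0, "/dev/hdb"), vblade_command(1, 0, "/dev/hdc")],
        "server1": [vblade_command(0, 1, "/dev/hdb"), vblade_command(1, 1, "/dev/hdc")],
    }


def client_commands() -> List[str]:
    """Two mirrors across the servers, then one stripe across the mirrors."""
    return [
        mdadm_create("/dev/md0", 1, ["/dev/etherd/e0.0", "/dev/etherd/e0.1"]),
        mdadm_create("/dev/md1", 1, ["/dev/etherd/e1.0", "/dev/etherd/e1.1"]),
        mdadm_create("/dev/md2", 0, ["/dev/md0", "/dev/md1"]),
    ]


if __name__ == "__main__":
    for host, commands in server_commands().items():
        for command in commands:
            print(f"{host}: {command}")
    for command in client_commands():
        print(f"client: {command}")
```

Because each mirror pairs one device from server0 with one from server1, the loss of either server leaves both mirrors degraded but still readable, and the stripe on top keeps working.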

 

8.4  HA/Failover

This section describes the challenges of a SAN and the open source solutions available for them. The most obvious challenges in storage networking are as follows:

•  Storage

•  High Availability

•  Load Balancing

•  High Performance

•  Easily Manageable

To meet these challenges we have a reliable tool, RHCS (Red Hat Cluster Suite). Red Hat Cluster Suite can be used in many configurations to provide high availability, scalability, load balancing, file sharing, and high performance.

[Figure: HA software]

RHCS has the following components to address the SAN challenges mentioned above.

Services | Functionality
CMAN | The main component of RHCS. It controls cluster membership and takes care of fencing, resource management, distributed lock management and failover domains. It has its own GUI and is also controlled through the cluster.conf file.
GFS/GFS2 | Global File System (GFS) is a shared disk file system for Linux computer clusters. GFS and GFS2 are cluster-aware file systems that use the Distributed Lock Manager (DLM) in cluster configurations and the “nolock” lock manager for local file systems.
Piranha/LVS | RHCS includes LVS (Linux Virtual Server) with the Piranha management/configuration tool, which is used for load balancing.
Conga | Conga is a web interface for cluster administration which uses the luci and ricci daemons.
DLM/GULM | Lock management is a common cluster-infrastructure service that provides a mechanism for other cluster infrastructure components to synchronize their access to shared resources. DLM is a distributed lock manager, while GULM is a client-server lock manager. DLM runs on each cluster node; lock management is distributed across all nodes in the cluster.
Fencing | Fencing cuts off devices and nodes from shared resources when they have failed or become corrupted. CMAN controls the fenced daemon.

Apart from the RHCS suite, DRBD and Heartbeat are also available as HA solutions for a SAN or any other cluster. Both are FOSS. Brief descriptions of these tools follow:

•  DRBD: DRBD ( Distributed Replicated Block Device ) is a distributed storage system for the Linux platform. It consists of a kernel module, several userspace management applications and some shell scripts and is normally used on high availability (HA) clusters. DRBD bears similarities to RAID 1, except that it runs over a network.

•  HEARTBEAT: Heartbeat is a daemon that provides cluster infrastructure (communication and membership) services to its clients. This allows clients to know about the presence (or disappearance!) of peer processes on other machines and to easily exchange messages with them. Heartbeat comes with a primitive resource manager (haresources); however it is only capable of managing 2 nodes and does not detect resource-level failures.

DRBD is often deployed together with the Heartbeat cluster manager, although it also integrates with other cluster management frameworks. It integrates with virtualization solutions such as Xen, and may be used both below and on top of the Linux LVM stack.
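
The membership idea behind a heartbeat service can be shown with a small timeout-based detector. This is an illustrative sketch only and is not Heartbeat's actual implementation; the class, method and parameter names are hypothetical, and timestamps are passed in explicitly so that the logic can be followed without real timers or network traffic.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each peer and reports silent peers."""

    def __init__(self, deadtime: float) -> None:
        # A peer is declared dead after this many seconds without a heartbeat.
        self.deadtime = deadtime
        self.last_seen: Dict[str, float] = {}

    def beat(self, node: str, now: float) -> None:
        """Record that a heartbeat from the node arrived at time now."""
        self.last_seen[node] = now

    def dead_nodes(self, now: float) -> List[str]:
        """Return the peers whose last heartbeat is older than deadtime."""
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if now - seen > self.deadtime
        )


if __name__ == "__main__":
    monitor = HeartbeatMonitor(deadtime=10.0)
    monitor.beat("server0", now=0.0)
    monitor.beat("server1", now=0.0)
    monitor.beat("server0", now=8.0)  # server1 has gone quiet
    print(monitor.dead_nodes(now=12.0))
```

A cluster manager would react to a node appearing in this list by fencing it and relocating its services, while a replication layer such as DRBD keeps a current copy of the data available on the surviving node.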

 

9 iSCSI in SAN

So far we have covered all the aspects of a SAN, but with the focus on the AoE protocol. iSCSI has its own features and advantages in a SAN. If we need features such as encryption, routability and user-based access in the storage protocol, iSCSI seems to be the better choice. ATA disks are also not as reliable as their SCSI counterparts, which is another reason to consider iSCSI when building a SAN. The following table briefly covers the building blocks of an iSCSI SAN:

Components | Description
Disks | SCSI disks
HBA | Host Bus Adapter (Aic 7xxx, QLE4xxx etc.)
OS | Linux/Windows (preferably CentOS)
Protocol | iSCSI
iSCSI Targets | Ardis iSCSI target, Intel iSCSI target
iSCSI Initiators | Ardis iSCSI target & Intel iSCSI initiator (for Linux), Microsoft iSCSI initiator (for Windows)
Routers, Switches, Offload Engine | iSCSI Offload Engine (ISOE), Security Offload Engine (SOE), quality routers and a quality switch
HA/Failover | RHCS suite, DRBD, Heartbeat

 
