The objective of this document is to describe the making of a low-cost SAN using FOSS tools. We have tried to set up a SAN with the following features:
• Low cost and easily affordable
• Ensured Scalability
• High Reliability
• Easily Manageable
• High Performance
• Ensured Security
• High availability
2 Definitions, Acronyms and Abbreviations
This section provides a list of all definitions, acronyms and terms required to properly interpret this document as well as to understand SAN terms and terminology.
|AoE||ATA over Ethernet, an open storage protocol|
|ATA||Advanced Technology Attachment|
|Targets||End point of communication (normally refers to the server side)|
|Initiators||A host that requests access to a storage device (the client end)|
|RHCS||Red Hat Clustering Suite|
|Heartbeat||A signal periodically sent out by a hardware component in order to inform another component that it is working normally|
|iSCSI||Internet Small Computer System Interface|
|SATA||Serial ATA, a newer version of ATA interface|
|GFS||Global File System, a cluster aware filesystem for Linux|
|SAN||Storage Area Network|
|LVM||Logical Volume Manager|
|RAID||Redundant Array of Inexpensive Disks|
|DRBD||Distributed Replicated Block Device|
|NBD||Network Block Device|
|ENBD||Enhanced Network Block Device|
|GNBD||Global Network Block Device|
|HA||High Availability, a clustering solution for Linux which provides reliability, availability and serviceability|
|FOSS||Free Open Source Software|
|DFS||Distributed File System for Windows|
|LVS||Linux Virtual Server|
3 References
This section lists all the references and URLs used to prepare this document.
4 Layered Architecture of SAN
4.1 Brief Description and Layered Architecture
This investigation document touches each aspect of making a low-cost SAN, from hardware and OS to software. The layered architecture of our SAN is shown in the following diagrams.
In this diagram, the three red boxes depict the solution for Windows; DFS is Microsoft's Distributed File System for Windows Server.
4.2 SAN Features and available options
With the above architecture in mind, we have tried to achieve all the features of a low-cost SAN in terms of speed, reliability, security, scalability and availability. The following table gives an overview of the features and the corresponding available options. All the software we have used to achieve these SAN features is available as FOSS:
|Feature||Available Options as FOSS|
|Low Cost & Simplicity||The AoE protocol and the corresponding software are available as FOSS.|
|Security||Non-routability provides inherent security.|
|Speed of ATA disks||1) Typical 7200-rpm SATA disk drive: 105 MB/s (sustained throughput); 2) Typical 7200-rpm PATA disk drive: 72 MB/s (sustained throughput)|
|Speed of Ethernet||1) Gigabit Ethernet (1000BASE-T): 125 MB/s; 2) 10-Gigabit Ethernet: 1,250 MB/s|
|Data Packets||AoE simply delivers a 48-byte header plus the data (only 48 bytes of extra overhead per packet)|
|Full Virtualization Support||Fully Compatible with hypervisors such as Xen, VMware, Microsoft Virtual PC to virtualize computers that are used as servers|
|Virtualized disk||We can combine multiple 2 TB disks into a single RAID device.|
|Device access through the Internet||Remote access to an AoE device through the Internet can be achieved through tunneling; software can convert local packets into routable packets at both ends of a link.|
|Easy management of AoE servers and nodes||AoE tools like CEC provide a terminal interface to an AoE device. All the clusters and nodes can also be managed by the RHCS cluster manager.|
|Connecting Multiple Disks||24 port SATA controller PCI express card having capacity >=2TB per disk|
|Theoretical limits of AoE devices||AoE addressing allows 65535 major × 255 minor addresses, so you are limited to approximately 16 million block devices on a single broadcast domain / SAN. For each individual block device, ATA lba48 addressing restricts you to about 140 petabytes.|
|Diskless booting Support||Diskless booting (PXE booting) is available in AoE for windows as well as for Linux|
|Fencing||The RHCS fencing daemon provides fencing for the corresponding failover domains|
|Network load balancing||RHCS LVS and Piranha provide network load balancing|
|Proper Synchronization among all the nodes||RHCS GFS/GFS2 uses DLM to provide this feature|
|Block-level redundancy||DRBD is a FOSS tool that provides high availability in a SAN at the block level. When DRBD is used with Heartbeat and RHCS, it is a very good HA solution for storage networking.|
|Directory-level redundancy||NFS failover and auto-mounting are easily handled by RHCS.|
|Resource management and ensured communication among nodes||CMAN (part of RHCS) and Heartbeat are good solutions for this.|
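The theoretical limits quoted in the table are easy to sanity-check (a sketch of the arithmetic; AoE addresses devices with a 16-bit shelf and an 8-bit slot number, and ATA lba48 addresses 2^48 sectors of 512 bytes each):

```shell
# Address space: shelf (16 bits) x slot (8 bits) per broadcast domain.
devices=$((65536 * 256))
echo "addressable AoE block devices: $devices"     # ~16 million

# lba48 limit: 2^48 sectors of 512 bytes each.
bytes=$(( (1 << 48) * 512 ))
petabytes=$((bytes / 1000000000000000))
echo "maximum device size: $bytes bytes (~$petabytes PB)"
```

The result, roughly 144 PB, is where the "about 140 petabytes" figure in the table comes from.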
4.3 Overview of SAN with HA/Failover
The main challenges for a reliable SAN are high availability and zero downtime. Thanks to tools like LVS, RHCS, Heartbeat and DRBD, we can easily restart our applications and migrate services. The following diagram shows the failover of a node and the relocation of its services, so that users keep their applications running even if the corresponding node crashes.
5 Low level Building Block of SAN
This section provides a brief description of the low-level building blocks of a SAN. The very first things that come to mind are which hard disks we are using and which protocols exist for proper communication (read and write operations over the network). Corresponding support in the OS is also required at the protocol level. Choosing a reliable protocol that matches the chosen hard disks is necessary.
Fibre Channel is expensive and carries extra overhead in terms of cost and resources. So we have three choices for exporting our block devices over the network: AoE, NBD (ENBD/GNBD), and iSCSI. Brief descriptions of the three follow:
• AoE: ATA over Ethernet (AoE) is a network protocol developed by the Brantley Coile Company, designed for simple, high-performance access to SATA storage devices over Ethernet networks. It makes it possible to build SANs with low-cost, standard technologies. AoE does not rely on network layers above Ethernet, such as IP and TCP. In this regard it is more comparable to Fibre Channel over Ethernet than to iSCSI. While non-routability means AoE cannot be accessed over the Internet or other IP networks, it also makes AoE more lightweight (with less load on the host), easier to implement, provides a layer of inherent security, and offers higher performance. Support is available on Linux, Windows, Mac OS X, FreeBSD and Plan 9 from Bell Labs.
• NBD: The Linux Network Block Device (NBD) is a device driver extension to the Linux kernel. With the NBD device driver you can create a TCP/IP network connection between your local Linux system and a server program on a remote (not necessarily Linux) computer. However, NBD has some limitations regarding read/write operations and use as a root file system.
• iSCSI: iSCSI is Internet SCSI (Small Computer System Interface), an Internet Protocol (IP)-based storage networking standard for linking data storage facilities. By carrying SCSI commands over IP networks, iSCSI is used to facilitate data transfers over intranets and to manage storage over long distances. The protocol allows clients (called initiators) to send SCSI commands (CDBs) to SCSI storage devices (targets) on remote servers. For good performance, iSCSI benefits from a TCP Offload Engine and a Host Bus Adapter. It is supported on a large number of operating systems, including Linux, Windows, HP-UX, AIX and NetWare.
If we are thinking in terms of a low-cost SAN with less overhead, then AoE is a good choice for us. The advantage of AoE is that you do not have the overhead of translating ATA to SCSI and back to ATA (if you are using ATA drives), so there is a performance gain. Server processing load for iSCSI is much higher than for AoE at equivalent throughput, so AoE spares processing cycles. iSCSI also requires TCP/IP and its attendant complexity.
6 Low level Components (AoE)
This section provides a brief description of the low-level components which are available as FOSS and on the market. Since we are focusing on a low-cost SAN and must face all the storage networking challenges of clustering, when it comes to choosing an OS we can go for CentOS. CentOS 5.2 is almost equivalent to RHEL 5, with RHCS (Red Hat Cluster Suite) and virtualization facilities built in. We need to attach a large number of high-capacity hard disks, and PCI Express RAID controller cards are available with 24 ports supporting 2 TB per disk. We therefore have the following table of low-level components:
|Disks||SATA disks of >= 2TB capacity|
|PCI Card / RAID Controller||24-port SATA controller PCI Express card, SATA RAID controller card|
|NIC Card / High-Quality Switch||Gigabit multiport networking switch & gigabit multiport NIC card (with jumbo frame support)|
|OS||Linux/Windows (preferably CentOS 5.2)|
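To actually benefit from the jumbo frame support mentioned above, the storage-facing NIC must be configured with a large MTU. On a CentOS 5-style system this is a one-line addition to the interface configuration file (a sketch; the interface name and addresses are examples):

```
# /etc/sysconfig/network-scripts/ifcfg-eth1  (storage-facing NIC)
DEVICE=eth1
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.10.1      # example address
NETMASK=255.255.255.0
MTU=9000                 # enable jumbo frames
```

The switch ports carrying AoE traffic must be configured for jumbo frames as well, or the large packets will simply be dropped.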
7 High Level Building Block of SAN
This section provides a brief description of the high-level building blocks of a SAN. When we think about making our SAN more robust against all the available challenges of storage networking in terms of software, three things come to mind:
• Protocol: A reliable protocol is necessary. We can go for AoE; iSCSI and HyperSCSI can also be choices.
• Drivers (Targets & Initiators): Client side and server side drivers are necessary to export the block devices over network and to access them on client side. Generally on client side, there is a kernel module available and on server side it can be a kernel module as well as a user space application. These drivers are known as targets (on server side) and initiators (on client side).
• HA Software: The Red Hat Cluster Suite can be used in many configurations to provide high availability, scalability, load balancing, file sharing and high performance in a SAN.
8 High level Components (AoE)
This section provides a brief description of the available high-level AoE components: targets, initiators and HA software. The following table gives guidelines for building a reliable high-level building block of a SAN.
|Targets||Vblade, Ggaoed, Qaoed (GPL)|
|Initiators||AoE driver, WinAoE driver (GPL)|
|HA Software||RHCS suite, DRBD, Heartbeat (FOSS)|
8.1 Targets (AoE)
This section gives a brief description of the available AoE targets. The highly reliable and highly configurable AoE targets are Vblade, Ggaoed and Qaoed. Ggaoed and Qaoed are more configurable than Vblade, while Vblade is quite simple and can easily be ported to any platform. All three are user-space targets. The following diagram shows a combined view of these targets:
Apart from the above-mentioned targets, some other targets are also available for AoE, and all of them are under the GPL. Kvblade, Vblade-kernel and Aoeserver are kernel modules, while the rest (vblade, Ggaoed, Qaoed and Sqaoed) are user-space targets. The following table describes these targets.
|Vblade||Vblade is a software-based AoE target, a virtual EtherDrive Blade. It exports local block storage to hosts on an Ethernet local area network. Hosts with an ATA over Ethernet (AoE) initiator, like the aoe driver for Linux, can then access the storage over the Ethernet. It is available for Linux, FreeBSD and Plan 9 from Bell Labs.|
|Ggaoed||Ggaoed is an AoE ( ATA over Ethernet ) target implementation for Linux. It utilizes Linux kernel AIO, memory mapped sockets and other Linux features to provide the best performance. It requires Linux kernel 2.6.22 or greater. It’s currently available for Linux only.|
|Qaoed||Qaoed is a multithreaded ATA over Ethernet storage target that is easy to use and highly configurable. It’s available for Linux.|
|Aoeserver||Aoeserver is an in-kernel ATA over Ethernet storage target driver used to emulate a Coraid EtherDrive Blade. It is partly based on vblade and the aoe client from the Linux 2.6 kernel. It uses procfs to control and command this target.|
|Kvblade||Kvblade is a kernel module implementing the target side of the AoE protocol. Users can command the module through sysfs to export block devices on specified network interfaces. The loopback device should be used as an intermediary for exporting regular files with kvblade.|
|Vblade-kernel||Vblade-kernel is an AoE target emulator implemented as a kernel module for Linux 2.6.* kernels.|
|Sqaoed||Qaoed has now been ported to Solaris 10. The Fubra people did the port and call it Sqaoed. Sqaoed does not currently come with a configuration file; it is operated via the command line.|
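As a concrete sketch of the server side, vblade takes a shelf number, a slot number, an interface and a block device; exporting two local disks would look roughly like this (requires root; device and interface names are examples):

```shell
# vblade <shelf> <slot> <interface> <device>
# vbladed is the daemonizing wrapper shipped with the vblade package.
vbladed 0 0 eth0 /dev/hdb   # clients see this as /dev/etherd/e0.0
vbladed 1 0 eth0 /dev/hdc   # clients see this as /dev/etherd/e1.0
```

The shelf.slot pair chosen here determines the device name that appears on every initiator in the broadcast domain.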
8.2 Initiators (AoE)
This section describes the available AoE initiators. Client-side AoE drivers are available as FOSS for Linux, Solaris, FreeBSD as well as for Windows. A driver is also available for Mac OS X, but it is a paid product. The following diagram shows this more clearly:
The brief description of available AoE initiators is as follows:
|AoE driver||The AoE driver is a block driver which allows the Linux kernel to use the AoE network protocol, so a Linux system can use AoE block devices like EtherDrive (R) storage blades. The block devices appear as local device nodes (e.g. /dev/etherd/e0.0). It is freely available for Linux, FreeBSD and Solaris.|
|WinAoE driver||WinAoE is an open source GPLv3 driver for using AoE (ATA over Ethernet) on Microsoft Windows(tm). It can be used for diskless booting of Windows 2000 through Vista 64 from an AoE device (virtual vblade or real Coraid device), or can be used as a general AoE access driver.|
|2ºFrost AoE Driver||2ºFrost AoE Driver provides direct access to shared networked AoE (ATA over Ethernet) storage, transferring raw ethernet packets using the fast, open AoE protocol rather than with the more complex and slower TCP/IP.|
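On the Linux client, the aoe driver plus the aoetools package are enough to find and use exported devices; a typical session is sketched below (requires root; the interface name is an example):

```shell
modprobe aoe          # load the AoE initiator (block driver)
aoe-interfaces eth0   # optionally restrict AoE to the storage NIC
aoe-discover          # broadcast a query for AoE targets
aoe-stat              # list what was found, e.g. e0.0, e0.1, e1.0, e1.1
```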
8.3 SAN diagram (based on AoE protocol)
This section describes the basic architecture of our SAN, which is based on the AoE protocol, an AoE target (vblade) and an AoE initiator (the aoe driver).
In the above diagram there are two servers: server0 and server1. Each of them exports two block devices onto the network, and the client node accesses these block devices as a RAID device. From server0, /dev/hdb is exported as /dev/etherd/e0.0 and /dev/hdc as /dev/etherd/e1.0. Similarly, from server1 we export two block devices: /dev/hdb as /dev/etherd/e0.1 and /dev/hdc as /dev/etherd/e1.1. We now have four block devices on the client side. These are as follows:
• /dev/etherd/e0.0 (from server0)
• /dev/etherd/e0.1 (from server1)
• /dev/etherd/e1.0 (from server0)
• /dev/etherd/e1.1 (from server1)
Now we combine /dev/etherd/e0.0 and /dev/etherd/e0.1 into one RAID device (/dev/md0) with RAID level 1 (mirroring). Similarly, /dev/etherd/e1.0 and /dev/etherd/e1.1 are combined into a RAID device (/dev/md1), also RAID level 1 (mirroring). We then combine these two RAID devices into a single RAID device (/dev/md2) with RAID level 0 (striping). So finally we have a single RAID device, /dev/md2, built from the four exported block devices, on which we can easily make a file system for further use on the client side. We can also do volume management before creating the RAID device.
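The layering described above can be sketched with mdadm on the client (requires root; this assumes the four AoE devices are already visible, and the mkfs and mount targets are examples):

```shell
# Two RAID-1 mirrors, each pairing one device from server0 with its
# twin from server1, so either server can fail without data loss.
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      /dev/etherd/e0.0 /dev/etherd/e0.1
mdadm --create /dev/md1 --level=1 --raid-devices=2 \
      /dev/etherd/e1.0 /dev/etherd/e1.1

# One RAID-0 stripe over the two mirrors (RAID 1+0 overall).
mdadm --create /dev/md2 --level=0 --raid-devices=2 /dev/md0 /dev/md1

mkfs.ext3 /dev/md2     # or a cluster-aware fs such as GFS
mount /dev/md2 /mnt/san
```

Mirroring at the bottom and striping on top means the stripe never sees a failed member as long as one server in each pair survives.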
This section describes SAN challenges and the available open-source solutions to them. The obvious challenges in the storage networking arena are as follows:
• High Availability
• Load Balancing
• High Performance
• Easily Manageable
To achieve these targets we have a quite reliable tool known as RHCS (Red Hat Cluster Suite). The Red Hat Cluster Suite can be used in many configurations to provide high availability, scalability, load balancing, file sharing and high performance.
RHCS has the following components to address the above-mentioned SAN challenges.
|CMAN||The main component of RHCS. It controls cluster membership and takes care of fencing, resource management, distributed lock management and failover domains. It has its own GUI and is also controlled through the cluster.conf file.|
|GFS/GFS2||Global File System (GFS) is a shared-disk file system for Linux computer clusters. GFS and GFS2 are cluster-aware file systems which use the Distributed Lock Manager (DLM) in cluster configurations and the "nolock" lock manager for local file systems.|
|Piranha/LVS||RHCS includes LVS (Linux Virtual Server) with the Piranha management/configuration tool, which is used for load balancing.|
|Conga||Conga is basically a web interface for cluster administration which uses the luci and ricci daemons.|
|DLM/GULM||Lock management is a common cluster-infrastructure service that provides a mechanism for other cluster infrastructure components to synchronize their access to shared resources. DLM is the Distributed Lock Manager, while GULM is a client-server lock manager. DLM runs in each cluster node; lock management is distributed across all nodes in the cluster.|
|Fencing||Fencing isolates devices and nodes that have failed or become corrupted. CMAN controls the fenced daemon.|
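As a hypothetical illustration of how these components are wired together, a minimal /etc/cluster/cluster.conf for a two-node cluster with one failover domain might look like the sketch below (all names are examples; fence_manual is for testing only and should be replaced by a real fence agent in production):

```xml
<?xml version="1.0"?>
<cluster name="sancluster" config_version="1">
  <clusternodes>
    <clusternode name="node1" nodeid="1">
      <fence><method name="1"><device name="manual" nodename="node1"/></method></fence>
    </clusternode>
    <clusternode name="node2" nodeid="2">
      <fence><method name="1"><device name="manual" nodename="node2"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="manual" agent="fence_manual"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="san-domain" ordered="1" restricted="0">
        <failoverdomainnode name="node1" priority="1"/>
        <failoverdomainnode name="node2" priority="2"/>
      </failoverdomain>
    </failoverdomains>
  </rm>
</cluster>
```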
Apart from the RHCS suite, DRBD and Heartbeat are also available as HA solutions for a SAN or any other cluster. Both are FOSS. Brief descriptions of this software follow:
• DRBD: DRBD ( Distributed Replicated Block Device ) is a distributed storage system for the Linux platform. It consists of a kernel module, several userspace management applications and some shell scripts and is normally used on high availability (HA) clusters. DRBD bears similarities to RAID 1, except that it runs over a network.
• HEARTBEAT: Heartbeat is a daemon that provides cluster infrastructure (communication and membership) services to its clients. This allows clients to know about the presence (or disappearance!) of peer processes on other machines and to easily exchange messages with them. Heartbeat comes with a primitive resource manager (haresources); however it is only capable of managing 2 nodes and does not detect resource-level failures.
DRBD is often deployed together with the Heartbeat cluster manager, although it also integrates with other cluster managers. It integrates with virtualization solutions such as Xen, and may be used both within and on top of the Linux LVM stack.
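A hypothetical DRBD resource definition makes the RAID-1-over-network idea concrete; the sketch below replicates one backing partition between two nodes (host names, devices and addresses are examples):

```
# /etc/drbd.conf (DRBD 8.x style)
resource r0 {
  protocol C;               # fully synchronous replication
  on node1 {
    device    /dev/drbd0;
    disk      /dev/sdb1;    # local backing device
    address   192.168.10.1:7788;
    meta-disk internal;
  }
  on node2 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   192.168.10.2:7788;
    meta-disk internal;
  }
}
```

Applications then use /dev/drbd0 on the primary node; every write is mirrored over the network to the peer before being acknowledged.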
9 iSCSI in SAN
So far we have covered all aspects of the SAN, but with a focus on the AoE protocol. iSCSI has its own features and advantages in a SAN. If we need features such as encryption, routability and user-based access in the storage protocol, iSCSI seems the better choice. ATA disks are also not as reliable as their SCSI counterparts. Therefore iSCSI can also be used to build a SAN. The following table briefly covers the building blocks of iSCSI:
|HBA||Host Bus Adapter (AIC-7xxx, QLE4xxx etc.)|
|OS||Linux/Windows (preferably CentOS)|
|iSCSI Targets||Ardis iSCSI target, Intel iSCSI target|
|iSCSI Initiators||Ardis & Intel iSCSI initiators (for Linux), Microsoft iSCSI initiator (for Windows)|
|Routers, Switches, Offload Engines||iSCSI Offload Engine (ISOE), Security Offload Engine (SOE), quality routers and a quality switch|
|HA/Failover||RHCS Suite, DRBD , HEARTBEAT|
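As a sketch of the initiator side on a Linux host, the open-iscsi tools (a common alternative to the initiators listed above) discover and log in to a target roughly as follows (the portal address and IQN are examples; requires root):

```shell
# Ask the portal which targets it advertises.
iscsiadm -m discovery -t sendtargets -p 192.168.10.5:3260

# Log in; the target's LUNs then appear as local SCSI disks
# (e.g. /dev/sdc) that can be partitioned and formatted as usual.
iscsiadm -m node -T iqn.2008-01.com.example:storage.disk1 \
         -p 192.168.10.5:3260 --login
```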