SAN - The (not so) Little Network That Could...
- A SAN is a creature combining 2 hot buzzwords (Storage, Networks) using
a pointless word (Area).
- A SAN is a way to acknowledge that "it is no wonder the IT managers go
nuts - the disk space keeps filling!"
- In short - a SAN puts the hosts in one place, the disks in another place,
and lets everyone fight over a network.
- Linux is quite an infant with regard to its SAN support - here is one
area where its older brothers and sisters fare better...
- ...But with recent kernels, this is changing.
What's In A SAN?
- A SAN is a network, much like the Internet is a network.
- Except that SANs were meant for LANs.
- And SANs allow hosts to talk to... disks (or more likely, disk arrays,
RAIDs, JBODs (Just a Bunch Of Disks) and similar beasts).
- Mind you - a RAID can be a small thing costing 10K$, or a large beast
costing several 100K$.
- Most SANs employ the Fiber-Channel protocol, over fiber-optic links
(1Gbps, 2Gbps, lately also 4Gbps)...
- ...though someone from IBM will tell you that iSCSI is the wave of
the future...
- ...and the real weirdos will say that Infiniband is king!
Why Use SANs?
- In the past - each server had its own internal disks - hard to replace
them or add more.
- Disks were moved outside (up to 7 disks per SCSI controller) - requires
down-time for adding disks, many small disks for many servers cost a lot.
- Disks were moved to NAS servers (that expose CIFS or NFS shares) - not
enough control by the applications (e.g. what if I want raw device access?
How does it scale to many servers and lots of TB of data? Also, Ethernet
performance is limiting).
- Disks were moved to RAIDs that expose block-devices directly, across
a fiber-channel network.
A Little About SCSI
- SCSI defines a BUS-based protocol - you connect several devices to a
BUS, give a unique ID to each device, and allow them to communicate.
- On the server (host) you install an HBA (Host-BUS Adapter) to communicate
with the SCSI BUS.
- A SCSI address is composed of 4 numbers:
- Host number (the number of the SCSI HBA on the host)
- Port (or channel) - in case an HBA has more than one SCSI port
- Target ID (identifies a storage controller)
- LUN (Logical Unit Number) - identifies a single SCSI disk/tape/whatever
in a given target.
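The four-part address above can be modeled as a small record. This is only a sketch: the colon-separated "host:channel:target:lun" notation is an assumption borrowed from the way Linux tools usually print such addresses, and the class name ScsiAddress is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScsiAddress:
    host: int     # number of the SCSI HBA on this host
    channel: int  # port (channel) on that HBA
    target: int   # target ID of the storage controller
    lun: int      # logical unit inside the target

    @classmethod
    def parse(cls, text: str) -> "ScsiAddress":
        """Parse a 'host:channel:target:lun' string such as '2:0:1:5'."""
        parts = text.strip().split(":")
        if len(parts) != 4:
            raise ValueError(f"expected 4 colon-separated numbers, got {text!r}")
        host, channel, target, lun = (int(p) for p in parts)
        return cls(host, channel, target, lun)

    def __str__(self) -> str:
        return f"{self.host}:{self.channel}:{self.target}:{self.lun}"


addr = ScsiAddress.parse("2:0:1:5")
print("target", addr.target, "lun", addr.lun)
```

Two LUNs of the same RAID controller seen through the same HBA port would differ only in the last field.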
The SCSI Architecture
- In the SCSI architecture, there are initiators and targets.
- An initiator is like a client.
- A target is like a server.
- Initiators send SCSI commands to targets, and receive responses.
SCSI Commands
- SCSI commands pass data in one direction only. If you want to send some
data and then receive some data, you need to use two separate SCSI
commands.
- Some common SCSI commands: READ, WRITE, Send Diagnostics, Receive
Diagnostics, Inquiry, Test Unit Ready.
- Each command returns a status code. A special code named 'check condition'
means there is extra 'sense data' to be received, which contains
an explanation for the cause of the problem.
- In order to allow a pipeline of commands, it is possible to attach a tag
to a command. The tag is a single byte, so at most 255 commands may be
pending.
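To make the status and sense-data bullets concrete, here is a minimal sketch of how an initiator might interpret them. In the SCSI standards the status that announces pending sense data is called CHECK CONDITION; the decoder below handles only fixed-format sense data, and the function name and the sample bytes are made up for illustration.

```python
from enum import IntEnum


class ScsiStatus(IntEnum):
    GOOD = 0x00
    CHECK_CONDITION = 0x02       # sense data is available
    BUSY = 0x08
    RESERVATION_CONFLICT = 0x18
    TASK_SET_FULL = 0x28         # target cannot queue another command


def decode_fixed_sense(sense: bytes) -> dict:
    """Pull the main fields out of fixed-format sense data (0x70 / 0x71)."""
    if len(sense) < 14:
        raise ValueError("fixed-format sense data needs at least 14 bytes")
    response_code = sense[0] & 0x7F
    if response_code not in (0x70, 0x71):
        raise ValueError(f"not fixed-format sense data: {response_code:#x}")
    return {
        "sense_key": sense[2] & 0x0F,  # broad error class
        "asc": sense[12],              # additional sense code
        "ascq": sense[13],             # additional sense code qualifier
    }


# Hand-built sample: sense key 0x02 (NOT READY), ASC/ASCQ 0x3A/0x00.
sample_sense = bytes([0x70, 0, 0x02, 0, 0, 0, 0, 10,
                      0, 0, 0, 0, 0x3A, 0x00, 0, 0, 0, 0])

status = ScsiStatus(0x02)
if status is ScsiStatus.CHECK_CONDITION:
    print(decode_fixed_sense(sample_sense))
```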
An Example - The SCSI Inquiry Command
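As a rough illustration of this command, the sketch below builds a 6-byte INQUIRY command block and decodes the first 36 bytes of a standard INQUIRY response. The helper names are hypothetical, the response is hand-built (vendor "ACME" is invented), and actually sending the command to a device (e.g. through the sg driver) is left out.

```python
import struct

INQUIRY_OPCODE = 0x12


def build_inquiry_cdb(alloc_len: int = 36) -> bytes:
    """Build a 6-byte INQUIRY CDB asking for standard inquiry data."""
    if not 0 <= alloc_len <= 0xFFFF:
        raise ValueError("allocation length must fit in 16 bits")
    # opcode, EVPD=0, page code=0, allocation length (big-endian), control
    return struct.pack(">BBBHB", INQUIRY_OPCODE, 0, 0, alloc_len, 0)


def parse_standard_inquiry(data: bytes) -> dict:
    """Decode the fixed part of a standard INQUIRY response."""
    if len(data) < 36:
        raise ValueError("standard inquiry data is at least 36 bytes")
    return {
        "peripheral_qualifier": data[0] >> 5,
        "device_type": data[0] & 0x1F,   # 0x00 = disk, 0x01 = tape, 0x05 = CD/DVD
        "vendor": data[8:16].decode("ascii", "replace").strip(),
        "product": data[16:32].decode("ascii", "replace").strip(),
        "revision": data[32:36].decode("ascii", "replace").strip(),
    }


# Hand-built response standing in for what a disk might return.
sample = (bytes([0x00, 0x00, 0x05, 0x02, 31, 0, 0, 0])
          + b"ACME".ljust(8) + b"Virtual Disk".ljust(16) + b"1.0".ljust(4))

print(build_inquiry_cdb().hex())
print(parse_standard_inquiry(sample))
```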
A Little About Fiber-Channel
- Fiber-Channel (FC) is like Ethernet
- Though built with multi-pathing in mind (see below)
- and runs SCSI on top of it, for good measure (and backward compatibility
with existing software and operating systems).
Fiber-Channel Terminology
- WWN (World-Wide Name) - the address of a controller (HBA on a host,
controller on a RAID) connected to a Fiber-Channel network.
- LUN Masking - The operation of defining which WWNs may access a given LUN
on a RAID attached to a Fiber-Channel network.
- Zones - Similar to VLANs in Ethernet - allow splitting the Fiber-Channel
network into separate logical networks, where traffic cannot flow between
different zones. Zones may span multiple Fiber-Channel switches.
- Name Server - a service on a Fiber-Channel switch, that collects a list
of WWNs connected to it, and allows a host to see who is connected to the
Fiber-Channel network (actually - the zones that contain the client sending
the query).
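Zoning and LUN masking act as two independent filters: the switch decides who can talk to whom, and the RAID decides which initiator may use which LUN. The sketch below models that with plain dictionaries; all WWNs, zone names and the can_access helper are invented for illustration and do not correspond to any real switch or RAID interface.

```python
# Made-up WWNs for two hosts, one unzoned host and one RAID controller.
HOST_A = "10:00:00:00:c9:00:00:0a"
HOST_B = "10:00:00:00:c9:00:00:0b"
HOST_C = "10:00:00:00:c9:00:00:0c"
RAID_1 = "50:06:01:60:00:00:00:01"

# Zone name -> set of member WWNs (configured on the switches).
zones = {
    "zone_db": {HOST_A, RAID_1},
    "zone_web": {HOST_B, RAID_1},
}

# (target WWN, LUN) -> initiator WWNs allowed to use it (configured on the RAID).
lun_masks = {
    (RAID_1, 0): {HOST_A},
    (RAID_1, 1): {HOST_B, HOST_C},
}


def can_access(initiator: str, target: str, lun: int) -> bool:
    """True only if both zoning and LUN masking let the initiator through."""
    same_zone = any(initiator in members and target in members
                    for members in zones.values())
    allowed = lun_masks.get((target, lun), set())
    return same_zone and initiator in allowed


print(can_access(HOST_A, RAID_1, 0))  # zoned together and masked in
print(can_access(HOST_A, RAID_1, 1))  # zoned together, but masked out
print(can_access(HOST_C, RAID_1, 1))  # masked in, but shares no zone
```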
More Fiber-Channel Terminology
- PLOGI (Port Login) - the first operation an initiator performs in order
to talk to a target is to log in to the target.
- Multi-Pathing - the ability to connect several different paths between
two entities, to use for load-balancing and/or for high-availability.
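One simple way to get both load-balancing and high-availability out of several paths is round-robin selection that skips failed paths. The class below is a toy model of that idea, not how any particular multi-path driver works; the path labels are invented.

```python
class MultipathDevice:
    """Round-robin over the healthy paths to one device."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.failed = set()
        self._next = 0

    def fail(self, path):
        self.failed.add(path)

    def restore(self, path):
        self.failed.discard(path)

    def pick_path(self):
        # Try each path at most once, starting where we stopped last time.
        for _ in range(len(self.paths)):
            path = self.paths[self._next]
            self._next = (self._next + 1) % len(self.paths)
            if path not in self.failed:
                return path
        raise IOError("all paths are down")


dev = MultipathDevice(["hba0->ctrlA", "hba1->ctrlB"])
print([dev.pick_path() for _ in range(3)])  # alternates between both paths
dev.fail("hba0->ctrlA")
print([dev.pick_path() for _ in range(2)])  # only the surviving path is used
```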
About RAIDs
- A RAID (Redundant Array of Inexpensive Disks) is a very expensive (10K$
for cheapo things and up to 500K$ and more for large high-end) container
for disks.
- RAIDs support redundancy - i.e. the same data is stored on more than one
physical disk, so a single disk failure does not stop I/O.
- RAIDs normally have multiple controllers, for high-availability and
for load-balancing. A giant RAID might have 20 such controllers...
- RAIDs allow defining virtual disks, each exposed as a separate LUN,
and support LUN masking.
- Finally, mid-range and high-end RAIDs support applications such as
taking snapshots of LUNs, remote mirroring, etc.
SAN Virtualization
- As SANs grow, it is hard to manage them - knowing which host uses
which LUNs. It is also hard to perform backups in today's environment,
where you can't afford down-time for backup purposes (and you can't
back up an active file-system).
- RAIDs were the first to virtualize things - virtual disks, snapshots,
mirroring...
- Reliance on the applications in the high-end RAIDs results in vendor
lock-in - which means high prices.
- For starters, LVM (Logical Volume Management) in the hosts allows
performing snapshots in the hosts rather than in the RAID - but then
you cannot access these snapshots via another host.
- There is a growing tendency to move all this virtualization to the network
level - where you allow sharing snapshots and virtual volumes between
hosts, have centralized management, and can mix RAIDs from different
vendors (and thus reduce RAID prices).
Linux, SCSI and SANs
- Linux has supported SCSI for a very long time - drivers for various SCSI
controllers have been in the kernel for years.
- Linux supports Fiber-Channel HBAs from various vendors.
- There are many non-SCSI devices that use the SCSI protocol on top of
another protocol (e.g. USB) - Linux supports many of those too.
The Linux SCSI Sub-System
The Linux SCSI sub-system is split into 3 layers:
- SCSI low-layer - contains the HBA drivers for the different vendors.
- SCSI mid-layer - a single driver that allows HBA drivers and upper-layer
drivers to find each other, and supports operations common to all
SCSI devices.
- SCSI upper-layer - contains drivers for different device types:
- sd - a block driver for SCSI disks.
- st - a character driver for SCSI tapes.
- sr - a block driver for SCSI CDs/DVDs.
- sg - a generic character driver - used to bypass the above drivers,
or to talk to other types of devices, such as scanners.
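The split between the upper-layer drivers can be pictured as a lookup on the device type a LUN reports. The table below is a deliberate simplification (the real kernel matches more device types per driver), and the function name is made up; it only shows that a specific driver handles disks, tapes and CDs/DVDs, while sg is available for everything.

```python
# Peripheral device type (as reported by the device) -> specific upper-layer driver.
UPPER_LAYER_DRIVERS = {
    0x00: "sd",  # direct-access device (disk)
    0x01: "st",  # sequential-access device (tape)
    0x05: "sr",  # CD/DVD device
}


def drivers_for(device_type: int) -> list:
    """Return the upper-layer drivers that would attach to a device type."""
    specific = UPPER_LAYER_DRIVERS.get(device_type)
    # sg is generic, so it is listed for every device.
    return ([specific] if specific else []) + ["sg"]


print(drivers_for(0x00))  # a disk
print(drivers_for(0x06))  # a scanner: only the generic driver
```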
SCSI Drivers Loading Order
- The mid-layer driver must be loaded before the others.
- When an HBA driver is loaded, it causes the mid-layer to perform
a SCSI scan, in order to find any LUNs accessible to it, and attach
upper-layer devices to them.
- This means that the order of HBA drivers loading is crucial to device
naming: if we have HBA 1 that sees disk A, and HBA 2 that sees disk B,
loading the driver for HBA 1 first will mean that disk A will be
seen via /dev/sda, while disk B will be seen via /dev/sdb.
- In a SAN this is even more problematic - the order of discovery of
different RAIDs will change the target ID (and device file attaching)
for them. Did anyone say "data corruption"?
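The naming problem can be reproduced with a tiny simulation: device files are handed out in discovery order, so swapping the driver loading order swaps the names. The HBA and disk labels below are the ones from the example above; the helper functions are hypothetical.

```python
import string


def sd_name(index: int) -> str:
    """Map a discovery index to a device file: 0 -> sda, 25 -> sdz, 26 -> sdaa."""
    letters = ""
    index += 1
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = string.ascii_lowercase[rem] + letters
    return "/dev/sd" + letters


def assign_names(load_order, disks_behind):
    """Give each disk the next free name, in the order its HBA driver loads."""
    names = {}
    for hba in load_order:
        for disk in disks_behind[hba]:
            names[disk] = sd_name(len(names))
    return names


disks_behind = {"HBA 1": ["disk A"], "HBA 2": ["disk B"]}
print(assign_names(["HBA 1", "HBA 2"], disks_behind))
print(assign_names(["HBA 2", "HBA 1"], disks_behind))
```

In the second call disk B ends up as /dev/sda, which is exactly the situation where a script or fstab entry that hard-codes a device file touches the wrong disk.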
Looking At SCSI "things" - /proc/scsi
Looking At SCSI "things" - /proc/sg/
Looking At SCSI "things" - /sys/class/scsi/
Linux HBA Drivers
- First, we have the old SCSI HBA drivers for pure SCSI controllers,
such as aic79xx.o for various Adaptec HBAs, advansys.o for ConnectCom
Solutions, Inc. HBAs, etc.
- Then we have drivers for SCSI-over-something - SCSI over IDE (ide-scsi.o),
SCSI over USB (usb-storage.o for USB storage devices).
- Fiber-Channel HBAs - In the last 2-3 years, the 2 major vendors of
Fiber-Channel HBAs - QLogic and Emulex - open-sourced their drivers,
and they are now a part of the standard kernel (qlaXXXX.o for QLogic,
lpfcXXXX.o for Emulex)...
- ...although you still often need to take the latest drivers from the
vendor's site, to overcome various problems.
Linux And SCSI Hotplug
- Until recently, Linux's SCSI disk layer discovered new devices when the
HBA driver was loaded into memory.
- For normal usage that means only during system boot.
- In a SAN Environment, disks are added without shutting servers down. Thus,
we want to discover new SCSI disks without a reboot.
- In kernel 2.6.12, the SCSI disks layer was given the ability to discover
new disks in a plug&play manner.
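On kernels that expose SCSI hosts through sysfs, a rescan can also be requested by hand, by writing three wildcards (channel, target, LUN) into the host's "scan" file. The sketch below wraps that write in a function; the function name and the sysfs_root parameter are made up, and the demo runs against a temporary directory standing in for sysfs so that it needs neither root nor real hardware.

```python
import tempfile
from pathlib import Path


def rescan_scsi_host(host_number: int,
                     sysfs_root: str = "/sys/class/scsi_host") -> Path:
    """Ask the SCSI layer to rescan one host for new devices."""
    scan_file = Path(sysfs_root) / f"host{host_number}" / "scan"
    # "- - -" means: any channel, any target, any LUN.
    scan_file.write_text("- - -\n")
    return scan_file


# Demo against a fake sysfs tree (a real call needs root privileges).
with tempfile.TemporaryDirectory() as fake_sysfs:
    (Path(fake_sysfs) / "host3").mkdir()
    written = rescan_scsi_host(3, sysfs_root=fake_sysfs)
    print(written.name, repr(written.read_text()))
```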
Linux Hotplug And Fiber-Channel
- The Fiber-Channel protocol allows an HBA to be notified when a new LUN
is seen on the SAN.
- The Fiber-Channel HBA reports LUN events to the SCSI mid-layer as soon as
it sees them - hence, plug&play.
- It also notifies the SCSI mid-layer when LUNs are deleted.
- One drawback - if a RAID gets disconnected from the Fiber-Channel network,
even for a few seconds, a similar event propagates and the device is
immediately removed from the system.
- As a result - such events will cause I/O errors on the device.
Two Words About LVM
- Logical Volume Manager (LVM) is a tool to decouple file-systems (and
raw devices) from the physical disks.
- With LVM, we create a volume group (vg), from one or more disks.
- On top of a vg, we can create logical volumes (lv).
- File-systems get created on top of the logical volumes.
- To add new capacity, we simply add a new device to the vg, and then
either create new logical volumes, or expand existing ones.
- Since the volumes are now not tied to physical devices, we can add
features such as snapshots, mirroring, etc.
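The relationship between devices, volume groups and logical volumes can be sketched as simple capacity bookkeeping. This is a toy model with invented names and sizes in GB; it ignores extents, striping and everything else the real LVM does, and only shows that a logical volume can outgrow any single device once more devices join the vg.

```python
class VolumeGroup:
    """Toy model of a vg: pooled capacity carved into logical volumes."""

    def __init__(self, name):
        self.name = name
        self.devices = {}  # device file -> size in GB
        self.volumes = {}  # logical volume name -> size in GB

    def free_gb(self):
        return sum(self.devices.values()) - sum(self.volumes.values())

    def add_device(self, device, size_gb):
        self.devices[device] = size_gb

    def create_lv(self, lv_name, size_gb):
        if size_gb > self.free_gb():
            raise ValueError("not enough free space in the volume group")
        self.volumes[lv_name] = size_gb

    def extend_lv(self, lv_name, extra_gb):
        if extra_gb > self.free_gb():
            raise ValueError("not enough free space in the volume group")
        self.volumes[lv_name] += extra_gb


vg = VolumeGroup("vg_data")
vg.add_device("/dev/sdb", 100)
vg.create_lv("lv_home", 80)
vg.add_device("/dev/sdc", 100)   # new capacity arrives
vg.extend_lv("lv_home", 50)      # the lv now spans more than one device
print(vg.volumes, "free:", vg.free_gb())
```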
Two Words About LVM And SANs
- LVM also solves the problem of changing device discovery order in a SAN -
each device that belongs to a vg has a signature written onto it.
Thus, when the system boots, it discovers the vg-s using these signatures,
no matter in which order the devices are discovered.
- The names of the device files associated with the logical volumes remain
fixed, making it easy to shuffle around RAIDs in the SAN.
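The effect of the on-disk signatures can be shown with a short sketch: volume groups are assembled from what is written on each device, not from the device file name it happened to get. The signature layout (a vg name plus a per-device ID) and all names below are simplifications invented for illustration.

```python
def assemble_volume_groups(discovered):
    """Group devices by the vg named in their signature.

    discovered: list of (device_file, signature) pairs, where signature is
    (vg_name, device_id) or None for a device without an LVM signature.
    """
    groups = {}
    for device_file, signature in discovered:
        if signature is None:
            continue
        vg_name, device_id = signature
        groups.setdefault(vg_name, {})[device_id] = device_file
    return groups


# The same three disks, discovered in two different orders.
boot_1 = [("/dev/sda", ("vg_data", "id-1")),
          ("/dev/sdb", ("vg_data", "id-2")),
          ("/dev/sdc", None)]
boot_2 = [("/dev/sda", None),
          ("/dev/sdb", ("vg_data", "id-2")),
          ("/dev/sdc", ("vg_data", "id-1"))]

for boot in (boot_1, boot_2):
    groups = assemble_volume_groups(boot)
    print(sorted(groups["vg_data"]), groups["vg_data"])
```

Both boots find vg_data with the same two member IDs, even though the device files behind them differ.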
References
- IBM's SAN redbook
http://www.redbooks.ibm.com/Redbooks.nsf/RedbookAbstracts/sg245470.html?Open
- The Linux SCSI generic driver -
http://sg.torque.net/sg/
- LVM HowTo -
http://www.tldp.org/HOWTO/LVM-HOWTO/index.html
- Linux Hotplugging -
http://linux-hotplug.sourceforge.net/
Originally written by
guy keren