Our story begins with what are usually known as famous last words - "...and
I want it done quick, good and cheap!"
This is the story of a small startup (a very small one) that developed
a proprietary product with very limited resources.
In the proprietary world, it is considered possible to get only 2 of
"quick", "good" and "cheap".
FOSS changes the equation (in my opinion) - let's see how it worked for
us...
...and yet, how it also hurt us, in some ways...
The Product
In a startup, you first need to have a well-defined product description.
In our company, the buzz-wordish definition was "to support virtualizing
services from a data-center to the branch-offices of an organization".
This mostly amounts to exposing the services via a proxy that lets us
support bandwidth reduction (compression), security (which offices get
which service, and to which workstations) and centralized management
(do it all from one station, for the entire organization).
Infrastructural Needs
We needed a rather large infrastructure (communications software,
tweak-able DNS servers, 3rd-party compression libraries, configuration
software, process-control software, testing software, etc.).
Developing all this ourselves meant taking a lot of time.
Using proprietary products meant paying a lot of money. It had already
been decided to build the GUI with Microsoft-based tools, and paying for
an MSDN license gave us a small taste of what that could cost (and we're
not talking about royalties yet).
This is what eventually allowed me to introduce the idea of using
available FOSS software (to be listed shortly).
The Product's Architecture
In order to understand what was needed, let me describe the
architecture of the product:
The Product's Architecture (Cont.)
An appliance (running Linux) is set in the data-center, with access to
the various services of the organization (web servers, mail servers,
DB servers...)
In each branch office, a similar appliance (also running Linux) is
installed.
Software running on the branch-office appliances exports TCP or UDP
services, which are proxied to the appliance in the data-center,
and from there to the actual servers.
The proxy software performs adaptive data-compression (i.e. it adapts
to the data being proxied for each service).
The Product's Architecture (Cont. 2)
To configure the system (add proxied services, set access permissions),
a GUI based on Win2K's MMC is used.
This GUI stores the configuration in an Active Directory server,
and then notifies the data-center appliance to pick up the new configuration.
In addition, the GUI can show events, accounting data, etc.
Each appliance runs (a subset of) several processes - a multi-threaded
communications process, a compression adaptation process, a configuration
control process, an accounting process, and an events-handling process...
How To Select An Infrastructure
After performing a feasibility study to check the algorithmic part
of the system, and after defining the architecture, came the part of
choosing 3rd-party software tools.
The first thing to do was to write a list of the infrastructure software
that would be needed.
Such a list, of course, cannot be written if you have never developed
such a system, and are not aware of the various products that exist
on the market (whether FOSS or proprietary does not matter).
Since I had experience with implementing similar architectures, I was
given the task of writing the list.
The Grocery List
And so I wrote (with the assistance of the boss):
One communications library, to handle everything to do with bits and
sockets...
One threading library, one multi-process library.
One database server and a matching database client library.
One LDAP-aware library (to access the active directory server).
A configuration-files library (to store configuration data locally,
so the appliance could start even if the active directory server
is unreachable).
A DNS server whose source we could easily tweak (to allow exposing
the proxied services under the original server names).
A process-manager (to manage process launching, etc).
A standard general-purpose compression library (to add on top of our
adaptive compression; a minimal sketch follows this list).
(I might have missed a few minor items...)
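To make the compression item concrete, here is a minimal sketch of the
general-purpose layer that sits under the adaptive compression, assuming
the library is zlib (it is not named above, so both zlib and the helper
function are illustrative only):

  // Squeeze a block of bytes before it goes over the WAN link.
  // zlib is an assumption here; compress_block() is an invented helper.
  #include <zlib.h>
  #include <vector>

  bool compress_block(const unsigned char* src, unsigned long len,
                      std::vector<unsigned char>& out)
  {
      uLongf out_len = compressBound(len);       // worst-case output size
      out.resize(out_len);
      int rc = compress2(&out[0], &out_len, src, len,
                         Z_BEST_SPEED);          // favor latency over ratio
      if (rc != Z_OK)
          return false;
      out.resize(out_len);                       // shrink to actual size
      return true;
  }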
The Little ACE That Could
For the communications library there was no doubt - ACE.
ACE (the Adaptive Communication Environment) also happens to provide
OS-portability APIs, portable multi-threading APIs, portable
multi-processing APIs, etc.
For the higher-level communications that did not need too much control
(i.e. did not run across a WAN) - we wanted CORBA, and so TAO (The ACE
ORB) was the natural choice.
Having (good) prior experience with both of them couldn't hurt either ;)
For WAN communications, CORBA was not tweak-able enough (at least not
without a steep learning curve), so we settled for direct use of ACE
for the WAN link (as well as for the communications with workstations
in the offices, and servers in the data-center).
The MySQL Devil
We needed a database mostly to store accounting records - so we did
not need any special features - only that it be light-weight and fast.
Since MySQL has these two qualities, we went for it. It was also
easier to set up than Postgres and came by default with our Linux
distribution.
The database layer was limited to a small part of the software
(the accounting process) - so replacing the database would have been
rather easy, should the need arise.
From the management GUI we accessed the database via ODBC - which meant
the code was database-neutral. Replacing the ODBC driver would suffice
to work with a different database.
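For illustration, here is roughly what the accounting process's database
layer amounts to, sketched with the MySQL C client API; the table, its
columns and the function name are invented, since the real schema is not
described here:

  /* Hypothetical accounting insert via the MySQL C client API.
   * The "accounting" table and its columns are made up for this sketch;
   * real code would also escape or parameterize the values. */
  #include <mysql/mysql.h>
  #include <cstdio>

  int store_accounting_record(MYSQL* db, const char* service,
                              unsigned long bytes_in, unsigned long bytes_out)
  {
      char query[256];
      snprintf(query, sizeof(query),
               "INSERT INTO accounting (service, bytes_in, bytes_out) "
               "VALUES ('%s', %lu, %lu)", service, bytes_in, bytes_out);
      if (mysql_query(db, query) != 0) {
          fprintf(stderr, "accounting insert failed: %s\n", mysql_error(db));
          return -1;
      }
      return 0;
  }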
When To Develop Infrastructure Software In-House
For some parts of the infrastructure, we did not manage to find
satisfactory 3rd-party libraries.
For example, there was no process-manager software that had the
capabilities I deemed important - so it was decided to develop it
in-house.
Configuration libraries were plentiful - but all had some annoying
problems: they depended on too many extra libraries, they supported
reading but not writing, or they were too simplistic (treated everything
as strings, did not support hierarchical data).
Eventually, it was decided to go with XML (not my choice, but I did
not strongly object), and with a standard XML library (Xerces).
On Microsoft Windows we went with Microsoft's own library, since using
Xerces there proved to be a pain.
And When To Mix The Two Approaches?
Since XML parsers are very annoying (DOM, anyone?), we wrote our own
wrapper library on top, which supported a sub-set of XML that made our
lives simpler.
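To give a feel for it, here is a rough sketch of such a wrapper on top of
Xerces-C; the class name, the key-lookup interface and the (lack of)
error handling are simplified inventions, not our actual library:

  // A simplified configuration wrapper over the Xerces-C DOM parser.
  // SimpleConfig and get() are invented for this sketch.
  #include <xercesc/util/PlatformUtils.hpp>
  #include <xercesc/util/XMLString.hpp>
  #include <xercesc/parsers/XercesDOMParser.hpp>
  #include <xercesc/dom/DOM.hpp>
  #include <string>

  using namespace xercesc;

  class SimpleConfig
  {
  public:
      explicit SimpleConfig(const char* path)
      {
          XMLPlatformUtils::Initialize();
          parser_ = new XercesDOMParser;
          parser_->parse(path);          // real code would catch exceptions
      }
      ~SimpleConfig()
      {
          delete parser_;
          XMLPlatformUtils::Terminate();
      }
      // Return the text of the first <key> element, or a default value.
      std::string get(const char* key, const std::string& def = "")
      {
          DOMDocument* doc = parser_->getDocument();
          if (!doc)
              return def;
          XMLCh* tag = XMLString::transcode(key);
          DOMNodeList* nodes = doc->getElementsByTagName(tag);
          XMLString::release(&tag);
          if (nodes->getLength() == 0)
              return def;
          char* text = XMLString::transcode(nodes->item(0)->getTextContent());
          std::string value(text);
          XMLString::release(&text);
          return value;
      }
  private:
      XercesDOMParser* parser_;
  };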
In general - if it takes longer to search for a good software tool
than it would take to develop and maintain it - I prefer to develop it
and gain the extra control over the source and the features.
Sometimes a hybrid approach is best - take an existing software that
does most of what you want, and add what you need. This is how we handled
the DNS server.
The Development Environment
Software was written in C++ - so it was compiled with g++.
Makefiles were written to build the project, and CVS was used to
manage versions.
Doxygen was used for automatic API documentation generation. We hardly
read the documentation (viewing the header files was more comfortable) -
but it forced us to write better documentation.
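For example, a header entry in that spirit (the function below is made
up for illustration, not taken from the product):

  /**
   * Send a block of data to the data-center appliance over the WAN link.
   * (A made-up declaration, shown only to illustrate the Doxygen style.)
   *
   * @param buffer   the bytes to send (already compressed)
   * @param length   number of bytes in buffer
   * @param timeout  how long to wait for the link, in milliseconds
   * @return 0 on success, -1 on failure
   */
  int send_to_datacenter(const char* buffer, size_t length, int timeout);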
As a new initiate of the "automatic testing" cult, I set up the system
very early to run automatic builds every night, including automatic
unit and sub-system tests.
In addition, learning from SourceForge, the CVS server was set up to send
diffs after every check-in - and I made a habit of reading them every
day, and of using the SCSI cable to flog the criminals...
A Word About Software Licenses
As a conscientious user of FOSS, I wanted to make sure that the license
of every piece of software we used actually allowed us to use it in a
proprietary product.
Thus, we limited ourselves to libraries with licenses such as the LGPL
(and then made sure to dynamically link with them), BSD, X11 and similar.
It is easy to be tempted to use some GPL library in a manner
that is not allowed - one should read the licenses carefully and
develop an awareness of their implications (e.g. what constitutes
a "derived work").
In doing this, my best guide was "prior art" - if I knew that some piece
of software had a license that was considered well understood, or that
it was being used by large companies (which employ good lawyers especially
for these purposes) - I could generally trust it.
More on this - in our epilogue...
The Journey
Once all the players are set on the board, someone shouts "Pawn to K-4"
and the game begins.
During this part, we shall see how programmers coming from the proprietary
world adapt to the new environment.
We will also see a few examples where Linux literally rescued us from
ourselves, or from 3rd-party products...
(Automatic) system testing will be mentioned as well.
The first step - learning ACE
Since ACE was our abstraction layer, (almost) everyone had to learn
how to use ACE. ACE has a very good book that serves as an intro,
and from there on - we used the examples and test programs of ACE.
CORBA was a little tougher - so we bought what I knew to be the only
book about CORBA that actually teaches CORBA programming. Since we didn't
need it too often, one (used) copy was bought.
Initially, we were asked to "wrap ACE up, in case we want to replace
it one day". This line of thinking comes mostly from the proprietary
world (we might want to use a different product one day), or from
the portability-frightened world (what if we need to port it to
a platform where this product does not exist? hmm???).
ACE being what it is (a useful portability wrapper with a lot of extra
functionality) - it does not lend itself to wrapping (how do you wrap
a wrapper?).
It took a bit to get this extra-wrapping requirement off our backs...
ACE Communications control - The all-mighty Reactor
One of the nicest parts of ACE is its socket-programming wrapper
module - the ACE Reactor, and the ACE Acceptor-Connector.
Using them, it is extremely easy to write a select-based server that
handles as many clients as we want.
This Reactor allows adding clients and server-listeners quickly - all
you need to do is derive from the proper Acceptor (server) or Connector
(client) class, implement a few methods, and you're set.
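To show what "implement a few methods" means, here is a minimal sketch of
an acceptor-plus-handler pair. ProxyHandler and the port number are
inventions (and the real handler did much more than read a buffer);
the ACE classes and calls are the standard ones:

  // A service handler plus an acceptor registered with the reactor.
  #include "ace/Acceptor.h"
  #include "ace/SOCK_Acceptor.h"
  #include "ace/SOCK_Stream.h"
  #include "ace/Svc_Handler.h"
  #include "ace/Synch.h"
  #include "ace/Reactor.h"
  #include "ace/INET_Addr.h"

  class ProxyHandler : public ACE_Svc_Handler<ACE_SOCK_STREAM, ACE_NULL_SYNCH>
  {
  public:
      // Called by the reactor when data arrives on this connection.
      virtual int handle_input(ACE_HANDLE)
      {
          char buf[4096];
          ssize_t n = this->peer().recv(buf, sizeof(buf));
          if (n <= 0)
              return -1;        // tells the reactor to close this handler
          // ...here the real code would push 'buf' onto the WAN link...
          return 0;
      }
  };

  // The acceptor creates a ProxyHandler for every incoming connection.
  typedef ACE_Acceptor<ProxyHandler, ACE_SOCK_ACCEPTOR> ProxyAcceptor;

  int main()
  {
      ProxyAcceptor acceptor;
      if (acceptor.open(ACE_INET_Addr(8080), ACE_Reactor::instance()) == -1)
          return 1;
      ACE_Reactor::instance()->run_reactor_event_loop();   // serve forever
      return 0;
  }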
Thus, we had a Link class to handle the WAN link (where compression
and connection interleaving are performed).
We had a similar class to bring up listener sockets on the branch-office
appliance, and to connect to the data-center servers (on the data-center
side).
It took about 2 weeks to have software to proxy one service (with no
management or compression capabilities). It took a while more to make this
software configurable via an XML file.
Writing The Process Manager
In a previous project I worked on, we had a team write an
overly-sophisticated process manager.
I took some of its requirements and said "we need this, only we'll
write it in one month with one person, instead of half a year with 3+
programmers". How come? Because of lessons learned, and because we would
only do the really necessary parts.
What a process manager basically does is:
Load a list of processes in a defined order, with configurable
parameters and configurable environment (working directory,
environment variables).
Monitor these processes (both on the OS level and on the
application level).
Perform a configurable recovery operation in case a process crashes.
Allow shutting the system down in a convenient manner.
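A bare-bones sketch of that core loop, assuming plain fork()/exec() and a
blocking waitpid() instead of the SIGCHLD handling and application-level
monitoring the real process manager had; the process names are invented:

  // Launch configured children and restart whatever dies.
  #include <sys/types.h>
  #include <sys/wait.h>
  #include <unistd.h>
  #include <cstdio>
  #include <map>
  #include <string>
  #include <vector>

  // Launch one child process and return its pid.
  static pid_t launch(const std::vector<std::string>& argv)
  {
      pid_t pid = fork();
      if (pid == 0) {                              // child
          std::vector<char*> args;
          for (size_t i = 0; i < argv.size(); ++i)
              args.push_back(const_cast<char*>(argv[i].c_str()));
          args.push_back(NULL);
          execvp(args[0], &args[0]);
          _exit(127);                              // exec failed
      }
      return pid;                                  // parent
  }

  int main()
  {
      // In the real system this list came from the configuration files.
      std::vector< std::vector<std::string> > processes;
      processes.push_back(std::vector<std::string>(1, "./comm_process"));
      processes.push_back(std::vector<std::string>(1, "./accounting_process"));

      std::map<pid_t, size_t> running;             // pid -> which process
      for (size_t i = 0; i < processes.size(); ++i)
          running[launch(processes[i])] = i;

      // Reap children; restart anything that dies (the "recovery operation").
      for (;;) {
          int status = 0;
          pid_t dead = waitpid(-1, &status, 0);
          if (dead <= 0)
              continue;
          size_t which = running[dead];
          running.erase(dead);
          fprintf(stderr, "process %lu died, restarting\n",
                  (unsigned long)which);
          running[launch(processes[which])] = which;
      }
  }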
Writing The Process Manager (Cont.)
Since the process manager is the parent of everything else (so that it
can catch SIGCHLD signals), it had better be very stable - if it crashes,
it brings the entire system down with it.
Since we had a limited amount of machines, it must be possible to run
two instances of the system on the same machine easily.
Some people will say "use a bunch of shell scripts" - I find that a crappy
solution: hard to maintain, and slow to respond when problems occur
(ever tried writing a signal handler in a shell script?).
Important note: the reason to split the system into several processes
was in order to increase stability - a bug that causes the compression
analyzer to crash shouldn't bring the communications code down.
The Situation With FOSS Library Documentation
Many FOSS libraries suffer from poor documentation - things keep changing,
and none of the programmers has the patience to update the documentation
and document the new interfaces.
You often see the following in a reference manual:
connect(Addr address, int timeout, Connector& c) - connect to
a server.
After reading such a thing I say "ah. now _everything_ is clear. NOT!".
(Good) proprietary products tend to have much better documentation. On
the other hand, they tend not to come with the source ;)
The Situation With FOSS Library Documentation (Cont.)
So with FOSS libraries, everything is a little harder (since you need
to read the source in order to understand the nuances of an interface).
On the other hand, with FOSS, when you have a problem, you can almost
always find the answer in the source. With closed-source software - you
are stuck with the support.
Sometimes the support of a proprietary product is really good - and
sometimes they need your source in order to help you - and you spend
hours and days in order to come up with a stand-alone demonstration of
the problem you're having.
Personally, I prefer poor documentation with the source code over good
documentation with support that has to come from across the sea (i.e. at
least one working day to get the request for extra details, and another
working day to get the answer).
The Local Source Oracle
Since the other programmers were not used to working with FOSS tools,
I served as the local oracle.
When one of the programmers ran into a problem with one of our FOSS tools,
I would start reading the source code and supply the answer.
This could take anywhere between 5 minutes to 2 days.
This means that overall, the support level they got was quite a bit better
than the support they used to get from companies with "good support".
One of them kept whining about this situation - apparently, being
dependent on people you know is harder than being dependent on people you
have only talked to over the phone (or via e-mail).
It is the same reason programmers never trust the software their own
company creates - it's easier to trust a life-support system when you
are not aware of the bugs that QA found and that were never fixed...
"Don't Install A DNS Server In My Network!"
In our first test site, the network administrators immediately said "Don't
install a DNS server in my network!"
They were probably burnt by DNS servers in the past - it was
non-negotiable.
If we cannot redirect clients via DNS - we can redirect them via iptables
transparent-proxy rules...
...provided that our appliance is "in the middle" (connected as a router),
or the network administrators can add a transparent proxy rule on their
current router.
A little more coding and a few GUI changes, and we could select between
proxying via DNS and proxying via iptables.
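The kind of rule we mean, with placeholder addresses and ports (not our
actual configuration):

  # On the appliance acting as a router: redirect web traffic headed for
  # a data-center server to the local proxy port.
  iptables -t nat -A PREROUTING -p tcp -d 10.0.0.5 --dport 80 \
           -j REDIRECT --to-ports 3128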
Testing On A WAN Without A WAN
Writing for a WAN is different from writing for a LAN. The TCP stack is
the same - the communications conditions are not.
NIST Net - a free-software patch to the Linux kernel - adds WAN simulation
capabilities (delaying packets, reducing bandwidth and adding random
packet drops).
It had very poor documentation and some annoying bugs - but for a tool
that valuable, we learned how to work around them.
Once it is set up and you've learned how to use it - changing the
configuration is very simple.
It required a dedicated host, but a single host can simulate several
WANs, so there was no need for an extra host per developer.
Automatic System Tests
When all you have is 4 developers and 0.5 testers - you need automatic
testing or else you die.
Our initial focus was on web applications. Users work with Internet
Explorer - so the tests had to use Internet Explorer, and hence run on
Windows.
Internet Explorer exposes a COM interface - which can be driven from
Perl.
The tester (who was also half a programmer) wrote a script to
automatically surf the required sites in a pattern we thought would
represent a "real user".
Several weeks of work, and he could run 3-4 Explorer instances on a
Windows machine (any more and the machine would crash).
There was something about connecting to a Windows telnet server in this
process - but I forget why ;)
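For flavor, the kind of Perl/COM glue this involves - a sketch assuming
Win32::OLE driving the InternetExplorer.Application object, not the
actual test script; the URLs and timing are placeholders:

  # Drive Internet Explorer through its COM interface with Win32::OLE.
  use strict;
  use Win32::OLE;

  my $ie = Win32::OLE->new('InternetExplorer.Application')
      or die "cannot start Internet Explorer\n";
  $ie->{Visible} = 1;

  for my $url ('http://intranet/app1', 'http://intranet/app2') {
      $ie->Navigate($url);
      sleep 1 while $ie->{Busy};   # crude wait for the page to load
      sleep 5;                     # "think time" of our pretend user
  }
  $ie->Quit();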
Epilogue - When a FOSS-Using Company Is Acquired By A Conglomerate
The company was eventually closed, and then sold to a very large
international software company.
Most of the due diligence had to do with... software licenses.
It appears that the lawyers of said company wanted to be absolutely sure
that any little thing used by the software had a license that is
acceptable for a proprietary product.
Apparently, that company was quite afraid of FOSS, and had a hard
time absorbing the idea.
On the other hand - it might have been an excuse to delay the purchase
until the next fiscal year.
Eventually, since the project was written using such "alien technologies",
the team has gained (at least so far) more freedom in its development
process than other teams inside that company.