The Problems of the Bazaar
Thoughts on the Growing Pains of Open Source Software
by Perette Barella
Contents
- 1. Introduction
- 2. The Evolution of the Open Source Revolution
- 3. Increasing Complexity
- 4. The Cult in the Bazaar
- 5. Successes
- 6. Solutions
- 7. Conclusion
- 8. Acknowledgements
1. Introduction
In his defining work, The Cathedral and the Bazaar, Eric Raymond established the case for Open Source Software (OSS). The Open Source movement has given us many great and diverse things, including operating systems such as Linux, applications like GIMP, development tools like Eclipse, and libraries such as Hibernate.
Raymond compared the haphazard collaboration and development from across the Internet to a bazaar, a virtual space where the terrain was filled with thousands of hackers contributing their respective knowledge for the overall good. The alternative was the cathedral: vendors hawking their wares, developed with traditional business software development techniques with a centralized power structure that directed projects in terms of business needs and potential profitability.
Raymond woke up the world to the possibility that the Open Source movement could actually work; in fact, it could create outstanding things. However, too much of a good thing has its own problems. To see where things have gone off course, we need to go back to the beginning: Adam Smith's Wealth of Nations.
1.1. Division of Labor
Smith proposed that dividing labor allowed more to get done. On their own, a person couldn't build a nice wooden-timber house because they'd first need to invent the saw to cut the wood and nails to assemble it. Both of these require metalworking skills, which in turn require metalworking tools and a forge.
When the farmer and the blacksmith cooperate, the farmer trades milk and meat for nails and a saw, thus avoiding the need to assemble his own tools and forge. The blacksmith becomes skilled at metalworking, and thus learns the tricks to mass produce nails more efficiently. He, in turn, doesn't need to be concerned with the intricacies of husbandry.
As things grow, labor divides into increasingly granular tasks. In a sufficiently large city, blacksmiths might specialize in particular wares, honing their skills in creation of particular goods with an effect of increasing productivity and quality.
We see this division of labor within the computing industry as well. It was not too long ago, in the ages of PDP-8s, TRS-80s and Commodore-64s, that some of us knew our systems thoroughly, hardware and software alike: every chip, every machine code or assembly mnemonic, every I/O register, every pin-out, every ROM subroutine available (documented and undocumented alike).
As PCs became popular, PC users all knew the basic DOS commands. We knew our non-WYSIWYG word processors cold, because they weren't much more than glorified typewriter emulators. On the big iron, administrators knew every configuration file on their UNIX boxes by heart, in part because there were far fewer of them and those that existed had significantly fewer options. Coders knew every standard library call available in their preferred language by heart, because there weren't that many to learn; they could likely read a couple of other languages, if not code in them equally well.
The division of labor in computing probably started along administrative versus development lines. There were those who kept the machines up, and those who wrote code. Databases were sufficiently complex to get their own set of administrators, separate from the overall system. Still, though, division was limited: a coder knew some system configuration, a system admin could code in a pinch, and a DBA could install an operating system.
Then came the triple whammy: Linus releasing the prototype Linux kernel, Raymond releasing The Cathedral and the Bazaar, and the explosive growth of the Internet. There was growth, and there was variety, and it seemed like it should be good.
Along the way, though, the division of labor went terribly wrong. Instead of dividing things up into categories of work, we created numerous new languages and technologies that have made it difficult to move between even related technologies. Meanwhile, as systems have become increasingly complex, it is increasingly often left up to the software engineer to acquire, compile, and install libraries which are necessary to build ever more complex software.
2. The Evolution of the Open Source Revolution
Raymond postulated that Open Source would be a very natural evolution, with multiple projects and developers offering their solutions, the best ideas making the cut and surviving. This expectation turns out, much of the time, to be false.
In some cases, OSS promotes revolution. Other times, weak projects, which should die off, establish user bases and thus secure their own existence. Such projects use up resources that could otherwise be devoted to moving software forward, but are instead allocated to redundant, competing projects. Furthermore, in some areas where OSS has been overly successful, it has created an overabundance of solutions which has its own problems: as overall complexity increases, the diversity of solutions tends to pigeonhole individuals to a subset of technologies or languages.
2.1. Open Source Requires Evolution, but Sparks Revolution
Without a relations department fearful of complaints, OSS is constantly changing existing things. Cathedrals certainly restrict the rate of change, mainly with the goal of maintaining compatibility. Things do move forward, but at a much slower pace than that given us by OSS. Out of fear of upsetting their customer base, the cathedral's primary method of change is carefully-planned evolution.1
Not so with OSS, where features, file formats, and specifications change at the development team's whim. Revolutionary changes occur: in Java, the unit test suite JUnit 4 wasn't compatible with prior versions; Ruby on Rails 2 broke the basic scaffold; Linux editions have replaced printing subsystems between releases, rendering old configuration files useless, to name a few examples.2
These are obvious challenges, with obvious results of slowing mainstream OSS acceptance. The real problems are much more subtle and hit closer to home.
2.2. Overlapping project fragmentation
"Given enough opinions, all ideas seem good."
(compare to Raymond's "Given enough eyeballs, all bugs are shallow.")
First, there is the question of what is a good idea. With the cathedral, folks sat down and considered alternate ways of implementing technologies looking to long-term needs. It took time to write code, and profit was a goal, so nobody wanted to develop something unless they were certain it was necessary and likely to be used for a good, long time.
The other side of the coin was academia, where computer science departments provided us with numerous tools and languages in the years before OSS. While these may have been interesting experiments, their technical limitations, incomplete or unusual function and lack of user bases meant that only the superb ones established themselves and stayed around as viable projects. Few of these experiments made it anywhere beyond colleges and universities, many not even off their home campuses, eventually disappearing as better solutions took hold.
These days, OSS and sites such as SourceForge and FreshMeat provide a way for people to get ideas out to others quickly. With thousands, maybe millions of people involved, there is invariably someone who will find a project interesting or think it's just what they need to solve a problem. This causes weak projects to crop up and divide resources, stalling progress as the development community is divided.
Why the division?
- Instant availability
Once a project is underway, those that see its potential utility to their needs want it to achieve usefulness. They are not concerned with whether it is the best solution or the optimal reuse of existing technology; they just want it to solve a problem. Waiting for a similar but optimal project is not considered, as there is no guarantee one will exist or, if it does, how long it will take to develop.
- Status
One reward of OSS is to be a respected, high-ranking member of the development team. However, not everyone can be at the top if the development team is large. In the quest for status, projects are forked or new projects created to generate more leadership roles, resulting in labor pool division and creating competition among similar projects, reducing individual team productivity.
- Progressive differentiation
Without the cathedral's paychecks as incentive for people to get along, those who disagree with the direction of a project can simply fork the code base or start their own from scratch. This is more likely to occur on a small project that does not have a critical mass of support, but can happen on large projects: for example, the GCC/EGCS split circa 1997. And, as the EGCS example demonstrates, splitting the code base is not inherently negative: in 1999, the groups reunited as GCC using the EGCS code base. In fact, the death of projects should be an important part of open source development, but occurs all too rarely; this leads me to the next point.
2.3. Evolution Requires Vetting
In nature, weak species die off and become extinct, but in the software world the vetting process happens less reliably.
Cheap disk space means SourceForge and FreshMeat retain old projects. Stub projects that never got very far before being abandoned remain on file, their disk utilization so small that it is not worth trashing them and risking the loss of something useful. Nevertheless, they remain present in searches, cluttering up the information space, the bad starts helping to hide the good potentials.
Next, there is the problem of legacy. We'd all like the old COBOL code from the 1960s to go away, but it doesn't and so we continue to be stuck with COBOL. Even if better ways of performing tasks have been invented, those who are using the old technology, and for whom it meets a need, want it to remain around. Their familiarity with the technology and their existing investment in it means that, should they have need of improvements, they will search for a way to adapt or upgrade their existing technology which is already in place and working.
The problem is not limited to legacy languages either. There's been a flurry of new languages developed over the last decade, which is not a problem if we think of Fred Brooks's comment in The Mythical Man-Month that we should "Plan to throw one away; you will, anyhow."3 The problem happens when user bases are established so quickly that nothing is ever thrown away.
2.4. Too many choices
The development of new languages provides a clear example of the problem. Since OSS and the Web became popular, several new open source languages have become available: Perl, PHP, Python, Java, ActionScript, JavaScript, Ruby, Groovy. These were added to our existing languages: C, Fortran, and Pascal (deprecated). In addition, there are new derivatives such as Visual BASIC, C++, and C#, as well as commercially developed languages such as Flash and Shockwave. There are advocates of each language, and each has its advantages and disadvantages. It's hard to say that any one of them, in and of itself, is bad.
But ask yourself: Are these languages really so different from each other? C# and Java have similar purposes. There are tools to automatically convert between Visual BASIC and Visual C++. Perl, Python, and Ruby, while syntactically different, fill the same niches. And JavaScript could certainly have been done as BrowserPerl or BrowserPython, eliminating the need for yet another language.
Yes, each language has certain special uses. Some of the newer languages are cleaner, or some people prefer the syntax. The libraries are more extensive or cleaner for certain applications. Still, was there something preventing us from evolving existing languages to better adapt to new technologies? And, if the new languages are so great, shouldn't we deprecate the old ones?
The lack of a vetting process and the problem of legacy ensure that once a language reaches a certain presence, we're stuck with it. The more complex our projects are, the more complex our tools must be; thus, all languages and their libraries must continue to evolve and become more capable. Therefore, advocates of every language must be involved in the effort to evolve their respective favorites, implying that every language we invent further divides the available development effort. A certain level of fragmenting is tolerable, but increasing fragmentation prevents or stalls OSS productivity.
2.5. The problem is not academic
On a project involving the Asterisk PBX, I searched for a library to interface with the AMI client/server protocol. Because of the variety of languages, there were several available, including clients for C, Python, Ruby, and Java.
With fewer languages, there might have been fewer choices. Yet, with fewer choices, one of them might have been acceptable (sufficiently developed) for my needs. Instead, I had choices of a C library that didn't fully implement the requests I needed, a Python version that required learning Twisted (more about inbreeding of projects later), a Ruby one that wasn't done, and a Java one that was undergoing upgrade to the latest Asterisk version but wasn't ready yet.
The project's requirements suggested Perl, but were not firm. After looking through my options, I did choose Perl and I ended up coding my own library, from scratch, because none of the existing four met my needs. Which means I introduced a fifth.4
2.6. N -> N-square
This problem is compounded when we look outside of languages: there is a similar problem with a growing number of databases. Considered together, the continual development of new database systems and languages means we have an N-squared problem, because every language needs a database interface written for each database. ODBC could help this (reduce it to a 2N problem), but due to inefficiencies in using a generic interface, as well as nuances in individual databases of which proponents like to take advantage, the proliferation of individual language/database interfaces will likely continue.
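The counting argument above can be made concrete with a minimal sketch. The language and database lists here are hypothetical, chosen only for illustration, and "ODBC" simply stands in for any shared generic interface; the point is the difference between one interface per (language, database) pair and one bridge implementation per language plus one per database.

```python
# Hypothetical lists, chosen only to illustrate the counting argument.
languages = ["C", "Perl", "Python", "Ruby", "Java"]
databases = ["MySQL", "PostgreSQL", "SQLite", "Oracle", "DB2"]


def direct_interfaces(languages, databases):
    """One dedicated interface for every (language, database) pair: N * N."""
    return [(lang, db) for lang in languages for db in databases]


def generic_interfaces(languages, databases, bridge="ODBC"):
    """Each language and each database implements the shared bridge once: 2N."""
    return [(lang, bridge) for lang in languages] + [(bridge, db) for db in databases]


print(len(direct_interfaces(languages, databases)))   # 25
print(len(generic_interfaces(languages, databases)))  # 10
```

With five of each, the direct approach needs 25 interfaces and the bridged approach needs 10. Adding a sixth language costs five more direct interfaces, but only one more bridge binding.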
There's also the problem of the growing complexity of the libraries. Again, going back to the C, Fortran and Pascal days, a book could cover the entire language and standard library in both teaching and reference format. The libraries were different, but in their simplicity it was easy to learn both in full. As the complexity of our ventures has increased, we've built larger libraries to support us, and each language has slight variations of the basic library. There are multiple libraries for given languages, each tuned for specific tasks, with many sufficiently complex in their own right to deserve books entirely of their own: Hibernate5, iBATIS (a competing ORM), Java foundation classes (AWT and Swing), JUnit, xUnit, Rails, Grails (Groovy's version of Rails), EJB, Faces, Lucene, jQuery, iText, and the frameworks Struts and Flex, to name a few.
As each library has different ways of doing things, or at least different names for the same things, it becomes increasingly difficult to adapt to a new language. The string classes in Java and C#, for example, are subtly different despite overall similar functionality. These difficulties encumber developers, isolating them to their languages— the beginning of technological isolation, a result of nichification. This promotes redundant development efforts in different languages either because of unawareness of similar competing projects in other languages, or individual inability to help with a competing project due to technological isolation (and thus starting another one). This leads back to overlapping project fragmentation.
2.7. Reinventing the Wheel
Some open source projects are reinventing things. Ant, for example, is a replacement for the old UNIX make. Part of the reasoning seems legitimate: make has different variations (omake, nmake, gmake, mk) with subtle syntax differences, with a simple file format. Replacing it with an all new, modernized system that uses a standard file format that allows better data expression (XML) is a good solution. The downside to the new format is that it loses uniformity with existing make.
The other reason for Ant is to have a build tool in Java. This has utility, considering it can then be integrated more easily into an IDE such as Eclipse, since Eclipse is written in Java. If build tools are going to be necessary for each different language, though, this begins to look problematic. (See too many choices.)
The cons team, citing make problems and limitations too, decided to build on Perl, and thus their files use another, incompatible Perl-like syntax.
Python coders seem to hate ugliness, especially Perl, and love their whitespace-sensitive indent-level based scheme (no need for those ugly curly-brackets!). So while the available history suggests that the Python-offshoot scons was a move toward improving software build tools, my suspicion is that there was religious motivation in the early days of the offshoot.
The folks on the Ruby development team also thought a new build tool was necessary, so they developed rake, presumably short for "ruby make". It standardizes makefiles for use with Ruby, but introduces the fifth type of build system (and corresponding file format). It solves a problem for the Ruby community at the expense of polluting the language space with yet another variation.
The Ruby philosophy seems to be particularly dangerous: if all tools are written in Ruby, then portability will cease to be a problem. Ergo, reinvent everything in Ruby. Spread to all the other languages, this creates a big problem: numerous tools and packages need to be developed over, and over, and over.6
2.8. Subjecting ourselves to the problem
Another example of reinventing is documentation systems. There was a time when one could reliably find everything in the manual pages. These days, however, documentation is split across many places on a UNIX system: man pages, GNU's info tool, Python's pydoc, Ruby's ri (ruby info), Perl's perldoc, and Java's javadoc, to name a few.
perldoc, to its credit, can convert documentation to nroff's "an" macro format for inclusion with the manual pages. The others, however, create disparate systems, each with separate tools for accessing contents. The lack of a cohesive documentation system7 aggravates transition between languages, and is particularly insidious in that these are the repositories of information that could help people trying to cross between languages, libraries, or other technologies. Thus, a person trying to take on a new project may now be challenged not only by a new language with new syntax and new libraries in which common functions are performed subtly differently, but also by a way of learning about that language and those libraries that may itself be very different.
This, I suspect, is the "straw that breaks the camel's back". While no single documentation system is incredibly complex to use, the lack of integration and information sharing, together with the subtle encumbrance, encourages a functional fixation on a particular language or technology, isolating developers to that language or technology.
3. Increasing Complexity
As time has passed, software has become increasingly complex. Word processors are more than glorified typewriter emulators, spreadsheets are three-dimensional (with multiple sheets), video players support increasing numbers of codecs, web browsers support JavaScript, Java, and plug-ins to display PDFs and run Flash. The increase in complexity is normal, understandable, unavoidable.
Open source needs to respond to this additional complexity, and in some ways it has: for example, most Linux distributions have straightforward installation CDs, and many projects have easy-to-use installers for end users. In other ways, however, open source has introduced new complexity, such as the aforementioned glut of new languages, databases, and libraries. If we step back and look at the problem, we will see a division: using open source software has become easier, whereas building open source software has become harder.
Raymond argues that open source benefits from the sheer number of people available. Although they vary by skill level, being involved helps them gain skill: "Properly cultivated, [users] can become co-developers"8. Microsoft agrees in The Halloween Documents: "a modestly skilled UNIX programmer can grow into doing great things with Linux and many OSS products"9.
If we want to keep open source alive and accessible to the masses, complexity must be manageable for the tinkering neophyte. There must be enough promise to hold their interest, enough incentive to get them to look deeper. Quoting Microsoft again: "I'm a poorly skilled UNIX programmer but it was immediately obvious to me how to incrementally extend the DHCP client code (the feeling was exhilarating and addictive)".10
There are three important tools in the toolbox to overcome complexity: Abstraction/encapsulation, uniformity, and modularity. Abstraction and uniformity provide to humans the advantages that object oriented languages provide to code. Together they allow us to readily use many things we don't understand throughout our daily life. Encapsulation is the act of making something abstract, hiding the dirty details inside an easier-to-understand, candy-coated shell. Modularization is an important method we use to comprehend complex systems, by reducing them (hierarchically if necessary) into smaller, discrete units.
We use these all the time to solve problems— and in fact, they've already been used to make significant headway against the problem I'm posing. Unfortunately, the existing solution is not readily accessible to all, but focused more on the elite, skilled developers available. If we want to redefine the open source rules where our projects are written only by the best, then this is acceptable; but if we want to keep fresh blood coming into the community, gaining skills and joining our ranks, then a broad solution is necessary.
3.1. Compiling and Interdependency issues
Most notably, building things from scratch is difficult. GIMP, for example, currently depends on 3 other packages (GTK, libart, and an XML parser). Two others are recommended, and there are a slew of "optional" libraries that will leave a crippled GIMP if left out. All these packages need to be built and installed prior to GIMP. Installing the binary is straightforward if an installer is available; but if not, or if you're installing the code in hopes of tinkering, you're in for more of a challenge.
The build tool Ant is another example. It may work on its own, but to get certain features it needs additional components: a total of 29 optional components are available. If you're using a library such as Hibernate, even if you do not need any of Ant's optional packages yourself, the library will.
But you're not done yet: once you've downloaded and installed the software, you will need to configure it. In the case of Ant, this means setting Java's elusive CLASSPATH, but Ant's documentation in this regard is confusing and contradictory.11 If using Eclipse, the CLASSPATH must be set via the IDE in several different places: application build, debug build, and again for Ant. You'll also need to keep the CLASSPATH straight for Hibernate, JUnit, or any other libraries being used and those they depend on as well.
All of this is closer to system administration than software development— but it's not really system administration either. What is needed is another division of labor, with a new person or two responsible for managing the build process and configuring interdependent software.12 Either that, or the process needs to be simplified— abstracted, encapsulated— because for the solo geek, it is increasingly too much. It's creating a barrier to entry for both personal and open source development.
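As a rough illustration of the bookkeeping involved, the ordering problem ("build these packages before that one") is a topological sort of the dependency graph. The sketch below uses Python's standard graphlib module (Python 3.9+) and only the three GIMP prerequisites named above; the lowercase package labels are my own, the real dependency graph is larger and deeper, and this is not a description of how any particular package manager works.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each package maps to the set of packages that must be built before it.
# Only the prerequisites named in the text are listed; labels are illustrative.
dependencies = {
    "gimp": {"gtk", "libart", "xml-parser"},
    "gtk": set(),
    "libart": set(),
    "xml-parser": set(),
}


def build_order(deps):
    """Return the packages in an order where prerequisites always come first."""
    return list(TopologicalSorter(deps).static_order())


order = build_order(dependencies)
print(order)  # the three prerequisites in some order, then "gimp"
```

If two packages ever depended on each other, static_order() would raise graphlib.CycleError, which is exactly the kind of problem a solo developer otherwise discovers halfway through a manual build.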
3.2. On uniformity
Uniformity prevents people being overwhelmed by diverse things. Considering the complexity of Linux, for example, the modularity provided by the many packages provides a way to break learning or tinkering into manageable chunks. Uniformity eliminates the need to relearn similar procedures which would otherwise be arbitrarily different.
I'm not the only one that thinks uniformity is important. Again, from the Halloween Documents13:
"A key attribute ... is the common UNIX/gnu/make skill-set that OSS taps into and reinforces. I think the whole process wouldn't work if the barrier to entry were much higher than it is ... Put another way -- it's not too hard for a developer in the OSS space to scratch their itch, because things build very similarly to one another, debug similarly, etc."
Prominent in this comment is GNU autoconf, though not mentioned explicitly. In 1991, Luc Van Eycken recognized that there was getting to be a lot of manual labor in configuring GNU software for a machine before building. He started building automated configuration scripts, which later turned into the GNU AutoConf package. Gone were the days of downloading a package, adjusting the dozen or two settings for whatever system you happened to be on, then compiling. Now, you downloaded, typed ./configure, and ran make. Since most packages were self-contained, this brought compiling a prewritten package from a low but definite level of arcane wizardry, to something a script kiddie could do.
The barrier to entry to playing with open source went down. That meant more people tinkering with it, more people being smitten with interest, more people learning from it, more people contributing to it.
As more packages are introduced from various foundries, the cohesiveness of the GNU build process is being lost. Some packages are straightforward, others not: once dependencies are resolved, some might require invoking Ant, a shell script, or a Java compiler. While GNU packages might remain straightforward, one still has to know when this is appropriate and when it is not. Building prewritten software is returning to a level of arcane wizardry, because it is not a rote procedure. While modularity is retained, uniformity is disappearing. Even when uniformity is maintained, the package interdependencies are turning package installation into a bigger and bigger headache.
The barrier to entry— even the barrier to stay involved— is going up.
3.3. Packaging code
While software is becoming more complex, package interdependencies are rising, variation is increasing in build processes, and the technology we're working on is changing faster, our efforts are simultaneously being spread thinner and thinner, forcing us to take ever more time and effort to install and update the technology we're using.
On the whole, our present packaging systems do not help much on this front. The most recent code is at SourceForge, FreshMeat, and individual web sites. Staying up-to-date requires downloading lots of tarballs, possibly compiling, then installing them. Knowing whether or not there are dependency issues requires study of README files or other documentation. Manual configuration may be necessary, especially for packages that are not expected to be used by end-users, since those using a development-oriented package are expected to have the knowledge to configure it manually.
Like documentation availability, these issues are tangential to software development. The lone software developer is increasingly expected to devote thought to resolving issues that are not development, taking focus away from design and code. His labor is being divided in ways that destroy productivity, create frustration, and deplete interest and spirit.14
Companies can hire personnel to offload this work from programmers, but individuals cannot. Unless something is done, the increasing complexity and resulting difficulty will encumber and demoralize increasing numbers of open source developers, who will pursue other ventures. Those who might be intrigued by open source software after an encounter will instead be intimidated, never becoming involved in more depth or making contributions.
To continue to have high levels of involvement, the open source community needs to resolve these issues. While open source may not die entirely, neglecting to address the problem of growing complexity will allow the barrier to entry to continue to rise, reducing individuals' involvement and especially the supply of fresh contributors to the ranks. The loss will stifle development, especially from the individual community. Projects will increasingly become efforts of paid developers at companies, working collaboratively to develop features needed by their respective companies.
3.4. The Gentoo Solution
Portage is Gentoo's "software management tool"— a package manager that works from source code, operated from the command line via the program emerge. It fulfills a number of the requirements that I've just set out: it builds from source available at the respective project home pages, resolves dependencies, and can manage updates. While Portage might be a good starting point, it does not at present address all of my concerns:
- Gentoo Linux is a prerequisite.
Gentoo compiles the world to install itself, a time consuming process and not something your average new user will be up to. Curious individuals interested in toying will not consider reinstalling an operating system just to explore. A cross-platform system is needed that is not tied to a particular operating system or distribution.
- Gentoo Linux operates on the installed operating system
This is a risky thing: friends have expressed frustration after failed emerges leave most of their system in an unusable state that requires difficult manual clean-up. Especially in a situation where the user may want the latest, inadequately-tested packages for development purposes, it is not ideal to be installing the changes into the system's binaries directories. A better solution would be one that creates a development environment in user accounts, where problems (be they a result of the package manager or the user's tinkering) are safely confined.
Portage's SLOTs are used where there are incompatibilities or interface changes. For example, allowing two versions of a standard library where a function prototype has changed, or a version 1.X and version 2.X of a language interpreter when 2.X is not fully backward compatible. SLOTs are not generally used for minor incremental changes; it is unclear to me whether extensive use in this manner would be viable.
- There needs to be better development environment integration
emerge downloads, compiles, and installs the software within its confines. A user wanting to use an IDE to enhance the software must still configure the IDE, import the project, etc. This process needs to be streamlined.
4. The Cult in the Bazaar
If OSS is to grow, those involved need to be realistic about the users, their skills, and their own motivations. Consider some of the questionable ideas that pervade the Open Source movement:
4.1. Open Source must defeat Software Vendors
The open source cult believes that the bazaar shall defeat the cathedral, utterly dismantling it. Many in the open source world see software vendors as the bad guys, the bourgeois who are getting rich off technology that they sell to the masses at unfair prices, extracting money from the masses in a tax-like manner with near-mandatory upgrades.
The goal should not be to sink the mainstream software vendors, but to provide a useful creation. If software vendors fail as a result, so be it; but the us-versus-them mentality does not achieve anything productive. It is little more than a modern-day equivalent to "My TRS-80 is better than your C-64."
Many software vendors have, in fact, been very friendly to the open source world. Sun collaborated on numerous projects, notably ZFS; Apple gave us launchd; IBM did initial development on what later became Eclipse.
4.2. It's easy, really! Anyone can build it themselves.
Early in the OSS phenomenon, the sleek and trim nature of the code was purported to be a boon. Corporate bloatware like Microsoft... well, Microsoft anything... was pointed to as bad software: more than anyone needed, slow, over-complex, and requiring people to continually upgrade their machines to accommodate new software. Over the last decade, however, OSS projects like Eclipse and Open Office have suffered similar effects, with the claimed benefits moving from efficiency to rich feature sets, compatibility, etc.
With efficiency lacking, the new explanation is that open source is configurable: if you don't need it, leave it out. There is a belief that compiling your own software, along with selecting software packages and configuring daemons, is something your average person does, or is at least capable of.
This thinking is erroneous. The average user who buys a Mac or Windows PC wants a computer that lets them get stuff done easily: send e-mail, surf the web, word process, maybe do some image or video editing. Users do not want to have to search the Internet for driver updates, download them, and fight with compilers and version interdependencies to make their computer work. They want to turn the computer on, have it boot, then let them do work. The extra money they pay for the operating system is well worth it in their eyes.15
4.3. The Optimization Illusion
Both OS X and Windows take up gigabytes of hard disk space, just as Linux does without careful package selection. With the price of disk space under $1/gigabyte, leaving out packages just to save a few megabytes of disk space is senseless. Similarly, rebuilding the kernel to shave out a few unused features is pointless too. It may be a necessity when retrofitting Linux onto a hand-held device with a limited-size flash drive, but that's not something your average user will ever do, or at this point, should ever do.
4.4. An easy Linux distribution is a viable choice by the masses
One might think that although we have more difficult Linux distributions such as Gentoo (where everything is compiled from scratch), we also have easy-to-use, pre-built systems like Ubuntu, and therefore Linux is available to the masses. This is not accurate.
With increasing complexity under the hood comes the need to better hide complexity from the user; we understand this. For example, despite a mobile phone being far more complicated than an old click-and-bang land-line, the user interface to make a telephone call is still straightforward: type in the number then press talk, as opposed to pick up the receiver and dial the number. Nevertheless, even this minor change confounds some of the older generation who are intimidated by technology.
With people intimidated by mobile phones and unable to set their VCR clock, we cannot expect that providing double-clicking icons and only asking a few simple questions will solve their problems. Before users even get this far, they need to choose a Linux distribution, but there is not one clear thing to choose from. People buy Vista because there's just Vista. (Yes, there are multiple editions, but in the store there is only one, maybe two editions to choose from.) Regardless of which edition they choose, it is going to run their software, and users understand that.
With Linux, users are first forced to choose a distribution. Although Ubuntu is on the right track, even being confronted with the choice is more than many users are capable of or interested in dealing with. Like the glut of languages, the glut of Linux editions overwhelms potential users: Ubuntu, Gentoo, Debian, Red Hat, Slackware, gNewSense, to name some big ones; according to Wikipedia, there are over 300 distributions, with most in active maintenance.16 It's no wonder many people retreat to their familiar Windows environment.
4.5. Linux experts are always available to help neophytes
Those searching for answers must find information they can comprehend. If they cannot, they will become frustrated and disenchanted with software.
Those maintaining open source software primarily work on the parts they are interested in, and therefore do the coding. They have no interest in writing end-user documentation, instead communicating informally through assorted forums to those of a similar technical skill. All too often, they don't even bother commenting their code ("just read the code").
Another group, less technically inclined than the coders, writes the documentation.
As a result of this "knowledge classism", communication is being separated into class-like strata, with each type of user able to understand their own level and comprehend one or maybe two levels up. Unfortunately, lacking the knowledge to converse competently precludes discussing technical issues with someone too many levels up.
5. Successes
Open Source has created some wonderful things, and I want to acknowledge that. At the same time, however, I want to acknowledge places where cathedrals have had influence.
- Firefox & KHTML
Firefox developed as an offshoot of Mozilla, itself developed from Netscape. KHTML is the HTML rendering engine from Konqueror, a web browser for the KDE desktop environment. (Apple's Safari uses WebKit, a fork of KHTML.) Both are quite specification-compliant browsers; it's wonderful to be able to write a web page and have it render the way one expects. While both of these have been community-based open source development efforts for some time, we need to respect the presence of the cathedral: the W3C, driven by industry needs for standardization, wrote the HTML, CSS, and DOM specifications that define how these programs behave.
Were it not for the W3C, we would likely continue to have haphazard browser development, and web developers would be pulling their hair out with the monthly browser releases rather than only when Microsoft announces a new version of IE.
- GCC
The GNU Compiler Collection (originally the GNU C Compiler) is another fantastic example of what open source can create. Respected as world-class software, it is used by many a company for their software development needs. It produces some of the best optimized code available, with cross-compilation to numerous platforms. Nevertheless, it should be recognized that the developers were not inventing or extending the language; instead, they were working to a language specification that came, again, from a cathedral: ANSI in the 1980s, then ISO starting in 1990 up to the present.
- Linux
While Linux was developed by the bazaar, its specification was not, at least not entirely. Linux's goal was to be like UNIX, the specification thus being defined by POSIX or, in lieu of POSIX detail, simply emulating behavior often originally chosen by Bell Labs— a cathedral.
- Groovy & Grails
Groovy and Grails address a number of problems that I've described herein. Despite being yet another language, Groovy builds on top of Java— evolving the language, rather than trying to replace it, and thus leveraging (and avoiding having to rewrite) all the existing Java infrastructure and libraries out there.
Grails, a web framework, integrates the Groovy language and several other open source packages into a cohesive package, utilizing Hibernate for persistence, JUnit for testing, and SiteMesh, Spring, and a few other packages for their respective purposes. Grails differentiates itself from so many interdependent open source nightmares: after installing one package and setting an environment variable, everything else works. The install includes all the integrated components, and somewhere in there it handles all the Java CLASSPATH magic.
6. Solutions
Some areas seem to have reasonable solutions, but others seem unresolvable: how do you convince someone, devoted to technology they're happy with, to move to something new so we can reduce the language glut? Nevertheless, I'm going to throw some ideas out there, and we can see if there are enough eyes such that even these problems are shallow.
- Create a set of truly standard libraries.
While there is certainly evolution that should happen at the bleeding edge of technology, the stuff we have mastered shouldn't change at light speed. Create a standard set of objects, named consistently, whose abilities provide the best available ideas from the myriad of variants right now. Make that standard set of objects available in all object-oriented languages.
Challenging this goal will be finding a way to resolve preferential issues like under_score_naming versus CamelCaseNaming. A technical solution, such as a linker capable of recognizing the naming schemes and translating between them, is probably needed. Such a linker might also overcome mismatched calling conventions.
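To make the naming-translation idea concrete, here is a minimal sketch in Python of how a tool might match a requested symbol against exported symbols under either naming scheme. The function names (snake_to_camel, camel_to_snake, resolve_symbol) are hypothetical and invented for illustration. A real linker would work on object-file symbol tables, and would have to deal with acronyms, name collisions, and calling conventions, none of which this attempts.

```python
import re
from typing import Optional, Set


def snake_to_camel(name: str) -> str:
    """Convert under_score_naming to CamelCaseNaming."""
    # Capitalize each underscore-separated part and join them together.
    return "".join(part.capitalize() for part in name.split("_") if part)


def camel_to_snake(name: str) -> str:
    """Convert CamelCaseNaming to under_score_naming."""
    # Insert an underscore before every capital letter except a leading one.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def resolve_symbol(requested: str, exported: Set[str]) -> Optional[str]:
    """Return the exported symbol matching `requested` under either scheme.

    Hypothetical helper: tries the name as written, then each translation.
    """
    for candidate in (requested, snake_to_camel(requested), camel_to_snake(requested)):
        if candidate in exported:
            return candidate
    return None


if __name__ == "__main__":
    library_exports = {"ReadFile", "open_socket"}
    print(resolve_symbol("read_file", library_exports))
    print(resolve_symbol("OpenSocket", library_exports))
```

Even this toy version shows where the trouble starts: two distinct names such as HTTPServer and HttpServer collapse to different under_score forms, so any real translation layer would need an agreed set of rules rather than guesswork.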
- Create a cohesive documentation system
Part of the reason for redundant efforts is unawareness of others' efforts. Having a common, integrated documentation system will create awareness of what else is available, encouraging reuse. It will also ease the process of transitioning between technologies, because at least the documentation will have a familiarity about it.
- Place a moratorium on new languages.
While it is impossible to formally prevent invention of new languages in the bazaar, when we recognize the problems that they create, we may instead look to alternatives. Clever new technologies should consider evolving and extending existing languages instead of dashing into creating their own (and then having to reinvent the standard library, network library, database interfaces, ORM layer...).
- Create a vetting process
In the natural world, evolution kills off species that are not viable. When there are too many competing projects with similar goals, some process is needed to decide what lives and what dies. This is especially critical for niche areas, where there are not enough developers to support numerous similar projects. Paradoxically, this is often an area where redundant projects exist because, as none of them have attained critical mass, people with visions of glory continue to create new ones. Unifying people onto a smaller set of projects will provide better chances that some will achieve success.
- Choose to support existing projects rather than create new ones
Those looking for a piece of software, on finding that there is not one adequate for their needs, should look into existing projects that might be adapted. Whereas creating a new project further divides developers, contributing features to an existing project draws more users and developers and moves that project closer to achieving critical mass in its domain.
- Create a comprehensive source code management system
An ideal system would work cross-platform, directly from the respective packages' web sites so there wouldn't be a delay in propagation of versions. It would have to be capable of resolving package and version dependencies, and be able to configure interdependent package settings, including compiler and/or IDE settings, on the user's behalf. Such a system would bring the growing complexity back within easy reach of the casual developer, and restore a starting point for someone inexperienced but interested.
If such a system could encapsulate all the details of differing build systems, installation details, etc., it could be a real boon. One difficulty is that if the target packages must cooperate, there needs to be exactly one such system. The likely outcome is that, should one project start down these lines, other projects will follow and soon we'll end up with a number of different, incompatible systems: some packages supporting one, some the others, some trying to support several.
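The dependency-resolution piece of such a system is the best-understood part. As a rough sketch, assuming Python 3.9 or later (for the standard graphlib module), an install order can be derived from a map of package dependencies with a topological sort. The package names below are placeholders, and install_order is a hypothetical helper; version constraints, build-system differences, and IDE configuration, which are the genuinely hard parts described above, are not modeled.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def install_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return packages in an order where each comes after its dependencies.

    `dependencies` maps a package name to the set of packages it requires.
    Raises ValueError if the dependencies form a cycle.
    """
    sorter = TopologicalSorter(dependencies)
    try:
        # static_order() yields dependencies before the packages needing them.
        return list(sorter.static_order())
    except CycleError as exc:
        raise ValueError(f"circular dependency: {exc.args[1]}") from exc


if __name__ == "__main__":
    # Placeholder package names, for illustration only.
    packages = {
        "app": {"framework"},
        "framework": {"orm", "templates"},
        "orm": set(),
        "templates": set(),
    }
    for name in install_order(packages):
        print("install", name)
```

Ordering is the easy half; the difficulty raised above, getting every project to describe its dependencies to one common system, is social rather than algorithmic.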
7. Conclusion
Open Source Software is not the beautiful green pasture we all believed it would be. While there have been many very successful open source projects, there are certain key areas where open source has failed to control complexity. Rising complexity is increasing the barrier to entry for newbies to get involved with open source software, and for those who have been involved, it is making things harder and taking the fun out of it. Paid engineers may experience dissatisfaction as the learning curve to stay current gets ever steeper, leading some to pursue other careers.
When we consider that open source developers are not paid for their work, we must look to other rewards: respect, the joy of creation, a sense of personal achievement.
Increasing complexity of open source, absent some increase in rewards, will result in a decline in development participation. Duplication of efforts and communities' efforts to standardize their world by creating their own versions of generic tools will divide the pool of developers, restricting people to a particular niche of project or language.
Since evolution operates on the survival of the individual, and open source has parallels to evolution and lacks a centralized control structure necessary to induce consolidation of redundant efforts, open source software is in danger of its own success. The larger scale, longer-term problems that exist are in contradiction with the short-term, immediate needs of those who are using the software or have a goal right now.
Linux, the prominent open source operating system, was once built mostly on C with some shell scripts and a sprinkling of Perl; much of the peripheral software was GNU which had a cohesiveness about its packages. As open source has thrived and been imbued with numerous diverse languages, tools, and build systems, etc., complexity has increased due to introduction of variation into what was once, in its simplicity, somewhat unified. While Linux has always evolved quickly, its de facto unification once provided a frame of reference from which users could operate, a baseline as they adapted to change. As that cohesiveness has dissolved, the diversity and rate of change is increasingly challenging to keep pace with, and becoming a threat to the very community that created it.