Ode is simple! (Simple means that you know how it works.)

Hello, and welcome to news.ode-is-simple.com.

This is a weblog dedicated to Ode (ode-is-simple.com) and other topics relevant to the project.

If you're looking for general info about Ode you may want to start at the project homepage at ode.simple.com/home

To stay up to date with the latest news and info related to Ode, subscribe to this site's RSS 2.0 feed using Google Reader or your preferred feed reader.

Posts

Tue, 13 Apr 2010

Early design goals: Ode before Ode was Ode

This is an interesting (to me at least) look back at some very early design goals for the project that would become Ode.

All of these were pulled from a document I created early in 2008 (really, wow?). Looking at these goals now that Ode is released, I'd say I did pretty good. Like it or not, it clearly is the app I set out to create. I like it.

Maybe most interesting is what's not on the list. There are quite a few big features and little touches that aren't here, because I wouldn't think of them for months.

There is still lots of work to be done of course, including documentation and finishing up a bunch of addins I have underway. If nothing else, maybe this will convince you that the project isn't going anywhere anytime soon.

I'll warn you that there may be quite a few typos and other problems here. I haven't spent much time cleaning it up, and it wasn't originally intended to be made public. (I'll try to spend some time editing in the next day or so.)

Early design goals

April 8, 2010 (originally written early 2008)

Design goals

  • General design goals
  • Code
  • Posts (content)
  • Client environment
  • Server requirements
  • Content Management
  • Modules
  • Caching
  • Templates
  • URLs and Permalinks
  • Project management
  • Documentation
  • Installation
  • Synchronization and Backup
  • Revision control/versioning

1. General design goals

The application should be lightweight

Resource requirements should be minimal. By taking advantage of services already provided by the OS, other open-source projects, and well-established protocols, we can minimize the overhead involved in running this application.

The prime example of this is the decision to use the filesystem as a content database, which greatly simplifies everything from installation, use, and management to backup and recovery.

Also, client-server interactions, whether for content management or administration, are essentially file transfers, which can be handled via any number of existing transport- and application-level protocols. We can even implement robust security in the same way, for example by taking advantage of SSH and SFTP, or SSL in the browser or via WebDAV. We can even handle anonymity by taking advantage of existing proxy architectures such as the EFF's Tor http://tor.eff.org/ network of anonymous relays (or other secure anonymous VPN services).

Because it is highly modular, the core of the application need handle only a very limited set of functions. Many important (and not so important) but not universal features can be implemented as single-purpose modules, including, among many others, syndication, commenting and spam filtering, variable interpolation for dynamic data in templates and posts, and integration with third-party web services. Not only does this simplify development, but it also means that any given installation need only support the features it requires. This flexibility will allow users to minimize the application's footprint while also offering advantages related to security and reliability.

Of course the modular architecture should be implemented intelligently so that we don't incur any undue penalty from implementing these frequently used routines in modules rather than the application itself.

There should be a minimum of external dependencies

Essentially prerequisites should be eliminated to the extent that it is reasonable to do so. For example:

There should be no database backend

The app should use as few general purpose library functions as possible. Of course this should not be taken so far as to eliminate routines from standard libraries for less efficient, secure, and portable 'homegrown' alternatives.

There should be a minimum of internal dependencies

In my experience, projects that encourage group participation and present a modular architecture quickly run into problems with conflicting competition and dependency among components. This situation should be avoided.

The style of the source code should be as explicit and language-neutral as possible without sacrificing efficiency.

Furthermore the code should be well-documented so that it is accessible to the widest possible audience.

The project should not use an existing package manager.

Doing so adds an external dependency, potentially fractures support and maintenance issues between multiple projects, and complicates initial setup and configuration, especially for users new to the package manager who must, at the outset, contend with multiple new projects.

If this application were significantly more complicated, then taking advantage of an existing package manager might be the right thing to do. But as it is, implementing a simple native package management scheme for dealing with modules is preferred.

The application should work in combination with a remote server or locally without introducing multiple modes of interaction.

The application should be portable.

The only requirements for content creation are:

(a) A platform which allows for the creation of plain-text files, and (b) which presents the user with an accessible filesystem via some sort of file management application.

Additional requirements to share information and manage shared content:

  • Internet connectivity
  • Some sort of file transport mechanism
  • A standards compliant web browser

Of course different devices vary wildly in a number of ways, display size key among them. But these platforms can be accommodated by taking advantage of the templating system.

Without porting the app, it should be possible to run on any of the major computer platforms including Linux, Mac OS X, and Windows.

The application should be compatible with a number of different web servers.

2. Code

I intend to use Perl for this project.

First of all, I'm familiar with it, but more importantly:

It has a robust CGI library (like many other newer high-level languages used for web programming, including PHP, Ruby and others),

The language itself is mature and stable (unlike some of the other high-level languages used for web development),

The language is particularly adept at manipulation of text, which describes the bulk of what we need to do.

Furthermore...

There are many resources available for people new to the language.

It's well-established as a web programming language.

There are a number of projects aimed at improving the efficiency of Perl for web applications. Though I don't anticipate that the application will require this sort of help, it's reassuring to know that these exist for supporting large or popular installations.

As for Perl's shortcomings, the usual complaints are

  • that it is a permissive language
  • that it has a tendency to result in unstructured programs
  • that it is overly idiomatic

These 'weaknesses' can be overcome simply by adhering to good coding practices. There's certainly nothing about Perl that prevents code from being well-written.

The code should be as explicit and accessible as possible as opposed to being terse and overly idiomatic, except where the idioms are more efficient. Wherever idioms are preferred, liberal commenting should be used so that people who are unfamiliar with the eccentricities of the language can follow along.

3. Posts (content)

Each post is an ordinary, discrete, plain-text file. Normal text should display in the browser as written except that it's subject to the style imposed by the template (i.e. CSS rules, applied to the page).

Posts support HTML without restriction except that they are contained within a larger page structure, so tags concerned with document structure do not apply and should not be used.

Posts should also support one or more of the common lightweight markup syntaxes, including markdown http://daringfireball.net/projects/markdown/, or textile http://textism.com/tools/textile/.

These lightweight markup syntaxes are an important component of the application as far as usability is concerned.

The following is an abbreviated list of some of the advantages these tools offer:

They...

  • simplify reading and writing the plain-text source documents.

  • encourage the appropriate use of formatting to improve the readability of rendered content, because it is much easier and more efficient to include formatting markup, without the formality of a language like XHTML.

  • avoid the problem of formatting mistakes cascading through a document. Broken syntax is left harmlessly unconverted, which may look odd (an advantage in itself because it makes mistakes easy to identify) but won't propagate through the rest of the document.

  • encourage linking by reducing the overhead involved.

Furthermore, it's easy to integrate HTML with these markup languages because they are all designed to coexist with HTML.

Posts are content. As such, they should not contain formatting instructions or behaviors. This is the standard separation of content, style and behavior, which is a tenet of modern web design and development.

4. Client environment

Any text editor, or any application capable of generating plain-text files, is a suitable client (including for example mail clients and IM applications, among others). Users can 'customize' this important aspect of the application simply by choosing a preferred editor.

There should also be a web interface, consisting primarily of a fairly typical HTML form. This form should integrate with the lightweight markup syntaxes so that the supported syntax is consistent among editing environments.

The web interface should support preview of posts.

5. Server requirements

Two of the primary dependencies on which the application is built are (a) a web server and (b) Perl. Both are readily and freely available for virtually all platforms. Apache and Microsoft's Internet Information Server (IIS) account for the vast majority of deployed, dedicated web servers, though there are many alternatives, including for example, lighttpd http://www.lighttpd.net/.

Any server with CGI support should suffice.

Running locally on a client machine should provide many of the same options. Mac OS X, Linux and other Unix-like operating systems typically include an Apache install (and are capable of running alternative servers). Microsoft's client operating systems have included some version of Internet Information Server for many years now. In XP, IIS was a part of the default installation. That is no longer the case with Vista, but IIS 7 is still provided as an optional component.

Note that there's no requirement of actually running a web server on the local computer. As is already mentioned, content is simply a collection of text files and as a result, there are only very minimal requirements concerning the creation of posts. They can be created offline from any device with an application capable of producing plain-text files and then sent to the server or synchronized at some later time.

But running a web server at the client would allow for complete functionality even when offline, which may be useful for any number of reasons (e.g. running a presentation when internet access isn't available). This is a useful option but should not be considered a requirement.

6. Content Management

As has been mentioned, the application should use the filesystem as a content database. Posts are organized hierarchically according to a topic-based categorization scheme. For example, the root of the weblog may contain any number of folders, which are categories, for example 'Technology', 'Politics', 'Music', 'School', ... . Each of these categories may contain any number of subcategories as well as posts. For example, 'School' may contain subdirectories for each of a number of different courses. Within each of these subcategories there may be more posts and additional categories.
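The category scheme described above can be sketched in a few lines. This is only an illustration in Python rather than the Perl the project intends to use, and it assumes that posts are files ending in `.txt`; the function name `list_weblog` and the `.txt` convention are hypothetical, not part of the design.

```python
import os

def list_weblog(root):
    """Return (categories, posts) as paths relative to the weblog root.

    Every directory below the root is treated as a category, and every
    .txt file is treated as a post (the .txt convention is an assumption).
    """
    categories, posts = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subcategories in a predictable order
        rel_dir = os.path.relpath(dirpath, root)
        if rel_dir != ".":
            categories.append(rel_dir.replace(os.sep, "/"))
        for name in sorted(filenames):
            if name.endswith(".txt"):
                rel = os.path.normpath(os.path.join(rel_dir, name))
                posts.append(rel.replace(os.sep, "/"))
    return categories, posts
```

Because a category is nothing more than a directory, adding 'School/Physics' to the weblog amounts to creating that folder; no other bookkeeping is needed in this sketch.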

It should be possible to syndicate or apply templates at any level of the hierarchy. An instructor may choose to use this application to host websites for more than one course. As this relates to syndication, all students who are subscribed to one or more courses are notified whenever the content for their particular course is posted or updated.

Syndication is supported via one or more of the established formats (e.g. RSS 2.0 http://cyber.law.harvard.edu/rss/rss.html, Atom 1.0 http://atompub.org/rfc4287.html).

Adding, editing and deleting content should be as easy as acting on the files directly.

Also, it should be possible to build a module to handle these same actions within a web browser.

There are any number of good text editors and/or file transfer applications that allow a user to seamlessly edit files hosted on a remote site.

In addition to the topic based scheme, the application should automatically build a date-based hierarchy of posts from the file modification times. So, for example,

sample.net/weblog/2007/ should return all posts from 2007

sample.net/weblog/2007/08/ should return all posts from August 2007

sample.net/weblog/2007/08/02/ should return only posts dated August 2, 2007
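One way the date-based lookup might work is sketched below in Python, purely for illustration. It assumes the date portion of the URL has already been separated from the rest (for example '2007/08/'), and the names `date_key` and `posts_for_date_path` are hypothetical.

```python
import os
from datetime import datetime

def date_key(path):
    """Return (year, month, day) strings taken from a file's modification time."""
    modified = datetime.fromtimestamp(os.path.getmtime(path))
    return (f"{modified.year:04d}", f"{modified.month:02d}", f"{modified.day:02d}")

def posts_for_date_path(post_paths, date_path):
    """Select the posts whose modification date matches a path such as '2007/08/'.

    A shorter date path matches more posts: '2007/' matches the whole year,
    '2007/08/02/' matches a single day.
    """
    wanted = tuple(part for part in date_path.strip("/").split("/") if part)
    return [p for p in post_paths if date_key(p)[:len(wanted)] == wanted]
```

The same comparison serves the year, month, and day levels, so no separate archive structure has to be maintained by hand in this sketch.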

7. Modules

Given that so much of the functionality of the app is dependent on its modules, management should be simple so that this aspect of the project does not detract from the appeal of the application.

Modules are Perl scripts which define routines available to the application that are called at key points in the program's execution.

Modules can be

  • uninstalled
  • installed and disabled
  • installed and enabled

Disabled modules must not affect the behavior of the application and should not, in any significant way, impact performance.

Modules can be managed by directly manipulating corresponding files via the filesystem, in much the same way as posts.

Modules should be installed simply by downloading them and moving them to a specified directory.

Modules can be disabled by renaming them. This has the added advantage that it communicates whether a module is enabled or disabled at a glance without requiring that the user load an admin interface or guess at which modules are enabled from looking at the output. It also avoids the issue that a problem module cannot be disabled because it is interfering with loading the admin interface.
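As a rough illustration of the rename-to-disable idea, the sketch below (in Python, not the project's Perl) classifies modules from their filenames alone. The design does not say what the renaming convention is, so the `disabled_` prefix and the `.pl` extension used here are assumptions made only for the example.

```python
import os

MODULE_EXT = ".pl"             # hypothetical: modules are Perl scripts named *.pl
DISABLED_PREFIX = "disabled_"  # hypothetical renaming convention

def inventory_modules(module_dir):
    """Classify modules as enabled or disabled by looking only at filenames."""
    inventory = {"enabled": [], "disabled": []}
    for name in sorted(os.listdir(module_dir)):
        if not name.endswith(MODULE_EXT):
            continue  # ignore anything that is not a module file
        if name.startswith(DISABLED_PREFIX):
            inventory["disabled"].append(name[len(DISABLED_PREFIX):])
        else:
            inventory["enabled"].append(name)
    return inventory
```

Since a disabled module is never loaded, only listed, a broken module can always be switched off from the filesystem even if nothing else works.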

I cannot be precise without giving more thought to the application itself, but...

It should be possible for modules to override many of the default, core routines.

Modules should have the opportunity to alter content and templates.

In the case of content, modules should have access to run against both the raw plain-text and converted output immediately before it's sent to the browser.

Modules should be inventoried at the start of execution, after which they can be called at key points during execution in accordance with their function.
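The 'inventory first, call at key points' pattern could look something like the following Python sketch. The class `HookRegistry` and the point names `raw_post` and `rendered_output` are hypothetical; they stand in for whatever key points the application eventually defines.

```python
class HookRegistry:
    """Collects module routines up front, then calls them at named key points."""

    def __init__(self):
        self._hooks = {}

    def register(self, point, routine):
        """Record a routine to be called when execution reaches `point`."""
        self._hooks.setdefault(point, []).append(routine)

    def run(self, point, value):
        """Pass `value` through every routine registered for `point`, in order."""
        for routine in self._hooks.get(point, []):
            value = routine(value)
        return value
```

A module that tidies whitespace might call `hooks.register("raw_post", str.strip)` during the inventory step; later the application calls `hooks.run("raw_post", text)` before conversion and `hooks.run("rendered_output", html)` just before the page is sent. A key point with no registered routines returns its input unchanged, which costs almost nothing.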

Modules should be self-sufficient. Internal dependencies among modules are often a problem with projects like this one. It gets sticky when, as an end user, I can install the module I want but not some plugin it requires because of a conflict.

Modules should be single purpose. If I'm installing a module to pick up one feature it causes trouble when a whole bunch of other routines come along for the ride. Small single purpose modules mean a finer granularity of control over what's running along with a user's blog.

8. Caching

There are two ideas of caching relevant to the project.

Because traversing the filesystem and filesystem I/O are relatively slow operations, posts should be indexed or cached as a structure that can be read and parsed more efficiently than revisiting the entire directory structure and pulling together all of the entries relevant to a request each time a visitor views the site. The only time we need to access the original files is when they are modified (added, deleted or edited). Every time one of these events happens, for example when a new post is added, we can add it to the cache, or reindex the site.
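A minimal sketch of such an index follows, written in Python for illustration only. It assumes posts are `.txt` files and stores the index as a JSON file; both choices, and all of the function names, are assumptions rather than part of the design.

```python
import json
import os

def build_index(root):
    """Walk the whole tree once and record each post's modification time."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".txt"):  # assumption: posts are .txt files
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, root).replace(os.sep, "/")
                index[rel] = int(os.path.getmtime(full))
    return index

def save_index(index, index_file):
    """Write the index to disk so later requests can skip the directory walk."""
    with open(index_file, "w", encoding="utf-8") as fh:
        json.dump(index, fh)

def load_index(index_file):
    """Read the saved index; this is all an ordinary page view needs to do."""
    with open(index_file, encoding="utf-8") as fh:
        return json.load(fh)

def record_change(index, root, rel_path):
    """Update a single entry after a post is added, edited, or deleted."""
    full = os.path.join(root, *rel_path.split("/"))
    if os.path.exists(full):
        index[rel_path] = int(os.path.getmtime(full))
    else:
        index.pop(rel_path, None)
```

In this sketch a full walk happens only in `build_index`; everyday changes go through `record_change`, and requests read the saved file with `load_index`.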

The other form of 'caching' is related to how the web server interacts with the script itself. Typically when the program is run, the Perl interpreter must be launched, the program run, data structures created, and then all of this is torn down again and the process must be repeated for each new request. Various server extensions can be used to influence this pattern (FastCGI and mod_perl, among others). Other mechanisms, such as Perl's Storable module, allow us to preserve data structures by writing them to disk, where they can be retrieved on subsequent requests. This type of persistence can have a significant impact on performance.

Mechanisms like this should not be required for the majority of installations. After some limited testing, I do not anticipate that performance will be a problem except under heavy peak loads.

By implementing some form of caching within the project, we can avoid the use of external persistence mechanisms. As a result the application is more likely to work with a typical installation.

9. Templates

Templates should be XHTML Strict compliant by default.

Templates that I plan on creating include:

  • a print template,
  • a presentation template,
  • a default project template,
  • two or three wireframes. (These will be one, two, and three column layouts to serve as examples and a starting point for would-be template designers.)

Templates should consist of a single XHTML file along with related style sheets, image files, and other associated resources.

The most important design goal for templates is that they should be similar enough in structure to a standard XHTML page that designers are free to use whatever tools or techniques they typically would to create pages.

10. URLs and Permalinks

URLs should follow the guidelines specified at "Hypertext Style: Cool URIs don't change" http://www.w3.org/Provider/Style/URI.

URIs should not change, which means that identifiers should not contain information which is likely to need to be changed over time.

Some key considerations related to the URLs produced by the application:

URLs which include filenames should not include file extensions (i.e. existing file extensions should be stripped from the URL)

URLs that refer to directories should include a trailing forward slash

Index page names should be omitted; for example, sample.net/index.html should be sample.net/
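The three rules above are mechanical enough to sketch. The Python below is an illustration only; the function name `canonical_url_path` and the set of index names are assumptions, and the function works on the path portion of the URL rather than the full address.

```python
import posixpath

INDEX_NAMES = {"index.html", "index.htm"}  # assumption: names treated as index pages

def canonical_url_path(path, is_directory=False):
    """Apply the URL rules: no index names, no extensions, trailing slash on directories."""
    trimmed = path.strip("/")
    path = "/" + trimmed if trimmed else "/"
    if is_directory:
        return path.rstrip("/") + "/"
    head, tail = posixpath.split(path)
    if tail in INDEX_NAMES:
        return head.rstrip("/") + "/"  # drop the index page name entirely
    stem, _extension = posixpath.splitext(tail)
    return posixpath.join(head, stem)  # drop the file extension
```

With these rules a post stored as weblog/ode.txt would be addressed as /weblog/ode, so the address survives if the file is later served from a different format or extension.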

With a static website, file and path names are the responsibility of the author or the person who maintains the site, but with dynamically generated pages, the application must be made to adhere to the basic guidelines.

Weblogs introduced the term 'permalink', which is a permanent address to an individual post.

Permalinks should be date-based.

Since the directory structure of the weblog reflects a categorization scheme, it may be tempting for authors to move posts when reorganizing their site. This should be discouraged, but it's certain to happen anyway. Because the date-based archives are generated automatically from the original modification dates of the files and preserved by the application, they are likely to be reliable. Therefore, date-based permalinks are more appropriate in the case of this application.

Search engines should be instructed to index only permalinks for content. This can be accomplished using the Robots Exclusion Protocol (aka the robots.txt mechanism), site maps, or the Robots meta tag. It should be possible to write a module to generate both a site map and meta tags dynamically.
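For the Robots meta tag option, a module might emit something like the following. This Python sketch assumes the application can already tell whether the page being rendered is a permalink; the function name `robots_meta` is hypothetical.

```python
def robots_meta(is_permalink):
    """Return a Robots meta tag that lets search engines index only permalink pages.

    Other pages (category and date listings) are marked noindex but still
    followable, so crawlers can reach the permalinks through them.
    """
    content = "index, follow" if is_permalink else "noindex, follow"
    return f'<meta name="robots" content="{content}" />'
```

The tag is written in self-closing form so that it can sit inside an XHTML template without modification.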

11. Project Management

Project management related to modules

Management of shared modules is an important consideration and must be handled well if the project is going to be successful.

All shared modules should be made available via a central repository. Years of experience participating in other, similar projects have proven to me beyond any doubt that this is the only way any modular open source effort can work. Modules shared in this way will be considered 'contributed' and managed under the auspices of the project. All other modules, even if they are made publicly available in some other way, via the author's personal website for example, are not considered to be contributed and as such fall outside of the scope of the project until such time that they are made available through the repository.

There must be some review of contributed code. I have yet to work out the details of how this might be handled. It can be exceedingly frustrating for novice users to struggle with partially functional or ill-conceived modules. All due effort should be applied to steering users away from such issues.

Widely used routines should be generally available. I haven't yet decided how to implement this, but I'm thinking in terms of a 'catalog' of common routines. This catalog could be implemented as a module itself, or something else.

This is the perfect sort of thing to be hashed out on the wiki, where people can tinker directly with the code to make improvements. At any time the catalog should represent the community's idea of the best way to handle routines that are very commonly used.

12. Documentation

Documentation is a vital part of any project. It's necessary that I spend a sufficient amount of time on it during development and that it continue to be an important consideration going forward.

Documentation will be handled via the project wiki and community participation should be encouraged, but not expected. In other words, documentation must not be abandoned, and pushed off to the community to succeed or fail as an independent effort.

13. Installation

Many projects which have found their way to the web recently have adopted a 'no step 3' installation policy, i.e.

Step 1: download the app

Step 2: move or rename a few key files or directories or set a couple of configuration details

Step 3: run the software (there is no step 3)

This is a nice idea but I'm not married to it. I don't believe in adopting a philosophy and then bending a particular project to it even if it seems counterintuitive. For example, a lot of these installation procedures abandon platforms when they can't be made to adhere to the overly simplified installation guidelines.

What I'll say is that installations should be simple and intuitive and consistent among all supported platforms to the extent that it's possible.

14. Synchronization and Backup

Because we're dealing only with simple text files any backup strategy that a user has adopted for their other data should work for this application. Content can be backed up incrementally and restored entirely or in part. The same is true of synchronization, which shouldn't present any significant challenge though it's certainly a topic to be addressed in the documentation.

15. Revision control/versioning

There should be a separation of configuration data from the code itself so that the application doesn't need to be reconfigured every time it's updated.

Otherwise, whenever updated, the new version of the application will be made available on the project homepage, along with a changelog and other relevant information.

Updating should be a simple matter of redownloading the revised app and replacing any existing copy.

A simple versioning scheme will be used.

Modules will be handled in a similar fashion but independently of the core application. Module developers will be encouraged to adopt a similar scheme, but I can't imagine that adherence to a single convention would be necessary as long as participants do something sensible.

Revision control should apply to templates as well.