Why hypermedia? - Writings by Ruckus

## Definitions

Recent years have seen a revival of interest in the use of hypermedia designs for Web applications, driven largely by the success of the [htmx](https://htmx.org/) AJAX library. What follows is a loose justification for the use of hypermedia designs, as opposed to the "thick client" pattern more common in modern times, from a software development perspective.

Much has been written about the nature of hypermedia, and this will not be repeated here. For further information, I recommend starting from the [HTMX essays](https://htmx.org/essays/), as well as the [Hypermedia Systems book](https://hypermedia.systems/). However, for the purposes of this essay I will quickly define hypermedia systems by contrast to their current main competitor, the "SPA" pattern of thick clients.

A hypermedia application is one where an application's UI is implemented by sending *hypermedia* to a *hypermedia client*. The distinguishing factor of hypermedia is that the information presented to the user, and the controls allowing him to operate on it, are presented in the same data stream in a self-describing manner that the hypermedia client can interpret. This means that a single generic client (the paradigmatic example being a web browser) can interact with any application, without any requirement for special-purpose code on the client end.

A hypermedia application is contrasted with a thick-client application, or in the context of web development, an SPA (single-page app). A thick-client application is defined by having a dedicated client for the specific application being used, as opposed to a single generic client that can interact with any application. This dedicated client typically handles all user interface concerns internally, defining data presentation and user interaction within its own code.
It interacts with a server over a defined data protocol which both server and client are hard-coded to use; any changes to the data protocol require updates to both server and client, and each application has its own client program.

## History

Of these two designs, thick clients are actually older. Thick clients were the norm in all network programming pre-Web; protocols such as SMTP, FTP, Telnet and SSH all use thick clients. Though the concepts underlying hypermedia were discussed much earlier, the Web was the first hypermedia system to achieve wide penetration, and hypermedia applications first came on the scene after the Web's introduction in the 1990s.

The hypermedia concept was originally conceived as a system only for browsing static information, which would be created and updated by other means outside the system. However, as hypermedia systems became more widely used, demand quickly arose for more user interaction primitives, which were soon added in subsequent versions of HTTP and HTML. Using only the basic HTML primitives of information display, links and forms, it was possible to implement an extreme variety of applications from the server side, all of which used the same program, a Web browser, as client. This flexibility led directly to the Web becoming what is now perhaps the world's dominant applications platform.

The downside of these apps (now in hindsight referred to as MPAs, "multi-page applications", in contrast to SPAs) was that their user interaction was, unavoidably, fairly clunky. The Web was originally meant to be a document-viewing platform. The use of dynamic backends, and the addition of Web forms, allowed it to be repurposed to write most kinds of what we now call CRUD apps. However, the fact that interactions with the server were limited to full-page refresh cycles meant that using these apps could be a chore.
This limitation became more galling as Javascript was introduced for client interactivity, since a refresh cycle would not only repaint the screen, but also lose any local state that Javascript had set.

The solution to this issue was the introduction of in-Javascript network connectivity to the server, initially via the `XMLHttpRequest` API. This paradigm, soon to become known as AJAX (Asynchronous Javascript and XML, so called despite not necessarily or even usually involving any XML), allowed Javascript code on the client to communicate with the server over HTTP requests without a page refresh. In combination with the gradual standardization of the DOM APIs for modifying a page in-place, this allowed Web apps to support whole multi-step interactions in a single page, without any page refreshes at all.

Due to the superior UI affordances available through AJAX/DOM techniques, this paradigm soon became the standard for cutting-edge web applications. However, this change also carried within it a break from the hypermedia paradigm that had previously defined the Web. Instead of being a universal client in and of itself, the browser was now an applications platform, a target for software development much like a desktop OS. The move to AJAX apps meant a move away from hypermedia and toward bespoke thick clients that simply happened to be implemented in Javascript and run in a Web browser.

Early work in AJAX web applications was often ad hoc and gradual. Many early apps were mostly standard MPAs, with some amount of manual manipulation (often via utility/compatibility libraries such as jQuery) to call AJAX functions manually and update the page with the results. However, this method proved to be quite unscalable; the difficulties of consistently managing user interaction and a complex UI state with manually-applied event listeners were prohibitive in developing larger applications.
Thus, front-end development moved more and more to elaborate framework technologies that attempted to automate common components of the UI/AJAX flow. Communications between frontend and backend largely standardized on JSON APIs, generally pure data APIs without any self-describing hypermedia component. Often, application data is available only via those data APIs, and the HTML rendered by the server is a vestigial component that exists only as a canvas for the complex frontend Javascript to draw on. Such applications are essentially pure thick clients, familiar to any desktop programmer from the pre-Web era, that happen to be implemented on the browser platform.

## Design implications

The major advantage of hypermedia designs, as opposed to thick clients, has always been that a single client could be used for any application. In the early days of the Web, this was an extreme operational advantage: users could be assumed to have a Web browser available, and so the challenge of developing, distributing and updating a client application for every desktop platform supported could be avoided. With the rise of Web browsers as a full, mostly-universal applications environment, and the increasing penetration of high-speed Internet links that allow downloading many megabytes of code in seconds, these operational challenges have been substantially eased.

However, other implications of the thick-client design remain. In particular, the separation of application logic into client and server, connected by a data API, has implications for the factoring of an application: an app that could be written as a single component must instead be written as two components with an interface between them.

Software factoring, and large-scale design in general, are deep and involved topics; see e.g. [grug](https://grugbrain.dev/#grug-on-factring-your-code) and [ESR](http://www.catb.org/~esr/writings/taoup/html/modularitychapter.html) for more.
However, one basic point is that choosing the correct level of modularity, and the correct cut points between modules, is key to handling complexity. Modules that are too large and internally complex turn into spaghetti internally; modules that are too small make the overall system into a spaghetti of modules. And placing module boundaries where they don't naturally belong leads to a development morass where the interface across a module boundary is oversized and must change continually, because the modules on either side require high visibility into each other's internals.

The thick-client model for client-server applications implies a design in which client and server are separate modules, connected by a defined interface (here, the data API). However, quite often, this design is not actually optimal for the application in question. Client-side functionality may have arbitrary dependencies on server-side state, and in most cases a new feature will require changes to both the client and the server. The client and server are not "natural modules" such that separating them optimizes complexity; their separation into modules is forced by the network topology and the development paradigm. Developing applications under this non-optimal factoring of modules is thus substantially hindered. (For more technical detail here, see [[#Appendix I the glue logic associated with an interface|Appendix I]].)

## Solutions to the factoring problem

This factoring problem has existed since the introduction of complex client-server apps, and so many attempts have been made at ameliorating or avoiding it.

One class of solutions involves attempts to minimize its effects by optimizing the [[#Appendix I the glue logic associated with an interface|glue logic]] involved: implementing the same interfaces using less developer effort and fewer lines of code. This class of solutions is very useful, often crucial to the success of an application. However, it has fundamental limits.
The cost of an interface layer is always nonzero, and attempts to optimize it past a given point may become counterproductive.[^1] Also, before this point of diminishing returns, optimizations of this type are in a sense always constructive: this approach is usually applicable to whatever glue logic you have, and the amount of glue logic in a system is always nonzero. Thus, the choice between glue logic optimization and refactoring is generally "both/and" rather than "either/or". An interface that has been removed from your system need not be optimized; however, the interfaces that remain will still require it.

Another basic approach in solutions to the factoring problem is to refactor so as to remove unwanted interfaces altogether. Often, this means attempting to "dissolve" either client or server, replacing it with a generic component that requires minimal bespoke development, in order to concentrate effort and complexity in the other.

Attempts to dissolve the server include such projects as GraphQL and Supabase, as well as a wide variety of cloud computing services. The goal here is to reduce the server-side complexity to a minimum, providing only the smallest possible set of services and automating their implementation to the greatest possible degree, while moving almost all application complexity into a client which handles it internally. This minimizes the factoring problem by pushing as much functionality as possible into the single module of the client.

Hypermedia, as a general paradigm, is an attempt to dissolve the client. The hypermedia client is, by design, generic across most possible applications.
An application's UI is implemented via a standard and (ideally) simple[^2] interface description language, which is transmitted over the network to the generic client, at which point (ideally) the developer does not have to do any more work at all.[^3] The goal of the hypermedia design is thus to minimize the factoring problem by pushing almost all the application complexity into the server.

This relies on the browser, the hypermedia client, as a highly capable generic client that itself represents a massive investment of development effort. However, the project of genericizing the client is usually easier than the project of genericizing the server: in most applications, a server must contain some minimum of application logic that cannot be pushed to the client (e.g. authentication, optimized data storage), whereas most possible UI interactions can be implemented in a reasonable interface description language. Thus, even without scripting, the browser is a far more practical "universal client" than most attempts at a "universal server".

## When hypermedia?

The question of whether to use a hypermedia approach, then, is largely a question of whether your application's complexity is most easily concentrated in the client or in the server.

One commonly given example of an application for which hypermedia is not well-suited is Google Sheets. There are various contingent reasons for this, such as the arbitrary potential interlinking of UI updates. But fundamentally, the reason why hypermedia is not a good fit for a spreadsheet is simply that a spreadsheet is an application that is inherently client-heavy. This should not be a surprise: remember that Google Sheets is ultimately a re-implementation of a class of very successful desktop applications that were used for decades without any server component at all. Pushing such apps entirely to the server, as is necessary for a hypermedia approach, is unsurprisingly unnatural and clumsy.
Google Sheets is a good example of the class of applications where complexity is almost entirely client-side with very little on the server; in many ways you could duplicate its functionality using Microsoft Excel pointed to a Dropbox folder.[^4]

This, though, is an unusual case in the larger space of web apps. The majority of web apps are primarily oriented around accessing some information resource that is stored on the server, or else communicating with other users of that server. In either case, the server is the primary locus of the application functionality. Thus, it's a much easier project to shuffle the application complexity all to the server and (mostly) dissolve the client, than to shuffle complexity to the client and (incompletely) dissolve the server.

The third option, of course, is to retain the client-server factoring without attempting to minimize either component. This is often a mistake, for reasons mentioned above. However, in some particular cases it is reasonable. If the complexity in the application is naturally split between client and server, in a way that doesn't allow either to be easily minimized, then a split design is natural. For instance, a complex multiplayer game may act like this: the necessity to manage substantial state on the server, while also rendering a complex UI on the client, makes it a natural fit for a two-part design. Alternatively, for a very large application with many developers working on it, maintaining a line between client and server may be a low-cost decision given the complex factoring that any such project must already incorporate.[^5]

To sum up: a hypermedia design is a sensible decision when application complexity is weighted toward the server, and when the facilities of the available hypermedia client are capable of rendering a UI sufficient to the task.
This is the case for large numbers of web apps in practice, and many web apps that are currently written as thick clients, with heavy use of Javascript frameworks and data APIs between client and server, might be able to use a hypermedia design instead and see gains in terms of overall complexity.[^6]

## Appendices

### Appendix I: the glue logic associated with an interface

One concrete manifestation of the difficulties associated with the factoring problem is the amount of work required to translate application data across domains and representations.

An ordinary Web app will have application data stored on the server in some persistent form (usually a database of some kind) and will present that data to the user in the form of a browser DOM (either rendered from HTML provided by the server, or by DOM API calls from client Javascript). Aside from any application logic, code is required to read that data from its stored form and translate it into the form that the user sees. (There is also usually an intermediate form, which is that used for in-memory representation within the server application.)

This data translation code (often called "data plumbing" or "glue logic") may amount to a substantial part of the total code required for an application. Thus, one important factor in minimizing the overall complexity of the application is minimizing the amount of translation code required. This can be approached by attempting to automate and genericize translation code, or by minimizing the amount of data translation required.

Adding new interfaces across which data must flow usually amounts to a straightforward and substantial increase in the amount of translation code required. In a hypermedia-style application, the usual data flow goes `database -> server data structures -> HTML`; the HTML is then sent directly over the network and can be rendered by the browser without further effort by the developer.
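
As a minimal sketch of that hypermedia data flow, the Python below walks one record set through each stage. Everything in it is an illustrative assumption rather than part of any real application: the `tasks` table, the `Task` dataclass, the `/tasks/{id}/toggle` URL, and the helper names `load_tasks`, `render_tasks` and `demo`.

```python
import sqlite3
from dataclasses import dataclass
from html import escape


@dataclass
class Task:
    """Hypothetical server-side data structure (the in-memory form)."""
    id: int
    title: str
    done: bool


def load_tasks(conn: sqlite3.Connection) -> list[Task]:
    """Stage 1: database -> server data structures."""
    rows = conn.execute("SELECT id, title, done FROM tasks ORDER BY id")
    return [Task(id=r[0], title=r[1], done=bool(r[2])) for r in rows]


def render_tasks(tasks: list[Task]) -> str:
    """Stage 2: server data structures -> HTML.

    Each item carries its data and the control that acts on it (a form),
    so a generic browser can present it with no app-specific client code.
    """
    items = []
    for t in tasks:
        status = "done" if t.done else "open"
        items.append(
            f"<li>{escape(t.title)} ({status}) "
            f'<form method="post" action="/tasks/{t.id}/toggle">'
            f"<button>Toggle</button></form></li>"
        )
    return "<ul>" + "".join(items) + "</ul>"


def demo() -> str:
    """Build a throwaway in-memory database and render it to HTML."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT, done INTEGER)"
    )
    conn.executemany(
        "INSERT INTO tasks (title, done) VALUES (?, ?)",
        [("Write essay", 1), ("Fix <b> escaping", 0)],
    )
    return render_tasks(load_tasks(conn))


if __name__ == "__main__":
    print(demo())
```

There are only two translation steps here, and the output of the second is what goes over the wire: the server's job ends once the HTML string exists.
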
However, in a thick client application, the usual data flow is `database -> server data structures -> JSON -> client data structures -> DOM`. Every one of those interfaces usually means another layer of translation code, which (in the simplest case) must be updated for every new feature added to the application.

Many attempts have been made to ease this issue by automating or genericizing data translation code. ORMs, reactive Javascript frameworks, template languages, and even data serialization formats themselves can all be seen as means of tackling this problem. However, the project of reducing global complexity by adding new libraries and layers of generic code is a famously fraught one. Library code, DSLs and introspective code are all inherently high-complexity domains in and of themselves; using them in a way that reduces global complexity is an extremely tricky design problem, one that toolmakers often fumble. A more reliable way of avoiding this class of problem is simply to architect your application to reduce the number of interfaces to begin with.

### Appendix II: why HTMX?

In Web programming, of course, the original and still-used hypermedia architecture is the browser itself, via full page loads and MPAs. However, there are also quite a few different solutions available for using hypermedia approaches while still retaining in-page interactivity. This can be done manually and ad hoc, as in the old "jQuery-style"; there are options that integrate tightly with a given server-side framework, such as Laravel Livewire; and there are standalone libraries for similar functionality, such as Unpoly or HTMX.

Among these options, why HTMX in particular? This decision is of course heavily impacted by circumstances, and much of it comes down to aesthetic taste. However, I will give my reasons here in case readers share tastes similar to mine.

Of the options I have seen, HTMX feels the most like a simple extension to HTML.
It takes the existing logic of HTML interactive elements and extends it to the new user-interface paradigm of AJAX requests and in-place DOM updates. This shares in both the strengths and the weaknesses of HTML as a language; for instance, it is bare-bones enough in its facilities, and primitive enough in its default styling, that achieving what is often thought of as a basic UI experience takes quite some effort. However, for those with an appreciation for HTML, HTMX is a straightforward and pleasant extension of that model.

One of the guiding principles of HTMX is "HOWL": "hypermedia on whatever you like". That is to say, like HTML, it is entirely backend-agnostic. HTMX works purely on the level of web standards: it issues plain HTTP requests and parses HTML in reply. Thus, any backend system that can reply to HTTP requests can be used with HTMX to provide an interactive UI. This comes at a minor cost to integration: working with HTMX to its fullest potential will require some manual (though easily-abstracted-out) grubbing through headers. If your backend system already provides a hypermedia option, it may well be better-integrated. However, the flexibility of HTMX makes it an attractive option across a wide variety of contexts.

Perhaps the closest alternative to HTMX at the current time is Unpoly. Like HTMX, it is backend-agnostic and offers hypermedia-style UI updates via custom attributes. My preference for HTMX over Unpoly is primarily aesthetic. Unpoly has too many features, is too opinionated[^7] and includes too much magic.

[^1]: One approach to optimizing interface glue logic involves reducing the amount of code required by adding magic to the system: new layers that infer facts about the interface via declarative descriptions or introspection, then generate the code required to handle it.
However, this conceals a nasty trap: there is an optimization frontier between the developer effort required to create this code and the ability of a developer to understand the whole system afterward. Often, adding too much magic along these lines results in a system that takes no effort to use, and is impossible to fix when it breaks.

[^2]: Whether the HTML-CSS-JS complex, even when used purely for interface description without any application logic, counts as "simple" is surely a matter that could be vigorously disputed.

[^3]: In practice, most applications using the hypermedia pattern will still include at least some client-side code to handle purely client-side interactivity. Most such cases could still be pushed to the server to make a "pure" hypermedia app, but the gain in purity may come at a heavy cost in performance and complexity.

[^4]: The exception here, of course, is Google's collaborative editing functionality, which, not coincidentally, is the part of the application that is most focused on the Web app core competency of communication with other users.

[^5]: Beware, though, the social dynamics of such a situation: many such projects end up generating within themselves the complexity necessary to justify their large developer teams, even when a much lesser level would be capable of doing the job.

[^6]: Note, of course, that any number of other reasons (existing codebase, availability of developers, ecosystem of extension software, etc.) may militate against actually using a hypermedia system. Make decisions based on a holistic assessment, not a purity test.

[^7]: Of course, being opinionated can be a good thing, depending on context. HTMX is more opinionated than the free-form "jQuery-style" manual hypermedia approach, and is thus much more pleasant to work with.