HTTP Feeds

To allow software to be developed efficiently and in a targeted manner across multiple teams, software systems can be divided along the boundaries of business functions using architectural concepts such as microservices and Self-contained Systems. This separation gives rise to interfaces between the systems, because they need access to data from other domains, such as orders or customer data, and may have to trigger follow-up actions such as invoicing or reporting.

Interfaces can be designed synchronously or asynchronously. Synchronous interfaces are remote function calls that return an answer to a query. Although they are generally easy to implement, it is better not to use them in business-critical system components: the overall availability is the product of the availabilities of the interdependent systems, and the latencies of the individual calls add up. As an example, the availability of an online shop should ideally not depend on whether the financial accounting system is currently blocked by a maintenance window. Conversely, the monthly financial reporting should not impair the online shop by issuing large numbers of GET requests.
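The effect on availability and latency can be illustrated with a short calculation. The following is only a sketch: the function names and the figures for the shop and the accounting system are hypothetical and chosen purely for illustration.

```python
from math import prod


def chain_availability(availabilities: list[float]) -> float:
    """Availability of a synchronous call chain: every system must be up."""
    return prod(availabilities)


def chain_latency_ms(latencies_ms: list[float]) -> float:
    """Latency of sequential synchronous calls: the individual latencies add up."""
    return sum(latencies_ms)


# Hypothetical figures: shop 99.9 % available, accounting 99.5 % available.
combined = chain_availability([0.999, 0.995])
print(f"{combined:.4%}")  # lower than either system on its own
```

Every additional synchronous dependency lowers the combined availability further, which is the main argument for decoupling the systems.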

Example of a synchronous GET call of an HTTP API:

GET /api/customers/387484543
200 OK
{
	"customerId": "387484543",
	"firstname": "Alice",
	"lastname": "Springs",
	"email": "[email protected]",
	"newsletter": true,
	"termsAccepted": ["2018-01-01"],
	"locked": false,
	"created": "2018-04-12T12:13:58.876111Z",
	"updated": "2021-09-29T11:44:46.254440Z"
}

To prevent the described problems, developers should instead use asynchronous interfaces. These are often used in batch processing. For example, a file export is generated at night and then made available to other systems on a file server. The main problem here is the age of the data: no one wants to work with stale data.

Message broker systems lead to new dependencies in the overall system (Fig. 1)

To exchange data asynchronously and in near real time, message-oriented broker systems have become established. A system sends a message via a protocol such as the Advanced Message Queuing Protocol (AMQP) to a topic, and a registered consumer processes it as soon as it is ready to do so. The broker acts as a middleware system that buffers and manages the messages. Typical examples of this type of middleware are MQSeries, Apache Kafka, and RabbitMQ, but also Pub/Sub, which is used in Google Cloud, or AWS SQS.

Middleware Leads to Dependencies

The cross-system use of such broker systems as an interface technology generally leads to organizational and technical dependencies. Above all, it raises the question of organizational responsibility: which team takes care of this system, including installation, operation, and the coordination of version updates? And who is responsible for the costs? The team in question is also always involved whenever new topics or permissions have to be set up.

Moreover, the choice of broker system has technical consequences for the systems concerned. All parties must use version-compatible client libraries, and a serialization format is prescribed; formats such as Avro or Protobuf additionally require dedicated schema management. If an error occurs, for example when a message cannot be delivered to the broker, a laborious, cross-team error analysis is often the only solution.

HTTP Feeds as an Alternative

Developers should avoid such middleware dependencies, or at least minimize them. This is possible because asynchronous interfaces can be implemented without any middleware, by exposing classic HTTP endpoints as feeds. A feed makes a chronological sequence of data available.

Conceptual representation of a data feed (Fig. 2)

The systems responsible for the data provide an HTTP endpoint through which the data can be queried. Consumers access it via GET requests and can both process historical data once and process new data continuously.

Data Replication or Event Feeds – Or Both?

HTTP feeds can be used in a wide range of scenarios to decouple systems. In the case of data replication, all entities of a business object are made available in a defined format, and the receiving systems (consumers) store the entries read from the feed in the required form in a local database. Every business object appears at least once in the feed; whenever it changes, its complete current state is appended to the end of the feed as a new entry. Older entries that have the same function key (for example customer number or order number) can be dropped. Subscribers to the feed are thus kept fully up to date on all business objects and can also reload them at a later date, for example when a further attribute is to be stored in the local database. Data replication allows quick lookups independently of the availability of the source system, for example to read customer data on the basis of a customer number or to run data analysis in a separate analysis system.

Event feeds are a further form of feed. Here, domain events are published in a feed. They describe what occurred in a system at a specific point in time, but do not contain all information about the business object concerned. For example, a fraud check may have produced a negative result (suspicion of fraud), or a customer may have changed their email address. The consumers of the feed can trigger actions relevant to their own business context on the basis of these events.

Interestingly, data feeds can also trigger corresponding actions: When a new or updated entity is present in the feed, a corresponding function-specific action can be triggered in the consumer system. For example, a new entry in a customer feed can be used to enter the email address of a customer into the email marketing system.

In both cases it is always important to watch out for functional and technical idempotence in the consumer system, as it can often happen that a feed is reloaded for technical reasons. In addition, a process should be agreed upon for when a business object is to be deleted.
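A minimal sketch of an idempotent consumer could look as follows. The entry layout (`id`, `key`, `data`) and the class and attribute names are assumptions made for this example; they are not prescribed by any feed format mentioned here.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class CustomerReplica:
    """Local copy of customer data, filled from a (hypothetical) customer feed."""

    customers: dict[str, dict[str, Any]] = field(default_factory=dict)
    seen_entry_ids: set[str] = field(default_factory=set)
    marketing_registrations: list[str] = field(default_factory=list)

    def apply(self, entry: dict[str, Any]) -> None:
        data = entry["data"]
        # Technical idempotence: an upsert by function key yields the same
        # stored state no matter how often the same entry is applied.
        self.customers[entry["key"]] = data
        # Functional idempotence: the follow-up action runs once per entry id,
        # even if the feed is reloaded from the beginning.
        if entry["id"] not in self.seen_entry_ids:
            self.seen_entry_ids.add(entry["id"])
            self.marketing_registrations.append(data["email"])
```

In a real consumer, the set of processed entry ids would have to be persisted together with the replicated data, so that it survives a restart.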

Implementation of HTTP Feeds

HTTP feeds can be implemented via a range of different technologies, data formats, and protocols.

Atom

One of the most prominent feed technologies is the Atom Syndication Format. Comparable to the RSS standard, news websites and blogs typically use Atom to make available new articles in the form of an XML interface. This format contains a specific data model that is optimized for news and blog entries, but which can actually be used for any data sets.

However, Atom only describes a format, not the protocol that governs how the information is retrieved. How new entries are read continuously is therefore not specified by Atom.

Polling of REST APIs

A feed can also be based entirely on pure REST APIs, with new entries available in chronologically ascending order as a collection in an HTTP GET endpoint. In order to avoid excessively long response times with large data volumes, it is recommended to limit the number of entries per request, for example to 1,000 entries. Further data can be retrieved by means of links, until all have been processed.
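On the server side, such a paginated endpoint could be sketched roughly as follows. The function name, the `/customers` path, the `after` query parameter, and the response layout with `items` and `next` are assumptions for illustration only.

```python
from typing import Optional


def read_page(entries: list[dict], after_id: Optional[str], limit: int = 1000) -> dict:
    """Return up to `limit` entries that follow `after_id`, plus a next link.

    `entries` is the complete feed in chronologically ascending order.
    An unknown `after_id` raises ValueError in this sketch.
    """
    start = 0
    if after_id is not None:
        ids = [entry["id"] for entry in entries]
        start = ids.index(after_id) + 1
    items = entries[start:start + limit]
    # With an empty page the link keeps pointing at the same position,
    # so the consumer can simply repeat the request later.
    last_id = items[-1]["id"] if items else after_id
    next_link = "/customers" if last_id is None else f"/customers?after={last_id}"
    return {"items": items, "next": next_link}
```

A production implementation would look up the position with an indexed database query rather than a linear search through a list.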

In the REST API the chronologically arranged entries are concatenated by a next link (Fig. 4)

A subscription is implemented by means of a polling process: As soon as all historical entries have been processed, the consumer receives an empty array as response. The consumer now simply sends further requests to the last linked URL in an endless cycle. In classical polling, empty responses are followed by a defined pause, before the check for new entries is undertaken with a further GET call.
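The classical polling loop described above can be sketched like this. The `fetch` callable is a hypothetical stand-in for the actual HTTP GET call and is assumed to return the entries of a page together with the next link; `max_empty_polls` exists only so that the sketch can terminate in a test, whereas a real subscriber would run indefinitely.

```python
import time
from typing import Callable, Optional

# Hypothetical signature: fetch(url) -> (entries, next_url or None)
Fetch = Callable[[str], tuple[list[dict], Optional[str]]]


def poll_feed(
    fetch: Fetch,
    handle: Callable[[dict], None],
    start_url: str,
    pause_seconds: float = 5.0,
    max_empty_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Process all entries, then keep polling the last linked URL."""
    url = start_url
    empty_polls = 0
    while max_empty_polls is None or empty_polls < max_empty_polls:
        entries, next_url = fetch(url)
        if entries:
            empty_polls = 0
            for entry in entries:
                handle(entry)
            # Follow the link to the next page; stay on the current URL
            # if the server did not provide one.
            url = next_url or url
        else:
            # Empty response: wait for the defined pause, then ask again.
            empty_polls += 1
            sleep(pause_seconds)
    return url
```

The returned URL marks the position reached so far; a consumer that persists it can resume from there after a restart instead of reading the whole feed again.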

By means of a long-polling process, latency in the processing of new events can be minimized (Fig. 5)

Alternatively, the long-polling process offers the server the option of keeping the request open until either new data are available or a defined time-out has expired. The client can then immediately issue a new query. The long-polling process minimizes latency during the processing of new entries, at the cost of open connections to the server. The open source library http-feeds implements such a long-polling approach by means of REST APIs for both data feeds and event feeds.
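The server side of long polling can be sketched with the Python standard library. This is an in-memory illustration of the waiting behaviour only, under the assumption that feed positions are simple list indexes; it is not how the http-feeds library is implemented, and the class and method names are made up for this example.

```python
import threading


class InMemoryFeed:
    """Feed that lets readers wait until new entries arrive or a time-out expires."""

    def __init__(self) -> None:
        self._entries: list[dict] = []
        self._changed = threading.Condition()

    def append(self, entry: dict) -> None:
        with self._changed:
            self._entries.append(entry)
            # Wake up all requests that are currently held open.
            self._changed.notify_all()

    def read_after(self, position: int, timeout_seconds: float) -> list[dict]:
        """Return the entries after `position`, waiting up to the time-out if none exist yet."""
        with self._changed:
            self._changed.wait_for(
                lambda: len(self._entries) > position,
                timeout=timeout_seconds,
            )
            # Empty list if the time-out expired without new data.
            return self._entries[position:]
```

A request handler would call `read_after` with the consumer's position and the agreed time-out and return the result as the HTTP response. Each waiting request holds a connection (and here a thread) open, which is the cost mentioned above.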

Server-Sent Events according to HTML5 Standard

A third option for the implementation of HTTP feeds is offered by Server-Sent Events. As suggested by the name of this HTML standard, the data is communicated in one direction only, from the server to the client. The client system establishes a permanent connection to an HTTP endpoint, via which the server writes data with the MIME type text/event-stream without ending the connection. The client can continually read and process the data sets, which are separated from one another by a blank line. If the connection breaks, the client reconnects and sends the ID of the last data set it received in the Last-Event-ID header field, so that processing resumes from that data set.
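How a client splits such a stream into data sets can be sketched with a small parser. It handles the `id`, `event`, and `data` fields and comment lines of the text/event-stream format, but it is a simplified illustration rather than a complete implementation of the standard (for example, the `retry` field is ignored). The function name and the shape of the yielded dictionaries are assumptions.

```python
from typing import Iterable, Iterator


def parse_event_stream(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dictionary per data set of a text/event-stream response."""
    last_event_id = None  # persists across data sets until a new id arrives
    event_type = "message"
    data: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current data set.
            if data:
                yield {"id": last_event_id, "event": event_type, "data": "\n".join(data)}
            event_type, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if name == "id":
            last_event_id = value
        elif name == "event":
            event_type = value
        elif name == "data":
            data.append(value)
```

The `id` of the last data set that was fully processed is the value a client would send in the Last-Event-ID header when it reconnects.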

Streamed Server-Sent Events are separated by blank lines (Fig. 6)

Server-Sent Events are suitable for the asynchronous provision of data and can be used in all popular browsers by means of JavaScript. However, the connection handling, which is designed for long-lived connections, complicates operation and debugging, especially for endpoints that contain millions of entries. In addition, the response is not pure JSON, which can limit compatibility with existing tooling.

Housekeeping and GDPR Requirements

Feeds generally contain historical data. This raises important questions: How long can the data be held, and how is it to be deleted if, for example, a user demands it, as is their right under the General Data Protection Regulation?

How long events must be retained is first and foremost a business question. It is, however, generally advisable to keep the feed compact so that new consumers can quickly read it from beginning to end. For event feeds, for example, a retention period of four weeks could be sensible, with older entries being deleted.

In the case of data feeds, which have to contain all entities, the use of a function key that identifies the business entity is recommended. This can be, for example, a customer number, an order number, or a corresponding URI. The feed implementation deletes all previous entries with the same function key as soon as a new entry with the complete current state of this entity is added. In this way, every entity is contained only once in the feed and is always up to date. This approach is known as compaction.
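A compaction run can be sketched in a few lines. The entry layout with a `key` field for the function key is an assumption for this example, and the feed is modelled as a plain list rather than a database table.

```python
def compact(entries: list[dict]) -> list[dict]:
    """Keep only the most recent entry per function key, preserving feed order."""
    latest_index: dict[str, int] = {}
    for index, entry in enumerate(entries):
        # Later entries overwrite earlier ones with the same function key.
        latest_index[entry["key"]] = index
    return [
        entry
        for index, entry in enumerate(entries)
        if latest_index[entry["key"]] == index
    ]
```

Because the surviving entries keep their relative order, a consumer that has already read past a removed entry is not affected by the compaction run.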

By means of a compaction run, older entries with the same function key are removed from the feed (Fig. 7)

The function key can also be used for DELETE events. For this purpose, an empty feed entry with the same function key and a DELETE marker is added. After a compaction run, all previous entries with this function key are gone. Consumers of the feed can react to these DELETE events by in turn deleting the corresponding entries in their own data sets.
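On the consumer side, handling such DELETE entries could look like the following sketch. It assumes that the marker is carried in a `method` field and that the local data set is a dictionary keyed by function key; both are illustrative choices, not part of any specification cited here.

```python
from typing import Any


def apply_entry(store: dict[str, Any], entry: dict[str, Any]) -> None:
    """Apply a data feed entry to the local store, honouring DELETE markers."""
    if entry.get("method") == "DELETE":
        # Removing a key that is already gone is harmless, so a reloaded
        # feed can replay the DELETE entry without causing errors.
        store.pop(entry["key"], None)
    else:
        store[entry["key"]] = entry["data"]
```

Since the DELETE entry itself remains in the feed after compaction, consumers that join later still learn that the entity no longer exists.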
