ED2K Protocol Parsing Framework

1. Introduction, rationale

The HydraNode ED2K Protocol Parsing Framework provides a generic API for parsing and handling ED2K packets, as well as for constructing and sending them. It takes an object-oriented approach to protocol parsing instead of the traditional procedural one. Allow me to elaborate.

In traditional protocol parsers, data is parsed at the point where it is received, and handled as part of the parsing. This results in code similar to the following (trivial example):

uint8_t opcode;
socket->read(&opcode, 1);
switch (opcode) {
	case OP_SERVERMESSAGE: {
		uint16_t len;
		socket->read(&len, 2);
		bswap16(len);
		char msg[len+1];
		socket->read(msg, len);
		msg[len] = '\0';
		printf("Received server message: %s\n", msg);
		break;
	}
	case OP_IDCHANGE: {
		uint32_t newid;
		socket->read(&newid, 4);
		bswap32(newid);
		if (newid <= 0x00ffffff) {
			// handle low-id things ...
		} else {
			// handle high-id things ...
		}
		break;
	}
	case ....
	default: ....
}

As can be seen, this quickly escalates into huge switch statements, which in turn make functions huge and send indentation levels skyrocketing. In the end, you have rather unmaintainable code (not to mention that it looks plain ugly). Now, it can of course be cleaned up somewhat by moving each of the switch branches into a separate function, but that's not my point - the base problem, huge switch statements, still remains.

Instead of the above, I'd like to suggest an object-oriented approach to protocol parsing. The remainder of this document describes one such approach, which is used in the HydraNode ED2K protocol module for parsing the ed2k protocol.

2. Expected syntax

The resulting syntax for using the parser should be similar to the following:

	// 1. Incoming data
	// 1a. Set handlers
	void onServerMessage(const ServerMessage &p);
	void onIdChange(const IdChange &p);
	void onServerStatus(const ServerStatus &p);
	DECLARE_PACKET_HANDLER(ServerMessage, &onServerMessage);
	DECLARE_PACKET_HANDLER(IdChange, &onIdChange);
	DECLARE_PACKET_HANDLER(ServerStatus, &onServerStatus);
	// 1b. Read data from stream/socket and send to parser
	socket >> parser;
	// 1c. Parser calls the right functions when different packets are
	//     encountered.

	// 2. Sending packets
	socket << LoginRequest(prefs.getNick(), prefs.getPort());
	socket << Search("knoppix");

As you can see, the syntax is object-oriented: packets are objects, packet parsing and handling are decoupled, and no parsing happens in the handlers. How do we accomplish this? Read on!

3. The parser

First and foremost, we need to define a set of objects which represent the various packets used in the protocol. In our minimal example, we could define LoginRequest, ServerStatus, IDChange and ServerMessage objects. Some objects may be used both for sending and receiving, while others may only be sent - the objects themselves can control that. If we define an implicit conversion operator to std::string for each of those packets, we can easily accomplish the syntax displayed in part 2 of the example above. An example packet object interface might look like this:

class LoginRequest : public Packet {
public:
	LoginRequest(const std::string &nick, uint16_t port);
	// Format the packet and return it as a string
	operator std::string();
};
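
As a minimal sketch of how the implicit conversion makes `socket << packet` work, the block below uses a hypothetical in-memory FakeSocket and an illustrative wire layout (opcode, 16-bit nick length, nick, 16-bit port, all little-endian). Neither the layout nor the placeholder opcode is taken from the real ED2K login packet; putU16 is likewise an assumed helper.

```cpp
#include <cassert>
#include <stdint.h>
#include <string>

// Hypothetical stand-in for a real socket: it just collects written bytes.
struct FakeSocket {
	std::string sent;
};

// Writing a string appends its bytes to the socket.
inline FakeSocket& operator<<(FakeSocket &s, const std::string &bytes) {
	s.sent += bytes;
	return s;
}

// Append a 16-bit value in little-endian byte order (assumed helper).
inline void putU16(std::string &out, uint16_t v) {
	out += static_cast<char>(v & 0xff);
	out += static_cast<char>((v >> 8) & 0xff);
}

class LoginRequest {
public:
	LoginRequest(const std::string &nick, uint16_t port)
	: m_nick(nick), m_port(port) {}

	// Illustrative layout only: opcode, nick length, nick, port.
	operator std::string() const {
		std::string out;
		out += static_cast<char>(0x01); // placeholder opcode
		putU16(out, static_cast<uint16_t>(m_nick.size()));
		out += m_nick;
		putU16(out, m_port);
		return out;
	}
private:
	std::string m_nick;
	uint16_t m_port;
};
```

With these definitions, `FakeSocket s; s << LoginRequest("bob", 4662);` converts the temporary packet to a string and appends it. Note that this relies on operator<< being an ordinary function; the standard stream inserter for std::string is a function template, so the same implicit conversion would not be picked up when writing to a std::ostream.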

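Now we get to the interesting part. The parser is sent a stream of data. The parser must then perform the following operations: identify the type of each packet from its opcode, construct the matching packet object from the data, and pass that object on to the code that handles it.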

There are several problems with this. First and foremost, how will the parser know which packet to construct? One idea would be a big switch statement over all known opcodes, with a single packet type constructor in each branch. However, since we are trying to avoid switch statements altogether, this is not the way. Instead, let's implement the Abstract Factory design pattern.

Each and every known packet type is accompanied by a factory class, which is capable of constructing that packet type. ServerMessageFactory constructs ServerMessage objects, IDChangeFactory constructs IDChange packets, and so on. All those factories are derived from the PacketFactory abstract base class. The PacketFactory class registers itself with the main parser class upon construction, thus building up a map from opcodes to the PacketFactory objects that can create the corresponding packets. The code might look like this:

class PacketFactory {
public:
	virtual void create(const std::string &data) = 0;
protected:
	PacketFactory(uint8_t opcode) {
		Parser::addFactory(opcode, this);
	}
};
class IDChangeFactory : public PacketFactory {
public:
	IDChangeFactory() : PacketFactory(OP_IDCHANGE) {}
	virtual void create(const std::string &data) { /* parse */ }
};

Using this approach, the Parser can perform a quick lookup on its map<id, factory*> to find the correct factory, and direct the construction to the specific factory through a virtual function call.
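
The registration and lookup described above can be sketched in a self-contained form. This is only an illustration under assumptions: Parser::dispatch and the function-local static map are hypothetical choices, and the concrete factory merely records the data it was handed instead of parsing it. The opcode 0x40 is the one the sample implementation in section 5 uses for ID change packets.

```cpp
#include <cassert>
#include <map>
#include <stdint.h>
#include <string>

class PacketFactory;

// Holds the opcode -> factory map; factories add themselves on construction.
class Parser {
public:
	static void addFactory(uint8_t opcode, PacketFactory *f) {
		factories()[opcode] = f;
	}
	// Returns true if a factory was registered for the opcode.
	static bool dispatch(uint8_t opcode, const std::string &data);
private:
	// Function-local static: the map is constructed on first use, so
	// factories may register from other static objects' constructors.
	static std::map<uint8_t, PacketFactory*>& factories() {
		static std::map<uint8_t, PacketFactory*> s_map;
		return s_map;
	}
};

class PacketFactory {
public:
	virtual void create(const std::string &data) = 0;
protected:
	PacketFactory(uint8_t opcode) { Parser::addFactory(opcode, this); }
	virtual ~PacketFactory() {}
};

inline bool Parser::dispatch(uint8_t opcode, const std::string &data) {
	std::map<uint8_t, PacketFactory*>::iterator it = factories().find(opcode);
	if (it == factories().end()) {
		return false; // unknown opcode
	}
	it->second->create(data); // virtual call selects the concrete factory
	return true;
}

// Hypothetical concrete factory; it only records what it was given.
class IDChangeFactory : public PacketFactory {
public:
	IDChangeFactory() : PacketFactory(0x40) {}
	virtual void create(const std::string &data) { lastData = data; }
	std::string lastData;
};
```

Once an IDChangeFactory object exists, `Parser::dispatch(0x40, data)` reaches its create() without any switch statement; an opcode with no registered factory simply returns false.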

The only thing left to do now is to inform the user of the parser that a new packet has been received. Since we know nothing about the user class, templates come to the rescue. First and foremost, declare Parser as a template class, and declare _all_ factories as template classes too. The logic here is that there is one class per stream that wishes to know about that stream's events, so the parser requires a Parent pointer upon construction: a pointer to the object that wishes to receive these events. The parser itself cannot call the parent class's functions, though, since it doesn't know about the specific packet types. That job falls to the specific factories, which is why they are made template classes as well. The create() member function is modified to take a Parent* argument, and once the packet is constructed, the factory calls parent->onPacket(packet).

This creates another problem, however: system initialization. Template classes are not instantiated unless they are used, so simply constructing a parser does no good, because the factories would never be instantiated. The user must therefore explicitly instantiate all the factories one by one, using itself as the template parameter:

class Server {
public:
	Server() {
		static IDChangeFactory<Server> s1(this);
		static ServerMessageFactory<Server> s2(this);
	}
	void onPacket(const IDChange &p);
	void onPacket(const ServerMessage &p);
};

This instantiates the above two factories. What's interesting about this approach is that the Server object only needs to define onPacket() functions for those packet types whose factories it instantiated. Since the factories make direct function calls, instantiating a factory without a corresponding onPacket() overload causes a compile-time error in that factory. Leaving a factory uninstantiated does no harm, however: it is never compiled, and so never gets to make the (unsupported) function call in the first place.
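
The following is a minimal sketch of the templated arrangement, not the HydraNode code itself. The packet structs, the dispatch() helper and the placeholder decoding inside the factories are assumptions made for illustration. Server defines only onPacket(const IDChange&); because ServerMessageFactory<Server> is never instantiated, the missing ServerMessage overload is not an error.

```cpp
#include <cassert>
#include <map>
#include <stdint.h>
#include <string>

// Hypothetical packet types carrying already-decoded values.
struct IDChange { uint32_t id; };
struct ServerMessage { std::string msg; };

template<class Parent>
class Parser {
public:
	class PacketFactory {
	public:
		virtual void create(Parent *parent, const std::string &data) = 0;
	protected:
		PacketFactory(uint8_t opcode) { Parser::factories()[opcode] = this; }
		virtual ~PacketFactory() {}
	};

	explicit Parser(Parent *parent) : m_parent(parent) {}

	// Returns false when no factory is registered for the opcode.
	bool dispatch(uint8_t opcode, const std::string &data) {
		Iter it = factories().find(opcode);
		if (it == factories().end()) return false;
		it->second->create(m_parent, data);
		return true;
	}
private:
	typedef typename std::map<uint8_t, PacketFactory*>::iterator Iter;
	static std::map<uint8_t, PacketFactory*>& factories() {
		static std::map<uint8_t, PacketFactory*> s_map;
		return s_map;
	}
	Parent *m_parent;
};

template<class Parent>
class IDChangeFactory : public Parser<Parent>::PacketFactory {
public:
	IDChangeFactory() : Parser<Parent>::PacketFactory(0x40) {}
	virtual void create(Parent *parent, const std::string &data) {
		IDChange p;
		p.id = 0;
		// Placeholder decoding: up to four little-endian bytes.
		for (std::string::size_type n = 0; n < 4 && n < data.size(); ++n) {
			p.id |= static_cast<uint32_t>(
				static_cast<unsigned char>(data[n])) << (8 * n);
		}
		parent->onPacket(p); // resolved against Parent at instantiation
	}
};

template<class Parent>
class ServerMessageFactory : public Parser<Parent>::PacketFactory {
public:
	ServerMessageFactory() : Parser<Parent>::PacketFactory(0x38) {}
	virtual void create(Parent *parent, const std::string &data) {
		ServerMessage p;
		p.msg = data;
		parent->onPacket(p);
	}
};

// Handles IDChange only; no ServerMessage overload is needed.
class Server {
public:
	Server() : lastId(0) {}
	void onPacket(const IDChange &p) { lastId = p.id; }
	uint32_t lastId;
};
```

Declaring a `ServerMessageFactory<Server>` object anywhere would make the compiler instantiate its create() and reject the call to the missing overload, which is the compile-time check described above.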

We can make the system slightly more user-friendly by creating some macros for instantiating the factories, like this:

#define DECLARE_PACKET_HANDLER(Client, PacketType) \
	static Factory_##PacketType<Client> s_fact_##PacketType

After renaming the factories from IDChangeFactory/ServerMessageFactory to Factory_IDChange, Factory_ServerMessage etc., we can now write the following in the client implementation file:

DECLARE_PACKET_HANDLER(Server, IDChange);
DECLARE_PACKET_HANDLER(Server, ServerMessage);
Server::Server() {
	INIT_PACKET_PARSER(Server, this);
}

The last macro expands to a static object of type Parser<Server>, which is the top-level parser class. This is needed for the parser's static packet types map, which stores the corresponding factories.
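
To make the macro expansions concrete, here is a small sketch using the preprocessor's token-pasting operator (##). The definition of INIT_PACKET_PARSER is not given in this document, so the one below is an assumption, as are the stub Parser and Factory_ templates, which only count how many objects get constructed.

```cpp
#include <cassert>

// Counts constructed objects so the effect of the macros can be observed.
static int g_constructed = 0;

// Stub templates standing in for the real parser and factories.
template<class Client> struct Parser {
	explicit Parser(Client *) { ++g_constructed; }
};
template<class Client> struct Factory_IDChange {
	Factory_IDChange() { ++g_constructed; }
};
template<class Client> struct Factory_ServerMessage {
	Factory_ServerMessage() { ++g_constructed; }
};

// "##" pastes tokens: Factory_##IDChange becomes Factory_IDChange.
#define DECLARE_PACKET_HANDLER(Client, PacketType) \
	static Factory_##PacketType<Client> s_fact_##PacketType

// Assumed expansion: one static top-level parser bound to the client.
#define INIT_PACKET_PARSER(Client, ptr) \
	static Parser<Client> s_parser(ptr)

struct Server {
	Server() { INIT_PACKET_PARSER(Server, this); }
};

// Expands to: static Factory_IDChange<Server> s_fact_IDChange;
DECLARE_PACKET_HANDLER(Server, IDChange);
// Expands to: static Factory_ServerMessage<Server> s_fact_ServerMessage;
DECLARE_PACKET_HANDLER(Server, ServerMessage);
```

Because token pasting needs a single identifier, the PacketType argument has to be an unqualified name in this sketch. The static parser inside the constructor is created only once, by whichever Server object is constructed first.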

4. Summary

Looking back at our original syntax, we have arrived at almost exactly the syntax we expected. The only difference between the expected and the resulting syntax is the function naming scheme: while the original syntax used functions with pretty names, this implementation uses an overloaded function with a predefined name instead.

5. Sample implementation

Below is a short sample implementation of the system to give a better overview.

// packets.h      User frontend
/**
 *  Copyright (C) 2004 Alo Sarv <madcat_@users.sourceforge.net>
 *
 *  This program is free software; you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License as published by
 *  the Free Software Foundation; either version 2 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU General Public License for more details.
 *
 *  You should have received a copy of the GNU General Public License
 *  along with this program; if not, write to the Free Software
 *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 */

#ifndef __PACKETS_H__
#define __PACKETS_H__

#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <boost/format.hpp>
#include <hn/osdep.h>
#include <hn/log.h>
#include <hn/utils.h>
#include "tag.h"
#include "ed2k.h"
#include "opcodes.h"

// All supported packets. Each packet knows how to construct itself from stream,
// as well as how to output itself to stream. For input, std::istream is used,
// for output, operator std::string() is used, however these are implementation-
// defined.
namespace ED2KPacket {

class Packet {
public:
	Packet(uint8_t proto);
	virtual ~Packet() = 0;
protected:
	uint8_t m_proto;
};
class LoginRequest : public Packet {
public:
	LoginRequest(uint8_t proto = PROT_ED2K);
	~LoginRequest() {}
	// Construct a string containing the packet's representation in protocol
	operator std::string();
};
class ServerMessage : public Packet {
public:
	ServerMessage(const std::string &msg, uint8_t proto = PROT_ED2K);
	ServerMessage(std::istream &i, uint8_t proto);
	~ServerMessage() {}
	std::string getMsg() const { return m_msg; }
private:
	std::string m_msg;
};
class ServerStatus : public Packet {
public:
	ServerStatus(uint32_t users, uint32_t files, uint8_t proto = PROT_ED2K);
	ServerStatus(std::istream &i, uint8_t proto);
	~ServerStatus() {}
	uint32_t getUsers() const { return m_users; }
	uint32_t getFiles() const { return m_files; }
private:
	uint32_t m_users;
	uint32_t m_files;
};
class IdChange : public Packet {
public:
	IdChange(uint32_t id, uint8_t proto = PROT_ED2K);
	IdChange(std::istream &i, uint8_t proto);
	~IdChange() {}
	uint32_t getId() const { return m_id; }
private:
	uint32_t m_id;
};

} // namespace ED2KPacket

template<class Parent> class ED2KParser  {
public:
	ED2KParser(Parent *parent) : m_parent(parent) {}
	~ED2KParser() {}
	// Implementation-specific code, provided for example purposes.
	void parse(const std::string &data) {
		// Bytes left over from earlier calls. Being static, the buffer
		// is shared by all ED2KParser<Parent> objects.
		static std::string buf;
		buf += data;
		// Header: 1 byte protocol, 4 bytes length. The length counts
		// the opcode byte plus the payload that follows it.
		while (buf.size() >= 5) {
			std::istringstream i(buf);
			uint8_t proto = Utils::getVal<uint8_t>(i);
			uint32_t len = Utils::getVal<uint32_t>(i);
			if (buf.size() - 5 < len) {
				return; // We don't have full packet yet.
			}
			if (len == 0) {
				buf.erase(0, 5); // No opcode to dispatch on
				continue;
			}
			uint8_t opcode = Utils::getVal<uint8_t>(i);
			// Locate the right factory for this packet.
			Iter iter = s_factories.find(opcode);
			if (iter == s_factories.end()) {
				logWarning(
					boost::format(
						"Received unknown packet. "
						"protocol=%s length=%s"
						" opcode=%s Data:%s"
					) % Utils::hexDump(proto) % Utils::hexDump(len)
					% Utils::hexDump(opcode)
					% Utils::hexDump(buf.substr(0, len + 5))
				);
			} else {
				logDebug(
					boost::format(
						"Received packet. protocol=%s length=%s"
						" opcode=%s Data:%s"
					) % Utils::hexDump(proto) % Utils::hexDump(len)
					% Utils::hexDump(opcode)
					% Utils::hexDump(buf.substr(0, len + 5))
				);
				// Found the handler; pass it the bytes after the opcode.
				(*iter).second->create(
					m_parent, buf.substr(6, len - 1), proto
				);
			}
			// Drop the whole packet, header included, from the buffer.
			buf.erase(0, len + 5);
		}
	}

	// Abstract base
	class PacketFactory {
	public:
		virtual void create(
			Parent *parent, const std::string &data, uint8_t proto
		) = 0;
	protected:
		PacketFactory(uint8_t opcode) {
			logDebug(
				boost::format("Adding PacketFactory %s")
				% Utils::hexDump(opcode)
			);
			ED2KParser::s_factories.insert(
				std::make_pair(opcode, this)
			);
		}
		virtual ~PacketFactory() {}
	};

private:
	Parent *m_parent;
	static std::map<uint8_t, PacketFactory*> s_factories;
	typedef typename std::map<uint8_t, PacketFactory*>::iterator Iter;

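};

// Definition of the static factory map declared in the class above.
template<class Parent>
std::map<uint8_t, typename ED2KParser<Parent>::PacketFactory*>
	ED2KParser<Parent>::s_factories;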

#endif
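
The framing step inside parse() can also be looked at in isolation. The sketch below assumes the layout that parse() relies on: a 1-byte protocol marker, a 4-byte little-endian length, then that many bytes consisting of the opcode and its payload. Frame and extractFrames are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <stdint.h>
#include <string>
#include <vector>

struct Frame {
	uint8_t proto;
	uint8_t opcode;
	std::string payload; // bytes after the opcode
};

// Removes every complete packet from the front of buf and returns them.
// An incomplete trailing packet stays in buf until more data arrives.
inline std::vector<Frame> extractFrames(std::string &buf) {
	std::vector<Frame> out;
	while (buf.size() >= 5) {
		// Decode the little-endian length stored in bytes 1..4.
		uint32_t len = 0;
		for (int n = 0; n < 4; ++n) {
			len |= static_cast<uint32_t>(
				static_cast<unsigned char>(buf[1 + n])) << (8 * n);
		}
		if (buf.size() - 5 < len) {
			break; // wait for the rest of this packet
		}
		if (len > 0) { // a zero length carries no opcode; skip it
			Frame f;
			f.proto = static_cast<uint8_t>(buf[0]);
			f.opcode = static_cast<uint8_t>(buf[5]);
			f.payload = buf.substr(6, len - 1);
			out.push_back(f);
		}
		buf.erase(0, 5 + static_cast<std::string::size_type>(len));
	}
	return out;
}
```

Calling extractFrames() each time new data is appended yields only whole packets, so the code that dispatches on the opcode never sees a partial one.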

// PacketFactories header -> specific factories
/**
 *  Copyright (C) 2004 Alo Sarv <madcat_@users.sourceforge.net>
 *
 *  This program is free software; you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License as published by
 *  the Free Software Foundation; either version 2 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU General Public License for more details.
 *
 *  You should have received a copy of the GNU General Public License
 *  along with this program; if not, write to the Free Software
 *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 */

#ifndef __PACKETFACTORIES_H__
#define __PACKETFACTORIES_H__

// Specific packet creators. These need also to be in header, since they are
// templates.

template<class Parent>
class ServerMessageFactory : public ED2KParser<Parent>::PacketFactory {
public:
	ServerMessageFactory() : ED2KParser<Parent>::PacketFactory(0x38) {}

	virtual void create(Parent *parent, const std::string &data, uint8_t proto)
	{
		std::istringstream i(data);
		parent->onPacket(ED2KPacket::ServerMessage(i, proto));
	}
};

template<class Parent>
class ServerStatusFactory : public ED2KParser<Parent>::PacketFactory {
public:
	ServerStatusFactory() : ED2KParser<Parent>::PacketFactory(0x34) {}

	virtual void create(Parent *parent, const std::string &data, uint8_t proto)
	{
		std::istringstream i(data);
		parent->onPacket(ED2KPacket::ServerStatus(i, proto));
	}
};

template<class Parent>
class IdChangeFactory : public ED2KParser<Parent>::PacketFactory {
public:
	IdChangeFactory() : ED2KParser<Parent>::PacketFactory(0x40) {}

	virtual void create(Parent *parent, const std::string &data, uint8_t proto)
	{
		std::istringstream i(data);
		parent->onPacket(ED2KPacket::IdChange(i, proto));
	}
};

#endif
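
The factories above hand a std::istream to the packet constructors, which read their fields with Utils::getVal. Neither is shown in this document, so the following is a sketch under assumptions: getVal is taken to read a fixed-width unsigned integer in little-endian byte order, and readServerMessage is a hypothetical helper that reads a 16-bit length followed by the text, mirroring the server message layout of the section 1 example.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <stdint.h>
#include <string>

namespace Utils {

// Assumed behaviour: read sizeof(T) bytes and assemble them least
// significant byte first. Throws if the stream runs out of data.
template<typename T>
T getVal(std::istream &i) {
	T val = 0;
	for (unsigned n = 0; n < sizeof(T); ++n) {
		int c = i.get();
		if (!i) {
			throw std::runtime_error("unexpected end of stream");
		}
		val = static_cast<T>(val | (static_cast<T>(c & 0xff) << (8 * n)));
	}
	return val;
}

} // namespace Utils

// Hypothetical payload reader: 16-bit length, then that many bytes.
inline std::string readServerMessage(std::istream &i) {
	uint16_t len = Utils::getVal<uint16_t>(i);
	std::string msg(len, '\0');
	if (len > 0 && !i.read(&msg[0], len)) {
		throw std::runtime_error("truncated message");
	}
	return msg;
}
```

A constructor such as ServerMessage(std::istream&, uint8_t) could store the result of a reader like this in m_msg; throwing on short input keeps a malformed packet from producing a half-filled object.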

// User header
class ServerList {
public:
	// Singleton
	static ServerList& instance() {
		static ServerList s_sl;
		return s_sl;
	}
private:
	void onPacket(const ED2KPacket::IdChange &p);
	void onPacket(const ED2KPacket::ServerMessage &p);
	void onPacket(const ED2KPacket::ServerStatus &p);
	friend class IdChangeFactory<ServerList>;
	friend class ServerMessageFactory<ServerList>;
	friend class ServerStatusFactory<ServerList>;
};

// User implementation file
INIT_PACKET_PARSER(ServerList, &ServerList::instance());
DECLARE_PACKET_HANDLER(ServerList, ED2KPacket::IdChange);
DECLARE_PACKET_HANDLER(ServerList, ED2KPacket::ServerMessage);
DECLARE_PACKET_HANDLER(ServerList, ED2KPacket::ServerStatus);