HydraNode Core Framework


Copyright © 2004
Alo Sarv

Index

  1. Overview

  2. Startup and Shutdown

  3. Logging Subsystem

  4. Config Subsystem

  5. Event Handling Subsystem

  6. Networking Subsystem

  7. Range Management Subsystem

  8. Files Identification Subsystem

  9. Files Management Subsystem

  10. MetaData Subsystem

  11. Modules Management Subsystem

  12. Core/GUI Communication Subsystem

  13. Remarks

1. Overview

HydraNode Core Framework (framework) provides a generic interface forming a platform upon which to build a true multi-network p2p client. The framework itself does not communicate with any p2p networks, nor does it perform any operations on its own. The actual functionality is implemented in modules, which are loaded dynamically at runtime. The framework provides platform-independent wrappers for networking, events, hashing, logging, configuration and so on, which the modules can use as they please. It also provides a generic API for managing temporary download files, which may even be shared between multiple plugins, resulting in simultaneous multi-network downloads.

The only operation that the framework performs internally is file hashing and metadata extraction (provided by the Files Identification Subsystem). As such, all checksumming algorithms used by modules must be implemented directly in the core (additional ones may be added by modules, but this is not recommended). The reason for this is that we want a central database of all checksums and metadata, so that we can later move to true multi-network downloads across incompatible networks.

2. Startup and Shutdown

On application startup, the first things initialized are several Singleton objects with automatic initialization; these are constructed before the application enters the main() function, and consist mainly of factories in the Files Identification Subsystem. The remaining Singletons are then lazily initialized from the application entry point, HydraNode::run(). The following subsystems are brought up, in this order:

  1. Configuration
  2. Logging
  3. Networking
  4. Modules

The remaining subsystems are initialized upon first use.
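The lazy-initialization pattern used for these Singletons can be sketched as follows. This is an illustrative stand-in (the Config name here is hypothetical), showing only the idiom: the instance is constructed on first access rather than at static-initialization time.

```cpp
#include <cassert>

// Sketch of a lazily-initialized singleton; constructed on the first
// call to instance(), not before main() is entered.
class Config {
public:
    static Config& instance() {
        static Config cfg;      // constructed on first call only
        return cfg;
    }
    bool initialized() const { return true; }
private:
    Config() {}                 // private ctor: no stray instances
    Config(const Config&);      // non-copyable (pre-C++11 idiom)
};
```

Every call to instance() returns a reference to the same object, so subsystems that are never used are never constructed.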

3. Logging Subsystem

The Logging Subsystem provides a generic API for handling the various messages shown to the user to indicate changes, updates or errors. The following types of messages are supported:

logMsg() logs a plain message, with no additions.

logDebug() prefixes the message with "Debug:" and is only enabled in debug mode.

logWarning() prefixes the message with "Warning:".

logError() prefixes the message with "Error:".

logFatalError() prefixes the message with "Fatal Error:" and also calls abort().

logTrace() logs a message only if the specified trace mask is enabled.

The only class in the Logging Subsystem is the Log singleton, which acts as a container for trace masks and log targets. Trace masks may be string or integer masks; targets may be streams, files or sockets. When a target is added, all new messages are also sent to that target.
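A minimal sketch of this design, with trace masks gating logTrace() output and registered targets receiving every message. The names and signatures here are illustrative, not the actual HydraNode API:

```cpp
#include <cassert>
#include <ostream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Illustrative Log singleton: holds trace masks and output targets.
class Log {
public:
    static Log& instance() { static Log log; return log; }

    void addTraceMask(const std::string& mask) { masks_.insert(mask); }
    void addTarget(std::ostream* target)       { targets_.push_back(target); }

    void logMsg(const std::string& msg)     { emit(msg); }
    void logWarning(const std::string& msg) { emit("Warning: " + msg); }
    void logError(const std::string& msg)   { emit("Error: " + msg); }

    // Only logged when the named trace mask has been enabled.
    void logTrace(const std::string& mask, const std::string& msg) {
        if (masks_.count(mask)) emit("Trace(" + mask + "): " + msg);
    }
private:
    void emit(const std::string& msg) {
        for (size_t i = 0; i < targets_.size(); ++i)
            *targets_[i] << msg << '\n';
    }
    std::set<std::string> masks_;
    std::vector<std::ostream*> targets_;
};
```

A stream target can be any std::ostream, which covers the stream, file and socket cases uniformly in this simplified form.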

4. Config Subsystem

The Config Subsystem provides a generic API for storing key/value pairs and persists them between sessions. The elements may be structured in directory hierarchies for better overview and management. The API is implemented in the Config class; the configuration used by the framework itself can be accessed via the Prefs singleton.
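The directory-hierarchy idea can be sketched with '/'-separated key paths and a "current directory" that later keys are resolved against. This is an assumption-laden illustration, not the real Config interface:

```cpp
#include <cassert>
#include <map>
#include <string>

// Illustrative hierarchical key/value store; keys form paths such as
// "net/tcp/port", and setPath() changes the directory later keys are
// relative to.
class Config {
public:
    void setValue(const std::string& key, const std::string& value) {
        data_[dir_ + key] = value;
    }
    std::string getValue(const std::string& key,
                         const std::string& def = "") const {
        std::map<std::string, std::string>::const_iterator i =
            data_.find(dir_ + key);
        return i == data_.end() ? def : i->second;
    }
    void setPath(const std::string& dir) { dir_ = dir; }
private:
    std::string dir_;
    std::map<std::string, std::string> data_;
};
```

Persisting the map to disk on shutdown and reloading it on startup would give the between-sessions behaviour described above.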

5. Event Handling Subsystem

The Event Handling Subsystem provides the main application event loop and a de-centralized, type-safe event table mechanism based on templates. An EventTable object uses template parameters to define the event source object, event type and event handler type, and encapsulates the mechanisms for posting events, locating handlers and calling them. Each EventTable registers itself with the EventMain class, thus adding itself to the main application event loop. EventMain loops through its known EventTable objects and instructs them to handle events as necessary. If no events are pending, the system goes into a wait state (using boost::condition) to avoid constantly polling for events. When an event is posted, the thread wakes up and handles it.

The rationale behind this design is that events are generally posted from a separate thread context, so the main thread sleeps until there is an event, handles it, and goes back to sleep. This allows worker threads to perform their jobs and notify the main thread of progress or completion via events.

Users of the Event Handling Subsystem have a choice of deriving their event source classes from the EventSink class, which is a wrapper around EventTable, or creating a static EventTable member object in their class and providing a means of posting events to it (either through wrapper functions, or by accessing the object directly).

6. Networking Subsystem

The Networking Subsystem in the framework is a wrapper around platform-specific native sockets, e.g. Berkeley sockets on POSIX-compliant systems and WinSock on Microsoft Windows. As such, it provides a generic object-oriented wrapper around the low-level C sockets API. The following classes are defined in the subsystem for public use:

SocketServer is a listening socket which accepts incoming connections.

SocketClient is a connected TCP socket which can be used to send and receive data from remote peers.

Internally, the Networking Subsystem has one additional class, SocketWatcher, which polls sockets for data and multiplexes events. SocketServer and SocketClient are derived from EventSink and emit events upon status changes, which client code can handle.

All errors within the subsystem are handled by throwing exceptions from the context in which they occurred. Almost all public functions in the subsystem may throw, either directly or indirectly, so client code must be ready to catch SocketError exceptions at all times. This is done in favour of return-value-style error notification to force the user to handle errors: return values can be ignored, but exceptions cannot.
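The expected client-side pattern can be sketched as follows. SocketError and SocketClient here are simplified stand-ins for the framework classes, and the connect() validation is purely illustrative:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Stand-in exception type; the real SocketError would carry more
// detail (errno, failing operation, etc.).
class SocketError : public std::runtime_error {
public:
    explicit SocketError(const std::string& msg)
        : std::runtime_error(msg) {}
};

class SocketClient {
public:
    // Throws instead of returning an error code, so a failure cannot
    // be silently ignored by the caller.
    void connect(const std::string& host, int port) {
        if (host.empty() || port <= 0 || port > 65535)
            throw SocketError("invalid address");
        // ... a real implementation would open the connection here ...
    }
};

// Typical usage pattern: wrap socket calls in try/catch.
inline bool tryConnect(const std::string& host, int port) {
    try {
        SocketClient client;
        client.connect(host, port);
        return true;
    } catch (SocketError&) {
        return false;       // log and recover in real code
    }
}
```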

7. Range Management Subsystem

The Range Management Subsystem is implemented as a backend for the PartData object (from the Files Management Subsystem) to provide a generic API for managing ranges and range lists. The basic idea is that a range has a begin and an end, with optional data stored in it. Ranges can be kept in range lists, for example to indicate which parts of a file have already been downloaded or verified, or which hash corresponds to which range of a file.

The system is designed to be completely type-safe, using templates. A range may have an optional policy (set via template policy traits) indicating how to behave when two ranges overlap or border each other. The policy is used by the RangeList object to merge or split overlapping and bordering ranges.

A RangeList may contain only ranges of the same type and policy. Attempts to insert incompatible ranges into a RangeList result in compile-time errors.

The range with begin value 0 and end value 0 is reserved for internal use and must not be used by client code; attempts to use it will trigger assertion failures in the subsystem. This also means that all PartData range lists start at position 1, as opposed to the traditional position 0, which may be somewhat confusing at first.

The system enforces its types and usage very aggressively. As mentioned earlier, templates provide compile-time type safety. In addition, any attempt to misuse the system results in assertion failures and/or RangeError exceptions. Client code is not required to handle these exceptions, however, since they are triggered only when you are doing something wrong - if the exceptions or assertions fire, fix your code instead of handling them.

8. Files Identification Subsystem

The Files Identification Subsystem provides a generic API for checksumming files, verifying them and extracting their metadata. It is used as a backend by PartData for hashing and hash verification. Work is submitted to the subsystem by constructing a HashWork object on the heap, wrapping the pointer in boost::shared_ptr<>, and passing it to the Hasher class. The HashWork object describes the job to be performed - either a partial range verification job (in which case the work must contain a reference hash), or a full hash job (in which case all supported hashes, as well as metadata, are generated).

Internally, the Files Identification Subsystem uses a separate thread to perform the work. The thread class is not aware of the checksums it computes. Instead, it has a list of TransformerFactory objects capable of creating HashSetMaker objects on demand. During a full hash job, the basic operation is to request a HashSetMaker object from each of the registered factories and pass all read data to each of those makers. HashSetMaker objects describe how to create the hashes used in various p2p networks. For example, ED2KHashMaker creates an MD4 hash for each 9500 KB of data, plus an MD4 hash over the part hashes for the file hash. The HashSetMaker objects internally use various Transformer objects for the actual checksumming functionality - MD4Transform, MD5Transform, SHA1Transform, etc.

With this design it is very easy to add a new hash set to the subsystem - it basically comes down to implementing the specific HashSetMaker class and instantiating a factory that can create that type of hashes.
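The factory arrangement can be sketched as follows. The hashing loop knows only the abstract interfaces; a toy "length checksum" maker stands in for a real MD4/SHA1 maker, and all class names here are illustrative stand-ins:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Abstract maker: fed data incrementally, produces a result at the end.
class HashSetMaker {
public:
    virtual ~HashSetMaker() {}
    virtual void sumUp(const char* data, size_t len) = 0;
    virtual std::string finish() = 0;
};

// Abstract factory: the hashing thread holds a list of these.
class HashSetMakerFactory {
public:
    virtual ~HashSetMakerFactory() {}
    virtual HashSetMaker* create() const = 0;
};

// Toy maker standing in for a real checksummer.
class LengthMaker : public HashSetMaker {
public:
    LengthMaker() : total_(0) {}
    virtual void sumUp(const char*, size_t len) { total_ += len; }
    virtual std::string finish() {
        std::ostringstream s; s << "len:" << total_; return s.str();
    }
private:
    size_t total_;
};

class LengthMakerFactory : public HashSetMakerFactory {
public:
    virtual HashSetMaker* create() const { return new LengthMaker; }
};

// The hashing loop: one maker per factory, all fed the same data.
inline std::vector<std::string> hashAll(
        const std::vector<HashSetMakerFactory*>& factories,
        const char* data, size_t len) {
    std::vector<std::string> results;
    for (size_t i = 0; i < factories.size(); ++i) {
        HashSetMaker* m = factories[i]->create();
        m->sumUp(data, len);
        results.push_back(m->finish());
        delete m;
    }
    return results;
}
```

Adding a new network's hash set then means writing one maker/factory pair and registering the factory; the hashing thread itself never changes.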

Results of hashing are sent back to the user via HashEvent events, emitted from the HashWork object used earlier to post the work. Client code should always set a handler for the posted work in order to retrieve the results. If client code does not handle the event, the work is automatically deleted after passing through the event tables (which is also the reason for the boost::shared_ptr<> wrapper around the pointer - it provides automatic deletion).

9. Files Management Subsystem

The Files Management Subsystem is, arguably, the most important subsystem of the framework. It manages the lists of shared and temporary files, in the following class hierarchy:

The PartData object encapsulates information about a partial download. It uses the Range Management Subsystem to keep track of which ranges have been completed, which are pending, which need verification, which ranges we have hashes for, etc. It uses the Files Identification Subsystem to verify the data being downloaded, and the MetaData Subsystem to find the hashes used for verification. It also has a small internal thread used for moving a completed download to its destination location.

The SharedFile object encapsulates a file that is currently being shared. The file may be complete or partial; in the latter case the SharedFile has a pointer to the relevant PartData object. SharedFile also has a pointer to its corresponding MetaData object, where it stores most of its information. Only SharedFile can destroy PartData objects.

FilesList governs all SharedFile objects, parenting them and owning the pointers, thus making sure no loose SharedFile objects are left dangling around. Only the FilesList class can destroy SharedFile objects.
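The strict downward ownership described above can be sketched like this. The class names mirror the document, but the implementations are purely illustrative; the real classes carry far more state:

```cpp
#include <cassert>
#include <list>
#include <string>

// Illustrative PartData: partial-download state lives here.
class PartData {
public:
    explicit PartData(unsigned size) : size_(size) {}
    unsigned size() const { return size_; }
private:
    unsigned size_;
};

// SharedFile owns its (optional) PartData; only SharedFile's
// destructor ever deletes a PartData.
class SharedFile {
public:
    explicit SharedFile(const std::string& name, PartData* pd = 0)
        : name_(name), partData_(pd) {}
    ~SharedFile() { delete partData_; }
    bool isPartial() const { return partData_ != 0; }
    const std::string& name() const { return name_; }
private:
    std::string name_;
    PartData* partData_;
    SharedFile(const SharedFile&);            // non-copyable
    SharedFile& operator=(const SharedFile&);
};

// FilesList owns all SharedFile pointers; only FilesList's
// destructor ever deletes a SharedFile.
class FilesList {
public:
    ~FilesList() {
        for (std::list<SharedFile*>::iterator i = files_.begin();
             i != files_.end(); ++i)
            delete *i;
    }
    void add(SharedFile* f) { files_.push_back(f); }
    size_t size() const { return files_.size(); }
private:
    std::list<SharedFile*> files_;
};
```

Destroying the FilesList tears down every SharedFile, which in turn tears down any PartData, so nothing in the hierarchy can leak or dangle.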

10. MetaData Subsystem

The MetaData Subsystem is the most memory-heavy subsystem in the framework. It consists of a large number of containers for various kinds of data. The following objects are defined:

VideoMetaData encapsulates everything we know about video files - the length of the stream, the codecs used, bitrates, etc.

AudioMetaData stores information about audio files, including artists, track numbers, albums, release years and so on.

ImageMetaData stores information we know about images - height, width, format etc.

ArchiveMetaData stores information about archives - the format, compression ratio, uncompressed size etc.

Hash encapsulates a generic checksum of something. Hash type is defined by template parameter, so different hashes are compile-time incompatible with each other, providing safety against mixing different hash types in containers.

HashSet represents a set of hashes, generally computed over some amount of data. HashSet uses template parameters describing the contained hash types to provide compile-time type safety. A HashSet may have a file hash and part/chunk hashes.

The MetaData object owns and contains VideoMetaData, AudioMetaData, ImageMetaData and ArchiveMetaData objects, and also adds fields such as file size, known file names, etc.

MetaDb is the top-level container in the MetaData Subsystem, owning and containing MetaData objects. MetaDb provides a number of ways of locating specific MetaData about files, including hash-based, name-based and even SharedFile-based lookups. MetaDb stores its data between sessions in the metadb.dat file.
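The tag-based typing behind Hash can be sketched as below. Hash<MD4> and Hash<SHA1> share one implementation yet are distinct, incompatible types, which is what prevents mixing hash kinds in containers. The tag structs and digest representation here are illustrative assumptions:

```cpp
#include <cassert>
#include <string>

// Tag types: carry no data, only identity.
struct MD4  { static const char* name() { return "md4";  } };
struct SHA1 { static const char* name() { return "sha1"; } };

// One template, many incompatible instantiations.
template<typename HashType>
class Hash {
public:
    explicit Hash(const std::string& digest) : digest_(digest) {}
    const std::string& digest() const { return digest_; }
    static std::string typeName() { return HashType::name(); }

    // Comparison is only defined between hashes of the same type;
    // comparing Hash<MD4> with Hash<SHA1> is a compile-time error.
    bool operator==(const Hash& other) const {
        return digest_ == other.digest_;
    }
private:
    std::string digest_;
};
```

A std::vector<Hash<MD4> > simply cannot hold a Hash<SHA1>, so the container-safety claim above falls out of the type system for free.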

11. Modules Management Subsystem

The Modules Management Subsystem provides a platform-independent API for dynamically loading and initializing modules at runtime. It keeps track of which modules are loaded, and also verifies modules against an online modules database, denying attempts to load bogus or malicious modules.
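The bookkeeping side of module management can be sketched as follows. On POSIX the actual loading step would use dlopen()/dlsym(), and on Windows LoadLibrary()/GetProcAddress(); here a stub init function stands in so the tracking logic itself can be shown. All names are hypothetical:

```cpp
#include <cassert>
#include <map>
#include <string>

// Signature a module's entry point might have in this sketch.
typedef bool (*ModuleInitFunc)();

// Tracks which modules are loaded; refuses duplicates and failed inits.
class ModuleManager {
public:
    bool loadModule(const std::string& name, ModuleInitFunc init) {
        if (loaded_.count(name)) return false;    // already loaded
        if (!init || !init())    return false;    // init failed
        loaded_[name] = init;
        return true;
    }
    bool isLoaded(const std::string& name) const {
        return loaded_.count(name) != 0;
    }
private:
    std::map<std::string, ModuleInitFunc> loaded_;
};

inline bool dummyModuleInit() { return true; }
```

The online verification step described above would slot in before the init call, rejecting modules whose checksums are not in the trusted database.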

12. Core/GUI Communication Subsystem

The Core/GUI Communication Subsystem provides a means of attaching a user interface to the framework, giving a graphical front-end control over the framework. The communication is performed over TCP using the binary Core/GUI Communication Protocol.

13. Remarks

First of all, this document is far from finished; however, it is more up to date than the original Core Design document. While the UML diagrams are missing, I believe a more generic overview of the framework was needed to give new developers a quick glance at the design.

The entire framework relies strongly on templates to provide the maximum possible amount of compile-time type safety. In addition, it uses very aggressive debugging tactics - in theory it should never reach a state where it SIGSEGVs; assertions or exceptions should trigger before it gets there. As such, it is very sensitive to tampering - incorrect usage should quickly surface via exceptions or assertion failures. Maintainers are encouraged to continue this trend, adding more checks as new bugs are discovered, to ensure the checks fire on the conditions under which each bug appeared.