Alo Sarv
lead developer

Developer's Diary

Tuesday, January 25, 2005

Implementing HydraNode SmartChunkSelector(tm)

As discussed last night, the current development topic is the HydraNode SmartChunkSelector(tm), which needs to select the most important chunk for downloading. One idea for the implementation was suggested by SimonMoon - to introduce some kind of scores/ratings system for chunks.

It would mean that for each ChunkSet we'd have to calculate a score, tweak those scores until they do what we want, keep them updated, etc. The development overhead of that just doesn't seem justified. Besides, with this approach we'd have to coax the scores system into achieving what we want, instead of implementing the necessary algorithms ourselves, using objects. (A score-based approach would fit well into a non-object-oriented environment, but C++ is an OO language, and thus things should be done ... in an OO way.)

As such, here's my idea for the ChunkSelector, based on Boost.MultiIndex.
The basic idea is that we construct a ChunkMap, which is indexed by a number of variables we're interested in - partial/verified status, availability, use count, etc. Using that, we can perform the necessary lookups in whatever order we prefer, so we have complete control over the situation, and tweaking it is much easier than with some score system.

An interesting discovery while building this: I realized I can have chunks of different sizes (let's call those groupings ChunkSets) in a single ChunkMap - it is NOT limited to a single ChunkSize. So, for example, the BT plugin submits a ChunkSet with SHA-1 hashes and ChunkSize=2MB, while ED2K submits a ChunkSet with a 9500KB ChunkSize and MD4 hashes, and the ChunkMap handles all those chunks uniformly. When a chunk is requested, it is chosen from across all ChunkSets, taking every chunk's availability/partial/verified status into account.
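The uniform handling could look something like this - a plain std::vector here instead of the real ChunkMap, and all names are illustrative:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>
#include <algorithm>

// Illustrative: a chunk tagged with the ChunkSet it came from.
struct ChunkRef {
    std::string hashType;  // "MD4" (ed2k) or "SHA-1" (BT)
    uint64_t    begin;     // start offset within the file
    uint32_t    size;      // ~9500KB for ed2k, 2MB for BT - sizes may differ
    uint32_t    avail;     // source count
};

static bool availLess(const ChunkRef& a, const ChunkRef& b) {
    return a.avail < b.avail;
}

// Pick the rarest chunk across ALL ChunkSets, regardless of chunk size
// or hash type - the selector never needs to care which plugin it came from.
const ChunkRef* selectRarest(const std::vector<ChunkRef>& chunks) {
    if (chunks.empty()) return 0;
    std::vector<ChunkRef>::const_iterator it =
        std::min_element(chunks.begin(), chunks.end(), availLess);
    return &*it;
}
```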

At this point I realized that the requested chunk size (as passed to the getRange() method) just became irrelevant, since a chunk is given out as one of the existing chunks, or a sub-range of one, leaving the requester little control over the size of the chunk. While I initially started to worry about where that would lead, I tend to think it's OK - at least from the ed2k point of view I see no problem in having PartData return smaller chunks, since those chunks are merely marked "used"; the actual protocol code works on smaller sub-ranges anyway.
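The clipping logic would be roughly the following - getSubRange() is a hypothetical helper for illustration, not the real getRange() API:

```cpp
#include <cassert>
#include <cstdint>

// A byte range handed out to a downloader.
struct Range {
    uint64_t begin;
    uint64_t end;
};

// The selector has already chosen an existing chunk [chunkBegin, chunkEnd);
// the caller's requested size (maxSize, 0 = no preference) only caps the
// sub-range - it never changes which chunk gets picked.
Range getSubRange(uint64_t chunkBegin, uint64_t chunkEnd, uint64_t maxSize) {
    uint64_t end = chunkEnd;
    if (maxSize != 0 && chunkBegin + maxSize < chunkEnd)
        end = chunkBegin + maxSize;   // clip to the requested size
    Range r = { chunkBegin, end };    // never crosses a chunk boundary
    return r;
}
```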

Initial testing shows the system is working, although it needs more work - namely chunk multi-usage (which seems broken at the moment), and some tweaking here and there.

On another note, compile times just skyrocketed - it takes 18s to compile partdata as it stands right now, and the header alone already adds 10s of compile time on my 2.4GHz P4 system. Boost.MultiIndex uses Boost.MPL (a template metaprogramming framework) heavily, hence the compile time. But adding 10s of compile time to each file including partdata.h isn't my idea of fast development, so I guess I'll have to introduce the Bridge pattern to PartData soon to separate out the implementation and reduce the compile time significantly. My first attempt at that failed though (god bless Kate's unlimited undo steps) - one thing led to another and almost broke the entire thing. Sooner or later it'll need to be done, however, to keep compile times reasonable.
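The split would look roughly like this - a sketch only, with illustrative names (this compile-firewall form of Bridge is usually called Pimpl); the two files are shown here as one listing:

```cpp
#include <cassert>
#include <cstddef>

// --- partdata.h (sketch): no Boost headers leak to including files ---
class PartData {
public:
    PartData();
    ~PartData();
    std::size_t chunkCount() const;
private:
    struct Impl;   // defined only in partdata.cpp
    Impl* m_impl;  // plain owning pointer, 2005-style
};

// --- partdata.cpp (sketch): the heavy MultiIndex headers go here, once ---
struct PartData::Impl {
    // in the real code, the ChunkMap multi_index_container would live here
    std::size_t chunks;
    Impl() : chunks(0) {}
};

PartData::PartData() : m_impl(new Impl) {}
PartData::~PartData() { delete m_impl; }
std::size_t PartData::chunkCount() const { return m_impl->chunks; }
```

This way only partdata.cpp pays the 10s MPL tax; every other file including partdata.h sees nothing but a forward declaration.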

Anyway, ppl are waking up, so I better get to sleep - can't work with ppl trying to talk to me all the time :o

Madcat, ZzZz

With regard to the huge compile times caused by inclusion of the Boost.MultiIndex headers, I've experimented a little with precompiled headers and the gains are worthwhile: a .cpp using Boost.MultiIndex which used to take 30s to compile can be sped up to 5s by turning precompiled headers on (MSVC++ 6.0). You might want to try that (YMMV). On the other hand, PCH support in MS compilers is notoriously fragile.

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo