Alo Sarv
lead developer

Developer's Diary

Saturday, December 11, 2004

Ideas and requirements for new PartData

I've been thinking about what the new PartData internals should look like, as well as the requirements for it. It was discussed on IRC several times after the last blog entry, and by now I have a general understanding of what it should provide.

The most important aspect of PartData is choosing which ranges to download, as mentioned previously. It should always prefer the least-available chunk. A chunk, in this case, is defined as a part of a file for which we have a hash (so we can verify the data). This means introducing an availability-o-meter into PartData ... or perhaps even into SharedFile, because SharedFile would need this information too (later, during uploading - in order to spread files better, we'd only announce that we have the rarest chunks at any given time ... optionally of course). I went through several different concepts of storing source masks... one idea is that we keep a map, keyed on chunk size, and in that, keep a vector of integers. I'd show the implementation, but the blog screws up the less-than / greater-than symbols needed to express template constructs :(
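A minimal sketch of that container might look like the following, using plain standard-library types; the names `ChunkAvailability` and `updateAvailability` are hypothetical, not from the actual codebase:

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Outer map keyed on chunk size; inner vector holds one availability
// counter per chunk of that size.
using ChunkAvailability = std::map<uint32_t, std::vector<int>>;

// Apply one source's chunk mask: have[i] is true when the source has
// chunk i. delta is +1 when adding a source, -1 when removing it.
inline void updateAvailability(ChunkAvailability &avail, uint32_t chunkSize,
                               const std::vector<bool> &have, int delta) {
    std::vector<int> &counts = avail[chunkSize];
    if (counts.size() < have.size())
        counts.resize(have.size(), 0);
    for (std::size_t i = 0; i < have.size(); ++i)
        if (have[i])
            counts[i] += delta;
}
```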

If we key the outer map on chunk size, the inner vector indicates the availability of specific chunks. Next, when a module asks us for a random chunk, we look the size up in the map, and if we find anything there, choose the lowest-counted entry from the vector. When adding a source mask, we simply add 1 in the inner vector at every position the source has; when removing, the same thing reversed. The actual implementation needs to be somewhat more complex, since we don't want to perform linear searches through the vector all the time - Boost.MultiIndex library to the rescue - with it we can do multiple different types of lookups on the same container.
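As a rough illustration of the lookup half, here is a naive rarest-first pick over one such counter vector (`pickRarestChunk` is a hypothetical name). It does exactly the linear scan the real implementation would avoid via Boost.MultiIndex, so take it only as a statement of the selection rule:

```cpp
#include <cstddef>
#include <vector>

// Return the index of the incomplete chunk with the lowest nonzero
// availability count, or -1 when no chunk can be requested at all.
inline int pickRarestChunk(const std::vector<int> &counts,
                           const std::vector<bool> &complete) {
    int best = -1;
    for (std::size_t i = 0; i < counts.size(); ++i) {
        if (complete[i] || counts[i] == 0)
            continue;                       // done, or no source has it
        if (best == -1 || counts[i] < counts[best])
            best = static_cast<int>(i);
    }
    return best;
}
```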

This leads to the next thing - using/locking chunks/parts in PartData. With this approach, we can easily mark full parts as "used", e.g. the ed2k module would "use" 9500kb parts at a time, BT would "use" [file-part-size] parts, etc. The main logic would go:
  1. Select the lowest-available chunk that is incomplete. Prefer half-complete chunks over completely empty chunks (we want to complete a chunk before starting the next one).
  2. Check how much of it has been downloaded already, and how many times it has been marked "used" already.
  3. If the "used" count is already at some constant limit, say, CHUNKSIZE / 5 (allowing at most 5 concurrent downloads of the same chunk), choose the next-rarest chunk; if it's less, grant the "use".
  4. PartData needs to keep references of granted chunks for bookkeeping, and some other fancy stuff (more on this later).
  5. Next up, in the case of the ed2k module for example, it splits the given part into 180k chunks and requests them, and also locks the first 180k chunk. After that it starts writing data... same as before.
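Steps 1-4 above could be sketched roughly like this. All names (`ChunkState`, `grantChunk`, the flat concurrency cap `maxUsers`) are hypothetical, and the "prefer half-complete" rule is implemented here as a tie-break among equally rare chunks, which is one possible reading of step 1:

```cpp
#include <cstddef>
#include <vector>

// Per-chunk bookkeeping for the grant decision.
struct ChunkState {
    int availability = 0;    // number of sources that have this chunk
    int usedCount   = 0;     // downloaders currently granted this chunk
    bool complete   = false;
    bool started    = false; // some bytes downloaded already
};

// Grant a chunk: rarest first, prefer already-started chunks on ties,
// skip chunks that maxUsers downloaders are already working on.
// Returns the granted index, or -1 if nothing is available.
inline int grantChunk(std::vector<ChunkState> &chunks, int maxUsers = 5) {
    int best = -1;
    for (std::size_t i = 0; i < chunks.size(); ++i) {
        const ChunkState &c = chunks[i];
        if (c.complete || c.availability == 0 || c.usedCount >= maxUsers)
            continue;
        if (best == -1) { best = static_cast<int>(i); continue; }
        const ChunkState &b = chunks[best];
        if (c.availability < b.availability ||
            (c.availability == b.availability && c.started && !b.started))
            best = static_cast<int>(i);
    }
    if (best != -1)
        ++chunks[best].usedCount;   // step 4: remember the grant
    return best;
}
```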
Now, what we really want here is to make the "used" parts an object, and keep a pointer back to the granted parts. This is probably best implemented using the Boost.SmartPtr library, using shared pointers. Why is this important? Because:

PartData knows what parts it has given out. It also has access to those parts after giving them out. Now, if multiple sources start downloading the same part, what will PartData do? It'll split the first given-out part in half, and grant the second half to the second downloader. Only the incomplete portion of the part is split. And when a third source wants to use the same chunk (also possible), split it again.
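The splitting step could look roughly as follows; `UsedRange` and `splitRange` are invented names, and `std::shared_ptr` stands in for the Boost.SmartPtr shared pointers mentioned above:

```cpp
#include <cstdint>
#include <memory>

// A range handed out to one downloader. PartData keeps a shared_ptr to
// each of these, so it can revisit (and split) them later.
struct UsedRange {
    uint64_t begin;   // first byte of the range (inclusive)
    uint64_t end;     // last byte of the range (inclusive)
    uint64_t done;    // bytes completed, counted from begin
};

// Split the incomplete remainder of r in half: r keeps the lower half,
// the returned range is granted to the second downloader. Returns
// nullptr when the remainder is too small to split.
inline std::shared_ptr<UsedRange> splitRange(UsedRange &r) {
    uint64_t left = r.end - (r.begin + r.done) + 1;  // incomplete bytes
    if (left < 2)
        return nullptr;
    uint64_t mid = r.begin + r.done + left / 2;
    auto upper = std::make_shared<UsedRange>();
    upper->begin = mid;
    upper->end = r.end;
    upper->done = 0;
    r.end = mid - 1;          // first source keeps the lower half
    return upper;
}
```

A third source asking for the same part would simply call this again on whichever half has the most incomplete data left.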

Now, at some point, the first source will hit the end of its originally given chunk. In that case, it'll request a new part. Now it gets interesting. If we keep the construction time of all "used" chunks, we can easily calculate the download rate for those chunks. Based on that download rate, PartData can decide (without knowing anything else about the source) which sources are fast and which are slow. Now, if the one that completed the chunk was a fast source (relatively, compared to the others), and we don't have any other chunks to give to it (say we're near the end of the file), we'll just kick the slower of the remaining two, and give the chunk to the fast source. This way we perform optimally when we have fast sources available, and slow sources don't block us when we're dealing with small files and/or file endings (we all know how painful the last <180kb can sometimes get in eMule ...)
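The rate bookkeeping this relies on is tiny; a possible sketch, with invented names (`Grant`, `slowestGrant`) and a caller-supplied clock value rather than any particular timer API:

```cpp
#include <cstddef>
#include <vector>

// What PartData would remember per granted chunk to judge source speed.
struct Grant {
    double grantedAt;   // construction time, seconds on a monotonic clock
    double bytesDone;   // bytes written into this chunk so far
};

inline double rate(const Grant &g, double now) {
    double dt = now - g.grantedAt;
    return dt > 0 ? g.bytesDone / dt : 0.0;
}

// The endgame kick candidate: the grant with the lowest implied
// download rate. Returns -1 when there are no grants.
inline int slowestGrant(const std::vector<Grant> &grants, double now) {
    int worst = -1;
    for (std::size_t i = 0; i < grants.size(); ++i)
        if (worst == -1 || rate(grants[i], now) < rate(grants[worst], now))
            worst = static_cast<int>(i);
    return worst;
}
```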

Madcat, ZzZz

PS: Uhh... I write too much *doh*
PS2: Thanks for the comments on the last blog entry; it was really cool to wake up in the .. evening, and find some positive feedback for a change :) Been getting the "hydranode? bah, yet another mldonkey?" stuff all week *grr*. I mean no disrespect to the mldonkey developers, but sadly, nowadays saying "core/gui-separated, modular, p2p client" is almost equal to saying "yet another mldonkey clone", and we all know what the general public thinks of mldonkey :(

ATM no comparison can be made between Hydranode and MLDonkey because it's too early for that. I don't know why people already compare Hydranode with MLDonkey. Hydranode is a completely different project with different goals. But Hydranode needs more time to evolve right now. It's certainly on the right road though :).

cya Skyw4lker

PS.: Your blogs have just the right length :)
As an mldonkey user, I gladly welcome an "mldonkey clone". Especially one written in a language that the vast majority of developers can read. OCaml may be a wonderful language, but I can't make head or tail of it.

The modular nature of hydranode will allow an evolutionary approach to development. If a protocol is popular, its plugin will naturally attract more users and developers and unpopular ones will stagnate and eventually die.

So, I just wanted to poke my head in and say "great job so far madcat, keep up the good work!" I'm very impressed with what you've done so far and I hope that when you're done with the core other developers will join you and help build the plugins.

Thanks for all the hard work,