I'v been thinking about what/how new PartData internals should look like, as well as the requirements for it. It was discussed on IRC several times after the last blog entry, and by now I have a general understanding what it should provide.
The most important aspect of PartData is choosing which ranges to download, as mentioned previously. It should always prefer the least-available chunk. Chunk, in this case, is defined as a part of file for which we have a hash (so we can verify the data). This means introducing availability-o-meter into PartData ... or perhaps even into SharedFile, because SharedFile would need this information too (later, during uploading - in order to better spread files, we'd only announce that we have the rarest chunks at any time ... optionally ofcourse). I went through several different concepts of storing source masks... one idea is that we keep a map, keyed on chunksize, and in that, keep vector of integers.
Would show the implementation, but the blog screws up with less-than / greater-than symbols needed to express template constructs :(
If we key the outer map with chunksize, the inner vector indicates the availability of specific chunks. Next, when a module asks us for a random chunk, we look the size up on the map, and if we recognize anything from there, choose the lowest-counted entry from the vector. When adding source mask, we simply add +1 into the inner vector into every position which the source has, when removing, the same thing reversed. The actual implementation needs to be somewhat more complex, since we don't want to perform linear searches through the vector all the time - Boost.MultiIndex library to the rescue - there we can do multiple different types of lookups on same container.
This leads up to next thing - using/locking chunk/parts in PartData. Using this approach, we can say that we can easily mark full parts as "used", e.g. ed2k module would "use" 9500kb parts at time, BT would "use" [file-part-size] parts etc. The main logic would go:
- Select lowest-available chunk that is incomplete. Prefer half-complete chunks over completely empty chunks (we want to complete a chunk before starting next one).
- Check how much of it has been downloaded already, and how many times it has been marked "used" already.
- If the "used" count is larger than some constant, say, CHUNKSIZE / 5 (allow max 5 concurrent downloads of same chunk), choose another chunk. If it's less, grant the "use". Otherwise, choose next-rarest chunk.
- PartData needs to keep references of granted chunks for bookkeeping, and some other fancy stuff (more on this later).
- Next up, for example in case of ed2k module, it splits the given part into 180k chunks and requests them, and also locks the first 180k chunk. After that it starts writing data... same as before.
Now, what we want really here is make the "used" parts an object, and keep back a pointer to granted parts. This is probably best implemented using Boost.SmartPtr library, using shared pointers. Why is this important? Because:
PartData knows what parts it has given out. It also has access to those parts after giving them out. Now, if multiple sources start downloading same part, what will PartData do ? It'll split the first given out part to half, and grant the second half to the second downloader. The splitting is done of the incomplete part only. And when third source wants to use the same chunk (also possible), split it again.
Now, at some point, the first source will hit the end of it's originally given chunk. In that case, it'll request a new part. Now it gets interesting. If we keep a construction time of all "used" chunks, we can easily calculate the downloadrate for those chunks. Now, based on that downloadrate, PartData can decide (w/o knowing anything else about the source), which sources are fast and which are slow. Now, if the one that completed the chunk was a fast source (relative, compared to others), and we don't have any other chunks to give to it (say we'r near the end of file), we'll just kick one of the remaining two (slower one), and give the chunk to the fast source. This way we perform optimally when we have fast sources available, and thus slow sources don't block us when we'r dealing with small files and/or file endings (we all know how painful sometimes the last <180kb can get in emule ... )
Madcat, ZzZz
PS: Uhh... I write too much *doh*
PS2: Thanks for the comments on last blog entry, was really cool to wake up in the .. evening, and find some positive feedback in a while :) Been getting the "hydranode? bah, yet another mldonkey?" stuff all week *grr*. I mean no disrespect to mldonkey developers, but sadly, nowadays saying "core/gui separated, modular, p2p client" is almost equal to saying "yet another mldonkey clone", and we all know what the general public thinks of mldonkey :(