MLDonkey Downloads Import Module Development
Development in progress.
Monday, October 31, 2005
Server change, ed2k improvements
The server hosting the website, svn etc. was changed today. Everything is now running on the new server; some things aren't fully configured yet, but the main things (site, svn, mail) seem to be up and running properly.
I've been testing and tracking performance issues in the ed2k plugin, due to several reports of Hydranode's ed2k plugin being slower than other ed2k clients. Thankfully, we have extensive logging capabilities in the ed2k module, so optimizing/fine-tuning the module simply means going through hundreds of megabytes of log data. We currently have two tools for this job: srcstat.pl, which does an overall analysis and displays various statistics, such as average download-session lengths, percentages of failed/succeeded udp queries etc; and the newly-added commlog.pl, which splits the trace log into per-client log chunks, so you can quickly see all data for each client separately. The output of the latter is what currently makes debugging really easy, since I no longer have to grep the log by ip all the time; I can simply scroll through the post-processed log and see how each client behaved, where communication ended etc, including the timestamps of each operation.
Based on all that data, I've managed to discover and solve a number of issues of various kinds; the details aren't relevant here (you can see them in the corresponding checkin messages). Theoretically the performance is getting better, but outside confirmation is still missing (hopefully I'll get some outside reports in a few days).
PS: I'm looking for documentation on Azureus PeerExchange protocol (equivalent to ed2k sourceexchange), as well as Azureus DHT trackerless system - to my knowledge, no documentation on it exists; please prove me wrong.
Thursday, October 27, 2005
Feeling somewhat tired and unmotivated after yesterday's heavy (and complex) coding session, today I've mainly been chilling and doing routine testing. I guess the most interesting area is the win32 port; I made an optimized build (god that took long - something around 30-40 minutes on a high-end system), and downloaded the usual bittorrent testset - a Wired creative-commons-licensed mp3 set and the suse dvd, the latter of which has 14'500 chunks in it, which is really heavy even by bittorrent standards (for comparison - in ed2k, a 700mb file has 72 chunks and a 4gb file has below 450 chunks; in bt, torrents normally have below 5000 chunks). Anyway, this exposes some scalability issues.
Namely, in order to choose the rarest chunk, we must update the availability of each chunk, and that's a linear-complexity operation. Furthermore, in order to do chunk requests, we must pass a vector of bools to PartData::getRange(), which means that when we receive a chunkmap (as a bitfield) from a client, we must copy it into a bool-vector, which is also a linear operation. Now, I did find a way to avoid the latter - namely, generalize PartData::getRange() to take a functor returning bool, and use direct bit-manipulation inside it. However, the first issue is harder, and will, no matter what, remain a linear-complexity operation - I don't see any way around it.
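A minimal sketch of the functor idea, for illustration only. The real PartData::getRange() signature isn't shown here, so `BitfieldView` and `firstAvailable` are hypothetical stand-ins; the only thing taken as given is the BitTorrent wire layout, where the high bit of the first byte is chunk 0.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical read-only view over the raw payload of a BITFIELD message.
// It keeps a reference to the buffer, so the buffer must outlive the view.
class BitfieldView {
public:
    BitfieldView(const std::string &raw, std::size_t chunkCount)
        : m_raw(raw), m_count(chunkCount) {
        assert(raw.size() * 8 >= chunkCount); // payload must cover all chunks
    }

    // Constant-time test of one bit; nothing is copied into a bool-vector.
    bool operator()(std::size_t chunk) const {
        if (chunk >= m_count) {
            return false; // padding bits never count as available
        }
        const unsigned char byte =
            static_cast<unsigned char>(m_raw[chunk / 8]);
        return ((byte >> (7 - chunk % 8)) & 1u) != 0;
    }

    std::size_t size() const { return m_count; }

private:
    const std::string &m_raw;
    std::size_t m_count;
};

// Stand-in for a generalized chunk selector: it accepts any callable that
// answers "does the remote client have chunk i?".
template <typename HasChunk>
long firstAvailable(std::size_t chunkCount, HasChunk hasChunk) {
    for (std::size_t i = 0; i < chunkCount; ++i) {
        if (hasChunk(i)) {
            return static_cast<long>(i);
        }
    }
    return -1; // the client has nothing
}
```

The selector still walks the chunks, but the up-front copy of the whole bitfield per client goes away.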
Coupled with the above bitfield problem, the CPU usage when downloading that torrent seems to be around 10%, spiking up to 15% occasionally (at around 150kb/s transfer rate). We also seem to have a rather heavy memory leak somewhere there - 20-30 minutes of runtime resulted in memory usage growing from 15mb (normal hydranode memory usage) up to 90mb for no apparent reason. It's not yet determined whether the leak is within the hncore/hnbase libraries or in the bittorrent plugin (ed2k has shown some minor leakage, but nothing of this scale, so I'd guess the leakage happens in BT).
However, it also seems there's some more cpu-hogging code around in the BT module: while the bitfield-handling is heavy, it only happens once per client (the bitfield is only sent once), yet the cpu usage stayed at the high levels even when no new clients were being created. Off the top of my head, I'd blame the virtual file wrapper, but then again, that can't be THAT heavy - yes, it does a lot of signal/slot-based communication, but most of that should've been optimized out by the compiler, and this torrent had only one file, which means the rangelist lookups can't be the bottleneck either. To make things worse, after a while hydranode/bt just maxes out the CPU and stays that way (still downloading tho, so not a local endless loop)... And while we're at it, bt downloads sometimes stop at 100%, unable to complete; and on win32, deleting temp files sometimes fails with error "file already in use" :( I guess I'll have some fixing and optimizing to do *sighs*.
I ran a profile build two nights ago for a whole 12 hours, but accidentally destroyed the profiling data when I woke up in the morning (programmers really shouldn't be allowed near a computer before they've had their second cup of coffee ...). I'll do a more thorough performance analysis again in a few days... however, when it comes to the memory leak, I'm a bit stuck - due to the sophisticated memory management of C++, normal memory-leak-check tools (e.g. valgrind etc) are no use, since on shutdown everything is cleaned up (all big data is stored in std::strings for example, and those get cleaned up properly). So I need some other kind of leak-checking mechanism that is able to tell me at RUNTIME where all the memory is allocated. If anyone knows a tool that can do that, please leave a comment :)
Wednesday, October 26, 2005
More black magic in Bt::Files system
As was pointed out to me regarding the hydranode uptime records, I'd like to clarify - neither record ended because hydranode crashed. Mine ended because my server locked up due to a broken video driver, and chemical's because he wanted to upgrade to a newer build. As for memory/cpu usage over that long an uptime, I did experience rather large leakage (tho most of the issues were fixed shortly afterwards), but chemical didn't report any rise in memory/cpu usage over his uptime period.
Anyway, looking at the devcenter timeline, there have been 18 checkins today and a total of 75 checkins during the last 7 days, which indicates rather high activity (the average used to be 50 checkins/week).
Wubbla has been busy improving the HTTP module - HTTP redirections work properly now, HTTP proxy support was added, and the plugin is now capable of handling really complex urls, like http://user:password@hostname:port/path/file?query#anchor for example. Unfortunately, the hnsh parser can't handle that stuff yet - to be added shortly.
On the Bittorrent side, there's been a lot of additional black magic being done. Basically, after 12 hours of heavy coding there, it's now possible to cancel, pause, resume and stop files within a torrent, and it handles everything correctly. There's a lot of tricky stuff going on there, doing checksums with a mixed set of real files and cache files etc, but it works correctly now. There are some unhandled situations tho - namely, the childPaused() method only works correctly right now when the chunk crosses exactly two files. Situations where files from both ends of an overlapping chunk are removed (e.g. two consecutive files in the torrent), or where the chunk crosses three or more files, aren't handled properly there yet (although other code handles that properly), so it seems I can say it again - one more function to update for tomorrow. As they say - "There's always one more bug"
On the Bt::Client side, we had some protocol compatibility problems: availability bitfields were generated incorrectly, we failed to start uploading when the socket was already writable when requests came in, and other miscellaneous things, so those 9 of you (yes, all 9 of you downloading that suse dvd), upgrade immediately to the latest version. Basically, the bitfields were generated again for each client, and with 14'000 chunks, as that suse dvd torrent has, it took quite a long time. Now at least OUR chunkmap is cached in TorrentFile and updated in-place when new chunks are downloaded, so no generation happens. However, for incoming BITFIELD messages, we still need to manually transform the bitset into the std::vector that Hydranode uses internally, and that's a really slow operation - it slows down the client even on my 3.2ghz system. We'll see if we can optimize something there ... turning on compiler optimizations will have a large effect there tho, so maybe we won't have to optimize it by hand.
Testing, fine-tuning and bugfixing
It's been fairly interesting day of testing and fine-tuning. Request canceling in BT caused endless loop (guess I was too tired last night when I wrote that code); there was also a problem when passing torrent files with ' symbols in them via hlink, due to a problem in hnsh parser (such symbols need to be escaped again, but it didn't work). Also, there was a fairly hard-to-detect issue in PartialTorrent constructor, where chunk number was stored in 32bit variable, and then a mixed calculation was done with the 32bit chunk number, few int literals, and 64bit variable - one would've assumed the entire calculation would be done with 64bit variables, but apparently it wasn't - the inner () included the 32bit chunk number and int literal, which caused the entire calculation to be done at 32bit, triggering an error when downloading torrents larger than 4gb. Mainline client-id parsing works correctly now; Opera BT example no longer causes unhandled exception, and MLDonkey is also recognized. I also reduced socket timeout to 10 seconds when transfer is in progress (normally 2m10s in bt, since pings are sent every 2min); this allows unlocking chunks much faster, resulting in much better behavior near the end of download.
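To make the 32bit trap concrete, here is a small sketch. The function names and the exact shape of the expression are made up for illustration (the original PartialTorrent code isn't shown here), and unsigned types are used so that the wrap-around is well-defined instead of undefined behavior.

```cpp
#include <cassert>
#include <cstdint>

// Buggy shape: both operands inside the parentheses are 32bit, so the
// product wraps around at 2^32 BEFORE it is widened for the 64bit addition.
inline std::uint64_t chunkOffsetBuggy(std::uint32_t chunk,
                                      std::uint32_t chunkSize,
                                      std::uint64_t base) {
    return base + (chunk * chunkSize);
}

// Fixed shape: widen one operand first, so the whole product is 64bit.
inline std::uint64_t chunkOffsetFixed(std::uint32_t chunk,
                                      std::uint32_t chunkSize,
                                      std::uint64_t base) {
    return base + static_cast<std::uint64_t>(chunk) * chunkSize;
}
```

With 256kb chunks, chunk number 20000 starts past the 4gb mark, which is exactly where the two versions start to disagree.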
In the file wrappers, I fixed one of the last two functions that needed to handle the various cases of missing files in the torrent properly - PartialTorrent::write now correctly handles cases where some files are missing. The next (and last) function to update is the verify() method. The trouble with those functions is that while they are small (~10 lines), the logic in them is incredibly complex, so it takes a ton of thinking to get them right. The write function is passed a begin offset (global) and the data (as a string); we must find the file that contains the data, and issue a write() call on that file (using the relative offset inside that file). Furthermore, we can no longer assume the data is contiguous, since we might be missing some files in-between, so we must re-calculate both the relative offset within the file and the position within the input data (relative to the global begin offset and the file's global begin offset), and write that sub-range (also taking care not to write over the end of the file in question, since the data can span multiple files). In the verify() function, we must cross-reference the files map and the cache map, e.g. if part of the range that we want to hash doesn't exist in the files map, we must replace the missing parts with the corresponding cache files.
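The offset juggling in write() can be sketched roughly like this. It is not the actual PartialTorrent::write code; `FileSlot`, `WriteOp` and `splitWrite` are hypothetical names, and a missing file is modelled simply as having no entry in the map.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct FileSlot {            // hypothetical: one file that is still present
    std::uint64_t size;
    std::string name;
};

struct WriteOp {             // hypothetical: one write() call to issue
    std::string file;
    std::uint64_t relOffset; // offset relative to the start of that file
    std::string data;
};

// files is keyed by each file's global begin offset within the torrent.
std::vector<WriteOp> splitWrite(const std::map<std::uint64_t, FileSlot> &files,
                                std::uint64_t begin, const std::string &data) {
    std::vector<WriteOp> ops;
    const std::uint64_t end = begin + data.size(); // one past the last byte
    for (const auto &entry : files) {
        const std::uint64_t fBegin = entry.first;
        const std::uint64_t fEnd = fBegin + entry.second.size;
        if (fEnd <= begin) {
            continue; // file lies entirely before the data
        }
        if (fBegin >= end) {
            break;    // file (and all later ones) lies after the data
        }
        // Clamp to the overlap so we never write past the end of the file.
        const std::uint64_t from = std::max(begin, fBegin);
        const std::uint64_t to = std::min(end, fEnd);
        if (from >= to) {
            continue; // zero-length file
        }
        ops.push_back({entry.second.name, from - fBegin,
                       data.substr(from - begin, to - from)});
    }
    return ops;
}
```

Bytes that fall into a missing file are simply skipped in this sketch; in the scheme described in these posts, the cache files are what keeps such data around.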
On the GUI topic, I've been filling pages and pages of paper with sketches of different views and pages, working closely with our designer, who will soon be implementing that stuff in photoshop. Sadly, as mentioned before, I can't show the sketches to you, since I lack a scanner, but what I can say is that it's progressing nicely: several scalability issues were resolved today, while the design remains clean and simple in its default configuration. Some open topics right now include how exactly we handle custom views (e.g. DC chat windows et al) - accessing them the first time isn't a problem, but switching between them seems a bit tougher. Anyway, just wanted to let you know that progress is being made on the GUI side as well.
On an unrelated sidenote, I was downloading SuSE 10 dvd iso via torrent today, and found that at least 5 other Hydranode clients were downloading the same torrent. People - Hydranode BT module is officially in Pre-Alpha state right now, which means it's not even meant for public testing! Bittorrent people are VERY touchy about broken clients, and they ban clients by brand on first notice of bad behavior; Hydranode BT plugin is marked Pre-Alpha for exactly that reason - it's not fully well-behaving BT client yet - hell, up until few days ago, we didn't even upload properly, and currently, uploading is still bit flaky. While I cannot stop people from using pre-alpha code, I do urge you not to do so, until it's officially marked at least Alpha, and open for wider testing. Please don't use Hydranode BT module until it's officially announced as ready for testing.
PS: Chemical just beat my Hydranode uptime record - my last record was 11 days 9 hours, the new official hydranode uptime record is now 16 days 7 hours. There were no noticeable memory or cpu-usage leaks over that long an uptime.
Monday, October 24, 2005
Sometimes people (including myself) wonder why Hydranode development takes so long. I've been at it for like 1.5 years already, and the most recent module (bt) has taken 2.5 months already - where is all this time going?
The problem with limited developer resources is that you can't slap things together, hope they work, move on, and later go back and fix things that break. It just doesn't work that way - you just end up with a ton of half-finished components, and yourself juggling between them - it can take years to properly stabilize the code then. The alternative is to write everything right the first time; easier said than done though. The key is to approach every component of the app so that at some point, you can say to yourself "ok, this is finished", and forget about it. This way you can move on at a steady, albeit slow, pace, and can always rely on earlier components to do their job.
This is also one of the reasons why I declared the Console application (first attempt on GUI-related things) obsolete - it just doesn't work. The reason was I took way too many things at once - Core/GUI comm protocol, the support library, GUI's internal data structures, and the visual representation of the data itself. You can't juggle between four distinct components - you have to build step-by-step.
Another thing you learn when developing alone for long periods of time is that you cannot afford bugs / mistakes - the cost of going back to fix something is far greater than that of writing it right the first time. This is different from corporate development, where resources are large - they have one set of developers working on new stuff, and another set of developers cleaning up the old mess; when working (nearly) alone, you don't have that luxury.
The most current problem at hand is the BT module's file handling. As you know from previous posts, there are a lot of different usage scenarios there - files could be canceled from within the torrent at any point; they could be paused... or they might already be completed, or not exist anymore either. We've approached this problem from multiple ends, the most recent being the implementation of the Cache manager, which caches all cross-file chunks. Today I also added the PartData::dontDownload() method, which allows customizing the chunk-selector to skip past specific ranges. It's not that simple though - if we simply mark all missing files in a torrent as dontDownload, we still need to download the overlapping chunks, e.g. to cache. So basically, after marking all missing files as dontDownload, we must also unmark the ranges that cross file boundaries, since we still need those; the already complex writing and verifying code will become even more complex, since it must handle the case where files are missing anywhere in the chunk.
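One way to picture the mark-then-unmark step is to compute, for each missing file, only the part that is made up of whole chunks. This is a simplified sketch and not the actual dontDownload() logic: `Range` and `skippablePart` are hypothetical, ranges are half-open, and it always keeps the boundary chunks, even when the neighbouring file is missing as well.

```cpp
#include <cassert>
#include <cstdint>

struct Range {                 // hypothetical half-open range [begin, end)
    std::uint64_t begin;
    std::uint64_t end;
};

// Computes the part of a missing file that can safely be skipped: the file's
// range shrunk inwards to chunk boundaries, so that chunks shared with a
// neighbouring file still get downloaded. Returns false if nothing is left.
bool skippablePart(Range file, std::uint64_t chunkSize, Range &out) {
    assert(chunkSize > 0);
    // Round the begin offset up, and the end offset down, to a chunk border.
    const std::uint64_t first =
        (file.begin + chunkSize - 1) / chunkSize * chunkSize;
    const std::uint64_t last = file.end / chunkSize * chunkSize;
    if (first >= last) {
        return false; // the file holds no whole chunk of its own
    }
    out = Range{first, last};
    return true;
}
```

For example, with a chunk size of 100, a missing file at [250, 730) yields [300, 700): the chunks at 200 and 700 straddle its borders and stay downloadable.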
In related news, some smaller issues in Bt::Client were fixed (the used range was never reset), the uploading code works as per the BT spec again, and upload requests can now be properly canceled. I also added a ComponentStatus page to our wiki, which gives a proper overview of the various Hydranode components and their status.
Saturday, October 22, 2005
(bt) seeding, and (core) framework for multi-net uploading
There are two topics today that were implemented, both providing base for future work over the next week or so.
Basically, ModuleBase looks at four values at 5-second intervals, and if those values fall below normal, it requests the module to open more upload slots via a virtual function call. The four values are the module's upload speed (current and average), and the global upload speed (current and average). The logic is that when we set a "soft" upspeedlimit for a module (which can be exceeded, e.g. in order to use up all available bandwidth), we test against that, and if we're working below it, we open more slots; that raises this module's upload rate, which gets it closer to its soft limit. Under ideal conditions, each module should end up uploading at its limit, no more, no less, and the Scheduler will dynamically adjust each module's limits, based on how much incoming data we receive from them. The second phase (the Scheduler part) isn't implemented yet, but right now I'm already looking at how BT and ED2K are competing for upload bandwidth.
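Roughly, the periodic check could look like the sketch below. This is only a guess at the shape: the post doesn't spell out how the four values are combined, so requiring all of them to be below their limits is an assumption, and `ModuleSketch`, `UploadRates` and `openUploadSlot` are hypothetical names rather than the real ModuleBase API.

```cpp
#include <cassert>
#include <cstdint>

struct UploadRates {                // hypothetical snapshot, in bytes/second
    std::uint32_t moduleCurrent;
    std::uint32_t moduleAverage;
    std::uint32_t globalCurrent;
    std::uint32_t globalAverage;
};

class ModuleSketch {
public:
    explicit ModuleSketch(std::uint32_t softLimit) : m_softLimit(softLimit) {}
    virtual ~ModuleSketch() {}

    // Meant to be driven by a timer, e.g. once every 5 seconds.
    void checkUploadSlots(const UploadRates &r, std::uint32_t globalLimit) {
        const bool moduleBelow =
            r.moduleCurrent < m_softLimit && r.moduleAverage < m_softLimit;
        const bool globalBelow =
            r.globalCurrent < globalLimit && r.globalAverage < globalLimit;
        if (moduleBelow && globalBelow) {
            openUploadSlot(); // ask the plugin for one more upload slot
        }
    }

protected:
    virtual void openUploadSlot() = 0;

private:
    std::uint32_t m_softLimit;
};

// Tiny stand-in plugin that just counts how often it was asked for a slot.
class CountingModule : public ModuleSketch {
public:
    explicit CountingModule(std::uint32_t softLimit)
        : ModuleSketch(softLimit), opened(0) {}
    int opened;

protected:
    void openUploadSlot() override { ++opened; }
};
```

Opening one slot per interval keeps the adjustment gradual; a module that is already at or above its soft limit is left alone.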
Bittorrent uploading is a rather complex topic on its own tho. While Hydranode currently supports basic uploading, this is nowhere near what true BT clients do nowadays - choking and optimistic unchoking algorithms, for example; super-seeding ... and so on. So there's still a lot of work to be done in that area, but it seems we're getting somewhere.
Now that I'm running BT for a longer time (will leave it running overnight, seeding), I'm also noticing that BT seems to be a lot more connection-hungry than ed2k (although we did several optimizations in ed2k to keep connection counts low); BT will need optimizations in that area as well - for example, when seeding, we can safely drop other seeds etc.
On an unrelated sidenote, the PartData API now exposes a setComplete() method, which allows explicitly marking a range in the file complete. This is part of the public API, but should only be used when you truly know what you're doing; the intended purpose is to allow plugins to recover from corruption (e.g. AICH), or in the BT case, to allow marking parts of PartialTorrent complete (called by TorrentFile).
Friday, October 21, 2005
(bt) downloads resuming, and speedmeters
Downloads resuming for torrents works nearly correctly now - there are still some unresolved cases (if a download is at 100% at the time of shutdown, but the bordering chunks aren't verified yet for example), but generally, it's working.
Also, the Bt::Client class now implements the BaseClient API, and speed-meters are attached to the correct files. For a moment there I thought I had a serious problem, since Bt::Client downloads the top-level torrent (virtual file), but I needed to somehow indicate which files within the torrent were being downloaded; it was resolved rather easily, however - namely, Bt::Client attaches its download-speedmeter to whatever file contains the most-recently requested chunk. It does come with the small overhead of re-attaching the speedmeter on each chunk start, but I didn't notice any major performance hit from it. The top-level virtual file's speedmeter is just the sum of all child speedmeters, so even if children are downloaded from multiple sources, the top-level torrent correctly displays the overall speed of all children.
Other changes today include improved support for client-software and client-version detection (now detects all known BT clients, and there are about 30 of them out there), and preliminary support for seeding clients (Bt::Client no longer assumes implicitly that it's downloading, and has valid m_partData member).
Next complex topic is how to handle availability chunkmaps properly. The thing is, while availability-o-meters are attached to the top-level virtual file, and correct (rarest etc) chunks are selected, the child objects (downloaded by other modules, e.g. ed2k) have no knowledge right now about BT's availability of the file, since no availability-o-meters are added to the children. What makes the topic complex is the fact that PartData requires chunks to be of equal size (except for the very last one), and also requires chunks to have hashes. However, with this scheme, we'd need to start the first chunk at non-zero offset (due to chunks crossing file boundaries), and I'm not sure we want to attach hashes to those chunks either, since the hashes are already checked by the top-level virtual file.
Thursday, October 20, 2005
Back in action
The real-life problems that have been plaguing me for nearly two weeks finally reached a turning point yesterday; while not completely resolved, I now have a clear understanding of what's going on and what the future holds. So today, I was able to focus on coding again, and with progress.
The first discovery today was three issues in ed2k clients-management code, which caused considerable performance problems for ed2k module; one was caused by ~50ms-variation in our event-engine, which resulted in udp reasks being stopped (with notice "reask time, but we already did that 1799 seconds ago"), and client becoming "zombie" (never asked again). The second issue was related to TCP reasks, where sometimes the connection fails before we manage to complete the handshake/request with the client. Now we keep track of current session state, and drop the source if something like that happens (to avoid it becoming yet another "zombie" client). The third issue was rather rare, namely when we were uploading to the client, and downloading at the same time, and the remote client stopped the transfer (to us) session, and sent us back to queue, and then sent AcceptUploadReq (to re-start downloading), we simply ignored it, since we "thought" we were already downloading (downloading state was only reset on connection-close).
Then I recalled that I had heard multiple reports over several months about ipfilter not working properly, and since ipfilter is based on the RangeList API, I headed down to hnbase/tests/test-range.cpp to figure out what's going on. Creating the test-case was easy - a rangelist with ranges 0...2G and 2G+1...4G; it exposed the bug, which was an integer overflow in Range::operator<, triggered for all ranges that reached the range size maximum (4G in case of 32bit values). It never surfaced in PartData, since that uses 64bit ranges everywhere, but ipfilter used 32bit ranges, and made use of the complete number spectrum (up to 4G @ 255.255.255.255).
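The faulty code isn't shown here, so the following is only an illustration of how a range comparison can overflow at the top of the 32bit spectrum; it should not be read as the actual Range::operator<. `Range32`, `lessBuggy` and `lessFixed` are hypothetical names, and ranges are inclusive.

```cpp
#include <cassert>
#include <cstdint>

struct Range32 {               // hypothetical inclusive range [begin, end]
    std::uint32_t begin;
    std::uint32_t end;
};

// "a lies entirely before b", written with a +1 that wraps to 0 when a.end
// is the largest 32bit value (i.e. the range reaches 255.255.255.255).
inline bool lessBuggy(const Range32 &a, const Range32 &b) {
    return a.end + 1u <= b.begin;
}

// Same intent without any arithmetic on the bounds, so nothing can wrap.
inline bool lessFixed(const Range32 &a, const Range32 &b) {
    return a.end < b.begin;
}
```

With the two halves of the 32bit spectrum as test ranges, the buggy version claims the upper half sorts before the lower half, which is enough to confuse any sorted container built on top of it.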
Next up I cleaned up the BT module classes, removed Manager and TorrentDb, merging the code into BitTorrent class, in an attempt to keep the module class design simple. Majority of additional time went to implementing initial support for resuming downloads after restarts, since that was never handled before; now most of the resuming code is finished, but PartialTorrent internally doesn't handle it flawlessly yet - it fails to recognize that some of the files in the torrent have already been completed.
Other miscellaneous changes include the addition of 'uptime' command to hnshell, and workaround for completing/moving files when the destination file-name already exists - now we prepend the filename with '_' if that happens.
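The renaming workaround boils down to something like the sketch below. `freeDestinationName` and the `exists` callback are hypothetical, and repeating the prefix until a free name is found is an assumption on top of what is described above.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Returns a destination name that is not taken yet, by prepending '_' for as
// long as the candidate name already exists. The existence check is passed
// in as a callback so the sketch stays independent of any filesystem API.
std::string freeDestinationName(
        std::string name,
        const std::function<bool(const std::string &)> &exists) {
    assert(!name.empty());
    while (exists(name)) {
        name.insert(0, 1, '_');
    }
    return name;
}
```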
Tuesday, October 18, 2005
ed2k_kad code in svn; misc updates
The most important news today is the addition of the ed2k_kad module to the main codebase. Developed by cyberz, it's still not working, but the base structures are in place, and some initial communication with the network has been implemented. It also includes a generic Kademlia algorithm template, which is now part of the hncore library, and will be re-used for all Kademlia-based networks.
sca is continuing his work on improving the build system; the latest addition is revision 2036, which adds proper handling for the MODULE_DIR variable, new wrapper scripts which properly handle plugin/library locations, and many miscellaneous improvements.
Wubbla has been busy with stabilizing http module, which showed a number of crashes during last testing; there's hope that the module can be enabled by default in next release.
Personally I haven't been able to do much coding the past few days, due to real-life-related problems. The bittorrent module simplification topic is still open, but I did some brainstorming today and visualized (on paper) the entire user interface. Some details still need work, and I'm no photoshop guru, so I can't digitize it for your viewing pleasure, but this is the first time we have the full user interface visualized completely (previous attempts only visualized parts or single pages of the interface). What I can say is that it follows the thoughts from the previous blog posting, with familiarity, simplicity and extendability as the main keywords.
Sunday, October 16, 2005
from the madcat-running-out-of-titles dept.
I was out of town for two nights, due to real-life reasons, but I did have some interesting discussions with people during that time. I've also been doing the usual research / news-reading on many topics - you can't just sit around and develop your stuff w/o keeping a close eye on what the rest of the world is doing. The P2P market is ever-changing and evolving, and one has to stay up to date on all new developments.
The user interface topic (again) is also very interesting, and I'm watching Windows Vista development very closely, since that's what 90% of computers will be running in a year or so - any software developed today must work well with Vista. This is where I love the fact that Hydranode's engine is clearly separated - we can finish the engine, and then slap the latest and greatest GUI stuff on top of it. This is also how modern games, which also have 4-5-year development times, are developed - the visual effects are added at the very end, when everything else is finished, otherwise you'd just end up with 4-5-year-old technology.
Another widely-discussed aspect of the GUI topic is whether to go with a familiar interface, or an innovative one. The thing is, a familiar interface (such as tens of other clients have) is a safe bet - it's tested, users accept it easily, and there are no risks involved. The few people who come out and say "bah, copycat" can safely be ignored. On the other hand, an innovative solution would most probably be very different from what users have gotten used to seeing/thinking; this is one of the reasons Microsoft has been very careful in making radical changes to the Windows user interface - familiarity is a very powerful and important thing. You can walk up to a 10-year-old windows computer today, and have no trouble at all using it. And the changes that were made in XP, for example the new Control Panel layout - many people still have trouble adjusting to it, 5 years after the fact. So basically, the Hydranode GUI should be very familiar to those switching from other clients, to minimize the "surprise" and "re-learning", since people shouldn't need to learn yet another GUI.
I also did background research on the Gnutella and Gnutella2 protocols, and from what I see, we're in for a tough ride there as well - Gnutella is an old, widely-extended protocol, much like ed2k, although what speaks in its favour is the fact that the client developers are working closely together on newer/updated protocol standards. Gnutella2 can mostly be ignored for now, since the userbase is too small; while the protocol may be better designed, it doesn't matter if there aren't enough users.
I also talked to cyberz, and we should have the ed2k_kad module design documents available soon enough, as well as the preliminary code merged to svn, since he has no time right now to actively develop it.
Thursday, October 13, 2005
Keeping BT module class hierarchy simple
The next logical step now is to add proper downloads resuming and partial torrent handling (where some files are missing), as we now have the necessary backends for that. But first, let's simplify the BT module class system slightly.
One idea when implementing the cache manager last night was to put it into a separate class, as yet another abstraction; however, it turned out to be much better to merge it directly into the virtual files layer (which is an abstraction of its own already). The same pattern should be applied to two more, already existing classes in the BT module - Manager, and the newly-added TorrentDb. Neither of them has any big responsibilities that would justify having them as separate Singletons - TorrentDb just contains a map of torrents found in the cache folder, and Manager basically just controls the listening port and the currently active torrents. Both of those classes should be moved into the BitTorrent class (the module's main class, derived from ModuleBase).
Once that's done, we can add a nice and clean loop in BitTorrent class, which loops over shared files, checks if it has any torrent associations (via metadata->customdata field), creates the torrent (if it doesn't exist), or attaches the file to existing torrent.
With this in place, the overall bt module class system stays within the original design graph - the addition of TorrentDb and BitTorrent classes broke away from that design, which causes an unnecessary amount of files/classes in the module.
Wednesday, October 12, 2005
Introducing BT chunk-cacher engine
"You blow up one sun, and everyone expects you to walk on water."
- Samantha Carter, "StarGate SG-1"
Well, boys and girls, I'm glad to announce that I solved all of the problems mentioned in last night's blog post. Here are the gory details:
The fundamental problem was that we need to have access to all data in chunks crossing file borders, even if the files themselves are no longer present. Hence, the basic idea is that we simply duplicate all downloaded data into so-called "cache", from where we can access the data later on, if we need it. Now, I thought long and hard on what exactly should the cache files format be, as well as how much data are we going to cache at all.
As it turns out, the best solution for the file format is to create one (or two - see below) physical files for each chunk that crosses file boundaries. In reality, it's slightly more complex. The simple case is when a chunk starts in file1 and ends in file2 - in that case, we need to cache the last bytes of file1 (into one cachefile), and the first bytes of file2 (into a second cachefile). It gets more interesting, however, when the chunk crosses multiple files - in that case, we need to cache the ending of the file where the chunk began, all intermediate files, and the beginning of the file where the chunk ends.
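To illustrate which pieces end up in the cache, here is a rough sketch that lists, for every chunk crossing a file boundary, the per-file segments it covers. It is not the code from the BT module; `Segment` and `cacheSegments` are hypothetical names, and the files are assumed to be laid out back-to-back in torrent order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Segment {               // hypothetical: one cache file's worth of data
    std::size_t file;          // index of the file in the torrent
    std::uint64_t offset;      // offset relative to the start of that file
    std::uint64_t length;
};

// Returns one entry per chunk that crosses a file boundary; each entry lists
// the tail of the file where the chunk begins, any intermediate files in
// full, and the head of the file where the chunk ends.
std::vector<std::vector<Segment>> cacheSegments(
        const std::vector<std::uint64_t> &fileSizes, std::uint64_t chunkSize) {
    assert(chunkSize > 0);
    std::vector<std::uint64_t> begins; // global begin offset of each file
    std::uint64_t total = 0;
    for (std::uint64_t size : fileSizes) {
        begins.push_back(total);
        total += size;
    }
    std::vector<std::vector<Segment>> result;
    for (std::uint64_t cb = 0; cb < total; cb += chunkSize) {
        const std::uint64_t ce = std::min(cb + chunkSize, total);
        std::vector<Segment> parts;
        for (std::size_t f = 0; f < fileSizes.size(); ++f) {
            const std::uint64_t from = std::max(cb, begins[f]);
            const std::uint64_t to = std::min(ce, begins[f] + fileSizes[f]);
            if (from < to) {
                parts.push_back({f, from - begins[f], to - from});
            }
        }
        if (parts.size() > 1) { // only boundary-crossing chunks are cached
            result.push_back(parts);
        }
    }
    return result;
}
```

The inner loop scans all files for every chunk, which is fine for a sketch but would want a smarter lookup for torrents with thousands of files.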
Why this specific format? TorrentHasher (custom hasher object for hashing data across file boundaries) accepts list of paths to files as input. With this style of cache we can feed TorrentHasher a mixed set of real files, and cache files, and it works seamlessly.
It doesn't end here of course - the data that was written to cache must also be checksummed, each chunk against the corresponding hash. That was also kinda tricky to get right, but it works correctly now - if the real-data hash fails, the cache also fails, and if the real-data hash succeeds, the cache also succeeds. Middle-cases aren't handled right now; however, if one succeeds and the other fails, this can only mean either a code error or a hardware error (or running out of disk space), so I have mixed feelings on how (if at all) to handle this scenario.
How much disk space does this cache system take? Well, the worst-case scenario is the same amount as the entire torrent; however, this is an extreme case (it happens when all files in the torrent are smaller than chunksize * 2, thus all chunks cross file boundaries and need to be cached). The average-case scenario is much, much lower though - preliminary testing showed cache sizes ranging from 0% to 2% of the total size of the torrent, depending on the chunksize vs average filesize ratio.
On an unrelated side-note, I also fixed a bug introduced by the delayed disk-allocation mechanism introduced few weeks ago - namely, if chunk size was smaller than PartData's internal file buffer size (500kb by default), the first chunkhash verification was performed after the allocation (correct), but before PartData was able to flush the data to disk (first flush is done after allocation is finished). This didn't affect ed2k (where chunksize is 9500kb), but caused first chunk of each file to be corrupted in BT when chunksize was smaller than 500kb. This no longer happens - the verification jobs are delayed until the allocation, and flushing is finished. Internally in BT, this introduced a somewhat more complex situation, where the job had to wait until ALL the files that the hashjob affected were finished flushing, but that works properly now as well.
For those who want to read the code to see what I'm talking about, hncore/bt/files.cpp contains all the nifty stuff.
"Next step - parting the Red Sea."
- Samantha Carter, "StarGate SG-1"
Tuesday, October 11, 2005
Making pigs fly?
I have to admit that I seriously under-estimated the complexity of the bittorrent module. Two months of work have already gone into it, and I was hoping to do a release around the 15th of this month with bittorrent support, but it's getting painfully obvious that that deadline is no longer achievable. There are enough features and optimizations to qualify as a new 0.1.x release, but I really don't want to do yet another 0.1.x release.
Frankly, I'm getting the feeling that it'll be easier to make pigs fly than it is to get bittorrent integration working and doing the right thing under all circumstances. Current idea and related problems are like this:
Bittorrent plugin will cache all known torrent files to config/bt folder. On startup, it shall load all of them up, and store the info_hash values for each file. Note: That could later be optimized to only check timestamps on the .torrent files, and have faster reference file for the info to avoid scanning tens, maybe hundreds, of torrent files on startup.
Next step is to scan shared/temp lists, and for each file that has a custom metadata field "torrent:****", attach the file to the corresponding Torrent object. This is where it gets complex: based on this design, we will allow torrents (either seeded or downloaded) to be composed of one-to-many of the files it contains - basically the torrent can have any number of gaps in it. The problem is that in BT, hashes can exceed file boundaries, which means if we are seeding one file of a big torrent, we cannot seed the first and last chunks of the file (since we don't have the full chunks for those).
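The boundary problem can be made concrete with a little arithmetic. The sketch below is only an illustration under assumed names (ChunkSpan, chunkSpan): for a file at a given offset inside the torrent, it works out which chunks overlap the file and how many bytes of the first and last chunk belong to neighbouring files. Any chunk with non-zero "extra" bytes cannot be verified, and so cannot be seeded, from that one file alone.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type for illustration only.
struct ChunkSpan {
    std::uint64_t firstChunk;     // index of the first chunk overlapping the file
    std::uint64_t lastChunk;      // index of the last chunk overlapping the file
    std::uint64_t leadingExtra;   // bytes of firstChunk that precede the file
    std::uint64_t trailingExtra;  // bytes of lastChunk that follow the file
};

// fileBegin: absolute offset of the file inside the torrent.
// fileSize:  size of the file, must be non-zero.
// totalSize: size of the whole torrent (the last chunk may be shorter).
ChunkSpan chunkSpan(
    std::uint64_t fileBegin, std::uint64_t fileSize,
    std::uint64_t totalSize, std::uint64_t chunkSize
) {
    assert(chunkSize > 0 && fileSize > 0);
    assert(fileBegin + fileSize <= totalSize);

    const std::uint64_t fileEnd = fileBegin + fileSize;
    const std::uint64_t first = fileBegin / chunkSize;
    const std::uint64_t last = (fileEnd - 1) / chunkSize;

    // The final chunk of the torrent ends at totalSize, not at a multiple
    // of chunkSize.
    std::uint64_t lastChunkEnd = (last + 1) * chunkSize;
    if (lastChunkEnd > totalSize) {
        lastChunkEnd = totalSize;
    }

    return ChunkSpan{
        first, last, fileBegin - first * chunkSize, lastChunkEnd - fileEnd
    };
}
```

For example, with 100-byte chunks a 300-byte file starting at offset 250 overlaps chunks 2 to 5, but only chunks 3 and 4 lie entirely inside it; chunk 2 needs 50 bytes from the previous file and chunk 5 needs 50 bytes from the next one.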
On the other hand, when we're downloading a single file out of a bigger torrent, in order to complete that file, we must also download more data in order to complete the first and last chunks of the file. Now add canceling, stopping/pausing and the rest of the stuff you can do with downloads to the mix, and we've got a whole new set of problems. For example, when doing a cooperative download with another (file-based) network, what happens when the file completes? The BT plugin will not allow completing the file unless all of the chunks are complete, but that means the file will stop at 100%, technically complete (could be verified via alternative hashes), while the BT plugin still needs the first and last chunks to be downloaded (from other files) in order to complete it. But the other files may already be canceled, or completed but deleted from disk.
In order to compensate for that, I think we need to add an intermediate cache to the BT module. Basically, all downloaded (or shared) chunks that cross file boundaries must be duplicated internally by the BT module, and stored somewhere INTERNALLY. Later, when we need that data, and are unable to read it from the original file (canceled/deleted), we use the cached data to complete the file. All of that must of course be cleaned up in the end as well - we don't want to store that data indefinitely.
How exactly all of that should be implemented, I have no idea.
Sunday, October 09, 2005
Teaching cmod_bt to play with other kitties
Seems the optimizations I did last night weren't a waste of time after all - testers reported today a CPU usage drop of nearly six times, from 60% to 11%, in debug builds, on a 200mhz pentium pro. Granted, your mileage may vary, but it seems the optimizations have a larger effect on slower systems.
Today I was trying to make the "BT" module play along better with other modules. As you are aware, I wrote it as a standalone app initially, to speed up the development process (going through hnsh every time to start new downloads would've gotten really tedious). However, now we need to turn it into a well-behaving hydranode module. Re-compiling the code as a module wasn't any problem; I also ran initial tests in cooperative downloading with other networks - http and ed2k, both of which currently have those capabilities. Both tests "kinda" worked - the http module properly initiated the BT download, however due to problems on the BT side, the BT module created a new download instead of attaching itself to the existing one. ED2K integration worked better, since there BT initiated the downloads, and ed2k is a much more mature module. The downloading was nicely coordinated, and the same file was transferred simultaneously from both modules. Problems appeared, however, when the files got to 100% - the BT module didn't allow the files to be completed, since there were still hashes not verified (BT hashes can exceed file boundaries, if you recall my earlier posts).
As for the uploadmanagement, and the comments on the previous blog post, seems I still have some thinking to do in that area before jumping to implementation. I still think the top-level handling should be by module's session up/down ratio, maybe slightly modified by the module's priority (3 priorities are currently defined for modules - low/normal/high). In addition to that, we need two "exceptions" - focusing all upload bandwidth to one (or more) file [is this really uploadmanager's job? files selection is done per-module basis; ok, perhaps focus file + network - release made to ed2k network?], and focusing all upload bandwidth to one (or more) clients (an ftp server is a "client" as well).
Let's take a look at usage scenarios:
- Normal operation, no user interaction. Each network is "rewarded" based on how much they have given us. All is fair.
- User has a list of "friends", to whom he prefers to upload. This should be handled on a per-plugin basis, to remain compatible with each network's netiquette for this. In ed2k, this modifies the requesting client's queue-score.
- User has set some shared files to high priority, some to low. As with friends, this should be handled on a per-plugin basis, to remain compatible with each network's netiquette for this. In ed2k, this modifies the requesting client's queue-score.
- User releases a file to one network. User configures a percent of upload bandwidth (say 80% by default) for this file. The rest of the upload bandwidth is split as per rule #1.
- User wants to focus upload to a specific peer, let's say an FTP site. He configures a percent of upload bandwidth (say 80% by default) for that specific client.
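As a rough sketch of how the first rule and the two "focus" rules could fit together: reserve the configured share for the focused file or client, then split what is left between modules in proportion to what each has downloaded. UploadPlan, planUpload and the parameter names are all made up for this example, and the even split when nothing has been downloaded yet is an assumption the text above does not settle.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical result type: rates are in the same unit as uploadLimit.
struct UploadPlan {
    double focus;                             // reserved for the focused file/client
    std::map<std::string, double> perModule;  // remainder, split per module
};

// uploadLimit: total upstream available.
// downloaded:  bytes received so far through each module (rule #1 weights).
// focusShare:  fraction reserved for the focused target, 0.0 when unused.
UploadPlan planUpload(
    double uploadLimit,
    const std::map<std::string, std::uint64_t>& downloaded,
    double focusShare
) {
    assert(uploadLimit >= 0.0);
    assert(focusShare >= 0.0 && focusShare <= 1.0);

    UploadPlan plan;
    plan.focus = uploadLimit * focusShare;
    const double rest = uploadLimit - plan.focus;

    std::uint64_t total = 0;
    for (const auto& entry : downloaded) {
        total += entry.second;
    }
    for (const auto& entry : downloaded) {
        // Assumption: with no download history yet, split evenly.
        const double share = total == 0
            ? 1.0 / static_cast<double>(downloaded.size())
            : static_cast<double>(entry.second) / static_cast<double>(total);
        plan.perModule[entry.first] = rest * share;
    }
    return plan;
}
```

With a 25000 bytes/s limit, an 80% focus share and an 800:200 download history for ed2k and bt, roughly 20000 goes to the focused target, 4000 to ed2k and 1000 to bt. Friends and file priorities are left out on purpose, since the list above assigns those to the individual plugins.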
Did I miss anything?
Saturday, October 08, 2005
Hate to say it, but I spent another day profiling and bugfixing. But then again, I dislike building new stuff on top of things that I can't be sure work 100%, so I guess the time spent ensuring the base is good is well spent. The major fix today was finally closing bug #97, which had plagued us since the first implementation of ClientManager - it caused rather odd crashes every few hours; apparently, the bug was that when merging clients in ed2k, the client extension objects were re-parented, but since they kept the parent Client pointer internally, this led to a crash. This bug didn't surface earlier, since before the m_parent pointer wasn't used for such important stuff - just some logging calls and such. To be totally honest, it's not really good design there either - having the extension objects keep a reference to their parent - it was originally added to allow log messages coming from extensions to print parent object info (ip/port - for debugging purposes); later it was used to simplify some handling of ED2K::DownloadList management; and now it's used to simplify handling of ClientManager integration. Technically, both of the last two could be implemented "externally", e.g. handled in the Client class instead of the extensions; however, that would be more error-prone - a lot of stuff needs to be done whenever the extension is constructed or destroyed, and the most maintenance-safe place to do it is in their constructors/destructors (since they are created/destroyed in many places).
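For readers who want to picture the re-parenting problem, here is a stripped-down sketch. The Client and Extension classes below are stand-ins written for this example (in modern C++, with unique_ptr), not the real ed2k classes, and the fix shown (updating the back-pointer inside the merge) is one plausible way to do it rather than a description of the actual checkin.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

class Client;

// Hypothetical extension object that keeps a raw back-pointer to its owner.
class Extension {
public:
    explicit Extension(Client* parent) : m_parent(parent) { assert(parent); }
    Client* parent() const { return m_parent; }
    void setParent(Client* parent) {
        assert(parent);
        m_parent = parent;
    }

private:
    Client* m_parent;
};

class Client {
public:
    explicit Client(std::string name) : m_name(std::move(name)) {}
    const std::string& name() const { return m_name; }

    void createExtension() { m_ext.reset(new Extension(this)); }
    Extension* extension() const { return m_ext.get(); }

    // Take over the other client's extension. The back-pointer has to be
    // updated in the same step; otherwise it keeps pointing at 'other',
    // which is typically destroyed right after the merge.
    void merge(Client& other) {
        if (!other.m_ext) {
            return;
        }
        m_ext = std::move(other.m_ext);
        m_ext->setParent(this);
    }

private:
    std::string m_name;
    std::unique_ptr<Extension> m_ext;
};
```

Leaving out the setParent call reproduces the failure mode described above: everything keeps working until something more important than a log call dereferences the stale pointer.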
gprof and strace-based profiling only resulted in limited success. I was able to reduce time spent in the gettimeofday() syscall (on Linux) from 65% down to 16.5%, and inlined a number of functions in different places (Scheduler), however personally I didn't record any noticeable CPU usage drop - still at around 3-5% at 25kb/s upstream, 185kb/s downstream rates (maxing out my link limits) [release build, 2.4ghz p4/ht].
When I first brought in ClientManager, I was expecting to see a lot of mess being shown about the ed2k upload-manager, because I considered that part rather weak; however, as it turns out, it's handling it pretty well - keeping slots at low counts (3-5 usually), and splitting bandwidth between those slots correctly (the current implementation uses "slot-focus", I think it's called in emule - a single slot gets as much as possible, other slots get the leftovers).
Now the big question here is how to handle upload slots across multiple plugins. The base idea detailed in the original Hydranode design documents, dating back to August 2004, indicated that upload bandwidth should be distributed between plugins based on how useful the plugin has been to us from a downloading perspective; hence, if 80% of incoming traffic has come from the edonkey plugin, then 80% of upstream bandwidth should go to the edonkey plugin. Now, slots don't measure bandwidth, slots are merely used to "fill" the available bandwidth. Plugins have support for priorities right now - adding per-plugin bandwidth limits would be simple.
Basically, I'd like to end up with two upload handling options: dynamic and static. The dynamic way would then, as described above, automatically adjust per-plugin upload limits based on how much download data has been received (per session or overall - that's still open); the static way would allow the user to specify per-plugin limits by hand, e.g. 5kb/s ed2k, 5kb/s bt. Now, some networks have minimum upstream requirements (e.g. ed2k), so the question is how to handle that properly. One way would be to apply the ed2k-specific limiting only when using static bandwidth limiting. If user sets ed2k upstream limit <10kb/s nice behaviour I don't know yet... suggestions?
Thursday, October 06, 2005
So, back at development; I figured to get us warmed up again, I'll do some optimizations, and a few bugfixes along the way. Most of today went to profiling and fixing the resulting bottlenecks. Some things, however, are outside my control for now - namely, the ed2k module has a rather heavy bottleneck in Crypto calls (which is external code).
Anyway, the listing:
- [FIXED] ed2k now properly sets connected state in BaseClient for incoming clients as well.
- [FIXED] Fixed duplicate calls to socketError(), which resulted in errors ala "Error closing socket: Success".
- [FIXED] Moved Bt::Manager shutdown code to Bt::Manager::exit() method instead of destructor to solve some shutdown crashes.
- [OPTIMIZATION] Removed top-level exception handlers (were only enabled in release build). Rationale is that if we ever need those, we should fix the bugs instead of relying on such things.
- [OPTIMIZATION] Scheduler optimization: Optimizes away the 5% time spent in ~SSocketWrapper by moving the object from stack to heap (wrapped in shared_ptr).
- [OPTIMIZATION] Inline PartData::isComplete() calls.
- [OPTIMIZATION] Scheduler optimization: Reduce map lookups by storing SSocketWrapper pointer in SSocket as well.
- [IMPROVED] Better command-line handling in bget: now passes args to hncore as well, so all normal args work now.
- [IMPROVED] Build system improvements: Now you can compile --with-mod-xxx or --without-mod-xxx; bjam install, install-libs and install-headers (almost) work as well. (patch by sca)
Wednesday, October 05, 2005
Busy with hardware/software maintenance
People keep asking me - what am I doing? What am I working on? True, I returned on Monday as scheduled, but there hasn't been much (any) activity on SVN since then; furthermore, there are some rather serious bugs open, inherited from the new ClientManager (it crashes rather often right now).
Basically, yesterday and today went to hardware and software maintenance on my development systems... defragging disks, re-organizing data, and it will be concluded early tomorrow morning by a reinstall of one box (Ubuntu 5.10 preview, if anyone cares). Following that, I can resume my normal development routine again, and the first targets are fixing the bugs introduced by the ClientManager, moving statistics handling to ClientManager (currently handled inside the ed2k module), and introducing generic upload-management (which brings BT to a usable state - currently it cannot upload).
Meanwhile, I got some more ideas on two topics. First, in eMule (and some other clients), there's a concept called categories - you can assign downloads to categories, have separate incoming dirs for them, can pause/resume them separately etc; these are usually implemented in user interfaces via tabs or similar. However, in Hydranode, we will need an additional concept, which I currently call download groups. These could be implemented by user interfaces as "subdirs" in the listing. They differ from categories in two ways - 1. they are created only programmatically, never by the user, and 2. their names are usually long. Examples of such groups are torrents, and eMule / ShareAza Collections.
The second idea is related to core/gui comm. Namely, I figure that we will need to assign a unique identifier to each operation requested by the user interface; when the core responds to a command, it sends the original id back. This allows more flexible management on both sides, as well as cleaning up the protocol somewhat. Also, the hncgcomm library will need to be re-structured - currently it exposes too much of the underlying system. Basically, the hncgcomm API should mirror the core API - for example, a PartData object would have pause(), stop(), resume(), getSources() and similar methods. The difference here, however, is that all operations are performed asynchronously, which is where the operation IDs come in - when the GUI requests a download to be paused, the request is sent to the core, and the response to the operation - either success or failure - is sent back, so the GUI can react. The current protocol design doesn't allow that, so the GUI would simply work "blindly".
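The operation-ID idea could look roughly like this on the GUI side. PendingOperations is a hypothetical class sketched for this post's idea, not part of hncgcomm; it only shows the bookkeeping (hand out an ID with each request, look the ID up again when the core's answer arrives) and leaves the actual wire protocol out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical GUI-side bookkeeping for asynchronous requests to the core.
class PendingOperations {
public:
    using Callback = std::function<void(bool success)>;

    // Registers a request and returns the ID to send along with it.
    // (ID wrap-around after 2^32 requests is ignored in this sketch.)
    std::uint32_t add(Callback onResult) {
        assert(onResult);
        const std::uint32_t id = m_nextId++;
        m_pending[id] = std::move(onResult);
        return id;
    }

    // Called when the core sends a response carrying the original ID.
    // Returns false if the ID is unknown or was already answered.
    bool complete(std::uint32_t id, bool success) {
        auto it = m_pending.find(id);
        if (it == m_pending.end()) {
            return false;
        }
        Callback callback = std::move(it->second);
        m_pending.erase(it);  // erase first, so the callback may add new requests
        callback(success);
        return true;
    }

    std::size_t size() const { return m_pending.size(); }

private:
    std::uint32_t m_nextId = 1;
    std::map<std::uint32_t, Callback> m_pending;
};
```

A mirrored PartData::pause() would then call add() with a callback that updates the download's row, send the request with the returned ID, and let complete() fire the callback once the core reports success or failure.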
Meanwhile, while I'v been pre-occupied with other things, wubbla has been busy improving the http module. Among major code cleanups and re-structurings, several new features have been added - automatic download mirror finding (via www.filemirrors.com), automatic hashes finding (sha1 and md5, and they are checked against when the download completes), and automatic .torrent file finding (and passing to BT plugin).