MLDonkey Downloads Import Module Development
Development in progress.
Wednesday, November 30, 2005
API changes, and misc changes
Two important API changes were made today - the PartData::getRange() and getLock() methods no longer throw exceptions when failing to generate a range (they return a null pointer instead), and Config::read() no longer throws exceptions on invalid configuration values, but logs a warning and returns the default value instead. The rationale behind both of these changes was that expected failures should not generate exceptions; exceptions should only be used for unexpected failures. The additional reasoning behind the first change was that we were throwing/catching exceptions in getRange() inner loops, which potentially meant throwing/catching hundreds, perhaps even thousands of exceptions while looping over chunks, which became a serious performance bottleneck. Exceptions in inner loops are evil.
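As a rough illustration of the convention (the real Hydranode signatures differ; this is a simplified, hypothetical sketch), an expected failure becomes a null return that the caller checks in its loop, instead of a thrown exception:

```cpp
#include <cstdint>
#include <map>

// Hypothetical, simplified sketch of the new error-reporting convention:
// expected failures return a null pointer instead of throwing.
struct Range { uint64_t begin, end; };

class PartData {
public:
    // Returns a pointer to a free range of at least the requested size,
    // or nullptr when no such range exists - a common, *expected* outcome
    // in inner loops, so no exception is thrown.
    Range* getRange(uint64_t size) {
        for (auto& r : m_free)
            if (r.second.end - r.second.begin >= size)
                return &r.second;
        return nullptr; // expected failure: cheap null return
    }
    void addFree(uint64_t b, uint64_t e) { m_free[b] = Range{b, e}; }
private:
    std::map<uint64_t, Range> m_free; // free ranges keyed by begin offset
};
```

A caller looping over thousands of chunks then pays only for a pointer comparison per miss, rather than for stack unwinding.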
There were a few more API changes made a few days ago, but I got carried away in that blog entry; those were the renames of EventTableBase::handlePending and EventMain::handlePending to process(), which is more generic, and, based on that, deriving SocketWatcher (which performs socket polling) from EventMain and overriding the process() method, the rationale being that it's more generic and logical. This also removed the need to call SocketWatcher::poll() explicitly from the main loop, since that's now done automatically from EventMain.
Another recent change was reducing the fluctuations of transfer rates (both in the statusbar and in 'vd' output). Previously, the transfer rates displayed were the amount transferred during the last second; now we display the average of the last 3 seconds, which shows more stable transfer rates. It seems that other clients (eMule, Azureus) show even further averaged transfer rates, over 5-6 seconds at least, so we might tweak that more in the future.
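A 3-second moving average like the one described could be sketched as follows (class and method names are illustrative, not Hydranode's actual API):

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <numeric>

// Minimal sketch: average the last N one-second samples to smooth
// the displayed transfer rate.
class RateAverager {
public:
    explicit RateAverager(size_t window = 3) : m_window(window) {}

    // Called once per second with the bytes transferred during that second.
    void addSample(uint64_t bytes) {
        m_samples.push_back(bytes);
        if (m_samples.size() > m_window)
            m_samples.pop_front(); // keep only the last `window` seconds
    }

    // Average over the (up to) last `window` seconds.
    uint64_t rate() const {
        if (m_samples.empty()) return 0;
        uint64_t sum = std::accumulate(
            m_samples.begin(), m_samples.end(), uint64_t(0));
        return sum / m_samples.size();
    }

private:
    size_t m_window;
    std::deque<uint64_t> m_samples;
};
```

Widening the window (to 5-6 seconds, as eMule and Azureus seem to do) is then just a constructor argument.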
Monday, November 28, 2005
BT performance improved by 3-4 times
I discovered some possible issues with our DNS resolver, namely that it always used port 1024 by default, which could cause problems when other apps use that port for other things, so I changed it to listen on a random port above 1024 instead, to avoid possible collisions in the future.
Then I used the aforementioned Object system (which has been our friend for a number of things in the past) to "hide" the torrent's internals from the hnsh download-list view. Now only the top-level "virtual" file is shown; the sub-files aren't shown at all. Soon I'll add a "vd X" command, which will display more information (including the sub-files) about each download. Other upcoming changes to hnsh are a "names" command (lists known file names for a download) and a "comments" command (should be self-explanatory). The detail view of a download, via the "vd X" command, will also show other important things, such as availability.
In the BT module, I started adding support for a proper choking/unchoking system. It holds the key to BT downloading performance, since you're almost guaranteed to get a download slot from a client within 30 seconds after you unchoke it (i.e. allow it to start uploading). As a result of the preliminary implementation, BT download rates went up by about 3-4 times, showing how important this feature is. It still needs some work - we sometimes do weird stuff, such as choking a client and then instantly unchoking it - but the initial results are really amazing.
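A rate-based unchoke pass of the kind described might look roughly like this (purely illustrative - the field names and the simple top-N slot policy are assumptions, not Hydranode's actual code):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative sketch of rate-based unchoking: keep the N peers that
// upload to us fastest unchoked; choke everyone else.
struct Peer {
    uint64_t downloadRate; // bytes/sec we currently receive from this peer
    bool choked = true;
};

void updateChokes(std::vector<Peer>& peers, size_t slots) {
    std::vector<Peer*> byRate;
    byRate.reserve(peers.size());
    for (auto& p : peers)
        byRate.push_back(&p);
    // Fastest uploaders first.
    std::sort(byRate.begin(), byRate.end(),
              [](Peer* a, Peer* b) { return a->downloadRate > b->downloadRate; });
    for (size_t i = 0; i < byRate.size(); ++i)
        byRate[i]->choked = (i >= slots); // unchoke only the top `slots` peers
}
```

Real clients add hysteresis (e.g. re-evaluating only every 10 seconds, plus an optimistic unchoke slot), which is exactly what prevents the choke-then-instantly-unchoke flapping mentioned above.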
Saturday, November 26, 2005
New builds, and more
First and foremost, as you can see on the right side of this page, new builds are available for testing. Among other updates, the HTTP module is now fully documented, making it the first externally-developed module to become fully compliant with Hydranode coding standards. All hail wubbla!
Secondly, I finally got sufficient access rights at the server, so I could start fixing the stuff that broke during the server change. Mail is now working again, e.g. you can mail me again, forum registrations work, and the CIA IRC bot works. What doesn't work currently is the mailing list, and re-sending activation mails from the forum. The latter seems to be a bug in the forum software, so I attempted to upgrade the forum to the latest version (we're like 4 releases behind already), however the first attempt to upgrade failed (I'm a programmer, not a unix/web admin :)). Still, I'll try to give it another shot tomorrow night; I think the upgrade is necessary.
An interesting thing happened today - I discovered that the cooler of my main system (3.2GHz Prescott) had been dead for an unknown amount of time already. One would have thought that the "oh-so-hot" Prescott CPU would have overheated by now under the workload, but no - it has probably run without any cooling at all for a month or more, and I didn't even notice. Temps seem to be around 60-65C, which is normal for such CPUs ... funny :)
Wednesday, November 23, 2005
BT updates and some API changes
Some 12 hours of coding and over 20 checkins today, showing some progress. The main targets were fixing open issues with BT, and moving towards hiding the internals of BT from the user - namely the spammy contents of the torrent. While I didn't get to the actual hiding, most of the pre-work is done, and I figured out a way to do the hiding (i.e. the "package" concept) using existing APIs, without creating any new systems. Namely, I realized that we still have that virtual filesystem / Object tree thingie at our disposal, AND PartData is-an Object, thus it can marry and have children. And then we'll just stuff the children in the back room, tell them to play quietly, and nobody will ever know what we did. Unless someone really wants to see them, in which case he/she can of course see them.
However, before we can actually do that, we need to make sure that when the parent dies, it takes the children with it (orphan children? not a good idea). Also, if one parent dies, it's most likely that the other must die as well (widows ? not a good idea either). So if PartData dies, SharedFile may still survive (he's the strongest side of the relationship), but if SharedFile dies, PartData shall die as well (out of sadness and misery).
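The ownership rule - children die with their parent - can be sketched with plain C++ ownership (hypothetical, simplified types; the real Object hierarchy is more involved):

```cpp
#include <memory>
#include <vector>

// Sketch of the lifetime rule described above: when the parent object
// dies, its virtual sub-file children are destroyed with it, because
// the parent *owns* them via unique_ptr.
struct Object {
    std::vector<std::unique_ptr<Object>> children;
    bool* destroyedFlag = nullptr; // test hook: set on destruction

    ~Object() {
        if (destroyedFlag)
            *destroyedFlag = true;
        // `children` vector is destroyed here, taking every child with it.
    }

    Object* addChild() {
        children.push_back(std::make_unique<Object>());
        return children.back().get();
    }
};
```

The asymmetric SharedFile/PartData rule (SharedFile may outlive PartData, but not vice versa) would then just be a one-directional ownership or destruction callback on top of this.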
Raising children is hard work, and recently we had a lot of issues there, with children getting corrupted by the evilness of the world outside. The problem was that we attempted to teach them the same thing multiple (three) times, twice too early and once too late. Children shouldn't learn about some stuff too early in their life, and repeating the same lesson too many times tends to backfire... Anyway, this no longer happens.
Another thing is that we attempted to raise the children all the same, not caring about their historical backgrounds, so they all ended up at the same location. With one or two children this wasn't an issue, but some people are raising hundreds of children at a time, which created a HUGE mess once they grew up. Now they are all placed into different locations, based on their backgrounds. It'll be possible to manually specify target locations for them in the future, via additional settings during birth.
Monday, November 21, 2005
It was brought to my attention yesterday that the continued mail-server downtime (over which I don't have any control, unfortunately) also affected forum registrations - apparently, they haven't worked for over three weeks already. I manually activated about 10 accounts that were registered during past 3-4 weeks, and removed the email-based activation for now. While the forum software still complains that "you cannot log in w/o activation", actually you can now.
Anyway, I did some math here, and realized that it'll take another 18 months at this pace to complete the Hydranode project. Put it this way - 1 more month for BT, then 4 months for cgcomm + GUI, 4 months for Gnutella (or G2, if we choose that one), 4+2+2 months for emule_kad, azureus_dht and mainline_dht (the latter two reuse code from the first, so take less time). Add 2 months for miscellaneous unexpected delays or features (FTP for example), and we have roughly 18 months in total to finish the project.
It sounds like a lot of time, and a few months back I would have come up with considerably smaller numbers, but practice and experience have taught me how long things take in reality. Hell, 1.5 years ago when I started all this, I actually thought I could get the core + ed2k finished in under two months, but it actually took 12 months. So another 18 months to complete the aforementioned things is a pretty realistic assessment, considering the current rate of 4 months per fully-featured module. Things might speed up as other components mature, or as external contributions / developers join, but those factors cannot be relied upon - the only thing such calculations can be based on is me working alone on things, and 18 months is what is realistic.
Developing Hydranode costs money; I live in the real world where power, net access and food cost money, and hardware upgrades are sometimes necessary, which in short means I'll have to find a way to finance this project for a longer term than originally planned. Currently, it seems my financial situation will improve starting from January/February, but until then... anyway, the current target is the 0.2 release, with BT support, before Christmas. The complex things are finished; what needs attention now is performance and usability (the latter is the weakest right now, but performance also needs tweaking). After the 0.2 release I'll most likely take a 2-3-week vacation, to rest and possibly earn some money with other things.
Thursday, November 17, 2005
Yesterday was a somewhat less active coding day, still focused on bugfixing the BT module. I still have some open issues related to cross-file hashing, which is the reason why I'm not making new builds just yet. Still, a number of bugs were fixed yesterday; for example, we now correctly handle torrents exceeding 4GB in size, and I finally figured out why some clients were always sending chunks of length 0 - apparently I had forgotten to take the remote client's chunk-map into account when selecting chunks to download. Furthermore, as I had split the request-generation code (rather resource-intensive currently) into 4 separate functions the day before, yesterday I was able to eliminate one of them completely - getLeastUsed() - and incorporate its logic into getNextPartial(). getLeastAvailable() already takes use-count into account, so we won some cycles there.
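The chunk-map fix boils down to intersecting "what we still need" with "what the remote client has" before picking by availability. A simplified, hypothetical version of such a selector:

```cpp
#include <cstdint>
#include <vector>

// Simplified sketch of the fix described above: when choosing the next
// chunk to request, only consider chunks that we need AND the remote
// client actually has (its chunk map), preferring the least-available one.
// All names here are illustrative.
int selectChunk(const std::vector<bool>& needed,
                const std::vector<bool>& remoteHas,
                const std::vector<uint32_t>& availability) {
    int best = -1;
    for (int i = 0; i < static_cast<int>(needed.size()); ++i) {
        if (!needed[i] || !remoteHas[i])
            continue; // the forgotten check: skip chunks the peer lacks
        if (best < 0 || availability[i] < availability[best])
            best = i; // rarest-first among the candidates
    }
    return best; // -1 when the remote has nothing we need
}
```

Forgetting the `remoteHas[i]` test is exactly how you end up requesting (and receiving) zero-length chunks from peers that cannot serve anything useful.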
A few days ago (on the 15th), Hydranode became 18 months old, which was also the original estimated development time for the project, so I figured this was an appropriate time to do some statistics on the codebase, to see the amount of code produced during this time, the development speed and so on. So here are the details.
Total lines of code: 53223
Lines of executable code: 35783
Lines of comments/docs: 17450 (32%)
Executable code per component:
- hnbase library: 7673
- hncore library: 7437
- edonkey module: 8292 (excluding the crypto code)
- hnshell module: 1913
- bittorrent module: 3319 (alpha status)
- http module: 2036 (beta status)
- core/gui comm module: 1011 (pre-alpha status)
- ed2k/kademlia module: 782 (pre-alpha status)
- directconnect module: 683 (pre-alpha status)
- email notif. module: 122 (beta status)
- ftp module: 281 (pre-alpha status)
Average +code/month: 2901 (1987 when excluding documentation)
Average +code/year: 34812 (23844 when excluding documentation)
Development time: 18 months
- 1 fulltime dev
- 1 parttime dev
- 3-4 random contributors
- 5-10 active testers
Average ~50 checkins/week in SVN repository (tops at ~70)
To give you a comparison value, aMule v2.0.3 is 93547 lines total, out of which 22847 lines (24%) are comments/documentation. eMule 0.46b is 142039 lines total, out of which 20820 (14%) lines are comments/documentation. eMule is roughly 46 months old (started early 2002), which means 3087 lines/month, 37044 lines/year development rate. aMule is based on eMule codebase, so time-wise calculation can't be done there.
What does all this show? First, we see that when we include documentation in the coding-speed calculation, Hydranode is near eMule, and considering Hydranode has 32% documentation compared to eMule's 14%, we see that writing documentation is a considerable hit to development speed. We can also see that since the creation of LMule (the first eMule linux port, on which aMule is based), eMule has evolved faster than aMule, but aMule has become more documented, so we can deduce that aMule developers value documentation, and thus code quality, over more features.
But we're getting off-topic here; we were looking at Hydranode's statistics. We can see that about one-third of the current codebase is in the base and core libraries, which is good - a large amount of code is shared between modules. The eDonkey module, being the oldest and most feature-rich, is naturally the biggest module, even passing the core library in size. The Bittorrent module, however, is catching up fast, although I estimate the BT module will cap at around 5000 lines of code, since the protocol is a lot simpler (and the DHT implementations, both the Mainline and Azureus ones, will go into separate modules). The Http module, despite sounding really simple and straightforward when you think of it, is still a considerable amount of code, due to a number of features, such as support for proxies, usernames/passwords, etc.
In general, it would seem that we are roughly up to par with, perhaps only slightly behind, the development speed of other projects. I'm aware of the fact that eMule actually has three active developers (maybe more that I'm not aware of), and aMule has 3-4 I think, not counting external contributions, but I don't think breaking the statistics up per developer is worth it. Also, neither eMule nor aMule has the strict 80-chars-per-line limit that Hydranode has, which affects the statistics - if Hydranode was written in eMule style (I've seen 200-character lines in eMule code), the total lines of code would be less, thus indicating even slower development speed. Some might argue that Hydranode is a much more complex application, but that's not true - complexity is relative and cannot be measured. What may be complex to one developer may be easy for another, and vice versa. Just as we have really complex issues with, say, BT virtual file wrappers, eMule devs have really complex issues with some weird MFC control not doing what they want it to do - solving complexity always takes time.
All in all, while it seems really odd that a project has gone on for 18 months and is still in its 0.1 release series, from the raw statistics it becomes apparent that we're actually quite on track with development speed, compared to other clients. The illusion that Hydranode is still a very young project comes from the fact that other projects became widely usable a lot sooner; for example, eMule got its 1,000,000th download just 4 months after the project was registered on SourceForge. This can be explained by the fact that the other clients have user interfaces built in (or engines built into user interfaces :)), which means they became usable to a wider audience a lot earlier in the development process. With Hydranode, however, we have one fully-featured network module and are developing a second large network plugin, yet have no user interface to speak of, which is what creates the illusion.
Tuesday, November 15, 2005
Due to the rather large amount of new/changed code during the past two days, I was forced to dedicate today to more thorough testing, while limiting the new code flowing into the codebase. You have to let things cool down after beating them for a while, otherwise you end up with deeper problems (notice the analogy with blacksmithing).
So, here's the short changelog of fixes that came up during regression testing, as well as a bunch of tickets being closed (I like to keep the open-ticket count at the bug tracker at a reasonable size).
-  Better self-checks in FilesList::push() method (adding shared files)
-  Hash jobs now always reset their internal buffers after finishing up (used to be done in the destructor, but since we had some problems with those objects' destruction, this should lower the amount of memory leakage even IF we forget to destroy the job itself, although that shouldn't happen anymore).
-  Splitting getRange method (in PartData) into four sub-routines, in preparation for further fine-tuning and optimizations for each one of them.
-  Apparently, our DNS resolver doesn't support timeouts (although they're part of the API), so now we just work around it in the BT module manually.
-  Improved error messages for self-checks and fatal logic errors.
-  PartData API change: corruption() and onCorruption() methods now take Range64 argument (instead of two integers).
-  Black magic in BT files wrappers: fixed corruption handling routines.
- [2278, 2279] Handling downloads of size 0 gracefully (sometimes torrents contain those, ticket 151).
-  Fixed crash in HTTP module (ticket 148)
-  Additional checks before dereferencing pointers in ed2k.client subsystem (ticket 122)
- Also closed tickets 138, 146, 52 and 133 for various reasons.
Monday, November 14, 2005
Some more bits and pieces
An 11-hour coding session, but the list of improvements is a bit shorter today. The thing is, I lost about 3 hours trying to introduce an EventTable template specialization for shared_ptr pointers, but couldn't get it working properly. The original problem was that I realized that shared_ptr-contained objects that used event tables (such as hash-work objects) were never actually deleted, since the EventTable engine stores a copy of the pointer internally (which is OK for raw pointers, but not OK for smart pointers). This means we leaked roughly 32kb of memory for each hash job (32kb of I/O buffers which were supposed to be cleaned up when the object was destroyed). It didn't show up much in ed2k, although enough to raise some eyebrows (if you downloaded a lot and fast, you noticed the leak). However, in BT, where there are thousands of chunks, this became a problem very fast.
Anyway, the idea was to use a rather clever construct - an EventTable template class specialization, which is itself derived from the main EventTable class template and overrides the handleEvents virtual function. Great in theory, but practice turned out to be more complicated, since we still need some way to do lookups in there, and Boost smart pointers are REALLY annoying for that - their operator< doesn't compare the stored pointer values, which makes these smart pointers a nightmare when you need to do lookups on objects based on their pointer value (which is the case there). I tried to work around it by overriding the container's comparison predicate, but ran into further problems - namely, we need to store the pointers in there as weak_ptr, since only weak_ptr can be implicitly constructed from shared_ptr (raw pointers can't, which would mean I'd have to duplicate most of the code in the specialization - something I set out to avoid in the first place with the inheritance). However, weak_ptr can only be cast back to shared_ptr via an explicit shared_ptr constructor call (which can throw as well), and even IF all goes right, we still have a problem - if we have two expired weak_ptrs (in which case the comparison value would be 0), we break the std::map unique-key rule. *duh*.
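For what it's worth, later C++ standards address much of this lookup problem: std::owner_less (C++11) gives weak_ptr map keys a well-defined, ownership-based ordering that stays stable even after the pointers expire, and shared_ptr converts implicitly to weak_ptr, so both kinds can be used for lookups in the same map. This wasn't available at the time, so it's a retrospective sketch rather than what was attempted:

```cpp
#include <map>
#include <memory>

// A map keyed by weak_ptr, ordered by control-block ownership rather than
// by stored pointer value. Two weak_ptrs to the same object compare equal
// even after expiry, so the unique-key rule is not broken.
using Handler = int; // stand-in for whatever an event table would store

template <typename T>
using WeakMap =
    std::map<std::weak_ptr<T>, Handler,
             std::owner_less<std::weak_ptr<T>>>;
```

Lookups then work directly with the shared_ptr the caller holds, with no throwing weak_ptr-to-shared_ptr conversions involved.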
Other than that, today's changelog is as follows:
-  Fixed .dat file (for temp files) saving and loading - namely, I had forgotten one 'break' statement in a switch clause during loading, and wrote the wrong tagcount when writing. For the most part, this hopefully didn't cause anyone to lose their temp files (there is one condition where this could theoretically break things). Of course, if you didn't use revisions between 2255 and 2259, you aren't affected.
-  Various additional self-checks related to the chunk-verification code; we now properly synchronize/rehash chunks that were formerly completed, but not yet marked as "verified". (ticket 136)
-  The verifyRange() method now also allows disabling the file saving before posting the hash job, to speed up cases where we need to rehash thousands of chunks in the same file (no point saving it thousands of times in a row).
-  Handles duplicate data (e.g. near the end of download) more gracefully now.
-  Decoupling ChunkCacher init from standard PartialTorrent init (in preparation for possible further decoupling of those two into separate classes).
-  Fixed a number of bugs related to cache files; if you have partial downloads, expect to see a lot of cache verification failures on first start - the cache engine behaviour before this patch was very broken.
-  Properly updates child m_verified ranges when parent verifies ranges, and properly updates parent's m_verified ranges when child verifies ranges.
-  AutoStartTorrents config setting now takes effect immediately on runtime.
-  Fixed Content-Disposition header parsing (in http module), which incorrectly added one " symbol to the end of filename.
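The Content-Disposition fix amounts to stripping the quotes symmetrically. A simplified, hypothetical re-implementation of such a parser (a real one handles more of the header grammar, parameter ordering, escaping, and so on):

```cpp
#include <string>

// Illustrative sketch of the bug fix: extract the filename from a
// Content-Disposition header value, stripping the surrounding quotes
// from BOTH ends so no stray '"' is left trailing the filename.
std::string parseFilename(const std::string& header) {
    const std::string key = "filename=";
    std::string::size_type pos = header.find(key);
    if (pos == std::string::npos)
        return ""; // no filename parameter present
    std::string value = header.substr(pos + key.size());
    // Strip matching quotes symmetrically, not just one side.
    if (value.size() >= 2 && value.front() == '"' && value.back() == '"')
        value = value.substr(1, value.size() - 2);
    return value;
}
```

The original bug described above is what you get when only the opening quote is removed.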
For those that have forgotten, the AutoStartTorrents configuration setting tells the BT plugin to automatically start downloading torrents that were downloaded by other modules. So with this setting turned on, if you download a .torrent from ed2k, or via http, or any other means, the BT plugin will start downloading it. It doesn't affect normal shared .torrent files - it only triggers when a download completes and the filename ends with ".torrent". Hence, it's possible to type "do http://some.url/blah.torrent", and it first downloads the blah.torrent file using the HTTP module, and then starts the torrent download using the BT module.
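The trigger condition described boils down to a suffix check on a completed download's name (a hypothetical helper for illustration, not the plugin's actual hook):

```cpp
#include <string>

// Sketch of the AutoStartTorrents trigger: a just-completed download is
// handed to the BT plugin only when the setting is on and the filename
// ends in ".torrent". Shared files are never affected, because this
// only runs from the download-completed event.
bool shouldAutoStart(const std::string& filename, bool autoStartEnabled) {
    if (!autoStartEnabled)
        return false;
    const std::string suffix = ".torrent";
    return filename.size() >= suffix.size() &&
           filename.compare(filename.size() - suffix.size(),
                            suffix.size(), suffix) == 0;
}
```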
Sunday, November 13, 2005
Various things at various places
I had a rather interesting post planned for today, had several topics in mind, but well, after 15 hours of coding, I seem to have forgotten all of the interesting stuff, and all I can offer at this hour is a changelog. Sorry, maybe some other time then.
-  Ed2K message filter can now be customized by pointing config option MessageFilter to a file containing strings to be filtered, one per line. Hard-coded message filter now includes two more filtered strings. (tickets #145 and #59)
- [2238, 2239] Removing some too aggressive exceptions and error messages.
-  Experimental support for reading "nodes" field from .torrent files (ticket #139)
- [2241, 2246] Better error-handling in tracker communication routines.
-  It's no longer possible to download the same torrent twice simultaneously :)
-  Regularly saves server.met and clients.met now (at 17- and 12-minute intervals, respectively) - formerly they were only saved on clean shutdown.
-  Fixed clients.met corruption issues on win32 (was writing the file in ASCII mode *doh*) (ticket #128)
-  Getting rid of the pesky ed2k.ini file; the settings formerly stored there are now stored in global config.ini, subsection [ed2k]. Automatic settings migration is performed on next startup, settings copied to new location and ed2k.ini moved to ed2k.ini.bak.
-  Moved the entire ed2k module into the Donkey namespace (each module should be in its own namespace, but the ed2k module predates that requirement, so it never had its own namespace).
-  ed2k_kad module is now excluded from build, since it's not being developed and it's not working.
-  Fixed announce-list .torrent tag parsing (ticket #147)
-  Fixed a number of issues with Chunks (in PartData) being out of sync or lacking information, due to insufficient synchronization code. This was also the cause of Hydranode using 100% CPU near the end of BT downloads, and most likely caused increased (but not noticeable enough) CPU usage near the end of ed2k downloads as well.
-  Fixed error-handling code when reading partially corrupt partdata .dat files.
-  Verified ranges are now saved between sessions. This has no effect really, besides added security against some theoretical race conditions, however the implementation needs the verified rangelist now to be properly updated (see previous bullet), so it is now saved as well. A side-effect of this patch is that all current downloads will need to be re-checked ONCE to re-verify all completed ranges.
-  Improved the PartData API by adding a set of new accessors (for advanced usage), which replace the former protected interface, which proved to be too error-prone.
-  Some black magic at BT virtual files wrappers.
PS: Yes, I'm aware I'm not online on IRC or MSN, and no, I don't read my mail, and yes, my cellphone is turned off. It's called "busy, working". I'll be back to communicating... some day...
Saturday, November 12, 2005
New builds and updates
Due to some important updates to the HTTP module, new builds are now available for Linux and Windows, as well as a source package. The updates include improved support for various webservers that do things "their way".
Other than that, it's been quiet on the development front; I'm gathering energy to finish the BT module, which is really frustrating and boring stuff.
Performance tests of the ed2k module now show that we're nearing eMule's performance; while we're still somewhat behind in some cases, we're catching up. The lack of Kademlia slows the "startup" time, where eMule gets, say, 600 sources via Kad instantly, while it takes about an hour for Hydranode to get that amount of sources via global source queries and source exchange. Sadly, Hydranode Kademlia module development was halted, since the developer doing it no longer had time for it, so it's unclear when Hydranode will support it. One thing is clear - I won't be coding that anytime soon, since there are other things that require my attention for at least 4-5 months: BT, cgcomm, GUI and FTP come to mind.
Wednesday, November 09, 2005
Photoshop and GUI design in general
Learning Photoshop is progressing nicely; I'm learning from mistakes and understanding the problems. I realized my colour sense is way off, and needs a lot of work. I also seem to be way over-using gradients for 3D effects, while there are actually a lot of other effects that can be used to achieve the same thing - almost any effect or combination of effects can be used to create a subtle 3D look. As was expected, the first attempt at drawing the GUI in Photoshop is being discarded.
Anyway, I decided to approach the GUI problem in a similar way as I've approached core designs in the past - with a thorough analysis and design document. While the analysis and design topics have been active for a long time, now is the time to put all that stuff together into a single concrete document that will later serve as a reference during implementation. Today I laid out the basic structure of the document; about 5 chapters, 15 sub-titles, plus prologue and epilogue, so it should result in an around-30-page document describing various topics in Hydranode GUI design, as well as proposed solutions.
Tuesday, November 08, 2005
Since coding Hydranode has become a full-time job (for free, I might add), I seriously need a hobby to get my mind off the boring, complex C++ stuff. So I decided to pick up Photoshop and gain some skills in that area. There are other motives behind it as well - a modern programmer in today's computer age must have at least modest Photoshop (or similar tool) skills, for visualizing and designing user interfaces - you can't write user-space software without user interfaces.
One thing I've never understood is how people create icons - working at 16x16px resolution seems like serious black magic; however, I discovered that if your interface is heavily skinned, the importance of icons is reduced significantly. Icons have a huge impact on native-controls-based interfaces, since there they add visual appeal to otherwise boring native controls, but if your custom theme is appealing enough, icons become less important - look at the iTunes interface, for example - it has nearly zero icons, but nobody can call it ugly.
Anyway, as mentioned roughly two weeks ago, I have most of the GUI design stuff on paper, so I spent tonight attempting to digitize that stuff and draw it in Photoshop. Four hours of (fun) work, and I have most of the basics up and running. The following days will most likely be focused on this topic as well, unless I feel an irresistible urge to go back to the code.
I'm not going to show you what I'm working on on the GUI side, for several reasons. First, my whole 9-hour Photoshop experience isn't something you'd like to look at (or me to show anyone outside a closed circle); secondly, I'd like to finish the concepts before going public; and thirdly, considering that it would take about two months of code to get the GUI running at all, plus time for the skinning engine, plus time for actually skinning the damn app, I just don't want to publish the GUI concepts I have planned so far ahead of implementation time. Furthermore, while I have sufficient software engineering and C++ knowledge to defend my choices in code, 9 hours of Photoshop experience is definitely NOT sufficient for me to defend my choices. It is most likely that someone with more experience will later redo the skinning (I certainly hope so), but right now I can spend time off the code while still staying related to the project, do usability-oriented design, and actually have fun for a while. All work and no play makes Madcat a dull kitty.
So stay tuned for updates, but don't expect any (public) screenshots until the time is right.
Monday, November 07, 2005
Despite the lack of blog posts the past few days, there is a fairly normal level of activity going on in the background. SVN is back up, and it seems it's working for good now - no more crashes. Wubbla has been busy improving the HTTP module; the latest news is that the mirror-searching engine was moved to a separate, standalone plugin. Whether or not it will be included in the official builds is currently being debated, but I'm currently thinking it should be available as an additional download/add-on. Not everything should be in the official tree - quality over quantity.
The publishing of BT module for wider testing, and moving it from Pre-Alpha to Alpha state caused some activity at our bug-tracker, which was expected. Most of the reported issues were fixed within hours, while some more complex ones are still open, pending deeper investigation.
The ed2k module and related things also saw some minor bugfixes; the most serious one was the final discovery of the origins of the "chunks without hashes" warning that had been popping up sometimes (and also caused downloads to stall at 100% sometimes). The actual reason behind this is that we have a mess in MetaDb, with duplicate entries sometimes, which in turn caused the ed2k module to request hashsets from clients (based on information from one MetaData object), but then drop them, since adding them to a different MetaData object failed (they were already present). I fixed the symptoms - checks and additions are done with the same MetaData object now - but the deeper cause needs more serious investigation. This isn't the first issue I've discovered in the MetaData / MetaDb subsystem, and that subsystem is one of a very short list of core subsystems that have been unchanged since the dawn of time, so it has been marked for rewrite for quite some time now - I'm just collecting enough issues/bugs to squash them all at once when we get there.
For the record, in addition to the MetaData subsystem, the only other one I can recall right now that has been unchanged since the beginning of the project is the Config subsystem (a fairly small one), and the issues there are mainly syntactical - it's inconvenient to use, and the parsing code uses STL while it should use Boost.Spirit. However, it's unlikely those topics will be addressed in the foreseeable future, since other things have higher priority.
Real-life issues have been plaguing me again, and coupled with the recent SVN downtimes, there's been little "progress" in the general sense (bugfixing/testing/maintenance can't really be called progress). Continued work on the BT module requires a lot of energy and concentration due to the complexity of things there, so it requires real-life issues to be out of the way.
Friday, November 04, 2005
Speaking of consequences, it seems the server change had bigger consequences than originally planned. SVN is still down (on the bright side - it was up for four hours yesterday!) and mail still doesn't work (so if you've mailed me during the last few days, I haven't received it).
Since SVN access is down, here are new builds for Windows, as well as source code.
Mostly it's been bugfixing after the optimization cycle; things seem to be back under control (except for an issue with upload slots - too many of them). I also did some cleanup in the hnbase module's networking subsystem - removed the lazy disconnection functionality (it allowed marking a socket for disconnection after all pending data had been sent out), and removed the connect-by-hostname functionality (which allowed socket->connect("www.google.com", 8) -like calls). The reasoning for removing those two is simple - they are error-prone, their usability is questionable, and it's near-impossible for the API to do the right thing under all usage scenarios.
For lazy disconnections, what happens if data is received while the socket is marked for disconnection? The client code would most likely want to handle that data, but instead the connection gets dropped at some seemingly random time. If you want a delayed disconnection, use socket timeouts instead - that's what they are for.
As for integrating DNS resolution into the networking API, it simply can't do the right thing when a name resolves to multiple IP addresses. What should the API do? Connect to the first IP found? But what if that fails - connect to the next one? What about timeouts? These decisions should not be made transparently by an API; they should be made by client code, which makes the integration pointless, and even harmful. There's a good and working API for DNS resolution - use it, that's what it's for.
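As a sketch of what "client code does the resolution" looks like, here is a plain POSIX getaddrinfo example that returns every address a name resolves to, leaving the connect/retry/timeout policy entirely to the caller (again generic code, not Hydranode's API):

```cpp
#include <netdb.h>
#include <string>
#include <sys/socket.h>
#include <vector>

// Resolve a hostname into all of its addresses. Which address to try
// first, whether to fall back to the next one on failure, and what
// timeouts to use are all deliberately left to the caller - exactly the
// decisions an API can't make transparently.
std::vector<std::string> resolveAll(const std::string &host) {
    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;      // both IPv4 and IPv6
    hints.ai_socktype = SOCK_STREAM;
    addrinfo *res = nullptr;
    std::vector<std::string> out;
    if (getaddrinfo(host.c_str(), nullptr, &hints, &res) != 0) {
        return out;  // resolution failed -> empty list, caller decides
    }
    for (addrinfo *p = res; p; p = p->ai_next) {
        char buf[NI_MAXHOST];
        if (getnameinfo(p->ai_addr, p->ai_addrlen, buf, sizeof(buf),
                        nullptr, 0, NI_NUMERICHOST) == 0) {
            out.push_back(buf);
        }
    }
    freeaddrinfo(res);
    return out;
}
```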
Thursday, November 03, 2005
It would have been naïve to think that this large amount of optimizations wouldn't have some consequences. Let's face it - the entire last patchset was technically optimizations, even though they involved bugfixing - or, for that matter, isn't optimization merely a type of bugfixing, where you fix performance issues?
Anyway, I'm getting off-topic. The last optimization set did have consequences, some more, some less pleasant. First of all, we seem to be opening too many upload slots now; then I got reports of really weird server-connecting behaviour; then some utterly odd crash somewhere deep inside the scheduler (triggered by our new forced TCP reask code, which is used when our ID changes); and on top of it all, a quick valgrind run showed a rather large amount of leaked memory, apparently from ed2k client sockets not being deleted. And now I have hnsh freezing on Windows at random occasions for no apparent reason. Duh.
I did some quick tweaking around BT though, to get my mind off the ed2k issues for a while; the port is now configurable, and the module is capable of starting downloads directly from HTTP links (requires the HTTP module to be loaded to work). Both settings are in config.ini, named TCPPort (integer) and AutoStartTorrents (boolean). A small optimization was done as well, related to the HAVE message spam we inevitably receive from peers - previously each HAVE unconditionally caused us to generate a test chunk request, to see if we're interested in the client, but creating a chunk request is an expensive operation. Now the test request is only generated if the published file range is actually incomplete on our side.
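For illustration, the two new settings might look like this in config.ini. The section header and the port value are guesses; only the option names TCPPort and AutoStartTorrents come from the text above:

```ini
# config.ini - BT module settings (section name is a guess;
# TCPPort and AutoStartTorrents are the documented option names)
[bt]
TCPPort=6881
AutoStartTorrents=1
```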
Tuesday, November 01, 2005
More ed2k performance analysis and improvements
Since we kinda got into the "groove" already with the ed2k performance-tuning topic, and since we have the tools for doing it quickly while getting proper statistical results after each change, chemical and I continued on it today, with very positive results. Overall, we saw radical performance improvements in downloading, queue and upload management. While your mileage may vary, the list of fixes should speak for itself:
A very special thanks to chemical for assistance in testing and analysing the hundreds of megabytes of logfiles Hydranode produces, and for the perl scripts used to analyze that data.
- Use a 2-minute socket timeout when a transfer is in progress, since upload/download slots are very valuable in ed2k, and eMule often puts clients in "stalled" mode (in the upload list, but no data transmitted) for longer periods of time. Two minutes ought to be sufficient to survive those stalled situations.
- When downloading was in progress and the client sent us QueueRanking, it means we have been put back in the queue. Hence, schedule the next reask as normal (30 minutes). Prior to this patch, we never re-asked clients from which we had previously received data, unless they connected to us first.
- In the Bittorrent module, increase socket timeouts to 140 seconds by default, and only use the short 10-second timeout when a torrent has more than 50 sources and a transfer is in progress. This improves performance when downloading rare torrents, where peers are rare and valuable.
- When the remote client hasn't reasked us during the last 60 minutes, only drop it from the queue if it's not a source for us. This is fairer behaviour than before, when the client was dropped from our queue while we still expected to remain in its queue.
- Fixed client's credit score calculation, which was very wrong due to a subtle code error.
- Go back to a 10-second interval for the upload-slot-opening check - the 5-second interval caused too many upload slots to be opened.
- When the socket times out while downloading is in progress, correctly re-schedule the next reask to T+30min again. This wasn't done before, and caused us to never reask clients from which we had previously downloaded, unless they contacted us first.
- When the socket times out and the download request was only half-way done, still schedule the next reask to T+30min.
- When the socket times out and we were uploading to the client, put the client back in the queue instead of dropping its upload request completely. The former behaviour was based on the wrong assumption that if we had already sent the client some data, it was no longer interested in getting anything from us; now such clients are correctly queued again.
- When the socket times out, reset the upload state as well; not doing this caused us to upload to the same clients over and over again, since when a connection is established, the code checks for the upload-state member, and if present, it indicates that we should start uploading to that client. The effect of this wrong behaviour was that we ended up uploading hundreds of megabytes to nearly the same set of clients.
- If the remote client doesn't have any parts that we're interested in (NNP client), re-establish the connection in one hour to see if it has something for us by then. The former behaviour never reasked such clients.
- We no longer attempt to use the UDP protocol with clients which do not announce support for the extended UDP protocol.
- Disabled the early-disconnection system, which attempted to do smart checks and cut the connection once both parties had sent/received all requests. This system was originally implemented as an optimization, to speed up initial source queries, where possibly thousands of clients need to be contacted, and waiting 30 seconds for a socket to time out seemed a waste of time. However, as it turns out, this system was the source of a number of different bugs, some of which were patched, some of which were worked around, but at the end of the day it caused more problems than it solved. For example, it caused erroneous disconnections in race conditions during secure-ident verification, and caused the last packets we sent out to be missed, resulting in eMule considering the Hydranode source as QueueFull, among other things.
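To make the reask-scheduling rules from the list above concrete, here is an illustrative sketch (not Hydranode's actual code) mapping the listed events to the minutes until the next reask; the enum and function names are invented for the example:

```cpp
// Illustrative sketch of the reask-scheduling rules listed above.
// The names here are invented for the example; only the intervals
// (30 minutes normal, 60 minutes for NNP) come from the changelog.
enum class ReaskEvent {
    QueueRankingWhileDownloading,   // put back in queue -> normal reask
    SocketTimeoutWhileDownloading,  // re-schedule as if queued normally
    SocketTimeoutMidRequest,        // request half-way done -> still reask
    NoNeededParts                   // NNP client -> check again later
};

int minutesUntilNextReask(ReaskEvent ev) {
    switch (ev) {
    case ReaskEvent::QueueRankingWhileDownloading:
    case ReaskEvent::SocketTimeoutWhileDownloading:
    case ReaskEvent::SocketTimeoutMidRequest:
        return 30;   // the standard 30-minute reask interval
    case ReaskEvent::NoNeededParts:
        return 60;   // re-establish the connection in one hour
    }
    return 30;       // defensive default
}
```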
In other news, I'm running another ed2k+BT cooperative downloading test; the test torrent includes about 9 files, plus hashes for the ed2k network. There are about 30-40 peers on BT for this torrent, and a total of 150 clients on ed2k for these files. It's really interesting to watch how well the ed2k and BT protocols complement each other in the case of rare files - sometimes the BT download stalls for hours, but ed2k kicks in, and sometimes vice versa - resulting in an overall steady download rate of 20-80kb/s, which for this rare stuff is pretty good.
Bottom line: there are new optimized builds available for Windows, including the BT module, which is now open for public testing. While there are some known issues with it, I think it's good enough to warrant wider testing. Additionally, a Linux debug build is available (19mb download, over 100mb when unpacked), and for completeness, a source code tarball. Enjoy :)