MLDonkey Downloads Import Module Development
Development in progress.
Wednesday, March 30, 2005
Been mainly in bugfixing state (still :(). The build system problems are still around, although at least it compiles (almost) cleanly again ... god I hate autotools :( Basically, at the end, I had to revert several changes, so at this point, we're back where we started - it's still not possible to link against dynamic boost libraries, and it's still not possible to build modules into core. Bah.
- [hnsh] [patch by sca] Adds support for escaping names with \ and/or ""
- [core] Avoid SIGPIPE at few more network calls; misc stuff.
- [ed2k] Fixed incorrect CHECK_THROW; closes bug #41
- [ed2k] Packed downloads reenabled.
- [core] Socket::getAddr() returns the local address of the socket. In case of SocketServer/UDPSocket, this is mostly useless, however, in case of a CONNECTED TCP socket, this _may_ return our external IP, IF we are connected directly to the internet. Doesn't work behind routers tho. (thanks to HellFire for the proof-of-concept).
- [ed2k] Use rotational chunkrequests with eMule clients.
- [core] Added operators << and >> to SocketClient class.
- [hnsh] HNShell now uses raw SocketClient - we need fastest response times here.
The main problem (besides the pesky build system) is the fact that, for some reason, we are receiving <500kb from each client that accepts our download request, and then get disconnected. The download sessions average below the 500kb range ... while they SHOULD average at least in the couple-of-MB region (Mules usually disconnect you at 9500kb). So - *doh*. Probably a side-effect of the rushed-in changeId() signal stuff, although I still do not have any proof of where the bug came from, or why this is happening. I upgraded some trace code around Client class to provide a more greppable output, in order to get a better overview of things in there; guess I'll analyze the 700+mb logfile in the morning.
Sunday, March 27, 2005
Thoughts/ideas on build/install system upgrades
Been somewhat slow day ... the four entries mentioned in last blog post are things that have been delayed and delayed and delayed for ages, because of some problems regarding them, so now I'm trying to figure out possible solutions.
First, we have the build system ... I think the best approach here is to go with recursive configure; each module has its own configure script, so they can be built outside the main source tree as well as in the source tree. For compiling modules in, naturally the module must be built within the main source tree.
While we're at it, perhaps it's time to finally separate the base library and the p2p-related code. Namely, hydranode includes a set of generic classes / cross-platform wrappers (networking, range management, config, logging etc), which are truly generic and are useful in any context, not just hydranode. I've kept the code separate, but never actually got around to performing the physical separation.
So, the end result should be like this:
a) Base library, called libhydranode (or smth similar), which includes cross-platform API plus a general-purpose class library. This also includes our current MD4, MD5 and SHA1 transformers ... will later be replaced with Botan library calls instead.
b) HydraNode executable; includes FilesList, SharedFile, PartData, MetaDb and so on and so forth - i.e. everything else. Links against libhydranode (either dynamically, or builtin - static linkage + modules will probably cause problems).
c) Modules; each module has its own configure script; modules require libhydranode and HydraNode headers for compilation; modules can also require headers from other modules for compilation, in which case we're dealing with inter-module dependencies. This is still somewhat of a grey area - if we leave the loading of required modules to the host OS, we have some problems regarding module initialization - we can't really call the init functions ourselves ... tho I think there's smth like _init() or similar that the OS calls on module init ....
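On the module-init question, one approach that doesn't depend on an OS-specific _init() hook is to let a static object in each module register the module's init function when the shared object is loaded; the core then calls the init functions itself when it's ready. A minimal sketch, assuming made-up names (ModuleRegistry, ModuleRegistrar, initEd2k are illustrations, not hydranode's actual module API):

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry; not hydranode's actual module API.
class ModuleRegistry {
public:
    using InitFn = std::function<bool()>;

    // Function-local static sidesteps static-initialization-order problems.
    static ModuleRegistry& instance() {
        static ModuleRegistry reg;
        return reg;
    }
    void add(const std::string& name, InitFn fn) {
        m_modules[name] = std::move(fn);
    }
    bool has(const std::string& name) const {
        return m_modules.count(name) != 0;
    }
    // Called by the core once it is ready; returns the number of modules
    // whose init function reported success.
    int initAll() {
        int ok = 0;
        for (auto& kv : m_modules) {
            if (kv.second()) ++ok;
        }
        return ok;
    }
private:
    std::map<std::string, InitFn> m_modules;
};

// A static instance of this runs its constructor at load time
// (program start for built-in modules, dlopen() for shared objects).
struct ModuleRegistrar {
    ModuleRegistrar(const std::string& name, ModuleRegistry::InitFn fn) {
        ModuleRegistry::instance().add(name, std::move(fn));
    }
};

// In a module's source file ("ed2k" is used only as an example name):
static bool initEd2k() { return true; }
static ModuleRegistrar s_ed2kRegistrar("ed2k", &initEd2k);
```

One caveat with this pattern: when a module is linked in from a static library, the linker may drop an object file that nothing references, and the registrar with it, so the built-in case needs some care.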
With make install, we would install libhydranode headers into /usr/include/hn, hydranode executable headers into the same place (no overlapping files ... tho maybe it's better to still separate them?) ... modules would naturally go to /lib/hydranode/ (would love to put them into /lib/modules/hn, but I think /lib/modules is dedicated to kernel-modules only?) ...
Any other ideas regarding compilation/installation/etc topics?
Note: All of this only affects POSIX platforms; win32 is a completely different topic - for win32, we have to release a separate HydraNode DevKit, which includes all the necessary .lib files and headers.
Friday, March 25, 2005
Argh, lost my BlogPostTitleGenerator(tm) :(
While originally I was supposed to continue with my 0.1 release roadmap (4 entries there), this morning I discovered hydranode had crashed, and after extensive analysis, it turned out that we were corrupting our main containers in ClientList. The thing is, ClientID updates were either delayed, or not properly updated at all in ClientList, which led to a gradual, but definite screwup of the maps structure, which in turn led to lookups starting to fail, and all hell breaking loose. So I spent several hours tracking this thing down, debugging, and finally realized that the only way to solve it finally and properly, and avoid such things in the future, is to re-structure the subsystem. The re-structuring could be coupled with the ED2KFilesList idea that I've been having for some time now - the idea is to write a wrapper API around PartData objects, which would allow us to keep track of sources-per-file (currently not possible), and thus start doing A4AF and other nifty stuff. However, that would take at least a day to fully design, plus several days of implementation time, so I think it has to be pushed past 0.1 release. With those thoughts, I headed down to the original problem again, and implemented a short-term, working solution - using a signal to propagate the idchange up w/o any delays.
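To illustrate the short-term fix, here's a minimal sketch of the idea, assuming a plain callback list in place of the actual signal library. Client and ClientList are reduced stand-ins here, and onIdChange() / rekey() are made-up names; the point is only that the id-keyed map gets re-keyed at the moment the id changes, not some time later.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the ed2k Client / ClientList classes.
class Client {
public:
    using IdChangeSlot = std::function<void(Client*, uint32_t /*newId*/)>;

    explicit Client(uint32_t id) : m_id(id) {}
    uint32_t getId() const { return m_id; }

    void onIdChange(IdChangeSlot slot) { m_slots.push_back(std::move(slot)); }

    // Listeners are notified BEFORE the id is changed, so they can still
    // find the client under its old key.
    void changeId(uint32_t newId) {
        for (auto& slot : m_slots) slot(this, newId);
        m_id = newId;
    }
private:
    uint32_t m_id;
    std::vector<IdChangeSlot> m_slots;
};

class ClientList {
public:
    void add(Client* c) {
        m_byId[c->getId()] = c;
        c->onIdChange([this](Client* cl, uint32_t newId) { rekey(cl, newId); });
    }
    Client* find(uint32_t id) const {
        auto it = m_byId.find(id);
        return it == m_byId.end() ? nullptr : it->second;
    }
private:
    void rekey(Client* c, uint32_t newId) {
        m_byId.erase(c->getId());   // still the old id at this point
        m_byId[newId] = c;
    }
    std::map<uint32_t, Client*> m_byId;
};
```

In this sketch the list must outlive its clients (the slot captures `this`), and nothing is ever disconnected; a real signal library takes care of both.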
The current 0.1 roadmap includes 4 items:
- Fully re-enable compressed downloads (currently supported, but the support is not announced, since if we lose connection in the middle of transfer, we lose entire 180kb chunk due to non-dynamic unpacking code)
- Implement configure options and the underlying scripts to allow building modules into core, as well as perform fully static application linking.
- Format the lists in hnsh better; key items by numbers, allow operating on items using those identifiers. This is the first phase of hnshell upgrades - the second phase (some time in the mysterious future) involves full tab-completion support, and operating with string-keys.
- Actual credits storing/bookkeeping. While we have SecIdent, and queue calculation engine already takes credits into account, we still don't store uploaded/downloaded data in credits yet.
Full list of today's patches merged to CVS:
- [core] logFatalError() now also prints full pretty function name before abort()'ing
- [ed2k] Dropped passive sources acquisition for UDP ReaskAck packets - not worth the effort.
- [core/ed2k] Reduced header interdependencies with forward declarations; affected over 30 files across the codebase; now fewer files need to be recompiled on header changes.
- [core] Enabled automatic stack trace and log trace printing on crashes, to aid in bugreporting when hydranode was running outside debugger. But don't rely on this - the stack trace is very primitive, and provides only very basic information - full gdb backtrace is still the only way to fly.
- [ed2k] Removed target<>() mechanism from ED2KParser (not used, simply useless noise, taking up resources).
- [ed2k] Client::changeId is now a signal (concept identical to QT signals). ClientList containers are now fully synced with id changes - no more zombie-clients, no more broken lookups etc.
- [core] Added accessors for SchedBase::getConnLimit(), get***reqCount().
- [ed2k] Added getLowCount() accessor into FoundSources packet
- [ed2k] Avoid choking ourselves with connections when receiving more than connLimit sources from server at time. Now we make sure we leave enough free connections to be able to accept all LowId callbacks that we requested. The HighID connections that weren't possible to perform right after receiving the sources are delayed for 120 seconds, and performed then. This should somewhat raise the amount of successful LowID connections when running close to connections limit.
Thursday, March 24, 2005
Some optimizations here, to keep cpu usage down ...
- [core] API addition: SSocket::getData() method, which returns the current input buffer, thus avoiding the (possibly costly) string::append() call, and performs only string copy (implicitly shared data on major implementations, so no data copying)
- [ed2k] Silently ignore few more race conditions which are inherent from remote clients sending packets in wrong/different orders.
- [ed2k] Added some early returns and misc optimizations to packet parser, to avoid unnecessary parse attempts for incomplete packets.
- [ed2k] Dropped -fomit-frame-pointer compile flag, since cryptopp breaks with it
- [ed2k] cryptopp code fixed to also compile with -pedantic flag (enforces strict ISO C++)
- [core/ed2k] Don't create random number generators in function scopes; the reason is that allocating rng's can be costly, and they perform better if they are not reallocated after each use.
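The random number generator point from the list above can be sketched roughly like this. It's a minimal example, assuming the C++11 <random> engine std::mt19937 as a stand-in for whatever generator the code actually uses; randomInRange() is a made-up helper name.

```cpp
#include <cstdint>
#include <random>

// Returns a random value in [lo, hi]. The engine is constructed and seeded
// once, on the first call, and reused afterwards instead of being rebuilt
// on every call.
inline uint32_t randomInRange(uint32_t lo, uint32_t hi) {
    static std::mt19937 engine{std::random_device{}()};
    std::uniform_int_distribution<uint32_t> dist(lo, hi);
    return dist(engine);
}
```

The distribution object is cheap to construct, so only the engine is kept around. Note that a shared static engine like this is not safe to call from several threads at once.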
The CryptoPP stuff is giving me headaches ... it's currently still spitting out hundreds of warnings when enabling -W -Wall ... but luckily, we have an alternative - Botan. Fully featured, ISO C++, portable cryptographic functions library. Initial review of it showed it's exactly what we need - it's real nice, it's compact, nice API, etc ... in fact, I haven't found anything wrong with it yet ... sounds too good to be true ... but guess we'll find out soon enough.
Currently I'm trying to add --with-[module]=[builtin|module] configure options, to allow compiling modules in, or as ... modules ... or not at all - just as linux kernel modules. Ran into some disagreements with autoconf on the topic, but I'm roughly half-way done by now ... I'm not sure how scalable autoconf will be for this situation - specifying --with-[module]=builtin for ten modules and --without-[module] for another ten modules can quickly get tiresome, but I guess we'll deal with that problem once we get there. (Yes, I've considered using the linux kernel build system, and will re-consider it when we reach the amount of modules where autoconf is no longer scalable enough.)
On other news, theSilva reported successful hydranode compilation on P/100 (yes, 100mhz pentium), 64mb ram, running Slackware 9.1. The compile time was roughly 5 days, hydranode ran successfully. All hail the mighty pentium-100 ;)
For the record, chemical has been using HydraNode on his dual pentiumpro/200 for months now, and reports ~10% CPU usage with full debug/trace code enabled, so, while still needing some tweaking and optimizations, hydranode can be used on such low-end hardware.
Tuesday, March 22, 2005
Merged the secident code to CVS now, along with bunch of other fixes. Here's the list:
- [core] Some platforms (e.g. linux) don't support renaming files across partitions; use explicit copy + delete if rename fails during file completing (e.g. when temp/incoming are on diff partitions)
- [ed2k] Cleanup servers on shutdown
- [ed2k] Better formatting of ServerMessage if many messages are received in row
- [core] API addition: Utils::putVal, accepting pointer to uint8_t array
- [core] API change: Hash::getLength() renamed to Hash::size() to be closer to STL naming style.
- [ed2k] SecIdentState, PublicKey and Signature packets implemented
- [ed2k] PublicKey class implemented; uses implicit data sharing and raw uint8_t array for implementation to provide best memory/speed (std::string would have memory overhead, and Hash wouldn't allow runtime-detected size).
- [ed2k] Secure Client Identification support
- [ed2k] Do cleanup in ClientList on shutdown
- [ed2k] All constants (e.g. timeouts etc) used in inter-client communication are now declared in single place, and fully documented, allowing easier tweaking.
- [ed2k] LowID callbacks timeout increased to 60 seconds.
- [ed2k] Ensure that we NEVER EVER try to reask more than once during SOURCE_REASKTIME timeframe (default is 30 minutes)
- [ed2k] All information (that is needed) is now copied over during client merging (formerly some info was left behind to the dead client)
- [ed2k] During client destruction, handle exceptions coming from PartData::delSourceMask()
Now, the implementation uses cryptopp lib, but we have a subset of cryptopp now in modules/ed2k dir - cryptopp.cpp and cryptopp.h. However, those two files will need a lot of processing, because:
- They total 16'000 sloc, and add 800kb to binary size in release build (2-3 mb in debug build)
- They don't conform to hydranode coding standards
- They don't compile with -pedantic flag (enforces strict ISO C++)
- They spit out ~500 lines of warnings when compiled with full warnings turned on (-W -Wall)
- They break (crash the app) when compiled with optimizations (-O3, probably with -O2 too)
The basic idea is to get rid of all the library's intermediate classes, and leave only the actual classes we use. There's a huge class hierarchy behind all that, which is no doubt useful when dealing with the full library, but is completely irrelevant for us, since we technically only use a few classes. So I'm hoping to reduce the code size by at least 50%, perhaps even more. And with that code size reduction, the crashes with optimizations can also be addressed.
PS: The last patchset (excluding crypto lib code) was +800 sloc, raising our (own) codebase finally above the 30'000 sloc line. We were close to 30'000 once before - cpl months ago - but then I basically threw away / replaced a ton of code, dropping us down to the 26k region. But now we're back, and exceeded the 30k mark for the first time :)
Monday, March 21, 2005
Almost done with SecIdent
Most of the SecIdent is done now; hydranode currently successfully verifies roughly 90% of secure identifications; the 10% that fail are related to hydranode internal problems regarding client merging (during lowid callbacks), which is somewhat flaky atm.
As discussed before, the Crypto++ library is really the only option we have for SecIdent - the only real alternative would be openssl, but its licence doesn't work for us (because of the win32 port). So, to avoid having such a big, non-standard, external dependency, I'm using a small (16'000 sloc) extract of the library (found in aMule source tree). I looked through it, and a large amount of that 16'000 can further be dropped - they are basically large, generic, template-based design classes, which in our case are irrelevant, since we only use a small part of the library, so I can simply drop all the intermediate classes, and leave only the minimum amount we need.
The code isn't in CVS yet, because of the above-mentioned things, and while secident and credits stuff got a big boost today, some minor updates are also needed towards actual score-keeping and credits-calculations, as well as lots of testing.
Sunday, March 20, 2005
Nearing the end...
Every time you think you see light at the end of the tunnel, as you near it, it always seems to fade away and move farther away. We've been on this path for a long time already ... really frustrating, I might add. But I guess that's the thing with entering the arena as a newcomer - you need to be capable of doing everything the existing masters are capable of - masters that have been around for years already - and then also have things the existing masters aren't capable of.
In current context, we're still playing the catch-up game with emule ... but the end is near. Basically, we're missing secident and kademlia ...
It seems the closer to the end you get, the slower things are moving. Things that used to take a mere few hours now take days. Take SecIdent for example - normally it would've taken cpl days to implement, but now I've been sitting on it for two days already. Sure, there's progress, but cmon - 2 days for some simple feature? Guess the spring or whatever tiredness is kicking in again.
Besides progressing on the SecIdent stuff, I also managed to close a major memory leak - we were leaving around lots of zombie clients due to some internal variables not being reset properly in Client class. There still seems to be one more leak around, regarding SourceInfo member, which I hope to close also before 0.1 release.
Yes, you heard correctly ... I'm thinking 0.1 release soon. Thing is - if I don't release 0.1 soon, I probably never will. You know all those dead projects on sourceforge, which usually die either before, or shortly after 0.1 release? Wouldn't want hydranode to become yet another one of those. Technically, I should've moved to 0.1 release months ago, but no, I wouldn't listen (to myself), I still had to spend time rewriting some internal API's, and perfecting things ... time well spent, no doubt, but at some point you just have to tell yourself "it's good enough" and move on ...
Current 0.1 roadmap indicates that we need SecIdent, dynamic incoming data decompression and better usability in shell (cancel `download` command would be nice). Everything else just has to stay as is - we really need to move on.
Kademlia, Ares, Bittorrent and GUI are the 4 additional things I need to implement before 1.0, so 0.2, 0.4, 0.6 and 0.8 releases could be dedicated to each of those 4 features. Technically, Kad, Ares and BT should be rather easy to implement, considering we have a pretty good API implemented in core, but only time will tell what will really happen. On the GUI stuff, considering that I have to also learn QT in the process (experience with wx helps tho), it probably takes a while too. Whatever the case, 1.0 won't be out before june/july, simply because of QT4 release date, which we are bound to (can't release win32 version of GUI before QT4 due to licence restrictions, though it's unlikely the GUI would be finished before that anyway).
So anyway, back to sleep, regain some strength, and try to finish SecIdent tomorrow.
Friday, March 18, 2005
GlobStatReq implemented; UDP GetSources ineffective?
As mentioned yesterday, I was only getting 5% responsiveness from UDP source-queries, so I implemented the GlobStatReq packet, which is used to ping servers via UDP, acquiring some information about the servers (files, users, and so on), as well as ensuring that all servers in the list are alive. After implementing that, it cleared up the server-lists nicely (after about an hour - a server is dropped when 3 UDP queries fail, and a query is done every 20 mins), however, it only raised responsiveness up to the 10-20% region (11.91% on one running client, 18.38% on the other running client). When doing general source-acquisition analysis, it turns out that the amount of sources actually received from UDP servers, compared to what we get from the local server, is damn low - around the 1-2% margin. Some statistics, generated via the newly added srcstat.pl script in utils/ subdir:
-> Sources acquisition statistics:
-----> From local server: 3572 (34.67%)
-----> From UDP server: 113 ( 1.10%)
-----> Passivly: 6618 (64.23%)
-----> Total: 10303
-> Server communication statistics:
-----> Sent 439 pings, got 409 answers (6.83% lost)
-----> Sent 439 GetSources, got 83 answers (18.91% effectiveness)
-----> Dropped 2 dead servers.
-> Sources acquisition statistics:
-----> From local server: 3505 (50.93%)
-----> From UDP server: 113 ( 1.64%)
-----> Passivly: 3264 (47.43%)
-----> Total: 6882
-> Server communication statistics:
-----> Sent 749 pings, got 701 answers (6.41% lost)
-----> Sent 749 GetSources, got 88 answers (11.75% effectiveness)
-----> Dropped 6 dead servers.
So either UDP queries are indeed very ineffective, or I'm doing something wrong (again). On the protocol side, it's not that complicated at all - three versions of the packet are in use - v0 allows requesting a single file, v1 allows requesting multiple files, and v2 adds filesizes to the request. And I seem to be getting responses with all those packets, so the packet construction must be correct ... ideas?
In case you're wondering about 50% sources gotten from "Passive" - this generally happens when you restart a running core - in which case clients (in our queue) start reasking us, and are thus passively added to sources/queue. On a freshly started client, the amount of passively acquired sources is significantly lower (I estimate around 20%, but might be even lower).
Anyway, we're very close to completing ed2k support. Stability isn't an issue anymore, and there are almost no additional protocol features that add to download speeds (directly that is), but we are missing some critical things that are needed to be accepted in the network properly.
Namely, credits is the biggest of 'em - I've been holding off credits until now 'cos I wanted to base credits stuff on the new sechashes, only falling back to old userhashes as last resort. So next up we really need to impl sechash + credits stuff.
After that, there's only very minor things that need to be done - need a wrapper around ZStream for more dynamic unpacking of incoming packed data (currently it's possible to lose the whole 180kb packet if the source disconnects before all of it is here); minor updates on shell side - e.g. more operations to objects, and then we're pretty much set and can call ed2k support "completed" - sure there's more work to do there, but it's no longer urgent, and can be done later, over time, when other protocols are implemented also.
Thursday, March 17, 2005
The usual fix-set
While I haven't yet gotten confirmation that the completion-crash bug was actually fixed, we did discover that my last night's fix in EventTable pending events queue handling had larger impact than originally intended. Namely, it was using vector; handleEvents() was iterating on it using iterators, and postEvent() was push_back()'ing. However, as it turns out, ALL iterators to vectors (may be) invalidated upon call to push_back(). So, now that it's using deque, and safer iteration (using size() + pop_front()), it's way better.
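A minimal sketch of the size() + pop_front() pattern, assuming a heavily simplified queue (EventQueue and its std::function events are illustrations; the real EventTable is considerably more involved). Worth noting that push_back() on a deque also invalidates iterators, so it's really the iterator-free loop that makes posting from inside a handler safe. Whether newly posted events get handled in the same pass or the next one is my assumption here, not something taken from the code.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical, simplified event queue.
class EventQueue {
public:
    using Event = std::function<void()>;

    void postEvent(Event ev) { m_pending.push_back(std::move(ev)); }

    // Handles the events that were pending at entry. Events posted by
    // handlers during this call stay queued for the next call. No iterators
    // are held across handler invocations, so calling postEvent() from
    // inside a handler is safe.
    std::size_t handleEvents() {
        std::size_t count = m_pending.size();
        for (std::size_t i = 0; i < count; ++i) {
            Event ev = std::move(m_pending.front());
            m_pending.pop_front();
            ev();   // may call postEvent()
        }
        return count;
    }
    std::size_t pending() const { return m_pending.size(); }
private:
    std::deque<Event> m_pending;
};
```

Each event is moved out of the container before it runs, so the handler never touches an element that a later push_back() could relocate.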
Today I wasn't really even sure what to do next, so I just did some bughunting and general fixes. The list:
- Break out of merge-loop after merge to avoid merging with multiple clients. [was causing us to drop tons and tons of sources for past cpl days]
- Delay next UDP reask 10 minutes if previous fails. [The logic is that we fall back to TCP as last resort, and that's T0+50min, thus we do UDP reasks at 30, 40 and 50 mins, and if all those fail, fall back to TCP, because at T0+60min, eMule drops us from queue.]
- Silently ignore Client::establishConnection() calls if we are already trying to connect.
- Experimental support for ChangeId packet. [readonly]
- Don't try to send UDP source queries when there's no files to query for. [thnx xaignar for pointing this out]
- Clients which send us ReaskFilePing, but are not in our queue are now passively added to queue.
- Only send OfferFiles (after connecting to server) if we actually have anything to offer. [thnx xaignar for pointing this out]
- Reset uploadInfo also when connection is lost. [was causing unhandled exceptions during UDP reasks occasionally]
- Don't send ReaskAck to clients which are already uploading. [was causing unhandled exceptions]
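The reask timing from the "Delay next UDP reask" entry above (UDP at T0+30, 40 and 50 minutes, then TCP before eMule drops us at T0+60) can be written down as a small function. This is only a sketch of the schedule as described; ReaskPlan and nextReask() are made-up names, and T0 is taken to be the last successful contact.

```cpp
#include <chrono>

using Minutes = std::chrono::minutes;

enum class ReaskMethod { Udp, Tcp };

struct ReaskPlan {
    ReaskMethod method;
    Minutes     when;   // offset from T0, the last successful contact
};

// failedUdp: number of consecutive UDP reasks that have failed since T0.
inline ReaskPlan nextReask(int failedUdp) {
    const Minutes firstReask(30);    // regular reask interval
    const Minutes retryDelay(10);    // delay after a failed UDP reask
    const int     maxUdpTries = 3;   // UDP at T0+30, T0+40 and T0+50

    if (failedUdp < maxUdpTries) {
        return { ReaskMethod::Udp, firstReask + retryDelay * failedUdp };
    }
    // All UDP attempts failed: fall back to TCP right after the last UDP
    // attempt, which still leaves time before the T0+60 cutoff.
    return { ReaskMethod::Tcp, firstReask + retryDelay * (maxUdpTries - 1) };
}
```

So nextReask(0) plans a UDP reask at T0+30, nextReask(2) one at T0+50, and nextReask(3) switches to TCP at T0+50.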
After this, I had also figured out where to go next - namely we need GlobGetServStat and GlobGetServDesc, because these will allow us to determine whether or not servers are alive, and thus start deleting servers from our list. The problem is, currently, if you have used hydranode for a while, you have a ton of dead servers in your list, and GlobGetSources effectiveness drops to like 5% response, while it could be much higher - another hydranode with a slightly newer serverlist already has 25% responses, so I figure, with proper server-clearing, we can get it up to 50-60% effectiveness. Naturally, all of this will be mostly nullified as soon as we bring in kademlia, but it doesn't matter - searching for sources should be effective whether or not we have some new fancy network backend.
Wednesday, March 16, 2005
There's been a lot of RL disturbances over the past few days, reducing the time I could spend on the code significantly. However, today I finally managed to get some coding done. The one and only serious bug we have around atm is (was) the crash-on-complete bug, which I now believe is fixed, or at least we're a lot closer to finding it out.
I found a bunch of uninitialized variables (left over from recent new features) at sockets and at ED2K::Client, which should now be fixed. In addition to that, I changed the event API events queueing system slightly, to be somewhat cleaner and safer, which also should have a positive effect.
Special thanks to chemical for posting large quantities of bugreports and trace outputs about this bug to our bugtracker
If all goes well, and hydranode doesn't crash tonight, then we can move on to new features tomorrow, unless RL hits again - got still one issue pending that'll take some 6-8-hours to handle, not sure yet when exactly will it happen tho...
Sunday, March 13, 2005
Bugfixes and global UDP support
In order to address the IO buffers bugs in Scheduler, mentioned in previous blog post, I rewrote the IO buffers handling completely in scheduler. While originally the buffers were stored as raw std::string pointers in separate maps, and located from there whenever needed, the new implementation uses shared-pointers, and the buffers are located in SSocketWrapper class, which wraps around each and every socket stored in scheduler. The result was a lot safer, and somewhat faster, and cleaner buffers handling.
Since we don't really want to use assert() (being C-style, and so on), I added CHECK_FAIL() macro to osdep.h, which behaves essentially the same - calling logFatalError() in debug-build, and doing nothing in release build.
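For illustration, a macro with that behaviour could be put together roughly as follows. This is a sketch and not the contents of osdep.h: the macro is deliberately named CHECK_FAIL_SKETCH, logFatalErrorSketch() is a stand-in whose signature is my assumption, and using NDEBUG as the debug/release switch is an assumption too.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for logFatalError(); the real signature is not
// shown in this post.
inline void logFatalErrorSketch(const char* expr, const char* file, int line) {
    std::fprintf(stderr, "Fatal: check `%s' failed at %s:%d\n", expr, file, line);
    std::abort();
}

#ifndef NDEBUG
// Debug build: report and abort when the condition is false.
#define CHECK_FAIL_SKETCH(cond) \
    do { \
        if (!(cond)) logFatalErrorSketch(#cond, __FILE__, __LINE__); \
    } while (0)
#else
// Release build: expands to nothing; the condition is not evaluated.
#define CHECK_FAIL_SKETCH(cond) do { } while (0)
#endif
```

Since the release variant doesn't evaluate its argument, conditions passed to it must not have side effects the program relies on.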
In ED2K::Client-related classes, all debug-logging was moved to trace-log, to clean up the output in release build.
Global source queries are now fully operational. The bug was rather stupid really - when choosing file(s) to query for, I was checking for FileHashType != Ed2KHash, thus skipping all temp files. After fixing that, and adding support for Advanced (send more than one hash per request), and Extended (also send filesize in request) Server UDP protocol features, we now receive sources from all servers. The server list is being rotated, queries done at regular intervals, while ensuring no server gets asked twice per 20-minute period. Downloads list (in queries) is also being rotated, each next query uses the file that was asked for the longest time ago, ensuring all files get an equal amount of queries done with them.
The crashes on file-completion are still open tho. Some tracking showed that it's definitely an issue with event-handlers de-registration upon handler's destruction, but from what I can see, I've already done everything right - the object is boost::signals::trackable, which should provide automatic signal disconnection; I'm also explicitly disconnecting all connections on destruction... and STILL I get events submitted to objects that are destroyed, leading to invalid memory accesses ... weird.
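The rotation logic described above boils down to "pick whatever was asked the longest time ago". A minimal sketch, assuming made-up names and a plain timestamp per entry (QueryEntry, pickNextQuery() and canQueryServer() are illustrations, not the ServerList code):

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record for one download; field names are illustrative.
struct QueryEntry {
    std::string fileHash;
    uint64_t    lastQueried;   // timestamp in seconds; 0 = never queried
};

// Picks the entry that was queried the longest time ago, stamps it with
// `now`, and returns a pointer to it (nullptr if the list is empty).
inline QueryEntry* pickNextQuery(std::vector<QueryEntry>& files, uint64_t now) {
    auto it = std::min_element(files.begin(), files.end(),
        [](const QueryEntry& a, const QueryEntry& b) {
            return a.lastQueried < b.lastQueried;
        });
    if (it == files.end()) return nullptr;
    it->lastQueried = now;
    return &*it;
}

// A server may be asked again only once the 20-minute window has passed.
inline bool canQueryServer(uint64_t lastAsked, uint64_t now) {
    const uint64_t window = 20 * 60;   // seconds
    return lastAsked == 0 || now - lastAsked >= window;
}
```

With three never-queried files, successive calls walk through them in order and then wrap around to the first one again, which gives every file the same share of queries.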
Saturday, March 12, 2005
The list of open bugs is growing, so I'm delaying new features until those have been fixed ... don't want to introduce even more new code when old code isn't fully stable yet ... namely, we have one rather annoying bug in scheduler, where outgoing data buffer is being used after deletion ... and still one bug left in ED2K::Client, during file completion, where it seems some already-deleted client is being passed EVT_DESTROY event.
There has also been talk (already cpl days ago) about upgrading Object API somewhat. Namely, its current public interface is too intrusive - public methods a'la getName(), setName(), which are used also by derived classes; public typedef Iter, which may also be defined in derived classes, etc. Also, the Object::Operation / Object::Operation::Argument handling currently is rather cumbersome, so that could probably be cleaned up too.
On ED2K side, the new feature that is still pending is global server searches, which I've been unable to get working. Technically, I seem to be doing all just fine - sending right packets, listening on TCP + 3 UDP port, but nobody seems to respond to my queries ...
There's been some updates on hnanalyze script (by chemical), which is now capable of displaying source counts correctly also. The statistics data is currently being dumped manually into ~/.hydranode/statistics.log, in machine-parseable format, however, sooner or later we need a decent Statistics subsystem (probably when starting GUI stuff).
One important bug that got squashed today was the double-metadata creation/storage problem. Namely, we're storing the full MetaData recordset also in the PartData reference file, and load it always. This is intended as a safeguard, so that when metadb.dat gets corrupted, we would still be able to continue our downloads. However, we were also submitting the metadata loaded from PartData every time to MetaDb, which led to MetaDb growth of +X every startup, where X is the number of current downloads. This shouldn't happen anymore... you'll probably want to reset your metadb.dat now, it might be up to thousands of entries by now (downloads won't be affected by metadb.dat reset, for the afore-mentioned reasons).
So, if we can fix the scheduler bug and completion-crash, and Client <-> Server UDP stuff, then we can move on to SecIdent + Credits, which would complete the initial implementation of ED2K module (the missing ED2K protocol features are less-important, and can be implemented later).
Friday, March 11, 2005
Well, I can't do that kind of coding every day, so today was a slow day (guess the RL stuff didn't help much either). The updates:
- Reset reaskInProgress variable when it times out; was causing unhandled exceptions later on.
- Use configure to detect the platform's local byte-swapping header, and use what's available. Fall back to generic macros only as last resort. (should speed things up somewhat on big endian platforms)
- Reset SourceInfo when receiving UDP FileNotFound and it was last offered file; was causing incorrect behaviour later on.
- Clear bytes in sockaddr_in in UDPSocket::sendTo() method; should close bug #26
- Removed #ifdef'ed usage of bzero; memset is used everywhere now. (thnx Xaignar for pointing this out)
- Added accessors for NumShared and NumPartial to FilesList; added accessor for ConnCount to Scheduler.
- 10-second-interval statistics line updated to also display num-sources, queue size and connection count.
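As an illustration of the "generic macros as last resort" part of the byte-swapping entry above, here's a minimal sketch of portable fallbacks, written as functions rather than macros. The names (swap16, swap32, isBigEndian, toLittleEndian32) are made up, and the little-endian wire order is an assumption that fits with the swap only mattering on big endian hosts.

```cpp
#include <cstdint>
#include <cstring>

// Generic fallbacks, for when no platform byte-swapping header was found.
constexpr uint16_t swap16(uint16_t v) {
    return static_cast<uint16_t>((v << 8) | (v >> 8));
}
constexpr uint32_t swap32(uint32_t v) {
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

// Runtime endianness probe; a configure check would normally settle this
// at build time instead.
inline bool isBigEndian() {
    const uint16_t probe = 0x0102;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 0x01;
}

// Assuming little-endian wire order: only big-endian hosts need to swap.
inline uint32_t toLittleEndian32(uint32_t v) {
    return isBigEndian() ? swap32(v) : v;
}
```

On little-endian hosts toLittleEndian32() is a no-op, which is why a faster platform swap only pays off on big endian machines.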
Wednesday, March 09, 2005
Progress @ UDPland
Seems the short break was very useful, today I managed to pull a measly 18+-hour dev-session, with nice progress in various areas. At the time of this writing, I'm already at the 24h-uptime region, so I'll be brief (dah, I'll never run out of excuses to be brief, will I? :P).
- [core] API change: EventTable::postDelayedEvent renamed to overloaded EventTable::postEvent(object, event, delay).
- [core] Added explicit single-value constructor for creating single-value ranges (e.g. Range(0) == Range(0, 0) -> length = 1)
- [core] New byte-swapping macros: SWAPXX_ALWAYS, SWAPXX_ON_BE and SWAPXX_ON_LE, where XX denotes number of bytes to swap. Available from newly added hn/endian.h, based on wxWidgets library defs.h
- [core] Implemented priorities for Sockets and Modules, which affect how network traffic is scheduled in Scheduler. Currently defined priorities are PR_LOW, PR_NORMAL and PR_HIGH.
- [core] Scheduler now supports Socket and Module priorities.
- [core] Scheduler now ignores traffic from/to localhost/LAN addresses.
- [core] Generalized application version handling. APPVER_* are now defined in osdep.h (okok, we need a better place for stuff like that).
- [hnsh] Properly erase prompt when log message is shorter than prompt.
- [hnsh] HNShell is now PR_HIGH by default, and uses PR_HIGH sockets to provide faster response times.
- [ed2k] Fixed QueueRanking sending when client was already in queue, but connected to us via TCP.
- [ed2k] Protocol documentation update - added UDP stuff
- [ed2k] Client<->Client UDP (and TCP if needed) source reasks implemented. We reask every 30 minutes, using UDP whenever possible, but falling back to TCP if needed.
- [ed2k] Client UDP reask timeout is now 30s. If 3 UDP reasks time out in a row, we attempt to establish TCP connection. If that fails, we drop the source as dead one. Also, we store last reask timestamp, and don't attempt reask if last communication was <>
- [ed2k] Fixed Client<->Client UDP packets construction (was using TCP-packet format, but UDP uses different, more compact, format).
- [ed2k] More verbosity (read: greppable) to Client UDP reasks code.
- [ed2k] Initial code for Client<->Server UDP reasks
- [ed2k] GlobGetSources packet implemented.
- [ed2k] Cleanup in ServerList internal constants naming style.
- [ed2k] Finalizing global Server UDP stuff: Servers are now asked via UDP for sources. No server is asked twice during 20-minute period. During normal rotation, one server is queried every ListSize/ReaskTime seconds, e.g. for 60-server-list, one server is queried every 3 minutes.
- [ed2k] ServerList sorting by Name and Desc should now work properly.
- [ed2k] Better filtering for servers which we can't ask sources for via UDP.
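The Client UDP reask policy from the list above (30s timeout, three UDP timeouts in a row, then TCP, then drop) can be sketched as a tiny state tracker. This is only an illustration under those stated numbers; ReaskState, ReaskAction and the constant names are hypothetical and not taken from the ed2k module.

```cpp
// Values taken from the changelog entry; names are hypothetical.
constexpr int kUdpReaskTimeoutSec = 30;  // how long to wait for a UDP answer
constexpr int kMaxUdpFailures = 3;       // timeouts in a row before trying TCP

enum class ReaskAction { RetryUdp, TryTcp, DropSource };

class ReaskState {
public:
    // Called when a UDP reask got no answer within kUdpReaskTimeoutSec.
    ReaskAction onUdpTimeout() {
        ++m_udpFailures;
        return m_udpFailures >= kMaxUdpFailures ? ReaskAction::TryTcp
                                                : ReaskAction::RetryUdp;
    }

    // Any answer from the source ends the failure streak.
    void onAnswer() { m_udpFailures = 0; }

    // The TCP fallback failed as well: the source is considered dead.
    ReaskAction onTcpFailure() const { return ReaskAction::DropSource; }

    int udpFailures() const { return m_udpFailures; }

private:
    int m_udpFailures = 0;
};
```

Keeping the counter per source means one flaky client doesn't affect how the others are reasked.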
Bottom line is that Client UDP stuff is coming along nicely. There's still some fine-tuning needed, but the hardest stuff is already in place. Server UDP stuff, e.g. global sources searching technically should work, however, I'm not getting any responses from the servers, so something must be wrong in my GlobGetSources() packet construction that's causing the servers to refuse them. For that reason, I disabled the packet sending in CVS right now, to avoid flooding network with bogus packets.
PS: Tomorrow I need to take care of some RL stuff (unavoidable...), so doubtful I'll get any code done then. But I guess only time will tell...
Monday, March 07, 2005
Was clearly over-worked last night, so decided to take a day off ... since I'm working w/o weekends anyway, figured I can take a day off sometimes ... right?
I did have a lengthy discussion with one of our gfx designers regarding site/logo stuff again, with no progress though ...
Oh, and a patch by HellFire - hnshell now displays full path in prompt, making it more like "real" shell :)
Sunday, March 06, 2005
Tired ... soooo tired
Well, that pretty much sums it up .. god-damn-stupid-winter and so on. *sighs*. Can't say anything good about progress today, all day been like a zombie, trying to code but everything just falls apart ... March is the evilest part of the year ... it's been cold and dark for 5 months already, all energy reserves are already exhausted, and spring is still like 1 .. 1.5 months away :( But ohwell, must be strong, must code, must finish the damn app ...
Scheduler got a nice boost ... bandwidth measurements are now more accurate, and up/down speed values are cached, which reduces CPU usage by an estimated 20%. (patch by xaignar)
In ed2k module, there had been an invalid memory access whenever a chunk upload completed ... weird it never actually crashed in there, but it's fixed now anyway.
There were some minor problems with my implementation of ed2k speed-ratio lock - apparently, the rule is: if (up < 10) down = up * 4; if (up < 4) down = up * 3;
This is also fixed now, and the relevant code is now applied during ed2k module startup, so if ed2k module isn't loaded, this lock isn't in place. In long-term, we will need a more robust solution though, because this doesn't play along with our long-term plans ... load an ed2k module, and screw up your ftp/http/dc download speeds? No thank you. But for now, this ensures we conform to ed2k netiquette.
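A minimal sketch of the speed-ratio rule as described above, with speeds in kb/s. The function name maxDownSpeed and the kUnlimited constant are made up for this example, and "no lock at 10 or above" is my reading of the rule, not something confirmed from the actual code.

```cpp
#include <cstdint>
#include <limits>

// Sentinel meaning "the ratio lock does not apply" (hypothetical name).
constexpr std::uint32_t kUnlimited = std::numeric_limits<std::uint32_t>::max();

// Returns the highest download limit allowed for a given upload limit.
//   up < 4   -> down = up * 3
//   up < 10  -> down = up * 4
//   up >= 10 -> no lock (assumption)
constexpr std::uint32_t maxDownSpeed(std::uint32_t up) {
    return up < 4 ? up * 3 : (up < 10 ? up * 4 : kUnlimited);
}
```

The tighter check has to come first, otherwise every value below 4 would already be caught by the "below 10" branch and get the x4 multiplier.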
As a requested feature by chemical, duplicate sharedfiles are now properly handled. The system is capable of detecting duplicates after a new sharedfile is added (or hashed), and combining it with the existing sharedfile. If there's an existing pending download in progress that's identical to the newly-added shared file, the pending download is of course marked as completed instantly - no point downloading it anymore. Also, when there are multiple identical shared files, but in different locations, hydranode handles it gracefully, and keeps track of both files. If the original one becomes unavailable (or modified), hydranode automatically attempts to switch to known alternative locations for the same file.
PartData also now clears up its mess from temp/ dir - e.g. the .dat/.dat.bak files - up until now, they were all left dangling there. But hey - no need to rush into clearing your temp/ dirs manually now - hydranode detects such "dangling" files on startup now and clears them up as needed.
There was also a request for canceling downloads ... I mostly implemented it, but ... there are some problems ... canceling by filename is most natural, but we don't have tab-completion support in hnshell, we don't even have chars escaping support ... so if your downloads have long names, or spaces in them, you're so out of luck. And I'm aware there are still some problems open with that code ... I originally didn't even want to commit it yet, but I figured - I'll fix it later, when I'm less tired ...
So see ... no progress where the progress was needed [ed2k/udp], and just some stupid features :( Oh wish the winter would already end and sun come back up... real sun, not the one that shines at you at -20C, freezing you with an evil grin :(
Saturday, March 05, 2005
More UDP stuff; ref-counted Hash; LowID support fixed
Today wasn't very productive day ... all day been like a fog in my head, no progress in any direction ... though, in the end, I did manage to get some stuff done, despite all that...
Implemented 5 UDP-related packets parsing, namely, ReaskFilePing, ReaskAck, QueueFull, FileNotFound and PortTest. The latter isn't exactly ed2k protocol, it's simply used to test ports (e.g. eMule porttest on their website), but I figured I'll implement it anyway while at it. The rest are now being parsed, but on the handling side, it's still somewhat foggy and needs more work.
Also - LowID should now be fully supported - there were some issues regarding it (bug #12), but it got fixed. Also, we had login timeout at 3s (to create fast logins), but with LowID, you need at least 30s timeout apparently before the server assigns you LowID, so now the login timeout is 30s.
Upload/Download speed limits are now user-configurable from preferences file. The keys are in section [/], UpSpeedLimit and DownSpeedLimit. (Keys are created during first configuration save automatically). Of course we follow ed2k-rules for download-speedlimits when upload limit is set < 10kb - in which case, download limit will be set to upspeed * 3.
On other topic, I played around with Hash class a bit, and replaced the implementation to use reference-counted data. The thing is, essentially, nearly all Hash objects in the app are copies of each other, so this is a major memory-saver - I don't have full test results back yet, but I estimate 20-40% long-term memory usage drop. This might increase as the hashes get longer - currently we only have 16-byte-hashes in use, but think what happens when dealing with SHA1, SHA256 or SHA512 hashes :) At first it seemed to me that it caused slight increase in CPU usage, however, additional tests showed that we were running at ~3.3% CPU usage even before this update, so it's all good. Note that after turning trace code off, we drop down to around 1.5% CPU usage, with possibly more drop when optimizations are turned on.
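To illustrate the reference-counted idea, here is a rough sketch of a 16-byte hash whose copies share one data buffer. It uses std::shared_ptr purely for brevity; the real Hash class may be implemented quite differently, and the member names here are hypothetical.

```cpp
#include <array>
#include <cstdint>
#include <memory>

class Hash {
public:
    using Bytes = std::array<std::uint8_t, 16>;

    explicit Hash(const Bytes& b) : m_data(std::make_shared<const Bytes>(b)) {}

    // The default copy constructor/assignment only copy the pointer, so
    // all copies of a Hash share the same immutable 16 bytes.

    bool operator==(const Hash& o) const {
        // Same buffer means equal without touching the bytes at all.
        return m_data == o.m_data || *m_data == *o.m_data;
    }

    const Bytes& bytes() const { return *m_data; }
    long useCount() const { return m_data.use_count(); }

private:
    std::shared_ptr<const Bytes> m_data;
};
```

Since the data is const, sharing is safe without copy-on-write; the cost per copy is one pointer plus a counter increment instead of the full hash, which is where longer hashes would gain the most.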
Friday, March 04, 2005
MSVC support restored; moving towards UDP
Madcat begins some strange incantations... Madcat utters the words 'raise compiler' GCC looks around, rather silly-looking, and mutters "Hm? I'm alive already ... " MSVC raises from the ashes, and compiles again!
After some playing around in ed2k module, I was able to locate the problem that caused MSVC to crash. Seems the issue is related to boost::multi_index const_mem_fun-based indexes. After replacing them with alternative indexes, no more crashes. The exact reason why MSVC didn't like it is yet to be determined though.
On other news, we're moving towards UDP support. Currently, we are being dropped from ALL eMule queues after roughly an hour, so if we don't pass the queue in an hour, we lose our queue position. I'm surprised we're downloading anything at all after 5-6 hour uptimes, but nonetheless, we need UDP. In eDonkey2000 protocol, there aren't many different UDP packets, there is a total of ~5 packets in Client<->Client UDP communication, and another ~5 in Client<->Server communication, so it shouldn't take long to implement. I got a lot of research and pre-work done today already, so hopefully I can finish it tomorrow.
Below is the (almost) complete list of improvements done today.
- Event API: MSVC-specific workarounds in the newly added Trackable concept.
- Event API: Trackable is also now boost::signals::trackable; this solves the event-handlers removal problem - when a Trackable object is set to handle some event, it is automatically disconnected also during destruction.
- Hash API: Safer and faster hash comparisons (less virtual function calls).
- Hasher stats are now printed (again) on shutdown.
- Networking: More safety in UDPSocket::recv() implementation.
- IPV4Address constructors made explicit to avoid accidental programmer errors (inherent from its default constructor parameters).
- Scheduler: DLLEXPORT request base classes. Also removed request count debug counters.
- PartData now keeps a complete-chunks bool-vector internally to reduce manual complete-map calculation by modules (this is currently done 2-3 times during each client<->client negotiation).
- HNShell: Now handles win32-style EOL correctly.
- HNShell: No longer crashes on nmap TCP connect() port scan.
- ED2K: Support for the "Message" packet.
- ED2K: No longer crashes MSVC 7.1. Seems it doesn't like const_mem_fun at all ...
- ED2K: Experimental UDP support. Currently only in listening state, no data is sent out, and all incoming packets are simply echoed.
Thursday, March 03, 2005
Timeouts, mac support & more
Madcat begins some strange incantations... Madcat utters the words 'raise compiler' GCC raises from the ashes and compiles again! MSVC raises from the ashes and ... falls back to ashes.
Well, thanks to help from Boost.MultiIndex library author, we tracked down the problems with some GCC versions crashing on partdata.cpp - all that was needed was a minor syntactical change, and voila - it now compiles again on OSX, and should also compile on the reported SuSE 8.2 GCC 3.3.5.
While on the topic - after some testing and some fixes (related to OSX's dynamic loader handling things differently), hydranode should now be 99% compliant with Mac OS X 10.3 (don't have access to older OSX versions, so don't know if it works there). There are some minor irritations - seems some layouts are slightly screwed up in hnshell - but other than that, it works, downloads, uploads, searches - e.g. everything it does on x86. On second thought - I don't know how it really behaves on 64-bit macs - native 64-bit support implementation is planned to be done in a few weeks from now, so right now, I don't know ...
Anyway, on general code side - most of the work has been on timeouts. The thing is, when some sockets never return back writable/readable status, hydranode doesn't do anything with them, so they just sit there, doing nothing. The problem is, soon it hits our (currently) hardcoded connection limit of 300, and then everything starts slowing down. Those "dead" connections would be detected when someone attempted to explicitly read or write with them, however, since no events are emitted from them, nothing triggers anything, so ... I implemented full timeouts support into Socket API - client code can call setTimeout(milliseconds), and if no events whatsoever happen during that time, Socket API emits SOCK_TIMEOUT event.
I also implemented LowID callback timeouts, which were planned for quite some time now. This was yet another thingie that never got triggered - when a client requested a callback, it expected that sooner or later the client "calls back" to us. However, there are many reasons why this may fail, and the remote client never connects to us, so again, we end up having "dead" clients around. After implementing it, I ran into a different kind of problem though. Naturally, I implemented the timeouts using delayed events, which are emitted (and handled) by Client class. However, when dealing with volatile event sources, and delayed events, the problem arises that if the delayed event gets emitted after the source object is already destroyed, we have a problem (e.g. segfault).
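The idle-timeout logic described above boils down to "remember when the last event happened, and compare against the configured timeout". A minimal sketch, assuming time is passed in as milliseconds by the caller; IdleTimer and its methods other than setTimeout are hypothetical names, not the real Socket API.

```cpp
#include <cstdint>

class IdleTimer {
public:
    // Mirrors the setTimeout(milliseconds) call from the post; 0 disables it.
    void setTimeout(std::uint64_t ms) { m_timeoutMs = ms; }

    // Call on every socket event (readable, writable, connected, ...).
    void onActivity(std::uint64_t nowMs) { m_lastActivityMs = nowMs; }

    // True when nothing happened for at least the configured time. This is
    // the point where a socket layer would emit something like SOCK_TIMEOUT.
    bool expired(std::uint64_t nowMs) const {
        return m_timeoutMs != 0 && nowMs - m_lastActivityMs >= m_timeoutMs;
    }

private:
    std::uint64_t m_timeoutMs = 0;
    std::uint64_t m_lastActivityMs = 0;
};
```

Something still has to poll expired() periodically (or schedule a delayed event), since a dead socket by definition never wakes anything up on its own.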
In order to compensate that, I implemented Trackable concept, similar in many ways to boost::signals::trackable. When a Trackable-derived object is destroyed, it invalidates all pending events emitted from it. This does not only apply to delayed events - this also includes events that are already in main event queue. So even if you post an event from your object's destructor (kinda stupid, but hey - who knows) - it won't get emitted, because the source dies. The entire system is completely un-intrusive, optional, and implemented using compile-time type-checking algorithms, so provides nearly no runtime-overhead. Special thanks go to Xaignar (from amule team) for providing some useful thoughts on this.
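One way to get the "events die with their source" behaviour is a lifetime token: the source owns it, each queued event holds a weak reference to it, and the queue skips events whose token is gone. Note this is a runtime stand-in and not the compile-time type-checking approach described above; Trackable is the name from the post, but EventQueue, token(), post() and dispatch() are invented for this sketch.

```cpp
#include <functional>
#include <memory>
#include <utility>
#include <vector>

class Trackable {
public:
    Trackable() : m_token(std::make_shared<int>(0)) {}
    // Each object gets its own token; copies do not share lifetime.
    Trackable(const Trackable&) : m_token(std::make_shared<int>(0)) {}
    Trackable& operator=(const Trackable&) { return *this; }
    virtual ~Trackable() = default;

    std::weak_ptr<void> token() const { return m_token; }

private:
    std::shared_ptr<void> m_token;  // destroyed together with the object
};

class EventQueue {
public:
    void post(const Trackable& source, std::function<void()> handler) {
        m_pending.push_back({source.token(), std::move(handler)});
    }

    // Runs handlers whose source is still alive; returns how many ran.
    int dispatch() {
        std::vector<Entry> batch;
        batch.swap(m_pending);
        int delivered = 0;
        for (auto& e : batch) {
            if (e.source.expired()) continue;  // source died: drop the event
            e.handler();
            ++delivered;
        }
        return delivered;
    }

private:
    struct Entry {
        std::weak_ptr<void> source;
        std::function<void()> handler;
    };
    std::vector<Entry> m_pending;
};
```

The check happens at dispatch time, so it covers both delayed events and events already sitting in the main queue when the source is destroyed.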
Next up we need some kind of Speed-o-Meter type class. The thing is, we need to calculate speeds at many places - for example, PartData would like to report its download speed, SharedFile might want to show its upload speed, Client object might want to show its download/upload speeds, Scheduler must keep track of global up/down speeds, etc. The current speed-o-meter in scheduler is somewhat flawed - after I upgraded it a while back from real-time to 100ms-resolution, I left out one problem - namely, when no data is transmitted during a 100ms period, it doesn't include the "nulls" in the calculations, so all our speed-calculations right now are slightly higher than they should be - explaining why I haven't managed to get the 10s averages and 1s averages in sync. So this needs some thought on implementation side.
PS: Sorry about missing last night's post... was so busy handling some internal project stuff that I don't even want to go into discussing here .. that I completely forgot about the blog post - was so tired. I know there are many ppl reading this blog, so I'll try to avoid such "blanks" ...
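A possible shape for such a class, as a sketch only: a ring of fixed-size time slots where slots with no traffic are explicitly zeroed as time advances, so the "nulls" count towards the average. The class name SpeedMeter, the 10 x 100ms defaults and the caller-supplied millisecond clock are all assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

class SpeedMeter {
public:
    explicit SpeedMeter(std::size_t slots = 10, std::uint64_t slotMs = 100)
        : m_ring(slots, 0), m_slotMs(slotMs), m_current(0) {}

    // Record bytes transferred at time nowMs.
    void add(std::uint64_t nowMs, std::uint64_t bytes) {
        advance(nowMs);
        m_ring[m_current % m_ring.size()] += bytes;
    }

    // Average over the whole window, empty slots included.
    double bytesPerSecond(std::uint64_t nowMs) {
        advance(nowMs);
        std::uint64_t total = 0;
        for (std::uint64_t b : m_ring) total += b;
        const double windowSec = m_ring.size() * m_slotMs / 1000.0;
        return total / windowSec;
    }

private:
    // Move to the slot containing nowMs, zeroing every slot entered on the
    // way. These zeros are the "nulls" that idle periods must contribute.
    void advance(std::uint64_t nowMs) {
        const std::uint64_t slot = nowMs / m_slotMs;
        if (slot <= m_current) return;
        std::uint64_t steps = slot - m_current;
        if (steps > m_ring.size()) steps = m_ring.size();
        for (std::uint64_t i = 0; i < steps; ++i)
            m_ring[(slot - i) % m_ring.size()] = 0;
        m_current = slot;
    }

    std::vector<std::uint64_t> m_ring;
    std::uint64_t m_slotMs;
    std::uint64_t m_current;  // absolute index of the newest slot
};
```

One known limitation of this sketch: until the first full window has passed, it divides by the whole window length and so reports a lower speed than the true one.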
Tuesday, March 01, 2005
Networking fixed (hopefully); no progress at dead-compiler-land
Was hoping to take today off development to gather some strength after last week's extensive development, however, one thing led to another, and here I am again, writing blog post with progress reports. Damn you chemical :)
One topic of discussion was the problems regarding hydranode networking kinda dying after 6-8 hour uptime. While for a long time, we were blaming chemical's ISP changing his IP address every day 6am, thorough investigation actually turned out that it wasn't the issue at all. Suspicions started rising about internal hydranode resource leak problems (noticing that memory usage had risen to 30+mb after 6h uptime). At the end of the day, I believe I found at least one bug and squashed it - namely we weren't lowering active connections counter when a socket was deleted w/o being explicitly or implicitly disconnected prior to that. While this seems to have cleared up the networking system shutting down after 8 hours uptime, there still seem to be some memory leaks hiding somewhere around the corner - with some 2000 sources and 20 temp files, I'm still getting like 30+mb memory usage, and rising...
Other topic was the compiler crashes on darwin (and on some old linux distros, e.g. suse 8.2). We've been discussing the topics with MultiIndex library author via e-mail, however, with little progress. While Boost 1.33 snapshot does lower the symbol lengths generated by MultiIndex library, it seems it's not the only issue. Considering that Boost.MultiIndex is most useful at locations which are already big and complex to start with, MultiIndex tends to add that last feather that breaks the camel's back. In current state, while most GCC 3.2+ compilers handle it nicely, on darwin they crash (request: someone owning a fast mac could perhaps try compiling vanilla GCC 3.4 and compiling hydranode with it?). MSVC 7.1 crashes on four files in ed2k module right now, with little or no hints towards how to resolve it. Worst-case scenario? Drop support for all compilers below GCC 3.4 on OSX and Win32. Linux port wouldn't be affected by this, and end-users should never have to compile hydranode, so it wouldn't be a big loss...
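The connection-counter bug above is the classic case for tying the decrement to object lifetime instead of to an explicit disconnect call. A small sketch of that idea follows; ConnectionCounter and ConnectionSlot are hypothetical names, and this is not how the actual fix was done.

```cpp
class ConnectionCounter {
public:
    int active() const { return m_active; }

private:
    friend class ConnectionSlot;
    int m_active = 0;
};

// Held by a socket for as long as it counts as an active connection.
class ConnectionSlot {
public:
    explicit ConnectionSlot(ConnectionCounter& c) : m_counter(&c) {
        ++m_counter->m_active;
    }

    // Destruction without a prior disconnect still releases the slot.
    ~ConnectionSlot() { release(); }

    ConnectionSlot(const ConnectionSlot&) = delete;
    ConnectionSlot& operator=(const ConnectionSlot&) = delete;

    // Explicit disconnect path; safe to call more than once.
    void release() {
        if (m_counter) {
            --m_counter->m_active;
            m_counter = nullptr;
        }
    }

private:
    ConnectionCounter* m_counter;
};
```

With this shape the counter can't drift upwards, because there is no code path that destroys a socket without going through release().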
PS: Check out the new real-time speed-o-meter in hydranode output last line :P