Alo Sarv
lead developer

Developer's Diary
irc.hydranode.com/#hydranode

Monday, August 22, 2005

Designing Bt module integration

Whenever I start writing a module for Hydranode, there are two main areas that need work: the actual protocol implementation, and the integration with Hydranode and the other modules. For some modules the protocol implementation is the harder part; for others, the integration. Ed2k, being the first "big" module, is hard to classify properly, but it tends to fall under the "hard protocol implementation" category. Bittorrent, however, falls under the other category, mainly due to its different way of handling files.

Multi-network download support is only as strong as the weakest party in the mix; it's not achieved by some magical code in the core; rather, it's achieved by each and every module explicitly supporting it. For example, the Http module can in theory support multi-network downloads as well, by getting the required hashes from the server (often published in a file named MD5SUMS or <filename>.md5).
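To make the HTTP case concrete, here is a minimal sketch in Python (not Hydranode's C++) of reading an MD5SUMS-style listing in the usual `md5sum` output format: a 32-character hex digest, whitespace, an optional `*`, then the file name. The function name `parse_md5sums` is hypothetical and not part of any Hydranode API.

```python
import string

def parse_md5sums(text):
    """Parse md5sum-style lines into a {file name: lowercase hex digest} dict."""
    hashes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        digest, _, name = line.partition(" ")
        name = name.strip()
        if name.startswith("*"):
            name = name[1:]  # '*' marks binary mode in md5sum output
        is_hex = all(c in string.hexdigits for c in digest)
        if len(digest) == 32 and is_hex and name:
            hashes[name] = digest.lower()
    return hashes
```

A module that obtained such a table could hand the digests to the core, so that other modules can recognise the same file on their own networks.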

With the Bittorrent module, this concept is taken much further. As we know, the Bittorrent protocol handles the entire torrent as one big file, which is broken up into real files on the client side. Also, as already noted before, in Hydranode's case this means we will have two views of a torrent: a "flat" view, which other modules see, and the "torrent" view, which the BT module itself will operate upon. The latter view is created from a "virtual" file: classes derived from PartData / SharedFile which don't have a corresponding physical file, but rather wrap around a set of actual PartData / SharedFile objects.
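The core of the "torrent" view is a mapping from torrent-wide byte offsets to positions inside the individual files. A rough sketch of that mapping, in Python rather than Hydranode's C++, might look like the following. The class `VirtualTorrentFile` and its methods are hypothetical stand-ins and do not reflect the real PartData / SharedFile interfaces.

```python
import bisect

class VirtualTorrentFile:
    """Presents an ordered list of (name, size) files as one contiguous byte range."""

    def __init__(self, files):
        self.files = list(files)
        self.starts = []  # torrent-wide start offset of each file
        offset = 0
        for _name, size in self.files:
            self.starts.append(offset)
            offset += size
        self.total_size = offset

    def _index_of(self, offset):
        if not 0 <= offset < self.total_size:
            raise ValueError("offset outside torrent")
        # Last file whose start offset is <= offset (zero-size files are skipped).
        return bisect.bisect_right(self.starts, offset) - 1

    def locate(self, offset):
        """Map a torrent-wide offset to (file name, offset inside that file)."""
        i = self._index_of(offset)
        return self.files[i][0], offset - self.starts[i]

    def split_range(self, begin, length):
        """Split a torrent-wide byte range into per-file (name, offset, length) pieces."""
        pieces = []
        while length > 0:
            i = self._index_of(begin)
            name, size = self.files[i]
            local = begin - self.starts[i]
            chunk = min(length, size - local)
            pieces.append((name, local, chunk))
            begin += chunk
            length -= chunk
        return pieces
```

With something like this, a torrent chunk that straddles a file boundary can be written through to the underlying per-file objects, while other modules keep seeing only the flat files.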

What does this mean from the user's point of view? For one thing, in the User Interface, one can choose "flat-view", to display each and every file in the torrent as a "separate" download. Furthermore, the user can operate on each such download as one would with normal downloads - you could pause, resume or cancel files from within a torrent. Even further, when starting a torrent download, the user interface could bring up the list of files in the torrent, and the user could choose which files from within the torrent he/she would like to download.

What does this mean from the implementor's point of view? A ton of problems, to be honest. The Hydranode Core API merely makes it possible to create such a thing, but it's still complex. After two days of heavy thinking on the topic, I have resolved a number of issues, but still more keep popping up.

First and foremost, we need some place to store the .torrent data. While I could store this data in MetaDb (using some custom fields et al), I don't like the idea, since torrents often have hundreds of hashes, and it would unnecessarily grow metadb.dat too large. Instead, I'm thinking of copying the .torrent files into the $(configdir)/bt/ dir, and keeping them there. Now, we'll sha1sum the .torrent file, and create a MetaData entry with that. It could even be a fully-qualified SharedFile object if wanted, so .torrent files could be shared on other p2p networks (exeem comes to mind), however that's of little importance right now.
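As a small illustration of the storage step, the sketch below (Python, not Hydranode code) copies a .torrent into a `bt` subdirectory of a configuration directory and hashes the stored copy. The function name `store_torrent` and its arguments are assumptions made for the example. Note that this is the SHA-1 of the whole .torrent file, as described above, which is not the same value as the BitTorrent info-hash (the SHA-1 of the bencoded `info` dictionary only).

```python
import hashlib
import shutil
from pathlib import Path

def store_torrent(torrent_path, config_dir):
    """Copy a .torrent into <config_dir>/bt/ and return (stored path, SHA-1 hex digest)."""
    source = Path(torrent_path)
    target_dir = Path(config_dir) / "bt"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copyfile(source, target)
    # Hash the stored copy, so the digest describes exactly what we keep on disk.
    digest = hashlib.sha1(target.read_bytes()).hexdigest()
    return target, digest
```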

What, however, is of importance, is that after we have the sha1sum of the .torrent file, we will create the PartData objects corresponding to the files in the torrent, and attach the sha1sum to the customdata field in THEIR MetaData objects. Now, the "normal" downloads are loaded on startup normally by FilesList. When the Bt module initializes, it scans the .torrent files location, and either sha1sums the .torrent files again, or perhaps - faster - simply looks up the sha1sums in MetaDb, based on filename / modification date, as we do with normal shared files. Now, the Bt module knows which sha1sum goes with which .torrent file; based on this information, it can scan the download / shared files lists (FilesList class), and check for files which have a matching sha1sum in their customdata, and thus generate the list of currently pending .torrent downloads, associating the files with the .torrent data. Based on that information, it can then create the virtual parent objects (which cannot be constructed by FilesList, since they are derived classes, implemented in the Bt module).
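The startup matching step could be sketched roughly as follows, in Python for brevity. Everything here is a simplification: the downloads are plain `(name, customdata list)` pairs instead of FilesList entries, the `btorrent:` prefix is only an assumed tag format, and `group_by_torrent` is a hypothetical helper.

```python
from collections import defaultdict

PREFIX = "btorrent:"  # assumed customdata tag prefix

def group_by_torrent(known_torrents, downloads):
    """Group download names by the .torrent sha1sum found in their customdata.

    known_torrents: {sha1 hex: path of the stored .torrent}
    downloads: iterable of (file name, list of customdata strings)
    """
    pending = defaultdict(list)
    for name, custom_data in downloads:
        for entry in custom_data:
            if not entry.startswith(PREFIX):
                continue  # customdata belonging to some other module
            sha1 = entry[len(PREFIX):].split(":", 1)[0]
            if sha1 in known_torrents:
                pending[sha1].append(name)
    return dict(pending)
```

Each resulting group is what a virtual parent object would be built from; files tagged with a sha1sum that has no matching .torrent on disk are simply left as ordinary downloads.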

One part here for which I don't have a solution yet is how to keep track of the order of files in the .torrent (because the order matters). Perhaps we could store the position of the file in the torrent in the customdata field as well, along with the hash, e.g. "btorrent:<hashdata>:12335" would indicate that this file starts at offset 12335 in the torrent (just saying 5th file wouldn't help, because if we're missing some files in the middle, we'd be in trouble).
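A quick sketch of how such a tag could be written, read back, and used to restore file order is shown below (Python, purely illustrative). The helper names are hypothetical, and the tag layout is just the "btorrent:<hashdata>:<offset>" idea floated above, not a settled format.

```python
def make_tag(torrent_sha1, start_offset):
    """Build the customdata tag for a file starting at start_offset in the torrent."""
    return f"btorrent:{torrent_sha1}:{start_offset}"

def parse_tag(tag):
    """Return (torrent sha1, start offset), or None if this is not a valid tag."""
    parts = tag.split(":")
    if len(parts) != 3 or parts[0] != "btorrent":
        return None
    try:
        offset = int(parts[2])
    except ValueError:
        return None
    if offset < 0:
        return None
    return parts[1], offset

def order_in_torrent(tagged_files, torrent_sha1):
    """Return the names of files belonging to one torrent, sorted by start offset.

    tagged_files: iterable of (file name, customdata tag)
    """
    located = []
    for name, tag in tagged_files:
        parsed = parse_tag(tag)
        if parsed and parsed[0] == torrent_sha1:
            located.append((parsed[1], name))
    return [name for _offset, name in sorted(located)]
```

Because each file carries an absolute offset, the order survives even when some files in the middle are missing or cancelled.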

What does this lead us to? It theoretically allows cooperative downloads, in both directions. Consider this: if the .torrent file contained md5sums of the files (rather common nowadays), or even ed2ksums (also seen occasionally), the ed2k plugin could start downloading the files right away, alongside the Bt module. The reverse is a bit trickier, and would (for now) work only under specific conditions: namely, if we have a MetaDb entry of a file that was previously downloaded from Bittorrent, we _know_ the sha1sum of the .torrent file to which this file belongs. Based on that, it's only a matter of finding the .torrent file (hopefully we still have it), and we can bring the Bt module into the download, which was initiated from the ed2k module.
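For the first direction, the per-file hashes have to be pulled out of the .torrent. The sketch below is a deliberately minimal bencode decoder in Python plus a hypothetical helper, `md5sums_in_torrent`, that collects the optional `md5sum` key some torrent creators add to each file entry. It does no validation beyond what decoding requires and is not Hydranode code.

```python
def bdecode(data, pos=0):
    """Decode one bencoded value from bytes; return (value, next position)."""
    lead = data[pos:pos + 1]
    if lead == b"i":  # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":  # list: l<items>e
        items, pos = [], pos + 1
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":  # dictionary: d<key><value>...e
        result, pos = {}, pos + 1
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            result[key] = value
        return result, pos + 1
    # byte string: <length>:<bytes>
    colon = data.index(b":", pos)
    length = int(data[pos:colon])
    start = colon + 1
    return data[start:start + length], start + length

def md5sums_in_torrent(torrent_bytes):
    """Return {relative path: md5 hex string} for files carrying an 'md5sum' key."""
    meta, _end = bdecode(torrent_bytes)
    info = meta[b"info"]
    if b"files" in info:  # multi-file torrent
        entries = [("/".join(part.decode("utf-8") for part in f[b"path"]), f)
                   for f in info[b"files"]]
    else:  # single-file torrent
        entries = [(info[b"name"].decode("utf-8"), info)]
    return {path: entry[b"md5sum"].decode("ascii")
            for path, entry in entries if b"md5sum" in entry}
```

Any digests found this way could be attached to the per-file MetaData, which is what would let another module recognise and join the download.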

Doesn't sound very useful? But what if you consider that the MetaData recordset, along with the .torrent file, could be requested and transmitted over a p2p network? You have the ed2k hash, you look up (on the net) the sha1sum of the .torrent, then you simply download the .torrent file (from the network), and there you go.

I mentioned "problems" in the beginning of this tad long post, so here are a few:
  1. What happens if you start a .torrent download, but it turns out you already have a few of the files found in the .torrent?
  2. What happens if you start two torrents which share a few files?
  3. What about seeding? Queueing? How do we make sure we act "nice" on the Bittorrent network?
  4. How do decentralized trackers fit into the module design?
  5. How does Exeem fit into the module design (what hash algorithm does Exeem use? How can we take advantage of an existing p2p network that has exclusively .torrent files [thinking about the ed2k -> bt integration topic]?)
  6. How deep does the rabbit hole go? (no, seriously - what else can we do/implement here?)
Madcat, ZzZz



Comments:
1. immediately add the missing chunks instead of downloading them, notifying about duplicate files might also be an option

3. email the developers of official bittorrent client, the azureus client and bitcomet client and ask about acceptable values

4. they don't and should be considered as separate networks ( there are a couple of implementations as well )

5. exeem is closed source but some info can be found at http://www.infoanarchy.org/wiki/index.php/Exeem
see the protocol link there after reading the description

6. don't forget something that controls the upload fairness ( AKA continuing to upload till a certain ratio has been reached, you might want to have a look at how azureus implements that )
 
For a truly multi-network downloader we need a grand database with ed2k<->md5<->sha1<->torrent associations for file hashes.
 