Introduction

Why?

With the growing number of file-sharing networks, the same files often exist on several networks. Each network tends to specialize in a particular range of files, e.g. small files, big files, rare files, new files, etc. Users must therefore constantly juggle multiple clients, depending on which kind of data they wish to download.

Secondly, with the growing range of desktop operating systems, people often use multiple systems simultaneously or switch between them, so they need a different client for each network on each platform they use. Clients often do not exist for some platforms, which lowers the usability of those platforms for these users: they must either use emulation software or avoid the platform altogether.

With the increasing speed of home internet connections and the increased mobility of technology, it is now common to run clients on home computers while controlling them remotely from work or laptop computers. A few years ago this was not yet possible, because home connections were too slow to transfer enough data for a decent remote control system. Nowadays, however, DSL links with speeds of 0.5 Mbit/s and above are rather common, which opens up a new range of options.

Related to the above, today's fast network connections often provide more capacity than a single user needs, so it is common to share one connection with family members, neighbors, and often even entire buildings. When multiple users of the same network run various file-sharing applications, each application is usually configured to use the entire available bandwidth, without regard for other users on the same local network. This quickly leads to multiple clients saturating the upload bandwidth, which degrades network performance and causes frustration for everyone on the local network.

Many users of file-sharing networks are concerned about their privacy. This is mainly a concern in the United States, but it is growing in Europe as well. Any modern P2P application must take into account those who take their privacy seriously.

As Linux and open source software become more and more common, a large number of people refuse to use closed source software on general principle. Furthermore, closed source software, and peer-to-peer clients in particular, has often been found to contain adware or spyware, which has made people very wary of closed source software in general. Open source code also encourages users to participate in the development process and thus increases development speed.

During the past few years, processor speeds and memory sizes have risen significantly, so it has become common for applications to use huge amounts of memory. However, what people forget is that there are still many users on 5-10 year old computers, which lack both fast processors and large amounts of physical memory. For those users it is often very difficult to run today's software, and (especially in third-world countries) they do not have the resources needed to upgrade.

These are the issues we are trying to address with the project described below.

Overview of HydraNode

To achieve the large number of features described in the previous section, the HydraNode core needs to be extensible without causing feature bloat and the resulting increase in system requirements. The only way to accomplish that is to make the application completely modular - only a minimal set of features is provided by the core application; the rest are implemented by optional loadable modules. Each file-sharing network should be in a separate module, as should additional features such as e-mail notifications. With this design the features are decoupled from each other, which greatly simplifies debugging and allows end users to select only the features they need instead of what a programmer thought was best for them.
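The modular design described above can be sketched roughly as follows. This is only an illustration under assumptions: the names `Module`, `ModuleRegistry`, `MailNotifier`, `onInit` and `onExit` are hypothetical and not taken from HydraNode, and loading modules from shared libraries at runtime is left out entirely.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface that every optional module would implement.
class Module {
public:
    virtual ~Module() {}
    virtual std::string name() const = 0;
    virtual void onInit() = 0;  // called when the core loads the module
    virtual void onExit() = 0;  // called just before the core unloads it
};

// Core-side registry: the core only sees the abstract interface and
// knows nothing about concrete networks or features.
class ModuleRegistry {
public:
    // Takes ownership. Returns false for a null pointer or a duplicate name.
    bool load(std::unique_ptr<Module> mod) {
        if (!mod) return false;
        const std::string key = mod->name();
        if (modules_.count(key) != 0) return false;
        mod->onInit();
        modules_[key] = std::move(mod);
        return true;
    }

    // Returns false if no module with that name is loaded.
    bool unload(const std::string& name) {
        auto it = modules_.find(name);
        if (it == modules_.end()) return false;
        it->second->onExit();
        modules_.erase(it);
        return true;
    }

    // Names of all currently loaded modules, in alphabetical order.
    std::vector<std::string> loaded() const {
        std::vector<std::string> names;
        for (const auto& entry : modules_) names.push_back(entry.first);
        return names;
    }

private:
    std::map<std::string, std::unique_ptr<Module> > modules_;
};

// Illustrative stand-in for an optional feature such as e-mail notification.
class MailNotifier : public Module {
public:
    std::string name() const { return "mail-notify"; }
    void onInit() {}
    void onExit() {}
};
```

A user who does not want e-mail notifications simply never loads that module; the core and the other modules are unaffected.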

The second most important prerequisite for a modern peer-to-peer application is platform independence. The biggest differences between platforms are in their graphical user interfaces, while the underlying structure of the operating systems is rather similar. To achieve maximum portability, the core application should be decoupled from the graphical user interface, which can then be written natively for each target platform; native user interfaces generally perform better than interfaces designed to run on a large number of platforms. To this end, the core application should have no interactive graphical user interface of its own at all - it should only provide a protocol through which native graphical user interfaces and other applications can communicate with it and control it. The protocol itself should be human-readable, but also easy for client software to parse; this makes it possible to interact with the core directly through simple software like telnet, which greatly simplifies debugging and could also serve end users as a crude remote control mechanism.
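To illustrate what a human-readable yet easily parsed protocol might look like, here is a minimal sketch of a line-based command handler. The command names (`ping`, `download`) and the `OK`/`ERR` reply format are assumptions made up for this example; the actual protocol is not specified in this document.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One parsed protocol line: a verb followed by whitespace-separated arguments.
struct Command {
    std::string verb;
    std::vector<std::string> args;
};

// Splits a line on whitespace. A trailing "\r\n", as sent by telnet,
// counts as whitespace and is therefore ignored.
Command parseLine(const std::string& line) {
    std::istringstream in(line);
    Command cmd;
    in >> cmd.verb;
    std::string arg;
    while (in >> arg) cmd.args.push_back(arg);
    return cmd;
}

// Produces a one-line textual reply that is easy to read for a person
// and easy to match for a client program.
std::string handleLine(const std::string& line) {
    const Command cmd = parseLine(line);
    if (cmd.verb.empty()) return "ERR empty command";
    if (cmd.verb == "ping") return "OK pong";
    if (cmd.verb == "download") {
        if (cmd.args.size() != 1) return "ERR usage: download LINK";
        return "OK queued " + cmd.args[0];
    }
    return "ERR unknown command: " + cmd.verb;
}
```

Because every request and reply is a single line of text, a developer could type `ping` into a telnet session and read the answer directly, while a graphical client only needs to check whether a reply starts with `OK` or `ERR`.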

Related to the above is the question of which programming language to use for the core application. Here C++ is the most sensible choice: it is widely used across all platforms and produces fast code (which is required to achieve low system requirements), it supports and encourages object-oriented design, and it is easier to understand than C code. Additionally, since HydraNode relies strongly on module writers, it matters that C++ coders are far easier to find than, say, Java coders.

As mentioned in the previous section, the quickest route to rapid development is to give users free access to the application's source code; it significantly increases the potential base of developers and debuggers. Of the myriad open source licenses available, the GNU General Public License is among the most respected by users and developers, so the HydraNode source code should follow the trend and be licensed under the GNU GPL. Licensing under the GNU GPL also allows us to reuse the large body of existing GPL-compatible code freely available on the internet, which could prove very useful.

With a large co-developer base in the foreseeable future, it is necessary to clearly define the coding standards for the core application. Coding style is very personal; having a large number of developers modifying the code will quickly lead to a mix of different styles and personalities, which in turn makes the code less readable and thus less maintainable. There are several widely accepted coding standards, and for this project we have chosen the Linux kernel coding standard; while originally written for C, its concepts still mostly hold for C++. Source code, however, is worth little without proper documentation that gives future co-developers hints about what the original developers had in mind while designing and writing the application. Again, there are several widely accepted documentation standards, of which perhaps the most common is the Doxygen style. Doxygen can extract documentation from source files and generate web pages from it, which gives a quick and extensive overview of the entire application at a glance - something future developers will greatly appreciate.
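As a small illustration of the two conventions combined, the function below carries a Doxygen comment block and uses a kernel-like layout (opening brace of a function on its own line, 8-column indentation, no braces around single statements). The function `clamp_rate` itself is a made-up example, not part of HydraNode.

```cpp
#include <cstdint>

/**
 * \brief Clamp a requested transfer rate to a configured limit.
 *
 * \param requested  Rate the caller would like to use, in bytes per second.
 * \param limit      Maximum rate allowed, in bytes per second.
 * \return           The smaller of \p requested and \p limit.
 */
std::uint64_t clamp_rate(std::uint64_t requested, std::uint64_t limit)
{
        if (requested > limit)
                return limit;
        return requested;
}
```

Run over a source tree commented this way, Doxygen would list the function with its brief description, parameters and return value on the generated page.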

The privacy of the user should be a serious concern for any modern peer-to-peer application developer; several institutions have a habit of spying on users and invading their privacy. The simplest solution is to block the IP addresses of those groups; an even better solution is to stay off their radar altogether. The first part can be implemented within the core application, since it controls the low-level networking functionality; the second part can be implemented by the networking plugins, depending on the specific network.
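The first part, blocking address ranges in the core, might look roughly like the sketch below. It assumes IPv4 only and a simple list of inclusive ranges; the names `BlockList` and `parseIPv4` are hypothetical, and the addresses in the usage note come from ranges reserved for documentation.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Parses dotted-quad IPv4 text into a host-order integer.
// Returns false on malformed input and leaves `out` untouched.
bool parseIPv4(const std::string& text, std::uint32_t& out) {
    std::istringstream in(text);
    std::uint32_t result = 0;
    for (int i = 0; i < 4; ++i) {
        int octet = -1;
        char dot = 0;
        if (!(in >> octet) || octet < 0 || octet > 255) return false;
        if (i < 3 && (!(in >> dot) || dot != '.')) return false;
        result = (result << 8) | static_cast<std::uint32_t>(octet);
    }
    char extra = 0;
    if (in >> extra) return false;  // trailing characters after the 4th octet
    out = result;
    return true;
}

// List of inclusive address ranges that the core refuses to talk to.
class BlockList {
public:
    // Returns false if either address is malformed or the range is reversed.
    bool addRange(const std::string& first, const std::string& last) {
        std::uint32_t lo = 0, hi = 0;
        if (!parseIPv4(first, lo) || !parseIPv4(last, hi) || lo > hi)
            return false;
        Range r = { lo, hi };
        ranges_.push_back(r);
        return true;
    }

    // Design choice in this sketch: addresses that cannot be parsed
    // are treated as blocked rather than let through.
    bool isBlocked(const std::string& address) const {
        std::uint32_t ip = 0;
        if (!parseIPv4(address, ip)) return true;
        for (std::size_t i = 0; i < ranges_.size(); ++i)
            if (ip >= ranges_[i].lo && ip <= ranges_[i].hi) return true;
        return false;
    }

private:
    struct Range { std::uint32_t lo; std::uint32_t hi; };
    std::vector<Range> ranges_;
};
```

After `addRange("192.0.2.0", "192.0.2.255")`, a connection attempt from `192.0.2.77` would be rejected while one from `198.51.100.1` would pass. A linear scan is enough for a sketch; a real block list with many ranges would more likely keep them sorted and use a binary search.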

Since the core application will eventually have a large number of very different networking plugins, we face the problem of bandwidth management. The end user shouldn't have to deal with each plugin's individual bandwidth limit settings, so bandwidth limits should be managed by the main application, which can allow or deny the modules' requests for bandwidth. This should be fully configurable, allowing the end user to have the bandwidth shared equally among the plugins or weighted in favor of one or several specific plugins.
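One possible way to model this is a central manager that splits a global limit among plugins in proportion to user-configured weights and grants or trims each request accordingly. The sketch below is an assumption-laden illustration: the class `BandwidthManager`, its one-second accounting window and the plugin names used in the usage note are all hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>

// Central bandwidth accounting, owned by the core rather than by plugins.
class BandwidthManager {
public:
    explicit BandwidthManager(std::uint64_t bytesPerSecond)
        : limit_(bytesPerSecond) {}

    // Equal weights give equal shares; a larger weight favors a plugin.
    void setWeight(const std::string& plugin, unsigned weight) {
        weights_[plugin] = weight;
    }

    // Share of the global limit for one plugin, in bytes per second.
    // Plugins without a configured weight get nothing.
    std::uint64_t shareOf(const std::string& plugin) const {
        std::uint64_t total = 0;
        for (const auto& entry : weights_) total += entry.second;
        const auto it = weights_.find(plugin);
        if (it == weights_.end() || total == 0) return 0;
        return limit_ * it->second / total;
    }

    // A plugin asks to transfer `wanted` bytes in the current one-second
    // window; the return value is how many bytes it may actually send.
    std::uint64_t request(const std::string& plugin, std::uint64_t wanted) {
        const std::uint64_t budget = shareOf(plugin);
        std::uint64_t& used = used_[plugin];
        const std::uint64_t left = budget > used ? budget - used : 0;
        const std::uint64_t granted = std::min(wanted, left);
        used += granted;
        return granted;
    }

    // Called once per second by the core to start a new window.
    void tick() { used_.clear(); }

private:
    std::uint64_t limit_;
    std::map<std::string, unsigned> weights_;
    std::map<std::string, std::uint64_t> used_;
};
```

With a limit of 100000 bytes per second and weights of 3 and 1 for two plugins named, say, `ed2k` and `bittorrent`, the first may use 75000 bytes per second and the second 25000. This sketch never lends unused budget from an idle plugin to a busy one, which a real implementation would probably want to do.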

Different file-sharing networks use very different methods for identifying files - most often a checksum of the file, sometimes accompanied by the file size; additionally, there is file meta-data, which can help the end user identify the file. Since this feature is common to all file-sharing networks, which differ only in the actual checksum used, it should also be handled by the core application. Several points must be considered here. The core application should be able to generate a large number of checksums and store them. It should also be capable of extracting meta-data from files and of cross-referencing checksums - given a checksum from one network, it should be able to find the same file on a second network (provided the file is known). However, no single client can know the checksums of all files on all networks, which means that cross-referencing will be of limited use locally - it needs a central database that stores the checksums of all files from all networks and provides the cross-referencing functionality. This is where Myradin comes in - it does exactly that. While Myradin support shouldn't be built into the core, since it isn't really part of the application, it could be an optional plugin that retrieves checksums from, and submits them to, the central database.
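The local side of the cross-referencing idea can be sketched as an index that maps any known checksum of a file to all of its other checksums. Everything here is an assumption for illustration: the names `FileRecord` and `HashIndex`, the hash type labels, and the short placeholder digests in the usage note are not real values, and the sketch says nothing about how Myradin itself works.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// What the core knows about one file: its size and one digest per hash type.
struct FileRecord {
    unsigned long long size;
    std::map<std::string, std::string> hashes;  // hash type -> hex digest
};

// Index from (hash type, digest) to the file record that carries it.
class HashIndex {
public:
    // Stores the record and indexes every one of its digests.
    // Returns the record's position in the store.
    std::size_t add(const FileRecord& rec) {
        files_.push_back(rec);
        const std::size_t id = files_.size() - 1;
        std::map<std::string, std::string>::const_iterator it;
        for (it = rec.hashes.begin(); it != rec.hashes.end(); ++it)
            byHash_[std::make_pair(it->first, it->second)] = id;
        return id;
    }

    // Given a digest of one type, returns the same file's digest of another
    // type, or an empty string if the file or that digest is unknown.
    std::string translate(const std::string& fromType,
                          const std::string& digest,
                          const std::string& toType) const {
        std::map<Key, std::size_t>::const_iterator found =
            byHash_.find(std::make_pair(fromType, digest));
        if (found == byHash_.end()) return "";
        const std::map<std::string, std::string>& hashes =
            files_[found->second].hashes;
        std::map<std::string, std::string>::const_iterator h =
            hashes.find(toType);
        return h == hashes.end() ? "" : h->second;
    }

private:
    typedef std::pair<std::string, std::string> Key;
    std::vector<FileRecord> files_;
    std::map<Key, std::size_t> byHash_;
};
```

If a record holds the placeholder digests `aa11` under `ed2k` and `bb22` under `sha1`, then `translate("ed2k", "aa11", "sha1")` yields `bb22`. A lookup that comes back empty is exactly the case where a plugin would have to ask the central database instead.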

This last feature is far more important than it appears at first; it is the feature that will eventually allow true simultaneous multi-network downloads of the same file. The problem is that, since each network uses different checksums, it is impossible to identify the same file on two separate networks - you don't know the file is the same until you have downloaded it entirely and generated a checksum from it. However, with a central database that stores the checksums of files from multiple networks, it becomes possible, given a file's checksum from one network, to retrieve its checksums on all other networks, and thus to download the same file from two or more networks simultaneously. Upon completion, the file's actual checksums can be tested against all known checksums, providing even better corruption protection than single-network downloads. The central database would also contain file meta-data, which would help end users identify fake files, thus improving the overall quality of files on all file-sharing networks.
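The final verification step could be sketched as a comparison between the digests expected for a file and the digests computed from the finished download. Computing the digests is out of scope here; the names `DigestMap`, `mismatches` and `isIntact` are hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, std::string> DigestMap;  // hash type -> hex digest

// Returns the hash types whose expected digest is either missing from
// `computed` or differs from it. An empty result means every known
// checksum agrees with the downloaded data.
std::vector<std::string> mismatches(const DigestMap& expected,
                                    const DigestMap& computed) {
    std::vector<std::string> bad;
    for (DigestMap::const_iterator it = expected.begin();
         it != expected.end(); ++it) {
        DigestMap::const_iterator got = computed.find(it->first);
        if (got == computed.end() || got->second != it->second)
            bad.push_back(it->first);
    }
    return bad;
}

// True only if all expected checksums were computed and all of them match.
bool isIntact(const DigestMap& expected, const DigestMap& computed) {
    return mismatches(expected, computed).empty();
}
```

The more networks a file is known on, the more independent checksums it has to pass, which is where the extra corruption protection comes from.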

A few words on the name

The name strongly reflects what the application is: the Hydra, as is well known, has many heads; in this case, some heads feed the hydra from multiple networks, while other heads control it through various interfaces. But the hydra itself is only one of many 'nodes' in the networks, hence the Node part.

Things to keep in mind

  1. Every good work of software starts by scratching a developer's personal itch.
  2. Good programmers know what to write. Great ones know what to rewrite (and reuse).
  3. “Plan to throw one away; you will, anyhow.” (Fred Brooks, “The Mythical Man-Month”, Chapter 11).
  4. If you have the right attitude, interesting problems will find you.
  5. When you lose interest in a program, your last duty is to hand it off to a competent successor.
  6. Treating your users as co-developers is your least-hassle route to rapid code improvement and effective debugging.
  7. Release early. Release often. And listen to your customers.
  8. Given a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix obvious to someone. “Given enough eyeballs, all bugs are shallow” - Linus's Law.
  9. Smart data structures and dumb code works a lot better than the other way around. “Show me your flowchart [code] and conceal your tables [data structures], and I shall continue to be mystified. Show me your tables [data structures], and I won't usually need to see your flowchart [code]; it'll be obvious” - Brooks, Chapter 9
  10. If you treat your beta-testers as if they're your most valuable resource, they will respond by becoming your most valuable resource.
  11. The next best thing to having good ideas is recognizing good ideas from your users. Sometimes the latter is better.
  12. Often, the most striking and innovative solutions come from realizing that your concept of the problem was wrong.
  13. "Perfection (in design) is achieved not when there is nothing more to add, but rather when there is nothing more to take away." - Antoine de Saint-Exupéry (who was an aviator and aircraft designer when he wasn't authoring classic children's books).
  14. Any tool should be useful in the expected way, but a truly great tool lends itself to uses you never expected.
  15. When writing gateway software of any kind, take pains to disturb the data stream as little as possible - and never throw away information unless the recipient forces you to!
  16. When your language is nowhere near Turing-complete, syntactic sugar can be your friend.
  17. A security system is only as secure as its secret. Beware of pseudo-secrets.
  18. To solve an interesting problem, start by finding a problem that is interesting to you.
  19. Provided the development coordinator has a communications medium as good as the Internet, and knows how to lead without coercion, many heads are inevitably better than one.

Appendix: Future Possibilities

The described application structure allows endless possibilities for future improvements without the fear of feature or code bloat. Below are a few ideas for features that additional modules could provide, as examples of the future possibilities:

References

Credits

I would like to thank the following people for their assistance in writing this document:

Jan Magnussen aka fluffy for the design structure of this document, for many of the ideas presented here, for teaching us project development strategies, and for much more.

Christian Riesen aka Simon Moon for sharing his experience on various topics related to peer-to-peer applications and file-sharing networks, and on how to successfully create and run a software project, and for keeping us on the right track. Additionally, credit for the HydraNode name and for the ideas in Appendix: Future Possibilities also goes to him.