Package Updates over BitTorrent Protocol

The current model of receiving updates through a package manager/update manager relies on a client/server architecture in which the client polls the update servers to see if updates are available and then grabs them from the designated mirror. Time and time again this model has shown its flaws, including slow download speeds for users and high bandwidth and hardware costs for mirror operators.

So how could this model be improved? During UDS I spoke with a Transmission developer who works for Canonical about distributing all updates over BitTorrent: the only servers would be a cloud that does the initial seeding of packages via the BitTorrent protocol, while end users' update managers receive those new packages and then re-seed them up to a ratio set by the end user.

Although this is simply an idea, I do think it is feasible, and it would make distributing updates much faster globally while reducing the related IT costs for mirror providers and for Canonical itself. If such an idea were turned into an open source project and made a reality, it could be shared with other projects and distributions to speed up the delivery of updates to their communities while reducing costs and potentially also minimizing energy consumption.
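
To make the idea more concrete, here is a minimal sketch of what a single client-side update cycle could look like. Everything below is hypothetical: the torrent URL, the checksum file and the seed ratio are placeholders for what an integrated update manager would handle behind the scenes, and an off-the-shelf client (aria2c) is used purely for illustration.

#!/bin/sh
# Hypothetical update-over-BitTorrent cycle (sketch only, placeholder URLs).
# 1. Fetch the torrent describing the new package set from the cloud seeders.
wget -q https://updates.example.org/precise-updates.torrent

# 2. Download the packages from the swarm, then keep seeding until the
#    user-configured share ratio (here 2.0) is reached.
aria2c --seed-ratio=2.0 --dir=./pkgs precise-updates.torrent

# 3. Verify every package against checksums published by the archive before
#    handing anything to the package manager.
( cd ./pkgs && sha256sum -c SHA256SUMS ) || exit 1

# 4. Only install packages that passed verification.
sudo dpkg -i ./pkgs/*.deb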

Torrent Package Update Model

Comments

    • says

      I was not aware of this, but it still isn't used currently, and there seems to be no failsafe to ensure the integrity of packages. Furthermore, that's a CLI tool, is it not? What about fully implementing my model into a GUI (into Update Manager)?

      • takluyver says

         @bkerensa I think it’s integrated with apt, so if it’s set up, all frontends, GUI or command line, will use it.
         
        Besides the fact that it doesn’t seem to have seen much development recently, I guess that it would be controversial to enable it by default, because it will use bandwidth without telling people. In many areas, bandwidth for sending data is more limited than for receiving.
         
        As you say, it would definitely need some sort of secure checksum so rogue peers couldn’t send a dodgy package.

      • Malizor says

         @bkerensa It is integrated.
        Here is the apt-p2p package description:
         
        The Apt-P2P daemon runs automatically on startup and listens for requests from APT for files, as well as maintaining membership in a Distributed Hash Table (DHT). For any files which have a hash available (most files), peers that have the file are looked for in the DHT. The file can then be downloaded from them, using the uploading bandwidth of other peers while reducing the demand on the Debian mirror network. However, if a package can not be found on any peers, Apt-P2P will fall back to downloading from a mirror to ensure all packages are downloaded. Once downloaded, the file is hash checked and added to the DHT so that other peers can download it without the mirror.
         
        And, AFAIK, it is secure because it still downloads the package signatures from the main server.
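
        For reference, setting apt-p2p up is mostly a matter of routing APT's mirror requests through the local daemon, which answers from the DHT when it can and from the mirror otherwise. A minimal sketch, assuming the daemon's usual default port of 9977 (check the package documentation for your release):

        # Prefix the mirror URLs in sources.list with the local apt-p2p proxy,
        # which listens on localhost:9977 by default, then refresh the indexes.
        sudo sed -i 's|http://|http://localhost:9977/|' /etc/apt/sources.list
        sudo apt-get update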

  1. Carroarmato0 says

    I have been looking forward to this system being implemented for years now. My only concern is package tampering, but if the packages can be positively identified as being the real deal instead of maliciously altered, then I’m all for it!

    • says

       @Carroarmato0 That's why the client that uses the protocol and communicates with the cloud seeders and tracker would also check the MD5 hash directly with the cloud seeders that are maintained by the project, using a backend API.
       
      It could potentially do this:
      curl -s https://api.yourfossproject.com/hashcheck.php \
        -d 'act=verify_hash' \
        -d 'api_key=8afbe6dea02407989af4dd4c97bb6e25' \
        -d 'pkg_hash=ac22980663b27e89a0e89470b9b9154b' \
        -d 'cloud_zone=emea'
       
      and then get a response before proceeding with unpacking and installing the update.
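
      A sketch of how the client might act on that response (the endpoint, fields and return value are all hypothetical, continuing the example above):

      # Hypothetical: only unpack/install if the backend confirms the hash
      # (assumes the API returns the plain string "ok" on success).
      result=$(curl -s https://api.yourfossproject.com/hashcheck.php \
        -d 'act=verify_hash' -d 'pkg_hash=ac22980663b27e89a0e89470b9b9154b')
      if [ "$result" = "ok" ]; then
        sudo dpkg -i ./example-package.deb   # placeholder package name
      else
        echo "Hash check failed, refusing to install" >&2
      fi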
       

      • Carroarmato0 says

         @bkerensa I've tried running it after my first post (apt-p2p). It seems to be working pretty well. Updating starts up slowly while it checks for headers (about 10 min) and then it began downloading the available packages. As far as I can tell, at least 128 users and 37 nodes are already running it, though I'm not sure if the number of users is relative to the mirror that's being used. I'm using the "be" mirror because I'm in Belgium.
         
        Will try changing the mirror to the main server to see if I can pick up a lot more clients that way.

        • says

           @Carroarmato0 I'm checking it out now (apt-p2p). I wish this were a toggle feature in update-manager so users could enable it by default.

  2. jandrusk says

    I think the model is good, but the problem is that most enterprises block the protocol due to security concerns, so I think it would work great for home users but would be problematic in the enterprise.

    • says

       @jandrusk In this case, Ubuntu Enterprise Desktop and Server would default to mirrors, while Ubuntu Desktop would use the torrent-based system but could be toggled. Ultimately there are ways to get around the filtering of torrent traffic by tweaking things.

    • says

       @jandrusk Because of how torrents work, users always need to be informed and allowed to opt in. Another solution would be to use Metalink, which, along with using FTP/HTTP sources (as well as P2P if allowed), would allow proxy caches to avoid duplicating resources.
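
       As a rough illustration (the file name is a placeholder): a Metalink file can list ordinary mirror URLs alongside a torrent for the same package, and a Metalink-aware downloader simply uses whichever sources are reachable, so corporate proxy caches still see plain HTTP traffic:

       # Hypothetical: the .metalink lists HTTP mirrors plus a torrent for the package
       aria2c foo_1.0_amd64.deb.metalink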

  3. ethana2 says

    I have several computers in this house running the same version of Ubuntu, so I regularly download the exact same packages like four times. If my computers could grab packages from each other when running updates, they'd go a lot faster. I want this.

    • castrojo says

       @ethana2 Run a proxy, we even provide one in Ubuntu: http://askubuntu.com/q/3503/235
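
       For anyone who wants to try that, the gist of the linked answer is a local caching proxy; a minimal sketch using the packages Ubuntu ships for this (details in the link):

       # On the machine that will act as the shared package cache:
       sudo apt-get install squid-deb-proxy
       # On every other machine; it auto-discovers the proxy on the LAN via Avahi:
       sudo apt-get install squid-deb-proxy-client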

  4. Daviey says

    This is fixing the issue from the wrong end…

    I'd rather we focus on reducing the volume that needs to be transmitted: binary deb diff distribution. :)

    Oh, and apt is already aware of the hash sums… so tampering isn't a concern.
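
    One existing tool in this direction is debdelta, which fetches binary deltas and rebuilds the full .debs locally before apt installs them; roughly (commands from the debdelta package, exact behaviour may differ between releases):

    sudo apt-get install debdelta
    sudo debdelta-upgrade    # download deltas for pending upgrades and rebuild the .debs
    sudo apt-get upgrade     # install using the locally rebuilt packages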

    • says

       @Daviey Reducing the amount that needs to be transmitted is great… How could we start this process? Why has this not been a priority?

    • castrojo says

       @Daviey Yeah but then I spend the same amount of time or more waiting on IO to put the thing together and apply the package. 

      • Alereon says

         @castrojo I don’t think that’s the case except for trivially small packages, where the total time spent is tiny anyway. Unless you’re pulling data over a LAN the download time will be the largest portion of the install process. When you consider that IO and CPU are free while bandwidth is expensive, it makes sense even if you assume a very old and slow machine.
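
         A rough back-of-envelope with assumed figures, just to make the trade-off concrete: a 40 MB package on an 8 Mbit/s (~1 MB/s) connection takes around 40 seconds to download in full, whereas fetching, say, a 4 MB delta takes about 4 seconds and rebuilding the package is usually only a few more seconds of CPU and disk time, so the diff approach wins unless the network is far faster than the machine.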

  5. says

    Do it. I don't care if it's an opt-in thing (like a checkbox), because I will be checking that box. BitTorrent is faster and more robust for large amounts of data (in my experience); it's where I get my Ubuntu/Fedora ISOs :)

  6. Caspy7 says

    I think going into it with a hybrid approach in mind, with an emphasis on safety and fallback, is the way to go.
    It's clear that there are a few cases where P2P is not desirable – from corporate policy to technical limitations to personal preferences. So this needs to be both optional and smart: detecting the situations we can (when we need to fall back to a direct download) and informing and asking the user (a rough sketch of such fallback logic follows below).
    Users can be told, in simple language, how they can easily be a part of, and contribute to, a global community, while it runs in the background with a limited upload amount (per release). It should be opt-in, but presented in an attractive, easily accessible manner.
     
    It also must be secure beyond a shadow of a doubt.
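
    A minimal sketch of that kind of fall-back detection (names, paths and timeouts here are assumptions; a real update manager would do this internally): try the swarm first with a time limit, and drop back to a plain mirror download if no peers respond.

    # Hypothetical: attempt the torrent, give up if the swarm stalls for two
    # minutes, then fall back to fetching the same package over HTTP.
    aria2c --seed-time=0 --bt-stop-timeout=120 example-updates.torrent \
      || wget http://archive.ubuntu.com/ubuntu/pool/main/e/example/example_1.0_amd64.deb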