Introducing local storage tier

we're adding a new storage tier: LV

kraud currently has 3 different storage types

  • GP: our general purpose ceph cluster with 100TB of consumer grade SSDs.
  • GFS: global shared filesystem, very useful for just having files available in multiple pods concurrently. This is backed by GP with cephfs MD on top.
  • RED: a data garbage bin using 200TB of spinning consumer disks. You should not be using this unless you want really cheap, large, slow storage.

The new LV tier will aid users of legacy applications that are built for more traditional virtual server deployments. It is backed by a RAID 1 of enterprise NVMe drives and peaks at 1 million IOPS. A 4TB volume gets 3GB/s of dedicated bandwidth and can request PCIe passthrough for low latency, while smaller allocations share bandwidth and IOPS.

Unlike GP, LV does not survive host failure: losing a host makes the volume unavailable. During the last (bad) month this would have resulted in 99.1% uptime, compared to 99.9% for GP. There's also a residual risk of total loss, because the drives are electrically connected to the same chassis. We advise users to build their own contingency plan, similar to what you should already have with competing virtual machine offers.
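To put the 99.1% and 99.9% figures into perspective, here is a rough sketch that converts an uptime percentage into minutes of downtime. The function name downtime_minutes and the 30-day month are assumptions made for illustration, not something kraud publishes.

```shell
# Convert an uptime percentage into minutes of downtime over a number of days.
# downtime_minutes is a hypothetical helper; the 30-day month is an assumption.
downtime_minutes() {
  uptime_percent="$1"
  days="$2"
  awk -v u="$uptime_percent" -v d="$days" \
    'BEGIN { printf "%.1f\n", (100 - u) / 100 * d * 24 * 60 }'
}

downtime_minutes 99.1 30   # LV, prints 388.8 (about 6.5 hours)
downtime_minutes 99.9 30   # GP, prints 43.2
```

Under these assumptions, the gap between the two tiers in a bad month is roughly six hours of unavailability, which is the number to weigh against your own contingency plan.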

While LV has very few benefits for modern applications, it pairs well with traditional VMs and will become the default storage in the kraud marketplace for VMs.

ceph let us down(time)

Most kraud users would rather not bother with the details of how storage works. This is, after all, what we've built kraud for. However, as you may have noticed, we haven't been doing great in terms of uptime recently (still better than Azure, lol), and this is due to storage. To reach our goal of carbon negative computing while also taking in zero venture funding, we must navigate the difficult path of serving a variety of incompatible workloads.

Kraud is all about energy saving, so we use lower clocked EPYC CPUs with the highest possible compute per unit of energy. Ceph was built for high clock speed Xeons with very little respect for energy efficiency, so it does not perform great in this scenario.

Adding to that, we treat physical servers as expendable and built all of our software for graceful recovery from losing a node. That allows building machines for a third of the cost of traditional OEMs like HP. We use things like cockroachdb, which performs well under frequent failures. While ceph does also recover from such an event, it does NOT do it as gracefully as you'd hope, resulting in several minutes of downtime for the entire cluster every time a single node fails.

As I keep saying, high availability is the art of turning a single node incident into a multi node incident.

In summary, ceph is not the correct solution for the customer group that is currently the most important to the company's survival (paying customers, yes). This is why we're moving that customer group away from ceph, so GFS can come back to its previous slow-but-stable glory.

In the future, once we become big enough, we hope to deliver a custom built storage solution that can work well within the energy targets.

Thank you for your patience and for joining us on this critical mission towards carbon negative compute.

Introducing the kraud cli: kra

From the beginning of the project we have always strived for compatibility with your existing tools, be it docker or kubectl. Your feedback is always greatly appreciated, as it helps us clarify what exactly that means in practice. How much compat is good, and where do the existing tools not work?

We haven't reached a stage where this is entirely clear yet, but the trend is pointing towards:

  • Fully supporting the docker cli
  • Building a custom cli to supplement docker
  • Freezing kubectl at 1.24
  • Partially supporting the most popular of the many incompatible docker compose variants

kubectl in particular is a difficult choice. Kubernetes is a standard. But unfortunately, it's not actually a standard, and keeping up with upstream does not seem feasible at the moment.

Instead we will shift focus entirely to supporting docker and docker compose. The compose spec is weird and inconsistent, but it is simple and hence very popular. Most of the confusion we've seen in practice is easily addressable with better tooling.

So we are introducing: kra

The kra commandline program works on docker-compose files and will implement some of the processes that docker does not do at all (ingress config currently requires kubectl) or does incorrectly (local file mounts).

One specific pain point in some user setups has been CI. Since we don't support docker build yet, users build on the CI machine and then use docker load. This is slow, because the docker api was never intended to be used remotely.

kra push is very fast and should be used in CI instead.

github CI example

here's a typical .github/workflows/deploy.yaml

name: Deploy
on:
  push:
    branches: [  main, 'staging' ]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: 'deploy to kraud'
        env:
          KR_ACCESS_TOKEN: ${{secrets.KR_ACCESS_TOKEN}}
        run: |
          curl -L https://github.com/kraudcloud/cli/releases/latest/download/kra-linux-amd64.tar.gz  | tar xvz
          sudo mv kra /usr/local/bin/

          # get credentials for docker cli
          kra setup docker

          # build the images locally
          docker compose build

          # push images to kraud
          kra push

          # destroy running pod
          # this is currently needed until we fix service upgrades
          docker --context kraud.aep rm -f myapp

          # start new pods with new images
          docker compose up -d

kra is open source and available here: https://github.com/kraudcloud/cli. We're always happy for feedback. Feel free to open github issues or chat with a kraud engineer on discord.

[screenshot: deployment]

vdocker moved to cradle

Vdocker is what we call the thing that responds to docker api calls like log, attach, exec and cp.

Vdocker used to run on the host, and all the commands were carefully funneled through a virtio-serial. The advantage was that cradle stayed small and started faster. However, we realized most people start fairly large containers that take a few seconds to start anyway. Hence sub-100ms startup time for cradle is no longer a priority.

Instead we traded a few milliseconds of start time for much higher bandwidth by moving vdocker directly into cradle. It listens inside of your pod on port 1 and accepts the necessary docker commands from the api frontend. docker cp now works properly and is much faster. Also, docker attach no longer glitches.

Unfortunately this means docker run feels slower, although it really hasn't changed much. Log output starts appearing roughly 80ms after the download has completed, but for larger containers it may take several seconds to download layers, and that progress is currently not visible.

On the upside, all other commands now feel a lot faster, because we skip vmm and just proxy the http call directly to vdocker.

global file system is now generally available

Global file system can be mounted simultaneously on multiple containers, enabling easy out of the box shared directories between services.

Similar to NFS, it does make files available over the network, but GFS is fully managed and does not have a single point of failure. It is also significantly faster than NFS due to direct memory access in virtiofs.

docker volume create shared --driver gfs
docker run -ti -v shared:/data alpine

Coherent filesystem for horizontal scaling

GFS enables an application developer to start multiple instances of the same application, without implementing synchronization. This is specifically useful for traditional stacks like PHP where horizontal scaling requires a separate network storage.

Any docker container works with GFS without changes. The same standard syntax used to mount volumes with docker on your local computer will simply work in the cloud.
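As a rough sketch of what this looks like with two instances, the function below creates one GFS volume and mounts it into two containers. It assumes the docker cli is already pointed at kraud (for example after kra setup docker). The function name gfs_share_demo, the container names writer and reader, and the file path are made up for illustration.

```shell
# Sketch: two containers sharing one GFS volume.
# Assumes the docker cli already talks to kraud; all names here are hypothetical.
gfs_share_demo() {
  # create the shared volume with the gfs driver
  docker volume create shared --driver gfs

  # first instance writes a file into the shared volume and stays alive
  docker run -d --name writer -v shared:/data alpine \
    sh -c 'echo "hello from writer" > /data/hello.txt && sleep 600'

  # second instance mounts the same volume, waits for the file, then reads it
  docker run --rm --name reader -v shared:/data alpine \
    sh -c 'until [ -f /data/hello.txt ]; do sleep 1; done; cat /data/hello.txt'
}
```

Calling gfs_share_demo should end with the reader printing the line written by the writer, provided both containers really see the same volume. Clean up afterwards with docker rm -f writer and docker volume rm shared.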

Shared object storage for multiple services

Modern applications often choose to store shared files in object storage, specifically s3. With GFS, you can simply store a file using unix file semantics without the need for a separate layer.

File I/O behaves identically with a docker volume on your local machine and with kraud. This makes developing apps locally and deploying them into the kraud seamless.

Built in redundancy by ceph

GFS is backed by cephfs on an SSD cluster with 3x redundancy. Ceph is an open source object storage cluster backed by Red Hat, CERN and others. All pods/containers launched in the Falkenstein DC enjoy a 20MB/s transfer rate.

Users with hybrid regions should note that GFS data transfer counts towards external traffic.

Additionally, customers may choose a separate cluster for large intermediate data on magnetic disks. This is intended for science applications working with large data sets and can easily scale to multiple petabytes.

see the documentation for details