Part 4: Building a line chart of Aave's loans in Vega

In the previous post, we exported Aave's loan data to a readable JSON of rows. Now let's display the time series in a line chart.

Date scales

What if I have events that happen throughout the day, but I'm only interested in the total sum of values per day, week or month? My data looks like this:

[
  {
    "date": "2021-12-26T00:44:24.000Z",
    "coinSymbol": "USDT",
    "amountCoin": 150000,
    "amountEth": 39.12118760339925,
    "amountUsd": 149232.07344380283
  },
  {
    "date": "2021-12-29T00:06:20.000Z",
    "coinSymbol": "USDT",
    "amountCoin": 2000000,
    "amountEth": 521.6158347119899,
    "amountUsd": 19760.979250704
  },
  {
    "date": "2021-12-28T23:12:38.000Z",
    "coinSymbol": "USDC",
    "amountCoin": 100000,
    "amountEth": 26.236,
    "amountUsd": 10080.10796000001
  },
  {
    "date": "2021-12-30T23:10:55.000Z",
    "coinSymbol": "USDC",
    "amountCoin": 200000,
    "amountEth": 52.472,
    "amountUsd": 20160.21592000002
  },
  {
    "date": "2021-12-28T21:02:59.000Z",
    "coinSymbol": "DAI",
    "amountCoin": 250,
    "amountEth": 0.065229509230188,
    "amountUsd": 248.82513820456745
  },
  {
    "date": "2021-12-29T23:02:59.000Z",
    "coinSymbol": "DAI",
    "amountCoin": 250,
    "amountEth": 0.065229509230188,
    "amountUsd": 248.82513820456745
  }
]

Point scale

One thing I had not noticed is that Vega parsed my date field as is, that is to say as a String. You can see it in the Data view in the bottom right.

A scale type of `point` for the X axis is not what we want. Also date should be a Date, not a string.

Time scale

When I first switched the scale type to "time", it threw an error about my scale being unbound, going from [-Infinity; Infinity]. Thanks to this post, I found out that was because I wasn't passing actual Date objects to Vega.

To make Vega interpret my raw date strings as Dates, you have to tell Vega how to parse the date field. See the data reference and the format reference.

  "data": [
    {
      "name": "table",
      "format": {
        "parse": {"date": "date"}
      },
      "values": [
        {
          "date": "2021-12-26T00:44:24.000Z",
          "coinSymbol": "USDT",
          "amountCoin": 150000,
          "amountEth": 39.12118760339925,
          "amountUsd": 149232.07344380283
        },
        ...
      ]
    }
  ]

Once it is properly parsed, the Data Viewer shows the Date as a number.

Now I can switch to a better scale than point. After reading the doc on time scale, I switched to a time scale, which improved the tick marks on the time axis:

  "scales": [
    {
      "name": "x",
      "type": "time",
      "range": "width",
      "domain": {"data": "table", "field": "date"}
    },
    ...
  ]
Now using a `time` scale for the X axis

Aggregating points

In order to get a meaningful view of the data, we need to aggregate data points by days or weeks or months, depending on how high-level of a view we want.

TypeScript or Vega?

We can do it in TypeScript when preparing the JSON, or do it directly in Vega. We can also do something in-between like "floor" the keys in TypeScript, then aggregate in Vega. To summarize, here are the three levels:

  1. Full React: "floor" keys in TypeScript, then aggregate them by key. Produce the JSON for Vega to consume.
  2. Half and Half: "floor" keys in React, aggregate in Vega.
  3. Full Vega: "floor" keys with Vega transform expressions, aggregate in Vega.

Initially, I thought the effort to go either way would be similar and that the pros and cons of each method were somewhat equal. It turns out it's much more efficient to go the Full Vega route because Vega provides many useful high-level functions for graph-related work. For example, aggregating data points by day, week, etc. is a breeze. It'd be a total waste of time to code this ourselves.

Flooring dates

I naively thought that I could modify the scale to perhaps anchor my points to "floored" days. But tweaking the scale will only modify how the points linearly stretch along the width of the graph. For example I could make the axis show nice tick marks from Dec 26 midnight to Dec 31 midnight instead of Dec 26 at 12 pm to Dec 30 at 11 am (I am making up hours here).

To aggregate the points, I have to transform the data itself. There are a bunch of transformations readily available at the transform reference. Those that might apply to me are:

  • aggregate - Group and summarize a data stream.
  • bin - Discretize numeric values into uniform bins.
  • formula - Extend data objects with derived fields using a formula expression.
  • joinaggregate - Extend data objects with calculated aggregate values.
  • timeunit - Discretize date-time values into time unit bins. Available for version ≥ 5.8

The one that applies the most is timeunit.

When I add the transform that "floors" to the current day, it adds two columns to my table: unit0, which "floors" the date, and unit1, which "ceils" the date. Notice how I used the units "year" to keep the year in the value, and "dayofyear". If I had used "day", it'd have evaluated to the day of the month, in which case it'd have transformed my dates into Jan of 2021.

`unit0` contains `floor(date)` and `unit1` contains `ceil(date)`

To make the transformed value more readable, I renamed it to date0 and date1. Here's the final definition of my transformation:

"transform": [
  {
    "type": "timeunit",
    "field": "date",
    "units": ["year", "dayofyear"],
    "as": [
      "date0",
      "date1"
    ]
  }
],
Without the `timeunit` transform. Points don't align because they happen at different times of the day.
With the `timeunit` transform. Points align to a given day at midnight.

Notice also how the left line goes beyond the bonds. That's because in my axis, I forgot to use the transformed dates instead of the raw dates. The earliest raw date is Dec 2021, 26 at 00:44 but the earliest transformed date is Sat, 25 Dec 2021 15:00:00 GMT.

Aggregating points

Using the aggregate reference, here's my first attempt:

It summed amountUsd but I lost `coinSymbol` and other columns

It removed all of the original fields. If I add coinSymbol to groupby, I get something closer to what I want.

Restoring coinSymbol to get total loan amounts by date and coinSymbol

Let's add another data point for USDT on Dec 25 and check that totalAmount is computed correctly.

coinSymbol aggregates both USDT loans and totals to $298k

Yep, it works.

To finish, I simply need to adjust the scale to use  the new totalAmountUsd instead of the defunct amountUsd column, and to draw my line using totalAmountUsd for the y value.

Using actual data

And here's the chart of the latest 1000 loans:

Linear scale makes only the 2 largest coins visible.

USDC loans dwarf everything. Let's see what it looks like in logarithmic scale.

Log scale shows details of the smallest and the largest coins.

See the code in the Vega playground.

Now we can even see tiny transactions as low as 10 dollars. Awesome! And it also exposes a bug in our graph: what if in that time frame, there is only one transaction of a certain unit? Then the line won't show up. Also what happens if there are transactions on let's say Dec 23, then nothing until Dec 26? What does it mean to have a line connect the two points? It's quite misleading. The fix would be to either:

  1. turn the line chart into a stacked bar chart. If on a certain day, there is no loan for a certain currency, the rectangle just won't show up.
  2. add synthetic data, namely zero's for days when no loan was made for a given currency.

The second approach seems much harder to implement. If I had to do it, it would be with a lookup transform (see reference). But then I need to make one intermediate table to perform the lookups from.