Building CovidCureId.com

Since the very beginning of the COVID pandemic, the German news outlet Deutsche Welle has provided some of the best global coverage of the topic. DW’s COVID coverage on YouTube is particularly good: it delivers relevant snippets in short, digestible segments with very little drama or fanfare.

Watching the recent coverage of the situation unfolding in India and reading firsthand accounts from the folks in r/india, I was inspired to revisit the CURE ID database, thinking that its COVID case data might be of value where access to therapeutics is limited.

I decided to build covidcureid.com to surface just the COVID case data from CURE ID, in the (perhaps misguided) hope of helping in some small way by making this case data more accessible through three simple facets:

  1. Age
  2. Gender
  3. Drug and drug efficacy data

A cleaner, easier-to-navigate, more responsive, mobile-friendly user interface makes it easier to locate cases of interest and understand different real-world outcomes.

Tapping through gives access to the individual cases, filtered by the patient’s age and gender. Case notes are included, along with a direct link to each case in CURE ID, which displays additional context such as community discussions, dosing information, and other details.

The open-source project starts by extracting the data from the CURE ID database through its REST APIs. From there, the dataset is loaded into a Microsoft Azure CosmosDB instance, a natural fit for storing the extracted JSON-formatted case data.

The front-end UI is developed using Vue.js and the excellent Quasar framework with mobile devices in mind. All of it is hosted in an Azure Static Web App to improve performance while reducing cost.

The application backend is built on Azure Functions — again, improving performance and scalability while reducing cost.

More detail is available on the GitHub project page for anyone interested. I also have a writeup on LinkedIn that talks a bit more about the CURE ID application itself.

In this post, we’ll walk through the methodology and implementation of the app.

Understanding the Data

Unlike ClinicalTrials.gov, which provides a well-documented REST API for interacting with its data, CURE ID does not provide a documented API.

However, we can see the requests in action by opening the browser’s developer tools and watching the network requests the page makes.

Opening one of these request URLs directly reveals that CURE ID serves the API from a Django application, likely running in AWS.

With this, we can start to extract the data.

Extracting the CURE ID COVID Data

Looking at the URLs, there are two queries we need in order to extract the case information from CURE ID.

A simple Node.js script pulls this information down and stores it in .json files for exploration.
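The project’s own extraction script isn’t reproduced here. As a minimal TypeScript sketch of the same idea, assuming the endpoints page their results the way Django REST Framework’s standard pagination classes do (a JSON object with count, next, previous, and results; CURE ID doesn’t document this, so check it against the real responses), the download loop just follows the next links until they run out. The function names are hypothetical, and the page fetcher is passed in so the paging logic can be tried without a network.

```typescript
// Assumed response shape (Django REST Framework-style pagination). CURE ID
// does not document its API, so verify this against the real responses.
interface Page<T> {
  count: number;
  next: string | null;
  previous: string | null;
  results: T[];
}

// The fetcher is injected so the paging logic can run without a network.
type FetchPage<T> = (url: string) => Promise<Page<T>>;

// Follows the "next" links until there are no more pages, collecting every
// record from every page into a single array.
async function downloadAllPages<T>(firstUrl: string, fetchPage: FetchPage<T>): Promise<T[]> {
  const records: T[] = [];
  let url: string | null = firstUrl;
  while (url !== null) {
    const page: Page<T> = await fetchPage(url);
    records.push(...page.results);
    url = page.next;
  }
  return records;
}

// Pretty-prints the records so they can be saved as a .json file to explore.
function toPrettyJson<T>(records: T[]): string {
  return JSON.stringify(records, null, 2);
}
```

In a real script, fetchPage could wrap fetch(url).then((r) => r.json()), and the string from toPrettyJson could be written out with Node’s fs.writeFileSync.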

It is possible to skip saving the files and feed the data directly into an ETL pipeline, but without a schema or documentation, it helps to have the files on hand to explore the data.
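To make that exploration less tedious, a small helper can summarize which JSON types show up at each key path across all the downloaded records. This is a hypothetical TypeScript sketch, not part of the project; it simply walks every record recursively and merges array elements under a shared path.

```typescript
// Any value that can appear in parsed JSON.
type JsonValue = null | boolean | number | string | JsonValue[] | { [key: string]: JsonValue };

// Names the JSON type of a value, distinguishing arrays and null from objects.
function jsonType(value: JsonValue): string {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value; // "boolean", "number", "string" or "object"
}

// Maps each key path (e.g. "drugs[].name") to the set of JSON types seen there.
function inferSchema(records: JsonValue[]): Map<string, Set<string>> {
  const schema = new Map<string, Set<string>>();

  const visit = (value: JsonValue, path: string): void => {
    const types = schema.get(path) ?? new Set<string>();
    types.add(jsonType(value));
    schema.set(path, types);

    if (Array.isArray(value)) {
      // All elements of an array share one "[]" path so their types merge.
      for (const item of value) visit(item, `${path}[]`);
    } else if (value !== null && typeof value === "object") {
      for (const [key, child] of Object.entries(value)) {
        visit(child, path === "" ? key : `${path}.${key}`);
      }
    }
  };

  for (const record of records) visit(record, "");
  return schema;
}
```

Run over the saved case files, it gives a quick picture of which fields exist, which are sometimes null, and where the nested arrays are.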

Importing the Data

The next question is how to get the data into CosmosDB. Since CosmosDB is a document-oriented database and the data is already in JSON files at this point, we could import it as-is. However, each case record contains a lot of information that isn’t interesting for this app, and I also want to limit the throughput so that the overall request units (RUs) required by CosmosDB stay below the free tier limit of 400 RU/s.

(It is also possible to use dedicated ETL services in Azure, but we want to do this for free if we can.)

To achieve this, we can use a four-step hop:

  1. Put the files into Azure Storage Blobs
  2. Use an Azure Storage Blob trigger to invoke an Azure Function that transforms the data
  3. Push the transformed data into an Azure Storage Queue to buffer the throughput to Azure CosmosDB
  4. Write the data forward to Azure CosmosDB from a queue-triggered Function

Loading the data can be done manually, using either Azure Storage Explorer or the Azure CLI from the command line.

This then triggers a Function that listens for new files.

We have three bindings for this Function:

  1. The Blob storage binding, which ingests files from the Azure Storage endpoint
  2. A queue binding to push drug entries we want to store
  3. A queue binding to push regimen entries we want to store

A regimen represents the set of drugs used to treat a patient. For the user interface, we want to report by individual drug and then allow drilling down into the regimens in which that drug was used, so we break the drugs out of the regimens and create a discrete entry for each occurrence of a drug in the case files.
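A minimal sketch of that break-out step as a pure TypeScript function that a blob-triggered Function could call for each case. The CaseRecord shape and all field names are hypothetical, since CURE ID’s schema is undocumented; the drug entries would be pushed to the drug queue binding and the regimen entry to the regimen queue binding.

```typescript
// Hypothetical case shape; the real CURE ID field names will differ.
interface CaseRecord {
  caseId: number;
  age: number | null;
  gender: string;
  outcome: string;
  drugs: string[]; // the drugs that make up this case's regimen
}

// One entry per drug occurrence, used for reporting by individual drug.
interface DrugEntry {
  drug: string;
  caseId: number;
  age: number | null;
  gender: string;
  outcome: string;
}

// One entry per regimen, used for drilling down from a drug.
interface RegimenEntry {
  caseId: number;
  drugs: string[];
  outcome: string;
}

// Splits one case into a discrete entry per drug plus a single regimen entry.
function splitCase(record: CaseRecord): { drugEntries: DrugEntry[]; regimen: RegimenEntry } {
  const drugEntries: DrugEntry[] = record.drugs.map((drug) => ({
    drug,
    caseId: record.caseId,
    age: record.age,
    gender: record.gender,
    outcome: record.outcome,
  }));
  const regimen: RegimenEntry = {
    caseId: record.caseId,
    // Sorting makes every copy of the same regimen carry an identical drug list.
    drugs: record.drugs.slice().sort(),
    outcome: record.outcome,
  };
  return { drugEntries, regimen };
}
```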

To throttle ingestion, we can set the batch size for the queues, reducing the throughput required from CosmosDB to try to keep it below the 400 RU/s free tier limit.
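The relevant knobs live under extensions.queues in host.json. Per the Azure Functions queue trigger documentation, a single instance processes at most batchSize plus newBatchThreshold messages concurrently for each queue-triggered function, so those two settings bound the write rate. The back-of-the-envelope TypeScript sketch below turns that bound into an RU/s estimate; the RU cost per message and the processing time are assumptions you would measure yourself, not figures from the project.

```typescript
// Mirrors the "extensions.queues" section of host.json.
interface QueueSettings {
  batchSize: number;
  newBatchThreshold: number;
}

// Upper bound on messages one instance processes concurrently for a single
// queue-triggered function, per the Azure Functions queue trigger docs.
function maxConcurrentMessages(settings: QueueSettings): number {
  return settings.batchSize + settings.newBatchThreshold;
}

// Rough peak RU/s, assuming each message's writes cost ruPerMessage in total
// and each message takes processingMs to complete. Both inputs are assumptions
// to measure (CosmosDB reports the request charge of every operation).
function estimatePeakRuPerSecond(
  settings: QueueSettings,
  ruPerMessage: number,
  processingMs: number,
  instances = 1,
): number {
  const messagesPerSecondPerSlot = 1000 / processingMs;
  return maxConcurrentMessages(settings) * instances * messagesPerSecondPerSlot * ruPerMessage;
}

const FREE_TIER_RU_PER_SECOND = 400; // the free tier limit cited in this post

function fitsFreeTier(peakRuPerSecond: number): boolean {
  return peakRuPerSecond <= FREE_TIER_RU_PER_SECOND;
}
```

Larger batch settings or more scaled-out instances multiply the estimate directly, so lowering batchSize and newBatchThreshold trades ingest speed for headroom under the limit. The regimen function, which reads before it writes, would need a higher ruPerMessage than the drug function.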

For the drug entries, we can pass each entry directly to CosmosDB.
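Every CosmosDB item needs a string id, so even a direct pass-through has to make sure one is present. A minimal TypeScript sketch, assuming the queue message body is the JSON-serialized drug entry; deriving the id from the case and drug is a hypothetical choice, and the drug name is URI-encoded because CosmosDB ids may not contain /, \, ? or #.

```typescript
// Hypothetical drug entry shape, matching what the blob-triggered step enqueues.
interface DrugEntry {
  drug: string;
  caseId: number;
  age: number | null;
  gender: string;
  outcome: string;
}

// The document written to CosmosDB: the entry plus the required string id.
interface DrugDocument extends DrugEntry {
  id: string;
}

// Parses a queue message body and returns the document to write unchanged,
// apart from the added id (hypothetical scheme: "<caseId>:<encoded drug>").
function toDrugDocument(messageBody: string): DrugDocument {
  const entry = JSON.parse(messageBody) as DrugEntry;
  if (typeof entry.drug !== "string" || typeof entry.caseId !== "number") {
    throw new Error("queue message is not a drug entry");
  }
  // encodeURIComponent escapes "/", "\", "?" and "#", which ids may not contain.
  return { ...entry, id: `${entry.caseId}:${encodeURIComponent(entry.drug)}` };
}
```

A deterministic id also means that if the same entry were ever written twice with an upsert, it would overwrite the earlier copy instead of adding a second one.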

The regimens, however, are duplicated, with one entry created for each drug in the regimen. For example, if a regimen contains Aspirin and Hydroxychloroquine, the dataset contains two case files (and therefore two copies of the regimen). To avoid duplicates, we check whether an instance already exists before writing a new one.

This implementation is not foolproof: unlike with SQL, we don’t have transactional control over the data store (transactions apply at the document level), so the check and the write can race. But it’s good enough for the purposes of this analysis. To truly guarantee a single entry, the application would have to use Azure Service Bus queues with a session ID to ensure once-only entry of the data.
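A minimal TypeScript sketch of that check-then-create logic, written against a hypothetical DocumentStore interface standing in for the CosmosDB container (with an in-memory version for experimenting). Giving every copy of a regimen the same id, built from the case and its sorted drug list, is what lets the second copy be recognized; the read and the create are still two separate calls, which is the race described above.

```typescript
interface RegimenEntry {
  caseId: number;
  drugs: string[];
  outcome: string;
}

interface RegimenDocument extends RegimenEntry {
  id: string;
}

// Hypothetical abstraction over the CosmosDB container.
interface DocumentStore<D extends { id: string }> {
  read(id: string): Promise<D | undefined>;
  create(doc: D): Promise<void>;
}

// Every copy of the same regimen yields the same id: case id plus sorted,
// URI-encoded drug names (CosmosDB ids may not contain / \ ? #).
function regimenId(entry: RegimenEntry): string {
  const drugs = entry.drugs.slice().sort().map((d) => encodeURIComponent(d));
  return `${entry.caseId}:${drugs.join("+")}`;
}

// Check-then-create: returns true if this call stored the regimen. Not atomic,
// so two concurrent invocations can both see the regimen as missing.
async function storeRegimenOnce(
  store: DocumentStore<RegimenDocument>,
  entry: RegimenEntry,
): Promise<boolean> {
  const id = regimenId(entry);
  const existing = await store.read(id);
  if (existing !== undefined) return false; // already stored from another copy
  await store.create({ ...entry, id });
  return true;
}

// In-memory stand-in; like CosmosDB, it rejects a second create of the same id.
class MemoryStore<D extends { id: string }> implements DocumentStore<D> {
  private readonly docs = new Map<string, D>();

  async read(id: string): Promise<D | undefined> {
    return this.docs.get(id);
  }

  async create(doc: D): Promise<void> {
    if (this.docs.has(doc.id)) throw new Error(`conflict: ${doc.id} already exists`);
    this.docs.set(doc.id, doc);
  }

  get size(): number {
    return this.docs.size;
  }
}
```

If the id is deterministic and both copies land in the same logical partition, CosmosDB rejects a second create of the same id with a 409 Conflict, so a lost race surfaces as an error to catch rather than as a duplicate document.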

Application Front End

While somewhat heavy for such a simple application, the front end is built using:

  1. Vue.js
  2. Quasar Framework (using TypeScript and the Vue Composition API)
  3. Apex Charts

The combination is highly productive and allows a rapid build-out of the front-end with minimal fuss.

In retrospect, though, a lighter combination like Alpine.js and Tailwind CSS might have been an even better choice to reduce the overall network and browser load.

(See the repository for the code)

API Backend

For the API, we once again turn to Functions to build a low-cost, serverless solution.

There are two simple REST endpoints that we need.

First, an endpoint to query by drug.

When retrieving the drug information, we aggregate by the outcome so that we can use that information on the front-end to build the chart.
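A sketch of that endpoint’s two halves in TypeScript: a parameterized CosmosDB SQL query that groups one drug’s cases by outcome, and a helper that reshapes the grouped rows into the series and labels arrays an ApexCharts pie or donut chart takes. The c.drug and c.outcome field names and the function names are hypothetical; the query object follows the { query, parameters } shape the CosmosDB SDKs accept.

```typescript
interface SqlParameter {
  name: string;
  value: string | number;
}

interface SqlQuerySpec {
  query: string;
  parameters: SqlParameter[];
}

// Groups one drug's cases by outcome on the server side.
function outcomesByDrugQuery(drug: string): SqlQuerySpec {
  return {
    query:
      "SELECT c.outcome, COUNT(1) AS caseCount FROM c " +
      "WHERE c.drug = @drug GROUP BY c.outcome",
    parameters: [{ name: "@drug", value: drug }],
  };
}

// One row of the grouped query result.
interface OutcomeRow {
  outcome: string;
  caseCount: number;
}

// ApexCharts pie/donut charts take a numeric series plus matching labels.
function toPieChart(rows: OutcomeRow[]): { series: number[]; labels: string[] } {
  // Largest slice first so the legend reads from most to least common.
  const sorted = rows.slice().sort((a, b) => b.caseCount - a.caseCount);
  return {
    series: sorted.map((row) => row.caseCount),
    labels: sorted.map((row) => row.outcome),
  };
}
```

With the JavaScript SDK, such a spec can be passed to container.items.query(spec).fetchAll(); the parameter keeps the drug name out of the query string itself.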

Then an endpoint to query the regimens that use the drug.

Note the JOIN operation. In CosmosDB, JOIN is an intra-document operation: it pairs each document with the elements of its own nested arrays, which allows reshaping the JSON.
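To make the intra-document JOIN concrete, here is a hypothetical regimen query in CosmosDB SQL alongside a plain TypeScript loop that mimics its semantics: each document is paired only with the elements of its own drugs array, the WHERE clause filters those pairs, and SELECT VALUE r returns the parent document. The document shape and field names are assumptions, not the project’s actual query.

```typescript
// Hypothetical regimen document shape.
interface RegimenDocument {
  id: string;
  caseId: number;
  drugs: string[];
  outcome: string;
}

// Regimens that include a given drug. JOIN d IN r.drugs pairs each document r
// with each element of its own drugs array; it never joins across documents.
function regimensByDrugQuery(drug: string): { query: string; parameters: { name: string; value: string }[] } {
  return {
    query: "SELECT VALUE r FROM r JOIN d IN r.drugs WHERE d = @drug",
    parameters: [{ name: "@drug", value: drug }],
  };
}

// Plain-code equivalent of the query above, to show the JOIN semantics.
function joinRegimensByDrug(docs: RegimenDocument[], drug: string): RegimenDocument[] {
  const results: RegimenDocument[] = [];
  for (const r of docs) {        // FROM r
    for (const d of r.drugs) {   // JOIN d IN r.drugs
      if (d === drug) {          // WHERE d = @drug
        results.push(r);         // SELECT VALUE r
      }
    }
  }
  // Like the SQL version, a document listing the same drug twice appears twice.
  return results;
}
```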

Deploying to Azure

There are three parts of the application that we need to deploy:

  1. The static front-end assets for the website
  2. The API layer for the website
  3. The backend components for the website (Cosmos, Storage)

Azure Static Web Apps could condense the first two items, but its integrated API is unfortunately limited to HTTP triggers. Rather than splitting the HTTP Functions into Static Web Apps and the data processing into a separate Functions app, we simply deploy to two separate endpoints: the Static Web App for the front end and a single Functions app for both the API and the data processing.

I admit: I cheated here and used the UI to configure the services, as the CLI commands for Static Web Apps are not well documented for cases where you’re not hooking up a GitHub URL.

Once the pieces are set up, we need to configure GitHub Actions to build the application and push the output to Azure.

First is the build and deploy workflow for the Static Web App.

The key steps build the Quasar app and deploy the output to Azure; the build step injects both the URL of the API and the Google Analytics token into the build.

For Quasar, the correct place to “receive” these settings is the build.env section of the quasar.conf.js file.
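In the app code, Quasar exposes each build.env value through process.env, so the injected API endpoint ends up as a plain string. Below is a small hypothetical helper for turning that base into request URLs; it guards against a URL-constructor pitfall where a base without a trailing slash, such as https://example.azurewebsites.net/api, silently loses its last path segment. The route and parameter names are illustrative only.

```typescript
// Builds a request URL from the API base injected at build time. Hypothetical
// helper: in the Quasar app the base would come from process.env.API_ENDPOINT.
function apiUrl(
  base: string,
  route: string,
  params: Record<string, string | number> = {},
): string {
  // new URL() resolves relative to the base's last "/", so ensure the base ends
  // with one; otherwise "https://host/api" + "drugs" becomes "https://host/drugs".
  const normalizedBase = base.endsWith("/") ? base : `${base}/`;
  // A leading "/" on the route would resolve from the host root, so strip it.
  const url = new URL(route.replace(/^\/+/, ""), normalizedBase);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}
```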

The Google Analytics token can then be injected into the HTML template without any additional webpack plugins.

Next is the build and deploy workflow for Functions.

The scripts rely on a number of secrets configured in GitHub to keep them out of source control:

  • API_ENDPOINT is the endpoint of the API that we want to inject during the build process
  • AZURE_CREDENTIALS is the secret key used to access the Static Web App for publishing
  • AZURE_FUNC_PUBLISH_PROFILE is the XML formatted publishing profile exported from Azure Functions to allow publishing to the Functions app
  • GA_TOKEN is an optional Google Analytics token to inject into the front-end.

Why Azure?

Part of this exercise was to explore working with Azure Functions using only VS Code and .NET 5. The other part is productivity. As I’ve touched on before, I personally find Azure to be more productive than AWS, especially when it comes to linking I/O between different parts of your PaaS.

I also found setting up the custom domain and SSL certificate much, much easier on Azure Static Web Apps than on AWS with S3 static websites.
