COVID Trace Privacy In-Depth

April 01, 2020

Automated contact tracing only works if nearly everyone participates and has a high degree of trust in the process. COVID Trace hopes its process earns that trust by striking the right balance in protecting privacy while helping public health.

Why it matters where contact tracing happens

Before looking into the details of COVID Trace, we need to look at why it matters where your location information is compared against known exposure times and locations.

To understand why it matters, take a look at how governments have tackled this problem. Both South Korea and Singapore used mobile phone data to do contact tracing. South Korea collected and used the location data from mobile phones without the consent of its citizens. Singapore developed an ingenious solution that uses Bluetooth to detect proximity without using location data.

The Singapore model is better than the South Korean model from a privacy perspective. In Singapore, the person who tested positive for COVID-19 shares the temporary tokens their phone received via Bluetooth from other devices. Those temporary tokens are mapped back to phone numbers so that their owners can be notified of exposure. Unlike in South Korea, only the details of the COVID-19 positive person and the people they were in contact with become known to the government.

These are very different approaches, but they’re fundamentally the same in that everyone has to register with a central organization. When contact tracing occurs, that organization learns something about you that you didn’t explicitly choose to share with it. A single, massive registry, whether operated by the government or by another organization, becomes particularly attractive to law enforcement and hackers.

For the reasons listed above and several more, the centralized approach to doing contact tracing is likely to be unpopular in the US.

In contrast, if contact tracing happens privately on each person’s phone, then participants can avoid sharing their details with a single organization or government.

Earning Trust

Our first step to earning trust is to make explicit and clear claims in the app about what we do and when. We provide a way to ensure that locations in your home area are never recorded. The app will also show you all of the locations it has recorded. The messaging in the app aims to be clear about what it’s currently doing.

The second step to earning trust is understanding the architecture of the service and the implications of those decisions. It’s worth taking a look at the architecture post since it provides the needed context for the states and behaviors below.

What the app does when

It’s helpful to think about what the app does in terms of 4 distinct states. The states outline the data captured and transmitted, and below them we discuss some of the implications.

No exposure, no self-report state

No exposure, no self-report is the initial state of all clients. While in this state, the app will record locations only outside of the area defined as your home.

Approximately every hour, the app gathers the distinct S2 Geometry Level 18 areas it has recorded over the past 3 weeks and downloads exposure data for any that it has not already downloaded, so it can check for possible exposures.
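As a rough illustration of that hourly selection, here is a minimal Python sketch. The function name, the tuple layout of the history, and the string area identifiers are all assumptions made for the example, not the app's actual code.

```python
from datetime import datetime, timedelta

def cells_to_check(history, downloaded, now, window=timedelta(weeks=3)):
    """Return distinct recent areas whose exposure data has not been fetched yet."""
    cutoff = now - window
    # Keep only visits inside the 3-week window; the set removes duplicate areas.
    recent = {cell for cell, seen_at in history if seen_at >= cutoff}
    return sorted(recent - downloaded)

now = datetime(2020, 4, 1, 12, 0)
history = [
    ("cell-a", now - timedelta(days=1)),
    ("cell-a", now - timedelta(days=2)),   # repeat visit to the same area
    ("cell-b", now - timedelta(days=10)),  # already downloaded below
    ("cell-c", now - timedelta(days=30)),  # older than 3 weeks
]
pending = cells_to_check(history, downloaded={"cell-b"}, now=now)
print(pending)
```

Only "cell-a" is left to request: the repeat visit collapses into one entry, "cell-b" was already downloaded, and "cell-c" falls outside the window.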

Exposure, no self-report state

The app determines a possible exposure after it has downloaded the latest data and compared it against the local location history. After the user responds to the notification, the app asks the user to send us a report that an exposure event occurred.
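The comparison can be pictured as a set intersection performed entirely on the phone. The sketch below assumes that both the local history and the downloaded data are lists of (area, hour) pairs and that a match means the same area in the same hour; both the data shape and the matching rule are assumptions for illustration.

```python
def find_exposures(local_history, published):
    """Return the (area, hour) pairs present in both the local and published data."""
    return sorted(set(local_history) & set(published))

# Hypothetical data: area names and hour strings are placeholders.
local_history = [("area-1", "2020-03-28T14"), ("area-2", "2020-03-29T09")]
published = [("area-2", "2020-03-29T09"), ("area-3", "2020-03-29T09")]

matches = find_exposures(local_history, published)
print(matches)
```

Because the intersection runs locally, nothing about the match leaves the phone unless the user chooses to report it.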

If it’s the first time that the client submits data, then the user will go through the text message verification flow discussed below.

Self-report state

The user submits a self-report indicating that they have COVID-19. As a result, the app sends us their symptoms and all location data starting 8 days before the first symptoms. After submission, the app sends new location history (outside of the home area) daily for the next 15 days.
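The date arithmetic behind that window is small enough to sketch. The function and constant names below are hypothetical; only the 8-day and 15-day figures come from the description above.

```python
from datetime import date, timedelta

LOOKBACK_DAYS = 8    # history shared from this many days before first symptoms
FOLLOW_UP_DAYS = 15  # daily uploads continue this many days after submission

def reporting_window(first_symptoms, submitted):
    """Return the first and last dates covered by a self-report."""
    start = first_symptoms - timedelta(days=LOOKBACK_DAYS)
    end = submitted + timedelta(days=FOLLOW_UP_DAYS)
    return start, end

start, end = reporting_window(date(2020, 3, 20), date(2020, 3, 25))
print(start, end)
```

For symptoms starting on March 20 and a report submitted on March 25, the shared history would run from March 12 through April 9.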

Again, if it’s the first time that the client submits data, then the user will go through the text message verification flow.

Text message verification state

We are loath to require people to sign in with an account or email address. However, we do need some way to ensure that the information submitted is authentic. Our authentication is via text message, which requires the user to provide us with their phone number. After we verify that the phone number belongs to the person using the app, we provide a secure token for the app to use in subsequent submissions to our service. The phone number is only ever used for verification and is never stored.

Information Leakage

A possible consequence of checking for new exposures in the No exposure, no self-report state is information leakage. When the app requests exposure data for specific areas for the past 3 weeks, the more precise the requested areas, the more the app tells us about where and potentially when (at least the first time) the user was in that area. This leakage becomes an issue if the requests are intercepted or logged. We do not log them. Interception is unlikely, but not impossible.

The mitigation for this information leakage is that we don’t shard data on Google Cloud Storage any finer than S2 Geometry Level 12. Because requests are restricted to a reasonably broad area, any logs of requested areas do not leak location in greater detail than Level 12. It’s important to note that we also do not use any kind of unique ID when requesting the data from Cloud Storage.
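To make the coarsening concrete, the sketch below maps fine-grained cell IDs to their Level 12 parent using plain bit arithmetic. It assumes the standard 64-bit S2 cell ID layout (3 face bits, 2 bits per level, then a trailing marker bit); the example IDs are made up, and a real client would more likely rely on an S2 library than on hand-rolled helpers like these.

```python
MAX_LEVEL = 30  # S2 leaf cells are level 30

def lsb_for_level(level):
    """Marker bit that terminates a cell ID at the given level."""
    return 1 << (2 * (MAX_LEVEL - level))

def cell_level(cell_id):
    """Recover the level from the position of the lowest set bit."""
    lsb = cell_id & -cell_id
    return MAX_LEVEL - (lsb.bit_length() - 1) // 2

def parent(cell_id, level):
    """Truncate a cell ID to its ancestor at a coarser level."""
    lsb = lsb_for_level(level)
    return (cell_id & -lsb) | lsb

# Two made-up Level 18 cells that share the same Level 12 prefix.
prefix = (0b001 << 61) | (0x123456 << 37)
cell_a = prefix | (0x0AB << 25) | lsb_for_level(18)
cell_b = prefix | (0xF01 << 25) | lsb_for_level(18)

shard_a = parent(cell_a, 12)
shard_b = parent(cell_b, 12)
print(cell_level(cell_a), cell_level(shard_a), shard_a == shard_b)
```

Both Level 18 cells collapse to the same Level 12 shard, so a request for that shard does not reveal which of the finer cells the phone actually visited.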

A potential future mitigation would be for the app to mask its location history by requesting plausible but random areas it does not care about in addition to the areas it needs.
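One way such masking might look is sketched below. Everything here is hypothetical, since the mitigation is not implemented: the function name, the candidate pool (standing in for "plausible" nearby areas), and the number of decoys are assumptions. A seeded generator is used only to keep the example repeatable; a real client would want an unpredictable source such as `random.SystemRandom`.

```python
import random

def with_decoys(needed, candidates, n_decoys, rng):
    """Return the needed areas mixed with randomly chosen decoy areas."""
    # Decoys come from plausible candidates the app does not actually need.
    pool = sorted(set(candidates) - set(needed))
    decoys = rng.sample(pool, min(n_decoys, len(pool)))
    request = list(needed) + decoys
    rng.shuffle(request)  # hide which entries are real by mixing the order
    return request

rng = random.Random(0)  # seeded for a repeatable example only
needed = ["area-3", "area-7"]
candidates = [f"area-{i}" for i in range(10)]
request = with_decoys(needed, candidates, n_decoys=4, rng=rng)
print(request)
```

An observer of the request sees six areas and cannot tell from the list alone which two the phone has actually visited.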

Why capture exposure events?

Capturing exposure events requires recording the time and place of exposure. This will likely be a small subset of the person’s location history.

Receiving reports of exposure is valuable to both us and health officials. For health officials and organizations, it gives them a valuable picture of how much the virus potentially continues to spread, given whatever level of precaution that area is currently exercising.

The mitigation here is two-fold:

The user explicitly has to share this data.
For every S2 Geometry Level 10 area, we will record only the day the exposure occurred.

Publicly available time and place of exposures

COVID Trace publicly shares the hour and the place of submitted location history of people who have reported infections of COVID-19 from 8 days prior to the onset of symptoms up until 15 days after submission. The data submitted after reporting an infection is uploaded daily. Only data outside of the area defined as the home will be shared.

The volume of data exposed here is considerably more than any of the other scenarios. We’ve done the following to balance the need for accuracy against maintaining privacy.

The time and place information shared publicly has no unique identifier — that information is not tied to an individual. The accuracy of the time reported is hourly. We can’t say exactly when in an hour that person was in that area. There is no other information shared along with the hour and place of exposure.
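The hourly precision can be illustrated with a small sketch that strips everything finer than the hour before a record is shared. The function name and the record shape are assumptions for the example; the point is only that the minutes and seconds never leave the phone and that no identifier accompanies the record.

```python
from datetime import datetime

def to_published_record(area, seen_at):
    """Reduce a visit to (area, hour) with no identifier attached."""
    hour = seen_at.replace(minute=0, second=0, microsecond=0)
    return (area, hour.isoformat())

record = to_published_record("area-2", datetime(2020, 3, 29, 9, 47, 13))
print(record)
```

A visit at 09:47:13 is published only as the 09:00 hour, so anyone reading the data cannot tell when within that hour the person was in the area.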

Conclusion

Achieving the goals of protecting privacy and public health at the same time is challenging. Without a doubt, this isn’t a perfect solution, but we hope it is the best one available. We’re open to feedback, so please reach out.