Q&A on Harvard University's MyDataCan and TraceFi

Sept. 9, 2020

This story is part of our PIT UNiverse newsletter, a monthly update from PIT-UN that shares news and events from around the Network.

Harvard University recently launched a PIT Lab. This month, the school is piloting two projects to aid in contact tracing and pandemic response.

TraceFi is the university’s contact tracing app, which uses Wi-Fi to determine whether two device owners are in proximity long enough to transmit the virus. MyDataCan is a platform that lets users add multiple contact tracing (or other) apps to their profile and share data between them, as well as make 14 days of data available to a human contact tracer.
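
TraceFi's detection method isn't described here beyond using Wi-Fi to judge proximity. One plausible approach, sketched below under stated assumptions, treats two devices as co-located when they were connected to the same access point at overlapping times for at least a threshold duration. The session schema, the function names, and the 15-minute threshold are hypothetical choices for illustration, not TraceFi's actual parameters.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import combinations

# Assumed proximity threshold; TraceFi's real criterion is not given in the interview.
MIN_OVERLAP = timedelta(minutes=15)

@dataclass(frozen=True)
class WifiSession:
    """One device's connection to one access point (hypothetical schema)."""
    device_id: str
    access_point: str
    start: datetime
    end: datetime

def overlap(a: WifiSession, b: WifiSession) -> timedelta:
    """How long both sessions were open at the same time (zero if never)."""
    return max(min(a.end, b.end) - max(a.start, b.start), timedelta(0))

def co_located_pairs(sessions, min_overlap=MIN_OVERLAP):
    """Device pairs that shared an access point for at least min_overlap.

    Each pair of sessions is judged on its own; time spread across several
    separate sessions is not summed in this sketch.
    """
    pairs = set()
    for a, b in combinations(sessions, 2):
        if a.device_id == b.device_id or a.access_point != b.access_point:
            continue
        if overlap(a, b) >= min_overlap:
            pairs.add(tuple(sorted((a.device_id, b.device_id))))
    return pairs

t0 = datetime(2020, 9, 1, 9, 0)
sessions = [
    WifiSession("alice", "lib-ap-3", t0, t0 + timedelta(minutes=40)),
    WifiSession("bob", "lib-ap-3", t0 + timedelta(minutes=10), t0 + timedelta(minutes=60)),
    WifiSession("carol", "lib-ap-3", t0 + timedelta(minutes=35), t0 + timedelta(minutes=50)),
]
print(sorted(co_located_pairs(sessions)))  # [('alice', 'bob'), ('bob', 'carol')]
```

Alice and Carol overlap for only five minutes, so that pair is not reported. Comparing every pair of sessions is quadratic; grouping sessions by access point first would be the obvious refinement at campus scale.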

PIT UNiverse spoke with the Harvard PIT Lab’s director, Latanya Sweeney (also director of Harvard’s Data Privacy Lab), and Jinyan Zang, research assistant and Ph.D. candidate in the Department of Government, to learn about the pilots, how PIT students are involved, and how these projects can help other institutions with their virus responses.

Tell us about how the app pilots went, and the response from students.

Prof. Sweeney: We have two pilots, one on MyDataCan and another on TraceFi. We just rolled out the MyDataCan pilot, and it was a real success. Harvard brought a few students back, and only in the last couple of days have the numbers ramped up, so there weren't that many students on campus during the stretch when we ran the MyDataCan pilot. But we have 150 students participating with the apps right now. It was met with a lot of eagerness and zeal.

So that was a complete success. It told us that students are up for doing this, that they're excited about it.

How do your solutions differ from other digital contact tracing solutions we’ve seen?

Sweeney: Most apps store different kinds of location information about you. But [MyDataCan] apps all put whatever they do store into a private data can, or data store, for the individual. The individual can then play with it, use it, even let other apps use it. And the last 14 days can be used by a human contact tracer. If you... tested positive, they'll use only your data. Or if you were co-located with someone who tested positive, they'll pull up your data and notify you as well. The reason that's a little bit different from the Apple/Google GAEN apps is that you're notified only after a human has actually conducted an interview to figure out whether or not that was a meaningful contact. Was it likely that you actually were infected?
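
As a rough illustration of the data-can idea, here is a minimal sketch of a per-person store that approved apps can write to and read from, with a separate contact-tracer view limited to the last 14 days and gated on a positive test. The `DataCan` class, its methods, and the record format are hypothetical; MyDataCan's actual interfaces are not described in this interview.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

TRACER_WINDOW = timedelta(days=14)  # the 14-day window Sweeney describes

@dataclass
class DataCan:
    """Hypothetical personal data store; names are illustrative, not MyDataCan's API."""
    owner: str
    records: list = field(default_factory=list)      # (timestamp, app, payload) tuples
    approved_apps: set = field(default_factory=set)

    def approve(self, app: str) -> None:
        self.approved_apps.add(app)

    def revoke(self, app: str) -> None:
        self.approved_apps.discard(app)

    def _require_approval(self, app: str) -> None:
        if app not in self.approved_apps:
            raise PermissionError(f"{app} is not approved by {self.owner}")

    def write(self, app: str, when: datetime, payload: dict) -> None:
        self._require_approval(app)
        self.records.append((when, app, payload))

    def read_for_app(self, app: str) -> list:
        # An approved app can read everything in the can, including records
        # written by other apps; this is what lets a user switch apps.
        self._require_approval(app)
        return list(self.records)

    def read_for_tracer(self, now: datetime, case_is_positive: bool) -> list:
        # The tracer sees only the last 14 days, and only for a positive case.
        if not case_is_positive:
            raise PermissionError("tracer access requires a positive test case")
        cutoff = now - TRACER_WINDOW
        return [r for r in self.records if r[0] >= cutoff]

now = datetime(2020, 9, 9)
can = DataCan("alice")
can.approve("TraceFi")
can.write("TraceFi", now - timedelta(days=20), {"access_point": "lib-ap-3"})
can.write("TraceFi", now - timedelta(days=2), {"access_point": "dining-ap-1"})
print(len(can.read_for_tracer(now, case_is_positive=True)))  # 1: the 20-day-old record is outside the window
```

This sketch filters the 14-day window at read time; a system following the data-minimization practice Sweeney describes later would more likely delete older location records outright.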

Jinyan Zang: [MyDataCan] is interoperable in multiple senses. It's interoperable between apps: if different apps want to use the same data and you've given them approval through your MyDataCan account, they can access that data, if you want them to. If you don't want to use one app anymore and want to use App B instead, you can switch over. But it's also interoperable from the perspective of a contact tracer. We know that for an effective app-based contact tracing model, you ideally want 60% or more of the population enrolled, in terms of the data being collected and the coverage area. So if all the data is interoperable on MyDataCan itself, and the contact tracer has access to that data set for the last 14 days for the specific purpose of contact tracing, the platform can support users on multiple apps, all collaborating together.
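
Zang's point about needing roughly 60% enrollment, and about pooling data across apps, can be made concrete with a back-of-the-envelope model. Assume (a simplification of ours, not the interviewees') that people enroll independently and that a contact is only detectable when both people's data is visible to the same system. Coverage of contacts then grows with the square of enrollment, and apps that keep their data siloed see less than apps sharing one store. The function names are hypothetical.

```python
def siloed_coverage(shares):
    """Fraction of contacts visible when apps cannot see each other's data.

    shares: fraction of the population using each app (user bases assumed
    disjoint). A contact counts only if both people use the same app.
    """
    return sum(s ** 2 for s in shares)

def pooled_coverage(shares):
    """Fraction of contacts visible when all apps write to one shared store:
    any two enrolled people count, whichever apps they use."""
    total = sum(shares)
    if not 0.0 <= total <= 1.0:
        raise ValueError("total enrollment must be between 0 and 1")
    return total ** 2

two_apps = [0.3, 0.3]  # two apps, each used by 30% of the community
print(f"siloed: {siloed_coverage(two_apps):.0%}, pooled: {pooled_coverage(two_apps):.0%}")
```

With two apps at 30% each, the siloed model captures about 18% of contacts while the pooled one captures about 36%. The same squaring is why even 60% enrollment leaves roughly two-thirds of contacts invisible to the apps.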

Could this model for digital contact tracing scale for use by other schools or cities?

Sweeney: We've had schools that are interested in TraceFi. We've had app developers who are interested in putting apps on [MyDataCan]. And we've had others who are looking. Each instance of MyDataCan has to have a human contact tracer at the top, so it's very easy for us to stand up lots of instances, one per school. It's also easy to stand one up for a city. That is our intention: to roll out to some of the Boston-area schools and also to the community itself.

For each instance, there has to be a set of human contact tracers who are responsible for a community, so that for a positive case, a human contact tracer will know about that case, perform the interview process, and so forth. Why is this so critical? Take our community at Harvard: it's dense. If we just had false positives, and the result of a false positive is that you go into quarantine, we'd end up with lots of people going into quarantine unnecessarily. Eventually people may stop paying attention and not even go into quarantine. So the very important role the human contact tracer plays is to go through the interview with the person who tested positive and identify the encounters in which an infection really was likely. Only those contacts need to go into quarantine until we know more.
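
The interview step acts as a filter between what the devices record and who is asked to quarantine. A minimal sketch of that division of labor might look like this, assuming hypothetical interview fields drawn from limitations Sweeney mentions elsewhere (a phone can't tell whether people were masked, or whether the owner even had the device). The fields and the quarantine rule are illustrative only, not public health guidance or the Harvard team's criteria.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Encounter:
    """A candidate contact flagged from device data (hypothetical fields)."""
    contact_id: str
    # The two fields below come from the tracer's interview; a phone alone
    # cannot observe them.
    both_masked: Optional[bool] = None
    device_with_owner: Optional[bool] = None

def needs_quarantine(e: Encounter) -> bool:
    """Hypothetical rule: quarantine only when the interview confirms the
    owner had the device and the two people were not both masked."""
    if e.both_masked is None or e.device_with_owner is None:
        raise ValueError(f"{e.contact_id}: not yet reviewed by a contact tracer")
    return e.device_with_owner and not e.both_masked

flagged = [
    Encounter("bob", both_masked=False, device_with_owner=True),
    Encounter("carol", both_masked=True, device_with_owner=True),
    Encounter("dave", both_masked=False, device_with_owner=False),  # phone left in a locker
]
to_quarantine = [e.contact_id for e in flagged if needs_quarantine(e)]
print(to_quarantine)  # ['bob']
```

Three device-flagged encounters produce one quarantine here; without the interview step, all three contacts would have been sent into quarantine, which is the false-positive burden described above. Raising an error for unreviewed encounters keeps the human in the loop by construction.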

How have students responded to the programs, and participated in building them?

Sweeney: In May, everyone had already left campus; the semester had ended. Over 100 Harvard students volunteered to work on an app at MIT. That's not our app, because we hadn't even rolled out MyDataCan… but it was that energy, individual students wanting to participate and to be part of something doing this work, that led them to volunteer. When I last checked in with [that project], I think they had 1,700 volunteers.

We also benefited from that same kind of energy... Over the summer, a group of 10 students organized themselves and said they were going to build an app for Harvard students themselves… MyDataCan gave them all a place to plug in. So it's always been blessed with the participation and excitement of students.

Zang: I think one thing that really worked well for us in the MyDataCan pilot, and in getting up to half the students at certain houses on board, is that even though the primary purpose is contact tracing, we were able to build educational materials on pandemics and contact tracing, working with researchers over at the School of Public Health and at the Broad Institute here at Harvard.

One thing that really attracted people to the [MyDataCan] pilot, though, was the contrast with other contact tracing solutions. With those, you download the app and then, hopefully, you're never near someone who's positive, so nothing necessarily shows up or pings on your phone. There isn't the instant response that a lot of people are used to when interacting with apps on their phones. Incorporating this kind of virtual pandemic scenario with Operation Outbreak (editor’s note: Operation Outbreak is a pandemic simulation game integrated with MyDataCan in which students compete to contain a virus that spreads from phone to phone via Bluetooth)... gives people an experience similar to what they would actually see if they were next to individuals who are COVID positive, without the physical risk.
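
The editor's note describes Operation Outbreak as a simulated virus spreading from phone to phone over Bluetooth. The game's real rules aren't given here; a toy version of the idea, assuming a time-ordered list of phone-to-phone encounters and a fixed per-encounter transmission probability, might look like this. The encounter format and the `simulate_outbreak` function are hypothetical.

```python
import random

def simulate_outbreak(encounters, patient_zero, p_transmit, seed=None):
    """Spread a virtual infection along time-ordered phone-to-phone encounters.

    encounters: iterable of (time, phone_a, phone_b) tuples (hypothetical format).
    p_transmit: chance that one encounter passes the infection on.
    Returns the set of phones infected once all encounters are replayed.
    """
    rng = random.Random(seed)
    infected = {patient_zero}
    for _, a, b in sorted(encounters):
        # Only an encounter between an infected and an uninfected phone can spread it.
        if (a in infected) != (b in infected) and rng.random() < p_transmit:
            infected.update((a, b))
    return infected

contacts = [(1, "p1", "p2"), (2, "p3", "p4"), (3, "p2", "p3")]
print(sorted(simulate_outbreak(contacts, "p1", p_transmit=1.0)))  # ['p1', 'p2', 'p3']
```

Replaying encounters in time order matters: `p4` stays uninfected because its encounter with `p3` happened before `p3` was infected.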

These apps need to be widely used to be effective, and need to be trusted to be widely used. What privacy protections are built into your platform?

Sweeney: We started with the set of fair information practices… what are the types of practices that we want to make sure we incorporate and follow?

So, you know: only collect the minimum information you need, and only share the minimum information needed. Since we're talking about location, it’s really only the last 14 days you have to worry about for someone's infection… and then you're only going to show it to the contact tracer, who has a need to know based on a positive test case. Access is the ability for you to get a copy of the data yourself, under your own control. Accuracy means that the data is reviewed with the people who are impacted by it, to ensure it is accurate. The system is designed under the accountability principle: we lock down all the data in terms of security. There's a one-way pipeline, we keep the location data separate from identity, and the only person who can access it is the human contact tracer. And the system's rule is that every time the human contact tracer accesses the data, it should be for a positive test case. So we built an immutable log to be able to check for that.
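
The interview doesn't say how the immutable log is built. One common way to make a log tamper-evident is a hash chain, where each entry includes the hash of the one before it, so rewriting an old entry invalidates the chain. The sketch below assumes that technique and adds a check for the stated rule that every tracer access should correspond to a positive test case; subjects are pseudonyms to reflect keeping location separate from identity. `AccessLog`, its fields, and the case identifiers are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

class AccessLog:
    """Append-only, hash-chained record of contact-tracer data accesses (hypothetical)."""

    def __init__(self):
        self.entries = []

    def append(self, tracer: str, case_id: str, subject: str) -> None:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"tracer": tracer, "case_id": case_id, "subject": subject, "prev": prev}
        self.entries.append({**body, "hash": _digest(body)})

    def verify(self) -> bool:
        """Recompute the chain; an edited entry, or a deleted one that isn't the newest, breaks it."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

    def unjustified(self, positive_cases: set) -> list:
        """Accesses whose case_id does not match a known positive test."""
        return [e for e in self.entries if e["case_id"] not in positive_cases]

audit = AccessLog()
audit.append("tracer-1", "case-17", "pseudonym-4821")
audit.append("tracer-1", "case-99", "pseudonym-0533")
print(audit.verify())                       # True
print(len(audit.unjustified({"case-17"})))  # 1
audit.entries[0]["subject"] = "pseudonym-0000"  # tamper with history
print(audit.verify())                       # False
```

A hash chain on its own only makes tampering detectable. A production system would also need to store or publish the latest hash somewhere the log's operator cannot rewrite, since anyone able to replace or truncate the whole chain could recompute every hash.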

In the case of TraceFi, we have an easy opt-out system. And in the case of MyDataCan we have an opt-in system, and then Harvard has levels of security in terms of what you have to do to hold and transmit the data.

What data privacy lessons are there in how we build digital contact tracing systems?

Sweeney: There's all this data sharing and data leaking happening in mobile apps, but all of it is invisible. You have no idea who has your data, let alone what they're doing with it. So one of the things [MyDataCan] makes sure of is that this is not business as usual. We're not going to sneak some app in there that's [collecting data]. Public health and contact tracing are an important social good, but we don't need to sneak around to serve it. Let's show how apps really should work.

We have a chance to build something. It's not perfect. From a computer science standpoint, I don't consider either of these technologies perfect, because a perfect solution would give me the kind of anonymity that the GAEN apps talk about, and reveal identity to the contact tracer only for those people, at that moment, who really need to go into quarantine... But the apps can't tell whether you're wearing a mask, or whether you're even with your device. So you need a human in the loop.

We have to make sure we're building these apps in a way that is going to be our model for what we would really want apps to look like and behave like, even when just performing a social good.

The team is currently preparing MyDataCan for launch and looking at ways to increase adoption. TraceFi will undergo more piloting before being rolled out to the Harvard community.