Table of Contents
- What is the Digital Standard?
- Who Created and Maintains the Digital Standard? Who can Contribute?
- Why is Testing Important?
- Why was this Testing Handbook Necessary, and Who is it For?
- How does the Handbook Score Products?
- How did we Pick the Products? (And Why aren’t We Naming Them?)
- What Products did we Ultimately Choose?
- How did we Design the Technical Testing Procedures?
- How did we Design the Policy Testing Procedures?
- What would we Change in the Standard?
- Conclusion
What would we Change in the Standard?
Developing our testing handbook and testing the three selected products allowed us to make observations about the standard, and to consider ways that it could be refined and improved. As it is an open-source project, contributors are encouraged to suggest changes and updates to the text. The following are seven recommendations based on our experience.
1. The Digital Standard should provide more context as to what the tests mean and why each best practice is important for protecting privacy and security.
As it stands, the Digital Standard tests allow testers to gather results and answer questions, but do not provide testers the necessary tools to interpret those results in the broader privacy and security context. Each best practice that the standard tries to incentivize is important for a different reason, not all of which will be apparent or familiar to all potential testers or users of the standard. For example, establishing whether a company publishes a regular transparency report regarding a product is less helpful if testers or readers do not understand the value of or rationale for transparency reporting. Further, evaluating the data retention practices of a product will be more informative and meaningful if testers understand how data retention or minimization can affect privacy and security.
Although some of these best practices are relatively straightforward, many are not. If a goal of the standard is to enable third-party organizations or individuals to test their own products, then we need to make sure that they have the necessary understanding of the relevant areas of digital security and privacy. Providing context and links to best practices would make the Standard more effective and informative, especially because one of the goals is to help consumers make informed decisions about the products they use or purchase.
Some of the tests may be more relevant than others, depending on the product being tested. For example, it is probably less crucial that a product that obviously collects very little personal data provides as comprehensive a set of third-party request policies as one that collects detailed biometric and location data. Passing or failing that test has different consequences for different products. Understanding the context of the tests, and the best practices they evaluate, makes the results of those tests more meaningful and valuable.
2. The Digital Standard should include guidelines for how tests are meant to be scored.
The Digital Standard, as written, does not provide any information about how to score tests, instead offering criteria and indicators. Our initial approach was a simple Pass/Fail system, but we quickly realized the need for more nuance, and included a Partial Pass grade as well as a Not Applicable grade to broaden our range of options. For example, a product that does not collect any personal information in the course of its operation would otherwise get a failing grade when evaluated on whether it gives users the ability to control what information is collected, so we might mark it as Not Applicable. But even our solution is unsatisfying insofar as it doesn’t positively reward the product for not collecting user data in the first place, acquiring no data for users to control.
In order for our processes to be replicable by other testers, we needed to develop clear conditions for Pass, Partial Pass, Fail, and Not Applicable results, so that anyone using the handbook would be working from the same set of rules. A more nuanced alternative could have been a numerical score, but that proves even more subjective without a rigid rubric for each score: what one tester considers average differs from what another does, especially given the varying expertise of testers, who may have different standards for success or failure. Although our resulting scoring system was the clearest process we could come up with, the standard itself should provide official guidance on how to score tests and indicators to ensure replicability and trust in the results.
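To illustrate how explicit scoring rules make results replicable, the four-grade scale we describe above can be expressed as a small set of unambiguous conditions. This is a hypothetical sketch, not official Digital Standard guidance; the inputs `applicable`, `criteria_met`, and `criteria_total` are illustrative names we chose, not fields defined by the standard.

```python
from enum import Enum


class Grade(Enum):
    """The four grades used in our handbook's scoring system."""
    PASS = "Pass"
    PARTIAL_PASS = "Partial Pass"
    FAIL = "Fail"
    NOT_APPLICABLE = "Not Applicable"


def grade_indicator(applicable: bool, criteria_met: int, criteria_total: int) -> Grade:
    """Map a tester's observations onto a grade using explicit rules.

    Because every condition is written down, two testers who record the
    same observations will always produce the same grade.
    """
    if not applicable:
        return Grade.NOT_APPLICABLE
    if criteria_met == criteria_total:
        return Grade.PASS
    if criteria_met > 0:
        return Grade.PARTIAL_PASS
    return Grade.FAIL
```

A rubric written in this style removes the subjective middle ground: the only judgment calls left to the tester are the observations themselves, not how those observations translate into a grade.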
3. Components of the Digital Standard should be weighted according to the priority of the tests and indicators when actually measuring the impact of these practices on privacy and security.
Not all of the 35 tests included in the Digital Standard are equally important in evaluating how a product would perform in protecting a user’s privacy and security. At the present time they are not weighted or prioritized, even though some are clearly more important and crucial than others. For example, whether a user has the right to repair a product themselves is not weighted any differently than whether that product uses strong authentication practices. This lack of weighting or prioritization can misrepresent the quality of a device that, for example, passes all of the crucial tests but fails some of the more obscure ones.
Similarly, tests that include multiple parts do not prioritize individual indicators for evaluating whether a product has passed or failed. We attempted to reflect this through partial pass results, but this is a limitation of the Digital Standard more broadly. For example, an indicator measuring whether a company discloses the kind of encryption it implements is not as important as an indicator measuring whether encryption is actually used, but both are considered necessary to pass the encryption test. This kind of revision could also address the question of dependencies between indicators. For example, if a company does not produce a transparency report, then the product cannot be tested on any subsequent indicators that require reviewing a transparency report.
A hierarchy of importance in both tests and indicators could also be useful in signposting for testers and consumers of the Digital Standard which of its many indicators to focus on first: which ones are vital, and which are helpful but not critical.
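One way such a hierarchy could feed into an overall rating is a weighted average over test results. The sketch below is hypothetical: the weights, the numeric values assigned to each grade, and the decision to exclude Not Applicable tests (so a product is neither rewarded nor penalized for them) are all illustrative choices of ours, not part of the published Digital Standard.

```python
# Illustrative grade-to-value mapping; these numbers are our assumption.
GRADE_VALUES = {"Pass": 1.0, "Partial Pass": 0.5, "Fail": 0.0}


def weighted_score(results: dict, weights: dict) -> float:
    """Average the grades, weighted by each test's assigned importance.

    Not Applicable tests are skipped entirely so they neither raise
    nor lower the score. Tests without an explicit weight default to 1.
    """
    total = 0.0
    weight_sum = 0.0
    for test, grade in results.items():
        if grade == "Not Applicable":
            continue
        w = weights.get(test, 1.0)
        total += w * GRADE_VALUES[grade]
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0
```

Under this scheme, failing a heavily weighted test such as encryption would drag the score down far more than failing a lightly weighted one such as right to repair, which is exactly the prioritization the current unweighted standard cannot express.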
4. Tests and indicators need to be constantly updated due to the changing nature of digital security best practices. They also need to provide internal flexibility to adhere to best practices for specific products.
While conducting our own testing, we noted a number of indicators for which the conditions for a product’s pass grade will have to change over time due to advances in technologies or best practices. Technology is changing so quickly that what may be a best practice in 2020 could easily change in the future. For example, one indicator calls for services to require passwords that are at least eight characters long, a threshold set the last time the standard was updated. Best practices for password complexity are constantly changing, for instance by increasing the minimum required number of characters or by requiring various capitalizations, special characters, or numbers. The Digital Standard should try to build in some kind of future planning so that it stays flexible, either by explicitly sunsetting some indicators and requiring the Digital Standard contributors to decide whether to update the requirement, or by pointing to an external standard that shifts with evolving reality, such as those published by the National Institute of Standards and Technology (NIST).
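The external-standard approach can be sketched concretely: rather than hard-coding “eight characters” into the indicator itself, a testing tool could read the threshold from a single, clearly labeled constant that tracks the referenced guidance (NIST SP 800-63B currently sets an eight-character minimum for user-chosen passwords). This is a minimal sketch under that assumption; the function name and structure are ours, not the standard's.

```python
# Threshold tracks external guidance (here, NIST SP 800-63B's
# eight-character minimum for user-chosen passwords). Updating the
# indicator means changing this one value, not rewriting the test.
MIN_PASSWORD_LENGTH = 8


def meets_length_indicator(enforced_minimum: int) -> bool:
    """Pass if the product's enforced password minimum meets current guidance."""
    return enforced_minimum >= MIN_PASSWORD_LENGTH
```

Isolating the threshold this way makes the indicator itself stable while letting its pass condition evolve alongside whatever external standard the Digital Standard chooses to reference.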
Relatedly, there were a few instances where we had trouble rating a product on an indicator when the product was actually too good and implemented best practices for privacy and security that were not yet included in the standard. For example, the baby monitor we tested failed a strict interpretation of the two-factor authentication test; however, the handset and device implemented other safety features that limited the potential for harm. While not second factors in the strict sense, the monitor achieved a similar security benefit by requiring both the handset and the device to operate on the same local secured Wi-Fi network in order to communicate. This gap exists partially because the standard is not regularly updated to include current best practices, and partially because best practices vary so much from product to product that it is hard to capture them in an evaluation system that is supposed to apply to a variety of product verticals. This problem could be addressed through an improved scoring system that, for example, allows final ratings to be adjusted with some kind of extra credit for specific practices. Alternatively, extra indicators could be added that allow for the recognition of supplemental best practices beyond the scope of the existing Standard criteria.
5. Tests that require analysis of a product’s legal and policy documents (specifically privacy and terms of service/use policies) need to be more specific about what best practices are required under the Digital Standard, perhaps differentiating by product category or capability.
We found that our analysis of legal documents relating to products that we tested was hampered by a lack of clarity in best practices in these areas (particularly as they pertain to smart devices). Products collect different types of information, or offer different features, that make it hard to create a scorable template for an ideal privacy policy. For example, a product that does not collect sensitive information is still required to include processes for law enforcement requests of different types and from different jurisdictions. A Not Applicable rating requires the tester to subjectively decide whether a specific indicator is important and relevant for a specific product. Alternatively, requiring every product to include all indicators in their privacy and terms of service policies could either penalize products for failing to provide unnecessary information, or make the policies much more confusing than they need to be by forcing the inclusion of irrelevant information.
In its technical tests the standard already differentiates the processes for testing by specific features of the product being tested. For example, the Known Exploit Resistance test provides different procedures for products that use browsers, apps, and connected devices. A product may use two or more of these, like the baby monitor we tested, which consists of a mobile app and a handset. Similar differentiation could be applied in the policy sections of the standard as well. These could be differentiated by the type of data the product collects, for example one that collects personally identifiable information versus one that does not, or some other method of identifying whether a product poses a higher privacy risk.
Deciding what the ideal policies for a specific product should look like requires expertise in that product vertical and in the relevant regulations, and requires the tester to make subjective decisions about what best practices should be expected from a given product. This makes testing inaccessible to non-experts and does not ensure that different products will be tested using a similar rubric. There are debates among experts about the virtues of simpler, more limited privacy and terms of use policies, which may be more accessible to a reader. Shorter privacy policies are also more likely to be reviewed by customers, compared with longer and more comprehensive policies that may address every possible best practice but might be discouraging based on length.
The standard needs to make its preferred best practices more explicit, potentially providing product- or capability-specific guidance. It also needs to clearly define terms like “understandable” to allow for replicable and consistent testing.
6. The Digital Standard needs to address the fact that companies’ legal documents are often convoluted or vague due to complicated product lines, partnerships, and supply chains.
We had challenges evaluating the legal commitments that companies expressed in their terms of service and privacy policies because of the sometimes complicated connection between the legal terms and the products being evaluated, and because of situations where multiple companies have different legal obligations all covering the same product. Companies that offer many different services and products, for example, may have documents that are not explicit about which products are covered by which policies, making it hard to be sure which ones apply to the product under evaluation. Similarly, it can be hard to distinguish between policies that are meant to apply to physical products and those that are only intended to cover a website. This mess of legal documents isn’t a flaw of the Digital Standard, of course, but some indicators would benefit from more clarity about what the standard expects from companies and how detailed the language it is looking for should be.
In other circumstances we observed that privacy concerns arose because a product being evaluated included features from another company. For example, if a product interfaces with a home assistant platform, that interface may have impacts on privacy that are not addressed in the manufacturer’s policies. It may also mean that policies apply to specific aspects or features of a product, but not others. The IoT is a connected web of products which, by design, interface with other products or services. The complexity of these relationships and connections means that best practices for evaluating a product’s legal documents need to provide more guidance to ensure that testers are consulting the correct material and accurately placing responsibility for the privacy and security of users’ data.
7. The Digital Standard needs to address the fact that, as written, some indicators cannot be accurately tested.
There are some indicators in the Digital Standard that can only be evaluated by observing the device being tested and making educated guesses as to how the product is behaving. It is nearly impossible to arrive at a confident statement as to whether a product passes or fails; the best reviewers can do is make observational assumptions. For example, one indicator regarding data use reads, “The company explicitly discloses every way in which it uses my data.” Finding the ways that the company states it will use the data is usually as easy as reading a privacy policy. However, an outside observer seeking to determine how the company actually uses the data can only make educated guesses by watching the product and looking for behavior that would demonstrate some data use not mentioned in the policy.