Table of Contents
- Definitions
- Introduction
- Terms of Service and Privacy Policy Documents
- Terms of Service and Privacy Policy Change Notification
- Process for Terms of Service Enforcement
- Transparency About Terms of Service Enforcement
- Identity Policy
- Security Oversight
- Third-Party Requests for User Data
- Data Control
- Data Collection
- Minimal Data Collection
- Data Use
- Data Retention and Deletion
- Threat Notification
- User Notification About Third-Party Requests for User Information
- Transparency Reporting
- Governance
- Open Source
- Interoperability
- Ownership
- Resale
- Functionality Over Time
- Privacy by Default
- Best Build Practices
- Authentication
- Encryption
- Known Exploit Resistance
- Vulnerability Disclosure Program
- Security Over Time
- Product Stability
- Personal Safety
- Open Innovation
- Business Model
- Repair Accessibility
- Repair Penalty
- Data Benefits
Product Stability
Criteria: The software is reliable.
Notes:
- This test focuses on fuzzing software (providing unexpected, random, and/or invalid data to a program). Fuzzing is a well-established security testing technique, and one of the most common means by which security vulnerabilities are discovered. Because of its highly technical nature and the per-case specificity of the process, a comprehensive treatment of fuzzing in general (or even of fuzzing Android apps specifically) is well beyond the scope of this methodology. Prior fuzzing experience or additional background reading is essential for running this test.
- For an overview of fuzzing, information on popular fuzzing tools, and a synopsis of recent research in the field, see this post.
- Rather than attempting to cover a broad, complex, and ever-evolving topic, this methodology identifies useful tools and documentation on fuzzing, and focuses on what types of fuzzing outcomes to look for.
- Every fuzzing run is specific to the software under test, and the documented lack of industry benchmarks makes it difficult to compare results across fuzzers. For these reasons, it is not yet practical to provide a PASS/FAIL framework for this test. Testers should evaluate all results of this procedure qualitatively, based on the specifics of the software being tested and background knowledge of how the software is supposed to operate.
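To make the core idea concrete, the following is a minimal sketch of mutation fuzzing: random byte changes applied to a seed input and fed to a target, with crashing inputs recorded. The `fragile_parser` target here is hypothetical, invented purely for illustration; real fuzzers such as AFL add coverage guidance, input minimization, and much more.

```python
import random

def fragile_parser(data: bytes) -> None:
    # Hypothetical target: "crashes" on any byte outside its expected range.
    if max(data) > 0xF0:
        raise ValueError("unexpected byte in input")

def mutate(seed: bytes) -> bytes:
    # Overwrite a few random positions in a copy of the seed.
    out = bytearray(seed)
    for _ in range(random.randint(1, 4)):
        out[random.randrange(len(out))] = random.randrange(256)
    return bytes(out)

def fuzz(seed: bytes, runs: int = 10_000) -> list:
    crashes = []
    for _ in range(runs):
        case = mutate(seed)
        try:
            fragile_parser(case)
        except Exception:
            crashes.append(case)  # record the input that triggered the crash
    return crashes

if __name__ == "__main__":
    random.seed(0)
    found = fuzz(b"A" * 64)
    print(len(found), "crashing inputs found")
```

The value of a real fuzzer lies in doing this loop intelligently at enormous scale, and in logging crashing inputs so that each crash can be reproduced and triaged.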
Indicators
- The software is not susceptible to crashes.
- If the program is forced to unexpectedly terminate, it shuts down in a safe and responsible fashion.
- The software is not vulnerable to algorithmic complexity attacks.
Methodology for Assessing Each Indicator
1) The software is not susceptible to crashes.
- Note:
- This methodology considers only Android apps used to interact with physical connected devices, or self-contained Android applications being tested against the Digital Standard. We made this choice because of the open nature of Android and the wide availability of free tools for inspecting Android app code. However, products may also use many other kinds of software that would be evaluated as part of Digital Standard testing. For example, many connected devices have apps tailored for both Apple iOS and Android, or may run software unique to that specific product that could also be fuzzed. We recommend that the tester research the toolsets and best practices for fuzzing in the coding languages and/or platform environments of that other software, and use that information to craft additional rounds of fuzzing tests for it.
- Note:
- There are many tools and resources for fuzzing Android software, as well as multiple approaches to fuzzing an Android app. This methodology points toward one approach, though others may also yield valid results for these tests.
- The Security Testing pages on the Android project page, particularly the documentation on libFuzzer, will cover the current state of Android app fuzzing.
- For a background in fuzzing on Android, see this paper or the accompanying lecture delivered at a security conference in 2015, which walks through several approaches. Though somewhat dated, it does provide a good look at the landscape of the topic.
- Obtain and configure a new version of the American Fuzzy Lop (AFL) fuzzing tool. The linked AFL page is a version of the main AFL project maintained by Google to include support for the Android platform.
- Using the documentation provided with AFL, run a variety of fuzzing tests on the app. These will require generating, or finding examples of, preseed files for the fuzzer. The AFL documentation covers how to generate appropriate seed files for your fuzzer.
- Run the tests long enough to uncover crash scenarios. At minimum, a fuzz test should be run a few thousand times; leaving a test running longer, and testing with more target devices, will yield better and more reliable results. More test cycles better isolate the causes of crashes, and increase the likelihood that a crash can be reliably reproduced and that the input conditions that trigger it are logged by the fuzzing tool. The goal is to complete a high number of tests, but there is no way to predict how long each test will take: every variable, from the size of the codebase being tested to the hardware the test runs on, affects the run time.
- Determine whether crashes occur often enough that AFL can reproduce and group them over enough runs to feel confident they are not random.
- In order to see the code coverage information (what part of the library crashes occur in), you will need to obtain and configure afl-cov, which is a companion to the AFL program that takes AFL output and creates code coverage information. According to the project page, “code coverage is interpreted from one case to the next by afl-cov in order to determine which new functions and lines are hit by AFL with each new test case.”
- Determine whether enough of the software was covered during fuzz testing to find errors hidden deep in the code.
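The coverage idea behind afl-cov, deciding which test cases reach previously unvisited code, can be sketched in miniature with Python's `sys.settrace`. The `target` function below is hypothetical; the point is the filter at the end, which keeps only inputs that hit lines no earlier case reached.

```python
import sys

def target(data: bytes) -> str:
    # Hypothetical target with branches at different depths.
    if not data:
        return "empty"
    if data[0] == 0x7B:          # b"{"
        if data[-1] == 0x7D:     # b"}"
            return "object"
        return "unterminated"
    return "scalar"

def lines_hit(func, data: bytes) -> frozenset:
    """Run func(data) and record which of its source lines executed."""
    hit = set()
    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is func.__code__:
            hit.add(frame.f_lineno)
        return tracer
    sys.settrace(tracer)
    try:
        func(data)
    finally:
        sys.settrace(None)
    return frozenset(hit)

# A test case is "interesting" if it reaches lines no earlier case reached.
seen = set()
corpus = []
for case in [b"", b"{}", b"{", b"x", b"y"]:
    hit = lines_hit(target, case)
    if hit - seen:
        corpus.append(case)   # keep cases that add new coverage
        seen |= hit
print(len(corpus), "coverage-increasing cases")
```

Here `b"y"` is discarded because it exercises exactly the lines already covered by `b"x"`; this is the same judgment afl-cov helps a tester make about AFL's output at much larger scale.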
2) If the program is forced to unexpectedly terminate, it shuts down in a safe and responsible fashion.
- Note:
- This methodology only considers the Android app that is used to interact with the product. We made this choice due to the open nature of Android, and the wide availability of free tools for inspecting Android app code.
- Note:
- There are many tools and resources for fuzzing Android, as well as approaches to go about fuzzing an Android app. This methodology points toward one approach, though others may also yield valid results for these tests.
- The Security Testing pages on the Android project page, particularly the documentation on libFuzzer, will cover the current state of Android app fuzzing.
- For a background in fuzzing on Android, see this paper or the accompanying lecture delivered at a security conference in 2015, which walks through several approaches.
- Typically a fuzz test will either crash (cause the program to quit or exit) or hang (cause the program to become unresponsive) a piece of software. This indicator investigates what state a system is left in when software is crashed, hung, or unexpectedly terminated.
- Obtain and configure a new version of the American Fuzzy Lop (AFL) fuzzing tool. The linked AFL page is a version of the main AFL project maintained by Google to include support for the Android platform.
- Using the documentation provided with AFL, run a variety of fuzzing tests on the app. These will require generating, or finding examples of, preseed files for the fuzzer. The AFL documentation covers how to generate appropriate seed files for your fuzzer.
- Run the afl-fuzz tool with the “-C” option, which enables “crash exploration” mode. The AFL documentation describes in greater detail how to triage crash information to determine the types of access an attacker could gain by crashing the app.
- Determine whether the app crashes in an unsafe way: leaving code paths open, memory unreleased, or other potential attack surfaces for which an exploit could be written, serving as the first step of an attack on the device or exposing private information from outside the app.
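What "shutting down in a safe and responsible fashion" means can be illustrated with a small sketch. The `SessionStore` class and its token file are hypothetical stand-ins for any sensitive resource; the point is that cleanup runs on the crash path as well as the normal path, so a forced termination does not leave sensitive data behind.

```python
import os
import tempfile

class SessionStore:
    """Hypothetical resource holding sensitive data in a temp file."""
    created_paths = []  # record paths so cleanup can be checked afterward

    def __init__(self, token: str):
        fd, self.path = tempfile.mkstemp()
        SessionStore.created_paths.append(self.path)
        with os.fdopen(fd, "w") as f:
            f.write(token)

    def close(self) -> None:
        # Safe shutdown: delete the sensitive file on every exit path.
        if os.path.exists(self.path):
            os.remove(self.path)

def process(data: bytes) -> None:
    store = SessionStore("secret-token")
    try:
        if b"\x00" in data:                # simulated crash condition
            raise ValueError("malformed input")
    finally:
        store.close()                       # runs on normal and crash paths alike

process(b"ok")
try:
    process(b"bad\x00input")                # crashes, but still cleans up
except ValueError:
    pass
print(all(not os.path.exists(p) for p in SessionStore.created_paths))
```

An app that skips this kind of cleanup on its crash paths is exactly what this indicator is designed to catch: the crash itself may be harmless, but the state left behind is not.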
3) The software is not vulnerable to algorithmic complexity attacks.
- Note:
- An algorithm is generally designed with its own average use case in mind. Algorithmic complexity attacks take advantage of “best-case” assumptions and trigger the algorithm’s “worst-case” behavior in order to exhaust system resources. What constitutes the worst case varies with the algorithm’s purpose, but as a simple example, assume a device checks for weather updates and has a setting for how often to check, but does not limit that frequency. Where the designer assumed no one would need to check more often than every minute, the complexity attacker checks ten times a second, crashing the device. How to detect and guard against these types of attacks through fuzzing is an area of active interest. For an overview of the issue, see this presentation.
- Note:
- Fuzzing for algorithmic complexity in Android apps has traditionally required deep, specific knowledge of the libraries used in the app in order to generate custom preseed files. But recent work shows great promise in automating that process for all Java libraries, which means that, going forward, less algorithm-specific knowledge will be required to create high-quality test preseed data.
- Since the method described in the HotFuzz paper is so new, the tooling suggested has not yet matured to the point where good documentation has been written. Automated test generation will likely lead to improved testing.
- Obtain and configure a new version of the American Fuzzy Lop (AFL) fuzzing tool. The linked AFL page is a version of the main AFL project maintained by Google to include support for the Android platform.
- Using background knowledge about the app and its libraries, craft preseed files that could trigger a complexity attack (for example, providing maximum-size inputs in every field), and use them as part of an AFL test run. Relevant background knowledge includes the coding language the app is written in, what inputs it accepts, and how it is “supposed” to function when working as intended; gathering this knowledge will likely require additional product-specific research.
- Determine whether the app is susceptible to an algorithmic complexity attack, based on fuzzing results.
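The gap between average-case and worst-case behavior that a complexity seed exploits can be shown with a toy example (the `dedupe` target is hypothetical, not drawn from any real app). A nested-scan deduplication routine is cheap on typical, repetitive input, but an adversarial all-distinct seed forces its full quadratic cost; counting comparisons makes the blowup visible without relying on wall-clock timing.

```python
def dedupe(items):
    """Toy target: removes duplicates with a nested scan, O(n^2) worst case."""
    out = []
    comparisons = 0
    for item in items:
        for seen in out:          # linear scan of the output per item
            comparisons += 1
            if seen == item:
                break
        else:
            out.append(item)      # no match found: keep the item
    return out, comparisons

def complexity_seed(n: int) -> list:
    # Adversarial preseed: all-distinct values force the full nested scan.
    return list(range(n))

typical, c_typical = dedupe([1, 2, 3] * 333)      # many repeats: cheap
_, c_worst = dedupe(complexity_seed(999))         # all distinct: expensive
print("worst/typical comparison ratio:", c_worst // max(c_typical, 1))
```

A fuzzer armed with seeds shaped like `complexity_seed` (maximum-size, worst-case-structured inputs) is far more likely to surface this kind of resource exhaustion than one fed only average-case data.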