Local Data
Currently, the model requires globally available indicators. The model cannot train (that is, estimate its parameters from historical data) properly where any input values are missing, and those inputs must be consistent and comparable across locations in order to derive meaningful, generally applicable patterns. Unfortunately, this excludes an enormous corpus of data collected at the national, regional, or even municipal level.
A model can use more circumscribed datasets, though, if its geographic scope is similarly limited. As long as each training sample has all relevant inputs available (and these inputs are also available everywhere the model is to be applied), the machine learning techniques will work normally. So a countrywide model can draw on global datasets as well as figures and statistics collected by various national bureaus; it just cannot be applied outside that country. This opens up a whole new universe of data.
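The availability requirement above amounts to a simple filter: an indicator is usable for a given geographic scope only if no sample in that scope lacks it. The following minimal Python sketch illustrates this under stated assumptions. The record layout, the `country` field, the function name `usable_indicators`, and the example values are all hypothetical and made up for illustration; they are not drawn from the tool's data.

```python
from typing import Iterable, List, Optional


def usable_indicators(records: Iterable[dict], candidates: List[str],
                      country: Optional[str] = None) -> List[str]:
    """Return the candidate indicators that have no missing values anywhere
    in the model's geographic scope.

    `country=None` stands for a global model; otherwise only records whose
    (hypothetical) 'country' field matches are considered. Missing values
    are assumed to be encoded as None.
    """
    in_scope = [r for r in records
                if country is None or r.get("country") == country]
    if not in_scope:
        return []
    # An indicator qualifies only if every in-scope sample has a value for it,
    # since the model cannot train on (or be applied to) incomplete inputs.
    return [name for name in candidates
            if all(r.get(name) is not None for r in in_scope)]


# Made-up example records: district C (in country Y) lacks borehole data.
records = [
    {"country": "X", "district": "A", "rainfall_anomaly": -0.4, "borehole_count": 12},
    {"country": "X", "district": "B", "rainfall_anomaly": 0.1, "borehole_count": 3},
    {"country": "Y", "district": "C", "rainfall_anomaly": -0.2, "borehole_count": None},
]
candidates = ["rainfall_anomaly", "borehole_count"]
print(usable_indicators(records, candidates))               # ['rainfall_anomaly']
print(usable_indicators(records, candidates, country="X"))  # ['rainfall_anomaly', 'borehole_count']
```

Narrowing the scope to country X makes the borehole indicator usable, but the resulting model could only be applied inside X.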
It's not difficult to imagine datasets that would allow the model, or a user of the tool, to distinguish between the water situations in Jigjiga, Gilo, and Araarso. National and regional datasets on boreholes exist, for instance, showing location, status, and sometimes even water volume and quality. Roughly analogous datasets might exist for birkads, and reservoir data are already available on the tool. One official in Jigjiga told us that 11 people across the district are documenting weather conditions and reporting them regularly via mobile phone. Combining these on-the-ground observations with technical means, such as high-resolution groundwater assessments, could provide a much clearer view of the water situation than is generally available now.
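As a rough illustration of how point-level records such as the borehole datasets mentioned above might be turned into district-level model inputs, the sketch below counts boreholes per district, the share reported as functional, and the total of whatever water volumes were reported. The field names (`district`, `status`, `volume_m3`), the status label `"functional"`, and the function `borehole_features` are hypothetical; real national or regional datasets will have their own schemas.

```python
from collections import defaultdict
from typing import Dict, List


def borehole_features(boreholes: List[dict]) -> Dict[str, dict]:
    """Summarize point-level borehole records into per-district features.

    Each record is assumed to have a 'district', a 'status' string, and an
    optional 'volume_m3' that may be missing or None. These field names are
    hypothetical.
    """
    totals: Dict[str, dict] = defaultdict(lambda: {
        "count": 0, "functional": 0, "volume_reports": 0, "reported_volume_m3": 0.0,
    })
    for b in boreholes:
        d = totals[b["district"]]
        d["count"] += 1
        if b.get("status") == "functional":
            d["functional"] += 1
        volume = b.get("volume_m3")
        if volume is not None:
            # Only sum volumes that were actually reported.
            d["volume_reports"] += 1
            d["reported_volume_m3"] += volume
    for d in totals.values():
        d["share_functional"] = d["functional"] / d["count"]
    return dict(totals)


# Made-up example records for two hypothetical districts.
features = borehole_features([
    {"district": "A", "status": "functional", "volume_m3": 10.0},
    {"district": "A", "status": "non-functional", "volume_m3": None},
    {"district": "B", "status": "functional"},
])
print(features["A"]["share_functional"])  # 0.5
```

Tracking `volume_reports` separately keeps a district with no volume data distinguishable from one that reported zero volume.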
Such untapped data resources abound. Unlocking and accessing them may be no small task, though (it was unclear, for example, how or where those district weather reports were being compiled and collated), but the information exists. Because the methodology[1] and the underlying data[2] for our model are both freely accessible via the tool, users can fuse the global data table with their own localized datasets to create a specialized model that may be better able to grasp local dynamics.
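The fusion step described above could look roughly like the following minimal sketch. It assumes both tables are lists of records keyed by hypothetical `district` and `period` fields; the field names, the function `fuse_tables`, and the inner-join choice are illustrative assumptions, not the tool's actual data format or download schema.

```python
from typing import List, Sequence


def fuse_tables(global_rows: List[dict], local_rows: List[dict],
                key_fields: Sequence[str] = ("district", "period")) -> List[dict]:
    """Inner-join local indicators onto the global data table.

    Global rows with no matching local row, or whose fused row has any
    missing (None) value, are dropped, because the model cannot train on
    samples with absent inputs. Key field names are hypothetical.
    """
    local_index = {tuple(r[k] for k in key_fields): r for r in local_rows}
    fused = []
    for g in global_rows:
        local = local_index.get(tuple(g[k] for k in key_fields))
        if local is None:
            continue  # no local coverage for this district-period
        merged = {**g, **{k: v for k, v in local.items() if k not in key_fields}}
        if any(v is None for v in merged.values()):
            continue  # incomplete sample
        fused.append(merged)
    return fused


# Made-up example tables.
global_rows = [
    {"district": "A", "period": "2020", "rainfall_anomaly": -0.4},
    {"district": "B", "period": "2020", "rainfall_anomaly": 0.1},
    {"district": "C", "period": "2020", "rainfall_anomaly": 0.3},
]
local_rows = [
    {"district": "A", "period": "2020", "borehole_count": 12},
    {"district": "B", "period": "2020", "borehole_count": None},
]
training_rows = fuse_tables(global_rows, local_rows)
print(training_rows)
# [{'district': 'A', 'period': '2020', 'rainfall_anomaly': -0.4, 'borehole_count': 12}]
```

The inner join mirrors the constraint that every training sample needs all inputs: district B is dropped for a missing value and district C for lacking local coverage, so a model trained on the result would apply only where the local dataset is complete.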
But, as always, there are trade-offs. A model relying on local data, as indicated, can be applied only locally. The smaller the area of interest, the more datasets will cover it, but at the cost of narrowing the model's scope of application and making it harder to trust, since it is difficult to test elsewhere. Furthermore, geographic narrowing may in some ways limit the model's ability to evaluate all relevant factors, even as that ability expands in other ways due to the larger pool of available indicators. If the modeled area becomes smaller than the space across which relevant conflict dynamics play out (and neighboring districts or countries often strongly affect internal affairs), the model may be blind to important external factors.
Then there is the political dimension. The model, in essence, tries to connect local conditions with conflict outcomes. When those conditions are rainfall anomalies or vegetative activity, they are uncontentious. Even macroeconomic indicators reflect so many factors, decisions, and actions as to resemble intractable natural forces. But as the model zooms in and captures more aspects at an increasingly local level, "conditions" become increasingly specific things that may represent concrete choices made by particular parties or individuals. For example, one village might have plentiful boreholes while a neighboring village has none. Or two places with apparently similar conditions could experience dissimilar outcomes. The difference may very well be a matter of governance or resource management. But even if this is the crucial differentiator, transforming it into data requires subjective assessments of political activity, a highly fraught exercise, both technically and politically. For that matter, as noted earlier, the previous governor of Somali region has been accused of gross misconduct, including human rights violations; detailed data on local conditions could conceivably be weaponized by such a politician.
Citations
- [link to technical note and/or analytic scripts for methodology]
- [link to data download page on tool]