Table of Contents
- Executive Summary
- Introduction
- Current State of Knowledge
- Exploring the Intersection of OSINT and Data Privacy in the Digital World
- Methodology and Results
- Analysis and Assessment of the Impacts of OSINT on Data Privacy
- Explaining and Developing the OSINT Privacy Impact Framework (OPIF)
- Conclusion
- Appendix 1 | Survey Findings
- Appendix 2 | Reflections from Research Webinar Focus Groups Discussion
- Appendix 3 | Interview Protocol for Semi-Structured Interviews
Methodology and Results
This research employed a multifaceted methodology to capture a broad spectrum of insights from experts, practitioners, and enthusiasts in the field, and to ensure a comprehensive understanding of the current state and future prospects of OSINT and AI integration. The methodology is structured around two primary data collection phases.
Phase one included a research survey targeted at a wide range of professionals. Recognizing that the complex and specialized nature of AI applications in OSINT requires in-depth, contextually relevant insights, the study purposefully sampled individuals with substantial professional experience and specialized knowledge in AI, privacy, cybersecurity, and OSINT.
Through purposeful sampling of experts in the field, the study prioritizes the quality and relevance of data over quantity, thus enhancing the study’s internal validity. Participants were identified based on their established credentials and active roles in AI, privacy, cybersecurity, and OSINT, ensuring that each contributor could provide informed perspectives. This method aligns with qualitative research best practices, where deep, nuanced understanding from domain-specific professionals can yield findings that are both insightful and directly applicable to the study’s focus. The survey was designed to capture detailed information under three themes: current practices, challenges, and the perceived future direction of these domains. The quantitative survey findings come from 51 expert respondents, and are represented in this report as rounded percentages.
In addition to the quantitative research survey, semi-structured interviews and a research webinar were conducted to gather qualitative insights. The semi-structured interviews were conducted with selected survey respondents to provide a more nuanced understanding of individual experiences and expert opinions. The research webinar, featuring industry practitioners, served as a platform for discussing recent advancements, challenges, and ethical considerations in the field, further enriching the research data with live exchanges and viewpoints.
Phase two comprised a thorough assessment of sample tools commonly used to perform OSINT. The assessment helped identify key features, effectiveness, and user sentiments, contributing an additional layer of practical insights to the research findings. Together, the two phases created a strong basis for understanding the evolving dynamics of AI in OSINT as it relates to individuals’ privacy, facilitating a comprehensive analysis of both the current state and future directions in the field. This multi-pronged approach ensured that the study captured a wide array of perspectives, making the findings relevant to both practitioners in the field and policymakers interested in the ethical and practical implications of AI in intelligence gathering.
Phase One: Empirical Research Data Review
This section presents an overview of the demographic characteristics of the 51 purposively sampled expert respondents who participated in the survey, a key component of the study. These participants brought diverse perspectives based on their expertise and experience. The survey captured various primary areas of experience, as shown in Figure A1 in Appendix 1. The professional work of the expert sample spans multiple fields, so respondents could report more than one area; this overlap accounts for the percentages below totaling more than 100 percent. A significant 74 percent of respondents identified general cybersecurity practices as their main area of focus, followed by 56 percent who emphasized privacy concerns. Half of the participants (50 percent) reported expertise in cyber forensics and intelligence investigations, while 46 percent reported specialization in OSINT. Additionally, 42 percent of respondents worked in AI. The survey also revealed that 22 percent of participants were involved in legal and ethical activities, highlighting an interest in compliance and risk management. Furthermore, 28 percent indicated involvement in other related fields, reflecting the interdisciplinary nature of the study.
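The overlap described above can be illustrated with a minimal sketch. The responses below are hypothetical toy data, not the survey's actual answers, and the function name `area_percentages` is an assumption introduced for illustration. Each respondent may select several areas, so each percentage is computed against the number of respondents rather than the number of selections, and the rounded shares can sum to more than 100.

```python
from collections import Counter

# Hypothetical multi-select answers (NOT the study's data):
# each set holds the areas one respondent selected.
responses = [
    {"cybersecurity", "privacy"},
    {"cybersecurity", "OSINT"},
    {"privacy", "AI"},
    {"cybersecurity", "privacy", "OSINT"},
]


def area_percentages(responses):
    """Return the rounded share of respondents who selected each area."""
    # Count how many respondents picked each area.
    counts = Counter(area for selected in responses for area in selected)
    total = len(responses)
    # The denominator is the number of respondents, not of selections.
    return {area: round(100 * n / total) for area, n in counts.items()}


percentages = area_percentages(responses)
total_of_shares = sum(percentages.values())
```

In this toy sample, three of four respondents chose cybersecurity and three chose privacy, so the individual shares already add up to well over 100 percent even though every respondent is counted only once per area.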
The respondents were broadly distributed in terms of years of experience, reflecting a mix of emerging talent and seasoned experts. Specifically, 41 percent of participants had one to five years of experience, 22 percent fell within the six-to-nine-year range, and a notable 37 percent possessed over 10 years of experience, indicating a deep reservoir of knowledge contributing to the survey (see Figure A2 in Appendix 1). The participants held a variety of titles, ranging from founders, cyber lawyers, frontline analysts, and researchers to senior managers and policy advisors, each contributing insights to the study.
Survey Results
The survey results showed that OSINT practitioners had not fully recognized the significant interest among professionals in regulating AI-integrated OSINT. This reveals overlooked perspectives and highlights a growing concern within the field about the ethical and legal implications of these advanced technologies. Additionally, the survey uncovered an unexpected trend: a relatively heavy reliance on human intelligence (HUMINT) alongside AI tools. These findings underscore the evolving landscape of intelligence practices, where traditional methods continue to play a crucial role even as AI becomes more integrated. The survey examined three distinct themes: current practices of OSINT and AI integration, ethical and legal concerns in AI-integrated OSINT, and OSINT privacy-preserving frameworks in the age of AI.
The current practices surrounding OSINT and AI integration reflect a wide engagement across various types of OSINT, with professionals utilizing sources such as the web, social media, and technical data. Despite this engagement, there remains a significant lack of awareness concerning AI-integrated OSINT tools, as 73 percent of respondents were unfamiliar with them. However, for those aware, social media platforms and public databases were the primary sources utilized. The perceived accuracy of AI-integrated OSINT is moderate, with 49 percent of respondents rating it as somewhat accurate, indicating room for improvement in the reliability of AI tools in this domain (see Figure A5).
Ethical and legal concerns are prominent in the use of AI for OSINT, particularly regarding privacy and bias. Nearly one-third of respondents expressed high concern over privacy implications, while most indicated moderate concern about biases in AI systems. A significant majority, 69 percent, support regulation to safeguard privacy and enforce ethical guidelines (see Figure A8). Transparency in the use of AI-integrated OSINT tools is also a major issue, with 63 percent of respondents advocating for full transparency to maintain ethical standards (see Figure A10). These findings underscore the need for a balanced approach that integrates both regulation and ethical considerations.
There is a clear gap in the awareness of privacy-preserving frameworks for AI-integrated OSINT, as 100 percent of respondents were unaware of such frameworks. This highlights an opportunity for further development in this area.
Confidence in the effectiveness of privacy-preserving frameworks is varied, with only a small percentage expressing high confidence. Additionally, respondents showed overwhelming support for the development of privacy-first frameworks, with 86 percent in favor (see Figure A16). The consensus also extends to the belief in the need for international agreements to regulate the use of AI in OSINT, with 88 percent agreeing on this necessity, reflecting global concern over the ethical use of AI in intelligence gathering (see Figure A17). The full survey and results can be found in Appendix 1.
Research Interviews
To supplement the survey, a mixed-methods approach incorporating both quantitative and qualitative data was employed. This provided further expert perspectives on the responsible integration of AI within open-source intelligence (OSINT).
In this track of research, seven participants with professional and scholarly backgrounds in AI ethics, privacy, and OSINT applications were selected to offer diverse insights into ethical and privacy considerations surrounding AI in OSINT. Five participants engaged in a hybrid survey-interview format, beginning with a structured quantitative survey followed by open-ended qualitative interview questions to deepen the data collected. The other two took part in more in-depth semi-structured interviews, which allowed for follow-up questions, a richer exploration of the complex ethical issues, and a more nuanced understanding of each expert’s views.
All qualitative responses were coded thematically to identify recurring themes, challenges, and best practices recommended for AI-integrated OSINT. A summary of key findings from both types of interviews is presented in Table A1 referenced in Appendix 2.
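As a rough illustration of how thematically coded responses can be tallied into recurring themes, consider the sketch below. It assumes the coding itself has already been done by a human researcher; the participant labels, theme names, and the function `theme_frequencies` are all hypothetical and are not taken from the study's codebook.

```python
from collections import Counter

# Hypothetical coded excerpts (NOT the study's data):
# (participant id, themes a human coder assigned to one excerpt).
coded_excerpts = [
    ("P1", ["transparency", "regulation"]),
    ("P2", ["bias"]),
    ("P1", ["transparency"]),
    ("P3", ["regulation", "bias"]),
]


def theme_frequencies(coded_excerpts):
    """Count how many distinct participants raised each theme."""
    participants_by_theme = {}
    for participant, themes in coded_excerpts:
        for theme in themes:
            # A set ensures a participant is counted once per theme,
            # even if they raise it in several excerpts.
            participants_by_theme.setdefault(theme, set()).add(participant)
    return Counter({t: len(p) for t, p in participants_by_theme.items()})


frequencies = theme_frequencies(coded_excerpts)
```

Counting distinct participants rather than raw mentions is one possible design choice: it prevents a single talkative interviewee from making a theme look more widespread than it is.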
Research Webinars
A research webinar roundtable was conducted, structured around four focus groups to gather diverse insights. Each focus group comprised a carefully selected mix of participants, ensuring a balanced representation of industry experience. The findings in Table A2 in Appendix 2 summarize the focus group discussions, offering an analysis of expert opinions and highlighting key trends and concerns aligned with the focus group themes.
Phase Two: Tools Assessment
In this section, the research examined various aspects of data management in OSINT tools, and how the tools ensure privacy and security compliance. The evaluation focused on data collection practices, retention requirements, and robust encryption measures, and on the capabilities and limitations concerning data management for three critical OSINT tools: Shodan, Maltego, and SpiderFoot. Table A3 in Appendix 2 illustrates a detailed breakdown of the tools assessed as part of the research report. The tools assessment forms an integral part of step one in the OPIF.
Shodan, a search engine for internet-connected devices, retrieves information from banners about various devices, such as routers and servers, by leveraging their publicly available IP addresses. Although Shodan does not collect personal data, it exposes device vulnerabilities and can be misused. The evaluation identified significant data exposures, such as MongoDB databases with millions of credentials. Shodan does not seem to have comprehensive data retention policies, beyond using cookies to store user preferences. While the tool employs encryption for data in transit, it lacks transparency reports or clear guidelines on third-party integrations, posing potential compliance risks.
Maltego, an advanced link analysis tool, specializes in gathering OSINT data and representing it through visual graphs for easy identification of patterns and relationships. It aggregates data from diverse online sources via APIs, web scraping, and user-contributed transforms. Maltego claims compliance with GDPR; it does not store data for investigations, relying instead on trusted third-party sources to ensure data relevance. The tool excels in integrating third-party data but presents potential privacy concerns due to the volume of sensitive information handled during its investigations. Though its encryption mechanisms are robust, Maltego’s reliance on external data providers raises questions about the security and privacy risks posed by third-party integrations.
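The idea behind link analysis can be sketched in a few lines. This is a generic illustration of the technique, not Maltego's API or internal implementation; the entities, the reserved example addresses, and the function names `build_graph` and `reachable` are hypothetical. Entities become graph nodes, observed relationships become edges, and a breadth-first traversal shows everything that can be tied to a starting entity.

```python
from collections import defaultdict, deque

# Hypothetical entity links (NOT real data and NOT Maltego output).
links = [
    ("example.com", "admin@example.com"),
    ("admin@example.com", "Jane Doe"),
    ("example.com", "203.0.113.7"),
    ("other.org", "198.51.100.2"),
]


def build_graph(links):
    """Build an undirected adjacency map from pairs of linked entities."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def reachable(graph, start):
    """Return every entity connected to `start`, found breadth-first."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


linked_to_domain = reachable(build_graph(links), "example.com")
```

Even in this toy graph, starting from a domain name leads to a named individual in two hops, which illustrates why aggregating otherwise separate public records raises the privacy concerns noted above.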
SpiderFoot automates OSINT for threat intelligence and reconnaissance by analyzing publicly available information and directly querying target systems. It uses a localized web server for real-time scans, minimizing data retention risks because collected data is typically stored temporarily in memory. SpiderFoot supports secure data transmission through encryption and offers strong authentication mechanisms to control access. While the tool adheres to GDPR standards, the possibility that users store data externally introduces retention risks. Nevertheless, SpiderFoot’s focus on automation and customization allows it to provide comprehensive intelligence for identifying vulnerabilities, making it valuable for both offensive and defensive security operations.