Responsible usage of generative AI with unstructured data

15 Sep 2025

Explore six security considerations for using generative AI responsibly with unstructured data.

How generative AI (GenAI) mines data and adds context to it plays an important role as organizations work to manage critical security risks surrounding data quality, accuracy, integrity, and credibility, collectively known as data veracity. These risks extend to personal data privacy, intellectual property, corporate integrity, and public liability.

Generative models have become increasingly popular in artificial intelligence because of their ability to create content, including text, images, video, and music. As businesses begin to use GenAI to unlock creativity, innovation, and productivity, it is crucial to prioritize security, particularly when it comes to unstructured data. By recognizing and understanding the risks, business leaders can make informed decisions and achieve better outcomes.

By 2026, enterprises that directly tie GenAI to document processing will see a 20% increase in new use cases, leading to increased productivity, scale, and improved customer experience. 

— IDC FutureScape: Worldwide Future of Work 2024 Predictions, doc #US49963723, October 2023 

Common use cases for generative AI and data 

In content creation, GenAI has evolved to combine search capabilities with context, sifting through large volumes of data to produce intelligently written answers. GenAI platforms are built on sophisticated algorithms that learn to yield results quickly and produce answers or new content that mimics human creativity. The results can improve employee productivity, operational efficiency, compliance, and more, provided that data veracity is intact and security protocols, such as data analysis and classification, are in place.

While practical applications of GenAI are still at an early stage, we are seeing an uptick in these use cases:

  • Inbound and outbound information management 
  • Content creation and personalization in marketing, social media and email 
  • Document digitization, classification, and separation
  • Data augmentation and synthesis
  • Conversational AI and chatbots
  • Extractive and abstractive summarization 

However, as we look more closely at the practical applications of these tools, especially for unstructured data, a critical issue looms large: safeguarding data integrity and security in the face of GenAI. Information governance can play a significant role in supporting data integrity and security.

Generative AI and unstructured data 

Unstructured data, which makes up most of the world's data, does not conform to a specific data model, which makes it difficult to analyze and process with traditional methods. This is where generative AI shines: it can comb through and interpret unstructured data sources to generate new insights, narratives, and content. This capability has profound implications across industries, from enhancing customer engagement and optimizing workflows to fueling AI-driven decision-making.

However, harnessing the power of generative AI and data introduces new challenges, particularly around data privacy, compliance, ethical content creation, and the bias and fairness issues inherent in these systems. Compounding this, massive amounts of unstructured data are often unsecured, contributing significantly to security breaches, privacy violations, high IT costs, and compliance penalties. Unsecured unstructured data is particularly vulnerable to cyberattacks because bad actors can use it to target and gain access to other applications, systems, and devices.

Overall, unstructured data threatens the fundamentals of data veracity when used with GenAI. While GenAI can access and use unstructured data more effectively than other AI models, the risk is that it learns from whatever data it uses. If that data is lacking in quality, accuracy, integrity, or credibility, the resulting output and context will be incorrect or misleading. Business leaders should therefore understand the state of their existing data. “Clean” data is more critical than ever as GenAI becomes mainstream.

This brings us to two specific unstructured data challenges: dark data and ROT data. Gartner® defines dark data as “the information assets organizations collect, process and store during regular business activities but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing). Dark data often comprises most organizations’ universe of information assets. Thus, organizations often retain dark data for compliance purposes only. Storing and securing this data type typically incurs more expense (and sometimes greater risk) than value.”

Then there is redundant, obsolete, and trivial (ROT) data within business data stores. As the name implies, this data offers no value to the organization, yet it often makes up most of the unstructured data. ROT data takes up valuable storage, impedes effective search, and exposes the organization to security issues, particularly around personal privacy.

With file analysis, intelligent document processing (IDP), and enterprise content management (ECM) applications, dark data and ROT data can be identified, classified, and, where necessary, eliminated. 
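As a rough illustration of how file analysis might flag ROT data, the following Python sketch walks a directory tree and applies three simple heuristics: byte-identical files (found by SHA-256 hash) are treated as redundant, files not modified within a cutoff are treated as obsolete, and empty or temporary files are treated as trivial. The cutoff (`STALE_AFTER_DAYS`), the suffix list, and the names `scan_for_rot` and `RotReport` are hypothetical choices made for this example. Commercial file analysis and IDP tools use far richer classification, and nothing flagged here should be deleted without review against retention and legal-hold requirements.

```python
import hashlib
import os
import time
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical policy values for this sketch; a real retention schedule should set them.
STALE_AFTER_DAYS = 3 * 365
TRIVIAL_SUFFIXES = {".tmp", ".bak", ".log"}


@dataclass
class RotReport:
    redundant: list = field(default_factory=list)  # groups of byte-identical files
    obsolete: list = field(default_factory=list)   # files not modified within the cutoff
    trivial: list = field(default_factory=list)    # empty or temporary files


def sha256_of(path, chunk_size=65536):
    """Hash file contents in chunks so large files are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_for_rot(root, now=None):
    """Walk `root` and flag candidate ROT files for human review (nothing is deleted)."""
    now = time.time() if now is None else now
    cutoff_seconds = STALE_AFTER_DAYS * 86400
    report = RotReport()
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            suffix = os.path.splitext(name)[1].lower()
            # Trivial: empty files or common temporary/backup extensions.
            if info.st_size == 0 or suffix in TRIVIAL_SUFFIXES:
                report.trivial.append(path)
                continue
            # Obsolete: last modified longer ago than the cutoff.
            if now - info.st_mtime > cutoff_seconds:
                report.obsolete.append(path)
            # Redundant: group by content hash; groups of two or more are exact duplicates.
            by_hash[sha256_of(path)].append(path)
    report.redundant = [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
    return report
```

Calling `scan_for_rot("/path/to/share")` returns a report whose lists are better suited to a human review queue than to an automatic deletion job, since hash matches and modification dates say nothing about business or legal value.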

Six security considerations for using generative AI 

To be good, responsible corporate citizens when using generative AI and data, organizations must reinforce their information governance and security measures. Here are six ways organizations can strengthen data veracity to create better outcomes:

  1. Data privacy

    Unstructured text, images, and multimedia data inherently carry immense potential and risk. On the positive side, this data is a wellspring of rich information for generative models to learn from and create with. However, its very nature as unstructured data also puts privacy and confidentiality at risk. Ensuring that sensitive information within documents, emails, and other data remains securely protected is a cornerstone of ethical AI usage.
  2. Compliance 

    For industries governed by strict regulations, such as healthcare or finance, the use of GenAI must maintain proper compliance standards to protect against data breaches and maintain the confidential nature of personal and proprietary information. Ensuring that GenAI applications are developed and managed in full compliance with data protection laws is mandatory. 
  3. Ethical content 

    The generation of content by AI raises a host of ethical considerations, from the potential for deepfakes and misinformation to issues of copyright infringement and cultural sensitivity within content creation. Business leaders must oversee the development, adoption and monitoring of GenAI applications to ensure they align with ethical guidelines in content generation. 
  4. Bias and fairness 

    Another critical area to address is the mitigation of bias within generative models. The quality of output from GenAI relies heavily on the diversity and quality of the input data — both structured and unstructured. If the training data is biased, the model may perpetuate or even exacerbate existing biases in its output. Data veracity, again, is essential to proper representation and accuracy. Businesses must employ rigorous bias detection and correction processes to ensure the generated content’s fairness. 
  5. Security measures 

    Protecting unstructured data accessed by GenAI requires a robust, multi-layered security approach. Encryption, access controls, and regular cybersecurity audits are essential to safeguard against attacks or breaches. Implementing secure information management and governance practices at the source ensures data integrity and minimizes the risk of exploitation by malicious actors.
  6. Transparency and accountability 

    For users and stakeholders to trust the authenticity and validity of content created by GenAI, transparency and accountability in its use are paramount. Clearly communicating the origins of AI-generated content through meta-labeling, and maintaining records of model training and applications, fosters trust and confidence in the content produced. Using citations and double-checking accuracy are also key to building trust.
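The data privacy consideration in this list often translates into scrubbing sensitive values from unstructured text before it reaches a GenAI model. The sketch below, assuming simple regular expressions are acceptable for a first pass, replaces email addresses, U.S. Social Security number patterns, and ten-digit phone numbers with typed placeholders and counts each replacement. The patterns and the `redact` function are illustrative assumptions, not a complete PII detector; they will miss many real-world formats.

```python
import re

# Illustrative patterns only (an assumption for this sketch); they miss many real-world formats.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def redact(text):
    """Replace each PII match with a typed placeholder and count replacements per type."""
    counts = {}
    for label, pattern in PII_PATTERNS.items():
        # subn returns the new string and the number of substitutions made.
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts
```

For example, `redact("Contact jane.doe@example.com or 555-123-4567. SSN 123-45-6789.")` returns `"Contact [EMAIL] or [PHONE]. SSN [SSN]."` together with a count of one match per category. Typed placeholders keep the sentence structure intact, so a model can still summarize or classify the document without seeing the raw values.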
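One way to put meta-labeling and record-keeping into practice is to attach a provenance record to each piece of AI-generated content. The Python sketch below uses a hypothetical schema, not a standard: it stores a SHA-256 hash of the output, a model identifier, a UTC timestamp, the IDs of cited source documents, and whether a person has reviewed the content. The `verify` helper shows how the hash can reveal whether content changed after it was labeled. Names such as `ProvenanceLabel`, `label_output`, and `human_reviewed` are assumptions for illustration; standards such as C2PA define richer provenance formats for production use.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceLabel:
    content_sha256: str                          # fingerprint of the generated text
    model: str                                   # hypothetical model identifier
    generated_at: str                            # ISO 8601 timestamp in UTC
    sources: list = field(default_factory=list)  # IDs of cited source documents
    ai_generated: bool = True
    human_reviewed: bool = False


def _fingerprint(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def label_output(text, model, sources, reviewed=False, when=None):
    """Build a provenance record for one piece of AI-generated content."""
    when = when or datetime.now(timezone.utc)
    return ProvenanceLabel(
        content_sha256=_fingerprint(text),
        model=model,
        generated_at=when.isoformat(),
        sources=list(sources),
        human_reviewed=reviewed,
    )


def verify(text, label):
    """Return True only if the text is unchanged since it was labeled."""
    return _fingerprint(text) == label.content_sha256
```

Serializing the record with `json.dumps(asdict(label))` produces a JSON document that can be stored alongside the content or in an audit log, which supports the citation and accuracy checks described above.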

Be vigilant about the data veracity of unstructured data 

GenAI’s success relies on protecting and ethically using unstructured data, so maintaining information governance and understanding your data are vital. Information governance practices help ensure that all data used with GenAI is reliable and accurate. Information governance also provides a strategic framework for managing information at an organizational level, meeting compliance demands, achieving ethical goals, and ensuring transparency and accountability in privacy and security.

Source: RICOH USA