What Counts As ‘Correct’ Data Is A Huge Generative AI ‘Dilemma:’ Expert

Kyle Alspach

While there’s a risk of tapping into unreliable data sources when using generative AI tools, it’s also unclear who should get to judge what is and isn’t reliable, says cybersecurity consultant Dasha Deckwerth.


Even as generative AI shakes up businesses both inside and outside the tech world, massive questions remain about the reliability of data sources used by tools such as ChatGPT—and about who gets to judge what counts as reliable data to begin with, cybersecurity expert Dasha Deckwerth said Tuesday.

Knowing the data sources used to train large language models is crucial for reducing the risk associated with leveraging generative AI, Deckwerth told an audience of MSP executives during XChange Security 2023, which is hosted by CRN parent The Channel Company and being held this week in Dallas.


“Whatever tool is out there, make sure you understand who the vendor is, where the data is coming from—because one piece of wrong information can be a big loss for our clients or for our businesses,” said Deckwerth, founder and president of Stealth-ISS Group, a Tampa, Fla.-based cybersecurity consultancy.

At the same time, there’s no centralized determination being made about what counts as reliable or unreliable data for the purpose of training generative AI models. There’s no guarantee, Deckwerth noted, “that we actually get the correct data” with generative AI tools, which have exploded in popularity since the public availability of OpenAI’s ChatGPT app in late 2022.

The question of what data counts as “correct,” and who gets to make that kind of judgment, is a further complication.

In response to an audience member’s question about who should be the gatekeeper for data reliability—particularly in an open and free society—Deckwerth answered that this is clearly a “dilemma.”

With major tech industry players, for instance, it’s not clear they have the “right standards” for making these types of determinations, she said.

The National Institute of Standards and Technology (NIST) is looking at the issue, and a significant amount of research is underway on the question of “who can control it,” Deckwerth added. But at this stage, the best way to determine data reliability for generative AI remains an open question, she said.

AI Decision-Making

For cybersecurity tools that leverage generative AI, the question of data reliability is a serious one, particularly as more decision-making is handed over to the AI engines, said Ray Ribble, CEO of Gardena, Calif.-based SPHER.

“If we let AI make all the decisions, it’s going to make bad decisions,” Ribble said. The possibility that this may happen in the future “scares the [heck] out of me,” he said.

And that’s not only due to the potential for unreliable data sources, but also because AI technology is generally poor at anticipating things it has never seen before, Ribble said.

“It’s that unexpected element that AI can’t anticipate—the anomaly,” he said. “And every time there’s a new [type of] instance, it has to learn it—at your cost.”

During her presentation at XChange Security, Deckwerth said this is in fact one of the many new risks associated with AI-based detection—showing a case where AI misidentified a photo of a panda as a car. The cause, she said, was the addition of some text embedded in the photo, which isn’t visible to humans “but makes a huge difference for AI.”

On the whole, generative AI is benefiting both cyber defenders, by making analysts more productive, for example, and attackers, who can use the technology to craft more effective phishing emails.

Generative AI is “changing how we’ve been doing security so far—quite significantly,” Deckwerth said. “And it’s going to change a lot more.”


Kyle Alspach is a Senior Editor at CRN focused on cybersecurity. His coverage spans news, analysis and deep dives on the cybersecurity industry, with a focus on fast-growing segments such as cloud security, application security and identity security.
