When AI Gets It Wrong: Addressing AI Hallucinations and Bias

Home » AI Resource Hub » AI Basics » When AI Gets It Wrong: Addressing AI Hallucinations and Bias

SKIP AHEAD TO

At a Glance

Generative AI has the potential to transform higher education—but it’s not without its pitfalls. These technology tools can generate content that’s skewed or misleading (Generative AI Working Group, n.d.; Cano et al., 2023). They’ve been shown to produce images and text that perpetuate biases related to gender, race (Nicoletti & Bass, 2023), political affiliation (Heikkilä, 2023), and more. As generative AI becomes further ingrained into higher education, it’s important to be intentional about how we navigate its complexities.

Biased Content

Problems with bias in AI systems predate generative AI tools. For example, in the Gender Shades project, Buolamwini (2017) tested AI-based commercial gender classification systems and found significant disparities in accuracy across different genders and skin types. These systems performed better on male and lighter-skinned faces than others. The largest disparity was found in darker-skinned females, where error rates were notably high.

Generative AI tools present similar problems. For example, a 2023 analysis of more than 5,000 images created with the generative AI tool Stable Diffusion found that it simultaneously amplifies both gender and racial stereotypes (Nicoletti & Bass, 2023). These generative AI biases can have real-world consequences. For instance, adding biased generative AI to “virtual sketch artist” software used by police departments could “put already over-targeted populations at an even increased risk of harm ranging from physical injury to unlawful imprisonment” (Luccioni et al., 2023). There’s also a risk that the veneer of objectivity that comes with technology tools could make people less willing to acknowledge the problem of biased outputs (Nicoletti & Bass, 2023). These issues aren’t unique to image generators, either; researchers and users have found that text generators like ChatGPT may also produce harmful and biased content (Germain, 2023).

Inaccurate Content

Generative AI tools also carry the potential for otherwise misleading outputs. AI tools like ChatGPT, Copilot, and Gemini have been found to provide users with fabricated data that appears authentic. These inaccuracies are so common that they’ve earned their own moniker; we refer to them as “hallucinations” (Generative AI Working Group, n.d.).

For an example of how AI hallucinations can play out in the real world, consider the legal case of Mata v. Avianca. In this case, a New York attorney representing a client’s injury claim relied on ChatGPT to conduct his legal research. The federal judge overseeing the suit noted that the opinion contained internal citations and quotes that were nonexistent. Not only did the chatbot make them up, it even stipulated they were available in major legal databases (Weiser, 2023).

As we integrate AI into teaching and learning, it’s important to be wary of its limitations.

Why is AI Flawed?

Generative AI systems can produce inaccurate and biased content for several reasons:

Training Data Sources: Generative AI models are trained on vast amounts of internet data. This data, while rich in information, contains both accurate and inaccurate content, as well as societal and cultural biases. Since these models mimic patterns in their training data without discerning truth, they can reproduce any falsehoods or biases present in that data (Weise & Metz, 2023).
Limitations of Generative Models: Generative AI models function like advanced autocomplete tools: They’re designed to predict the next word or sequence based on observed patterns. Their goal is to generate plausible content, not to verify its truth. That means any accuracy in their outputs is often coincidental. As a result, they might produce content that sounds reasonable but is inaccurate (O’Brien, 2023).
Inherent Challenges in AI Design: The technology behind generative AI tools isn’t designed to differentiate between what’s true and what’s not true. Even if generative AI models were trained solely on accurate data, their generative nature would mean they could still produce new, potentially inaccurate content by combining patterns in unexpected ways (Weise & Metz, 2023).

In short, the “hallucinations” and biases in generative AI outputs result from the nature of their training data, the tools’ design focus on pattern-based content generation, and the inherent limitations of AI technology. Acknowledging and addressing these challenges will be essential as generative AI systems become more integrated into decision-making processes across various sectors.

Navigate AI’s Pitfalls

Consider these strategies to help mitigate generative AI tools’ issues with hallucination and bias.

Critically Evaluate AI Outputs: Unlike humans, AI systems do not have the ability to think or form beliefs. They operate algorithmically based on their training data, without any inherent capacity for reasoning or reflection. Given this context, users must approach AI outputs with a critical eye and evaluate them with human judgement (Silberg & Manyika, 2019).
Diversify Your Sources: Always double check the accuracy of AI-generated content. This could mean consulting with experts or cross-referencing with peer-reviewed publications that you access through the MIT Libraries.
Use Retrieval-Based Tools: Some generative AI tools are built with Retrieval-Augmented Generation (RAG) architectures. This means they’ll retrieve relevant information from trusted sources—such as your syllabus, research article, or case PDF—before generating output. Research has shown that RAG improves both factual accuracy and user trust in AI-generated answers (Li et al., 2024).
Use Clear and Structured Prompts: The quality of AI output is closely tied to how specific your input is. Vague prompts often lead to vague—or even inaccurate—answers. You can reduce this risk by setting clear expectations and giving the model a structure to follow. For example, prompting the AI to explain its reasoning step-by-step can expose logical gaps or unsupported claims. This technique, known as Chain-of-Thought Prompting, has been shown to improve transparency and accuracy in complex tasks (Wei et al., 2022).
Adjust the Tool’s Temperature: Temperature is a setting that controls how random or creative the model’s responses are. In tools that allow you to adjust it, using a low temperature (e.g., 0–0.3) produces more focused, consistent, and factual outputs—especially for well-defined prompts. A higher temperature (e.g., 0.7–1.0) encourages more varied and imaginative responses, making it better suited for open-ended tasks like brainstorming or storytelling.

Conclusion

Generative AI offers great potential to improve how we teach, research, and operate. However, it’s essential to remember that AI tools can produce falsehoods and amplify harmful biases. While AI is a powerful tool, the human touch remains crucial. By working together, we can make the most of what AI offers while mitigating its known limitations.

References

Buolamwini, J. (2017). Gender shades: Intersectional phenotypic and demographic evaluation of face datasets and gender classifiers. DSpace@MIT. https://dspace.mit.edu/handle/1721.1/114068

Cano, Y. M., Venuti, F., & Martinez, R. H. (2023). ChatGPT and AI text generators: Should academia adapt or resist? Harvard Business Publishing. https://hbsp.harvard.edu/inspiring-minds/chatgpt-and-ai-text-generators-should-academia-adapt-or-resist

Generative AI Working Group. (n.d.) How can we counteract generative AI’s hallucinations? Digital, Data, and Design Institute at Harvard. https://d3.harvard.edu/how-can-we-counteract-generative-ais-hallucinations

Germain, T. (2023, April 13). ‘They’re all so dirty and smelly:’ study unlocks ChatGPT’s inner racist. Gizmodo. https://gizmodo.com/chatgpt-ai-openai-study-frees-chat-gpt-inner-racist-1850333646

Heikkilä, M. (2023, August 8). AI language models are rife with different political biases. MIT Technology Review. https://www.technologyreview.com/2023/08/07/1077324/ai-language-models-are-rife-with-political-biases

Li, J., Yuan, Y., & Zhang, Z. (2024). Enhancing LLM factual accuracy with RAG to counter hallucinations: A case study on domain-specific queries in private knowledge-bases. arXiv. https://arxiv.org/abs/2403.10446

Nicoletti, L., & Bass, D. (2023, June 14). Humans are biased. Generative AI is even worse. Bloomberg Technology + Equality. https://www.bloomberg.com/graphics/2023-generative-ai-bias

O’Brien, M. (2023, August 1.) Chatbots sometimes make things up. Is AI’s hallucination problem fixable? AP News. https://apnews.com/article/artificial-intelligence-hallucination-chatbots-chatgpt-falsehoods-ac4672c5b06e6f91050aa46ee731bcf4

Silberg, J., & Manyika, J. (2019, June 6). Tackling bias in artificial intelligence (and in humans). McKinsey & Company. https://www.mckinsey.com/featured-insights/artificial-intelligence/tackling-bias-in-artificial-intelligence-and-in-humans

Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., et al. (2022). Chain of thought prompting elicits reasoning in large language models. arXiv. https://arxiv.org/abs/2201.11903

Weise, K., & Metz, C. (2023, May 1). When A.I. chatbots hallucinate. The New York Times. https://www.nytimes.com/2023/05/01/business/ai-chatbots-hallucination.html

Weiser, B. (2023, May 27). Here’s what happens when your lawyer uses ChatGPT. The New York Times. https://www.nytimes.com/2023/05/27/nyregion/avianca-airline-lawsuit-chatgpt.html

At a Glance

Biased Content

Inaccurate Content

Why is AI Flawed?

Navigate AI’s Pitfalls

Conclusion

References

AI Basics

About Us

Live Trainings

Request Forms

Self-Paced Courses

Quick Start Guides

Teaching Spaces

Recent Blog Posts