Artificial intelligence (AI) is ubiquitous in modern society. While it may seem like a brand-new technology to many, computer scientists have been building and testing AI models—albeit simple ones initially—since the 1950s. You have likely been using AI in your daily life for much longer than you realize: it powers chatbots on websites, robotic vacuums that learn floor plans, digital assistants such as Amazon’s Alexa, and much more.
AI took a momentous step forward in 2022 when OpenAI released its generative AI model, ChatGPT, to the public. Generative artificial intelligence is computer-based intelligence that uses large amounts of data to create its own original content, such as text, images, music, audio, or video. Generative AI differs from traditional AI in one important way: rather than simply analyzing existing data to classify or predict, it generates new content of its own.
Having a publicly available generative AI model opens new doors and poses new risks. Now, anyone can ask an AI model to write an essay for a class, create an image that never existed in real life, and explain complex topics within seconds.
This progress raises all sorts of questions about how people can and should interact with AI. Researchers from the Massachusetts Institute of Technology are using data to answer these questions. This fall, they published the first large-scale meta-analysis of human-AI collaboration, aiming to pinpoint when humans and AI work together most effectively.
The paper, published in the journal Nature Human Behaviour, combined 106 experimental studies spanning a wide variety of fields, such as healthcare, human resources, communications, and the arts. The researchers included only studies in which a human and an AI system worked together to perform a task, and each study reported a quantitative measure of performance for the human alone, the AI model alone, and the human-AI combination.
The researchers found some interesting results: On average, human-AI combinations performed better than a human working alone but worse than AI systems operating on their own. The analysis also suggests that measuring the performance of an AI system is more nuanced than expected.
The study identified specific circumstances in which human-AI partnerships were productive and others in which it was better for AI to complete tasks alone. For creative tasks, such as summarizing social media posts, generating content or imagery, and answering questions in a chat, human-AI collaborations often outperformed either humans or AI working alone. That’s likely because generating content for a specific purpose requires insight that humans typically have but computers do not.
But for decision-making tasks, such as classifying fake content, forecasting demand, or diagnosing some medical cases, AI alone often performed better than human-AI teams. For example, in a task to detect fake reviews, AI working alone was accurate 73 percent of the time, human-AI teams were accurate 69 percent of the time, and humans alone were accurate 55 percent of the time.
But this wasn’t always true. For example, in a task to classify bird photographs, the AI alone was accurate 73 percent of the time, the human alone was accurate 81 percent of the time, and the human-AI team was accurate 90 percent of the time. In this case, the researchers hypothesize that the synergy worked best because the humans were good at deciding when to trust their own judgment and when to defer to the algorithm.
The take-home message: As generative AI is employed more broadly in modern society, researchers should consult data to understand the best ways to use this new technology. The current body of evidence suggests that humans should collaborate with AI models on creative tasks but that AI models perform better alone on tasks that require decision-making.