At a Glance

In this guide, you will learn about the features and uses of ChatGPT’s Advanced Data Analysis (formerly Code Interpreter) function.

What is ChatGPT’s Advanced Data Analysis?

Advanced Data Analysis is a feature of GPT-4 in ChatGPT that lets you upload data files and have ChatGPT write, run, and test code against them. It is only available to premium (paid) accounts. Because the code actually runs inside ChatGPT, the feature broadens what the model can be used for and can improve the accuracy of its output. It is well suited to users who want to explore data, create code, and solve empirical problems with the assistance of AI tools.

In the video below, MIT Sloan PhD student Chuck Downing will show how to enable and access Advanced Data Analysis within your ChatGPT account. The video then covers some common use cases of Advanced Data Analysis including reading and describing data, cleaning your dataset, visualizing your data, running regressions, and saving your work from Advanced Data Analysis to your local device, as well as some things to look out for when working with this technology.

Update: ChatGPT Plus subscribers can now access Advanced Data Analysis in a standard chat window by default (without specifically enabling the tool). However, the example use cases for Advanced Data Analysis that you’ll see in this video have not changed.

To view or download the dataset used in the video, go to The World Bank: CO2 emissions (metric tons per capita).

To comply with MIT’s Policies & Procedures and the Family Educational Rights and Privacy Act of 1974 (FERPA), do NOT share non-public data with publicly available AI tools. This includes sensitive information (e.g., social security numbers, credit card information, or hiring materials) and personally identifiable information. To learn more, see Navigating Data Privacy.

What can Advanced Data Analysis do?

Advanced Data Analysis supports multiple file formats, including text and image files, full documents such as PDFs, code or other data files, as well as audio and video. Its performance varies by file type, but it is specifically designed for data files such as .csv and .txt. Currently, Advanced Data Analysis runs its code in Python, but it still relies on the underlying ChatGPT model, which can read and interpret other programming languages. As a result, it can convert code between programming languages and work with files written in languages other than Python.

To demonstrate some potential uses of Advanced Data Analysis, this guide will go through a simple example using the World Bank’s carbon emissions dataset, which contains the yearly CO2 emissions (metric tons per capita) for each country from 1990-2020. Other examples and use cases appear in the video at the beginning of this article. You can find and download the dataset here: The World Bank: CO2 emissions (metric tons per capita).

Example: Reading, Cleaning, and Manipulating Data

Our dataset currently contains one row for each country and one column for each year of available emissions data. In this example, we will read in the World Bank data, clean it to remove years with all null values, and then transform the dataset into a panel dataset.
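For readers who want to see what these three steps look like in code, here is a minimal sketch using only Python's standard library. It is not the code Advanced Data Analysis produces, which may look quite different. The tiny inline dataset, its country names and values, and the function and field names (`read_wide`, `non_empty_years`, `to_panel`, `co2_per_capita`) are all made up for illustration, and the real World Bank file's layout and column names may differ.

```python
import csv
import io

# Hypothetical miniature of a wide file: one row per country, one column
# per year. Countries and values are invented for illustration only.
SAMPLE_CSV = """Country Name,Country Code,1990,1991,1992
Aland,ALD,1.5,,1.7
Borduria,BRD,,,4.2
"""

ID_COLUMNS = ["Country Name", "Country Code"]


def read_wide(text):
    """Parse CSV text into a list of dicts, one per country."""
    return list(csv.DictReader(io.StringIO(text)))


def non_empty_years(rows):
    """Return the year columns that hold at least one non-blank value."""
    year_columns = [c for c in rows[0] if c not in ID_COLUMNS]
    return [y for y in year_columns if any(r[y].strip() for r in rows)]


def to_panel(rows, years):
    """Reshape wide rows into one record per (country, year) pair."""
    panel = []
    for row in rows:
        for year in years:
            value = row[year].strip()
            panel.append({
                "country": row["Country Name"],
                "code": row["Country Code"],
                "year": int(year),
                # Keep missing observations as None rather than dropping them.
                "co2_per_capita": float(value) if value else None,
            })
    return panel


rows = read_wide(SAMPLE_CSV)
years = non_empty_years(rows)   # 1991 is blank for every country, so it is dropped
panel = to_panel(rows, years)
for record in panel:
    print(record)
```

In this sketch, a year is removed only when it is blank for every country. Individual missing values stay in the panel as `None`, so the result still has one row for every country and remaining year.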

You can explore the example chat conversation in the screenshots below or see the entire conversation in this accessible PDF: ChatGPT Advanced Data Analysis Demo.

To get started once you’ve activated Advanced Data Analysis, upload the file you want to work with by clicking the + button next to the Send a message box:

Screenshot of the Send a message box with the + button highlighted

Once the data is uploaded, we can ask the tool to read in the data, describe it, clean it for null values, and transform it into a panel dataset. As you can see in the video above and the screenshot here, with just a simple prompt, Advanced Data Analysis was able to produce all the steps we requested.

Screenshot of the prompt and the response

It is important to continuously check the accuracy of the output you receive. While advanced, this technology does still make errors. Asking for explanations or descriptions from the software while working is a great way to force it to check its own work.
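One way to check a reshaped dataset yourself is to run a few simple consistency tests on it. The sketch below is a hypothetical example, not part of the tool: the function name `check_panel`, the field names, and the sample values are all assumptions made for illustration. It checks that the panel has one row per country and year, that no country and year pair is repeated, and that no values were lost along the way.

```python
def check_panel(wide_rows, years, panel):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []

    # 1. A full panel has exactly one row per country per year.
    expected = len(wide_rows) * len(years)
    if len(panel) != expected:
        problems.append(f"expected {expected} rows, found {len(panel)}")

    # 2. No (country, year) pair should appear twice.
    keys = [(r["code"], r["year"]) for r in panel]
    if len(keys) != len(set(keys)):
        problems.append("duplicate (country, year) pairs")

    # 3. The number of non-missing values should survive the reshape.
    wide_values = sum(1 for row in wide_rows for y in years if row[y] != "")
    panel_values = sum(1 for r in panel if r["co2_per_capita"] is not None)
    if wide_values != panel_values:
        problems.append(
            f"{wide_values} values in the wide data, {panel_values} in the panel"
        )
    return problems


# Made-up wide data and two candidate panels, for illustration only.
wide_rows = [
    {"Country Code": "ALD", "1990": "1.5", "1992": "1.7"},
    {"Country Code": "BRD", "1990": "", "1992": "4.2"},
]
years = ["1990", "1992"]
good_panel = [
    {"code": "ALD", "year": 1990, "co2_per_capita": 1.5},
    {"code": "ALD", "year": 1992, "co2_per_capita": 1.7},
    {"code": "BRD", "year": 1990, "co2_per_capita": None},
    {"code": "BRD", "year": 1992, "co2_per_capita": 4.2},
]
bad_panel = good_panel[:-1]  # simulate a reshape that silently lost a row

print(check_panel(wide_rows, years, good_panel))
print(check_panel(wide_rows, years, bad_panel))
```

Checks like these do not prove the output is right, but they catch common silent failures such as dropped rows or duplicated observations.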

Additionally, for those curious about the code underlying the output you receive, you can select the show work button on the chat. This allows you to view the commented code documenting the individual steps Advanced Data Analysis took to solve the problem.

Once we are satisfied, we can copy the code, ask Advanced Data Analysis to provide us with a downloadable program file, or ask for a download of our newly cleaned and transformed dataset. For example, if we want to download a CSV file of the new dataset, we can ask the following: “Provide a downloadable version of this newly transformed csv file.”

Screenshot of GPT-4's link to download your file

Advanced Data Analysis has provided us with a clickable link to download this file. Upon clicking the link, the file will be downloaded to your local device, where you can view or use it outside of the ChatGPT system.
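If you copy the code and run it on your own machine instead, exporting the panel to CSV takes only a few lines. The following is a sketch under stated assumptions: the function name `write_panel`, the field names, the sample records, and the file name `co2_panel.csv` in the comment are hypothetical, and the code the tool generates for this step may differ.

```python
import csv
import io

FIELDS = ["country", "code", "year", "co2_per_capita"]


def write_panel(panel, handle):
    """Write panel records as CSV to any open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(panel)  # None values are written as empty cells


# Made-up records for illustration only.
panel = [
    {"country": "Aland", "code": "ALD", "year": 1990, "co2_per_capita": 1.5},
    {"country": "Borduria", "code": "BRD", "year": 1990, "co2_per_capita": None},
]

# Write to an in-memory buffer so the result can be inspected directly.
buffer = io.StringIO()
write_panel(panel, buffer)
print(buffer.getvalue())

# To save to disk instead, pass an open file (hypothetical file name):
# with open("co2_panel.csv", "w", newline="", encoding="utf-8") as f:
#     write_panel(panel, f)
```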

There are many more potential uses of the Advanced Data Analysis feature, including data visualization, regressions and other quantitative analysis, and work with other file types. Many of these features are covered in the tutorial video.
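As one small illustration of the quantitative analysis mentioned above, the sketch below fits a straight-line trend to a single country's emissions by ordinary least squares. It is a hand-written example with invented numbers, not output from Advanced Data Analysis, and the function name `fit_line` is hypothetical. In practice the tool would likely rely on a statistics library rather than writing the arithmetic out by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up yearly emissions (metric tons per capita) for one country.
years = [1990, 1991, 1992, 1993]
emissions = [2.0, 2.5, 3.0, 3.5]

slope, intercept = fit_line(years, emissions)
print(f"Estimated change per year: {slope:.2f} metric tons per capita")
```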

Conclusion

We encourage faculty members teaching classes with data-driven assignments to explore the uses of Advanced Data Analysis. With the rapid rise of this new technology, many assignments and tasks that formerly took several hours can be done in minutes. Understanding and adapting to these advances in AI is an important way to continue learning and growing so students can get the most out of their classwork.