Recreating Research with ChatGPT

Introduction

In this post, I recreate some research I did as part of my PhD, this time with the help of the ChatGPT model from OpenAI. I ask the model a series of questions and paste both the questions and its responses below to give you a clear sense of what I am doing.

After the questions and responses, I will include the modifications needed to get the suggested code working (these could also have been obtained through follow-up Q&A with the model).

Questions and Responses

Question 1

Q: How do I create a graph of income over the life course?

A: To create a graph of income over the life course, you will need to gather data on a person’s income at different points in their life. This could include their income as a teenager, in their 20s and 30s, and so on. Once you have the data, you can use a spreadsheet program like Microsoft Excel or Google Sheets to create a line graph that shows the income data over time. To create the line graph, follow these steps:

  1. Open a spreadsheet program and enter the data for the person’s income at each stage of their life. For example, you might have a column for “Age” and a column for “Income”.

  2. Select the data you want to include in the line graph, including the labels for the x-axis (age) and y-axis (income).

  3. Click the “Insert” tab and then click “Line Graph” or “Line Chart” to create the graph. The program will generate a line graph that shows the income data over time.

  4. You can customize the appearance of the line graph by changing the colors, fonts, and other design elements. You can also add a title to the graph and labels for the x- and y-axes.

Once you have created the line graph, you can use it to see how a person’s income changes over the course of their life. This can be a useful tool for understanding the factors that affect income and for making decisions about career and financial planning.

Question 2

Q: Can you show me a graph of income over the life course?

A: I’m sorry, but I am a text-based AI and I do not have the ability to generate or display graphs. I can only provide information and answer questions to the best of my ability based on the text input I receive. Is there something else I can help you with?

Question 3

Q: Can you show me how to make a graph of income over the life course in javascript?

A: To create a graph of income over the life course in javascript, you will need to use a javascript library like D3.js or Chart.js. These libraries provide tools and functions that you can use to create and customize a line graph of income over time.

Here is an example of how you might use D3.js to create a line graph of income over the life course:

  1. First, you will need to include the D3.js library in your HTML code. You can do this by adding a <script> tag that points to the D3.js library, like this:

    <script src="https://d3js.org/d3.v5.min.js"></script>

  2. Next, you will need to define the data that you want to include in the line graph. This data should include the age and income for each point in time that you want to include in the graph. For example, you might define the data like this:

    var data = [
      { age: 10, income: 1000 },
      { age: 20, income: 2000 },
      { age: 30, income: 3000 },
      { age: 40, income: 4000 },
      { age: 50, income: 5000 }
    ]

  3. Once you have defined the data, you can use D3.js to create the line graph. To do this, you will need to specify the dimensions of the graph, the data to include, and the scales for the x- and y-axes. For example, you might create the line graph like this:

    // Set the dimensions of the graph
    var margin = { top: 20, right: 20, bottom: 30, left: 50 },
      width = 960 - margin.left - margin.right,
      height = 500 - margin.top - margin.bottom
    
    // Set the scales for the x- and y-axes
    var x = d3.scaleLinear().range([0, width])
    var y = d3.scaleLinear().range([height, 0])
    
    // Define the x- and y-axes
    var xAxis = d3.axisBottom(x)
    var yAxis = d3.axisLeft(y)
    
    // Create the SVG element that will contain the line graph
    var svg = d3
      .select('body')
      .append('svg')
      .attr('width', width + margin.left + margin.right)
      .attr('height', height + margin.top + margin.bottom)
      .append('g')
      .attr('transform', 'translate(' + margin.left + ',' + margin.top + ')')
    
    // Set the domain for the x- and y-scales
    x.domain(
      d3.extent(data, function (d) {
        return d.age
      })
    )
    y.domain(
      d3.extent(data, function (d) {
        return d.income
      })
    )
    
    // Add the x-axis to the SVG element
    svg
      .append('g')
      .attr('class', 'x axis')
      .attr('transform', 'translate(0,' + height + ')')
      .call(xAxis)
    
    // Add the y-axis to the SVG element
    svg.append('g').attr('class', 'y axis').call(yAxis)
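
A note on the answer above: the snippet sets up the scales and both axes but never draws the income line itself. In D3 that last step is normally done with d3.line() and an appended path element. The sketch below shows the same computation without any dependency, so it can be run in Node or a browser console. The helper names linearScale, extent, and linePath are hypothetical and written for this illustration (they are not D3 functions), and the 890 by 450 drawing area is simply the width and height that the margins above produce.

```javascript
// Dependency-free sketch of the step the snippet above leaves out:
// turning the data points into an SVG path string.
// linearScale, extent, and linePath are hypothetical helpers, not D3 APIs.

// Map a value from [d0, d1] onto [r0, r1], as a linear scale does.
function linearScale(domain, range) {
  const [d0, d1] = domain
  const [r0, r1] = range
  const span = d1 - d0
  // Guard against a zero-width domain (all values identical)
  return (v) => (span === 0 ? (r0 + r1) / 2 : r0 + ((v - d0) / span) * (r1 - r0))
}

// Smallest and largest value of an array of numbers.
function extent(values) {
  return [Math.min(...values), Math.max(...values)]
}

// Build the "d" attribute for an SVG <path> from {age, income} points.
function linePath(data, width, height) {
  const sorted = [...data].sort((a, b) => a.age - b.age)
  const x = linearScale(extent(sorted.map((d) => d.age)), [0, width])
  // The y range is flipped because SVG's y coordinate grows downward
  const y = linearScale(extent(sorted.map((d) => d.income)), [height, 0])
  return sorted
    .map((d, i) => (i === 0 ? 'M' : 'L') + x(d.age) + ',' + y(d.income))
    .join('')
}

// Same example data as in the answer above
const data = [
  { age: 10, income: 1000 },
  { age: 20, income: 2000 },
  { age: 30, income: 3000 },
  { age: 40, income: 4000 },
  { age: 50, income: 5000 }
]

const pathD = linePath(data, 890, 450)
console.log(pathD)
```

The resulting string can be assigned to the d attribute of a path element placed inside the g group created above, with fill set to none and a stroke color so the line is visible.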

Question 4

Q: What are some household surveys I could use to collect this data?

A: Household surveys are a common method for collecting data on a variety of topics, including income. Some examples of household surveys that you could use to collect data on income over the life course include the Current Population Survey (CPS), the American Community Survey (ACS), and the Panel Study of Income Dynamics (PSID).

The Current Population Survey (CPS) is a monthly survey of households conducted by the U.S. Census Bureau and the Bureau of Labor Statistics. The CPS collects data on a wide range of topics, including employment, income, and demographic characteristics.

The American Community Survey (ACS) is an ongoing survey conducted by the U.S. Census Bureau. The ACS collects data on a range of topics, including income, education, and housing.

The Panel Study of Income Dynamics (PSID) is a long-term study that follows a sample of households over time. The PSID collects data on income, wealth, and other economic and demographic characteristics.

These surveys can provide valuable data on income over the life course, but they may not be the best option for your specific needs. It’s always a good idea to research different surveys and choose the one that best fits your research goals and objectives.

Question 5

Q: How would I create a chart of income over the life course using PSID data with R?

A: To create a chart of income over the life course using PSID data with R, you will need to do the following:

  1. Download the PSID data from the PSID website. This will typically be provided as a .zip file that contains the data in a CSV format.

  2. Use R to import the data from the CSV file into a data frame. This can be done using the read.csv() function, which allows you to read in the data from a CSV file and store it in a data frame. For example, you might use code like this:

    # Import the PSID data from the CSV file
    psid <- read.csv("psid.csv")

  3. Use R to filter the data to include only the variables that you want to include in the chart. For example, if you want to create a chart of income over the life course, you might filter the data to include only the age and income variables. This can be done using the filter() function from the dplyr package, like this:

    # Filter the data to include only the age and income variables
    psid <- filter(psid, age, income)

  4. Use R to create the chart. There are many different charting libraries available for R, so you can choose the one that best fits your needs. For example, if you want to create a line chart, you could use the ggplot2 library. To create the chart, you would first need to specify the data to include in the chart and the aesthetics, or the visual properties of the chart. You would then use the geom_line() function to add a line to the chart. For example, you might use code like this:

    # Load the ggplot2 library
    library(ggplot2)
    
    # Create the chart
    ggplot(psid, aes(x = age, y = income)) +
      geom_line()

  5. Finally, use R to customize the appearance of the chart. This could include changing the colors, fonts, and other design elements. You can also add a title and labels to the chart. For example, you might use code like this:

    # Customize the chart
    ggplot(psid, aes(x = age, y = income)) +
      geom_line(color = "blue") +
      labs(title = "Income Over the Life Course",
           x = "Age",
           y = "Income")

Once you have created and customized the chart, you can use it to visualize the income data from the PSID over the life course. This can be a useful tool for understanding the factors that affect income and for making decisions about career and financial planning.
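
A note on the answer above: in dplyr, filter() keeps rows that satisfy a condition, while select() keeps columns, so the column-picking step described in step 3 corresponds to select(psid, age, income) rather than filter(). A panel also contains many people at each age, so drawing a single line usually means summarizing income by age first. The sketch below illustrates both points in JavaScript, to match the earlier example. The rows, the field names, the 25 to 64 age window, and the meanIncomeByAge helper are all made up for illustration and are not PSID variables.

```javascript
// Hypothetical long-format rows: one row per person per survey wave.
// Field names and values are invented for illustration only.
const rows = [
  { id: 1, age: 25, income: 30000, state: 'MI' },
  { id: 1, age: 35, income: 45000, state: 'MI' },
  { id: 2, age: 25, income: 20000, state: 'OH' },
  { id: 2, age: 35, income: 35000, state: 'OH' },
  { id: 3, age: 70, income: 10000, state: 'MI' }
]

// Column selection (the role of dplyr's select()): keep only age and income.
const selected = rows.map(({ age, income }) => ({ age, income }))

// Row filtering (the role of dplyr's filter()): keep rows meeting a condition.
const workingAge = selected.filter((r) => r.age >= 25 && r.age <= 64)

// Average income at each age, sorted by age, ready to plot as one line.
function meanIncomeByAge(rs) {
  const sums = new Map()
  for (const { age, income } of rs) {
    const s = sums.get(age) || { total: 0, n: 0 }
    s.total += income
    s.n += 1
    sums.set(age, s)
  }
  return [...sums.entries()]
    .sort(([a], [b]) => a - b)
    .map(([age, s]) => ({ age, income: s.total / s.n }))
}

const profile = meanIncomeByAge(workingAge)
console.log(profile)
```

The profile array has the same { age, income } shape as the data used in the D3 example, so it could be passed to the same plotting code.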

Question 6

Q: How would I cluster those income paths using R?

A: To cluster those income paths using R, you will need to do the following:

  1. Download the PSID data from the PSID website and import it into R as a data frame, as I described in my previous answer.

  2. Use R to filter the data to include only the variables that you want to use for clustering. For example, you might filter the data to include only the age and income variables.

  3. Use the kmeans() function from the stats package to cluster the data. This function takes the data frame as input, along with the number of clusters that you want to create. For example, you might use code like this:

    # Load the stats package
    library(stats)
    
    # Cluster the data using the kmeans() function
    clusters <- kmeans(psid, 3)

  4. Use the ggplot2 library to create a chart that shows the clusters. This can be done by adding the clusters$cluster variable to the aesthetics of the chart. You can also add a legend to the chart to show which cluster each line belongs to. For example, you might use code like this:

    # Load the ggplot2 library
    library(ggplot2)
    
    # Create the chart
    ggplot(psid, aes(x = age, y = income, color = clusters$cluster)) +
      geom_line() +
      labs(title = "Income Over the Life Course (Clustered)",
           x = "Age",
           y = "Income") +
      scale_color_discrete(name = "Cluster")

This will create a chart that shows the income paths clustered into three groups. You can adjust the number of clusters to see how the data is grouped differently.

Once you have created the chart, you can use it to explore the different clusters of income paths and see how they differ from each other. This can be a useful tool for understanding the factors that affect income and for making decisions about career and financial planning.
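
A note on the answer above: if psid is in long format (one row per person per age), calling kmeans() on it groups individual age and income observations, not whole income paths. To cluster paths, each person first needs to become one vector of incomes ordered by age. A minimal sketch of that idea follows, in JavaScript to match the earlier example. The helper names, the toy numbers, and the choice to start from the first k paths are assumptions made for illustration, and the sketch assumes every person is observed at the same ages.

```javascript
// Minimal k-means over whole income paths (one vector per person).
// All names and the toy numbers below are illustrative assumptions.

// Reshape long rows {id, age, income} into one income vector per person,
// ordered by age. Assumes every person is observed at the same ages.
function toPaths(rows) {
  const byId = new Map()
  for (const r of rows) {
    if (!byId.has(r.id)) byId.set(r.id, [])
    byId.get(r.id).push(r)
  }
  return [...byId.entries()].map(([id, obs]) => ({
    id,
    path: obs.sort((a, b) => a.age - b.age).map((o) => o.income)
  }))
}

function squaredDistance(a, b) {
  return a.reduce((sum, v, i) => sum + (v - b[i]) ** 2, 0)
}

// Index of the centroid closest to path p.
function nearest(centroids, p) {
  let best = 0
  for (let c = 1; c < centroids.length; c++) {
    if (squaredDistance(p, centroids[c]) < squaredDistance(p, centroids[best])) best = c
  }
  return best
}

// Lloyd's algorithm, starting from the first k paths as centroids
// (deterministic for this sketch; real use would want random restarts).
function kmeans(paths, k, maxIter = 100) {
  let centroids = paths.slice(0, k).map((p) => [...p])
  let labels = new Array(paths.length).fill(-1)
  for (let iter = 0; iter < maxIter; iter++) {
    const next = paths.map((p) => nearest(centroids, p))
    const changed = next.some((l, i) => l !== labels[i])
    labels = next
    if (!changed) break
    centroids = centroids.map((old, c) => {
      const members = paths.filter((_, i) => labels[i] === c)
      if (members.length === 0) return old // keep an empty cluster's centroid
      return old.map((_, j) => members.reduce((s, m) => s + m[j], 0) / members.length)
    })
  }
  return { labels, centroids }
}

// Toy data: two flat paths (a, c) and two steeply rising paths (b, d)
const incomes = { a: [1000, 1100, 1200], b: [1000, 3000, 6000], c: [1100, 1200, 1300], d: [1200, 3200, 6100] }
const ages = [20, 30, 40]
const rows = Object.entries(incomes).flatMap(([id, values]) =>
  values.map((income, i) => ({ id, age: ages[i], income }))
)

const people = toPaths(rows)
const result = kmeans(people.map((p) => p.path), 2)
people.forEach((p, i) => console.log(p.id, 'cluster', result.labels[i]))
```

Each label then belongs to a person rather than to a single observation, so it can be joined back to that person's rows to color their whole line in a chart.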

Conclusion

As I was writing this post in Visual Studio Code (VS Code), I was able to use the R extension to run the code in the R terminal, which made it easy to test the code and see the results. I also used the R extension to format the code and add comments, which made the code easier to read and understand. The above highlighted text was autocompleted by GitHub Copilot, which builds on GPT-3 to provide often very good autocomplete suggestions.

This is a fundamentally different way of writing content/code. It is possible now to have your thoughts completed and occasionally taken in a different direction by the AI. I think this is a very interesting development and I'm curious to see how it will evolve.

Just as the ubiquity of search engines (Google, Bing, etc.) has changed the way we think (i.e., we don’t need to remember things as much as we used to), ready access to this new frontier in language models is a fundamental shift in how content is created. As an author having thoughts autocompleted by a language model, I find the suggestions are often not what I initially intended, but they are what I could see another human suggesting to me as a possible completion. This opens up a new world in which we can collaborate directly with a mashup of all the content absorbed in the creation of these models.

Explainability in AI will be important in checking the biases of these models and in giving more insight into how they affect us as we use them. Writing with AI completions feels more like sculpting than writing. You are still able to create thoughts that are unique to you, but your ideas get mashed up with those that are presented to you as suggestions. When a good suggestion shows up and you accept it or modify it, will you always keep track of that? Without clear tracking of the AI suggestions versus your own edits, it will be difficult to know what is originally yours. If the autocompletion essentially worded your thought correctly before you had the chance to write it, what is the difference between manually typing it and hitting tab to accept it?

This feels like a major performance enhancer for writers, and it will present an interesting situation in which much or most of the content consumed in the future will have been at least partially written by AI.