---
title: 'If your future ChatGPT or AI response has atrocious grammar and bad puns, blame me'
date: '2023-04-20T10:21:59-07:00'
type: post
word_count: 212
char_count: 1342
tokens: 276
categories:
  - Random
tags:
  - ai
  - 'artificial intelligence'
  - c4
  - chatgpt
  - Google
  - 'silicon florist'
---

# If your future ChatGPT or AI response has atrocious grammar and bad puns, blame me

[The Washington Post recently published a story about the sites that have been used to train ChatGPT and other AI language models for chatbots](https://www.washingtonpost.com/technology/interactive/2023/ai-chatbot-learning/). The story includes a tool for looking up your site to see if it’s been used. And I’m sorry to report that Silicon Florist has. You’re welcome…?

> A web crawl may sound like a copy of the entire internet, but it’s just a snapshot, capturing content from a sampling of webpages at a particular moment in time. C4 began as a scrape performed in April 2019 by the nonprofit CommonCrawl, a popular resource for AI models. CommonCrawl told The Post that it tries to prioritize the most important and reputable sites, but does not try to avoid licensed or copyrighted content.

[![](https://i0.wp.com/siliconflorist.com/wp-content/uploads/2023/04/Screenshot-2023-04-20-at-10.18.17-AM.png?resize=676%2C463&ssl=1)](https://www.washingtonpost.com/technology/interactive/2023/ai-chatbot-learning/)

To see if your site is in the Google C4 dataset, visit “[Inside the secret list of websites that make AI like ChatGPT sound smart](https://www.washingtonpost.com/technology/interactive/2023/ai-chatbot-learning/)” from The Washington Post.
