Generative AI provides an incredible opportunity to increase efficiency, productivity, and even creativity. However, the rise of AI detection tools has many nervous about taking advantage of AI writers.
We wanted to test thoroughly whether it is safe to use AI writers and what you can do to make sure your AI content is loved by your clients, colleagues, and Google.
To find out, we created a case study where we:
- Generated content across 20 diverse topics in the most popular AI writers: ChatGPT, Jasper, Article Forge, Copy.ai, and Writesonic.
- Ran each piece of content through WordAi’s Avoid AI Detection feature.
- Created a human control dataset by taking the article ranking first on Google for each topic (Google clearly thinks this content is high quality).
- Ran all the content through the most popular AI content detectors: Originality.ai, the OpenAI detector, the Hugging Face detector, and the Writer AI detector.
You can read more about our methodology and download the articles we tested here.
What the Scores for Each Category Mean
- Originality.ai: The average probability that the content is "Original" (written by a human).
- Hugging Face Detector: The average probability that the content is "Real" (not generated by AI).
- Writer AI Detector: The average probability that the content was written by a human.
- OpenAI Detector: The percentage of articles that the OpenAI detector did not classify as AI-generated.
- A note about GPTZero: We tested the human content dataset in GPTZero, but it incorrectly classified 85% of the articles as "having parts written by AI". Therefore, even though the content rewritten by WordAi scored similarly to the human control content, GPTZero is too inaccurate to be used as an AI content detector.
- In general, AI content detectors can consistently identify AI generated text. Using content that is easily detected as AI generated can harm your rankings and can be grounds for clients or colleagues to reject your writing.
- WordAi makes AI-generated text pass as human-written in all of the AI content detectors we tested. So, as long as you use WordAi, you can feel confident that your AI-generated content will not only read naturally to humans but will also pass AI content detection tools.
- Not even high quality human-written content passes all AI content detectors as human 100% of the time. Because of this, it is safe to say that your content should have consistently high probabilities of being human-written, but not necessarily score 100% every time.
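The probability-based scores described under "What the Scores for Each Category Mean" are simple averages across the articles in a dataset. A minimal sketch of that aggregation follows; the function name and the sample values are hypothetical and are not results from this study.

```python
from statistics import mean


def average_human_score(probabilities):
    """Return the mean 'human' probability (0-100) across a set of articles."""
    if not probabilities:
        raise ValueError("need at least one article score")
    return mean(probabilities)


# Hypothetical per-article scores from one detector (not study data).
sample_scores = [92.0, 88.0, 99.0, 81.0]
print(f"Average human probability: {average_human_score(sample_scores):.1f}%")
```

Because the reported figure is an average, a dataset can score highly overall even when an individual article falls below 100%.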
How to Make Your Content Pass AI Detectors With WordAi
All you need to do is enter your AI generated text into the WordAi Avoid AI Detection interface, click a button, and in a matter of seconds, you will get back humanized text that will pass AI content detection.
WordAi also offers this ability via API, so you can integrate AI detection avoidance directly into your existing workflows.
You can test this functionality completely free with WordAi’s 3-day free trial.
Some see AI detection as justification to reject or penalize content, regardless of how useful the content itself is.
But the results of this study show that WordAi can make any AI content indistinguishable from content written by actual humans.
So as long as your content is useful to start with, you can rest easy knowing that with WordAi, your AI generated content will read naturally to humans and is safe from AI detectors.
We chose topics ranging from "Why George Orwell is an Incredibly Influential Author" to "the best beaches in Florida" to test different types of writing. Below, you can find more information about how we created each content dataset, the actual content we tested before and after using WordAi, and the AI content detectors we used.
Ranking Articles Human Control
Every AI detection tool generates different results, so we needed a control group of high quality human content to set a baseline. We used the articles that rank first on Google because Google clearly considers them high quality. We also reviewed these articles ourselves and determined that they were high quality and human-written.
Note: We did not run any of these human-written articles through WordAi.
Click here to download the human content we tested.
Article Forge
We created a 750-word article for each topic and used the instructions field in a few cases to provide context and create more diverse articles. Most importantly, we had the "Avoid AI Detection" setting turned OFF for consistency with the other AI writers. Otherwise, all default settings were used.
Before running Article Forge content through WordAi’s Avoid AI Detection feature, it was detected as AI generated in most tools. After using WordAi, Article Forge content passed all AI content detection tools as human-written.
Note: Turning the "Avoid AI Detection" setting ON in Article Forge provides the same functionality as using WordAi; it just performs that step automatically.
Click here to download the Article Forge content we tested.
Jasper
We used the Blog Post Workflow and the Freeform template to create more diverse content around each topic. Otherwise, the default settings were used.
Before using WordAi, Jasper content performed better than average among AI writers, but it could still be detected as AI-generated in many cases. After using WordAi, Jasper’s content was completely undetectable as AI-generated.
Click here to download the Jasper articles we tested.
ChatGPT
We asked ChatGPT either to write an article about each topic or to explain it, to encourage multiple writing styles.
Without WordAi, content written by ChatGPT is easily detected as AI generated. But after using WordAi, the content passes as human-written in all AI detectors.
Click here to download the ChatGPT articles we tested.
Copy.ai
We used the Blog Post Wizard and Freestyle to create the articles in this dataset. Otherwise, the default settings were used.
While Copy.ai content almost passed some of the AI content detectors before using WordAi, the content passed every AI detector as human-written after using WordAi.
Click here to download the Copy.ai content we tested.
Writesonic
We used the AI Article Writer 3.0 and the AI Article Writer 4.0 to generate articles for each topic. We used the Premium quality setting and left all other settings at their defaults.
Before using WordAi, Writesonic content had the lowest probability of being human-written compared to all other AI writers. But after running Writesonic content through WordAi, it passed every AI detection tool.
Click here to download the Writesonic content we tested.
Originality.ai
From our testing, Originality.ai seemed to produce the most consistent results among all the AI detectors. It gives its results as the probability that the content is "Original" or "AI".
On their site, Originality.ai explains that this score is the probability that the content is Human vs AI generated and does not mean some percent of the article is human and the other is AI.
Originality.ai also explains that you should not rely only on the results of their AI content detector, that you do not need every article to get a 100% human score, and that the tool can sometimes incorrectly classify content.
Hugging Face Detector
This AI content detector has been around for a long time compared to other detectors and was originally built by OpenAI to detect GPT-2 content. However, it is now being used to detect AI from multiple models beyond GPT-2 and even GPT-3.
Writer AI Detector
Writer's AI detector makes it easy for users to understand the level of "human-written" content they should aim for with color coding and tips. One limitation of this tool is that it only accepts 1,500 characters at a time, but in most cases, that is enough text to generate a score that aligns with other AI content detectors.
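If you want to prepare text for a detector with a character cap like the one described above, one approach is to keep the longest prefix that fits and cut it at a word boundary. The sketch below assumes the 1,500-character limit stated here; the function name is hypothetical, and it does not call any Writer service.

```python
WRITER_CHAR_LIMIT = 1500  # limit stated in the article


def first_chunk(text, limit=WRITER_CHAR_LIMIT):
    """Return the longest prefix of `text` that fits within `limit` characters,
    cut at a space when one is available so no word is split."""
    if len(text) <= limit:
        return text
    # Look for the last space at or before index `limit`.
    cut = text.rfind(" ", 0, limit + 1)
    if cut <= 0:
        # No usable space: fall back to a hard cut.
        return text[:limit]
    return text[:cut]


# Hypothetical usage with placeholder text.
article = "word " * 1000
snippet = first_chunk(article)
print(len(snippet))
```

Cutting at a word boundary avoids submitting a fragment of a word, which would not appear in the original article.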
OpenAI Detector
OpenAI recently released a new AI content detector in the wake of releasing ChatGPT. Unlike their previous AI detector, which is available on Hugging Face, this detector gives its results in a natural language format. The possible results are: "likely AI-generated", "possibly AI-generated", "unclear if it is AI-generated", "unlikely AI-generated", and "very unlikely AI-generated".
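Since this detector returns labels rather than probabilities, its score in this study is a percentage of articles rather than an average. The sketch below shows one way such a percentage could be computed. The cutoff is an assumption: it treats "likely AI-generated" and "possibly AI-generated" as flagged and the other three labels as not flagged, which may differ from the exact rule used in the study. The function name and sample labels are hypothetical.

```python
# Assumed cutoff: only these two labels count as "thought to be AI-generated".
FLAGGED = {"likely AI-generated", "possibly AI-generated"}
NOT_FLAGGED = {
    "unclear if it is AI-generated",
    "unlikely AI-generated",
    "very unlikely AI-generated",
}


def percent_not_flagged(labels):
    """Return the percentage of articles whose label is in NOT_FLAGGED."""
    if not labels:
        raise ValueError("need at least one label")
    unknown = set(labels) - FLAGGED - NOT_FLAGGED
    if unknown:
        raise ValueError(f"unrecognized labels: {sorted(unknown)}")
    passed = sum(1 for label in labels if label in NOT_FLAGGED)
    return 100.0 * passed / len(labels)


# Hypothetical labels for four articles (not study data).
sample_labels = [
    "very unlikely AI-generated",
    "unlikely AI-generated",
    "possibly AI-generated",
    "unclear if it is AI-generated",
]
print(f"{percent_not_flagged(sample_labels):.0f}% not flagged")
```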
While generally a useful AI detector, it can be more inconsistent than other detectors. In fact, OpenAI dedicated a decent portion of their announcement blog post to the limitations of this detector.
GPTZero
Originally, we planned to include GPTZero in our case study. However, only 15% of the human control articles we tested "passed" GPTZero as human-written. Therefore, as mentioned above, we did not include the results of our GPTZero tests because we determined that the tool was too inaccurate to be useful.
GPTZero does mention their accuracy limitations on their site and explains that they are working to improve the tool. So, we will continue to monitor GPTZero for accuracy updates, but we would not recommend relying on this AI detector at this time.
If you have any questions about the study or would like us to test any other AI writers or AI content detectors, let us know in the comments!