How to Detect AI-written Content and Plagiarism

Luke Harsel

Aug 02, 2024 · 5 min read

In its current state, pure AI content is far from ideal. 

If you’ve ever tried ChatGPT for writing, you’ve probably noticed that AI-written text tends to be more predictable and generic. 

Human writing can be more dynamic, with less predictable narration and a richer vocabulary. It’s also likely to have more typos than AI-written content. Human error, right? 

If you’re an editor, professor, or content marketer, it’s crucial to vet the content you review. 

How to Detect Plagiarism and AI-generated Content

So how do we check if something is written by AI? 

The obvious way is to let plagiarism checkers and AI detection tools do the work. However, especially when it comes to AI, there are some common “AI-ish” signs to look out for, such as:

  • Incorrect and outdated information
  • Lack of depth and personality
  • Lack of anecdotes or personal stories
  • Repetitive language

Incorrect and Outdated Information

While generative AI output may at times look and sound well-written and professional, it can also be misleading, irrelevant, or just plain wrong, a problem known as AI hallucination. 

So it’s always important to check how accurate the information actually is. Since most AI models are trained on limited datasets (limited in time, form, or source), they may not have access to the latest or most complete information. 

Lack of Depth and Personality

Real people also have anecdotes, opinions, and perspectives, all of which inform how they think and write. 

Because AI tools don’t really write but generate text based on patterns in their training data, they don’t “understand” what they’re writing about the way humans do. This often results in superficial responses that lack critical thinking and in-depth topic analysis. 

They also don’t have a personality, which is why most AI-generated texts lack a personal touch and can sound robotic and emotionless. 

In contrast to an AI tool, a journalist or copywriter can have real conversations with subject matter experts in the field they’re writing about. These kinds of conversations lead to deeper understandings, interesting stories, and relatable opinions in a way that is hard to replicate with AI. 

Repetitive Language 

Another common feature of AI-generated text is the repeated use of the same words or phrases.

This may be the result of a specific keyword in the prompt that the AI then repeats word for word. It can also stem from a lack of context or from limited, repetitive training data.

AI models are also generally designed to be cautious and neutral, so they may rely on conservative language patterns that can look repetitive.

Some have pointed out that ChatGPT often overuses cohesive and transitional devices in writing. These are words and phrases that tie paragraphs together, such as “furthermore,” “lastly,” “whereas,” “in conclusion,” “firstly,” “secondly,” and “thirdly.” 
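If you want to spot-check these repetition signals yourself, a few lines of Python can count repeated phrases and transitional words. This is a minimal sketch under stated assumptions: the word list, the three-word phrase length, and the function names are illustrative choices, not the method any particular AI detector uses, and a high count is only a hint, never proof of AI authorship.

```python
import re
from collections import Counter

# Illustrative, hand-picked transitional words (single words only, so
# multi-word phrases like "in conclusion" are not counted here).
TRANSITIONS = {"furthermore", "lastly", "whereas", "firstly", "secondly",
               "thirdly", "moreover", "additionally"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def repeated_ngrams(text: str, n: int = 3) -> dict[str, int]:
    """Return every n-word phrase that appears more than once, with its count."""
    words = tokenize(text)
    grams = Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return {gram: count for gram, count in grams.items() if count > 1}


def transition_rate(text: str) -> float:
    """Fraction of words that are transitional words (0.0 for empty text)."""
    words = tokenize(text)
    if not words:
        return 0.0
    return sum(word in TRANSITIONS for word in words) / len(words)


sample = ("Firstly, the tool is fast. Secondly, the tool is fast and cheap. "
          "Furthermore, the tool is easy.")
print(repeated_ngrams(sample))  # {'the tool is': 3, 'tool is fast': 2}
print(round(transition_rate(sample), 3))  # 0.176 (3 of 17 words)
```

In practice, these numbers mean more when compared against human-written samples on the same topic than against a fixed threshold.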

Keep in mind that no tool is perfect at this. Each tool might check for slightly different variables, meaning one AI checker could tell you a piece is completely AI-written while another says it’s only partly AI-generated. 

So, it’s best to use multiple tools to verify whether or not a piece of text was generated by AI. 

The Best AI Content and Plagiarism Detection Tools for Content Marketers

The other way to spot AI language and plagiarism is through specialized tools. 

While no tool will be 100% accurate, they can be quite helpful for checking at speed and getting a baseline idea.

Copyleaks

https://copyleaks.com/

According to a recent study by researchers from Cornell University, Copyleaks achieved 99.1% accuracy and full model coverage, including GPT-4 and Bard.

Founded in 2015, Copyleaks has millions of users, including top educational institutions and enterprise businesses. 

The basic (free) version of their AI detector is available directly through their website, with no sign-up needed. However, extended features, such as more supported languages, prioritized detection, and faster processing, are available via subscription. 

Its plagiarism checker is a standalone product. It can scan regular text files, URLs, and source code for AI writing and plagiarism, compare texts, code, and sites against one another, and extract text from images.

Pricing starts at $10.99 per month for scanning 100 pages (25,000 words).

Originality.ai

https://originality.ai

Originality.ai also bills itself as “the most accurate Chat GPT, Bard, Paraphrasing, and GPT-4 AI checker,” claiming 99% AI content detection accuracy. The tool is specifically designed for content and SEO professionals who need to ensure that the content they publish is original and plagiarism-free. 

The tool doesn’t have a free or ad-supported version because it uses natural language processing techniques that require a lot of computing power. Unlike most AI content detection tools, Originality.ai can also scan an entire site rather than a single document, and it has no character limit.

One of this tool’s most striking features is that it detects not only plagiarism and AI writing but also paraphrased plagiarism, meaning it can tell whether content has been paraphrased.

The base subscription for Originality.ai starts at $14.95 per month and provides access to all of the tool’s features, including future ones. Limited access is available for a one-time payment of $30.

What Is AI Content Detection?

AI content detection is a process that combines machine learning and natural language processing techniques to determine whether a text was written by a human or generated by AI.

Tools that use this process are called “AI content detectors” or “AI detectors” and are trained on large datasets of human- and machine-written content to identify patterns in each type of writing.
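To show what training on labeled human- and machine-written text looks like in practice, the sketch below trains a multinomial naive Bayes classifier, a classic and very simple text classifier used here purely as a stand-in, on a handful of made-up sentences labeled “ai” or “human.” Real detectors are trained on far larger datasets and generally use more sophisticated models; the class name, labels, and training data are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesDetector:
    """Toy multinomial naive Bayes text classifier with add-one smoothing."""

    def __init__(self) -> None:
        self.word_counts: dict[str, Counter] = {}  # label -> word frequencies
        self.doc_counts: Counter = Counter()       # label -> number of examples
        self.vocab: set[str] = set()

    def train(self, examples: list[tuple[str, str]]) -> None:
        """Learn word frequencies from (text, label) pairs."""
        for text, label in examples:
            words = tokenize(text)
            self.word_counts.setdefault(label, Counter()).update(words)
            self.doc_counts[label] += 1
            self.vocab.update(words)

    def log_scores(self, text: str) -> dict[str, float]:
        """Log-probability score of the text under each label."""
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)  # label prior
            for word in tokenize(text):
                # Add-one smoothing so unseen words don't zero out the score.
                score += math.log((counts[word] + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return scores

    def predict(self, text: str) -> str:
        """Return the label with the highest score."""
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Made-up training data, for illustration only.
training_data = [
    ("Furthermore, it is important to note that the results are significant.", "ai"),
    ("In conclusion, it is important to consider all relevant factors.", "ai"),
    ("honestly i botched the first draft lol, my editor hated it", "human"),
    ("We missed the bus, so grandma drove us, singing off-key the whole way.", "human"),
]

detector = NaiveBayesDetector()
detector.train(training_data)
print(detector.predict("It is important to note the relevant factors."))  # ai
print(detector.predict("my grandma botched the draft"))  # human
```

With only four examples, the classifier simply learns which words each group tends to use, which is why real tools need very large and varied training sets.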

What Is Plagiarism Detection? And Is AI Content Considered Plagiarism?

Unlike AI detection, which is still relatively new and evolving, plagiarism detection has been around for a while.

Created in response to growing cases of plagiarism in the academic world, plagiarism-checking tools compare text against large databases of existing web content, as well as research papers, magazines, journals, and publications, to see if there are any matches between them.

Rather than looking for predictable patterns in words or sentence structure, as AI detection tools do, plagiarism checkers look for exact or sometimes imprecise matches in keywords, phrases, and entire sentences.

Most plagiarism checkers work in a similar way, but their results can vary depending on the databases they have access to.
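A common textbook way to find exact and near-exact matches is “shingling”: break each text into overlapping word sequences and measure how many sequences two texts share (here with Jaccard similarity). The Python sketch below is a simplified illustration; the three-word shingle size, the function names, and the tiny corpus are assumptions, and commercial plagiarism checkers search far larger indexes with more elaborate matching.

```python
import re


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping n-word sequences ("shingles") in a text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by all distinct shingles (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(candidate: str, corpus: dict[str, str], n: int = 3) -> tuple[str, float]:
    """Compare a candidate text with every document in a non-empty corpus and
    return the most similar document's name and its similarity score."""
    cand = shingles(candidate, n)
    scores = {name: jaccard(cand, shingles(doc, n)) for name, doc in corpus.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical "database" of existing content.
corpus = {
    "doc_a": "The quick brown fox jumps over the lazy dog",
    "doc_b": "Plagiarism checkers compare text against large databases of existing content",
}
candidate = "Our checkers compare text against large databases of content"
print(best_match(candidate, corpus))  # ('doc_b', 0.5)
```

The candidate shares five of ten distinct three-word sequences with doc_b, so it scores 0.5 even though a few words were changed. This is how shingle overlap can catch imprecise matches as well as exact ones.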

Since the release of ChatGPT, plagiarism checking has become even more relevant. 

While AI-generated content may not technically count as plagiarism, since it usually doesn’t copy phrases or chunks of text word for word, it can paraphrase the content it was trained on. In such cases, a plagiarism checker may still flag the text as plagiarism.

How AI Content Detectors Work

AI content detectors work by analyzing two main characteristics of a text: perplexity and burstiness.

Perplexity measures how predictable a piece of text is to a language model: the more easily the model can guess each next word, the lower the perplexity. Burstiness refers to how much the text varies from sentence to sentence, for example in sentence length and structure. 

The lower the perplexity, the more predictable the text, and the higher the chance it was generated by an AI language model. A lower level of burstiness also means a higher likelihood of AI. 
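Perplexity has a standard definition: given the probability a language model assigns to each token (roughly, each word) given the tokens before it, perplexity is the exponential of the average negative log-probability, exp(-(1/N) × Σ log p(token_i)). The sketch below computes it from a list of token probabilities. Getting those probabilities requires running an actual language model, which is out of scope here, so the numbers are made up for illustration.

```python
import math


def perplexity(token_probs: list[float]) -> float:
    """Perplexity = exp(-(1/N) * sum(log p_i)), where p_i is the probability a
    language model assigned to token i given the tokens before it."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)


# Made-up probabilities standing in for a real language model's output.
predictable = [0.9, 0.8, 0.85, 0.9]  # model was rarely surprised
surprising = [0.2, 0.05, 0.3, 0.1]   # model was often surprised

print(perplexity(predictable))  # low: text looks predictable
print(perplexity(surprising))   # high: text looks less predictable
```

For intuition, a model that assigns probability 0.25 to every token has a perplexity of 4, as if it were choosing among four equally likely options at each step. Detectors differ in which model and thresholds they use, so the same text can score differently across tools.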

In other words, you can detect AI writing in a piece of text by analyzing how predictable and uniform the sentences are.
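There is no single standard formula for burstiness. One simple proxy for how uniform the sentences are, assumed here for illustration, is the coefficient of variation of sentence lengths: the standard deviation of words per sentence divided by the mean. The naive sentence splitter and the function names are hypothetical simplifications.

```python
import re
import statistics


def sentence_lengths(text: str) -> list[int]:
    """Naively split on ., ! and ? and count the words in each sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(text: str) -> float:
    """Coefficient of variation of sentence length (population std dev / mean).
    0.0 means every sentence has the same length; higher means more variation."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # not enough sentences to measure variation
    return statistics.pstdev(lengths) / statistics.mean(lengths)


uniform = "The tool is fast. The tool is cheap. The tool is good."
varied = ("Wow. I honestly did not expect the tool to catch every single "
          "copied paragraph. It did.")

print(sentence_lengths(uniform), burstiness(uniform))  # [4, 4, 4] 0.0
print(sentence_lengths(varied), burstiness(varied))
```

The uniform sample scores 0.0 because every sentence has four words, while the varied sample (1, 13, and 2 words per sentence) scores much higher. Short texts give noisy numbers, which is one more reason to treat any single score as a hint rather than a verdict.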

Check AI Content with Confidence

As more and more companies add AI writing tools to their marketing toolkits, proofreading and checking for plagiarism and AI language in content created with these tools is becoming the norm.

While AI-generated content is getting better and more human-like, it still needs our attention to make sure the final draft is original, trustworthy, and has that personal touch that makes a story stand out.

Content Lead at Semrush. Here to help you solve your everyday marketing challenges with Semrush’s tools and apps.