Lexical Diversity: Improving Writing Through Technology
Published by Ryan Fauver on Saturday, December 13, 2014 in Content Marketing.
How the linguistic metric of lexical diversity can help improve writing.


While interning in the engineering department at Scripted.com, I had plenty of questions: "How does that work?" "What's the best way to do this?" "Why do we do it that way?" At some point, a different question entered my head: "Is there any way we can automatically and objectively determine the quality of a given piece of writing?"

Currently, Scripted relies on our team of freelance editors and our own in-house copy editors to pore over all of the writing that passes through our system, and they do a fantastic job. The problem, though, is that two humans rarely agree on something as subjective as writing quality, which researchers describe as low inter-rater reliability. My time spent studying psychology made me certain that this kind of question had been considered before, and, after some preliminary research, I discovered that it had!

Measuring Quality



As it turns out, researchers have been working to develop more objective measurements of writing for more than half a century! Of course, the quality of writing cannot be encapsulated in one number. Instead, researchers have constructed measures that attempt to capture specific elements of writing that are known to correlate with humans' ratings of quality. Some highly correlated measures include:

* Syntactic complexity (more complex writing is rated higher)
* Word frequency (documents with high use of uncommon words are rated higher)
* Text length (in general, longer documents are rated higher)
* Lexical diversity (writing with more varied and broad vocabulary is rated higher)

Software packages that compute these measures and more already exist (Coh-Metrix, for example), but many are restricted to academic and research use, or are too large to suit our needs. With that in mind, I decided to try implementing some of these measures myself in Ruby. After some experimentation, I settled on lexical diversity because it does not rely on heavy natural language processing, and there are a number of established methods of measuring it.

Lexical Diversity Basics



As I mentioned before, a lexical diversity score is a measurement of the breadth and variety of the vocabulary used in a piece of writing. The most basic lexical diversity measurement is called type-token ratio, or TTR. Take this sentence:

The dog jumped over the other dog.

This sentence contains 5 "types" ("the," "dog," "jumped," "over," "other"), and 7 "tokens" (or total words). So the TTR for this sentence is 5/7 or 0.714.
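The arithmetic above can be reproduced with a minimal Ruby sketch. The `tokenize` helper and its regular expression are assumptions made for illustration (a deliberately naive tokenizer), not the code from the project's repository.

```ruby
# Naive tokenizer (an assumption): lowercase the text and keep runs of
# letters and apostrophes as words.
def tokenize(text)
  text.downcase.scan(/[a-z']+/)
end

# Type-token ratio: number of unique words divided by total words.
def type_token_ratio(text)
  tokens = tokenize(text)
  return 0.0 if tokens.empty?

  tokens.uniq.size.to_f / tokens.size
end

puts type_token_ratio("The dog jumped over the other dog.").round(3) # prints 0.714
```

Lowercasing matters here: without it, "The" and "the" would count as two different types.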

Unfortunately, TTR has a major problem: it is highly sensitive to text length. The longer the document, the lower the chance that a new token will also be a new type, causing the TTR to drop as more words are added. Fortunately, several other lexical diversity measures have been created specifically to combat this issue.

MTLD, HD-D, and Yule's I



In the end, I implemented three separate lexical diversity measures: the Measure of Textual Lexical Diversity (MTLD), the Hypergeometric Distribution D (HD-D), and Yule's I.

* MTLD, as described by Philip McCarthy and Scott Jarvis (2010), uses the fact that TTR falls as more words are added and instead computes how many words it takes before the TTR falls below a given threshold.
* HD-D is McCarthy and Jarvis' (2007) improvement of vocd-D, another lexical diversity measure. HD-D uses probability to evaluate the contribution of each word in the text to the overall lexical diversity.
* Yule's I is the inverse of Yule's Characteristic K, which was first described by statistical pioneer G. U. Yule in his 1944 book The Statistical Study of Literary Vocabulary. Yule's I is a formula based on TTR, but specially designed to avoid the issue of text length.
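To make the three descriptions more concrete, here is a hedged Ruby sketch of each measure, written from the commonly published formulas rather than from the repository's code. Several details are assumptions: the 0.72 threshold and the forward/backward averaging for MTLD, the sample size of 42 for HD-D, the use of the total token count N in Yule's I (I = N² / (M2 − N), where M2 is the sum of squared word frequencies), and all function names. Each function expects an array of already-tokenized, lowercased words.

```ruby
require 'set'

# Count how often each word occurs.
def frequencies(tokens)
  tokens.each_with_object(Hash.new(0)) { |token, counts| counts[token] += 1 }
end

# One directional MTLD pass: count how many times the running TTR
# drops below the threshold, starting over after each drop.
def mtld_pass(tokens, threshold)
  factors = 0.0
  types = Set.new
  count = 0
  tokens.each do |token|
    types << token
    count += 1
    next unless types.size.to_f / count < threshold

    factors += 1
    types.clear
    count = 0
  end
  # Credit any leftover words as a partial factor.
  factors += (1 - types.size.to_f / count) / (1 - threshold) if count > 0
  factors.zero? ? nil : tokens.size / factors
end

# MTLD: mean of a forward and a backward pass (nil if the TTR never drops).
def mtld(tokens, threshold = 0.72)
  passes = [mtld_pass(tokens, threshold), mtld_pass(tokens.reverse, threshold)]
  return nil if passes.any?(&:nil?)

  passes.sum / 2.0
end

# Binomial coefficient C(n, k) using exact integer arithmetic.
def choose(n, k)
  return 0 if k > n || k < 0

  (1..k).reduce(1) { |acc, i| acc * (n - k + i) / i }
end

# HD-D: for each word type, the probability that it appears at least once
# in a random sample of sample_size tokens, summed and divided by sample_size.
def hdd(tokens, sample_size = 42)
  n = tokens.size
  return nil if n < sample_size

  all_samples = choose(n, sample_size)
  total = frequencies(tokens).values.sum do |freq|
    absent = Rational(choose(n - freq, sample_size), all_samples)
    (1 - absent) / sample_size
  end
  total.to_f
end

# Yule's I = N^2 / (M2 - N); nil when every word is unique (M2 == N).
def yules_i(tokens)
  n = tokens.size
  m2 = frequencies(tokens).values.sum { |freq| freq**2 }
  m2 == n ? nil : n**2 / (m2 - n).to_f
end
```

For the example sentence from earlier, N = 7 and M2 = 4 + 4 + 1 + 1 + 1 = 11, so `yules_i` returns 49 / 4 = 12.25. The sentence is far too short for `hdd` (which needs at least 42 tokens) and for a meaningful MTLD value, which is why the functions return `nil` in degenerate cases instead of a misleading number.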

It's conventional for research papers to go into extreme detail about their methodology, which was great for me, because I could easily follow along and translate the steps into Ruby. To view the actual code and a more in-depth explanation of the function of each measure, take a look at the GitHub repo.

Early Results



Once the measures were fully implemented, I ran them on several thousand existing documents in the Scripted database to see if they were working properly. The results were promising. The scores from all three measures form classic bell curves, and each measure correlates highly with the others, which tells me that:

* The different measures are actually measuring something.
* They are all measuring the same thing.
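One way to check how closely two measures track each other is a Pearson correlation coefficient. The article does not say which statistic was used, so treat the sketch below as an assumption; the sample scores are invented for illustration and are not Scripted data.

```ruby
# Pearson correlation between two equal-length lists of scores.
def pearson(xs, ys)
  unless xs.size == ys.size && xs.size > 1
    raise ArgumentError, "need two equal-length samples with at least 2 values"
  end

  mean_x = xs.sum.to_f / xs.size
  mean_y = ys.sum.to_f / ys.size
  covariance = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  spread_x = xs.sum { |x| (x - mean_x)**2 }
  spread_y = ys.sum { |y| (y - mean_y)**2 }
  covariance / Math.sqrt(spread_x * spread_y)
end

# Made-up scores for five imaginary documents (not Scripted data).
mtld_scores = [62.0, 75.5, 81.0, 90.2, 104.3]
hdd_scores  = [0.71, 0.78, 0.80, 0.84, 0.88]
puts pearson(mtld_scores, hdd_scores).round(2)
```

A value near 1 means the two measures rank documents in almost the same order; a value near 0 means they are unrelated.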

Next, I wanted to see if these lexical diversity scores are in any way related to the quality of the writing. Every document in the Scripted system that has gone through the editing process already has a quality rating. So the natural next step was to run a regression analysis between those quality ratings and each of the lexical diversity scores. Despite my hopes, the results were not significant.

Even though I had lost confidence in using lexical diversity as a precise scoring tool, I found that the extreme ends of the scoring range are predictive, especially the low end. When I focused on documents with remarkably low lexical diversity scores, I finally found a pattern: nearly all of them were what I would consider below our standards for quality. Of course, there are some false positives, and there are certain to be some false negatives, but this is a great sign. We can use this knowledge to detect writing that does not meet our standards.

Lex-D



I soon got to work on implementing this "lexical filter." Instead of slapping this functionality onto Scripted directly, I decided to create a standalone service that computes a single lexical diversity score. Then, whenever a score is needed by the Scripted application, it can easily send a document to the service and get a score back. I call the service "Lex-D" (for lexical diversity).

The score that Lex-D computes is a combination of MTLD, HD-D, and Yule's I. Each score is calculated individually, scaled based on the means and standard deviations from the Scripted database, and finally averaged together to give a single score.
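The scale-then-average step can be sketched as follows, assuming "scaled" means a standard z-score (subtract the mean, divide by the standard deviation). The reference statistics below are placeholders: the real means and standard deviations come from the Scripted database and are not given here, and the names `REFERENCE_STATS`, `z_score`, and `combined_score` are hypothetical.

```ruby
# Placeholder reference statistics (hypothetical values, not Scripted's).
REFERENCE_STATS = {
  mtld:   { mean: 80.0,  sd: 20.0 },
  hdd:    { mean: 0.80,  sd: 0.05 },
  yule_i: { mean: 100.0, sd: 30.0 }
}.freeze

# Express a raw score as standard deviations above or below the mean.
def z_score(value, mean, sd)
  (value - mean) / sd
end

# Scale each measure against its reference statistics, then average.
def combined_score(raw_scores, stats = REFERENCE_STATS)
  scaled = stats.map do |name, stat|
    z_score(raw_scores.fetch(name), stat[:mean], stat[:sd])
  end
  scaled.sum / scaled.size
end

puts combined_score({ mtld: 80.0, hdd: 0.80, yule_i: 100.0 }) # prints 0.0
```

Scaling first puts the three measures on a common footing, so a measure with a large numeric range does not dominate the average.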

Lex-D is built in Ruby using the Sinatra framework. Sinatra is incredibly small and lightweight, and it suited my needs perfectly. I was able to get a first iteration up and running quickly. The code and an explanation of how to interface with Lex-D are available on GitHub.

Lexical Diversity and Scripted



At this point, Lex-D is fully functional (try it out!), but it is not hooked into Scripted just yet. Soon, every applicable document that passes through Scripted will be scored for lexical diversity, and if the score is below a set threshold, the document will be flagged and sent to a human for review.
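The flagging rule itself is simple. A minimal sketch, assuming scores arrive as a hash of document IDs to combined scores; the cutoff value and the names `REVIEW_THRESHOLD` and `flag_for_review` are hypothetical, since the actual threshold is not stated.

```ruby
# Hypothetical cutoff on the combined score; the real one is not stated.
REVIEW_THRESHOLD = -1.5

# Split scored documents into those flagged for human review and the rest.
def flag_for_review(scores_by_id, threshold = REVIEW_THRESHOLD)
  flagged, passed = scores_by_id.partition { |_id, score| score < threshold }
  { flagged: flagged.map(&:first), passed: passed.map(&:first) }
end

result = flag_for_review({ "doc-1" => -2.0, "doc-2" => 0.3 })
puts result[:flagged].inspect # prints ["doc-1"]
```

Only the low end is checked, which matches the observation that unusually low scores were the predictive ones.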

I have really enjoyed working at Scripted this summer, and I've learned so much. Hopefully my work with lexical diversity will help Scripted get closer to achieving their mission of improving writing on the Internet.

What do you think? Share your thoughts with us below.

More on Writing & Engineering:



How to Teach a Computer to Read
