
A Text is a Text is a Text

Published by Scripted Writers on Friday, July 20, 2012 in Featured, Staff.

I have a bachelor's degree in Literature, and in college I spent most of my time thinking, writing, and talking about texts: what a text says, what a text means, and what a text is. When I decided to switch from studying literature to Computer Science a year and a half ago, I never would have thought I would be fortunate enough to still be thinking about those very topics.

Here at Scripted, I work on automatically grouping similar texts, or documents, together. In Machine Learning terms, that means document clustering and classification. There are a variety of ways to accomplish this, and the 'best' algorithm isn't always a clear-cut choice, because it depends largely on why you're trying to group the documents in the first place.

One preprocessing requirement most of these algorithms share is the need to reduce a document to a 'bag of words'. That is, these algorithms ignore word order and look only at how many times each word appears in a given text. From the bag of words we can also derive a representation that is sometimes more telling, such as tf*idf vectors, which weight each term's count by how rare that term is across the whole collection of documents. To get an idea of what this looks like, here are the top terms of this very blog post in bag of words form (term frequency, or tf) and in tf*idf form (term frequency multiplied by inverse document frequency), computed against the entire Scripted corpus:

Term frequency (tf)        tf*idf
text           15          grouping      0.7349142790721426
document       10          vector        0.4409485674432856
literature      5          literature    0.2755928546520534
computer        5          corpus        0.2204742837216428
algorithm       5          classifying   0.1837285697680356
grouping        4          startling     0.1377964273260267
vector          4          arc           0.1102371418608214
time            4          grouped       0.0918642848840178
term            4          extracting    0.0787408156148724
clustering      6          algorithm     0.06263473969364851
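To make the two columns concrete, here is a minimal sketch of how such scores can be computed, assuming simple lowercase word tokenization and the common "raw count times log(N / df)" form of tf*idf. The post doesn't say which tf*idf variant or normalization Scripted used (the scores in the table look length-normalized), so this sketch is illustrative and won't reproduce those exact numbers; the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(text):
    """Bag of words: map each term to its raw count, ignoring word order."""
    return Counter(tokenize(text))


def tf_idf(doc, corpus):
    """Score each term of `doc` as tf * log(N / df).

    `corpus` is the list of documents (strings) used for document
    frequencies; it should contain `doc`, so every df is at least 1.
    """
    n_docs = len(corpus)
    corpus_terms = [set(tokenize(d)) for d in corpus]
    scores = {}
    for term, tf in term_frequencies(doc).items():
        df = sum(1 for terms in corpus_terms if term in terms)
        scores[term] = tf * math.log(n_docs / df)
    return scores


def l2_normalize(scores):
    """Scale a score dict to unit length (one common, assumed normalization)."""
    norm = math.sqrt(sum(s * s for s in scores.values()))
    return {t: s / norm for t, s in scores.items()} if norm else dict(scores)
```

For example, `term_frequencies("A text is a text is a text")` counts "text" and "a" three times each. In `tf_idf("the cat sat", ["the cat sat", "the dog sat", "the cat ran"])`, "the" appears in every document, so its idf is log(1) = 0 and it drops out entirely. The same effect presumably explains why a word like "grouping", rare across the Scripted corpus, outranks "text" in the tf*idf column.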


Representations like these can be very effective for grouping documents together, and when clustering, they are useful for discovering the central topics that the documents in a corpus are about. For example, here are a few of the topics businesses have hired our writers to write about:

heart, blood, artery, pressure, high, vessel
beer, ale, brewing, flavor, hop, yeast
data, cloud, quality, information, storage, system
beach, resort, luxury, island, hotel, vacation
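The post doesn't name the clustering algorithm behind these topics. As one illustration, here is a minimal sketch of spherical k-means (k-means using cosine similarity on length-normalized tf*idf vectors), where each topic is read off as the highest-weighted terms of a cluster's centroid. The farthest-first seeding, the iteration cap, and helper names such as `kmeans` and `top_terms` are assumptions for illustration, not Scripted's implementation.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """One unit-length tf*idf vector (dict term -> weight) per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    df = Counter(term for c in counts for term in c)
    n = len(docs)
    vectors = []
    for c in counts:
        v = {t: tf * math.log(n / df[t]) for t, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in v.values()))
        vectors.append({t: w / norm for t, w in v.items()} if norm else v)
    return vectors


def cosine(a, b):
    """Dot product of sparse vectors; equals cosine similarity for unit vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def centroid(vectors):
    """Sum the member vectors, then rescale the result to unit length."""
    total = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values()))
    return {t: w / norm for t, w in total.items()} if norm else dict(total)


def kmeans(vectors, k, iterations=20):
    # Farthest-first seeding (assumed): each new seed is the document
    # least similar to every seed chosen so far.
    centers = [vectors[0]]
    while len(centers) < k:
        idx = min(range(len(vectors)),
                  key=lambda i: max(cosine(vectors[i], c) for c in centers))
        centers.append(vectors[idx])
    labels = None
    for _ in range(iterations):
        # Assign each document to its most similar center.
        new_labels = [max(range(k), key=lambda j: cosine(v, centers[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = centroid(members)
    return labels, centers


def top_terms(center, n=6):
    """The n highest-weighted terms of a centroid: a readable 'topic'."""
    ranked = sorted(center.items(), key=lambda tw: (-tw[1], tw[0]))
    return [t for t, _ in ranked[:n]]
```

On a toy corpus such as `["heart blood pressure artery", "high blood pressure heart vessel", "beer ale brewing hop yeast", "brewing beer flavor yeast hop"]` with `k=2`, the two health documents and the two brewing documents land in separate clusters, and `top_terms` on each centroid yields word lists much like the topics above. On a real corpus the choice of k and the seeding strategy matter a great deal.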


This type of grouping gives us a finger on the pulse of our writers, so that we can see in more detail who writes about what. But one thing about the bag of words model bothers me, perhaps irrationally so: it's a simplification that changes the very form of a document. Once converted, a text is no longer a text but a vector. The simplification is necessary for this type of grouping, but both the literature student and the computer scientist in me are saddened every time I convert a document to a vector. The literature student is saddened because I know I am losing important features of the text like tone, style, narrative arc, and meaning. The computer scientist is saddened for exactly the same reason: these features are ultimately information. Indeed, they are almost the entire reason we care about these documents at all.
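A tiny illustration of that loss, assuming simple lowercase word tokenization: two sentences with opposite meanings reduce to exactly the same bag of words, so any algorithm that sees only the counts cannot tell them apart.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase word tokens; word order is discarded."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


a = bag_of_words("The dog bit the man.")
b = bag_of_words("The man bit the dog.")
print(a == b)  # True: the two vectors are identical
```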

This simplification is necessary, at least for now, because computers are still ineffective at extracting and handling these features. That is to say: computers are not good readers (and are even worse writers). Advances are being made all the time, but for now a text has to be broken down into a bag of words before a clustering or classifying algorithm can process it. And I can't deny that there is something pleasantly surprising about how useful the simplification of a text into a vector can be. The literature student in me is inclined to resist anything but a holistic approach to analysing a text, but it's always startling to see how the hyper-specialized nature of an algorithm can produce fascinating results.