Boris Vassilev, data scientist at Scripted, sat down for a Scripted podcast to explore the role of the data scientist and how data science helps Scripted's writers and customers, and to reveal the content analytics projects he's been working on lately.
Vassilev worked in finance after finishing college -- with degrees in creative writing and physics -- but after a while he felt it wasn't working for him. He decided to come out to California and strike it rich with Scripted. He was hired as a data scientist and has been with Scripted for three years.
During that time, he has helped build Scripted's ranking system for writers, helped present writers with job opportunities, and built up the Scripted Analytics project for customer insights into their blogs, which is his latest project.
What Is Data Science and Big Data?
Since Vassilev is a data scientist, it seems only natural that he should fill us in on what data science actually means. Data science is the intersection of programming, statistics and domain expertise. You need domain expertise to design good experiments, programming to handle large amounts of data, and statistics to draw sound, workable conclusions.
Big data refers to data that is beyond the norm -- too many factors, and volumes that are simply too big. Vassilev said that technical problems arise once the data volume gets too large, and a whole host of technologies are being developed to address them.
Now, let's dive into what Vassilev is doing to make life easier for Scripted's clients.
About the Scripted Analytics Project
Scripted Analytics began as an internal system, but it has grown into a customer-facing product. The Scripted team could see from Google Analytics which posts were getting the most hits, but they had difficulty figuring out on a day-to-day or week-to-week basis where the majority of their readership was originating. Enter Scripted Analytics.
The tool reviews your blog from the last year and separates its traffic into three bands, based on whether the people visiting your website today arrived through a post published in the last 3 months, 3 to 12 months ago, or more than 12 months ago.
Vassilev says you want the majority of readership to be introduced to your site through the discovery of engaging evergreen content. You also want a third of your readership to originate through a new and exciting post, which will keep people interested, coming back and hopefully converting.
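To make the three bands concrete, here is a minimal sketch in Python of how today's visits might be grouped by the age of the post that brought each reader in. It is not Scripted's implementation: the band names ("new", "recent", "evergreen"), the function names, and the choice to treat 3 months as 90 days and 12 months as 365 days are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical band names; the article only describes the three age ranges.
BANDS = ("new", "recent", "evergreen")


def age_band(published: date, today: date) -> str:
    """Classify a post by age, approximating 3 months as 90 days
    and 12 months as 365 days (an assumption, not Scripted's rule)."""
    age_days = (today - published).days
    if age_days <= 90:
        return "new"
    if age_days <= 365:
        return "recent"
    return "evergreen"


def traffic_share_by_band(visits, today: date) -> dict:
    """visits is an iterable of (publish_date, visit_count) pairs for today.
    Returns the fraction of today's visits that each band accounts for."""
    totals = {band: 0 for band in BANDS}
    for published, count in visits:
        totals[age_band(published, today)] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {band: 0.0 for band in BANDS}
    return {band: count / grand_total for band, count in totals.items()}


if __name__ == "__main__":
    today = date(2024, 6, 1)
    visits = [
        (date(2024, 5, 1), 30),   # about one month old
        (date(2023, 12, 1), 20),  # about six months old
        (date(2022, 1, 1), 50),   # more than a year old
    ]
    print(traffic_share_by_band(visits, today))
```

Shares like these can be compared against the rule of thumb above: most readers arriving through older evergreen posts, and roughly a third through something new.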
The analytics product automates this entire system and can provide deep insights into which blog posts are performing best.
Vassilev personally finds the stream graph to be the coolest feature. It is the product's primary graph, with one band for each of the three age groups. It gives you a clear view of how your blog is performing, shows when the biggest traffic spikes occur, and tells you where to focus your efforts. From there, you can concentrate on quality evergreen content or on new content that engages and excites users.
Vassilev said the biggest challenge was making the Scripted Analytics product robust enough to handle the many types of websites out there. He noted that people tend to have unique and imaginative ways of structuring their websites and blogs, which is why Scripted's product needed to be smart enough to tell a website's blog apart from its "about us" page.
Vassilev noted that Scripted's content analytics product will soon launch a new feature. The tool will crawl through your blog, collect and condense the text, combine it with user statistics, and reveal your most popular blog posts.
Beyond that, the tool will reach into the Scripted system to tell you which writers are best suited to the topics performing well on your blog. It will also help you identify existing Scripted topic pitches that match your blog's needs.
This new feature will help businesses that aren't sure what they're good at or where to go by providing them with content analytics on a platter, all at the click of a button. You will also have strong data to ensure this content will perform well on your blog.
Another feature in the works will measure the rate of decay for your content. Decay sets in when a blog is left unattended: even the best blog posts draw fewer readers over time.
Scripted can measure how much content you need to ensure your blog doesn't decay and that you continue to build readership. From there, you can understand exactly how much content you need to order, based on the personal needs of your blog.
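The article does not say how Scripted measures decay, so the following is only an illustrative sketch in Python under one common assumption: that a post's weekly views fall off roughly exponentially. Under that assumption, a straight-line fit to the logarithm of the views gives a decay rate, and from it a half-life. The function names and the sample numbers are hypothetical.

```python
import math


def fit_decay_rate(weekly_views):
    """Estimate k in views(t) ~ v0 * exp(-k * t), with t in weeks,
    using an ordinary least-squares fit of log(views) against t.
    Weeks with zero views are skipped because log(0) is undefined."""
    points = [(week, math.log(v)) for week, v in enumerate(weekly_views) if v > 0]
    if len(points) < 2:
        raise ValueError("need at least two weeks with nonzero views")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in points)
    var = sum((t - mean_t) ** 2 for t, _ in points)
    return -cov / var  # positive k means readership is shrinking


def half_life_weeks(k):
    """Weeks until views halve; infinite if readership is not decaying."""
    if k <= 0:
        return math.inf
    return math.log(2) / k


if __name__ == "__main__":
    views = [800, 400, 200, 100]  # made-up weekly views for one post
    k = fit_decay_rate(views)
    print(f"decay rate per week: {k:.3f}")
    print(f"half-life in weeks: {half_life_weeks(k):.2f}")
```

A short half-life would suggest that new posts are needed more often to keep overall readership from shrinking; real traffic is noisier than this toy example, so any such estimate should be treated as rough.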
How does Scripted's star data scientist spend his mornings? Half the time Vassilev likes to sleep in, but the other half he likes to burn off some nervous energy by climbing with friends. What is Vassilev nervous about? His never-ending quest to be the best data scientist at Scripted (even if he is the only one).
In the evenings Vassilev enjoys movies and making his boss and girlfriend nervous by driving around on his motorcycle. If he could choose anywhere to go, he would probably take his rod to someplace like Mongolia for that endless steppe feeling and semi-idea of adventure. Apparently, behind every data scientist is a secret Genghis Khan waiting to break out.