Introducing Traackr One: The Next Step in Influencer Communication

April 30th, 2012 by derek

Today we are very excited to open yet another chapter in Traackr’s history with the launch of Traackr One – the web’s first marketplace for highly relevant lists of top, topical influencers. For us at Traackr, this is a major step toward our goal of redefining the web around the people that matter most…and one that will surely push the “online influencer” market to another level.

WHAT IS TRAACKR ONE?

Traackr One is the beginning of a true marketplace for lists of top influencers across all imaginable topics. You can think of Traackr One as an ‘iTunes’ for influencer lists, and it’s our goal to make it the go-to destination for top online influencers in a wide variety of topics. Looking for the top 50 influencers in hybrid car technology? We’ve got a list for that. How about the top US Campaign Finance Reform influencers? There’s a list for that as well. Baseball Sabermetrics influencers? Yup, got that too.

As a user, you can easily subscribe to any A-List at Traackr One. When you do, you not only have access to the list of Top 50 influencers in that topic, their full online footprints, & relevant content…but also to other Traackr A-List platform features such as our deeper Analytics suite as well as the ability to create custom Influencer Monitors (so you can track mentions of your brand, your competitors or any other important topic by the people on your list).

Every A-List in the Traackr One marketplace is created and powered by Traackr’s unique PeopleSearch technology and Relevance engine. This means influencer lists found at Traackr One are far superior to anything else on the market.

WHY TRAACKR ONE?

Traackr One represents a big step for us at Traackr. For the past 3 years, we have been working very hard building the most robust online influencer platform for bigger PR and social marketing pros. And we have been very successful at this (if I do say so myself). We have built a robust solution that is being used by some of the world’s biggest agencies and brands.

But over this time, we have had an enormous amount of interest from agencies and brands who understand the value of influential people to their work but lack the budget flexibility for our existing product. And this is exactly why we built Traackr One. In fact, it’s what I like most about Traackr One – it was built purely out of customer demand. We felt strongly that it was time to give a much wider audience access to Traackr’s unique technology.

HOW DOES TRAACKR ONE WORK?

Traackr One is pretty self-explanatory. It’s a self-serve marketplace. Simply head over to www.traackrone.com and start searching for the lists you need. All A-Lists are organized by category, and you can search the categories from the home page. You can also use the open search box to find topics that interest you. Once you find a list you want, you can subscribe very easily with your credit card.

When subscribing to a list, you can do so on a monthly basis, or you can get access for 12 months at a heavily discounted rate. Once you subscribe, you will have access to all the features described earlier.

WHAT ABOUT TRAACKR’S ENTERPRISE BUSINESS?

Have no fear, the release of Traackr One does not affect our dedication to our enterprise product in any way.  Traackr enterprise is still a must-have tool for big agencies and brands who need a more robust solution to truly capture their ideal influencers – and have the budgets to support it. We have big features in the works for our enterprise platform, and will continue to bring new and awesome capabilities to our existing and future users.

Thank you for continuing to support Traackr’s growth and expansion.  As always, please let us know if you have any questions or comments (you can reach us at support [at] traackr [dot] com).

DS

Value for you and your company: Traackr Certification

April 10th, 2012 by courtney

With all the buzz around influencer strategy, the most important question that seems to be missing from the conversation is how? How do I really take influencer strategy from an idea and turn it into successful engagement; how can I apply it to my daily tasks?

Come and get the answers during the Traackr Certification Course in San Francisco on April 25.

The certification course provides an entire day of learning ways to maximize your use of the tool AND become an official certified Traackr expert, all while you’re surrounded by fellow forward-thinking communications professionals. Additionally, co-host Shonali Burke, PR extraordinaire and MSL Washington’s VP of Digital, is very excited to join in on the conversations. Elevating the Traackr certification course beyond just using the tool, Shonali offers well-rounded experience in discovering, executing and measuring influencer campaigns.

Shonali is not only a PR professional, but also an influencer in her own right. One of the top influencers on our PR 2.0 A-list, she frequently speaks at conferences and workshops and shares her thoughts and lessons on life and PR on her blog, Waxing UnLyrical. Best of all, Shonali is also a Traackr user herself, so she can offer a bigger picture view into how the tool worked for her and ways to successfully integrate Traackr into your campaigns.

What can I expect?!

If you’re like me, you appreciate knowing what the day is going to look like. So, here’s a little peek – the day is going to start off with some breakfast and mingling with fellow PR pros to get your creative juices flowing. Then we’ll get right into Traackr and put to rest any questions you’ve ever had on scoring and our technology. Throughout the day there are going to be a ton of opportunities to pick up best practices from the Traackr experts, Shonali and your fellow industry pros. We’ll have exercises, interactive discussions and a quiz to make your certification super official. To view the agenda, you can check it out here.

Intrigued?! There’s still time to sign up! Click here to register.

Questions? Reach out to your Account Manager or contact us at info@traackr.com.

Hope to see you there!

Waking the Sleeping Giant

April 8th, 2012 by pierreloic

A great post by Oscar Del Santo, The Day That Influence Became The New Online Currency, reminded me I had meant to write this post for some time now.

Oscar writes about the emerging standard of influence: although there is still much debate about what the standard is, how to measure influence, and the undeniable black spots in current solutions, there is no debate that influence has imposed itself as the new online currency.

As one of the early players in the space (want to hear us speak about clout before Klout came out?), we have seen this industry come a very long way already: experimenting, figuring out repeatable patterns of success and investing more and more money into influence communication. That said, the field of influence is still in its infancy, and budgets for it and for ‘earned media’ in general are still only a fraction of the money invested in more traditional media.

This may all be about to change… A few months ago, I attended a P&G Alum talk in New York with Bob McDonald, CEO of Procter & Gamble. Bob is a very traditional P&G Sr Exec, who rose through the ranks of the company from assistant brand manager to CEO. I was stunned at the time to see that Bob decided to spend most of his time with us talking about the online social phenomenon created by the Old Spice ad campaign, about how more awareness was generated for the brand by users’ online spoofs and repurposed content than by the ad itself, and about how this was changing the role of the brand manager at P&G.

Procter & Gamble is the mecca for marketing, advertising and media buying: it pours in the most money and spends a tremendous amount of energy tracking ROI. I thought at the time that P&G’s CEO saying that brand managers should shift their focus to earned media could be an industry-shaking piece of news.

Two months after I attended this event, the other shoe dropped: P&G announced that they will be cutting $1 billion from their marketing budget by 2016 and doing so with a renewed focus on digital media (earned and paid).

If we’re looking for signs of change in the industry, this news is as big as it gets. P&G’s direct impact on the ad industry (being the largest spender) and their tremendous influence on brands and agencies ought to be a wake-up call to all communication professionals that the marketing industry is about to undergo a major shake-up. Those already involved in influence communication are on the frontline of this revolution. My advice? If you were walking, run, and don’t look back. The further along you are in this new promising space, the more likely you are to grow with the market.

Brian Solis’ “The Rise of Digital Influence” – a bit too shallow?

March 23rd, 2012 by derek

Yesterday, Brian Solis from the Altimeter Group released a report on “The Rise of Digital Influence.” You can see Brian’s post about it here and TechCrunch’s post here.

The timing is definitely right for this type of report and I think Brian does a nice job of summarizing the landscape at a high level.  Here are a few of the great points from Brian’s study:

  • I really like his focus and understanding of “potential influence” vs “influence” (Brian refers to people’s “capacity to influence” a couple times in his report). We talk about this all the time at Traackr. Understanding POTENTIAL influence is absolutely what we do. Our thoughts on the topic can be found in this post.
  • I also like Brian’s emphasis on the limitation of an influence “number” (he says, “Brands cannot afford to make marketing or engagement decisions based on scores alone”). Again, another thing we have espoused since we launched Traackr. A number does not represent actionable data. In fact, we would argue it actually leads to the wrong actions. More of our thoughts on this can be found here and here.
  • His thoughts on engagement are also right on the money (“Thinking about the behavior or outcomes you wish to cause, it’s time to work backwards to find the right people and then develop a mutually beneficial engagement program.”). Totally agree.
  • And, of course, Brian’s suggested methodology for determining influence – a combination of Reach, Resonance & Relevance – certainly rings a bell around our offices.  :)

An Unfortunately Shallow Look at Relevance

However, where Brian gets lazy in this report is in his categorization of “Relevance.” To us, this is a major oversight – one that is a real disservice to the audience for this report (those professionals looking to create real business impact through social influence work).

Because above all else, RELEVANCE matters most.  Our firm belief (and one that is reiterated by our clients on a daily basis) is that when it comes to someone’s potential influence, RELEVANCE is the most actionable and important piece of data available.  It holds the key to someone’s potential influence; opinion; his/her willingness to talk to you; as well as the most effective way to engage with him/her.  Ultimately, one’s Relevance is the key to understanding WHO he/she is.

So, to say that all methods of calculating/determining someone’s Relevance on the social web are equal (as this report does) shows a very shallow understanding – not only of the technology behind these apps/products – but of their ultimate effectiveness. To say that determining someone’s relevance to a topic by manually ‘tagging’ them with one of eight incredibly broad topics is the same as reindexing the web’s content around people and tracking, indexing, relating, searching, and scoring hundreds of millions of posts in order to determine someone’s influence based on complex keyword strings, is just wrong. It’s shallow and it’s wrong.

Calling these methodologies equal is like comparing a road atlas with Google Maps. Related?  Sure.  But hardly the same thing…

I react to this oversight so strongly because Relevance is core to what we do here at Traackr.  And we know it’s what is most important to all of our clients.  The more shallow the methodology for determining Relevance in the context of a person’s influence, the more shallow the results are AND, most importantly, the more shallow your efforts will be.

Let’s be frank here.  Measuring and calculating a relatively accurate Reach & Resonance score on the social web, while not easy, isn’t the hard part.  It’s all about the Relevance.

Besides that….#propz to Brian for a nice, necessary piece.

DS

Kony 2012: The East African Perspective

March 20th, 2012 by Paul

Rosebell Kagumire: A journalist and blogger from Uganda, whose voice has influenced the media's coverage of the Kony 2012 campaign.

In the last week, the Internet erupted in conversation about Kony 2012, a film created by Invisible Children to promote their “Stop Kony” campaign. It is an effort which aims to make war criminal Joseph Kony known to the global community in the hope of leading to his arrest. While the idea that an organization wants to stop a war criminal sounds like a just and worthy cause, the film itself sparked much controversy, as people from all corners of the globe reacted in drastically different ways to IC’s call for justice. The conversation that started as a result of the Kony 2012 film reflects the complexities of the issues in the region the film speaks of. While many around the world showed support for the campaign, many others responded with statements that Kony 2012 was an oversimplification of the events being portrayed.

In addition, others out there have been carefully studying the pulse of this conversation sparked by Invisible Children.   The folks at Social Flow have provided their analysis from the inception of the campaign, IC’s strategy of targeting key individuals, via “Invisible Networks” over a broad geographic range, and have mapped out the conversation here.

But amidst the global conversations, we wanted to take a look at what the people from the region were saying. Who are the leaders in the conversation who are actually from the areas affected, and what opinions are they sharing with the rest of the world?  We ran a search for influencers in East Africa who are speaking about the topics in this global dialogue, and this is what we learned:

We see yet another case of technology becoming a resounding megaphone into the global ear. A video by Rosebell Kagumire has been linked to and commented on by major publications and networks as one of the leading critiques of the Kony 2012 campaign. Her words have impacted public opinion, and others join her in this quest to start digging deeper into the issues raised by the film, not only for this particular issue with Joseph Kony and the LRA, but in general how “developed nations” deal with issues pertaining to countries in the developing world. Meanwhile, others like Norbert Mao, the highest elected official in Gulu, Uganda, the region that was first affected by LRA mobilization many years ago, have shown support for Invisible Children and Kony 2012.

Taking a look at the influencer list, we see many more people who are critical of Invisible Children’s efforts than people who support them. Does this mean that the majority of people in the region are against the Kony 2012 movement? Not necessarily. Those with access to technology are the ones who have the advantage of having their voices heard. Unless a mobile phone or computer is put in the hands of every person in the region, we don’t necessarily get a complete picture of the entire population’s response to Kony 2012.

Regardless, it is a fact that you must have the means to broadcast to get your voice heard, and it is becoming increasingly possible to steer public opinion if you have an audience, a compelling piece of content, and a strategy for getting your message out there. From one single video, we have seen a firestorm spark on the web and immediate, passionate responses from key individuals who have continued to give shape to the dialogue surrounding the issues at hand. Issues that were never discussed or known throughout the world until now.

To view the Kony 2012 A-list, just go to http://lists.traackr.com/kony2012.

“Traackr allows us to get back on offense…”

March 19th, 2012 by derek

A client of ours recently told us:

“When I use general monitoring tools, I feel like I’m constantly on the defensive.  But when I use Traackr, I feel like I am playing offense again.”  

I thought this was a really interesting, insightful statement.  And I think it provides an important hint at the future for social media work.

 

To date, MONITORING has been at the heart of any work within social media.  It is a key task – maybe THE key task for most people working on the marketing/PR side of social.  There are two real reasons for this:

1.  THE VOLUME OF SOCIAL MEDIA DATA IS STAGGERING

Part of the issue is the sheer volume of content created and shared on social media.  It absolutely dwarfs the media activity of the pre-SM world (yes, young Traackr employees, there was a time when Facebook didn’t exist).   I remember when a HUGE weekly clip book contained a couple hundred articles.  Today, 10,000 social media mentions would be a slow week for some of the bigger brands (for some, this would be a slow DAY).  So, naturally, the amount of time and effort needed to monitor the vast activity on the social web is much greater than that which was needed in the pre-SM world.  At the same time…

2.  UNTIL RECENTLY, SM TOOLS HAVE FOCUSED EXCLUSIVELY ON MONITORING

The first wave of tools to address this social media market were built to solve this first, basic need – MONITORING the content of the social web. The tools were essentially ‘Clipping Services 2.0,’ built to help PR/Marketers more efficiently access mentions of their products/brands across the social web. Fair enough. And not for nothing – these companies also did a very nice job of inciting fear among their potential clients - “Missing a single post can destroy your business!” they SHOUTED, leading communication pros to believe that they had to track/pay attention to every single piece of content on the web.

These two factors, together, helped to put MONITORING at the center of social media work.

The problem is that MONITORING only tells you about the things that have happened in the past.  It gives you great  access to YESTERDAY’S news, but the value of yesterday’s news is fairly limited.  The only thing you can really do with yesterday’s news is to…react to it.  Which puts you on defense.

So….like our client from earlier said, the more time you spend MONITORING (and with general monitoring tools), the more time you will spend playing defense.

IN PR, YOU WANT TO BE ON THE OFFENSIVE

I know what you’re going to say…What’s wrong with defense?  Doesn’t defense win championships??  Maybe in the NFL, but not in PR.  In PR, you want to be on the offensive.

This has been true since the beginning of PR.  In fact, one would argue that this is why PR was created.  PR was created to give you full control of your (or your client’s) message.  What it is, how it’s said, who hears it and when…basically everything.  In PR, the goal is to always be on the offensive.

WHAT’S THE SOLUTION?

So…the issue remains.  If MONITORING social media is the key activity for any social media initiative, but this activity forces you to spend most of your time on defense….how do you get back on offense?

In my humble (and completely biased) opinion, the key is stepping away from general monitoring (not completely, per se, but definitely a few steps away) and finding a solution that delivers useful, actionable data. To us at Traackr, the most important and most actionable piece of data you can derive from social media is WHO. Information about people – the right people – is incredibly actionable. With this information, you can do what our clients are doing every day – creating the right messaging & engagement strategies, engaging, inviting, collaborating, tracking, measuring, reporting, and succeeding.

In short — general social monitoring gives you access to YESTERDAY’s news.  Knowing the right people offers you the opportunity to become TOMORROW’s news.

Getting back on Offense.  Because, that’s what wins championships in this game  :)

DS

Welcoming Traackr’s New Era

March 13th, 2012 by courtney

We are very excited by the recent additions to our client roster, and the speed at which these top-tier agencies are starting to sign on. It seems like the idea of strategic influencer communications is finally starting to click, and we are thrilled to continue welcoming some of the most forward-thinking and respected agencies out there, including Hill & Knowlton Strategies, Ketchum, WCG, Cohn & Wolfe and Eastwick.

Yes, this is business for us, and yes, it may feel like we are bragging a bit by talking about adding these prestigious firms to our client roster. And maybe we are. Ok, we are. But for me personally, it’s not about the bragging rights – it’s simply about sharing the satisfaction of seeing firsthand the undeniable proof that what we are doing is changing business as we know it, and that the companies giving it a chance are seeing amazing growth and opportunities from it.

I have seen our clients work on some truly amazing campaigns, ones that would never have been possible had they not let go of their traditional thinking a bit and started thinking miles outside the box. Case in point: our good friends over at Eastwick realized that what the Traackr data was telling them was different from what their client and their previous knowledge were telling them. They responded to their client’s wishes, as any good agency would, but once they started responding to the data Traackr had given to them, they were able to achieve success far beyond what they ever could have hoped for their client. They stood up for what they were doing and what they believed was right – even though it was all new and perhaps against what they had always practiced.

In a recent release we just put out, the CEO of Eastwick, Barbara Bates, put it pretty eloquently, “Most PR professionals look at influencers as pitch targets. By taking a strategic approach to working with influencers, we help clients develop much richer relationships – and results. Traackr has really been invaluable in helping us and our clients benefit from the changing influence dynamics. It uncovers the kind of authentic influence that’s really driving changes in business.”

We have a very dedicated, intelligent and driven team that has worked, and continues to work, tirelessly on essentially re-indexing the web so that our users can easily put in a set of keywords and get back a set of people. If you really think about it, nowhere else on the web can anyone do this, but our clients do it on Traackr’s platform every day – and that’s pretty awesome. It may sound simple, yet it is so complex that it makes my head hurt every time I hear our engineering team talk about all of their scripting, coding, and big data.

So, whether you find this a little self-indulgent or just celebratory, I hope it makes you want to raise a glass and say “cheers” to the future of the industry and Traackr, and join us on this ride. If you don’t, you may be missing out on some truly great opportunities for your business and your clients. Cheers!

Tapping Into the Value of Influencer Data

February 15th, 2012 by pierreloic

Traackr was one of the very early market-makers in the influencer space and, yet, we never gave in to the hype of the influencer conversation by promising a silver bullet that was going to solve influencer communication for communication professionals. We actually never understood those who did, whether vendors, agencies or brands, because by making themselves or others believe that this was easy, they tremendously hurt their chances of succeeding (who’d want to run a marathon without being told??!).

We have always said that finding, analyzing, monitoring and engaging influencers is HARD WORK. It can yield amazing results as many of our clients have proven time and time again, but it takes effort, craft, and, yes, the right toolset. We have never sold Traackr as the Holy Grail to our clients; our promise has been to be the best platform on the market to accompany their efforts towards high-yield high-return communication programs with influencers.

Another critical element of our approach from very early on was that we never believed that there would ever be “one social media platform to rule them all.” Instead of trying to build a platform that did a little bit of everything from social media monitoring, sentiment analysis, influencer search, to social CRM, etc., we focused on our craft, influencer search, and made it easy for our partners and customers to integrate our influencer data with a solution that catered to their business.

This approach has paid off in more ways than we could have hoped. For one thing, it gave us a sense of focus that enabled us to leap ahead of anyone else on our core expertise of influencer discovery (to my knowledge, we have come out ahead in every single due diligence exercise conducted by prospective customers). But maybe even more importantly, this approach has enabled our partners and customers to be creative in ways we never expected. They wanted to exploit the value of the data generated by Traackr and built synergetic ensembles of data, revealing new pockets of value that they can identify and build on.

In this context, last week Engage121, a leading social media platform with a focus on point-of-sale, announced their integration with Traackr. We have never focused specifically on retail use cases for our search technology, but we now have a partner who is bringing all the pieces together for that market. In all likelihood, they will be able to unleash value from our data for this market in better ways than we could. I’m sure that the partnership and their input will in turn help us build smarter influencer searches.

In the months to come, don’t be surprised to see more announcements like this one coming out of Traackr, as we’re expanding our reach and leveraging the expertise of business partners to make Traackr a part of many new use cases for creating value from influencer and expert search. So stay tuned…

The 3 “Buzziest” Super Bowl Brands

February 9th, 2012 by steven

Last week we rolled out our Super Bowl XLVI A-list, using keywords to find targeted influencers in the pre-game conversation around the Super Bowl. This week, we took that data a step further and ran a monitor with every brand that advertised in the Super Bowl. To find out who was generating the most buzz among the top 25 influencers, and whether that buzz was positive or negative for each individual brand, we entered each of the brands that advertised as monitor keywords and manually marked the sentiment on the posts that were returned. Please note these results cover only the night of the Super Bowl and only the influencers on our Super Bowl XLVI A-list. Here are some insights on the top 3 mentioned brands the night of the game.
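For readers who want to try a similar tally on their own monitor exports, here is a minimal Python sketch of the counting step. It assumes each returned post has already been reduced to an (influencer, brand, sentiment) row with the sentiment marked by hand. The brand names, influencer names, and rows below are made-up placeholders, not our Super Bowl data, and the function names are hypothetical rather than part of any Traackr API.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-labeled mentions: (influencer, brand, sentiment).
# These rows are placeholders for illustration only, not real monitor results.
MENTIONS = [
    ("influencer_1", "BrandA", "negative"),
    ("influencer_2", "BrandA", "neutral"),
    ("influencer_2", "BrandB", "positive"),
    ("influencer_3", "BrandB", "negative"),
    ("influencer_3", "BrandA", "negative"),
]


def tally_sentiment(mentions):
    """Count mentions per brand, broken down by manually assigned sentiment."""
    tallies = defaultdict(Counter)
    for _influencer, brand, sentiment in mentions:
        tallies[brand][sentiment] += 1
    return tallies


def rank_by_buzz(tallies):
    """Order brands by total mention count, most-mentioned first."""
    return sorted(
        tallies,
        key=lambda brand: sum(tallies[brand].values()),
        reverse=True,
    )


tallies = tally_sentiment(MENTIONS)
for brand in rank_by_buzz(tallies):
    counts = tallies[brand]
    print(brand, sum(counts.values()), dict(counts))
```

Ranking by total mentions and reporting the sentiment split separately keeps "buzz" and "tone" apart, which matters when the most-mentioned brand is not the best-liked one.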

1. GoDaddy

GoDaddy.com proved once again that sex sells with their two “racy” (or as racy as can be shown on TV) Super Bowl commercials. Or rather, they proved that sex at least gets people talking. While they were able to lead the way with mentions among the 25 influencers, not a single one was positive and only two were neutral. The negative comments included “Just go away, Go Daddy,” “I hate GoDaddy,” and “At this point, are parents more worried about GoDaddy’s use of semi-scandalous nudity or its terrible trolling?” In fact, even a comment I labeled as neutral took a shot at GoDaddy by calling their branding “silly” before mentioning “their domain service is good though.”

So in general, the sentiment was that GoDaddy may have good domain service, but that their commercials are unnecessarily over the top. But this doesn’t mean, in my opinion, that GoDaddy would be better off with a duller commercial. After all, you have to remember what they’re selling. They’re selling domain service. That’s pretty dull to begin with. The sentiment might not be positive, but for a domain service to be leading in total buzz around the Super Bowl, that’s pretty impressive. So I’m calling this a win for GoDaddy.

2. Bud Light

Bud Light and Super Bowl commercials are almost synonymous. They purchase a bunch of commercial slots every year and have had such classics as “magic fridge.” This year was no exception, at least in terms of the quantity of Super Bowl commercials they aired, totaling 6. In the bunch, there were some good ones. One tweet talked about how USA Today rated Bud Light’s “Weego” commercial, featuring Weego, a dog that fetches Bud Light, #1, while Ken Fang, #10 on the Super Bowl A-list, gave the “Weego” commercial a solid B.

However, I still think Bud Light would be disappointed by the sentiment around their commercials because the reaction to their commercials for their new product “Bud Light Platinum” was less than favorable. Quotes included “Bud Light Platinum? I don’t think so” and “Whose performance has been more disappointing thus far — the Patriots or Anheuser-Busch,” while Fang gave the two “Platinum” commercials a C and a C- respectively. Considering they ran two commercials in order to get some positive publicity for their new product, Budweiser would probably not be happy to know that some of the most influential Super Bowl writers are not talking favorably about their new product.

3. Chevy

Chevy ran 3 ads for three separate brands – the Silverado, the Camaro, and the Sonic. They actually had the most total positive mentions, but it was a mixed batch, as their other mentions were ones I deemed negative. Fang gave the Silverado commercial about the apocalypse an A, while the Chevy Sonic commercial received a modest B-. Meanwhile, Cindy Boren, 13th on our list, named Chevy a 1st quarter winner for their apocalypse commercial. Unrelated to commercials, Bruce Raffel, #1 on our list, gave them a positive mention in relation to one of Chevy’s other marketing initiatives related to the Super Bowl, giving Super Bowl MVP Eli Manning a Chevy Corvette, a car which Raffel called “sweet.”

On the flip side, however, Fang gave their commercial for the Camaro, which featured a high school graduate mistakenly thinking his parents bought him a Camaro for graduation, an F. Meanwhile, #23 Andy Hutchins mocked Chevy’s apocalypse commercial by tweeting “‘Drive a Chevy or die!’ – Chevy.” Fortunately for Chevy, the comment was by an influencer on the list because of high relevance and lower reach and resonance, so he only received 11 retweets. All in all, I’d say Chevy should be pleased with the mentions surrounding their brand, especially since they received 7 total. People were definitely talking about their 3 commercials.

Across all the Super Bowl commercials from these top three brands, the sentiment was pretty evenly split, as the graph below indicates. Note: red = positive, black = negative, gray = neutral.

Sentiment analysis breakdown visual in Analytics Suite

As you can see, discovering influencers is only half the battle. The other half is in the monitoring of your influencers. Being able to track how many times the most influential people in a certain topic are mentioning your brand or keyword and exactly what they’re saying with the ability to assign the sentiment surrounding it can be incredibly valuable for brands, regardless of whether or not they are running one 3.5 million dollar Super Bowl commercial, or an ongoing social media campaign.


Traackr’s migration from HBase to MongoDB

February 8th, 2012 by george

Let’s get one thing out of the way before we start: this post is not an attempt to disparage HBase. HBase is an extremely powerful tool; applied appropriately and skillfully under the right scenarios, it can move mountains. This post is about the evolution of Traackr’s data storage needs and how MongoDB ended up satisfying them. It’s also a tip of the hat to the MongoDB team and 10gen and the tremendous work they have done.

Back in late 2009 and early 2010, Traackr was designing the foundations of its search engine and hunting for an appropriate datastore to back it up. Some of the requirements were:

  • Built-in support for storing terabytes of text: that meant that we shouldn’t have to use or modify the software in an unconventional fashion beyond its original design to get it to store and retrieve the quantities of data we wanted.
  • Flexible schema: Traackr deals with heterogeneous data sources from the web, constantly discovering new content and new properties that characterize that content. The database had to allow us to model those properties across all stored content without requiring extensive schema migrations that would take the system offline.
  • Ability to batch process the data: Traackr’s scoring algorithms take into account statistical measurements derived from our entire active data set. Those computations need to be run at least once a week to account for the continuous growth and shifts in data samples. We therefore needed a system that allowed performing computations on the whole data set in a “reasonable” amount of time. We didn’t have an exact number for “reasonable” but our goal was to keep those processes running under 6 hours early on Saturdays and leave enough time for other batch jobs that depended on these refreshed computations to run the remainder of the weekend hours.

Some of the contenders in our product selection matrix were:

  • Traditional enterprise packages such as Oracle: ruled out because they were way out of our budget.
  • MySQL: our content sizes vary from 140 character tweets to multi-page articles and using one-size-fits-all BLOBs would be a tremendous waste of space. Granted, storage is cheap, and one could design the schema to split content into different tables according to size, but that would add needless complexity when there were other solutions that didn’t. Also, the flexible schema requirement would not be fulfilled as every new column added or modified in a multi-million row table would require a time consuming migration. There are ways to mitigate this by creating attribute tables but those tables become very large on their own (think a dozen attributes per post times millions of posts). And while MySQL can be sharded to split large tables and data sets, it requires the code to be cognizant of it while other solutions don’t. So we passed not only on MySQL, but also on most other open-source RDBMS solutions for the same reasons. Enter NoSQL.
  • Cassandra: It fit the bill in terms of schema flexibility and storage capacity. The model of having the same deployable for each cluster node was very enticing; it made for much easier setup and maintenance for a small team like ours. But in late 2009 / early 2010, it still lacked batch processing options like MapReduce (those were introduced later on in 2010). It also seemed that there had been some period of inactivity around the project after its initial 2008 release, so at the time, we were uncertain about its future adoption.
  • MongoDB: it was still new at the time, so we had concerns about its stability and adoption. The document-based schema flexibility looked great but auto-sharding was still not available (came out mid-2010) and there were no out-of-the-box options for batch processing. So we made a note of it and kept looking.
  • Riak: it was a serious contender for us; most of our requirements were being met and it presented the same promise of ease of use and deployment as Cassandra did. The team was an impressive bunch from Akamai that really seemed to know what they were doing. To top it all off, they were local to Boston and startup-friendly. Despite all of this, we ended up shying away due to questions of adoption. It was still too young a project for us.
  • HBase: back then, it was one of the most polished solutions with quite a bit of traction. The requirements were all there: ability to grow with large data sets, flexible schema, built-in batch processing with MapReduce, healthy community for support. We had our pick. It also provided “out-of-the-box” secondary indexing through a contrib package that came with the source. This allowed us to avoid writing our own app layer indexing code, or so we thought. Those secondary indexes ended up being a lot more critical in the longer run than we originally anticipated.

So we picked HBase and started running with it. We had to deal with the learning curve of its setup and the various components and configurations. This pretty much took most of my time, which unfortunately detracted from working on features. But we eventually got there and were able to get it humming (most of the time). Our weekend batch scoring requirements were met as expected: we were able to re-score our entire database in less than 30 minutes. We even contributed back to the code base (HBASE-2438 and HBASE-2426). Things looked good.

Then came the upgrade from HBase 0.20.x to 0.89.x/0.90. The code base was changing fast and we wanted to keep up with the latest speed and stability fixes. But there was one problem: the secondary indexing contrib packages were moved out of the main code base and as a result, our HBASE-2426 customizations were becoming stale. This also signaled that indexing was about to fall even further behind in priority instead of making it into the core source. Bad news for us since we depended on it; we had no choice but to keep our customizations up to speed. We eventually ended up dropping the contrib packages altogether and completely rewrote our secondary indexes using a more generic approach to avoid running on unsupported 3rd-party code. Even then, we still knew that app layer indexing was going to be slower and more brittle than any DB layer solution. This became even more apparent when we needed to evolve our domain model.
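
To illustrate why indexes maintained in application code tend to be more brittle than indexes maintained by the database, here is a minimal Python sketch. It is not our production code and it uses no HBase API: the two dictionaries merely stand in for a primary table and a hand-maintained index table, and every name in it (`PostStore`, `put`, `find_by_author`) is hypothetical.

```python
from collections import defaultdict


class PostStore:
    """Toy key-value store whose secondary index is kept by application code."""

    def __init__(self):
        self.rows = {}                      # primary table: row key -> row
        self.by_author = defaultdict(set)   # index table: author -> row keys

    def put(self, key, row):
        # Two separate writes: if the process dies between them, or a code
        # path forgets the index update, the index no longer matches the data.
        old = self.rows.get(key)
        if old is not None and old["author"] != row["author"]:
            self.by_author[old["author"]].discard(key)  # drop the stale entry
        self.rows[key] = row
        self.by_author[row["author"]].add(key)

    def find_by_author(self, author):
        # One index lookup, then one primary-table read per matching key.
        return [self.rows[k] for k in sorted(self.by_author[author])]


store = PostStore()
store.put("post-1", {"author": "alice", "text": "first post"})
store.put("post-2", {"author": "bob", "text": "second post"})
store.put("post-1", {"author": "bob", "text": "first post, re-attributed"})

print([r["text"] for r in store.find_by_author("bob")])
print(store.find_by_author("alice"))
```

Every write path has to remember the stale-entry cleanup in `put`, and every indexed read costs an extra round of lookups. A database-level index takes both concerns out of the application.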

The basis of our data is built on 3 core entities: influencers, the channels on which they post content and the posts themselves. We had originally de-normalized the relationship between influencers and their affiliated channels, admittedly to be more in line with how our NoSQL datastore was intended to be used. While we knew that a given channel could be shared across multiple authors, we chose to repeat some channel data across influencers to simplify runtime random access. While this decision simplified queries, it ended up putting more strain and complexity on our content attribution logic. With more complexity come more bugs and those started rearing their ugly heads in the form of some mis-attributed content. The issue was compounded when we cranked our content tracking up a notch and introduced our daily monitor feature in mid-2011. To add to this challenge, we had also discovered after months in production that we needed a better approach for modeling the details of the relationship between an influencer and a channel as each influencer interacted with a given channel in their own way. So after a year and a half of running on our original assumptions it was becoming clear that our model needed revisiting.
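
The trade-off can be sketched in a few lines of Python. This is an illustration only, under the assumption that each influencer row carries a full copy of its channels; the field names, IDs and URLs are made up and do not reflect our actual schema.

```python
# Denormalized layout: each influencer row embeds its own copy of a channel,
# even when that channel is shared with another influencer.
influencers = {
    "inf-1": {"name": "Alice", "channels": [
        {"url": "http://example.com/blog", "title": "Example Blog"}]},
    "inf-2": {"name": "Bob", "channels": [
        {"url": "http://example.com/blog", "title": "Example Blog"}]},
}


def rename_channel(url, new_title):
    """Update every embedded copy of a channel; returns the copies touched."""
    touched = 0
    for influencer in influencers.values():
        for channel in influencer["channels"]:
            if channel["url"] == url:
                channel["title"] = new_title
                touched += 1
    return touched


def owners_of(url):
    """Attribution must scan all influencers to learn who shares a channel."""
    return sorted(
        inf_id for inf_id, inf in influencers.items()
        if any(ch["url"] == url for ch in inf["channels"])
    )


print(rename_channel("http://example.com/blog", "Example Blog (renamed)"))
print(owners_of("http://example.com/blog"))
```

Reading one influencer's channels is a single lookup, which is what made the layout attractive. The cost shows up on the write and attribution side, where every copy has to be found and kept in agreement.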

All of this was happening around the time we created an opening for a big-data engineer. The 2011 Santa Clara Hadoop Summit was also being held. We ended up attending the conference hoping to meet some talent. We even showed up at the HBase Contributor meetup where we mingled with some of the great minds behind the scenes. The trip was both exciting and revealing. The two major takeaways for us were:

  • Big-data engineers came at a premium: good luck competing with some of the big pocketbooks in the valley.
  • Experienced HBase engineers were even rarer and the good ones were all strategically positioned around firms in the valley, so we would be better off either hiring on the East Coast or developing the capabilities in-house.

For a shop our size, there is a limit to how many of your resources you can take away from feature development to dedicate to infrastructural concerns. We had already spent a lot of dev time on the datastore infrastructure and our impending model changes were about to call for some more. Adding this up with the seemingly slim odds of us attracting an experienced HBase developer, it became apparent that continuing down the HBase route would be akin to trying to fit a square peg in a round hole. It was time to move on and look for a more appropriate solution. The tool was just not built for what we were trying to do and we could not afford to try to get it there. So we went back to re-evaluating NoSQL solutions. This time, solid support for secondary indexes was added to the requirements. The round two contenders were:

  • Neo4j: a very powerful graph database capable of efficiently traversing complex relationships, but too much for what we needed. It was also primarily designed to be used in an embedded JVM mode, so we would have had to make some significant changes to our system to integrate it.
  • MongoDB: it had matured by leaps and bounds since the last time we looked at it, with increased adoption from many shops and great support from 10gen. It came with advanced indexing out-of-the-box as well as some batch processing options (http://www.mongodb.org/display/DOCS/MapReduce, https://github.com/mongodb/mongo-hadoop). To top it all off, it was a breeze to use, well documented and fit into our existing code base very nicely.
  • Cassandra: it had matured as well and now had support for secondary indexes but those seemed more restrictive than Mongo’s and Mongo still had an edge over it in terms of developer friendliness.
  • Riak: still a strong contender, and it had supported secondary indexes since release 1.0, but it still lacked the traction that MongoDB had.

This time, the choice was much more straightforward. Many of the solutions had come a long way toward meeting our needs, so we were able to make a selection not only based on specs but also based on ease of adoption. MongoDB was hands down the most approachable solution for us.

Having worked with it now, it’s no wonder that MongoDB is currently enjoying such growth. While the migration from HBase took us about three months, the integration with MongoDB itself was achieved in just a couple of weeks since we already had a DAO layer abstracted from the rest of our applications. The rest of the time was spent tweaking our new model and re-writing our content acquisition and attribution services. And at every step of that refactoring, we found that MongoDB was making things easier for us:

  • Normalizing channels and influencers became a straightforward exercise as we were now able to model the associations as influencer collection sub-documents.
  • Creating indexes for fast random access queries no longer required specialized code. We still had to be mindful of the memory implications but the implementation was much cleaner and easier to maintain.
  • Ad hoc queries and reports became easier to write: there was no longer a need for a Java developer to write map reduce code to extract the data in a usable form. The plethora of admin UIs meant that our own product manager could use his JavaScript skills to whip up reports using the powerful out-of-the-box query facilities.
  • Basic things like backups became a breeze again. While HDFS has built-in redundancy within a cluster, replicating data in a backup cluster is still advisable but becomes expensive from a hardware and network point of view. For the longest time, our approach consisted of exporting the HBase tables to S3 on a regular basis, which took a lot of time and was far from guaranteeing data consistency on restore. With MongoDB, all our data currently fits on a single instance with a hot replica that we can switch to if the master goes down and a third backup machine whose EC2 EBS drive we snapshot on a regular basis after freezing its XFS file system to ensure data consistency. While such a setup is of course possible with HDFS/HBase, support for it comes out-of-the-box with MongoDB and it can be done more affordably with a lot less hardware.
  • The documentation was fantastic. Leaps and bounds ahead of the other solutions we evaluated (although the Riak folks are doing a great job as well), it was very well organized and the Disqus comment integration meant that the dev community could easily pitch in if there were specific gaps.
  • The speed was really impressive. We found that we were able to replace our weekly influencer re-scoring MapReduce jobs with straight MongoDB cursor iterators and still get the final results faster than before. Our data may of course outgrow this approach at some point but we are confident that we will be able to adapt using the available Hadoop integration if we need to.
  • The community was within reach and the feedback was consistent. We attended Mongo Boston and we were pleased to see that the size of the crowd confirmed what we were hearing and reading about the adoption of MongoDB. The sessions were super informative with great tips about how to optimize one’s setup and what the major gotchas are. What’s more, the suggestions and advice were consistent with what we were reading online from separate sources, which was a refreshing change from some of the blind trial and error experience we had with our previous setup.
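
Two of the points above, the sub-document associations and the cursor-based re-scoring, can be sketched together in Python. This is a simplified illustration rather than our real model: plain lists and dicts stand in for MongoDB collections so the snippet runs without a server, and the field names (`channel_id`, `role`, `share`) and the scoring formula are hypothetical.

```python
# Each channel is stored once, in its own collection.
channels = {
    "ch-1": {"url": "http://example.com/blog", "posts": 120},
    "ch-2": {"url": "http://example.com/feed", "posts": 30},
}

# Each influencer document embeds small association sub-documents that point
# at a channel and carry the details specific to that influencer.
influencers = [
    {"_id": "inf-1", "channels": [
        {"channel_id": "ch-1", "role": "owner", "share": 0.5}]},
    {"_id": "inf-2", "channels": [
        {"channel_id": "ch-1", "role": "guest", "share": 0.5},
        {"channel_id": "ch-2", "role": "owner", "share": 1.0}]},
]


def cursor(collection):
    """Stand-in for a database cursor: yields one document at a time."""
    for document in collection:
        yield document


def rescore(influencer):
    # Hypothetical score: posts on each channel weighted by the influencer's
    # share of that channel.
    return sum(channels[a["channel_id"]]["posts"] * a["share"]
               for a in influencer["channels"])


# A single pass over the cursor replaces a batch job in this toy setting.
scores = {doc["_id"]: rescore(doc) for doc in cursor(influencers)}
print(scores)
```

Against a real deployment, the generator would be replaced by a driver cursor, for example the one returned by PyMongo's `find()`, and the loop body would stay essentially the same.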

Looking forward, we think that we have finally nailed the solution that fits our needs for the foreseeable future. We no longer feel that we are fighting our datastore every step of the way. On the contrary, all our developers have given nothing but positive feedback on their MongoDB experience thus far and report being a lot more productive with it. This is a testament to the thoughtfulness the MongoDB engineering crew and 10gen have put behind the solution and we are looking forward to working with them in the months and years to come.
