The 5 Most “Hair-Pulling” APIs

Written by Cesar Del Solar on Nov 19, 2015

3rd party APIs can be temperamental, and no two of them are exactly alike. At AppInsights, we take pride in our ability to delve into their secrets and get the most out of them for our integrations. However, over the years there have been some APIs that have brought fists smashing into keyboards, and sent cold shivers shooting down spines at the mere mention of their names.

Join us as we share with you 5 of the most memorable API fails (in no particular order). Some of the names listed may surprise you, as you might love their services (hey, we do too!). But for one reason or another, deficiencies in a key API quality (speed, data integrity, ease of use, documentation, power, or openness) have caused tears of frustration and great gnashing of teeth in the halls of AppInsights.

Magento

Magento is a popular eCommerce web application (used on an estimated 1% of all websites) that hosts your online store and handles your payments. It’s popular because it’s straightforward and powerful. However, dealing with their API was anything but. One problem was that their REST API was erratic and gave inconsistent output with the various sets of filters that we tried.

But the biggest sin, and what ultimately killed it for us, was the incredible permissions gymnastics required of users just to access their data. Users must:

  • Add new roles, permissions, and users through their internally installed Magento app.
  • Submit: Consumer Key, Consumer Secret, and the 3 OAuth URLs.
  • Make 2 modifications to the Magento .php code due to bugs in the Magento core.
  • Expose these URLs on the web server hosting Magento.

All of these steps require someone with a deep understanding of the Magento system, typically the administrator who set up the company’s installation. And you thought OAuth was one-click.

Google Analytics

It may be surprising to find Google Analytics listed here, as their API is very powerful, fast, and robust. However, there is a glaring flaw.

Here’s a scenario for you: You have a highly trafficked site (100,000s of visits per day) and you want to monitor this traffic by the hour, so you fire up an API call for visitors using ga:hour as the dimension. Pretty simple, right? Wrong! You are greeted with a string of 0s for every hour.

The root of the problem is that GA data sometimes lags, so somehow they thought passing zeros was better than no data. This would probably be OK if the lag was a few seconds, but no, it can be up to 48 hours!

As you can imagine, this was a huge problem for our customers and we had to rework our entire backend to allow data to auto-correct itself backwards in time, so that the right data would show up eventually for a user. Eventually.
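For the curious, here is a minimal sketch of what “auto-correcting backwards in time” can look like. It is an illustration under assumptions, not our production code: the function name `merge_hourly`, the dictionary-based storage, and the sample numbers are all made up for this example, and the 48-hour window simply mirrors the worst-case lag mentioned above. The idea is to re-fetch recent hours on every run and let fresh values overwrite anything still inside the lag window, since a stored zero there may only be a placeholder.

```python
from datetime import datetime, timedelta

# Assumption: data can lag by up to 48 hours, per the text above.
LAG_WINDOW = timedelta(hours=48)


def merge_hourly(stored, fetched, now):
    """Return a copy of `stored` updated with `fetched` hourly counts.

    Hours inside the lag window are always overwritten, since an earlier
    zero may have been a placeholder. Older hours are treated as final
    and are only filled in if they were missing.
    """
    merged = dict(stored)
    for hour, count in fetched.items():
        if now - hour <= LAG_WINDOW or hour not in merged:
            merged[hour] = count
    return merged


# Hypothetical sample data, keyed by the start of each hour.
now = datetime(2015, 11, 19, 12)
stored = {
    datetime(2015, 11, 19, 10): 0,    # placeholder zero, still inside the window
    datetime(2015, 11, 16, 10): 950,  # older than 48 hours, treated as final
}
fetched = {
    datetime(2015, 11, 19, 10): 4210,  # late-arriving real value
    datetime(2015, 11, 16, 10): 0,     # ignored: outside the window
}
merged = merge_hourly(stored, fetched, now)
print(merged[datetime(2015, 11, 19, 10)])  # 4210
print(merged[datetime(2015, 11, 16, 10)])  # 950
```

The trade-off is that every refresh has to re-request two days of hourly data instead of just the latest hour, which is exactly the kind of rework a placeholder zero forces on you.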

The kicker is that they are not the only ones who think this is a good solution: Facebook recently seems to have made an unannounced change where some of their insights metrics return 0s for data not yet collected.

Bonus flaw: This one isn’t as value-killing as the other but, like the whine of a dental drill, is annoying to no end. It is impossible to change your GA timezone if your account is tied to AdWords at all. Yes, impossible. This is why our whole development team is going bald in their 20s.

KISSmetrics

KISSmetrics is a powerful web analytics app that allows a company to dig deep into their customers’ interactions with their website. They are incredibly data-centric and their blog offers daily insights for the data-driven community. Which is why it’s bonkers that they have no outbound API at all, at least not in the traditional sense. Event collection is their bread and butter, so their inbound API is a very capable one, but in order to get your information back out, these are the steps that need to be done:

  • Create an AWS account, a data bucket in S3, and give KISSmetrics access to it.
  • Access your bucket, which will only have data up to the previous day.
  • Get a massive .csv file for each day containing all your events.
  • ???
  • Profit?

Even then, there can be events from a previous or next day in each file, so multiple files must be parsed.
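To give a feel for the parsing chore, here is a minimal sketch of counting events per day across several daily files. Everything specific in it is an assumption made for illustration: the column names `timestamp` and `event`, the ISO timestamp format, and the function name `count_events_by_day` are hypothetical and not taken from the actual export. The one point it demonstrates is that each event has to be bucketed by its own timestamp rather than by the file it showed up in.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def count_events_by_day(daily_files):
    """Count events per calendar day across several daily exports.

    Each event is bucketed by its own timestamp, not by the file it
    arrived in, so stragglers from a neighboring day land correctly.
    """
    counts = Counter()
    for handle in daily_files:
        for row in csv.DictReader(handle):
            # Hypothetical column: "timestamp" holds an ISO 8601 string.
            when = datetime.fromisoformat(row["timestamp"])
            counts[when.date().isoformat()] += 1
    return dict(counts)


# Two tiny stand-ins for the daily files, each with one straggler.
day_18 = io.StringIO(
    "timestamp,event\n"
    "2015-11-18T09:15:00,signup\n"
    "2015-11-19T00:02:00,login\n"  # belongs to the next day
)
day_19 = io.StringIO(
    "timestamp,event\n"
    "2015-11-18T23:58:00,login\n"  # belongs to the previous day
    "2015-11-19T14:30:00,purchase\n"
)
totals = count_events_by_day([day_18, day_19])
print(totals)  # {'2015-11-18': 2, '2015-11-19': 2}
```

Reading each file on its own would have reported one extra event on the wrong day in both cases, so any per-day number needs at least the neighboring files as well.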

Writing a robust integration that allows the user to filter data and view it in real time is quite challenging, given these limitations. We understand that not every company has a contract to create an external API, but as Winston Churchill must have put it, “OMG!”, that data’s just sitting there for Pete’s sake!

Salesforce

The Salesforce API is actually fairly simple as it pretty much revolves around SOQL, a SQL-like query language they use for their Salesforce objects. This allows lots of simple queries that can fetch tons of data from various datasets – if only the documentation cooperated. And that’s where the problem starts.

The main doc page literally still uses frame tags, which aren’t even supported in HTML5 anymore. How do you obtain column metadata for a SOQL call? Figure it out from looking at the HTTP requests in their Apex explorer (hint: use ?columns=true as a query parameter). But don’t rely on this undocumented parameter, as it’s subject to removal at any time as the API versions update.

Why is this column metadata useful? Well, column metadata just tells us the relationships that columns have to each other, you know, the entire point of a SQL database. We had to try many different types of queries to figure out what the different fields in the response mean, as well as employ the code-breaking services of Lawrence Waterhouse, as we cannot find any documentation on these fields. At least our custom Salesforce widgets are pretty robust now.
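As a rough sketch, this is how a SOQL query URL with that extra parameter might be assembled. Treat it with suspicion: the instance URL and API version are placeholders, the helper name `build_soql_url` is made up for this example, and `columns=true` is the undocumented parameter described above, so it may stop working without notice. The code only builds the URL; it does not call Salesforce.

```python
from urllib.parse import urlencode


def build_soql_url(instance_url, api_version, soql, include_columns=False):
    """Build a REST query URL for a SOQL statement.

    `include_columns` appends the undocumented `columns=true` parameter
    mentioned in the text; it is not part of the documented API.
    """
    params = {"q": soql}
    if include_columns:
        params["columns"] = "true"
    return f"{instance_url}/services/data/v{api_version}/query/?{urlencode(params)}"


# Placeholder instance and version, for illustration only.
url = build_soql_url(
    "https://example.my.salesforce.com",
    "35.0",
    "SELECT Id, Name FROM Account LIMIT 5",
    include_columns=True,
)
print(url)
# https://example.my.salesforce.com/services/data/v35.0/query/?q=SELECT+Id%2C+Name+FROM+Account+LIMIT+5&columns=true
```

An actual request would also need an OAuth access token in the Authorization header, and the shape of whatever comes back for the column metadata is exactly the part we had to reverse-engineer.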

Update: There’s a “tooling API” that’s releasing soon that can get us this information. However, it is not at all clear whether it can be used with the existing REST API tokens and easily tested through the command line; the documentation states that this metadata is part of a complex type that “represents the result of a SOQL query in an ApexExecutionOverlayResult”. As a precaution, we have removed all sharp objects from our coding cave.

Twitter

Twitter has massive amounts of data going through their servers every minute, which must present an amazing architectural challenge. As far as APIs go, theirs is pretty straightforward and revolves mostly around showing you tweets given a search term or a screen name, as well as some simple metrics about your account… and that’s it.

The problem is, in order to get real value from Twitter, you need to track metrics such as how the number of mentions of your brand changes over time. This type of information is completely unavailable unless you connect through their “firehose” and parse it yourself. Because the Twitter API has no historical endpoint, companies have now sprung up around tracking metrics as simple as historic follower count. Most of the people we talked to would be happy to pay Twitter money for these metrics, but nope. I mean, did they also choose a 140-character limit for their API output or something?

Summary

And there you have it, 5 examples of how not to build your API. Of course, it’s easy for us to complain, and we do recognize that, without exception, these APIs do lots of great things too. But it’s that potential that leaves us so frustrated (It’s like finding out Diablo 3 was online only, WHY Blizzard, WHY!?).

Lighthearted criticism aside, APIs are definitely paving the future of business interactions. And as harsh as we sound, these APIs should be commended on how much they’ve already accomplished. If these flaws can be fixed (and without a doubt they will eventually be), these APIs will lead the way.