Common Pitfalls When Target Scraping/Filtering

When scraping for your ideal audience (the accounts you believe are most likely to convert into followers/customers), the goal is typically to narrow that audience down to a molecular level. Most advanced tools like InstaInfantry have filters you can enable to help eliminate some of the more “undesirable targets”, such as by language or gender.

However, I am personally not a big fan of using filters for a few key reasons. The first and largest one is that your infantry/accelerator accounts will die at a much, MUCH faster pace with filtration enabled. The number of API calls needed to check each user increases DRAMATICALLY. The growth is linear rather than exponential (about one extra call per user), but we’re still talking roughly 100x more calls for every hundred users scraped. Instead of simply collecting the 100 target user IDs it finds on the source, the account must now individually visit each of those accounts to check whether it meets the filtration criteria. This absolutely demolishes accounts in the long term, so the cost of running just about any strategy increases by a ridiculous margin.
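To put rough numbers on that, here is a minimal sketch of the request count with and without filtering. It assumes one list request returns up to 100 user IDs and that filtering costs one extra profile lookup per user; the function name and the page size are illustrative assumptions, not anything taken from InstaInfantry.

```python
import math


def requests_needed(num_targets: int, page_size: int = 100, filtered: bool = False) -> int:
    """Estimate the requests one account makes to collect num_targets user IDs.

    Assumptions (hypothetical, for illustration only):
    - one list request returns up to page_size user IDs from the source
    - filtering adds exactly one profile lookup per collected user
    """
    list_calls = math.ceil(num_targets / page_size)
    profile_calls = num_targets if filtered else 0
    return list_calls + profile_calls


unfiltered = requests_needed(100)
with_filters = requests_needed(100, filtered=True)
print(unfiltered, with_filters, with_filters / unfiltered)
```

Under these assumptions, 100 users cost 1 request unfiltered and 101 filtered. The multiplier stays at about 100x no matter how large the scrape gets, which is why every filtered campaign burns through accounts so much faster.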

That should be enough to dissuade a lot of users from utilizing filtration, but the 2nd reason is that it can oftentimes decimate your pool of already decent targets. I find that many users try to find the most obscure, niche-ified sources and then proceed to add ten different filters on top of a pool of like 50,000 potential accounts. As a marketer, I understand the inclination to get as granular with my audience as possible. We’ve all heard the phrase “if you’re marketing to everyone, you’re relating to no one”. But there is obviously a limit to how granular you can truly get.
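A quick back-of-the-envelope sketch shows how fast stacked filters shrink a pool. The pass rates below are made-up numbers, and the calculation assumes each filter acts independently of the others, which real filters rarely do; treat it as an illustration, not a measurement.

```python
def surviving_pool(pool_size: int, pass_rates: list[float]) -> int:
    """Estimate how many targets remain after applying every filter in turn.

    pass_rates holds the (hypothetical) fraction of accounts each filter
    lets through, e.g. 0.6 means 60% of accounts pass that filter.
    Filters are assumed to be independent of one another.
    """
    remaining = float(pool_size)
    for rate in pass_rates:
        remaining *= rate
    return round(remaining)


# Ten filters that each keep 60% of a 50,000-account pool.
print(surviving_pool(50_000, [0.6] * 10))
```

Even with fairly lenient filters that each keep 60% of accounts, ten of them leave only a few hundred targets out of 50,000, and you still paid for a profile lookup on every single account along the way.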

Stacking too many criteria ends up being counter-productive to your marketing efforts. The script doing the parsing cannot read the context of someone’s page like we can. Sometimes, spending more of your time finding good sources ends up being far more effective than agonizing over each positive feature you want the profiles in question to have.

For example:

Back when I was focusing solely on the hip-hop niche, I’d have times where I’d go for weeks on end without any clients. In an attempt to get a bit more work, I’d resort to using my own marketing methods to find clientele. I spent countless hours trying to find the perfect filters, endlessly reviewing the profiles of aspiring rappers and artists to find as many commonalities between them as I could. I had a list of hundreds of positive keywords: “Artist, rapper, EP, LP, single”. I had even begun implementing long-tail keywords such as “...out on all major platforms”. I had so many spreadsheets of the filters I had used for each campaign and their respective conversion rates that it was hard to stay organized. At one point, I spent almost $60 on testing just to determine the ideal size of an artist’s audience/follower count (with regard to their likelihood of becoming my client).

That was until some bot commented on one of my music theme pages, saying “contact @xxx for promotion”. The tagged profile was some massive rap-themed page whose entire content strategy was posting small clips of aspiring artists for promotional purposes. I’d seen those pages before and I harbored quite a bit of disdain towards the people that ran them, as the degree to which they artificially inflated their follower count, views and engagement was ridiculous. They were essentially preying on artists by charging ridiculously expensive promo fees to post their music on a completely dead page. It was blatantly obvious when scrolling through the comment section of each post. Same exact phrases, a shit ton of emojis and almost no negative feedback whatsoever (which, in this niche, is practically unheard of. Mfs love to hate on others in the music field).

So for my next campaign, I decided to simply scrape the usernames of all the artists that were tagged or mentioned on those posts. No filters or anything else. The conversion rate ended up being so high that I had to tell several of them to wait for the foreseeable future because I had neither the infrastructure nor the time to support that many clients at once.

In hindsight, it was so simple. These fake promo pages had done practically all of the legwork for me. They had reached out to thousands of potential clients, manually vetted each applicant’s ability to pay and only then proceeded to post their music on their page. Not only was I getting targets who perfectly fit the criteria but I was also getting targets who possessed a variable that I couldn’t possibly filter for: a proclivity towards paying for promotion. And as many of us psych students know - “The best predictor of future behavior is past behavior.”

What made it even better was that their clients received practically zero value from their previous promo and many were a lot more eager to try my black hat ways after being scammed one too many times by pages like these. It was perfect.

I suppose the moral of this whole tangent is, again, to put more time and thought into finding sources that would logically exhibit a higher ratio of good to bad targets rather than spending hours trying to map out all the ways you’ll filter out the irrelevant ones. Filters are useful in some contexts, but when utilized to a ridiculous extent, you’ll be hard-pressed to find a sizable group of eligible targets, not to mention that you’ll kill a ton of accelerators/scrapers in the process from all the API calls you’ll be making.

If you absolutely need a filtered audience, I would actually recommend using tools outside of InstaInfantry whose entire purpose is to scrape and filter targets. Those tools will oftentimes not ask you to have your own accounts and proxies and will just charge you for the number of threads you have running at any given time. But again, this is all completely optional and not really a necessity in my mind. So long as you focus on content and proper source assessment, you will have virtually zero need for filtering to get stellar results out of your campaigns.
