How to win (or at least not lose) the war on phishing? Enlist machine learning

Coming to a device near you: Freddi Fish 666, the Phishing Apocalypse.
Collage by Sean Gallagher from urraheesh iStock & Humongous Entertainment

It's Friday, August 3, and I have hooked a live one. Using StreamingPhish, a tool that identifies potential phishing sites by mining data on newly registered certificates, I've spotted an Apple phishing site before it's even ready for victims. Conveniently, the operator has even left a Web shell wide open for me to watch him at work.

The site's fully qualified domain name is appleld.apple.0a2.com, and there's another registered at the same domain—appleld.applle.0a2.com. As I download the phishing kit, I take a look at the site access logs from within the shell. Evidently, I've caught the site just a few hours after the certificate was registered.

As I poke around, I find other phishing sites on the same server in other directories. One targets French users of the telecommunications company Orange; others have more generic names intended to disguise them as part of a seemingly legitimate URL, such as Secrty-ID.com-Logine-1.0a2.com. Others still are spam blogs filled with affiliate links to e-commerce sites.
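To make the naming trick concrete, here is a minimal Python sketch of how a hostname like the ones above might be flagged. It is not StreamingPhish's actual scoring logic; the brand list, the TLD tokens, the 0.8 similarity threshold, and the function name `suspicious_reasons` are all assumptions chosen for illustration. It also naively treats the second-to-last label as the domain's owner, where a real tool would consult the Public Suffix List.

```python
from difflib import SequenceMatcher

# Illustrative lists only; a real detector would use far larger ones.
BRANDS = ("apple", "paypal", "netflix", "instagram", "orange")
TLD_TOKENS = ("com", "net", "org")


def suspicious_reasons(hostname: str, threshold: float = 0.8) -> list[str]:
    """Return human-readable reasons a hostname looks like a phish."""
    labels = hostname.lower().rstrip(".").split(".")
    # Naive assumption: the label just left of the TLD is the real owner.
    owner = labels[-2] if len(labels) >= 2 else labels[0]
    reasons = []
    # Only subdomain labels (everything left of owner + TLD) are examined.
    for label in labels[:-2]:
        for token in (t for t in label.split("-") if t):
            if token in TLD_TOKENS:
                reasons.append(f"TLD-like token '{token}' inside subdomain")
            for brand in BRANDS:
                if brand == owner:
                    continue  # the brand's own domain is allowed to name itself
                ratio = SequenceMatcher(None, token, brand).ratio()
                if ratio >= threshold:
                    reasons.append(f"'{token}' resembles brand '{brand}'")
    return reasons


for host in ("appleld.applle.0a2.com",
             "Secrty-ID.com-Logine-1.0a2.com",
             "www.apple.com"):
    print(host, suspicious_reasons(host))
```

Fuzzy matching catches misspellings such as "applle" that an exact keyword search would miss, at the cost of occasional false positives on short or common words.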

I check the access log again. The phisher has come back, logged in from an IP address in Morocco. He's unzipped the phishing kit. I send a heads up to the hosting company, SingleHop, with screenshots of the phishing page. I report the site to Google Safe Browsing and check one more time to see if I've missed anything.

The phisher notices something suspicious in his access logs. With his site now up and running, he has deleted his shell, though not the one on another subdomain. I consider going back in, but my work here is done anyway.

During the two hours I spent investigating this Apple phish, another 1,678 suspicious sites have popped up—spoofing brands including Apple, PayPal, Netflix, Instagram, and Bank of America. It will be nearly two days before SingleHop responds about the initial Apple one: "We were in touch with the management of the allegedly abused server, and after discussion the reported problem is claimed to be resolved."

That sort of interaction doesn't scale, and phishing only seems to be growing in popularity. So if I learned anything from my time with StreamingPhish on the front lines of this new digital war, it's this: if we're going to make a dent in these phishes, we're going to need a bigger boat, one with a whole lot more machine learning-based automation.

Anatomy of a modern phish

A screenshot of the landing page for the Dr. jOker phishing kit. Look familiar, Apple users?
Oh, we need to give more than just our Apple ID, it seems.
Tell us a little more about yourself, Mr. Callahan, was it?
OK, Harry, how about a credit card number? The site code "validates" the number by recognizing number patterns that match to credit card types.
Oh, while you're at it, give us your Social Security Number.
And with a thank you, the site forwards you off to the real Apple login site.
The root directory of this phishing site was left viewable—including the .zip file with the code. And...is that a backdoor?
It's Dr. jOker's personal (s)hell—a backdoor with no password required.
What's Dr. jOker been up to? Welp, looks like we got here just as he was setting up.
On the same server: an Orange phishing site.
Some of the other sites on the same server.

The modern phishing economy resembles a gold rush. Lots of small players with little or no prior skill stream in after word spreads about big heists. These minnows enrich larger predatory outfitters selling kits and infrastructure. Elsewhere, there are bottom-feeders that steal other people's kits or create cut-and-pasted kits full of sloppy code. Many of those barely work, though that can be just enough to fool the most credulous among us. But at the top of the food chain, a few professional operations either produce increasingly sophisticated kits for sale or use them for their own more targeted purposes.

Unskilled phishers just drop kits either on hacked WordPress sites or, if they're more ambitious, on low-rent virtual private servers configured with cPanel, which makes managing phishing domains a point-and-click affair. Many tried-and-now-burned kits (essentially, ones that have been catalogued as threats by endpoint protection companies) are available by the truckload in packs costing as little as $10.

Phishing kits are usually written in the PHP scripting language, with bits of JavaScript often scraped right from the sites they mimic. These kits are usually delivered in a compressed file with a text file that even explains to the customer how to configure them. Luckily for those trying to battle back, operational security for these low-level phishes tends to be horrible—the people running these phishing sites are inexperienced, lazy, or both.

"They'll copy the .zip file over to the server, or they'll leave directory indexing turned on so you can just walk back a directory and download the kit," said Jeremy Richards, a threat intelligence researcher at the mobile security service provider Lookout. "Some of the kits are backdoored. Others store logs in text files on the server. And then we can do things like track the IP address of the successfully phished users—and in some cases, the IP address of the phisher logging in to download the credentials."

The Apple phish I encountered on August 3 was being run by a long-time operator with the nom-du-hack of Dr. jOker. Accessing his server account from Morocco, Dr. jOker ran a number of fraudulent affiliate shopping sites in addition to a handful of phishing sites from his Web account (named "mintmake"). Some directory names on the site suggest he may have hosted malware at one point, but those pages had been pillaged by the "Cyb3R Command0S" (site defacers claiming to be out of Bangladesh) before they were shut down.

Despite his operational security failures, Dr. jOker appears to be a fairly competent coder. He maintained his sites with his own bespoke backdoor—a Web shell, a page within the site, that allowed him or anyone else that happened to stumble upon it to upload, delete, edit, and rename the files in each phishing domain he had configured. That structure is what allowed me to watch the access logs and, as a result, his every move.
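Watching an operator through access logs mostly comes down to filtering for requests that touch the backdoor. The Python sketch below shows the idea under several assumptions: that the log uses the common Apache-style access log layout, that the shell lives at a path containing `shell.php`, and that the sample lines (which use reserved documentation IP addresses and a made-up date) stand in for real entries. None of these details come from Dr. jOker's server.

```python
import re
from collections import Counter

# Matches the leading fields of an Apache-style access log line
# (assumed format; adjust if the server logs differently).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)


def ips_requesting(lines, path_fragment):
    """Count requests per client IP whose path contains path_fragment."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and path_fragment in match.group("path"):
            hits[match.group("ip")] += 1
    return hits


# Hypothetical sample lines, invented for illustration.
sample = [
    '203.0.113.7 - - [01/Jan/2000:14:02:11 +0000] "GET /shell.php HTTP/1.1" 200 5120',
    '198.51.100.23 - - [01/Jan/2000:14:05:40 +0000] "GET /index.php HTTP/1.1" 200 20480',
    '203.0.113.7 - - [01/Jan/2000:14:09:02 +0000] "POST /shell.php?act=unzip HTTP/1.1" 200 312',
]

print(ips_requesting(sample, "shell.php"))
```

Ordinary visitors have no reason to request the shell, so any address that shows up in this count is likely the operator, or another researcher.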

The phishing site itself was competent. The design clung closely enough to Apple's current Web style to convince anyone who didn't pay attention to the address in the browser window (at least at first) that it was legitimate. Leveraging a free geolocation API, the kit was able to customize the language and other aspects of the phishing page to each visitor. It also performed credit card validation and checked number patterns to identify the type of card being used, just as Apple's site does. All the personal data collected was sent straight to jOker's Gmail address: docteur[.]joker[at]gmail.com. (I reached out to the Doctor for comment on this story, but he hasn't yet replied. I also alerted Google.)
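The kit's own validation code isn't reproduced here, but checks of this kind usually combine two well-known pieces: the Luhn checksum, which catches mistyped digits, and prefix-and-length patterns that identify the card network. The Python sketch below illustrates that general approach; it is an assumption about how such "validation" typically works, not the kit's PHP, and the patterns are deliberately simplified (for example, Mastercard's newer 2-series range is omitted).

```python
import re

# Simplified prefix/length patterns, for illustration only.
CARD_PATTERNS = {
    "visa": re.compile(r"^4\d{12}(\d{3})?$"),
    "mastercard": re.compile(r"^5[1-5]\d{14}$"),
    "amex": re.compile(r"^3[47]\d{13}$"),
    "discover": re.compile(r"^6011\d{12}$"),
}


def luhn_valid(number: str) -> bool:
    """Return True if the digits in number pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 12:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def card_type(number: str):
    """Return the matching network name, or None if no pattern fits."""
    digits = "".join(ch for ch in number if ch.isdigit())
    for name, pattern in CARD_PATTERNS.items():
        if pattern.match(digits):
            return name
    return None


# 4111 1111 1111 1111 is a widely published test number, not a real card.
print(card_type("4111 1111 1111 1111"), luhn_valid("4111 1111 1111 1111"))
```

Neither check confirms that a card exists or has funds; they only confirm the number is well-formed, which is why the word "validates" earns its scare quotes.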

Carefully crafted phishing kits are just part of the phishing arms race. Exploits that allow rapid scanning and compromise of vulnerable sites—along with other sources of low-cost or no-cost hosting and free or cheap domain name and SSL certificate registration—make it fairly painless to create phishing domains and subdomains that seem legit right down to the green lock next to the fake Web address. These things also make phishing sites more disposable. Some more targeted phishes have life cycles of 48 hours or less.

And just as we're getting to the point where organizations feel like they have email phishing under better control by using measures like domain graylisting, email filtering, or user education, phishing attacks are evolving again. Many now target people where their defenses are weakest: mobile devices.

"People are just more distracted when they're using their mobile device and trust it more," said Lookout's Richards. "They're not paying as much attention. And you've got more vectors for phishing: SMS, WhatsApp. People aren't being trained that these are risky vectors as well."

Just as with Caller ID on robocalls and scammer calls, the source of SMS messages is easily anonymized or spoofed. And there's currently no way to screen SMS messages for spam like there is with email.


We can catch phishes before they're even cast—using real-time data and open source code.
