Swiss scientists develop algorithm to locate source of malware, spam

Algorithm aims to determine the probable source of an online attack by monitoring only a small fraction of the nodes in a network

Swiss scientists have developed an algorithm that can be used to locate spammers as well as the source of a computer virus or malware.

The algorithm finds the source by checking only a small percentage of the nodes in a network, Pedro Pinto, a postdoctoral researcher at the Audiovisual Communications Laboratory of the Swiss Federal Institute of Technology (EPFL), said on Monday.


Finding the source of a virus, malware or spam attack by tracking the status of every node on the Internet is impossible, Pinto said in a telephone interview. "That would mean you would need about 1 billion sensors. And you don't want to monitor the entire Internet," he added.

Instead, he and his colleagues devised an algorithm showing that the location of the source can be estimated from measurements collected by sparsely placed observers, or sensors.

The algorithm can identify the specific computer in a network from which spam is being sent, so that the network provider can, for instance, shut it down, Pinto said. The same method could pinpoint the first computer into which a virus was injected, he added.

The source is located by combining the structure of the network, meaning who is connected to whom, with the times at which the virus arrives at the sensors, Pinto said.
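To make the idea concrete, here is a minimal Python sketch of source estimation from network structure plus sensor arrival times. It is not the EPFL team's estimator: the published method is more elaborate, and this sketch substitutes a simple least-squares fit under the assumption that every link adds roughly the same fixed delay (`mean_delay`). The function names, the toy graph and the arrival times are all hypothetical. For each candidate node, it computes hop distances to the sensors, fits the unknown start time, and keeps the candidate whose predicted arrival times best match the observed ones.

```python
from collections import deque


def bfs_distances(graph, start):
    """Hop counts from start to every reachable node (unweighted breadth-first search)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def estimate_source(graph, arrivals, mean_delay=1.0):
    """Hypothetical helper: return the node whose predicted arrival times best fit the sensors.

    arrivals maps each sensor (observer) node to its observed arrival time.
    Assumption: each link adds about mean_delay, so a spread that starts at
    node s at time t0 reaches sensor o at roughly t0 + mean_delay * hops(s, o).
    """
    best, best_err = None, float("inf")
    for candidate in graph:
        dist = bfs_distances(graph, candidate)
        if any(o not in dist for o in arrivals):
            continue  # this candidate cannot reach every sensor
        # Residuals r_k = t_k - mean_delay * d_k; the least-squares start time t0 is their mean.
        residuals = [t - mean_delay * dist[o] for o, t in arrivals.items()]
        t0 = sum(residuals) / len(residuals)
        err = sum((r - t0) ** 2 for r in residuals)
        if err < best_err:
            best, best_err = candidate, err
    return best


# Toy example (hypothetical data): a 7-node tree, spread started at "B" at t = 0,
# with sensors only on the four leaf nodes.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D", "E"],
    "C": ["A", "F", "G"],
    "D": ["B"], "E": ["B"], "F": ["C"], "G": ["C"],
}
arrivals = {"D": 1.0, "E": 1.0, "F": 3.0, "G": 3.0}
print(estimate_source(graph, arrivals))  # B
```

Trying every node as a candidate keeps the sketch simple but scales poorly; on a real network one would restrict the candidates, for example to the neighborhood of the earliest-reporting sensors.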

The algorithm needs to analyze only 10 to 20 percent of the nodes in a network to determine the likely source of an attack, Pinto said. "Sometimes this is five percent," he added, noting that the number of nodes that need to be analyzed depends on the complexity of the network.

The algorithm is described in a paper titled "Locating the source of diffusion in large-scale networks," published in the journal Physical Review Letters on Friday.

In the paper, the scientists suggest that the algorithm could be used for more than finding computer culprits. The method could, for instance, help find the source of biological viruses and epidemics such as SARS, by determining the city in which the virus first appeared. It could also be used to find the source of a rumor spreading on Facebook or to trace an airborne contaminant released by terrorists in a subway network back to its origin, according to the scientists.

While the technique could have uses in many different industries, the first commercial interest in the algorithm has come from computer security companies, Pinto said. "Some companies emailed me after we published the paper last Friday," he said, adding that he did not want to disclose the names of the companies.

Public bodies such as governments would be another natural fit for the technology, Pinto said. Besides looking for ways to use the technology commercially, the scientists will also try to improve the algorithm's results.

Loek covers all things tech for the IDG News Service. Follow him on Twitter at @loekessers or email tips and comments to loek_essers@idg.com

Copyright © 2012 IDG Communications, Inc.