IP to location: writing a efficient scanner

In the first article on IP to location I described the overall concept. In this edition we'll look at the implementation and how to scale the tracing to a reasonable speed.

With the examples from the last article you can also implement everything to build your own database. Before you jump into it take a minute to look at the implications.

Taking the example from the previous article 217.80.0.0/12 for the German Telekom we have to trace 1mio addresses and scanning them all will take a while:

# time traceroute -w 1 -m 8 -q 1 217.80.0.1
traceroute to 217.80.0.1 (217.80.0.1), 8 hops max, 40 byte packets
 2  217.0.72.230  36.974 ms   
 3  ke-eb1.ke.de.net.dtag.de (62.154.98.82)  54.454 ms
 4  217.0.74.213  54.610 ms
 5  * * *
 6  * * *
 7  * * *
 8  * * *

real    0m3.244s

If we can make the make the host lookup optional we can reduce it to:

# time traceroute -n -w 1 -m 8 -q 1 217.80.0.1
traceroute to 217.80.0.1 (217.80.0.1), 8 hops max, 40 byte packets
 2  217.0.72.230  37.636 ms 
 3  62.154.98.82  55.631 ms 
 4  217.0.74.213  55.614 ms 
 5  * * *
 6  * * *
 7  * * *
 8  * * *

real    0m1.053s

Scanning everything with one traceroute takes:

 ips * time_per_scan / num_of_scanners = scan_time

1 048 576 IPs * 1s / 1 scanner = 12 days

To manage traceroute you can use something like Net::Traceroute with Perl. It will start parallel traceroute instances for you and you should be able to scan 32 IPs in parallel. It would reduce the over all time down to 9hrs, for scanning not even a 1/1000 of the internet.

Latency

The time-per-scan is mostly controlled by the network-latency:

# traceroute -w 1 -m 6 -q 1 cnn.com
 3  217.0.72.230  <b>38.105 ms *DSL-line*</b>
 4  f-ec1.f.net.dtag.de (62.154.17.242)  <b>52.038 ms *across-Germany*</b>
 5  pop1-frr-s2-2-0.atdn.net (66.185.147.97)  <b>136.859 ms *across-the-ocean*</b>

To scan worldwide you should have scanners in each continent. The sea-cable or satellite connections would add too much latency otherwise.

As there will always be slow IPs to connect, you have to send more packets in parallel before you get back a response.

Build your own traceroute

Quite some time is spent in starting and stopping the programs as traceroute itself is not really spending CPU cycles for its work. A traceroute-daemon would be a first step to scan more in parallel.

Checking the net you will find several traceroute implementations for UDP, TCP and other protocols. You can either take their source and use libpcap and libnet to build your own. libnet is for building the UDP packets with a increasing TTL and libpcap is to receive the ICMP messages for ICMP_TIME_EXCEEDED.

As we want to scale nicely up to 1000 parallel scans we should also take a look at event-driven, non-blocking IO to handle all the scans in a single binary.

Scan it

In a example implementation it can look like:

$ ./loc-scanner
usage: ./loc-scanner <hostmask> <steps> <regex> <type> <children>
  hostmask: 192.168.0.0/16
  steps   : 64
  regex   : '[a-z]'
  type    : (dns|traceroute|whois)
  children  : number of parallel children

$ ./loc-scanner 217.80.0.1/16 32 'dtag.de' traceroute 16
IPs: 65536, childs: 16, jobs/childs: 16
childs started: 16
................................................................    4.92     63488 ke-eb1.KE.DE.net.DTAG.DE,217.80.21.193
................................................................    6.74     61440 ke-eb1.KE.DE.net.DTAG.DE,217.80.7.193
................................................................    7.68     59392 ke-eb1.KE.DE.net.DTAG.DE,217.80.11.193

After some warm up we go up to 12 scans/s with 16 childs, with larger networks we can use more scanners and increase the scan-rate to 40 scan/s. This is currently limited by the network latency on my network.

Algorithms

If we can't add more scanners and can't reduce the latency even more, we have to reduce the number of IPs that we scan. This is a balance between accuracy and speed.

IP-allocations are controlled in blocks. To reduce management overhead and keep the routing-tables small. Depending on the provider this might be done in blocks of 32 addresses or even in blocks of 256k. Assuming that the smallest size is 16 IPs in a block we can pretty only scan every 16th address and reduce the scan-time by 16.

We can also be a bit more predictive and start with larger blocks, like 128 IPs and probe a few address in the block if they are handled by the same router. Instead of random probes we can use the assumption of block-allocations again and test the IPs at the border of the next possible smaller block:

Address block (/24)
-------------------
Block:
.0                                      .255

Probes next to the border of /25 (.126 and .129)
 .1              .126  .129            .254

If all 4 are handled by the same router, we can assume that the whole block is handled the same block. If not, you divide the block and repeat until the condition is met.

Scanning 4 IPs to predict 256 IPs. That's a reduction of factor 64. A lot better than just assuming that the smallest block is 16 IPs.

... and next week ?

In the next article we will look into mapping the collected data to a useful data-structure and compress it nicely.