I recently collected a list of the 1,000,000 most visited domain names for the month. Once I had it, I couldn't resist comparing how often certain words show up in domain names relative to others.
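For anyone who wants to try this on their own list, here is a minimal sketch of one way to do the counting. It assumes plain case-insensitive substring matching, counted once per domain, which may not be exactly how the numbers in this post were produced. The function name `count_word_hits` and the sample domains are made up for illustration.

```python
from collections import Counter

def count_word_hits(domains, words):
    """Count how many domains contain each word as a substring.

    Assumption: a word is counted at most once per domain, and
    matching is case-insensitive over the whole domain string.
    """
    counts = Counter({word: 0 for word in words})
    for domain in domains:
        name = domain.lower()
        for word in words:
            if word in name:
                counts[word] += 1
    return counts

# Hypothetical sample; the real input would be the 1,000,000-domain list.
sample = [
    "workplace.example",
    "playwork.example",
    "homework.example",
    "display.example",
]
hits = count_word_hits(sample, ["work", "play"])
print(hits["work"], hits["play"])
```

Note that substring matching is crude: "display" counts as a hit for "play", so short words will be overcounted.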
Let's see some results!
| WORDS | FREQUENCY |
|---|---|
| work > play | 4375 to 2266 |
| sex > god | 6391 to 768 |
| take > give | 400 to 262 |
| porn > money/cash > love | 5169 to 3220 to 2978 |
| democrat > republican | 45 to 21 |
| linux > windows | 391 to 291 |
| hoes > bros | 360 to 190 |
| free > shop > buy | 8188 to 7439 to 1559 |
| win > lose > fail | 4281 to 481 to 102 |
| nba > nfl > mlb | 865 to 349 to 24 |
| king > queen > president | 3223 to 213 to 27 |
| good > evil | 832 to 499 |
Anyone else not particularly optimistic about the direction our society is headed?
The actual reason for collecting the domain names was to run n-gram frequency analysis on domains and subdomains. Eventually, I will use the results to test whether n-gram frequencies are useful for detecting covert communication in domain names (i.e. DNS tunnels). However, that is a post for another time :-D
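In the meantime, here is a rough sketch of what character n-gram frequency analysis over domain labels can look like. It is only an illustration under my own assumptions (split on dots, lowercase everything, count character n-grams within each label); the names `char_ngrams` and `ngram_frequencies` are hypothetical, and this is not the detection method itself.

```python
from collections import Counter

def char_ngrams(label, n):
    """Return all overlapping character n-grams in a single label."""
    return [label[i:i + n] for i in range(len(label) - n + 1)]

def ngram_frequencies(domains, n=2):
    """Relative frequency of each character n-gram across all labels.

    Assumption: each dot-separated label (subdomain, domain, TLD) is
    treated separately, so n-grams never span a dot.
    """
    counts = Counter()
    for domain in domains:
        for label in domain.lower().split("."):
            counts.update(char_ngrams(label, n))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}

# Hypothetical input; the real corpus would be the collected domain list.
freqs = ngram_frequencies(["mail.example.com", "www.example.org"], n=2)
for gram, freq in sorted(freqs.items(), key=lambda item: -item[1])[:5]:
    print(gram, round(freq, 3))
```

A table like this, built from popular domains, gives a baseline that other domain names can be compared against.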