Thursday, May 21, 2015

TRAI and their blunders

Once TRAI had published the list of emails it had received, along with the addresses (read about it here), I was very skeptical about what measures would be taken to correct this blunder and how effective they would be.

After a little while, TRAI decided that it must do something to discourage spam bots from using the list as a source. A decision with its heart in the right place. Then came the blow to my intelligence.

The measure TRAI undertook was to replace "@" with "( at )" and "." with "(dot)" in every email.
This was unexpected. If you have ever typed an address into GMail with the same replacements made, you would have noticed that it does not matter whether you use @ or (at).
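
To show how little protection this substitution offers, here is a minimal sketch of undoing it in Python. The function name `deobfuscate` and the sample address are made up for illustration, and the exact spacing TRAI used around "at" and "dot" is an assumption, so the patterns tolerate optional whitespace.

```python
import re

def deobfuscate(text):
    """Turn '( at )' back into '@' and '(dot)' back into '.'.

    Whitespace inside and around the parentheses is treated as optional,
    since the exact spacing used on the site is an assumption here.
    """
    text = re.sub(r"\s*\(\s*at\s*\)\s*", "@", text, flags=re.IGNORECASE)
    text = re.sub(r"\s*\(\s*dot\s*\)\s*", ".", text, flags=re.IGNORECASE)
    return text

# Hypothetical address, not taken from the published list.
print(deobfuscate("someone( at )example(dot)com"))  # someone@example.com
```

Two regular expressions are all it takes, which is well within the reach of any spam bot.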

Another thing of note: I expected it to be relatively easy to extract the emails from the website and compile a list of them. So I sat down with my friend's 2G Internet connection on Aircel and began to download the web pages containing the emails. There were 18 web pages of note.

With this in mind, I fired up Vim (a text editor) and began to type out a Python script that would do the extraction for me. It was an easy enough job, and after letting it run for 192.55 seconds (I timed it) I had a list of 8,90,537 emails. Not quite the 1 million claimed, but substantially close.
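
The original script is not reproduced here, but a rough sketch of this kind of extraction might look like the following. It assumes the downloaded pages were saved as `.html` files in a local folder called `pages` (a hypothetical name) and that addresses appear in the "( at )" / "(dot)" form. The regular expression is a simplification and will not cover every valid email address.

```python
import re
import time
from pathlib import Path

# Building blocks for an obfuscated address such as
# "someone( at )example(dot)com". The spacing rules are an assumption.
TOKEN = r"[A-Za-z0-9_%+-]+"
AT = r"\s*\(\s*at\s*\)\s*"
DOT = r"\s*\(\s*dot\s*\)\s*"
OBFUSCATED = re.compile(
    rf"{TOKEN}(?:{DOT}{TOKEN})*{AT}{TOKEN}(?:{DOT}{TOKEN})+",
    re.IGNORECASE,
)

def extract_emails(text):
    """Return the set of de-obfuscated, lowercased addresses found in text."""
    found = set()
    for match in OBFUSCATED.finditer(text):
        address = re.sub(AT, "@", match.group(0), flags=re.IGNORECASE)
        address = re.sub(DOT, ".", address, flags=re.IGNORECASE)
        found.add(address.lower())
    return found

def extract_from_directory(directory):
    """Collect unique addresses from every saved .html page in a directory."""
    emails = set()
    for path in sorted(Path(directory).glob("*.html")):
        emails |= extract_emails(path.read_text(encoding="utf-8", errors="ignore"))
    return emails

if __name__ == "__main__":
    start = time.perf_counter()
    emails = extract_from_directory("pages")  # hypothetical folder name
    elapsed = time.perf_counter() - start
    print(f"{len(emails)} unique emails in {elapsed:.2f} seconds")
```

Using a set removes duplicates along the way, so the final count reflects unique addresses only.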

All in all, the efforts TRAI made to keep our data private were commendable, even though it only took a student with a slow Internet connection and a little knowledge of Python to extract the emails.

As expected, my email was also among the ones found.