Arjoonn: Python


Wednesday, February 15, 2017

Attendance in St. Stephen's College

I've been hunting around for datasets I can relate to for some time now. This fine evening I arrived at the conclusion that I could simply use attendance data from the website of St. Stephen's College.

The complete analysis notebook is available as a gist.

Collecting the data was a matter of inspecting how the website obtained its data. It seems a company (GreenClouds) has been handling our attendance from the beginning. As far as I can see, they have made a mess of the data transfer. Perhaps the problem originated in the college itself, perhaps it is due to mistakes at the company level. Whatever the cause, the data format is a mess.

I open up my console and spin up a Python script to get the data from the college website. It takes about two minutes on my Internet connection to collect all the information.

We then put it into a tabular form with the columns Name, percent LA, percent TA, percent PA, admission_year and course.

Things are now ready for graphing.

Graphs

Next we make box plots. Imagine lining people up in order of their attendance: a box plot summarizes that line. The person in the middle of the line is the median, drawn as the line cutting the box in two. There are an equal number of people on either side of it.

Cut each half of the line in half again and you get the bounds of the box (the first and third quartiles).
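The box is built from quartiles. Here is a minimal sketch using Python's standard `statistics` module. The attendance numbers are made up. The 1.5 × IQR rule for flagging outliers is a common convention (it is matplotlib's default whisker length), and the plots in this post may not have used it.

```python
import statistics

def box_plot_summary(attendance):
    """Summarise attendance percentages the way a box plot does."""
    # quantiles(n=4) returns the three cut points: Q1, the median (Q2) and Q3.
    q1, median, q3 = statistics.quantiles(attendance, n=4)
    iqr = q3 - q1  # height of the box
    # Common convention: points further than 1.5 * IQR from the box are outliers.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in attendance if x < low or x > high]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}

# Hypothetical attendance percentages for one course.
attendance = [10, 80, 82, 85, 88, 90, 92, 95]
print(box_plot_summary(attendance))
# {'q1': 80.5, 'median': 86.5, 'q3': 91.5, 'outliers': [10]}
```

`statistics.quantiles` interpolates with its default "exclusive" method. Plotting libraries may interpolate slightly differently, so their box edges can differ a little.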

This one is a gem. What is up with Chem? The attendance in their lectures is absolutely ridiculous. While everyone in a given course has more or less the same attendance, Chem students are all over the attendance spectrum! Then again, their tutorials box plot is neat too!

Yes. I see it too. Math people are consistently in class, with a few people as outliers. Nobody else has such a tight spread.

Perhaps a better way of seeing this is a violin plot, where you see the density of people instead of simple boxes. The thickness of the violin denotes how many people are at that point.

For example, we can see that a lot of the math people are high-attendance junkies, while a substantial number of CHE, PHY, PCH and PCS students are low-attendance lovers.

There! The average PHI student absolutely does not give a hoot about attendance! The PCH and PCS programs do every class together as is apparent from their spreads. ENG and HST do compete with PHI but PHI does take the cake.

Let's move on to the simplest of all graphs: the histogram. No explanation needed here.

Looking at the histograms, we can see that almost everyone attends class, with the exception of a few people who absolutely don't come to class. Perhaps these are people who left the course and whom the administration never removed from the rolls? What about experience?

We make a regular point plot to see how attendance changes over the months. The vertical lines through the points denote a 95% confidence interval.
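The confidence interval behind each point can be estimated in several ways. One common way is bootstrapping, which as far as I know is also what seaborn's `pointplot` does by default. The sketch below uses only the standard library. It resamples one month's attendance figures with replacement and reads the 2.5th and 97.5th percentiles off the resampled means. The numbers and the `bootstrap_ci` helper are hypothetical.

```python
import random
import statistics

def bootstrap_ci(values, confidence=0.95, resamples=2000, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))  # resample with replacement
        for _ in range(resamples)
    )
    k = round((1 - confidence) / 2 * resamples)  # how many means fall in each tail
    return means[k], means[resamples - 1 - k]

# Hypothetical attendance percentages of one year group in one month.
march = [55, 62, 70, 71, 75, 80, 84, 90, 93, 97]
print(statistics.fmean(march), bootstrap_ci(march))
```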

The third years have found that perfect balance of leisurely attendance for most of the semester, then picking it up during the last months. First years perfectly exhibit the zeal of the over-enthusiastic newcomer (the Hindi idiom "नया नया मौलवी", literally "a brand-new cleric"). Second years have figured out that attendance does not really matter, but have yet to find out how badly low attendance can affect their marks. With time comes wisdom.

That's it for now. I'd love to do some more if someone can come up with data at a lower granularity.

Tuesday, September 13, 2016

WebCheck: A way to watch websites for change in content

There are a lot of times when quick updates are essential to a task: for example, when universities release examination forms, when there is an online sale for something, or, even better, when your favorite blog author publishes something new.

RSS was built to handle such situations, but as we all know, people don't always follow standards. In cases where RSS has not been implemented by site authors, we may use webcheck to keep a watch on the site for changes.

This is a Python 3 script that I've used for some time now to keep track of entrance exam notifications. The software and its installation instructions can be obtained from:

https://github.com/theSage21/reimagined-chainsaw

The way this works is simple.

1. Get a list of links to watch from a file provided by the user
2. Download each link and call it the reference page
3. After a certain amount of time, download everything again and see if something has changed with respect to the reference pages
4. Repeat step 3
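The steps in the list above amount to a polling loop. This is a minimal sketch under my own assumptions and is not the code from the reimagined-chainsaw repository. Pages are compared by the SHA-256 hash of their raw bytes, and the links file holds one URL per line. A detected change becomes the new reference. The function names and the 600-second interval are hypothetical.

```python
import hashlib
import time
import urllib.request

def fetch(url):
    """Download a page and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()

def fingerprint(content):
    return hashlib.sha256(content).hexdigest()

def changed_links(links, reference, fetch_page=fetch):
    """Return the links whose content no longer matches the reference hashes.

    A changed page becomes the new reference, so each change is reported once.
    """
    changed = []
    for link in links:
        digest = fingerprint(fetch_page(link))
        if reference.get(link) != digest:
            changed.append(link)
            reference[link] = digest
    return changed

def watch(links_file, interval_seconds=600, fetch_page=fetch):
    with open(links_file) as f:  # step 1: read the links
        links = [line.strip() for line in f if line.strip()]
    reference = {link: fingerprint(fetch_page(link)) for link in links}  # step 2
    while True:  # step 4: repeat forever
        time.sleep(interval_seconds)  # step 3: wait, re-download and compare
        for link in changed_links(links, reference, fetch_page):
            print("Changed:", link)
```

Hashing raw bytes is the simplest choice, but pages that embed timestamps or rotating ads will look "changed" on every check. Comparing only the extracted text of the relevant part of the page would reduce false alarms.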

Saturday, May 14, 2016

A Regular Expression Engine in Python

Code is here

Regular expressions are quite useful in programming tasks. They are a handy way to search for patterns in long strings of text. For example, the pattern "(c|b)at" lets us search for either 'cat' or 'bat' in any string the pattern is run over. This example is a simple one, and regular expressions in practice are far more powerful and complex.

One must take care, however, to remember that "regular expression" is a loaded mathematical term. It is not the same as the regex or regexp found in most programming languages. Features such as backreferences let those describe languages that are not regular.

Here, for the sake of brevity, we shall call regular expressions regexes. A regex can be recognized by a Deterministic Finite Automaton (DFA). A DFA takes a string of input symbols and tells us whether it accepts or rejects that string. Processing a string this way is called a run of the DFA.
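A run is easy to picture with a DFA written by hand. The sketch below encodes one for the pattern "(c|b)at" as a transition dictionary. The state names are arbitrary choices made for illustration.

```python
# Hand-built DFA for "(c|b)at"; any transition not listed leads to a dead state.
TRANSITIONS = {
    ("start", "c"): "q1",
    ("start", "b"): "q1",
    ("q1", "a"): "q2",
    ("q2", "t"): "accept",
}
ACCEPTING = {"accept"}

def run_dfa(transitions, start, accepting, text):
    """Feed text to the DFA one symbol at a time; True if it ends in an accepting state."""
    state = start
    for symbol in text:
        state = transitions.get((state, symbol))
        if state is None:  # missing transition: the string is rejected
            return False
    return state in accepting

for word in ["cat", "bat", "rat", "cats"]:
    print(word, run_dfa(TRANSITIONS, "start", ACCEPTING, word))
```

A run checks whether the whole string matches. Searching inside longer text, as in the 'cat'/'bat' example, means starting a run at each position of the text, or equivalently matching the pattern with any prefix allowed in front of it.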

Steps

We define that our regex engine will only provide the basic elements of concatenation, union and Kleene star. Thus '|' (union) and '*' (Kleene closure) are the only special characters, and every other character in a pattern must come from the alphabet. Concatenation is assumed for consecutive symbols.
We first build a Non-Deterministic Finite Automaton with epsilon transitions from the given pattern using Thompson's construction algorithm.
Then we convert this e-NFA to a regular NFA without epsilon transitions by the epsilon closure method.
The new NFA is converted to an equivalent DFA by the power set construction method.
Finally, the DFA is run on a string.
Upon completing the run, the DFA either returns True or False depicting that it has accepted the string or rejected it.
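The pipeline above can be sketched in Python as follows. This is a minimal illustration under my own assumptions and is not the code from the gist. Patterns contain only symbols, '|' and '*', with no parentheses. For brevity, the epsilon-removal and power set steps are folded together by taking epsilon closures inside the subset construction. That yields an equivalent DFA without building the intermediate epsilon-free NFA.

```python
from itertools import count

def thompson(pattern):
    """Thompson's construction; returns (transitions, start, accept).

    transitions maps a state to a list of (symbol, next_state); symbol None is epsilon.
    """
    ids, trans = count(), {}

    def new_state():
        state = next(ids)
        trans[state] = []
        return state

    def fragment(symbol):  # two states joined by one move
        start, end = new_state(), new_state()
        trans[start].append((symbol, end))
        return start, end

    def star(frag):
        start, end = new_state(), new_state()
        trans[start] += [(None, frag[0]), (None, end)]  # enter or skip
        trans[frag[1]] += [(None, frag[0]), (None, end)]  # loop or leave
        return start, end

    def concat(a, b):
        trans[a[1]].append((None, b[0]))
        return a[0], b[1]

    def union(a, b):
        start, end = new_state(), new_state()
        trans[start] += [(None, a[0]), (None, b[0])]
        trans[a[1]].append((None, end))
        trans[b[1]].append((None, end))
        return start, end

    branches = []
    for branch in pattern.split("|"):  # '|' binds loosest
        items = []
        for ch in branch:
            if ch == "*":  # '*' binds tightest: it applies to the previous symbol
                if not items:
                    raise ValueError("'*' must follow a symbol")
                items[-1] = star(items[-1])
            else:
                items.append(fragment(ch))
        if not items:  # an empty branch matches the empty string
            items.append(fragment(None))
        frag = items[0]
        for nxt in items[1:]:
            frag = concat(frag, nxt)
        branches.append(frag)
    frag = branches[0]
    for nxt in branches[1:]:
        frag = union(frag, nxt)
    return trans, frag[0], frag[1]

def eps_closure(trans, states):
    """All states reachable from `states` using epsilon moves only."""
    stack, seen = list(states), set(states)
    while stack:
        for symbol, nxt in trans[stack.pop()]:
            if symbol is None and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return frozenset(seen)

def to_dfa(trans, start, accept, alphabet):
    """Power set construction; each DFA state is a frozenset of NFA states."""
    start_set = eps_closure(trans, {start})
    dfa, todo = {}, [start_set]
    while todo:
        current = todo.pop()
        if current in dfa:
            continue
        dfa[current] = {}
        for a in alphabet:
            moved = {nxt for s in current for sym, nxt in trans[s] if sym == a}
            dfa[current][a] = eps_closure(trans, moved)  # empty set = dead state
            todo.append(dfa[current][a])
    return dfa, start_set, {s for s in dfa if accept in s}

def matches(pattern, text, alphabet):
    dfa, state, accepting = to_dfa(*thompson(pattern), alphabet)
    for ch in text:
        if ch not in dfa[state]:
            return False  # symbol outside the alphabet
        state = dfa[state][ch]
    return state in accepting

print(matches("0|10*", "10", "01"))  # True
```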

Example

For the regex '0|10*' on the alphabet '01', the following strings return the results shown. The regex means either a single zero, or a one followed by any number of zeros.

'1' : True
'0' : True
'01': False
'10': True
'11': False

The entire code is available as a gist.

Sunday, April 24, 2016

Why You Should Code

Every day, every second now, we are immersed in a world where dumb machines work for us. To list the obvious, your phone, TV, desktop, watch and laptop are all computers. In the not-so-obvious list come power plants, rockets, cars, water supply, GPS and the post office. Despite living in this obviously computer-driven world, very few people know how to code. So I'm going to tell you why I code, in an effort to get you to pick yourself up and get going.

Programming at its heart is communication. You instruct the machine. You tell it what to do: your desires, the things you don't like, the things you want more of, the things you want done quickly. You tell the machine what it is that you want done. In those moments, you are God. You can instruct the machine to do anything and it will obey. It will obey in ways you never thought possible.

There will be no complaint, no groans. There will be no raised eyebrows. You have a powerful ally at your command. You can delegate repetitive tasks to it and it will do them till the end of time itself. You can assign some intelligent tasks to it and it will do them till the end of time. What the machine does is up to you.

The first problem I solved was downloading YouTube videos. I knew I wanted an entire playlist and I did not want to go on clicking anymore. So I wrote a program for it and the machine obediently downloaded the entire playlist for me. That is all there is to computing. Nothing fancy, nothing mysterious. Simple communication.

I now solve a lot of problems using my laptop. I have to apply for a PhD. So I write a program to suggest names of PhD guides to me.

There are lots of ways to communicate with the computer. Sure you can use something like Excel to do calculations for you. But in that you are limited to the options that the makers of Excel provide to you. Imagine this like talking to a person with cards. You can only say what is on the cards. Of course you will be limited! On the other hand, the way for unlimited communication is programming languages. Pick any one!

All programming languages are designed with one thing in mind: how can we maximize what you can communicate to a computer? With a programming language at hand, you can communicate with your computer in an almost unlimited fashion. So learn one.

Try out Python! Try out Matlab (or Octave if you prefer free software like I do)! Try Java or even C!

All these languages allow you to communicate with the computer. All of them are powerful! With an untiring assistant at your beck and call, your life will be a lot simpler.

Wednesday, April 13, 2016

The Terminal - (Not the Movie)

"Ah, you think that a GUI is your ally? You merely adopted the GUI. I was born in it, molded by it. I did not see a terminal until I was already a man, by then it was nothing to me but blinding." - the difference between now and then.

As mentioned, I was born in 1994, and computers already had GUIs by that time. Ever since I was introduced to computers, all I knew was the GUI. That had, as aptly said, defined my computer-using habits. I was dependent on the mouse, I needed WYSIWYG editors and so on. Not that they are a bad thing, but they made the computer a black box for me. In college I was introduced to the command line, and that opened up a whole new experience.

This article is about how I finally embraced the terminal and started my life inside of it. As of now, I have completely un-installed the X server on my Archlinux installation and currently only have the terminal. Note that this is a TTY and not a terminal emulator like terminator and xterm.

Some of the things I had to introduce in my life to make this easier were:

MUTT : for email purposes
IRSSI : for IRC purposes
MDP : for presentation purposes
TMUX : for the sake of sanity.
LYNX : for web browsing purposes
GNUPLOT: for plotting purposes (the text mode is useful)
PANDOC : for converting various files to text

These things have made life a lot simpler. The terminal also streamlines your thought process a lot. Though it is a little tough initially, since our daily experience is so enriched by graphics, after a while you begin to see the effects.

The Changes

First off you miss the desktop. With nothing to see, you need to constantly be aware of where you are in the computer and what you want to do. That said, it takes away a lot of distraction by removing the oh-so-good movies folder from your sight.

With the desktop gone, the file explorer is the next thing that is missed. This bit of tech spoiled us, making exploring as easy as point and click. It does, however, mask some things in the file system which we would have been better off knowing.

The next thing which hurts us is the browser. Firefox, Chromium and so on are all graphical browsers. We have given all of that up for the sleek, powerful and ad-free environment of terminal based browsers. Keep in mind that the image displaying powers of w3m also do not work as the X server has been removed.

Images and videos do not work. Obviously! That puts a hitch in our world, sure does. VLC media player to the rescue. It turns out that VLC has an ASCII mode where it can display images and videos painted with ASCII characters. Here is an example.

Best of all it works in the TTY too. That takes care of seeing a few short tutorials without the X server involved. Do take care though that you are using tmux when you are doing this as it takes over the TTY when a video is run, not allowing you to Ctrl-C it or close it in any other manner (AFAIK).

Another thorn in the side is research papers, which are usually PDF files. Pandoc helps by converting between many document formats, but it cannot read PDFs, so for those I use another utility called pdftotext.

When professors ask you to give a presentation on some topic, you begin to miss the old PowerPoint. In hindsight though, as Uncle Ben said "With great power comes great responsibility". PowerPoint makes it easy to obscure the message and deliver nothing at all. I have begun using MDP to provide presentations from the terminal itself.

With the browser gone, Gmail and Facebook have been cut off too. Although LYNX provides a decent web browsing experience, the lack of JavaScript shows. For email I have started using MUTT, and for Facebook, well, let's just say that ship has sailed.

With the browser went the ability to use Jupyter notebooks. And so we come to our last program that helps ease terminal life (pun probably intended). GNUPLOT lets us plot graphs in text mode, giving us a general idea of what the data looks like. Of course text mode is nowhere near the full capability of GNUPLOT, but it gets the job done.

And so our life in the terminal is now rolling, and work is so much easier because of it.

Thursday, January 14, 2016

Prerequisites for Conditional Random Fields

When I first started out to learn CRFs there was a scarcity of material that I could consume. They were either too technical for my ability or were repeating what I already knew. This article hopes to bridge that gap and allow you to read material on CRFs. We assume knowledge of some math related to set theory and probability theory.

Factorization of functions is when a function is shown to be the product of some other functions.
Probability

It is the chance of something happening.

Graphs

A graph is defined as a set of vertices together with a set of edges between them. The edges may be directed or undirected.
A bipartite graph is one whose vertices can be split into two groups such that no vertex is connected to another vertex in its own group.
A factor graph is a bipartite graph representing the factorization of a function. One group of vertices stands for the variables and the other for the factors, and each factor is connected to the variables it depends on.
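You can check whether a graph is bipartite by trying to colour its vertices with two colours so that every edge joins different colours. The sketch below does this with a breadth-first search over an adjacency dictionary, a representation chosen for illustration. The example is the factor graph of a hypothetical function f(x, y, z) = g(x, y) · h(y, z), with the variables in one group and the factors in the other.

```python
from collections import deque

def is_bipartite(adjacency):
    """Two-colour the graph with BFS; fail if an edge joins same-coloured vertices."""
    colour = {}
    for source in adjacency:
        if source in colour:
            continue
        colour[source] = 0
        queue = deque([source])
        while queue:
            vertex = queue.popleft()
            for neighbour in adjacency.get(vertex, ()):
                if neighbour not in colour:
                    colour[neighbour] = 1 - colour[vertex]
                    queue.append(neighbour)
                elif colour[neighbour] == colour[vertex]:
                    return False
    return True

# Factor graph of f(x, y, z) = g(x, y) * h(y, z): variables link only to factors.
factor_graph = {"x": ["g"], "y": ["g", "h"], "z": ["h"],
                "g": ["x", "y"], "h": ["y", "z"]}
triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
print(is_bipartite(factor_graph), is_bipartite(triangle))  # True False
```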

Graphical Model

Probabilistic Graphical Models are probabilistic models where a graph represents the conditional dependence structure between variables.
Two branches of graphical representations of distributions are common.

Bayesian networks

Hidden Markov Models (HMMs) are a special case of Bayesian networks, and some neural network models, such as sigmoid belief networks, can also be read as Bayesian networks.

Markov networks/ Markov random fields

Markov networks may be cyclic and are undirected whereas Bayesian networks are directed and acyclic.

Markov property

At its core, the Markov property asserts memorylessness.
It says that, given the present state, the next state does not depend on any of the states that came before it.
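Written out for a sequence of random variables X_1, X_2, ..., the Markov property says that conditioning on the whole history gives the same prediction as conditioning on the present state alone:

```latex
P(X_{n+1} = x \mid X_1 = x_1, \ldots, X_n = x_n) = P(X_{n+1} = x \mid X_n = x_n)
```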

Just as capital sigma (Σ) is used to denote the sum of a series, the product of a series is denoted by capital pi (Π).
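For a series a_1, ..., a_n the two notations read as follows. The product form is also how the factorization of a function mentioned at the start is usually written, with each factor f_j depending only on a subset x_{S_j} of the variables:

```latex
\sum_{i=1}^{n} a_i = a_1 + a_2 + \cdots + a_n
\qquad
\prod_{i=1}^{n} a_i = a_1 \cdot a_2 \cdots a_n
\qquad
f(x_1, \ldots, x_m) = \prod_{j} f_j(x_{S_j})
```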

Feel free to delve deeper into this body of knowledge before moving on to what a CRF is. Make sure you understand what these ideas represent before stepping onto this next lot.

A Conditional Random Field is an undirected probabilistic graphical model. It is used to predict a set of classification labels for a set of inputs. Instead of considering each input variable individually, it also considers the effect of its neighbours.

A well-worn example is the English sentence. Classifying each word of a sentence as 'verb', 'noun', etc. while looking at the word in isolation is ordinary classification. A CRF looks at the neighbouring words too. It therefore predicts the labels for the entire sentence together rather than one word at a time.

A CRF is a graphical model whose vertices can be split into two disjoint sets X and Y, where X is the set of inputs and Y is the set of outputs. The model then represents the conditional distribution P(Y|X) directly.
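To make P(Y|X) concrete, here is a toy linear-chain CRF in Python. Everything in it is hypothetical: two labels, a two-word sentence, and hand-set weights where a real CRF would learn them from data. A labelling's score is the sum of per-word (emission) weights and neighbour-to-neighbour (transition) weights. Then P(y|x) = exp(score) / Z(x), where Z(x) sums exp(score) over all labellings. Enumerating every labelling is exponential in sentence length, so practical implementations use the forward and Viterbi algorithms instead.

```python
import math
from itertools import product

LABELS = ("NOUN", "VERB")
# Hypothetical hand-set weights; a trained CRF would learn these from data.
EMISSION = {("dogs", "NOUN"): 2.0, ("dogs", "VERB"): 0.5,
            ("run", "NOUN"): 1.0, ("run", "VERB"): 1.0}  # "run" alone is ambiguous
TRANSITION = {("NOUN", "VERB"): 1.5}  # a noun followed by a verb is favoured

def score(words, labels):
    """Sum of emission weights plus transition weights between neighbouring labels."""
    total = 0.0
    for t, (word, label) in enumerate(zip(words, labels)):
        total += EMISSION.get((word, label), 0.0)
        if t > 0:
            total += TRANSITION.get((labels[t - 1], label), 0.0)
    return total

def conditional_distribution(words):
    """P(labels | words) for every labelling, by brute-force enumeration."""
    labellings = list(product(LABELS, repeat=len(words)))
    weights = [math.exp(score(words, y)) for y in labellings]
    z = sum(weights)  # the partition function Z(x)
    return {y: w / z for y, w in zip(labellings, weights)}

dist = conditional_distribution(["dogs", "run"])
print(max(dist, key=dist.get))  # ('NOUN', 'VERB')
```

On its own, "run" is equally likely to be a noun or a verb under these weights. The NOUN-to-VERB transition weight, which links neighbouring labels, tips the whole-sentence prediction.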

A very good example of CRF with python can be seen in this notebook (Jupyter Notebook).

Thursday, December 3, 2015

Why is Python a high level language?

High-level languages are usually defined as languages in which one instruction translates to more than one machine-level instruction. Though I find this definition simple and precise, I also find that a lot of students have difficulty understanding it.

A simple demonstration can be done in Python. First, one must understand that any hardware comes with its own instruction set. The manufacturer of the hardware creates this instruction set, and everyone else instructs the hardware through it. This is the basic instruction set and consists of things like ADD, SUB, LOAD, BIND etc.

In Python we may treat the interpreter's virtual machine as hypothetical hardware whose instructions are called bytecode. Now we see how a simple Python program converts to multiple bytecode instructions per statement.

First let us create a simple function in Python.
def my_function():
    x = 1
    y = 2
    z = x + y
What this does is create a function that creates two variables x and y, adds them together and stores the result in another variable called z.

To see the machine-level instructions that must be carried out to complete this function, we add the following to the Python file (say, myfn.py):

import dis
dis.dis(my_function)

What we get as output is:

2           0 LOAD_CONST               1 (1)
              3 STORE_FAST               0 (x)

3           6 LOAD_CONST               2 (2)
              9 STORE_FAST               1 (y)

4          12 LOAD_FAST                0 (x)
             15 LOAD_FAST                1 (y)
             18 BINARY_ADD
             19 STORE_FAST               2 (z)
             22 LOAD_CONST               0 (None)
             25 RETURN_VALUE

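The numbers in the first column are line numbers in myfn.py. The numbers in the second column are the byte offsets of the instructions.

Hence we can see the machine instructions needed to compute this function. Note that lines 2 and 3 in our file each expanded to 2 instructions, while line 4 expanded to 6. The last two of those are the implicit `return None` at the end of the function.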
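The standard `dis` module can also count the instructions per line programmatically rather than by eye. The sketch below relies on `dis.get_instructions`. It reads line numbers from `Instruction.positions` on Python 3.11+ and from `starts_line` on older versions. Exact opcodes and counts vary between Python versions, and the listing above comes from an older interpreter. Newer ones also report a `RESUME` instruction on the `def` line. In every case each statement still expands into several instructions.

```python
import dis
from collections import Counter

def my_function():
    x = 1
    y = 2
    z = x + y

def instructions_per_line(func):
    """Count the bytecode instructions attributed to each source line of func."""
    counts = Counter()
    current_line = None
    for instr in dis.get_instructions(func):
        positions = getattr(instr, "positions", None)  # present on Python 3.11+
        if positions is not None:
            line = positions.lineno
        else:
            # Older versions: starts_line is set only on the first instruction of a line.
            if instr.starts_line is not None:
                current_line = instr.starts_line
            line = current_line
        if line is not None:
            counts[line] += 1
    return counts

first = my_function.__code__.co_firstlineno
for line, n in sorted(instructions_per_line(my_function).items()):
    print(f"line {line - first + 1}: {n} instructions")
```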

Hence we conclude that Python is a high level language.
For more details refer to the Python docs.

Wednesday, November 11, 2015

Statistics of the 2014 General Election

The 2014 General Elections were very popular and saw the introduction of many new faces onto the political platform. Some interesting statistics from those elections and their results are shown here which might lead us to some interesting conclusions. To begin with, let us define our data set. The links used to provide data were:

With these links we had with us the election results of all candidates in the 2014 GE. Along with that we also had with us the number of criminal cases against approximately 1400 candidates.

Using DATA1 and DATA2 we create a list of candidates present in both the data sets. Thus we have ~1000 candidates. From this list we only use 3 pieces of information.

Candidate name
Number of criminal cases against them
Number of votes they got
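Building the list of candidates present in both data sets is a join on the candidate's name. This is a minimal sketch that assumes each source has already been parsed into a dictionary keyed by name. Those shapes are hypothetical, since the post does not describe the raw formats. It also assumes names differ across sources only in letter case and spacing. Real data would likely need fuzzier matching.

```python
def normalise(name):
    """Assumed matching rule: ignore letter case and repeated whitespace."""
    return " ".join(name.split()).lower()

def merge_by_name(votes_by_name, cases_by_name):
    """Return (name, criminal_cases, votes) for candidates found in both sources."""
    votes = {normalise(n): v for n, v in votes_by_name.items()}
    cases = {normalise(n): c for n, c in cases_by_name.items()}
    return [(name, cases[name], votes[name]) for name in sorted(votes.keys() & cases.keys())]

# Hypothetical records standing in for DATA1 (results) and DATA2 (criminal cases).
results = {"Asha  Devi": 5230, "Ravi Kumar": 120}
cases = {"asha devi": 2, "Mohan Lal": 1}
print(merge_by_name(results, cases))  # [('asha devi', 2, 5230)]
```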

With that we obtain some interesting figures:

The minimum number of votes anyone received was 1
The maximum anyone received was 758482
The maximum number of criminal cases was 382, against Uday Kumar SP
The second highest criminal case count was against Sridip Bhattacharya, at 57
The minimum number of criminal cases was 1
The correlation coefficient between criminal cases and votes was 0.032
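The correlation figure above is presumably Pearson's correlation coefficient, though the post does not say so. For reference, here is a minimal sketch of computing it from two lists. It is the covariance of the two variables divided by the product of their standard deviations. That normalisation bounds it between -1 and 1 and distinguishes it from the covariance itself. The example lists are made up. On Python 3.10+ `statistics.correlation` computes the same quantity.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # The 1/n factors of covariance and variances cancel, so plain sums suffice.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical (criminal cases, votes) pairs.
case_counts = [1, 3, 2, 10, 1]
vote_counts = [5000, 12000, 300, 8000, 45000]
print(round(pearson(case_counts, vote_counts), 3))
```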

Some things of note are:

The correlation is weak, so I ignore it. I would only consider it relevant if it were greater than 0.1
Real criminals are still contesting for public office

For the time being the 2014 GE shows nothing statistically interesting for me to discover at my level. I will consider it interesting the day the correlation crosses 0.1

Comments and similar efforts are welcome.

Sunday, August 30, 2015

PyTongue - OSFY article

PyTongue
========

Teaching programming with non-English languages
-----------------------------------------------

Title                     : Teaching programming with non-English languages
Author                 : Arjoonn Sharma
Target Audience : People who teach programming at various
                               levels and general programming enthusiasts

Any program which must be run needs to be converted to a string of bits in
order to run. This has been true ever since the inception of programmable
computers. If this is the case, then all we need to do is write the proper
string of bits to make any program. Then what do programming languages do?
After all we are not writing bit-strings but are writing text based programs.
The bit-strings are all that a computer needs to run. The programming languages
exist to help us make sense of what we have written. This means that
programming in a language of our own is very much possible. All that is needed
is a method of translating the language to the appropriate bit-string. This is
where PyTongue comes in handy. It lets you teach people who do not know the
Latin script how to program.

For the purpose of teaching programming to children who do not know English I
developed PyTongue. Thus I can teach them Python without having to teach them
English first. Python3 was the choice of language as it has nice Unicode
support throughout and Python by nature has a small learning curve. After a
while people have to learn English as it is the language of the trade. This way
however I can make sure that a child who has the potential to program does not
miss out simply because of a language barrier.

PyTongue has simple logic powering it. It is a transliteration service to be
precise. It takes a program written in one language and transliterates it to
normal Python code which is in English. The new code is then executed.
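The transliteration idea can be sketched in a few lines. This is not PyTongue's actual code, just an illustration under my own assumptions. The mapping is a plain dictionary from translated words to Python keywords and builtins, and the four Hindi entries are taken from the examples in this article. Comments and string literals are copied through untouched, so user-facing text stays in the original language. Words are detected with `str.isidentifier`, which accepts the combining vowel signs used in Devanagari.

```python
# Hypothetical mapping; PyTongue keeps its real mappings as JSON files.
HINDI = {"छाप": "print", "अगर": "if", "पूर्णांक": "int", "इनपुट": "input"}

def is_word_char(ch):
    """True if ch may appear inside a Python identifier (letters, digits, marks, '_')."""
    return ("a" + ch).isidentifier()

def transliterate(source, mapping):
    out, i, n = [], 0, len(source)
    while i < n:
        ch = source[i]
        if ch == "#":  # comment: copy up to the end of the line
            j = source.find("\n", i)
            j = n if j == -1 else j
        elif ch in "'\"":  # string literal (single or triple quoted): copy as is
            quote = ch * 3 if source.startswith(ch * 3, i) else ch
            j = i + len(quote)
            while j < n and not source.startswith(quote, j):
                j += 2 if source[j] == "\\" else 1  # skip escaped characters
            j = min(j + len(quote), n)
        elif is_word_char(ch):  # a word: translate it if the mapping knows it
            j = i
            while j < n and is_word_char(source[j]):
                j += 1
            out.append(mapping.get(source[i:j], source[i:j]))
            i = j
            continue
        else:
            j = i + 1
        out.append(source[i:j])
        i = j
    return "".join(out)

print(transliterate("# HI\nछाप('छाप')\n", HINDI))
```

The translated text could then be handed to `exec`, or written to a `.py` file and run with `python3`.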

To get started with PyTongue we must first install Python3 and then install
PyTongue. For this article Ubuntu 14.04 was used.

First we make sure that Python3 is installed. Next we create a directory
`pytongue_folder` and download PyTongue in it.

--------------------------------CODE-----------------------------
sudo apt-get install python3
cd ~
mkdir pytongue_folder
cd pytongue_folder
git clone https://github.com/theSage21/pytongue
cd pytongue
--------------------------------CODE-----------------------------

In case the `git` command shows an error, you can download the `zip` file from
the same link and unzip it. The effect is the same: we need to have a folder
called pytongue. After that we navigate into the folder with `cd pytongue`.

In order to write code in a certain language we need to get the mapping for it.
Some mappings, like Hindi and Russian, are already provided.
Mappings are obtained by running `pytongue-mapgen hi` for Hindi and
so on. The two-letter short forms (such as `hi`) are language codes.

A mapping is a word-for-word translation of the basic keywords and builtins
in Python. To create a mapping we need to create a file named after the
required language and write in it a JSON-encoded dictionary that pairs each
keyword or builtin with its translation.

This creates a mapping for the requested language. While it is downloading the
language it prints out the keywords mapped. After this we begin to write a
program. For the purposes of demonstration we will first write a simple 'hello
world' program and then a simple calculator in Hindi.

To write these programs any plaintext editor will do (gedit, vim, notepad etc).
The first line of the program must always be of the format
`# <LANGUAGE CODE>`, for example `# HI`.

Program1: hello.py
--------------------------------CODE-----------------------------
# HI
छाप('हैलो दुनिया')
--------------------------------CODE-----------------------------

This prints out 'हैलो दुनिया' ("hello world").

Program2: calc.py
--------------------------------CODE-----------------------------
# HI
क = पूर्णांक( इनपुट('एक संख्या दर्ज करें'))
ख = पूर्णांक( इनपुट('एक और संख्या दर्ज करें'))
कार्य = इनपुट('क्या करना है दर्ज करें (+,-,*,/) : ')
अगर कार्य == '+':
    छाप(क + ख)
अगर कार्य == '-':
    छाप(क - ख)
अगर कार्य == '*':
    छाप(क * ख)
अगर कार्य == '/':
    छाप(क / ख)
--------------------------------CODE-----------------------------

This prompts you for two numbers and then an operation to perform on them. It
prints out the result of the operation on the two numbers. The translation
created for this program is:-

--------------------------------CODE-----------------------------
a = int(input('Enter a number'))
b = int(input('Enter another number'))
op = input('Enter the operation:')
if op == '+':
    print(a + b)
if op == '-':
    print(a - b)
if op == '*':
    print(a * b)
if op == '/':
    print(a / b)
--------------------------------CODE-----------------------------

How would someone know what words to use? That is done using the created maps.
The maps are stored as JSON in the languages folder. They are thus editable by
hand. Hence if some translation seems strange to you, you can simply edit by
hand to make it more to your liking. In order to know what a particular
function is called, you simply need to open the particular language file
(`$ gedit ./languages/RU` for Russian) and look up the particular
translation pair.

After writing the desired code, we need to run it. Instead of the traditional
`python3 hello.py` command we would have issued, we run it through the shell
script: `pytongue.sh hello.py`

That is all. We have successfully written python in Hindi. For other languages
similar procedures follow. To sum up, we need

1. Mapping of the required language
2. Source code with first line as `# <LANGUAGE CODE>`
3. Run the program with `pytongue.sh <program>.py`

Have fun with the software and let those who may become great programmers
experience programming even without knowing the Latin script.

Wednesday, August 26, 2015

openjudge

I recently made openjudge available on PyPI. This post is about that library.

Programming contests are popular among engineering colleges in India. Even in other colleges, wherever there happens to be a Computer Science department, a programming contest is sure to happen at least once a year.

Attending various contests during my graduation, I fell in love with them. There was no influence, no underhand tricks. Your code spoke for you. In the University of Delhi, contests were happening left, right and centre. Over the course of two years I grew to like programming contests.

The cracks began to appear only once we hosted our own. Checking code was a pain. The accepted method was that contestants were asked to write code on paper for the preliminary rounds. They were simple programs and were designed to be easy to check. This was a huge bottleneck. The main rounds were held after about an hour of the preliminary rounds happening.

In the main rounds the institute provided machines and a compiler (sometimes an IDE) for an already declared language. That was a big handicap for us. We had to send teams proficient in multiple languages. This made life horrible and sometimes limited contests for us.

Then came codechef.com

This website had removed everything we had resented and gone on to produce the perfect mechanism for hosting programming contests. Using the codechef platform however required authority from the Principal and so on. I did not bother.

Being bitten by the 'roll-your-own' bug, I decided to build my own judge. Hence openjudge.

First I had to set up an interface. For that I learnt Django. Then came communication over sockets. After that the problem of multiprogramming and sub processes.

Lately I have been struggling to run these programs in a sandbox. It is an experience that has allowed me to grow with the project.
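The core of a judge like this is running a submission as a subprocess with a time limit and comparing its output. This is a minimal sketch using Python's `subprocess` module. It is not openjudge's code. The verdict names, the whitespace-insensitive comparison and the 2-second default are assumptions made for illustration. As the paragraph above says, a real judge also needs sandboxing, which this sketch does not provide.

```python
import subprocess
import sys

def judge(command, stdin_text, expected_output, time_limit=2.0):
    """Run one submission on one test case and return a verdict string."""
    try:
        result = subprocess.run(command, input=stdin_text, capture_output=True,
                                text=True, timeout=time_limit)
    except subprocess.TimeoutExpired:  # run() kills the child before raising
        return "Time Limit Exceeded"
    if result.returncode != 0:
        return "Runtime Error"
    # Ignore leading/trailing whitespace such as a final newline.
    if result.stdout.strip() == expected_output.strip():
        return "Accepted"
    return "Wrong Answer"

# A toy "submission" that reverses its input line.
reverser = [sys.executable, "-c", "print(input()[::-1])"]
print(judge(reverser, "abc\n", "cba"))  # Accepted
```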

The software can now handle ~200 participants at the same time. The format we now follow is:

Participants bring their own machines. If they don't, we provide one and they cannot complain.
No language bars. We support all major languages.
A single round of competition. No preliminary and finals. One big code fight
A leaderboard for everyone to know who is winning.

Here are a few screenshots.

Tuesday, June 2, 2015

Listening to paintings

Is beauty dependent on the medium which transmits it? Is a beautiful painting itself beautiful or is it the representation of something which is beautiful? These questions come to mind often when seeing "A thing of beauty".

If one assumes that beauty is in itself an independent thing then one can assume that beautiful things may be translated from one medium to another. Can my visually impaired friends one day see a beautiful painting and understand it?

Introducing audio sight: a piece of software I wrote to let people without sight experience paintings. A painting goes in one end and an mp3 file comes out the other. How cool is that? The idea was to translate beauty from one medium to another. In a minute we will see how that is done, but first let us have a look at some samples.

That was Vincent Van Gogh's Starry Night converted to sound. It made some sort of sense, although if you look at the things I had obtained in earlier versions of the software you would be very surprised.

Now for the method.

A simple enough task, given that the picture is already digitized. First we resize the picture to smaller dimensions in order to have some sort of melody going on.

After that we start taking one column at a time from the left side. Each pixel in the column dictates what note to play and how long to play it.

With that as we proceed from left to right, we obtain a melodious sounding audio file, which is dictated by the painting and not just something random.
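Here is a minimal sketch of the column-by-column idea, using only the standard library. The specifics are my own assumptions, not how audio sight works. The image is a small grid of grey values (0 to 255), and brighter pixels map to higher notes of a C major scale. Every note lasts a fixed 50 ms, and the output is a WAV file rather than an mp3. In this sketch the pixel only picks the pitch. Letting it also pick the duration, as described above, would be a small change to `image_to_samples`. Loading and resizing a real painting would need an imaging library such as Pillow.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 8000   # samples per second
NOTE_SECONDS = 0.05  # assumed fixed length of every note
# Assumed pitch table: one octave of C major, from C4 up to C5 (Hz).
SCALE_HZ = [261.63, 293.66, 329.63, 349.23, 392.00, 440.00, 493.88, 523.25]

def pixel_to_frequency(value):
    """Map a grey value 0-255 to a note: brighter pixels give higher notes."""
    return SCALE_HZ[value * (len(SCALE_HZ) - 1) // 255]

def image_to_samples(pixels):
    """pixels is a list of rows; read it column by column, left to right."""
    per_note = round(SAMPLE_RATE * NOTE_SECONDS)
    samples = []
    for col in range(len(pixels[0])):
        for row in range(len(pixels)):  # top to bottom within a column
            freq = pixel_to_frequency(pixels[row][col])
            samples.extend(
                math.sin(2 * math.pi * freq * t / SAMPLE_RATE) for t in range(per_note)
            )
    return samples

def write_wav(samples, fileobj):
    """Write mono 16-bit PCM at half volume to a file name or file object."""
    with wave.open(fileobj, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(SAMPLE_RATE)
        out.writeframes(b"".join(struct.pack("<h", int(s * 16383)) for s in samples))

# Hypothetical 2x3 "painting" of grey values.
tiny = [[0, 128, 255],
        [255, 0, 64]]
buffer = io.BytesIO()
write_wav(image_to_samples(tiny), buffer)
```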

Is beauty dependent on the medium?

The audio did not sound very melodious to me, even though I am no art fan, and so I conclude that the beauty of a painting lies with the beholder of the painting itself.

Thursday, May 21, 2015

Trai and their blunders

Once TRAI had published the list of emails it had received (read about it here), along with the addresses, I was very skeptical of what measures would be taken to correct this blunder and how effective they would be.

After a little time TRAI decided that, in order to discourage spam bots from using the list as a source, they must do something. A decision with its heart in the right place. Then came the blow to my intelligence.

The measure TRAI undertook was to replace "@" with "( at )" and "." with "(dot)" in every email address.
This was unexpected. If you have ever typed an address into GMail after performing the same replacements, you would notice that it hardly matters whether you use @ or (at).

Another thing of note: I expected it to be relatively easy to extract the emails from the website and compile a list of them. So I sat down with my friend's 2G Internet connection on Aircel and began to download the web pages containing the emails. There were 18 web pages of note which contained the emails.

With this in mind, I fired up Vim (a text editor) and began to type out a Python script to do the extraction for me. It was an easy enough job, and after letting it run for 192.55 seconds (I timed it) I had a list of 8,90,537 emails. Not quite the 1 million claimed, but substantially close.
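The extraction step can be sketched with Python's `re` module. This is a hypothetical reconstruction, not the original script. It undoes the "( at )" and "(dot)" replacements and then pulls out anything shaped like an email address. The address pattern is deliberately loose and would need tightening for real-world use.

```python
import re

def deobfuscate(text):
    """Undo the '( at )' and '(dot)' replacements, with or without inner spaces."""
    text = re.sub(r"\s*\(\s*at\s*\)\s*", "@", text, flags=re.IGNORECASE)
    return re.sub(r"\s*\(\s*dot\s*\)\s*", ".", text, flags=re.IGNORECASE)

# Deliberately loose address pattern: local part, '@', dotted domain.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def extract_emails(page_text):
    return EMAIL.findall(deobfuscate(page_text))

sample = "Sent by: john(dot)doe ( at ) example(dot)com, x( at )y(dot)in"
print(extract_emails(sample))  # ['john.doe@example.com', 'x@y.in']
```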

All in all, the efforts TRAI made to keep our data private were commendable, even though it only took a student with a slow Internet connection and a little knowledge of Python to extract the emails.

As expected my email was also within the ones found.

Thursday, May 14, 2015

Programming in multiple languages.

English is a language that has been the language of progress for a long time. A lot of the best things in the world have come forth from English speakers.

For people who do not natively speak English, programming is a thing which is subject to first learning the language of the world. For a lot of people that is a big step and not always such an easy one if their native language has a different structure along with a different script.

PyTongue to the rescue! This little piece of code makes writing programs in your native language a lot easier by providing transliteration services for Python. Sadly, not all of the Python ecosystem is supported, but on the bright side people can understand the code being written.

For example:-

# HI
छाप('नमस्ते दुनिया!')
के_लिए i में range(10):
    छाप(i)
जब सच:
    छाप('नमस्ते दुनिया!')

This is code written in Hindi, and it produces completely understandable results. The second-to-last line evaluates to `while True:` and so produces an infinite loop (there is no running from those, even in a perfect world).
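PyTongue's internals are not described in this post, but the idea of keyword transliteration can be sketched in a few lines. This is a hypothetical illustration, not PyTongue's code: the `HINDI_TO_PY` table is an assumption that covers only the words in the example above, and string literals are skipped so that 'नमस्ते दुनिया!' is left alone.

```python
import re

# Hypothetical keyword table (not PyTongue's actual mapping): Hindi word -> Python.
HINDI_TO_PY = {
    "छाप": "print",
    "के_लिए": "for",
    "में": "in",
    "जब": "while",
    "सच": "True",
}

# Group 1: a quoted string literal (kept as-is).
# Group 2: a run of word characters. \w alone may not match Devanagari vowel
# signs, so the Devanagari block U+0900-U+097F is listed explicitly.
TOKEN = re.compile(r"('[^']*'|\"[^\"]*\")|([\w\u0900-\u097F]+)")


def transliterate(source: str) -> str:
    """Replace known Hindi keywords with their Python equivalents."""
    def swap(match):
        if match.group(1):  # inside a string literal: leave untouched
            return match.group(1)
        word = match.group(2)
        return HINDI_TO_PY.get(word, word)

    return TOKEN.sub(swap, source)
```

The output is ordinary Python source, which could then be handed to `compile()` and `exec()`. Because only words are swapped, the word order stays English, which is exactly the limitation described below.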

As a proof of concept I tried out Arabic and Hindi programs, and they ran flawlessly on my computer. Although I have pretty much no idea what the Arabic program says, I am sure it makes sense.

Since I know Hindi, I know that the program makes sense, albeit in a broken manner. This should not matter, as the constructs of a programming language are made up of only a handful of expressions: the loops follow a specific format, and so do the conditional statements.

The downside is that since the software provides only transliteration, the generated programs still follow English sentence construction. The Hindi program makes no sense in Hindi grammar, yet it is still more readable (to a Hindi speaker) than a program in English.

Also, the programs need to be run, which means interacting with the terminal/OS in some manner. This itself is a point of failure, as the OS supports English most of the time (I know of no systems which have commands in other languages, so `ls` still remains `ls`). Sigh, some day I am going to make a language which is language independent. May that day dawn quickly. (I know about Lisp, so do not even start. :)

Wednesday, May 13, 2015

Corruption: India's best friend

Why I can say this.

After years of corruption scandals being unearthed one after another, I wanted to know whether India would ever be corruption free. Wishful thinking says it will and pessimism says it will not. I put my faith in neither of those impostors; I only believe what is backed by data. Since data is only available from the past, I had nothing to work with.

With that as the start of this explorer's trail, I began to look for methods which might help me satisfy my curiosity. Finally it hit me: I simply needed to simulate populations. With that idea and Python in hand, I began to write code. This article is about the code, the results and the idea.

--Update--
After running about 160 more simulations, I discovered that when policemen are paid at least twice as much as the bribe they receive, corruption does not spread to the entire society and stays below 60% in all cases.

--End of Update--

The method of evaluation

First we create the people who will populate the society we want to study, and give them some characteristics:

They are all born with "initial_money" number of coins.
They all have a value "stoicity" coming from the word stoic. This is a measure of how honest they are.

In case that got you thinking: we choose the "stoicity" values so that their histogram follows a Gaussian distribution.

In order to study behavior we must have behavior to study. Hence we add some more attributes to a person.

Any person may be "police".
Any person may be "criminal".
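The actual code is in the GitHub repository mentioned below and is not reproduced here. As a minimal sketch of the population described so far, assuming stoicity is drawn from a Gaussian and clamped to [0, 1], and assuming fixed fractions of police and criminals (every default value here is hypothetical):

```python
import random
from dataclasses import dataclass


@dataclass
class Person:
    money: float
    stoicity: float          # honesty: higher means less willing to bribe
    police: bool = False
    criminal: bool = False


def create_population(size, initial_money=100, mean=0.5, sd=0.15,
                      police_fraction=0.1, criminal_fraction=0.1, seed=None):
    """Create `size` people who all start with `initial_money` coins."""
    rng = random.Random(seed)
    people = []
    for _ in range(size):
        # Gaussian-distributed stoicity, clamped to [0, 1] (an assumption)
        stoicity = min(1.0, max(0.0, rng.gauss(mean, sd)))
        people.append(Person(
            money=initial_money,
            stoicity=stoicity,
            police=rng.random() < police_fraction,
            criminal=rng.random() < criminal_fraction,
        ))
    return people
```

Passing a `seed` makes a run reproducible, which helps when comparing simulations that differ in only one parameter.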

How does this society operate? Everyone goes about saying hi to everyone else (in round-robin tournament fashion). Whenever two people meet:

One of them is a policeman

We ask the other person if they want to bribe the policeman.
If they say yes, we ask the policeman if he accepts.

Both of them are policemen

We randomly pick one of them and ask if they want to bribe the other.
We then ask the other if they accept the bribe.

Neither of them is a policeman

We transfer a random amount of coins from one of them, chosen at random, to the other.
This is to provide a statistically even distribution of people earning money.
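The rules above leave open how a person actually decides to bribe or to accept. This sketch assumes they agree with probability 1 - stoicity, and that the random transfer between two non-police people is uniform between zero and the payer's whole purse. `Person`, `meet` and `round_robin` are hypothetical names, not the original code.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass
class Person:
    money: float
    stoicity: float
    police: bool = False


def willing(person, rng):
    # Assumption: agrees to bribe (or to accept) with probability 1 - stoicity
    return rng.random() > person.stoicity


def meet(a, b, bribe, rng):
    """Play out one encounter; return True if a bribe changed hands."""
    if a.police and b.police:
        # Both police: pick one at random to offer the bribe
        giver, taker = (a, b) if rng.random() < 0.5 else (b, a)
    elif a.police or b.police:
        # One policeman: the other person is the one asked to bribe
        taker, giver = (a, b) if a.police else (b, a)
    else:
        # Neither is police: move a random amount from one to the other
        payer, payee = (a, b) if rng.random() < 0.5 else (b, a)
        amount = rng.uniform(0, payer.money)
        payer.money -= amount
        payee.money += amount
        return False
    # Ask the giver first; only if they say yes, ask the taker
    if willing(giver, rng) and willing(taker, rng) and giver.money >= bribe:
        giver.money -= bribe
        taker.money += bribe
        return True
    return False


def round_robin(people, bribe=5, seed=None):
    """Everyone meets everyone else exactly once; return the number of bribes."""
    rng = random.Random(seed)
    return sum(meet(a, b, bribe, rng) for a, b in itertools.combinations(people, 2))
```

Every transfer moves coins from one person to another, so the total money in the society stays constant; only its distribution changes from round to round.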

With these things in mind, I ran some simulations, varying the parameters of interest, such as the size of the bribe and the punishment for criminals.

Source code

All the source code for this was written in Python 3.4.0 and is available in my GitHub repository.

Results of interest

All graphs show fractions of the population. Hence if policemen are shown at 0.8, it means 80% of the population is policing by behaviour.

During the first simulation run I found that despite some scenarios where criminals themselves died out, corruption itself never died out. The people who were accepting and giving bribes became policemen. Overall the policemen dominated the society.

Other simulations also gave similar results. (Most of the plots are available on my GitHub page.)

No matter how we reward policemen and how we punish crime, once more than 50% of the population is indulging in bribes, police or not, bribing quickly saturates in the population.

For those interested, here are some more graphs. (I will post more as the simulations complete.)

Wednesday, May 6, 2015

Computer Vision with Python and OpenCV

Of late I have been obsessed with computer vision. This is partly due to my ambition of creating my own butler, and partly due to the 3D scanner project. It led to a long and extensive study of the mathematics behind computer vision.

After some days of searching I discovered the git repository of OpenCV, a wonderful library full of interesting mathematical features. Since there was no simple pip install, as is the case with most non-trivial installations, I spent quite some time building and installing it.

Once it was installed I was at a complete loss, because all the documentation I could find was for C/C++. I could not find any for Python (partly because I was not using Google; I use DuckDuckGo). After a while I did find some documentation, and it was quite fun.

After about half an hour of understanding the math, I finally moved on to the code and began by getting the webcam feed to show up.

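import cv2

def get_live_feed(camera_index=0):
    # Open the webcam. The legacy `cv` module used originally
    # (cv.QueryFrame, cv.ShowImage, ...) was removed in OpenCV 3,
    # so this uses the equivalent cv2 calls.
    cam = cv2.VideoCapture(camera_index)
    cv2.namedWindow('live', cv2.WINDOW_NORMAL)
    # calibrate the camera:
    # discard the first frames while it adjusts for lighting
    for i in range(10):
        cam.read()
    # capture and show the feed until Esc (key code 27) is pressed
    while True:
        ok, img = cam.read()
        if ok:
            cv2.imshow('live', img)
        c = cv2.waitKey(10) & 0xFF
        if c == 27:
            break
    cam.release()
    cv2.destroyWindow('live')

if __name__ == '__main__':
    get_live_feed()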

With this I had a live feed working. Now came the part where I had to detect my face in the frames. With a few documentation snippets and code from here and there, I had the following.

import sys
import cv2

# Usage: python face_detect.py path/to/haarcascade_frontalface_default.xml
casc = sys.argv[1]
faceCascade = cv2.CascadeClassifier(casc)
if faceCascade.empty():
    sys.exit('Could not load cascade file: ' + casc)

video_capture = cv2.VideoCapture(0)

while True:
    # Capture frame-by-frame; stop if the camera returns nothing
    ret, frame = video_capture.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = faceCascade.detectMultiScale(
        gray,
        scaleFactor=1.1,
        minNeighbors=5,
        minSize=(30, 30),
        # cv2.cv.CV_HAAR_SCALE_IMAGE in OpenCV 2.x; cv2.cv is gone in OpenCV 3+
        flags=cv2.CASCADE_SCALE_IMAGE
    )

    # Draw a rectangle around the faces
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)
    # Display the resulting frame
    cv2.imshow('Video', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# When everything is done, release the capture
video_capture.release()
cv2.destroyAllWindows()

That led to the following video being created. The vision is still far off from what the Butler must see; I will probably teach it to recognize other objects, like keys. Also, after face detection comes the task of face recognition. Expect a post on that topic soon. The detection is a little off, but generally works fine.

Saturday, May 2, 2015

3D scanning with Blender and Python

For my physics project this year I arrived at the conclusion that I needed to make a 3D scanner. It was a gutsy move, since I knew that once I committed to this I could not buy one anywhere in the market and would have to build it myself. This burnt all the white flags I may have had and made sure I made my own project. Let me tell you, it was horrible. What I had in mind was something awesome; what I obtained was something organic. Graciously, Anurag helped me with this herculean task.

First was the problem of the laser. It was too damn expensive. I bought a laser diode from a friend and quickly burnt it out. What I had not realized was that the intensity of the laser depends on the current supplied, not on the voltage.

Then came the problem of making a line laser out of a point laser. The first method I stumbled upon was to place a rotating mirror in front of the point, which would sweep out a circle of laser light. This failed miserably, as the required RPM was not reached. Then I came upon the idea of using a cylindrical lens. A glass stirrer cut to size (not cut in prototype 1) was nimbly attached to the front of the laser pointer, and lo and behold, we had a laser line.

Now to tackle the problem of holding my mobile phone upright. This was indeed a messy one. After some time I gave up and replaced it with a friend's DSLR. Problem solved: the camera now sits perfectly on its own body.

The rotating mechanism was simplicity itself. A little DIY (or jugaad, for that matter) and we had a rotating pedestal.

What came next was the mathematics. After digging around a lot I still could not understand exactly how this thing was supposed to work. Then came a moment of truth, and everything was a walk in the park. Using Blender I managed to extract frames from the video, and ended up with about a thousand frames to work with.

Next came cleaning the frames. A simple blur, increased contrast and greyscale conversion gave me a very good image of the laser. Then we selected the brightest point in every row of the image and marked it as the laser line.
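The cleaning steps and the "brightest point per row" rule map quite directly onto array operations. Below is a sketch using only NumPy, so it is not tied to Blender or OpenCV. The luma weights, the weighted three-tap blur and the `threshold` parameter are assumptions, and `find_laser_line` is a hypothetical name.

```python
import numpy as np


def find_laser_line(rgb, threshold=0.0):
    """For each row of an H x W x 3 image, return the column of the brightest
    pixel, or -1 if the row has nothing brighter than `threshold`."""
    # Greyscale conversion using ITU-R BT.601 luma weights
    gray = rgb[..., :3].astype(float) @ np.array([0.299, 0.587, 0.114])
    # Light horizontal blur; a centre-weighted kernel keeps a one-pixel peak in place
    kernel = np.array([0.25, 0.5, 0.25])
    blurred = np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, gray
    )
    # Contrast stretch to the range [0, 1]
    lo, hi = blurred.min(), blurred.max()
    if hi > lo:
        stretched = (blurred - lo) / (hi - lo)
    else:
        stretched = np.zeros_like(blurred)
    # Brightest column in every row
    cols = stretched.argmax(axis=1)
    # Mark rows with no usable laser signal
    cols[stretched.max(axis=1) <= threshold] = -1
    return cols
```

Raising `threshold` is one way to reject dim reflections, at the cost of losing faint parts of the real laser line.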

Next came the reconstruction. With a simple Python script I managed to get the cylindrical coordinates of every point in the picture. Knowing that the object was rotated through 360 degrees, and assuming that the rate of rotation did not change much, I recreated the scene.
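The post does not spell out the geometry, so the following is a sketch under common assumptions for this kind of turntable scanner: the laser plane passes through the rotation axis, the camera views it at a known angle (`laser_angle_deg`, hypothetical), perspective is ignored, and the turntable turns through one full revolution at a constant rate over the frames. A laser pixel's horizontal offset from the axis gives the radius r, the frame index gives the angle θ, and the image row gives the height z; these cylindrical coordinates are then converted to Cartesian ones.

```python
import math


def frames_to_points(laser_cols, axis_col, laser_angle_deg=30.0, pixel_size=1.0):
    """Turn per-frame laser positions into (x, y, z) points.

    laser_cols: one list per frame, giving the laser column for each image
    row (-1 where no laser was found). axis_col: image column of the
    rotation axis. Both the triangulation and the parameters are assumptions.
    """
    n_frames = len(laser_cols)
    # math.sin expects radians, so convert the laser angle first
    sin_laser = math.sin(math.radians(laser_angle_deg))
    points = []
    for i, cols in enumerate(laser_cols):
        # One full turn spread evenly over all frames (constant rotation rate)
        theta = 2 * math.pi * i / n_frames
        for row, col in enumerate(cols):
            if col < 0:
                continue
            r = (col - axis_col) * pixel_size / sin_laser
            points.append((r * math.cos(theta), r * math.sin(theta), row * pixel_size))
    return points
```

With about a thousand frames, each frame accounts for roughly 0.36 degrees of rotation, so any drift in the turntable's speed shows up directly as a twist in the reconstructed object.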

A collection of points was created and saved as a scene. This was then put into Blender to create a 3D representation of the object, which was very, very wrong. What had happened was that the glow of the laser had been reflected, creating data points where there should have been none. The result was a scan full of errors.
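The post does not say which file format the points were saved in. One simple option, assumed here, is an ASCII PLY file, a plain-text point and mesh format that Blender can import. `ply_text` and `write_ply` are hypothetical helper names.

```python
def ply_text(points):
    """Build an ASCII PLY document for a list of (x, y, z) points."""
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "end_header",
    ]
    body = [f"{x} {y} {z}" for x, y, z in points]
    return "\n".join(header + body) + "\n"


def write_ply(points, path):
    """Save the point cloud to `path` for import into Blender."""
    with open(path, "w") as f:
        f.write(ply_text(points))
```

Keeping the text generation separate from the file writing makes it easy to inspect or test the output without touching the disk.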

I later realized that I was calculating angles in degrees, while Python's math functions (as in any good program) use radians. Recalculating the slices led to a new plot which fared a lot better than the previous ones. A lot of the reconstruction was noise, but I could make out the nose and ears of the Buddha statue. It was a magical moment.
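The degrees-versus-radians slip is easy to reproduce: Python's `math.cos` and `math.sin` expect radians, so a value in degrees has to pass through `math.radians` first.

```python
import math

angle_deg = 90

# Wrong: 90 is silently treated as 90 radians
wrong = math.cos(angle_deg)

# Right: convert degrees to radians first
right = math.cos(math.radians(angle_deg))

print(f"cos(90) taken as radians: {wrong:.3f}")
print(f"cos(90 degrees):          {right:.3f}")
```

No error is raised in the wrong case, which is why the mistake only shows up as a scrambled point cloud.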

Finally, Anurag scanned another object, an emergency flashlight. The results were good and funny at the same time. The flashlight had a small volume, so the point cloud was very dense. The strap attached to the flashlight was also scanned, and it was pleasing to note that the reconstruction included the strap too. Due to the dense point cloud, it was not easy to make out the rest of the flashlight's geometry.

To see the geometry we moved the point of view inside the flashlight, and could then see the objects clearly.

The next problem to be tackled was mesh regeneration from the point cloud. Our cloud had non-uniform density, which ruled out some algorithms. The Ball Pivoting Algorithm and Poisson Surface Reconstruction are what caught my eye. I will be writing about them soon. All the source code is available on my GitHub page.

Tuesday, April 14, 2015

Python. Savvy?

In 2010 my friend Aditya Duggal called me up and said, "Have you seen this yet? It's called Node.js." Aditya calling me up was rare, so I began to look into Node.js. As it turned out, I was not a good enough programmer to have a go at Node.js just yet, but the underlying word, Python, caught my eye.

What language was this, with a snake for its symbol? What sorcery had made Aditya love it? From a man who worshipped C to a man who liked Python, Aditya had changed. I wanted to know why. To be truthful, the hype got me hooked: what was this language that seemed to pop up everywhere I looked? I have not looked back since. More than the language, I believe it is the Python community which fuels its growth. There is nothing more addictive than having people who listen to you with rapt attention and solve all your problems. When the time comes to give back, you no longer have to; you want to. Quickly learning the syntax and loving its fresh perspective on writing code, I began to love Python. Slowly my side projects started to shift from C to Python.

The next stepping stone was the Cheese Shop (PyPI). It became clear that code in Python, unless carefully written, grew large. With the ability to write code speedily comes the ability to write horrible, demonic code. I first came across the Cheese Shop when I was looking for ways to graph in Python: instead of writing my own, I found what I needed there. From then on I have rarely had to write my own code for anything, apart from the odd project which needs a lot of custom code. With the Cheese Shop at my disposal, work became easy.

There was a period of learning which accelerated my admiration for, and knowledge of, the open source movement. This was when I discovered Internet Relay Chat. I was watching The Fifth Estate and became curious: what were these people chatting on? It was most certainly not Facebook, and it looked cool. So I began to hunt around, and found IRC. The very next minute I was configuring Irssi and learning how Irssi works and how to chat on IRC. Initially it was confusing, but slowly it all came together. It was beautiful: people I did not know, and people who were famous developers. I could now talk to them.

Believe it or not, the next big thing was GitHub. Strange as it seems, I had not heard of version control at all before Git. GitHub was a boon. Code was easier to experiment with and people were easier to collaborate with. I no longer had to worry about what would happen if my computer decided to crash and burn. Up until then I was simply a leech on the goodwill of the community; with GitHub I could contribute and help others.

Since GitHub I have not looked back. Python has now become my primary development language. There is literally nothing I have not done with Python. To anyone new to programming I recommend Python for its relatively gentle learning curve. When one can write anything one wants in a language, there comes freedom of expression, and with freedom of expression comes the ability to create. With Python I have had that ability. I can see things happening before I have written them.

Most of Python's diamonds are in the documentation. If you are beginning to learn Python and want a starting point, I would suggest the documentation. The docs are what survive after nights of distilling information and processes.

Enjoy Python while you can. Who knows how long this ecosystem will remain this good; we seem to have destroyed every other one. A great place to learn is the IRC channels of Python and its various libraries. I usually hang out in #python and #django. Those are the ones I like, with lots of good discussions going on. Since it is a chat, it is more dynamic than the docs.

For a young and upcoming Python developer, the three places to learn are GitHub, the docs and IRC.