Wednesday, February 22, 2017

When Needed

Hear me slip these words,
into the folds of your mind.

For you in years hence,
like money in trousers, to find.

Calm a panicked heart,
while the winds howl around.

With these words now forgotten,
in tempest, strength bound.

Friday, February 17, 2017

Vehicles in Rajasthan

I was perusing once more and my thoughts turned back to my home state of Rajasthan. A quick search revealed that the site does indeed host data on Rajasthan (see this).

We take up the motor vehicle registration data provided on the site and immediately discover that, like all the data on the site, this dataset is not prepared for consumption. We shall have to perform some data cleaning.

A few Excel copy-pastes later, we have a single file which I've put up on my Google Drive here.

With that, I loaded the file into a Jupyter notebook and got this neat plot as the end product of some nifty manipulations. The entire notebook is available here (I'll upload this soon).

First, let's take a look at the new registrations for every vehicle type over the years.

Everyone seems to be buying two-wheelers in Rajasthan! Cars and tractors are the next bigwigs, but they stand nowhere close to where two-wheelers are.

Irrespective of vehicle class, there is growth in motor vehicles in Rajasthan, as can be seen in this point plot. We can see that the explosion in vehicle numbers comes from within the non-transport class of vehicles.
We can see that it's just one outlier which has caused this boost! What is that? We know that it has always been high in registrations. It's probably the motorcycles.

It's curious that the total shoots up so quickly in 2012, even though growth is increasing in general. Let's see if the data has mistakes in it. We plot the total counts reported over the years and then we plot the cumulative sum of the new registrations.
That's strange.

New registrations for a year should contribute to the next year's total. Since we don't have data before 2005, we should expect the difference between reported numbers and expected numbers to be constant.
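The check described above can be sketched in a few lines of Python. This is only a minimal illustration, not the notebook's code: the function name `baseline_offsets` and every number below are made up, and it assumes each reported total already includes that year's new registrations (if totals are as of the start of the year, shift the series by one).

```python
from itertools import accumulate


def baseline_offsets(reported_totals, new_registrations):
    """Reported total minus the running sum of new registrations, per year.

    If the two series are consistent, every entry equals the (unknown)
    number of vehicles registered before the first year in the data.
    """
    cumulative = accumulate(new_registrations)
    return [total - cum for total, cum in zip(reported_totals, cumulative)]


# Made-up illustrative numbers, NOT the Rajasthan data.
new_regs = [100, 120, 150, 200]
consistent_totals = [1100, 1220, 1370, 1570]    # 1000 vehicles pre-date the data
inconsistent_totals = [1100, 1220, 1500, 1570]  # third year does not add up

print(baseline_offsets(consistent_totals, new_regs))
print(baseline_offsets(inconsistent_totals, new_regs))
```

A flat list means the reported totals and the new registrations agree; any jump marks a year where they do not.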

That difference changes. It changes abruptly in 2012, going from massively underreported to overreported. An effect of the Aadhaar card, wherein people suddenly started reporting unregistered vehicles? Perhaps vehicles came in from other states; I don't know if that is counted as a new registration or not. It's either that or our government is bad at reporting or collecting statistics.

Thursday, February 16, 2017

For you

Moving mountains, parting seas,
Deeds that lovers please.
Wooed love, adamantine.
To be loved beyond dying.

That love, is for the strong,
Like life, lopsided; mostly wrong.
Fighting evils within the self,
This love I feel is a saving grace.

I won't kill dragons to win your affection,
Nor shout to the world my mind.
I'll hand to you, with shaking knees,
my being and a knife.

Wednesday, February 15, 2017

Attendance in St. Stephen's College

I've been hunting around for datasets I can relate to for some time now. This fine evening I arrived at the conclusion that I could simply use attendance data from the website of St. Stephen's College.

The complete analysis notebook is available as a gist (here's the gist).

Collecting the data was a matter of inspecting how the website was obtaining its data. It seems a company has been handling our attendance since the beginning (GreenClouds site link). As far as I can see, they've made a mess of the data transfer. Perhaps the problem originated in the college itself, or perhaps it's due to mistakes at the company level. Whatever the cause, the data format is a mess.

I open up my console and spin up a Python script to get the data from the college website. It takes about two minutes on my Internet connection to collect all the information.

We then proceed to put it in a nice tabular form with columns being Name, percent LA, percent TA, percent PA, admission_year, course.

Things are now ready for graphing.


We go on to make box plots. If you were to line up people in order of their attendance, you would get a box plot. The person in the middle of the queue is the line cutting the box in the middle: there are an equal number of people on either side of them.

You cut each part of the line in half again and you get the bounds of the box.
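The same "line people up and cut the line" idea can be sketched with Python's standard `statistics` module. The attendance figures here are hypothetical and only serve to show where the three cuts fall; plotting libraries may use a slightly different interpolation rule for the quartiles.

```python
import statistics

# Hypothetical attendance percentages, NOT the college data.
attendance = [35, 60, 72, 75, 80, 82, 85, 88, 90, 95, 98]

# Three cut points split the sorted line into four parts:
# lower edge of the box, the middle line, upper edge of the box.
q1, median, q3 = statistics.quantiles(attendance, n=4, method="inclusive")

below = sum(1 for a in attendance if a < median)
above = sum(1 for a in attendance if a > median)

print("box runs from", q1, "to", q3, "with the middle line at", median)
print(below, "people below the middle,", above, "above")
```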
This one is a gem. What is up with Chem? The attendance at their lectures is absolutely ridiculous. While everyone else in a course has more or less the same attendance, Chem people are all over the attendance spectrum! Then again, their tutorials box plot is neat too!

Yes. I see it too. Math people are consistently in class with a few people as outliers. Nobody else has this kind of spread. 

Perhaps a better way of seeing this would be violin plots, where you get to see the density of the people instead of simple boxes. The thickness of the violin denotes how many people are at that point.

For example, we can see that a lot of the math people are high-attendance junkies and a substantial number of CHE, PHY, PCH, and PCS students are low-attendance lovers.
There! The average PHI student absolutely does not give a hoot about attendance! The PCH and PCS programs do every class together, as is apparent from their spreads. ENG and HST do compete with PHI, but PHI does take the cake.

Let's move on to the simplest of all graphs: the histogram. No explanation needed here.

Looking at the histograms, we can see that almost everyone attends class, with the exception of a few people who absolutely don't come to class. Perhaps these are people who left the course and the administration just did not remove them from the rolls? What about experience?

We do a regular point plot to see change in attendance over months. The vertical lines along the points denote a 95% confidence interval.
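For readers wondering where such an interval comes from, one common way to get it is a bootstrap: resample the group many times and see how much the mean moves. The sketch below assumes that approach (the plot above may have been computed differently); `bootstrap_ci` and the sample values are hypothetical.

```python
import random
import statistics


def bootstrap_ci(values, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_boot)
    )
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return means[lo_idx], means[hi_idx]


# Hypothetical attendance percentages for one year group in one month.
sample = [62, 70, 71, 75, 78, 80, 81, 85, 88, 93]
low, high = bootstrap_ci(sample)
print("mean:", statistics.fmean(sample), "95% interval:", (low, high))
```

A small group or a widely spread one gives a longer vertical line; a large, tightly clustered group gives a short one.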

The third years have found that perfect balance of leisurely attendance during most of the semester and then picking it up during the last months. First years exhibit the trait of a "नया नया मौलवी" (a freshly minted cleric, overzealous in observance) perfectly. Second years have figured out that attendance does not really matter, but are yet to find out how badly it can affect their marks. With time comes wisdom.

That's it for now. I'd love to do some more if someone can come up with data at a finer granularity.