Tuesday, December 26, 2017

Moving the Blog

This blog has been moved permanently to arjoonn.com/blog

The existing articles will remain here for archival purposes. New articles will appear at the above link.

Monday, December 4, 2017

Systematic Poverty in Education

A while back I found in my hands a list of prices that CBSE books sell for. A curious little dataset, for what could you answer using this information? Well, a simple question one might ask is how expensive is education in the various classes of CBSE? One gets this chart from the data.

Almost all books are cheaper than 250 rupees. This chart, however, says nothing about the number of books needed in each class to clear the exams set by CBSE. I remember having to purchase 4-5 books. What about the sum of prices in each class?
As expected, classes 11 and 12 branch out into multiple streams and so have a significantly larger number of books, which when multiplied by their prices puts the entire set out of reach for a lot of people.
A man at the poverty line, reported to be INR 32 a day (IndiaToday), does not earn enough over a year to be able to afford all the books in class 11 and 12.

Besides school fees / clothing / rent this expense is critical if people must have books to learn. Even if you allow for the segregation of books equally into the main streams of learning (science, humanities, commerce) you are still left with an expense of ~ INR 5000. This is no small amount for someone earning INR 32 a day. Even if they saved every rupee, it would take more than five months (5000 / 32 ≈ 156 days) to accumulate this amount.

It would seem that a system which makes access to education difficult in proportion to the need for education will only breed ignorant citizens. Although the CBSE has done tremendous work in making the books this cheap, there is still a way to go.

If you want to see how these conclusions were reached you can take a look at the code on GitHub.

Tuesday, November 28, 2017

Building on Sand: Software without theoretical backing

I've been working in a corporate setting for seven months now. To be clear, it's a different setting from what I was used to in college and elsewhere, and the difference between the two is stark. This post is a discussion of what I found those differences to be and why they are important.

First off, in academia writing code is usually the last step in building anything. The way we usually went about it was to first define the inputs and outputs to the system. With that in place we defined the constraints and expectations from the system. Perhaps something needs to run in under a second, or perhaps we require something to run with a RAM cap of 500 MiB and so on. The constraints are the harshest we can set without inching onto the territory of premature optimization. We cannot allow ourselves the luxury of powerful hardware since most of academia is penny rich. Academia thus suffers at times because of people's inability to define the problem in academic terms.

In industry it seems to be different. The output is usually defined in simple terms without the formalism of academia. This lets it change as per customer need. The downside is that it also allows people to abuse this ability. With a hazy target you can get away with changing your goal to state that your system is working. Since the industry is usually richer, it allows sub-par algorithms to be used in production, letting the hardware pick up the slack. 32 / 64 GiB RAM systems are commonplace, especially since bad algorithms are everywhere and RAM is cheap. Inputs to the system are usually defined well since without that code cannot be put down.

The major difference is the hurry to put down things in code. This is the part which begets the title of this post. Algorithms and data structures are the bread and butter of any software company. Writing code without being clear on the algorithm is a bad practice. Of course one need not write algorithms on paper before writing the code but one must always think of them before writing the code. An important quality arises from this kind of thinking. You are able to clearly argue about the structures that your code really needs and about the ins and outs of the system. Once you begin to write code, the algorithm gets lost in the implementation.

Often times in industrial settings one hears the term "academic" with a condescending tone behind it. It is rarely the case that the word is associated with reverence. I find that to be disturbing. Those who prize practical knowledge above theoretical knowledge must at least be able to boast of being as good as the "academics" in implementing things. It is after all "practical" knowledge that they seek. In all my time writing software so far, I am yet to see someone who writes beautiful code without knowing the theory behind it. Theory is that tool which lets you kill problems before they arise instead of waiting for them to arise in practice and then dealing with them. Theory is what distinguishes a normal system from a civilization level system like Google search. I won't talk more on this as more accomplished minds have spoken on this topic.

Another itch I have to scratch with industries is their inability to distinguish features from distractions. With the ability to create software comes the enormous responsibility of deciding what to create. When you don't consciously question the objective of your creation you run the risk of creating useless things. Since your time is a limited resource you have successfully spent time on things you will not use ultimately.

Building software is to be treated with as much care as building rockets. Simply because it is cheaper for you to build it does not mean you can adopt evolution as your software development strategy. When you build on sand, don't expect to build castles.

Tuesday, October 10, 2017

Topics Discussed in the Indian Parliament

A dataset was released on Kaggle regarding the questions and answers discussed in the Rajya Sabha (dataset link). This obviously led to some interesting questions being raised. For example, have we started to ask different questions over the years? Have our priorities changed?

The complete table can be found at the Kaggle Notebook along with its code.

Friday, October 6, 2017

YouTube with only Audio

The bliss of working with beautiful music is too good to be missed. However! Whilst under the mercies of a slow internet connection (perhaps a precious office resource) one tends to sacrifice the heavenly nectar.

Not on my watch!

A few minutes of ear shattering silence led me to find youtube-audio. This beautiful plugin allows you to only stream audio from YouTube turning the once bandwidth hungry site into a lightweight music streaming service.

Saturday, September 30, 2017

Evolving Neural Network Weights

Neural networks offer quite a large repertoire of tasks that they can accomplish well. With the increasingly smart solutions people come up with to squeeze out performance with smaller networks (SqueezeNet, ReLU to name a few) it's likely that neural networks will dominate the machine learning scene for some time to come.

There is one limitation however. Current implementations of learning mechanisms ([1], [2]) are mostly gradient based. This limits the loss function to being at least differentiable, if not twice differentiable in some cases. There are alternatives like reinforcement learning and genetic search but the community has not invested in those as heavily as in the gradient based methods.

This post explores genetic search as a method for finding the appropriate weights for neural network architectures. We illustrate the basic principles and follow along with some basic code.

Let's look at the XOR problem, one of the classic problems that Minsky famously pointed to in the Perceptrons book[3]. The problem is simply to compute the XOR function given two binary inputs. While a single perceptron is unable to do this, a multi layer perceptron is able to perform this computation.

We use a simple 3 layer neural network and sigmoid activation as done in the classic days of neural networks. Instead of using the wonderful powers of Theano, tensorflow or torch to perform automatic differentiation in order to use gradient based methods, we are able to drop those architectures from our program entirely.

We use numpy to make the matrix operations a little easier to read. The first part of the code imports the relevant libraries and defines the sigmoid function to be used later on.
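As a minimal sketch of what that first part might look like (this is not the code from the linked repository, and the function name is an assumption), the sigmoid can be written with numpy so that it works element-wise on whole arrays:

```python
import numpy as np


def sigmoid(x):
    """Logistic function 1 / (1 + e^-x), applied element-wise to arrays."""
    return 1.0 / (1.0 + np.exp(-x))
```

The output always lies between 0 and 1, which is what lets the final layer be read as a score for the "1" class.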

We then define a roulette function which selects elements from a given array with a probability proportional to the magnitude of elements in the array. This is present in this part of the code.
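A roulette selection of this kind could be sketched as below. The signature, the choice to return indices rather than the elements themselves, and the uniform fallback for an all-zero array are assumptions made for illustration, not details taken from the original code.

```python
import numpy as np


def roulette(values, k=1, rng=None):
    """Pick k indices with probability proportional to |value|."""
    rng = np.random.default_rng() if rng is None else rng
    weights = np.abs(np.asarray(values, dtype=float))
    total = weights.sum()
    if total == 0:
        # Every magnitude is zero: fall back to a uniform choice.
        probs = np.full(len(weights), 1.0 / len(weights))
    else:
        probs = weights / total
    return rng.choice(len(weights), size=k, p=probs)
```

Called on a list of fitness values, this favours fitter genes while still giving weaker ones a chance to be picked.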

The next bit defines the forward calculation of the network given its weights and input data. Besides that it also defines a function which calculates the "score" of a given output from the network and the output expected. Here we ask the genetic search to maximize the ROC AUC score.
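A rough sketch of these two pieces follows. It assumes the weights are a list of matrices with one extra row per layer for a bias term, and it computes ROC AUC by hand as the fraction of (positive, negative) pairs ranked correctly instead of relying on a library call. The names `forward` and `roc_auc` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def forward(weights, X):
    """Push inputs through each layer; a column of ones acts as the bias."""
    a = np.asarray(X, dtype=float)
    for W in weights:
        a = np.hstack([a, np.ones((a.shape[0], 1))])
        a = sigmoid(a @ W)
    return a


def roc_auc(y_true, y_score):
    """Fraction of positive/negative pairs ranked correctly (ties count half)."""
    y_true = np.asarray(y_true).ravel()
    y_score = np.asarray(y_score, dtype=float).ravel()
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size
```

Because the score only depends on how the outputs are ranked, it is not differentiable, which is exactly the kind of objective a gradient based method cannot use directly.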

The get fitness function is another building block which makes use of the functions we have already defined. It takes in a network configuration and returns the "fitness" associated with this configuration. In genetic search terms this network configuration / network weight list is known as the "gene".

To employ genetic search in its classic form we write two functions which emulate gene crossover and mutation. This is the bread and butter of the genetic search algorithm which allows us to move around in the search space.
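One common way to write these two operators is sketched here, assuming the gene is a flat numpy array of weights. Single-point crossover and Gaussian mutation are choices made for this illustration; the original code may use different variants, and the `rate` and `scale` defaults are arbitrary.

```python
import numpy as np


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.integers(1, len(parent_a))  # cut strictly inside the gene
    return np.concatenate([parent_a[:point], parent_b[point:]])


def mutate(gene, rng, rate=0.1, scale=0.5):
    """Add Gaussian noise to a randomly chosen subset of the weights."""
    mask = rng.random(gene.shape) < rate
    return gene + mask * rng.normal(0.0, scale, size=gene.shape)
```

Crossover recombines what two parents already know, while mutation is the only source of genuinely new weight values.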

The last of our function definitions defines the "main" function which evolves network weights to fit on a certain data set that we provide. This data set is the XOR data set that we manufacture using numpy in the later sections of the code.

All that is left is to set the various configuration variables for the genetic search and to call the evolution function with these variables. We see that genetic search very quickly finds the "correct" weights for this neural network. It does get lost sometimes for very long periods of time but that is simply because we have implemented a vanilla version of genetic search.
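Putting the pieces together, a compact version of the whole loop might look like the sketch below. It is an illustration under stated assumptions rather than the repository's code: a 2-4-1 network with bias rows, the four-row XOR truth table as data, a hand-rolled ROC AUC, roulette selection on fitness, and one elite gene carried over unchanged each generation. All names and parameter values are hypothetical.

```python
import numpy as np

SHAPES = [(3, 4), (5, 1)]  # assumed 2-4-1 network, +1 row per layer for bias
GENE_LEN = sum(rows * cols for rows, cols in SHAPES)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def predict(gene, X):
    """Unpack the flat gene into weight matrices and run the network."""
    a, start = X, 0
    for rows, cols in SHAPES:
        W = gene[start:start + rows * cols].reshape(rows, cols)
        start += rows * cols
        a = sigmoid(np.hstack([a, np.ones((len(a), 1))]) @ W)
    return a.ravel()


def get_fitness(gene, X, y):
    """ROC AUC of the network's outputs: share of pairs ranked correctly."""
    score = predict(gene, X)
    diff = score[y == 1][:, None] - score[y == 0][None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size


def evolve(X, y, pop_size=50, generations=200, seed=0):
    rng = np.random.default_rng(seed)
    population = rng.normal(0.0, 1.0, size=(pop_size, GENE_LEN))
    history = []
    for _ in range(generations):
        fitness = np.array([get_fitness(g, X, y) for g in population])
        best = population[fitness.argmax()].copy()
        history.append(fitness.max())
        if history[-1] == 1.0:
            break  # every positive is ranked above every negative
        probs = fitness / fitness.sum() if fitness.sum() > 0 else None
        children = [best]  # elitism: the best gene survives unchanged
        while len(children) < pop_size:
            i, j = rng.choice(pop_size, size=2, p=probs)
            point = rng.integers(1, GENE_LEN)
            child = np.concatenate([population[i][:point], population[j][point:]])
            mask = rng.random(GENE_LEN) < 0.1
            children.append(child + mask * rng.normal(0.0, 0.5, GENE_LEN))
        population = np.array(children)
    return best, history


X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0, 1, 1, 0])
best, history = evolve(X, y, generations=50)
```

Keeping the elite gene means the best fitness never drops from one generation to the next, which makes the occasional long plateaus easy to spot in `history`.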

The entire code is available at https://github.com/theSage21/evonet.

Friday, September 22, 2017

BookReview: Sialkot Saga

This book caught my attention on the railway station book stall as I almost boarded my train. That particular shade of red always draws my eye, as it had previously with Bhim. Particularly attractive was the thickness of the book which promised to fill the dull hours of the train journey with more interesting scenery.

The work itself was an easy read, the language simple and natural, although sometimes I wondered if the characters in the book would actually speak in those terms. Indians have their own slang which could have been more faithfully rendered.

The story itself is engrossing, as all stories filled with crime are. Being illiterate in the current affairs of our country I struggled to put together some of the references of the crimes. Some were fresh in my memory and needed no dab of paint to spring forth, new again from the folds of my memory.

The book felt all too familiar to my world. I could have written it. I would have written it in the exact same way too, since I knew nothing of the streets of Mumbai or Calcutta for that matter. All I've known is what I've read and most of my reading has been in English and by foreign authors.

Towards the end the book took on a fantastic spin of science fiction and hand waved away a lot of the things as being 'quantum' in nature. I must say that good science fiction resides in pockets where science does not know the answer yet. If instead a book tries to claim fiction in a field science does know, it is merely fiction. No grudges held however, seeing as how the book was indeed called fiction.

Reading this book sparked off a strange thought. Books which reinterpret Indian mythology / history and leverage the idea of 'lost science' have become all too common since Amish wrote his Meluha. There's been a spate of these books, all reinterpreting classics with more 'lost knowledge' and better explanations. It reminds me of Hussain Haidry's poem this independence day. The authors too have picked up the saffron fever.

Thursday, April 20, 2017

Meenamkulam Beach

I walked to the beach today. Quite a lovely experience. Here's the path I took.

I had begun by stitching a phone holder onto my bag but that turned out to be an exercise in futility since the phone was touchscreen and so kept getting "touched" by the bag/cover leading to mayhem.

Perhaps I'll use my Raspberry Pi to document the journey next time. It would certainly be a lot easier.

I started off during the evening. Maps gave the estimated time as 1.5 Hours and so taking into account my walking speed and the various elevations in the land, I had estimated 2-2.5 Hours to get there.

At first I was a little unsure of where I was going since the route looked quite different from the Google Maps route. Those nearly straight lines? They are not nearly as straight as they seem. There are also elevations and flyovers adding to the confusion of an inexperienced walker.

During the last legs of the journey I did find confirmation that I was on the right track by way of seeing the water over the horizon of the road.

I might try going to St. Andrews beach next time, though I am told that the people there are not friendly to the passing admirer.

Wednesday, April 19, 2017

Conversations With Fire

I wrote to you and You,
wrote back to me.

Arose a tide of overnight love;
three years in the making,
Bewildered, scared; unprepared,
I sat with my heart pounding, aching.

All this while seeing things as I do,
Not seeing, the sight You had too.
I approach to bask in Dragon flame,
the inferno itself, knew me only by name.

Holding on to, what of You;
rebuilt from memory, I now remember.
Unnoticed; until I do,
my every act seeks You,
like ash seek'th ember.

Zeus's spark, when I approached,
For evermore to stay near the blaze,
My blizzard bitten bones to blame,
the blessed ignited, broke the gaze.

So I'll stay beyond your silent fence,
Throw a rock your way, now and then.
in hope that you'll find the ore;
sometime years and decades hence.

When you do; look beyond your moat.
You'll find a man, still strung to the name,
Held by death, or still drawing breath.
warm from; to your flare, his claim.

A claim he lays, now steeped in longing and desire,
to conversations he had once held with Fire.

Thursday, March 16, 2017

When You Are the Problem

When you begin to hear what the people around you are saying and perhaps begin to arrive at the conclusion that maybe what you are doing is not right and that maybe those who taught you everything in life are wrong. Very wrong about quite a few things.

I am the problem. It's a devastating blow when you arrive at this conclusion on your own. The rot in society that I have wholeheartedly cursed is in me.

It's a strange position to be in. You begin to look back and try to look for the source of this behavior as if trying to blame someone/something else for it. The first stage is always disbelief. I refused to accept that I was wrong on a moral level.

Slowly as you notice the evidence of your nature, you begin to lament since you have become the demon you condemned such a little while back with so much empty venom. I like to compare life then to that of Ravan. All of your thoughts are suddenly arriving from heads that you would not like to call your own. In situations the first thoughts you have are those you need to condemn. It is hell, or as close to that as can be. Knowing that you are the problem.

Over time though it eases off. You begin to notice that you no longer do the things you used to do. You now have an understanding of the nature of your thoughts. That you can and probably will be wrong in the future. You catch yourself asking again and again "Who am I to say so?". I was inevitably drawn to science due to this. There was a safety in knowing how much it is that you don't know and how sure you can be of certain things. Yet you demolish your own thoughts in the light of this finding. You begin to rely on surer things like logic.
You begin to notice the true nature of your memories. They are fickle shape shifters constantly changing to appease the current you to such an extent that you can only be sure of the facts you can wring out of them and even those needing confirmation. Some suddenly become hateful remembrances of instances. Most of them contain you and oh you have not been the saint you thought you were. The only solace you have is in the fact that you have the will to change.

You begin to be subject to Reverse Culture Shock. You begin to realize that potentially, there are only a few places where your demon could have been birthed. You can name each place. It is too late however. You witness the young being fed this poison and do nothing because you are afraid.

You are still not strong enough to refute this beast which until so recently and perhaps even now, resides in your own mind. You are weak and you lash yourself for this weakness, knowing that your silence is all it takes for this ugly existence to grow. Yet you keep the silence because you now know that it was this squall which secured your position in society and that years of living with this in your mind has made you so terribly frail that you would not survive its cleaving from yourself.

You hate yourself for your actions. You repent your deeds. Perhaps if change smiles upon you, you get to ask for forgiveness. You spend your days knowing that your actions have supported a system which is wrong. The German citizen lending support to Hitler no longer seems incredible to you. On some level, you understand how it could have happened and you pray that it did not happen that way because it would show you a sight which would scorch your soul.

You know things. You know, and you can never forget them. That is the curse of knowledge, for if ignorance is bliss then is not knowledge pain? They are constantly there at the fore of your mind, coloring your experiences and marking the schools of thought in people. Turning otherwise happy moments into objects of deep sadness and rage. You begin to generalize and then quickly correct yourself. You begin to find it difficult to love and hate people. People are no longer so simple. You surprise yourself at not being angry at things which would have enraged your earlier self. You begin to question views of your own which are too absolute to be true. You begin to realize that objectivity is after all, manufactured. You begin to take philosophy seriously and wonder why you took so long.

A line from a beautiful song comes to mind. Perhaps only when you are the problem, it is the easiest to fix. Finding out that you are the problem is hard; and painful. No wonder we live in a world difficult to change. I'll leave you with a line from a song that was introduced to me not so long ago.
"Black is white the other way, paradise is in the gray"

Wednesday, February 22, 2017

When Needed

Hear me slip these words,
into the folds of your mind.

For you in years hence,
like money in trousers, to find.

Calm a panicked heart,
while the winds howl around.

With these words now forgotten,
in tempest, strength bound.

Friday, February 17, 2017

Vehicles in Rajasthan

I was perusing https://data.gov.in once more and my thoughts turned back to my home state of Rajasthan. A quick search revealed that the site does indeed host data on Rajasthan (see this).

We take up the motor vehicle registration data provided on the site and immediately discover that like all the data on the site, this dataset is not prepared for consumption. We shall have to perform some data cleaning.

A few Excel copy pastes later we have a single file which I've put up on my Google Drive here.

With that I loaded this stuff into a Jupyter notebook and got this neat plot as the end product of some nifty manipulations. The entire notebook is available here (I'll upload this soon).

First let's take a look at the New Registrations in every vehicle Type over the years.

Everyone seems to be buying two wheelers in Rajasthan! Cars and tractors are the next bigwigs but they stand nowhere close to where two wheelers are.

Irrespective of vehicle class, there is growth in motor vehicles in Rajasthan as can be seen in this point plot. We can see that what has caused the explosion in vehicle numbers is within the non-transport class of vehicles.
We can see that it's just one outlier which has caused this boost! What is that? We know that it has always been high in registrations. It's probably the motorcycles. It's curious that the total shoots up so quickly in 2012 but the growth is generally increasing. Let's see if the data has mistakes in it. We plot the total counts reported over the years and then we plot the cumulative sum of the new registrations.
That's strange.

New registrations for a year should contribute to the next year's total. Since we don't have data before 2005, we should expect the difference between reported numbers and expected numbers to be constant.
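The consistency check described above can be sketched in a few lines. The numbers here are made up purely for illustration and are NOT the data.gov.in figures; the variable names are also assumptions.

```python
from itertools import accumulate

# Made-up illustrative numbers, not the real Rajasthan data.
new_registrations = [100, 120, 150, 130]     # new vehicles registered each year
reported_totals = [1100, 1220, 1370, 1500]   # total vehicles reported each year

# Running sum of new registrations since the first year we have data for.
expected_growth = list(accumulate(new_registrations))

# If the reporting is consistent, this offset (the vehicles registered before
# the data starts) is the same for every year.
offsets = [total - growth for total, growth in zip(reported_totals, expected_growth)]
is_consistent = len(set(offsets)) == 1
```

In this toy example every offset is 1000, so the series is consistent. An offset that drifts or jumps from one year to the next is the kind of mismatch the plots reveal.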

That difference changes. It changes abruptly in 2012, going from massively underreported to overreported. An effect of the Aadhaar card wherein people suddenly started reporting unregistered vehicles? Perhaps vehicles came in from other states. I don't know if that is counted as a new registration or not. It's either that or our government is bad at reporting / collecting statistics.

Thursday, February 16, 2017

For you

Moving mountains, parting seas,
Deeds that lovers please.
Wooed love, adamantine.
To be loved beyond dying.

That love, is for the strong,
Like life, lopsided; mostly wrong.
Fighting evils within the self,
This love I feel is a saving grace.

I won't kill dragons to win your affection,
Nor shout to the world my mind.
I'll hand to you, with shaking knees,
my being and a knife.

Wednesday, February 15, 2017

Attendance in St. Stephen's College

I've been hunting around for datasets I can relate to for some time now. This fine evening I arrived at the conclusion that I could simply use attendance data from the website of St Stephen's College.

The complete analysis notebook is available as a gist (here's the gist).

Collecting the data was a matter of inspecting how the website was obtaining its data. A company has been handling our attendance since the beginning, it seems (GreenClouds site link). As far as I can see, they've made a mess of the data transfer. Perhaps it originated in the college itself, perhaps it's due to some mistakes on the company level. Whatever the cause, the data format is a mess.

I open up my console and spin up a Python script to get the data from the college website. Takes about two minutes on my Internet connection to collect all the information.

We then proceed to put it in a nice tabular form with columns being Name, percent LA, percent TA, percent PA, admission_year, course.

Things are now ready for graphing.


We go on to make Box plots. If you were to line up people in order of their attendance you would get a box plot. The person in the middle of the line is the line cutting the box in the middle. It shows that there are an equal number of people on either side of the line.

You cut each part of the line in half again and you get the bounds of the box.
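The same "line people up and cut the line" idea can be sketched with the standard library. The attendance percentages below are made up for illustration and are not taken from the college data.

```python
import statistics

# Made-up attendance percentages, already lined up in increasing order.
attendance = [50, 60, 70, 80, 90]

# The person in the middle of the line: the line inside the box.
median = statistics.median(attendance)

# Cutting each half again gives the lower and upper edges of the box.
q1, q2, q3 = statistics.quantiles(attendance, n=4, method="inclusive")
```

Here `q1` and `q3` are the bounds of the box and `q2` equals the median, so half of the people fall inside the box and a quarter on either side of it.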
This one is a gem. What is up with Chem! The attendance of their Lectures is absolutely ridiculous. While everyone more or less has the same attendance in a course Chem people are all over the attendance spectrum! Then again, their Tutorials box plot is neat too!

Yes. I see it too. Math people are consistently in class with a few people as outliers. Nobody else has this kind of spread. 

Perhaps a better way of seeing this would be violin plots where you get to see the density of the people instead of simple boxes. The thickness of the violin denotes how many people are at that point.

For example we can see that a lot of the math people are high attendance junkies and a substantial number of CHE, PHY, PCH, and PCS are low attendance lovers.
There! The average PHI student absolutely does not give a hoot about attendance! The PCH and PCS programs do every class together as is apparent from their spreads. ENG and HST do compete with PHI but PHI does take the cake.

Let's move on to the simplest of all graphs. The Histogram. No explanation needed here.

Looking at the histograms we can see that almost everyone attends class with the exception of a few people who absolutely don't come to class. Perhaps people who left the course and the administration just did not remove them from roll calls? What about experience?

We do a regular point plot to see change in attendance over months. The vertical lines along the points denote a 95% confidence interval.

The third years have found that perfect balance of leisurely attendance during most of the semester and then picking it up during the last months. First years exhibit that trait of a "नया नया मौलवी" (a freshly appointed cleric, overzealous in his new role) perfectly. Second years have figured out that attendance does not really matter, but are still to find out how badly those marks can affect their marks. With time comes wisdom.

That's it for now. I'd love to do some more if someone can come up with data at a finer granularity.

Friday, January 20, 2017


It’s Monday again. The weekly tests plague us...me. Not this time though; this time it’s the lovable creatures of Math. Everyone seems to think that they are monsters, but they are beautiful creatures. If you ask them they would tell you anything, but you have to understand their language to be able to ask them. Even our teacher thinks that this particular species of knowledge is a monster. Little does he know about the conversations I've had with them. My mother told me once that one should always try to understand. That the world is a knowable place and we must make an effort to understand it. These angels help me do that.
The invigilator hands out our papers. Sheets of wood beaten into submission so that we may give existence to what would otherwise only be in our minds. Human arrogance at display again. We would kill, only to make our thoughts known; for if you die without doing that, nobody would remember your memories. The first question asks me to define Pi.
I remember the class where our teacher told us that Pi is 22 parts divided into 7 pieces. Oh the confusion I had had that day! If that is all Pi is why does it have a name? What is so special about 22/7 that it deserves a symbol to name it? Where is the equality among those who belong to the species of mathematics? No! There must be something else about it, something which all the other mathemonsters have deemed worthy of their respect.
I asked mum about Pi. She does not usually know a lot about math but there is no other person who has taught me more about it than she has. She tells me that I was right; that Pi was a special bird. She does not remember what was special though. I take to the Internet. As long as I write well formed sentences and Google before I ask, I pass as a respectable adult here. My questions are taken seriously.
“Question 1. Define Pi. (2 marks)”
A statement so simple for those who do not understand. Oh what joy to be able to answer it! To simply write 22 / 7 or perhaps the circumference / diameter for those who pride themselves in going a step further than the rest. It would be lies of course. Pi herself is laughing at everyone present. They all take her for 22 / 7 and we both laugh at the private joke.
Pi is everything and nothing at the same time. I can only see her face which starts with 3.14.... and as I look lower I lose my ability to see her. I can look at parts of her but then the others disappear. I am thankful for even this. The Internet tells me that people like me, those who could see and communicate with the mathemonsters could not even see that earlier. They had to always start at the face and go down from there. Any second that you take your eyes off and she would disappear. It was some men who later discovered a way to look at any part of her without looking at her face to start with. This sight came at a cost obviously; we could not look at her as a whole.
Pi is an infinite being. She never ends. Starting with the face, as you get further and further away you would see more and more of her and yet never come across a part which seems familiar for she is a never ending and never repeating number. She is of the irrational clan. She mentioned e once but I haven’t met him yet.
If I was to manage to look at all of Pi at once. If I was to mark the even parts of her as 1 and the odd parts as 0, I might have something like a computer’s code. Pi however, because of her nature would have given me all the knowledge in the world. If you can look at Pi; you would have all the knowledge in the world.
The name of my great great grandfather. How did Bose die? Are we alone? When will I die? Who would be my friend? All of these Pi would tell you, you only have to look at her once. But these feeble eyes of mine don’t allow me to do that. This beautiful woman who promises me an understanding of everything, lets me have nothing as long as I cannot look at her. As long as I don’t understand all of her.
She is everything and nothing at the same time. She is all that I want to know, and yet I cannot. My future, my past, my love, my hate; all is known to Pi and she tells me nothing.
“Everything and nothing”
That’s how I managed to fail most of my tests. People kept telling my mum that she needed to put me into tuition, but both she and Pa knew that tuition would do me more harm than good. They would blind me and have me believe that the mathemonsters are nothing more than monsters. They would reduce pi to 22 / 7 and e to a benign number. They would label the other numbers as complex and lend us faith in their non existence. They would never let me know about the family relations of the clan of irrationals and the complex. They would hide from me the love expressed within e^iπ = -1