The Latest

Lost journal discovered!

On July 24, 2013, in Althea, by vijay
Time was passing at scorching speed, and so were our create-iterate cycles at Althea. As a result, our workspaces had ended up looking like they could use some housekeeping. So, in an attempt to retain only minimal stuff on our tables, we kicked off the activity with full zeal.

Clean all the things!

Er.. Yes, that'd be me.

In the midst of my (over)enthusiastic attempt, a piece of paper with some handwritten notes caught my attention. It was by Rajesh (our platform main-man), on the experiments he had worked on with Abhishek (super-cool intern). Courtesy of my curiosity, I quickly glanced through it, and this definitely looked like material waiting to be published! Determined to put it up, it occurred to me that dropping him a note to let him know this piece was going live would be a good idea. So, quietly during his busy hours, I sneaked in a chat message mentioning it. In return, I received an “Ok” (a usual from him, which I loathe in terms of reply standards, BTW :\ ), but who cares? It was a green signal, and that’s all that mattered to me. So I pulled out some pictures of Abhishek from the archive and composed them with the journal. Below is the result, straight from the point of view of the person who saw the system evolve at every stage.

 

This is a report of the work done by an intern at Althea Systems. In ancient times (or so it seems, but really just 3 months ago) we used Redis for maintaining our statistics — and yes, by that we mean every single countable under the Sun. A small overview of the older systems should give you an idea of all the problems we had.

 

For platform-specific data, we used Redis hashes. For example, to track the number of signups we were getting, we’d use something like signup -> android, signup -> ios, and so on, within Redis. Which is not exactly good, but it might have lasted had it not been for the data that needed to be tracked by the day. That would need us to conjure up new hashes every day, of the form <date>.signup -> android, <date>.signup -> ios, etc.
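To make the layout concrete, here is a minimal sketch of the pattern, using a plain dict as a stand-in for the Redis hashes (in production these would be HINCRBY calls against a real server; the helper name is illustrative):

```python
from collections import defaultdict
from datetime import date

# In-memory stand-in for Redis hashes: hash name -> {field: count}.
hashes = defaultdict(lambda: defaultdict(int))

def track_signup(platform, day=None):
    """Increment the all-time counter and, if a day is given, the daily one."""
    hashes["signup"][platform] += 1           # e.g. signup -> android
    if day is not None:
        daily_key = "%s.signup" % day.isoformat()
        hashes[daily_key][platform] += 1      # e.g. 2013-04-01.signup -> android

track_signup("android", day=date(2013, 4, 1))
track_signup("ios", day=date(2013, 4, 1))
track_signup("android", day=date(2013, 4, 2))
```

Every stat tracked by the day mints a fresh hash per date, so the key space grows without bound.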

 

By now you can see how unmanageable this was becoming. Redis was getting overloaded, and its response times often exceeded the duration of the actual operation itself! Graphing data was also becoming a pain: we had to write a script for each of the stats to draw its graph (though we reused a few of them). And then, out of thin air, an article on Hacker News about StatsD came to our collective attention. We were very impressed, because it solved many of our problems.

Around this time, Abhishek came in for his internship with us. He seemed very passionate about coding and trying out new things, so we thought this could be an interesting problem for him to solve. Our set of problems actually grew before he could make any dent in it, when we realized that the makers of StatsD had something very different in mind when they created this highly likeable daemon — they were more focussed on monitoring stateless stats (the ones where past values don’t matter). We, on the other hand, wanted saving, storing, aggregating, and all the other Redis goodness, without actually having a resource-intensive Redis instance running.

Sure enough (in addition to essentially annihilating three of our local boxes with his random experiments and significantly offsetting our caffeine expenses), Abhishek did eventually crank out everything required for us to have a swanky new statistics storage and display system. It includes Graphite and a StatsD fork tailored to our specific needs (with a tiny database, storage containers and aggregating mechanisms built in), in addition to the initial set of stat-pulling scripts which feed our data to the right places. And of course, in true programming spirit, we’ve had him build a generic, abstracted client that does all of our stat pushing for us, wherever we want — at the touch of a button (more specifically, at the addition of one line of code).
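For a flavour of what that one line looks like at a call site, here is a minimal sketch of a StatsD-style counter client. This is not our actual client: the host, port and metric names are illustrative, and the wire format shown is the standard StatsD counter format.

```python
import socket

class StatClient:
    """Minimal StatsD-style client: one call per stat, fire-and-forget UDP,
    so the caller never blocks on the stats backend."""
    def __init__(self, host="localhost", port=8125):
        self.addr = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def incr(self, metric, value=1):
        # StatsD counter wire format: "<metric>:<value>|c"
        payload = "%s:%d|c" % (metric, value)
        self.sock.sendto(payload.encode("ascii"), self.addr)
        return payload

stats = StatClient()
# The "one line of code" at each call site:
stats.incr("signup.android")
```

Because the transport is UDP, a down or slow stats daemon costs the application nothing.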

 

The whole setup is still in its infancy, and will undoubtedly undergo changes. Being true geeks at heart and in profession, we naturally have the next batch of ideas as to what we’d have this module do. The details of those will be released as we go along that path. For now, having received our desired substance, we’re starting to think it is time we focus on adding some style to the mix. Oh, and the stars seem just too perfectly aligned — it smells like new interns here! :-)

Abhishek, our night-owl intern

Abhishek, it was fun having you work at Althea, mate. I thought I’d wish you prosperity and success on behalf of the team, but I’d rather just say:

May the force be with you, young padawan.

Vijay, on behalf of the folks @ Shufflr.

Tagged with:
 

Game Of Thrones @ Althea

On April 7, 2013, in Uncategorized, by vijay

The new glass chess set-up at office.

Cool, ain’t it? :D


Holi 2013

On April 5, 2013, in Althea, by vijay

This year at Althea, the Gen-Y team made sure to set a competitive standard for the Holi celebration (pun intended) :D . Here are a few pictures clicked in the midst of the joyous celebration.


The colorful celebration!

Festive wishes to everyone!

- Folks @ Althea


Adieu 2012

On January 18, 2013, in Althea, by vijay

To put it in a more meaningful way, here’s a collage of a couple of gifts, the Christmas tree and the exquisite caramel cheese Christmas cake :^). People in picture: (L-R) Praveen, Sam and Abhishek.

With a wonderful year coming to an end, we at Althea celebrated Christmas and bade farewell to 2012 on a joyous note. Thanks to Praveen and Sam for organizing the “treasure hunt”, which had all of us playing Sherlock, hunting for hidden notes that would be our ticket to a box of delicacies. Trust me, the spirit in some of us put even the likes of Indiana Jones to shame. :-D Special thanks to Paul for sending us a delicious cake for Christmas (center picture in the collage). In the midst of all this, to spice things up even further, Secret Santa was organised at the office :-) . All genres of gifts, from chocolates to cycling gloves, wristwatches to a game of Othello, were delivered secretly by the respective Stealth-Santas. Looks like our intern Abhishek couldn’t have picked a better time to join us at the office. :-)

In summary, the Christmas tree glowed in all of its beauty, jingles were played, cakes were cut and gifts were exchanged as we welcomed the new year with all spirit and joy. Looking forward to 2013 with high aspirations as ever, here’s wishing you guys a fantastic year ahead!

Folks @ Althea.


The making of a chess table

On December 20, 2012, in Althea, by vsagarv

There was a painter’s odd hack table gathering dust in our office. With winter at its peak, it only looked fit for firewood.

But I went and asked our amazingly resourceful man Friday, Mr. Gangaram, whether it could be turned into a respectable chess table for us amateurs at Althea. In his usual smiling and assured manner, he said something could be done. I would check with him once in a while, but didn’t realise until a few days later that he was silently working on the table on the terrace, square by square, blacks after whites after blacks, to perfection.

And this is what he created in just a week, in his break times, while taking care of a dozen other things we go to him for! :)

 

Mr. Gangaram carefully paints chess squares on a discarded painter's standing table, saving it from becoming firewood.

 

And the games begin.

 

The games begin. Abhishek The Intern vs Sam The Champ.


Bird’s-eye view of Althea HQ

On December 7, 2012, in Althea, by vijay
Tiny-Planet-Althea

Working in a start-up almost feels like living on your very own planet. :) Well, what could convey that better than the picture itself?

 


Onshore developer reports to base.

On December 3, 2012, in Althea, by vijay

Occupied as we were, celebrating Deepavali at Althea, little did we know that it could get even better. Yes, we had Smita joining us for the celebrations at Office! As often said, when it comes to catching up with friends to celebrate, the more the merrier :)

 

That’s Sarala and Smita, sharing a light moment amidst a couch coding session :)

Shufflr Platform – AWS EC2 Spotathon Zone

On November 17, 2012, in Althea, Shufflr, by vsagarv

We have been thinking of sharing some of our platform experiences with the developer community for a while and the announcement of AWS EC2 Spotathon 2012 provided just the right push to start it off. Watch this space for updates on how the Shufflr platform uses EC2 spot instances to its advantage in various subsystems and use cases.

Introduction

Shufflr is a multi-screen, personalised social video discovery service powered by its platform on the AWS cloud (see shufflr.tv and products).

Over 4 million users across 170 countries discover online videos of their interest every day on Shufflr’s iPad, iPhone, Android, Windows8 and Facebook apps.

Shufflr takes the ‘Daily Fix’ approach for delivering a personalised discovery experience to its users – with an algorithmic combination of various signals such as the users’ social graphs, hot topics / buzzing videos / celebrity activity on the web, popular content sources, live events etc.

The Shufflr Platform

The Shufflr platform on AWS comprises many independently scalable services such as user signup, authentication, authorisation, online video indexing, algorithmic filtering and classification of videos based on metadata, making & tracking social graph connections, social data fetching, personalisation algorithms per-user and per-group, creating and delivering activity streams, developer APIs for apps etc. The platform services communicate with each other over well-defined API contracts and SLAs.

AWS EC2 spot instance use cases in the platform

Overview

Shufflr platform services that are of a non-realtime, offline data-fetch / compute nature have flexible SLAs and are prime candidates for cost optimization with AWS EC2 spot instances (though the overall cost is planned & managed as a function of reserved, on-demand and spot instances). Some such spot-instance-friendly services in Shufflr are social graph data fetchers, video indexers and filters, background (asynchronous) tasks, test (staging / pre-production) environments etc.

While it is typical to think of spot instances purely as cost savers, we see it as a reminder for asking sound architectural questions such as:

  • Which tasks can be delayed without affecting the product requirements, and up to when can they be delayed? (This not only frees up web/app/DB servers to cater to low-latency requests, but also has the positive side effect of letting us wait for the right spot price for delayable tasks.)
  • How much intermediate state should background tasks persist, and how frequently? (This lends itself to robust, fault-tolerant designs and incidentally makes the tasks amenable to the unpredictable, ephemeral nature of spot instances.)
  • Should a given background task complete quicker (by launching more instances in parallel for the same cost), to help other services down the compute chain make their decisions faster?
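On the second question, here is a minimal sketch of how a spot worker might persist and resume intermediate state; the checkpoint interval and file layout are illustrative, not our actual implementation:

```python
import json
import os
import tempfile

def checkpoint(state, path):
    """Atomically persist worker state so a terminated spot instance's
    replacement can pick up mid-job. Write-then-rename avoids torn files."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def resume(path):
    """Load the last checkpoint, or start from the beginning."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"cursor": 0}

def process_jobs(jobs, path, every=100):
    """Work through `jobs`, checkpointing every `every` jobs so that at most
    `every` jobs are redone after a spot termination."""
    state = resume(path)
    for i in range(state["cursor"], len(jobs)):
        _ = jobs[i]  # the actual (idempotent) work would happen here
        if (i + 1) % every == 0:
            checkpoint({"cursor": i + 1}, path)
    checkpoint({"cursor": len(jobs)}, path)
```

The checkpoint frequency is the knob: persist more often and a terminated spot wastes less work, persist less often and the persistence overhead shrinks.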
Spot scaler code snippets

And we find that a probably-not-so-obvious aspect of spot instances is that they lend themselves to two conceptually different use cases:

  1. Doing a job at the lowest possible cost and
  2. Doing more jobs for a set cost.

The latter use case helps us innovate better due to our thinking on the lines of:

“These instances cost less and we have budgeted for the worst case of on-demand pricing. So, let’s see if we can infer any more patterns by fetching and mining a bit more social data, or computing a few more iterations to improve recommendation accuracy.”

Use case details

Shufflr’s service-oriented architecture lends itself to applying independent cloud deployment / configuration strategies and scaling individual sub-systems up or down as required. We will outline some examples of our spot usage in this context.

  1. Social graph data fetch with priorities & time-bounds: Users connect their Facebook and Twitter accounts to Shufflr so that they can get all the videos shared by their friends. When a new user signs up, Shufflr fetches such video posts from his/her social graphs ‘asap’ (so that the user can make an easy emotional connection with the app), indexes the video metadata from the source site into Shufflr, and adds it to the user’s social stream – all asynchronously. In parallel, taste-inference algorithms run through the user’s social graph data to start making personalised video recommendations. The key here is being able to scale all these asynchronous activities according to varying new-user signup loads along with the current daily, weekly and monthly active users (in decreasing order of priority, for resource allocation as well as job-completion latency). For instance, it is good to fetch the social graph data of a new user within a few seconds of his/her signing up, hence scheduling him/her at a higher priority than, say, a weekly active user. We have a custom, multi-level queue scheduling algorithm to address this. The job queues, scheduler and dispatcher run on on-demand instances, whereas large numbers of workers run on spot instances. The number of spots is a function of the job queue length, with a feedback loop that assesses instance/worker health and launches replacements for terminated spots (due to a price spike, say). To maintain a minimum assured throughput, a few workers run on on-demand instances too. We had used MIT StarCluster initially, but eventually switched to a custom scheduler-dispatcher-autoscaler implementation.
  2. Social Graph Video Fetchers

    From the drawing board: Social Graph Video Fetchers, with spot workers in 'red'

     

  3. Per-geography, context sensitive, hot topic videos’ map reducer: Shufflr’s users are spread across 170+ countries. One of the video discovery anchors in Shufflr is videos about the hot (trending) topics of the hour. For this, Shufflr uses Twitter trends, amongst other signals, to infer the hot topics of the hour. A Twitter trend, being just a keyword (say, ‘Obama’), does not come along with any context (say, ‘US Election 2012’). Shufflr uses certain patent-pending algorithmic techniques to infer and attach context to trends, and uses that information to fetch video links and metadata from various video sources online. Spot instances are used here in a custom map-reduce implementation running multi-threaded workers over a filtered data stream of the Twitter firehose. Portions of this system – a Redis server for sorted data sets, nginx-fronted raw-FS servers for tweet storage etc. – are run on on-demand instances as they hold critical state. A pending area of parallelization is to launch one set of spots per geography (woeid) and cover all 170+ countries within the first few minutes of every hour.
  4. Location & context aware, hot topic video fetchers

    Location & context aware, hot topic video fetchers; spot workloads shown in 'red'

  5. Video queue processors: Shufflr has an ever-growing index of over 110 million online videos. As time passes, some of the video URLs in this index become stale, so periodically these videos are queued to a ‘Stale video remover’ which launches a number of spot instances as a function of job queue depth, instance type and worker throughput. This is a pure background, best-effort, non-realtime task with no latency bounds, and hence ideal for spot instances. The job queues, along with some additional state, are maintained on on-demand instances. Another use case on similar lines is the background task of fetching associated videos for a given video in the index. In this case, the video queue and the associated-video fetch workers together create an ever-growing tree of indexed videos, in a breadth-first manner, recursively. Spot instances are used for running the workers while a large MySQL DB holds the video index along with the video metadata. Since the queue depth increases exponentially here, a cap is placed on the number of spots.
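The multi-level queue scheduling idea from item 1 above can be sketched as follows. This is a simplification, not the actual Shufflr scheduler; the user classes and priority values are illustrative:

```python
import heapq

# Illustrative priority classes, highest first: a brand-new signup beats a
# daily active user, who beats a weekly active user, and so on.
PRIORITY = {"new": 0, "daily": 1, "weekly": 2, "monthly": 3}

class MultiLevelQueue:
    """Sketch of multi-level queue scheduling: always dispatch the highest
    priority class first, FIFO within a class."""
    def __init__(self):
        self._heap = []
        self._seq = 0  # monotonic tie-breaker preserves FIFO order per level

    def enqueue(self, job, user_class):
        heapq.heappush(self._heap, (PRIORITY[user_class], self._seq, job))
        self._seq += 1

    def dispatch(self):
        """Hand the next job to a worker, or None if the queues are empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

q = MultiLevelQueue()
q.enqueue("fetch graph: weekly_user", "weekly")
q.enqueue("fetch graph: new_user", "new")    # just signed up: jumps the queue
q.enqueue("fetch graph: daily_user", "daily")
```

In the real system the dispatcher hands jobs like these to workers on spot instances, with on-demand workers guaranteeing a minimum throughput.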
From the drawing board: Associated Videos Fetchers; spot workers shown in 'red'

From the drawing board: Stale Videos Removers; spot workers shown in 'red'
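Sizing the spot fleet from the job queue, as in the video queue processors above, comes down to a small calculation. A hedged sketch (the parameter names, the one-hour drain target and the cap value are all illustrative):

```python
import math

def spots_needed(queue_depth, jobs_per_worker_hour, workers_per_instance,
                 target_hours=1.0, cap=50):
    """Launch enough spot instances to drain the queue within target_hours,
    subject to a cap (needed for the associated-videos fetcher, whose queue
    depth grows exponentially)."""
    per_instance = jobs_per_worker_hour * workers_per_instance * target_hours
    needed = math.ceil(queue_depth / per_instance)
    return max(1, min(needed, cap))  # always at least one, never above the cap
```

The feedback loop would re-evaluate this as the queue drains and as terminated spots get replaced.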

 

Our video queue tracking charts

Yes, we like drawing them by hand and sticking them in the aisle for all to see. :-)

Daily video queue tracking charts - 1. // medium: sketch ink on paper, display: glass wall in the aisle.

Daily video queue tracking charts - 2. // medium: sketch ink on paper, display: glass wall in the aisle.

Spot workers on the video queue

(To illustrate the workers running in parallel, we need to get a better composite screenshot. We hope to post an update soon.)

Video queue jobs being processed by spot workers

Cost savings

Just this year, we have seen 75% cost savings till date over 51000 spot instance hours (normalised to m1.small equivalents, across various spot instance types we’ve used in multiple availability zones). This is only one slice of our AWS platform costs but, as they say, every thousand$ counts :-)
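As a back-of-the-envelope illustration only (the $0.06/hour m1.small on-demand rate below is an assumed figure for the calculation, not our actual bill):

```python
hours = 51000                  # normalised m1.small spot instance hours, as above
on_demand_rate = 0.06          # assumed m1.small on-demand $/hour, illustrative
savings_fraction = 0.75        # the 75% savings reported above

on_demand_cost = hours * on_demand_rate           # what the hours would cost on-demand
amount_saved = on_demand_cost * savings_fraction  # dollars kept in the bank
spot_cost = on_demand_cost - amount_saved         # what was actually paid
```

Even at small-instance rates the savings run to a couple of thousand dollars over those hours, which is exactly the "every thousand$ counts" spirit.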

Notes on bidding

These cost savings came about through a combination of experiments with our bidding strategy. Initially (there weren’t many AZs and regions), we used to bid for spots within our main AZ (us-east-1c) at on-demand prices. In general we would obtain the instances at about 1/3rd that price. Sometimes we would notice that the spot prices were shooting higher than on-demand in our AZ, while other AZs continued to offer lower prices (notably, us-east-1e).

We slightly altered the strategy to move across AZs when our bid wouldn’t succeed at on-demand pricing for a given timeout duration. That gave us a fairly good hold on the spot costs.
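That fallback strategy is simple to sketch. `request_spot` below is a hypothetical helper standing in for the actual spot request API; it is assumed to return an instance id once the bid is fulfilled, or None if the bid does not fill within the timeout:

```python
def place_bid(azs, request_spot, bid_price, timeout_s=600):
    """Bid in the primary AZ first (at on-demand price, per our strategy);
    if the request doesn't fill within the timeout, move on to the next AZ.
    `request_spot(az, price, timeout_s)` is a hypothetical wrapper around
    the real spot request API."""
    for az in azs:
        instance_id = request_spot(az, bid_price, timeout_s)
        if instance_id is not None:
            return az, instance_id
    return None, None  # no AZ filled the bid; caller may retry or raise the price
```

Here `azs` would be ordered with the main AZ first, e.g. `["us-east-1c", "us-east-1e"]`.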

Of late, AWS’s historic spot pricing data and current price APIs have reduced the guesswork by letting us bid in the lowest-cost AZ. Someday, we will rig up the perfect price-bidding strategy algorithm (and use the same to pick winning stocks in the stock market ;-) ).

Notes on scaling down spots

While scaling down spots, our auto-scale strategies (a) turn down only those spots that are about to complete an instance hour and (b) utilise a spot-instance as long as possible for the current hour, before terminating.
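Those two rules can be sketched as a single predicate (the 5-minute grace window is illustrative, and the logic assumes the per-hour spot billing of the time):

```python
def should_terminate(minutes_into_hour, queue_empty, grace_minutes=5):
    """Scale-down rules from above: (a) only turn down a spot that is about
    to complete an instance hour, and (b) keep it working for as much of the
    already-billed hour as possible."""
    about_to_complete_hour = minutes_into_hour >= 60 - grace_minutes
    return queue_empty and about_to_complete_hour
```

An instance that is idle mid-hour keeps pulling jobs, since the rest of that hour is already paid for.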

Performance use cases

There are certain job queues in the Shufflr platform which are serviced by background workers, even though the nature of those jobs is different from the typical relaxed-latency jobs in other subsystems. These job queues are meant for asynchronous processing of elements in the web request chain (so that our app servers – Nginx+Passenger+Rails – are free to service the next web requests). Such asynchronous processing has stricter latency bounds in our use case and hence needs a very healthy ratio of workers to jobs, to provide predictable throughput even during user load spikes beyond the upper control limit. To maintain such a ratio and finish servicing these job queues in the shortest possible duration, an array of spots with workers is launched in parallel.
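One way to reason about that worker-to-job ratio is Little's law: the number of jobs in flight equals the arrival rate times the time each job spends being processed. A hedged sketch (the headroom factor and parameter names are illustrative, not our actual tuning):

```python
import math

def workers_for_latency(arrival_rate_per_s, service_time_s, headroom=2.0):
    """Little's law: jobs in flight L = arrival rate x service time (L = lambda * W).
    Provision that much concurrency, times a headroom factor, so load spikes
    beyond the upper control limit still meet the latency bound."""
    in_flight = arrival_rate_per_s * service_time_s  # L = lambda * W
    return max(1, math.ceil(in_flight * headroom))
```

For example, 40 jobs/second at half a second each means roughly 20 busy workers at steady state, so with 2x headroom the fleet would run 40.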

Summary

We have been using spot instances for a long while now, and we are very happy with the cost savings as well as the spot-friendly elements we have built (and continue to build) into our platform architecture. As soon as we can get some free cycles, we would love to share these with the open source community.


Deepavali 2012 at Althea

On November 16, 2012, in Althea, by vijay

As the festive celebrations kicked off at Althea, there was a wonderful effort at the office to deck it up ethnically for the occasion, courtesy of Sarala. After prayers were offered to the Almighty, delicious Kadubu prasadam (fried sweet dumplings) was feasted upon by all of us to our hearts’ content. :) This year around, we at Althea also resolved to celebrate a brighter, calmer (read: less noisy) and environment-friendly Deepavali.

That’s the wonderful decoration, Kadubu prasadam and the welcoming Diya-Rangoli at the office. (Person in picture: Sarala). Psst, don’t miss the time-lapse “making of” video below!

 

Video credits to our resident van Gogh, Sreejith, for helping us pull off this gig neatly :)
Background score: Musician – Sri Chittibabu, Instrument – Veena, Composer – Saint Tyaagaraja

Wishing all of you a very happy, prosperous, bright and safe Deepavali. Fiat Lux.

- folks @ Althea


deepAvaLi is here

On November 13, 2012, in Althea, by vsagarv

The time is bright & sweet with the joy of deepAvaLi (yes, that’s how it is said here in Bangalore, while our brethren from around the country also say ‘divAli’ and ‘dIwAli’).

We celebrate it as a five day festival of prosperity, joy, lights, victory and love.

 

deepAvaLi - the 5-day festival of prosperity, lights, joy, victory and love

Murali brought a car load of dry fruits and sweets for us all to take home :)

 

Happy deepAvaLi to all of you from the folks @ Althea :)

 
