This talk is about a specific project that we did at Flipkart. So just to give people, foreigners here, a little bit of context on Flipkart: it's sort of like the Amazon of India, and they are (??) e-comm(??) in India. So yeah, let me (??) numbers for later.
There was the, we had a kind of a moment where (??): actually building new features was going very slow, and the number of requests hitting our website was just going through the roof.
So let me just actually talk about what the system looked like. This was Version 2 of the supply chain at Flipkart. Version 1, which was kind of written in 2007, was written by our founders, Sachin and Binny themselves, in PHP. So this was Ve(??) modules for (??) they all (??). The problem was that each module, as we (??), developers didn't really bother to think about whether we should be querying it or not. They would just go and make joins across tables just to solve the (??) and get the feature out, right. You're probably familiar with that. So, horrible coupling.
We spent a couple of months trying to see if we could (??) each piece from the system and kind of start breaking out services, and we made the decision that, you know, let's actually rewrite (??), something which was completely against my previous work's philosophy of, you know, incrementally (??) system running and then kind of migrating.
It was 2011, in December, that we started the project. This was sort of a bet-the-company project for Flipkart. It was so critical at that point that Sachin, the founder, actually came (??), basically called (??) a team of initially (??) developers, and he (??) us, moved that team out to a separate (??), which was the (??) where Flipkart was born, and that got turned into a skunkworks start-up project (??). This team was (??), and completely (??): no interviews, no meetings, nothing. This team was only here to build out this system in seven months, because the next Diwali is the time when (??) most sales in the year, and that was in October. (??) seven months to replace an entire system with a new system built from the ground up, (??) and it's scaling right.
So we (??) get to, probably do it by August, and give ourselves time till August to kind of do this. So yeah, we (??) these modules into services, and I think Chad's talk really set up the ideas and thought processes that we tried to kind of work on and implement in the (??). It's all "small pieces loosely joined", which I came across around that time. I think that beautifully summarizes what we want to get (??) small thing.
So you break down the warehouse module into a separate service, order management into a separate service, (??) and so on and so forth. We didn't want to go down the micro-service (??) earlier and I'd gone down, I'd seen some of the downsides of it, and I didn't really have a clear idea (??) at that point in time. So I was kind of cautious about micro-services at that point, and I'd love to actually hear people's ideas on how that's working out.
But anyway, so we all took on our (??) services, each doing one thing and doing one thing well. Each service would have its own database and nobody could access (??) service, right. You could only access it through an HTTP JSON API, and you'll never touch my data, right. My private parts are private.
So this is what we ended up with. You're probably not going to (??) read this, so I'm gonna read it out to you. So (??) this is a subset of the services: (??) management service, then the (??) orchestration service, which talks to warehousing (??), which in turn talks to supplier and the whole logistics subsystems. You also have accounting services, document services, and then you have a bunch of infrastructure (??) at the bottom, which was a messaging system that we ended up building called (??)esbus, which I'll talk a little bit about, which kind of addressed the problem of cross-service (??).
So each service, for example the order management service, (??) were written in Padrino or Sinatra. So (??) written in Padrino, and they (??) in JRuby (??) we eventually migrated those (??) hear about why. So these were the Padrino services, and (??) where required were in Ruby on Rails running on MRI. We also had some infrastructure services (??) single (??) built our role (??) to twenty-five services, right, and this was a big change.
There were (??) teams which worked on this project, owning one or two services each. Each team had between four and six developers. Yeah, so that's just to set the context about where we were.
When we started (??), the old system was doing about 20,000 orders a day, 30,000 shipments a day, roughly around that order, and this new system (??) later did, I think, 100,000 orders and 150,000 shipments, and it's working (??).
(??) time when we were selecting the technology stack, the big question (??) tech stack to use. Flipkart traditionally had a lot (??), Opentabs (??) Java stack, so most developers were very, very comfortable (??). (??) thought of actually introducing a new language, a new (??), people were kind of cautious about that. Also there were concerns (??). But I knew from my experience (??) performance (??) architecture design issue fundamentally, right. There are differences in technologies and languages, and I'll talk a little bit more about that in detail, but performance I'm not too worried about.
So why Ruby then? There's the speed of development, right. We (??) very, very tight deadline to deliver the system in seven (??) to move fast, right. So that's obviously one benefit.
The other reason (??) think this idea (??) really powerful: small code (??) to develop (??) tools. When you have a large Java system it's hard to work with, even with great profilers and great modeling tools. It's still hard when you are dealing with a 100,000-line code base. It's just far more (??) with something that's maybe 10,000 lines or maybe 5,000 lines of code, right, and that code compre(??).
(??) secret weapon isn't Rails, I think. To me, Ruby's secret weapon is ActiveRecord (??). I just love it as a (??) in the enterprise (??) dealing with pretty complex logic.
We'll have cases where, so because (??) an identity map, you can have situations where, if you're not traversing from parent to object, you end up with two references of the (??) kind of bad. Tools like Hibernate in the Java world actually solve this beautifully. So yeah, (??) business applications I think (??) thing.
So we want to have small systems, which is why we built our services, the back-end services which were the HTTP JSON ones, only in Sinatra. Padrino was just a thin wrapper around it, so it's Sinatra talking to its own database on the (??).
So I'm kind of going to be a little bit discontinuous, so please excuse (??) real insights that we (??) out than worry about a kind of a consistent flow. So, JRuby, right.
Let me start off with the good.
So JRuby is incredible.
It's an amazing piece (??) great community. You get the power of the JVM, which is just amazing, and specifically within the JVM what you want is its garbage collection and just-in-time compilation (??). Those two are just amazing. I'll share some numbers about how those two things actually make a difference. But that's the great part of JRuby, right. Amazing ecosystem: you get all the tools that are in the Java world, and they just work. Not just work, but they work OK in a JRuby context. But still, you get a lot of tools like this that can be used, et cetera.
The bad.
So what's bad about JRuby? One thing that gets (??) about a lot is its slow start-up time, and it's a big, big issue. Typically, when you're kind of coding, particularly when you're testing, you want to have a very (??) test and go back and forth, but that's hard to do with JRuby. There are other (??) ?? is almost like using Nailgun, for example, or Spork: connect the ?? up to them, then connect it and run tests against it. So all that is fine, but it's still very (??). For example, (??) development was in CRuby.
So tests run very fast, specs run fast, scripts run fast.
But you deploy to JRuby.
But even then, even (??) deployment, you (??) were launched in CRuby.
But they would in turn just launch JRuby only for (??) kind of do a bunch of these things.
Just kind of not great (??) with, right.
There's one thing about JRuby which surprisingly isn't talked about much, which I think is fundamentally a deal-breaker, and that's its thread safety.
It's not really a JRuby problem.
I just think the Ruby world is just not ready to work on a truly multi-threaded (??) interpreter lock, and it manifested in horrible, horrible (??) at scale, (??) getting tons of requests in the (??). The problems typically don't manifest when (??). (??) heftiest problems with Padrino, where the actual app wouldn't get initialized, and it was just horrible figuring it out, and that turned out to be something in HTTP Router, which is a gem used (??). There's no fix for that.
It's been over (??) year and a half and it's still not been fixed.
We (??) know, to actually work around it, so we created a rack filter, which kind of hand-holds the initializing process and initializes (??). Horrible, horrible code.
HTTP Router is one (??). Sadly (??) was, we were (??) ActiveRecord, right.
We were on 3.x, so ActiveRecord has got concurrency issues, and again they don't show up in normal conditions. They show up at scale, at high load.
The connection (??), so we had situations where (??) being returned to two different threads, and the transaction would commit, the service would say, OK, 200, all OK, committed, and then there's no data in the database, because the transaction never committed.
The connection was basically rolled (??) point and nobody knew anything about it.
ActiveRecord (??) nothing about it.
So this was horrible. We couldn't get that to work.
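To make that failure mode concrete: the bug described here amounts to one connection being held by two threads at once. The sketch below is not ActiveRecord's pool. `TinyPool`, `FakeConnection` and `count_double_checkouts` are hypothetical names, and it only illustrates the invariant any pool has to keep, here by leaning on Ruby's thread-safe `Queue`.

```ruby
# Hypothetical stand-in for a database connection.
FakeConnection = Struct.new(:id)

# A minimal pool: Queue#pop blocks until a connection is free, so a
# connection can only be held by one thread at a time.
class TinyPool
  def initialize(size)
    @free = Queue.new
    size.times { |i| @free << FakeConnection.new(i) }
  end

  # Check a connection out, yield it, and always check it back in,
  # even if the block raises.
  def with_connection
    conn = @free.pop
    yield conn
  ensure
    @free << conn if conn
  end
end

# Runs several worker threads against the pool and returns how many times
# two threads were observed holding the same connection.
def count_double_checkouts(pool, threads: 8, rounds: 50)
  in_use = {}
  guard = Mutex.new
  violations = 0

  workers = Array.new(threads) do
    Thread.new do
      rounds.times do
        pool.with_connection do |conn|
          guard.synchronize do
            violations += 1 if in_use[conn.id]
            in_use[conn.id] = true
          end
          Thread.pass # give other threads a chance to interleave
          guard.synchronize { in_use.delete(conn.id) }
        end
      end
    end
  end
  workers.each(&:join)
  violations
end
```

A check like `count_double_checkouts` is the kind of test that only becomes meaningful under concurrency, which matches the point that these bugs show up at load rather than in normal conditions.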
So we (??) with JRuby in September of 2012.
And I think in three (??) dealing with these (??) we kind of took a call to move back to CRuby, and that was a kind of sad moment because (??) JRuby, and there (??) of people using it in production.
Not too many, but yes, there are (??) developers are still not in that Java mindset.
Surprisingly, the Java world does this very well.
They're kind of constantly thinking about thread safety, but the Ruby world is still not part of that.
They'll probably get there (??). We moved to CRuby and that sorted out a lot of performance issues, or rather, thread-safety issues. (??) performance aspects, actually.
I'll draw a comparison.
Yeah, also, besides (??), I'm guessing you're (??).
OK, so I mentioned this briefly, I'll just touch on it.
So what's the problem here?
So when you have a bunch of services like this, right, when you have a single database for the (??) management (??) warehouse, and you can run that, all those database changes (??) everyone (??) something like this, that doesn't work, right.
Suppose you have a transaction called, say, create order, which kind of enters as the create order request, which comes through the (??) management system.
It basically makes a call to approve that order to the fulfillment orchestrator, and, as part of the fulfillment, the orchestrator (??) warehouse service (??) they don't have the stuff in stock.
So I'm gonna actually order it for you.
So he tells the (??) service to go on, order (??). He also has to tell (??): expect this item.
I'm ordering it for you.
Expect it, right.
Now, those two things, which are placing the order with the supplier (??) expect the order, have to happen kind of atomically, right.
They have to happen atomically, along with the commit of the approval.
It got approved, (??) told this guy to procure it and I've told the warehouse to expect it, right.
These things have to happen at once.
Now this is a really, really hard problem to solve.
So, the way you solve (??) in the (??) JTE (??), they kind of implement (??), taking a single transaction or a two-phase (??) way to do it.
(??) breaks (??), now I've got a distributed transaction coordinator which is going to be sitting ?? and coordinating transactions between this database and that guy, and if you remember the (??) expose (??) breaks (??). The bigger problem (??) system doesn't scale, because (??) multiple systems, essentially what's happening is happening under the hood for each database.
The resource (??) requiring (??) in the table now (??), because you're going to go through two passes through it, right.
So essentially you end up holding locks on database rows for much longer, which increases contention and reduces (??). (??) for something like (??), use messaging.
You actually send messages (??) queue, and the write to the database has to happen as one (??), you need the two-phase (??) message queue (??). We ended up creating a service called R(??), which uses local transactions and asynchronous relay (??) of (??) in more detail (??) learned (??) estimated, right.
We ended up creating, using, (??) database (??) that? Why can't (??) be exposed (??) effects to view (??) architecture.
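The phrase "local transactions and asynchronous relay" matches a pattern commonly called a transactional outbox. The sketch below illustrates that general pattern only and is not the service described in the talk; `FakeDatabase`, `approve_order`, `relay_pending` and the topic names are all hypothetical, and an in-memory array stands in for both the database and the message queue.

```ruby
# In-memory stand-in for one service's private database. A real system
# would use actual database transactions; this only imitates the shape.
class FakeDatabase
  attr_reader :orders, :outbox

  def initialize
    @orders = []
    @outbox = []
    @lock = Mutex.new
  end

  # Imitates a local transaction: either every write made in the block
  # is kept, or none is.
  def transaction
    @lock.synchronize do
      orders_before = @orders.dup
      outbox_before = @outbox.dup
      begin
        yield self
      rescue StandardError
        @orders = orders_before
        @outbox = outbox_before
        raise
      end
    end
  end
end

# Step 1: the business change and the outgoing messages are written in
# the same local transaction, so they commit or roll back together.
def approve_order(db, order_id)
  db.transaction do |tx|
    tx.orders << { id: order_id, status: "approved" }
    tx.outbox << { topic: "supplier.place_order", order_id: order_id, sent: false }
    tx.outbox << { topic: "warehouse.expect_item", order_id: order_id, sent: false }
  end
end

# Step 2: a separate relay publishes pending rows later. A crash between
# publishing and marking a row as sent would re-send it, so consumers
# have to tolerate duplicates.
def relay_pending(db, broker)
  db.outbox.reject { |m| m[:sent] }.each do |message|
    broker << message.slice(:topic, :order_id)
    message[:sent] = true
  end
end
```

The design choice is that no lock spans two systems: the only atomic step is a write to one database, and delivery to other services becomes an at-least-once background job.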
And, yeah, I can't (??) performance. Let's kind of get an intuition for how good or bad that is.
Let me ask you a question.
If you had a Hello World Sinatra route, right, and you were (??) requests, say, using Apache Bench, what kind of response time (??) expect? It's just a Hello World, so just a GET /hello_world, and it just says Hello World and returns.
That's it, right. Some guesses of how long that would take?
Five milliseconds? Yeah.
So that's roughly, it takes about a millisecond at the 95th percentile, (??) milliseconds at the 99th percentile, and then (??), right.
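For readers who want to reproduce this kind of measurement, here is a minimal sketch of how 95th and 99th percentile figures can be derived from raw timings. It uses the nearest-rank definition, which may differ slightly from what a given benchmarking tool reports; `percentile` and `measure_ms` are hypothetical helper names, and `pct` is assumed to be an integer from 1 to 100.

```ruby
# Nearest-rank percentile: the smallest sample such that at least
# `pct` percent of all samples are less than or equal to it.
def percentile(samples, pct)
  raise ArgumentError, "no samples" if samples.empty?

  sorted = samples.sort
  rank = (pct * sorted.size + 99) / 100 # integer ceiling of pct * n / 100
  sorted[[rank, 1].max - 1]
end

# Times `runs` calls of the given block and returns the latencies in
# milliseconds, using the monotonic clock so wall-clock jumps don't matter.
def measure_ms(runs)
  Array.new(runs) do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000.0
  end
end
```

Usage would look like `latencies = measure_ms(1000) { handler.call }` followed by `percentile(latencies, 95)` and `percentile(latencies, 99)`, where `handler` is whatever is being measured.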
If you run (??) in JRuby, it'll take about two milliseconds at the 95th percentile, (??) milliseconds (??) very, very (??). That's the beauty of JRuby, right, because its GC is so good, and (??) pieces of code (??) to give you very, very stable response times. But they're actually worse than MRI.
So the benchmarks that talk about JRuby being faster, I've (??) been able to reproduce those.
So anyway, that's an intuition about it. We also actually ended up getting much higher throughput, so that same Hello World server will do about 700 requests per second, versus the JRuby one, which will take about 550 to 580.
However, if you had a Tomcat (??) Hello World, how long do you think that would take? Thirty? Yeah, it would be roughly in the fifty microseconds. It's about twenty to forty times faster.
So that's one of the (??) Ruby? It's so slow.
But the point is this: it's still perfectly fine, and that is because most business applications are not CPU-bound. They're IO-bound. They're basically all just waiting for some horribly slow query to return, right, and it takes the same (??) waiting in Java or in Ruby.
So that's why managing IO is (??) thing, and that includes things like database calls and calls to external services. Yeah, kind of optimizing that is (??) thing (??) Ruby apps.
So, given (??), we wanted to make sure that all our services were kind of behaving well.
So we actually built a tool called drac metrics, which basically is a rack filter which will send the request (??) different parts of the application.
So we had plug-ins for Sinatra's routes, so you can get insight into request time (??) ActiveRecord, and calculated the time for each query, and it hooked into the REST client and Thrift clients to basically instrument the time taken for all our outgoing calls.
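As a rough idea of what a timing rack filter looks like, here is a minimal sketch. It is not the tool described in the talk: `RouteStats` and `TimingFilter` are hypothetical names, and it only relies on the Rack convention that an app responds to `call(env)` and returns `[status, headers, body]`. It keys on the raw request path rather than the matched Sinatra route, which a real tool would want to normalize.

```ruby
# Collects count, total, min and max per "METHOD path" key.
class RouteStats
  def initialize
    @stats = Hash.new do |h, k|
      h[k] = { count: 0, total_ms: 0.0, min_ms: nil, max_ms: nil }
    end
    @lock = Mutex.new
  end

  def record(key, ms)
    @lock.synchronize do
      s = @stats[key]
      s[:count] += 1
      s[:total_ms] += ms
      s[:min_ms] = ms if s[:min_ms].nil? || ms < s[:min_ms]
      s[:max_ms] = ms if s[:max_ms].nil? || ms > s[:max_ms]
    end
  end

  # Returns a copy of the stats for one key, with the average added.
  def summary(key)
    @lock.synchronize do
      s = @stats[key]
      s.merge(avg_ms: s[:count].zero? ? 0.0 : s[:total_ms] / s[:count])
    end
  end
end

# Rack-style middleware that times every request passing through it.
class TimingFilter
  def initialize(app, stats)
    @app = app
    @stats = stats
  end

  def call(env)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    @app.call(env)
  ensure
    # Runs whether the app returned or raised, so failures are timed too.
    elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000.0
    @stats.record("#{env['REQUEST_METHOD']} #{env['PATH_INFO']}", elapsed_ms)
  end
end
```

The same wrapping idea extends to database and outgoing HTTP calls: time the call, tag it with the current route, and aggregate.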
So the result was it would, oh, look, you can't read it very well (??) metrics.
It'll tell you all the routes, (??) in them, the average time, min, max, et cetera.
And within each route, if you expand (??) five slowest requests (??).
So, for example, for this inventory post, inventory call, (??) was spent, how many REST calls, how the time in REST calls was split between different parts of the code, (??) made, right.
So with this, you could immediately find out that you're doing something silly like an N plus one query, or that the external system is actually slow, and that kind of optimizing (??) kind of figuring out what you want to attack first, right.
So we'd use this, figure out the (??) that first.
So, we also had an (??) time series view of (??), like metrics from business (??) CPU and capacity and request response (??) number of requests.
So with this we also built a (??) framework (??) for this (??) send me a mail (??), we had the (??) in place (??) a check (??) systems.
So how do you tune that, once (??)? Typically the (??) IO problem.
If it's an N plus one query, remove it, do an (??) stuff.
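To show what removing an N plus one query means in practice, here is a small sketch that counts queries against a fake store. `CountingStore` and both loader functions are hypothetical; a real application would do the batched version with a single `WHERE order_id IN (...)` query or its ORM's eager-loading feature.

```ruby
# Hypothetical in-memory "database" that counts the queries it serves.
class CountingStore
  attr_reader :query_count

  def initialize(orders, items)
    @orders = orders
    @items = items
    @query_count = 0
  end

  def all_orders
    @query_count += 1
    @orders
  end

  # One query per order: called N times, this is the "N" in N plus one.
  def items_for(order_id)
    @query_count += 1
    @items.select { |i| i[:order_id] == order_id }
  end

  # One query for many orders, like WHERE order_id IN (...).
  def items_for_all(order_ids)
    @query_count += 1
    @items.select { |i| order_ids.include?(i[:order_id]) }
          .group_by { |i| i[:order_id] }
  end
end

# N plus one: 1 query for the orders, then 1 more per order.
def load_n_plus_one(store)
  store.all_orders.map { |o| [o[:id], store.items_for(o[:id])] }.to_h
end

# Batched: 2 queries in total, regardless of how many orders there are.
def load_batched(store)
  orders = store.all_orders
  by_order = store.items_for_all(orders.map { |o| o[:id] })
  orders.map { |o| [o[:id], by_order.fetch(o[:id], [])] }.to_h
end
```

With ten orders the first loader issues eleven queries and the second issues two, while both return the same data.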
If it's a bad query, run MySQL EXPLAIN or whatever query plan tool you have for your database (??) join sequence is, and actually fix the query.
For external (??) timeout (??) for somebody else to respond, right.
(??) because that has very bad effects (??) entire cluster.
Because, if you have a service that is calling another service, and that service is running slow, you can end up completely (??) because all the processes are just waiting for this guy to respond, right.
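One way to keep a slow downstream service from tying up every worker is to put explicit timeouts on outgoing calls. The sketch below uses Ruby's standard `Net::HTTP`; the helper names `build_client` and `call_service` and the timeout values are assumptions for illustration, not what the team used.

```ruby
require "net/http"

# Builds an HTTP client with explicit timeouts (in seconds). The third
# argument (nil) disables any proxy picked up from the environment.
def build_client(host, port, open_timeout: 1, read_timeout: 2)
  client = Net::HTTP.new(host, port, nil)
  client.open_timeout = open_timeout # max wait to establish the connection
  client.read_timeout = read_timeout # max wait for each read of the response
  client.max_retries = 0             # fail fast instead of retrying silently
  client
end

# Calls the downstream service and turns a timeout into a value the caller
# can handle, instead of leaving the worker blocked indefinitely.
def call_service(client, path)
  response = client.get(path)
  { ok: true, status: response.code.to_i, body: response.body }
rescue Net::OpenTimeout, Net::ReadTimeout => e
  { ok: false, error: e.class.name }
end
```

What the caller does with `{ ok: false }` (retry later, degrade, or fail the request) is a separate decision; the point here is only that the wait is bounded.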
So once these two (??) kind of things are (??), then it's important to look at GC itself.
Now, Ruby's default GC settings are actually very conservative, and you typically see, initially, just after restart, response times are (??) kind of get quicker (??). There are ways to kind of improve the default (??) and (??) got a reference to that at the (??) link. That's something we did, and that immediately (??). There are (??) GC is still a problem.
For example, there will be queries (??) complex.
So, for those, what we (??) is just carving out a separate (??) tool, two nodes in a cluster for the (??) queries, build a separate (??), and point all those requests to those servers. (??) with GC, then you've (??).
So for profiling we ended up using both tools.
Again, there's a URL link to it at the end, which is a great, great tool.
There are rack filters which can kind of (??) the request, hook into the request cycle (??) in each (??) exactly how much time is spent in GC or in some section of code.
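A minimal sketch of the idea of measuring GC cost per request, using Ruby's built-in `GC.count` and `GC::Profiler`. `GcFilter` and the header names are hypothetical, and this is not the tool referred to in the talk. Both counters are process-wide, so under a multi-threaded server the numbers are only approximate.

```ruby
# Rack-style filter that reports GC activity observed during one request.
class GcFilter
  def initialize(app)
    @app = app
    GC::Profiler.enable # start collecting GC timing data for this process
  end

  def call(env)
    runs_before = GC.count
    time_before = GC::Profiler.total_time # seconds spent in profiled GC runs

    status, headers, body = @app.call(env)

    gc_runs = GC.count - runs_before
    gc_ms = (GC::Profiler.total_time - time_before) * 1000.0
    # Expose the measurements as response headers (names are illustrative).
    headers = headers.merge(
      "x-gc-runs" => gc_runs.to_s,
      "x-gc-ms" => gc_ms.round(3).to_s
    )
    [status, headers, body]
  end
end
```

In a long-running process the profiler's accumulated records should be cleared periodically with `GC::Profiler.clear`, in which case the before/after subtraction has to be reset as well.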
So that's your final, final kind of hammer you can use to address (??). (??) ended up (??) a bit with changing app server.
So we (??) JRuby world, (??), to MRI via (??)enger first, and now we have been on Unicorn.
And Unicorn particularly has a plug-in called (??) after the request cycle ends.
So (??) none of your requests (??).
There's also a plug-in called WorkerKiller, which can kind of kill a worker at the (??).
So another problem we (??), which was, we now have twenty-four, twenty-five (??) on Sinatra and Padrino apps.
And there was a bunch of Ruby gems that we were using, which were the core platform gems, which needed to be (??) team worked independently, so it ended up becoming a problem for the platform team to kind of go and chase around people and say, OK, (??) this gem, there's gonna be problems (??). (??) a patched version of Bundler which essentially lets you annotate your gems (??) code (??) install, it'll basically check if there's a new version of that same gem in the repo, and actually resolve dependencies and actually install.
So the nice thing with this was, it works in the (??) in the background.
You're (??), kind of messing around with platform gems, and it happens frequently enough that, you know, everybody, the entire (??) the services quickly.
OK, so, can I take (??) started with a Java team, and we kind of just threw them into the deep end, into Ruby.
And we only had about two or three Ruby devs, so it took people, on average, I think, about three to four months to start writing idiomatic Ruby, and that was a huge challenge. (??) problems, including kind of having consultants to pair with the people on the team, having one expert per team, et cetera.
OK, I'll skip to (??).
So one of the (??) noticed, or, happened was that because Ruby's such a dynamic language, you can write (??) sometimes design tends to be taken for granted, like (??) let's not try to set it on them then. But I think this is a really big problem, because you end up, or rather you need (??) deep questions about the domain and (??) in this problem (??) of times where we (??) creating small custom solutions to (??) instead of asking deep questions about it: what is this domain, what is this problem? (??) warehousing, for example, really about? It's not about check lists and foot lists. It's about items' movement, and how do we model that as a first-class concept, right. And, I wonder (??). OK, yeah, questions (??), so there (??) references (??) end.
: Sorry, Yogi.
: All right, no questions.
: Thanks for the insights, Yogi.