Stack History: A Timeline of Pinterest’s Tech Stack Evolution by StackShare

pinterest tech stack stackshare timeline decisions 12 |  Uncategorized | infrasture developmentPinterest now has 335 million Monthly Active Users with more than 200 billion Pins across more than 4 billion boards. It’s hard to believe that Pinterest started out as a small site with a few thousand users back in 2009. But their initial tech stack is part of the company’s DNA and is still around in some form today. Many know Pinterest as a Python shop, but there’s far more to the story. Since early on they’ve had to deal with scaling their site to keep up with their rapid growth.

Over time their data infrastructure and machine learning capabilities became their strongest assets as they scaled to recommending over 4 billion ideas to over 250 million users in 2018. This stack timeline tells a story of one of the fastest growing consumer software companies in history. Find out how they scaled from 0 to billions of pins and inspired hundreds of millions to be more creative.

How we did it

We synthesized dozens of sources (most of which are available on Pinterest’s new Company Engineering Profile) from YouTube, community sites, and Pinterest’s Engineering blog. Each Stack Decision has a link to the original source content if you want to dive in deeper into any of the decisions.

The original Pinterest tech stack

Pinterest was founded in 2009, and the site launched in closed beta in March 2010. Marty Weiner, Founding Engineer at Pinterest, gave a tech talk where he outlined their initial tech stack which was cobbled together by the 3 founders which was Python, MySQL, and the intial site was hosted on Rackspace.

GOTO 2014 • Scaling Pinterest • Marty Weiner

Mar 2010

The early Pinterest tech stack

In 2011, Paul Sciarra, Co-founder of Pinterest, shared details on their initial tech stack at the time: > We use python + heavily-modified Django at the application layer. Tornado and (very selectively) node.js as web-servers. Memcached and membase / redis for object- and logical-caching, respectively. RabbitMQ as a message queue. Nginx, HAproxy and Varnish for static-delivery and load-balancing. Persistent data storage using MySQL. MrJob on EMR for map-reduce. Marty Weiner, Founding Engineer at Pinterest, also gave some more details on their tech stack early that year outlining more details on their early tech stack. He mentioned that they moved from Rackspace to Amazon Web Services:

GOTO 2014 • Scaling Pinterest • Marty Weiner

Aug 2011

Building User Typeahead for discovery

Using HBase as a storing solution, Pinterest embarked on a journey to improve its typeahead for better discovery thus solving the problem of a slow and unresponsive typeahead. With HBase in place, few things that improved considerably were speed, scalability, fault tolerance, and writeable. To tackle the challenge of Pinners with millions of followers, 2 kinds of schemas stored the contacts: wide schema and tall schema. The new typeahead implementation was markedly faster and registered increase in message sends.

Infrastructure priorities (Service Reliability, Developer Productivity and Infra Efficiency).

Aug 2014

Scaling past just databases

By 2013 Pinterest had reached significant scale and started to build out their data infrastructure. They spun up a search team, a spam team, web, mobile, growth, infrastructure and operations team.

GOTO 2014 • Scaling Pinterest • Marty Weiner

Oct 2014

Reading the right metrics using Pinalytics

Pinterest built Pinalytics to help employees analyze information quickly and absorb meaningful analytics more efficiently. The web application stack consisted of MySQL, SQLAlchemy and the Flask framework. The frontend had React for building user interface components with simple and interactive visualizations. With tools such as Metric Composer, Custom Reporting and Metric Computation built-in and a scalable analytics engine, teams at Pinterest were able to track relevant metrics and make fast decisions backed by data to improve the user experience.

Infrastructure priorities (Service Reliability, Developer Productivity and Infra Efficiency).

Dec 2014

Building Bulk Editing with JavaScript and Transit

With the number of Pins in the Pinterest system growing to well over 30 billion, a top requested feature had been the ability to move a large number of Pins to another board and better manage them. This resulted in the use of JavaScript Promises and the JavaScript library Transit. Through Transit, they ran a function, and, upon completion, run other code without dealing with callbacks. They used promises to make calls to the API, and, upon success or failure, sent a message to a different module to tell it to enter or exit Bulk Edit mode. Transit let them run CSS animations from JavaScript and even allow them to chain animations together. This gave them fine-grained control over when different elements animate into and off of the page.

Jan 2015

MySQL improvements for more throughput

Starting in 2014, Pinterest made a number of improvements to the asynchronous job execution system. With some tweaks, 5X more throughput was achieved on the MySQL backend, which allowed over 2,000 enqueues per second with a single i2.2xl MySQL EC2 instance. Since MySQL offers higher durability with reliable replication, this change allowed to move all the workloads to MySQL and deprecate the use of the Redis backend. Pinterest also built a checkpointing feature to support long running jobs and complex workflows. In addition, a new dashboard was created acting as a one stop shop to view job status, debug job failures, lookup any job by ID or body text and a number of other features that were a delight to the users.

PinLater Thrift service architecture.

Nov 2015

AWS for Auto Scaling Pinterest

With Pinterest built on top of AWS, Amazon Auto Scaling was applied to their service. AWS provided spot instances that were more cost effectice and implemented a spot auto scaling group to run in tandem with demand instances to reduce the number of on-demand instances running during peak hours. AWS lifecycle hook was implemented into their auto scaling engine and deploy service. Since enabling auto scaling, Pinterest saw promising gains as the average CPU utilization for the fleets were flat. Auto scaling saved a significant number of instance hours every day, which led to several million dollars in savings every year.

Auto scaling Pinterest.

Sept 2016

Migrating legacy web experience to React

Pinterest decided to move to React because it rendered faster than the previous template engine (Jinja), had fewer obstacles to iterating on features and had a large developer community. The first step was to consolidate to a single template rendering engine between client and server before they could replace that engine with something else. If the server could interpret JavaScript, use another template engine called Nunjucks to render templates and share their client-side code, they could then move forward with an iterative migration to React. Pinterest then set up Node processes behind an Nginx proxy layer and architected the interface in such a way that each network request would be a stateless render. They also refactored the client-side rendering JavaScript so it could be used by both the client and the server. This resulted in a synchronous rendering module with an API that took environment-agnostic requests. Once the Nunjucks engine was in place and serving 100% of Pinterest.com’s templates, it was developers started converting their modules to React components which set them on a path to build a complete React application while seeing many performance and developer productivity gains from the migration.

Switching Pinterest to React.

Nov 2016

RocksDB Replication for computing with Data

Pinterest adopted RocksDB as it was adaptable, supports basic and advanced database operations with high performance and meets the majority of requirements for building large-scale, production-strength distributed stateful services. RocksDB gave them an embeddable persistent key-value store for fast storage. While they found the majority of the production needs met, Pinterest built and open sourced a RocksDB replicator called Rocksplicator, cluster management library and tools for RocksDB based stateful services.

RocksDB at Pinterest.

Nov 2016

Python & Pants for continuous development

Pinterest built Python commons to provide a seamless experience for their Python developers. Python tools work great for managing code in a single repo; they built a monorepo called Python Commons using the Pants build tool. To streamline the release process, a Python EXecutable(PEX) file was the release primitive. Using this development setup they took care of all the boilerplate code a developer writes before working on a project. This helped the developers focus on code without having to worry about setup.py, tox, virtualenv. It also eliminated the need to create scripts to setup and run the project locally, scripts to release a Docker or Debian packages or scripts to test code locally or in Jenkins.

Oct 2017

Kafka Streams for Predictable Budgets

Pinterest started using Kafka Streams API to provide inflight spend data to thousands of ads servers in mere seconds. They built a predictive system using Kafka Streams to reduce overdelivery in the ads system and provide near-instant information about inflight spend. Kafka Streams has a millisecond delay guarantee that beats Spark and Flink. Kafka Streams is a Java application with no heavy external dependencies like dedicated clusters which minimizes maintenance costs. Using Apache Kafka Streams’ API to build a predictive spend pipeline was a new initiative for their ads infrastructure and it turned out to be fast, stable, fault-tolerant and scalable.

Oct 2017

Detecting Duplicate Images

In order to detect near-duplicate images Pinterest utilized Spark and TensorFlow. By leveraging Batch LSH implementation in Spark, they drastically reduced computational complexity by skipping unlikely-to-be-similar pairs of images. The Spark-based implementation combines efficient distribution of workload as well as low-level optimization to minimize memory and CPU footprint. The subsequent fine-tuning step uses a supervised feed-forward network to select and rank image pairs that are above the NearDup similarity threshold. The combination of Spark and TensorFlow inference uses the best of distributed computations as well as vectorization per core to achieve both high throughput as well as low latency for prediction. The results of these two steps were then used to cluster images that help power tens of billions of search results and recommendations on Pinterest every day.

Jun 2018

Kafka for Scaling Up

Pinterest used Apache Kafka extensively as a message bus to transport data and to power real-time streaming services, ultimately helping more than 250 million Pinners around the world discover and do what they love. Kafka was used to transport data to our data warehouse, including critical events like impressions, clicks, close-ups, and repins. Pinterest also used Kafka to transport visibility metrics for internal services. If the metrics-related Kafka clusters had any glitches, Pinterest couldn’t accurately monitor their services or generate alerts that signal issues. On the real-time streaming side, Kafka is used to power many streaming applications, such as fresh content indexing and recommendation, spam detection and filtering, real-time advertiser budget computation, and so on.

How Pinterest runs Kafka at scale.

Nov 2018

Recommending 4+ Billion Ideas to 250+ Million Users in Real Time

One of the primary engineering challenges at Pinterest was how to help people discover ideas they want to try, which means serving the right idea to the right person at the right time. While most other recommender systems have a small pool of possible candidates (like 100,000 film titles on a movie review site), Pinterest has to recommend from a catalog of more than 4+ billion ideas. To make it happen, they built Pixie, a flexible, graph-based system for making personalized recommendations in real-time. For more info, watch the tech talk on Pixie:

Recommending 4+ Billion Ideas to 250+ Million Users in Real Time

Nov 2018

Faster & reliable iOS builds with Bazel

Pinterest had been looking for ways to streamline the process of iOS builds, and set out to improve the speed and reliability of the iOS builds on local and continuous integration environments. After comparing Xcode, Cocoapods, Buck and Bazel, they identified Bazel was the best fit for building a foundation for an order of magnitude improvement in performance and eliminating variability in build environments. Post Bazel, Pinterest was shipping all the iOS releases using Bazel, as it resulted in faster builds, local disk caches allowing for instant rebuilds for anything you’ve built before (other branches, commits, etc); environments were identical between CI and local environments, so build issues were easy to reproduce, and increased automation. Tasks like code generation were included as part of the build graph. With Bazel, you can build once and reuse everywhere. After introducing remote build caching, build times dropped under a minute and as low as 30 seconds since they don’t need to rebuild anything that has been built on any machine.

Feb 2019

Kafka Optimized and Ready for the Cloud

Pinterest set out to build and execute locality aware systems and balancing algorithms that can help reduce costs, they also made Kafka Producer and Consumer rack aware such that is helps efficiently route traffic. They rolled out AZ aware S3 transporter to production, which resulted in more than 25% savings in AZ transfer cost for Logging. Pinterest constantly yearned for and worked towards slowly rolling out AZ aware logging agent to further reduce the AZ transfer cost.

Optimizing Kafka for the cloud

Apr 2019

Singer powering Pinterest in performant and reliable logging

Singer has been a critical component in Pinterest’s data infrastructure and streams over one trillion messages per day. Pinterest shared open sourced Singer on GitHub. Singer can be easily extended to support data uploading to custom destinations. Singer can automatically detect newly added configuration and process related log streams. Running as a daemonset in Kubernetes environment, Singer can query the kubelet API to detect live pods on each node, and process log streams on each pod based on the configuration.

Open sourcing Singer

Jun 2019

Kubernetes for Pinterest

Kubernetes provided infrastructure priorities such as service reliability, developer productivity, and infrastructure efficiency. Over the years, 300 million Pinners have saved more than 200 billion Pins on Pinterest across more than 4 billion boards. To serve this vast user base and content pool, Pinterest developed thousands of services, ranging from microservices of a handful CPUs to huge monolithic services that occupy a whole VM fleet. There are also various kinds of batch jobs from all kinds of different frameworks, which can be CPU, memory or I/O intensive. To support these diverse workloads, the infrastructure team at Pinterest implemented Kubernetes.

Infrastructure priorities (Service Reliability, Developer Productivity and Infra Efficiency).

For more information, watch this tech talk Pinterest’s Journey from VMs to Containers

Aug 2019

Managing Web Deploys with Zookeeper

Pinterest started using Zookeeper while moving to a CI/CD model. The route map was stored in ZooKeeper and distributed via the config pipeline. The capacity per version per stage was calculated from the available endpoints on the published serversets (which also exist in ZooKeeper). That is, endpoints had metadata about their versions which was then used for capacity calculation. This was all very convenient, because Pinterest could rely on existing and battle tested systems. However, it also came with the challenge of eventual consistency. Not all Envoy servers had the same view of the world at the same time. They had to maintain a lot of state in ZooKeeper to keep track of what had previously been served, when the new version became ready, etc. Over the years, the state machine controlling all of this grew wildly complex to the point where it was hard to change something without causing an incident. Pinterest published a tech talk on Envoy, https://www.youtube.com/watch?v=y2O6dsaqqRA

Oct 2019

Measuring real-time experiment analytics using Apache Flink

As a solution to run thousands of experiments every day, and rely mostly on daily metrics to evaluate performance, Pinterest developed a near real-time experimentation platform for fresher experiment metrics to help in catching these issues as soon as possible. The dashboards created display the volume (i.e. number of actions) and propensity (i.e number of unique users) of the control and treatment groups of an experiment for a selected event. The counts are aggregated for three days after an experiment gets launched or ramped up. If a re-ramp (increase in user allocation of control and treatment groups) occurs after the three days, the counts accumulate again from zero for another three days. Real-time Experiment Analytics was the first Flink-based application in production at Pinterest. It has been built out of the Flink platform and provide it as a service.

Oct 2019

Effective Scaling of Pinterest ads analytics with Apache Druid

Pinterest found that HBase worked very well when it came to accessing random data points, but it wasn’t built for fast grouping and aggregation. Druid allowed Pinterest to bypass all of the complicated data slicing ingestion logic, and also supports – Real-time ingestion via Kafka, automatic versioning of ingested data, data pre-aggregation based on user-set granularity, approximate algorithms for count-distinct questions, a SQL interface, easy to understand code and a very supportive community. Druid supports two modes of ingestion: Native and Hadoop. Both are launched via a REST call to the Overlord node. On the server side, they found the basic cluster tuning guidance suggested by the Druid documentation very helpful. One non-obvious caveat was being mindful of how many GroupBy queries can be executed at any time given the number of merge buffers configured. As Pinterest’s business grows, the work on the core Druid platform for analytics would evolve alongside it.

Jan 2020

pinterest tech stack stackshare timeline decisions 12 |  Uncategorized | infrasture developmentpinterest tech stack stackshare timeline decisions 12 |  Uncategorized | infrasture developmentpinterest tech stack stackshare timeline decisions 12 |  Uncategorized | infrasture developmentpinterest scaling first stack 5 |  Uncategorized | infrasture developmentpinterest tech stack stackshare timeline decisions 12 |  Uncategorized | infrasture developmentpinterest tech stack stackshare timeline decisions 12 |  Uncategorized | infrasture development

Source: Stack History: A Timeline of Pinterest’s Tech Stack Evolution | StackShare

0 0 vote
Article Rating
Author profile

About Maida
She is Zipsite's all around Zipsiter. Only clocks out when she can barely stand. She sings, bakes, and tries so hard to be fit. But don't judge, she can literally strong arm you no problem. She's currently involved with eCorp, Contrib and VNOC, besides being a Brazilian Jiu Jit Su practitioner.

Subscribe
Notify of
0 Comments
Inline Feedbacks
View all comments
Total
0
Share this Page

So we promise if you share this page, you get a free social media boost on us 🍺!

 

 

Facebook
Twitter
Google+
Pinterest0
Linkedin0
GMail0
Messenger0
Diigo0
Total
0

Everybody needs a helping hand and we feel that we could all get anything we want, even stop this pandemic, by just reaching out.

 

 

ajax-loader
0
Would love your thoughts, please comment.x
()
x