# applied-ml
Curated papers, articles, and blogs on data science & machine learning in production. ⚙️
Figuring out how to implement your ML project? Learn how other organizations did it:
- How the problem is framed 🔎 (e.g., personalization as recsys vs. search vs. sequences)
- What machine learning techniques worked ✅ (and sometimes, what didn't ❌)
- Why it works, the science behind it with research, literature, and references 📂
- What real-world results were achieved (so you can better assess ROI ⏰💰📈)
P.S., Want a summary of ML advancements? 👉 ml-surveys
Table of Contents
- Data Quality
- Data Engineering
- Data Discovery
- Feature Stores
- Classification
- Regression
- Forecasting
- Recommendation
- Search & Ranking
- Embeddings
- Natural Language Processing
- Sequence Modelling
- Computer Vision
- Reinforcement Learning
- Anomaly Detection
- Graph
- Optimization
- Information Extraction
- Weak Supervision
- Generation
- Audio
- Validation and A/B Testing
- Model Management
- Efficiency
- Ethics
- MLOps Platforms
- Practices
- Team Structure
- Fails
# Data Quality
- Monitoring Data Quality at Scale with Statistical Modeling - Uber
- An Approach to Data Quality for Netflix Personalization Systems - Netflix
- Automating Large-Scale Data Quality Verification (Paper) - Amazon
- Meet Hodor — Gojek’s Upstream Data Quality Tool - Gojek
- Reliable and Scalable Data Ingestion at Airbnb - Airbnb
- Data Management Challenges in Production Machine Learning (Paper) - Google
- Improving Accuracy By Certainty Estimation of Human Decisions, Labels, and Raters (Paper) - Facebook
# Data Engineering
- Zipline: Airbnb’s Machine Learning Data Management Platform - Airbnb
- Sputnik: Airbnb’s Apache Spark Framework for Data Engineering - Airbnb
- Unbundling Data Science Workflows with Metaflow and AWS Step Functions - Netflix
- How DoorDash is Scaling its Data Platform to Delight Customers and Meet Growing Demand - DoorDash
- Revolutionizing Money Movements at Scale with Strong Data Consistency - Uber
- Zipline - A Declarative Feature Engineering Framework - Airbnb
- Real-time Data Infrastructure at Uber - Uber
# Data Discovery
- Amundsen — Lyft’s Data Discovery & Metadata Engine - Lyft
- Open Sourcing Amundsen: A Data Discovery And Metadata Platform (Code) - Lyft
- Amundsen: One Year Later - Lyft
- Using Amundsen to Support User Privacy via Metadata Collection at Square - Square
- Discovery and Consumption of Analytics Data at Twitter - Twitter
- Democratizing Data at Airbnb - Airbnb
- Databook: Turning Big Data into Knowledge with Metadata at Uber - Uber
- Turning Metadata Into Insights with Databook - Uber
- Metacat: Making Big Data Discoverable and Meaningful at Netflix (Code) - Netflix
- DataHub: A Generalized Metadata Search & Discovery Tool (Code) - LinkedIn
- DataHub: Popular Metadata Architectures Explained - LinkedIn
- How We Improved Data Discovery for Data Scientists at Spotify - Spotify
- How We’re Solving Data Discovery Challenges at Shopify - Shopify
- Nemo: Data discovery at Facebook - Facebook
- Apache Atlas: Data Governance and Metadata Framework for Hadoop (Code) - Apache
- Collect, Aggregate, and Visualize a Data Ecosystem's Metadata (Code) - WeWork
- Exploring Data at Netflix (Code) - Netflix
# Feature Stores
- Introducing Feast: An Open Source Feature Store for Machine Learning (Code) - Gojek
- Feast: Bridging ML Models and Data - Gojek
- Building a Scalable ML Feature Store with Redis, Binary Serialization, and Compression - DoorDash
- Building Riviera: A Declarative Real-Time Feature Engineering Framework - DoorDash
- Michelangelo Palette: A Feature Engineering Platform at Uber - Uber
- Optimal Feature Discovery: Better, Leaner Machine Learning Models Through Information Theory - Uber
- Distributed Time Travel for Feature Generation - Netflix
- Fact Store at Scale for Netflix Recommendations - Netflix
- The Architecture That Powers Twitter's Feature Store - Twitter
- Building the Activity Graph, Part 2 (Feature Storage Section) - LinkedIn
- Rapid Experimentation Through Standardization: Typed AI features for LinkedIn’s Feed - LinkedIn
- Accelerating Machine Learning with the Feature Store Service - Condé Nast
- Building a Feature Store - Monzo Bank
- Zipline: Airbnb’s Machine Learning Data Management Platform - Airbnb
- ML Feature Serving Infrastructure at Lyft - Lyft
- Butterfree: A Spark-based Framework for Feature Store Building (Code) - QuintoAndar
# Classification
- High-Precision Phrase-Based Document Classification on a Modern Scale (Paper) - LinkedIn
- Chimera: Large-scale Classification using Machine Learning, Rules, and Crowdsourcing (Paper) - WalmartLabs
- Large-scale Item Categorization for e-Commerce (Paper) - DianPing, eBay
- Large-scale Item Categorization in e-Commerce Using Multiple Recurrent Neural Networks (Paper) - NAVER
- Categorizing Products at Scale - Shopify
- Learning to Diagnose with LSTM Recurrent Neural Networks (Paper) - Google
- Discovering and Classifying In-app Message Intent at Airbnb - Airbnb
- How We Built the Good First Issues Feature - GitHub
- Teaching Machines to Triage Firefox Bugs - Mozilla
- Testing Firefox More Efficiently with Machine Learning - Mozilla
- Using ML to Subtype Patients Receiving Digital Mental Health Interventions (Paper) - Microsoft
- Prediction of Advertiser Churn for Google AdWords (Paper) - Google
- Scalable Data Classification for Security and Privacy (Paper) - Facebook
- Uncovering Online Delivery Menu Best Practices with Machine Learning - DoorDash
- Using a Human-in-the-Loop to Overcome the Cold Start Problem in Menu Item Tagging - DoorDash
# Regression
- Using Machine Learning to Predict Value of Homes On Airbnb - Airbnb
- Using Machine Learning to Predict the Value of Ad Requests - Twitter
- Open-Sourcing Riskquant, a Library for Quantifying Risk (Code) - Netflix
- Solving for Unobserved Data in a Regression Model Using a Simple Data Adjustment - DoorDash
# Forecasting
- Forecasting at Uber: An Introduction - Uber
- Engineering Extreme Event Forecasting at Uber with RNN - Uber
- Transforming Financial Forecasting with Data Science and Machine Learning at Uber - Uber
- Under the Hood of Gojek’s Automated Forecasting Tool - Gojek
- BusTr: Predicting Bus Travel Times from Real-Time Traffic (Paper, Video) - Google
- Retraining Machine Learning Models in the Wake of COVID-19 - DoorDash
- Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflow (Paper, Code) - Atlassian
- Greykite: A flexible, intuitive, and fast forecasting library - LinkedIn
# Recommendation
- Amazon.com Recommendations: Item-to-Item Collaborative Filtering (Paper) - Amazon
- Temporal-Contextual Recommendation in Real-Time (Paper) - Amazon
- P-Companion: A Framework for Diversified Complementary Product Recommendation (Paper) - Amazon
- Recommending Complementary Products in E-Commerce Push Notifications (Paper) - Alibaba
- Deep Interest with Hierarchical Attention Network for Click-Through Rate Prediction (Paper) - Alibaba
- Behavior Sequence Transformer for E-commerce Recommendation in Alibaba (Paper) - Alibaba
- TPG-DNN: A Method for User Intent Prediction with Multi-task Learning (Paper) - Alibaba
- PURS: Personalized Unexpected Recommender System for Improving User Satisfaction (Paper) - Alibaba
- SDM: Sequential Deep Matching Model for Online Large-scale Recommender System (Paper) - Alibaba
- Multi-Interest Network with Dynamic Routing for Recommendation at Tmall (Paper) - Alibaba
- Controllable Multi-Interest Framework for Recommendation (Paper) - Alibaba
- MiNet: Mixed Interest Network for Cross-Domain Click-Through Rate Prediction (Paper) - Alibaba
- ATBRG: Adaptive Target-Behavior Relational Graph Network for Effective Recommendation (Paper) - Alibaba
- Session-based Recommendations with Recurrent Neural Networks (Paper) - Telefonica
- How 20th Century Fox uses ML to predict a movie audience (Paper) - 20th Century Fox
- Deep Neural Networks for YouTube Recommendations - YouTube
- Personalized Recommendations for Experiences Using Deep Learning - TripAdvisor
- E-commerce in Your Inbox: Product Recommendations at Scale - Yahoo
- Product Recommendations at Scale (Paper) - Yahoo
- Powered by AI: Instagram’s Explore recommender system - Facebook
- Netflix Recommendations: Beyond the 5 stars (Part 1, Part 2) - Netflix
- Learning a Personalized Homepage - Netflix
- Artwork Personalization at Netflix - Netflix
- To Be Continued: Helping you find shows to continue watching on Netflix - Netflix
- Calibrated Recommendations (Paper) - Netflix
- Marginal Posterior Sampling for Slate Bandits (Paper) - Netflix
- Food Discovery with Uber Eats: Recommending for the Marketplace - Uber
- Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations - Uber
- How Music Recommendation Works — And Doesn’t Work - Spotify
- Music recommendation at Spotify - Spotify
- Recommending Music on Spotify with Deep Learning - Spotify
- For Your Ears Only: Personalizing Spotify Home with Machine Learning - Spotify
- Reach for the Top: How Spotify Built Shortcuts in Just Six Months - Spotify
- Explore, Exploit, and Explain: Personalizing Explainable Recommendations with Bandits (Paper) - Spotify
- Contextual and Sequential User Embeddings for Large-Scale Music Recommendation (Paper) - Spotify
- The Evolution of Kit: Automating Marketing Using Machine Learning - Shopify
- Using Machine Learning to Predict what File you Need Next (Part 1) - Dropbox
- Using Machine Learning to Predict what File you Need Next (Part 2) - Dropbox
- Personalized Recommendations in LinkedIn Learning - LinkedIn
- A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 1) - LinkedIn
- A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 2) - LinkedIn
- Learning to be Relevant: Evolution of a Course Recommendation System (PAPER NEEDED) - LinkedIn
- Building a Heterogeneous Social Network Recommendation System - LinkedIn
- How TikTok recommends videos #ForYou - ByteDance
- A Meta-Learning Perspective on Cold-Start Recommendations for Items (Paper) - Twitter
- Lessons Learned Addressing Dataset Bias in Model-Based Candidate Generation (Paper) - Twitter
- Zero-Shot Heterogeneous Transfer Learning from RecSys to Cold-Start Search Retrieval (Paper) - Google
- Improved Deep & Cross Network for Feature Cross Learning in Web-scale LTR Systems (Paper) - Google
- Self-supervised Learning for Large-scale Item Recommendations (Paper) - Google
- Mixed Negative Sampling for Learning Two-tower Neural Networks in Recommendations (Paper) - Google
- Personalized Channel Recommendations in Slack - Slack
- Deep Retrieval: End-to-End Learnable Structure Model for Large-Scale Recommendations (Paper) - ByteDance
- Future Data Helps Training: Modeling Future Contexts for Session-based Recommendation (Paper) - Tencent
- Using AI to Help Health Experts Address the COVID-19 Pandemic - Facebook
- A Case Study of Session-based Recommendations in the Home-improvement Domain (Paper) - Home Depot
- Balancing Relevance and Discovery to Inspire Customers in the IKEA App (Paper) - IKEA
- Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time (Paper) - Pinterest
- How we use AutoML, Multi-task learning and Multi-tower models for Pinterest Ads - Pinterest
- Multi-task Learning for Related Products Recommendations at Pinterest - Pinterest
- Improving the Quality of Recommended Pins with Lightweight Ranking - Pinterest
- Personalized Cuisine Filter Based on Customer Preference and Local Popularity - DoorDash
# Search & Ranking
- Amazon Search: The Joy of Ranking Products (Paper, Video, Code) - Amazon
- Why Do People Buy Seemingly Irrelevant Items in Voice Product Search? (Paper) - Amazon
- Semantic Product Search (Paper) - Amazon
- QUEEN: Neural query rewriting in e-commerce (Paper) - Amazon
- How Lazada Ranks Products to Improve Customer Experience and Conversion - Lazada
- Using Deep Learning at Scale in Twitter’s Timelines - Twitter
- Machine Learning-Powered Search Ranking of Airbnb Experiences - Airbnb
- Applying Deep Learning To Airbnb Search (Paper) - Airbnb
- Managing Diversity in Airbnb Search (Paper) - Airbnb
- Improving Deep Learning for Airbnb Search (Paper) - Airbnb
- Ranking Relevance in Yahoo Search (Paper) - Yahoo
- An Ensemble-based Approach to Click-Through Rate Prediction for Promoted Listings at Etsy (Paper) - Etsy
- Learning to Rank Personalized Search Results in Professional Networks (Paper) - LinkedIn
- Entity Personalized Talent Search Models with Tree Interaction Features (Paper) - LinkedIn
- In-session Personalization for Talent Search (Paper) - LinkedIn
- The AI Behind LinkedIn Recruiter Search and recommendation systems - LinkedIn
- Learning Hiring Preferences: The AI Behind LinkedIn Jobs - LinkedIn
- Quality Matches Via Personalized AI for Hirer and Seeker Preferences - LinkedIn
- Understanding Dwell Time to Improve LinkedIn Feed Ranking - LinkedIn
- Ads Allocation in Feed via Constrained Optimization (Paper, Video) - LinkedIn
- Talent Search and Recommendation Systems at LinkedIn (Paper) - LinkedIn
- AI at Scale in Bing - Microsoft
- Query Understanding Engine in Traveloka Universal Search - Traveloka
- The Secret Sauce Behind Search Personalisation - Gojek
- Food Discovery with Uber Eats: Building a Query Understanding Engine - Uber
- Neural Code Search: ML-based Code Search Using Natural Language Queries - Facebook
- Bayesian Product Ranking at Wayfair - Wayfair
- COLD: Towards the Next Generation of Pre-Ranking System (Paper) - Alibaba
- Globally Optimized Mutual Influence Aware Ranking in E-Commerce Search (Paper) - Alibaba
- Graph Intention Network for Click-through Rate Prediction in Sponsored Search (Paper) - Alibaba
- Reinforcement Learning to Rank in E-Commerce Search Engine (Paper) - Alibaba
- Aggregating Search Results from Heterogeneous Sources via Reinforcement Learning (Paper) - Alibaba
- Cross-domain Attention Network with Wasserstein Regularizers for E-commerce Search - Alibaba
- Understanding Searches Better Than Ever Before (Paper) - Google
- Shop The Look: Building a Large Scale Visual Shopping System at Pinterest (Paper, Video) - Pinterest
- Driving Shopping Upsells from Pinterest Search - Pinterest
- GDMix: A Deep Ranking Personalization Framework (Code) - LinkedIn
- Bringing Personalized Search to Etsy - Etsy
- Building a Better Search Engine for Semantic Scholar - Allen Institute for AI
- Query Understanding for Natural Language Enterprise Search (Paper) - Salesforce
- How We Used Semantic Search to Make Our Search 10x Smarter - Tokopedia
- Powering Search & Recommendations at DoorDash - DoorDash
- Things Not Strings: Understanding Search Intent with Better Recall - DoorDash
- Query Understanding for Surfacing Under-served Music Content (Paper) - Spotify
- How We Built A Context-Specific Bidding System for Etsy Ads - Etsy
- Query2vec: Search query expansion with query embeddings - GrubHub
- Embedding-based Retrieval in Facebook Search (Paper) - Facebook
- Towards Personalized and Semantic Retrieval for E-commerce Search via Embedding Learning (Paper) - JD
- MOBIUS: Towards the Next Generation of Query-Ad Matching in Baidu’s Sponsored Search - Baidu
- Pre-trained Language Model based Ranking in Baidu Search (Paper) - Baidu
# Embeddings
- Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba (Paper) - Alibaba
- Embeddings@Twitter - Twitter
- Listing Embeddings in Search Ranking (Paper) - Airbnb
- Understanding Latent Style - Stitch Fix
- Towards Deep and Representation Learning for Talent Search at LinkedIn (Paper) - LinkedIn
- Should we Embed? A Study on Performance of Embeddings for Real-Time Recommendations (Paper) - Moshbit
- Vector Representation Of Items, Customer And Cart To Build A Recommendation System (Paper) - Sears
- Machine Learning for a Better Developer Experience - Netflix
- Announcing ScaNN: Efficient Vector Similarity Search (Paper, Code) - Google
- Personalized Store Feed with Vector Embeddings - DoorDash
# Natural Language Processing
- Abusive Language Detection in Online User Content (Paper) - Yahoo
- How Natural Language Processing Helps LinkedIn Members Get Support Easily - LinkedIn
- Building Smart Replies for Member Messages - LinkedIn
- DeText: A deep NLP Framework for Intelligent Text Understanding (Code) - LinkedIn
- Smart Reply: Automated Response Suggestion for Email (Paper) - Google
- Gmail Smart Compose: Real-Time Assisted Writing (Paper) - Google
- SmartReply for YouTube Creators - Google
- Using Neural Networks to Find Answers in Tables (Paper) - Google
- A Scalable Approach to Reducing Gender Bias in Google Translate - Google
- Assistive AI Makes Replying Easier - Microsoft
- AI Advances to Better Detect Hate Speech - Facebook
- A State-of-the-Art Open Source Chatbot (Paper) - Facebook
- A Highly Efficient, Real-Time Text-to-Speech System Deployed on CPUs - Facebook
- Deep Learning to Translate Between Programming Languages (Paper, Code) - Facebook
- Deploying Lifelong Open-Domain Dialogue Learning (Paper) - Facebook
- Introducing Dynabench: Rethinking the way we benchmark AI - Facebook
- Dynaboard: Moving Beyond Accuracy to Holistic Model Evaluation in NLP (Code) - Facebook
- Goal-Oriented End-to-End Conversational Models with Profile Features in a Real-World Setting (Paper) - Amazon
- How Gojek Uses NLP to Name Pickup Locations at Scale - Gojek
- Give Me Jeans not Shoes: How BERT Helps Us Deliver What Clients Want - Stitch Fix
- The State-of-the-art Open-Domain Chatbot in Chinese and English (Paper) - Baidu
- PEGASUS: A State-of-the-Art Model for Abstractive Text Summarization (Paper, Code) - Google
- Photon: A Robust Cross-Domain Text-to-SQL System (Paper) (Demo) - Salesforce
- GeDi: A Powerful New Method for Controlling Language Models (Paper, Code) - Salesforce
- Applying Topic Modeling to Improve Call Center Operations - RICOH
- WIDeText: A Multimodal Deep Learning Framework - Airbnb
# Sequence Modelling
- Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction (Paper) - Alibaba
- Search-based User Interest Modeling with Sequential Behavior Data for CTR Prediction (Paper) - Alibaba
- Deep Learning for Electronic Health Records (Paper) - Google
- Deep Learning for Understanding Consumer Histories (Paper) - Zalando
- Continual Prediction of Notification Attendance with Classical and Deep Networks (Paper) - Telefonica
- Using Recurrent Neural Network Models for Early Detection of Heart Failure Onset (Paper) - Sutter Health
- Doctor AI: Predicting Clinical Events via Recurrent Neural Networks (Paper) - Sutter Health
- How Duolingo uses AI in every part of its app - Duolingo
- Leveraging Online Social Interactions For Enhancing Integrity at Facebook (Paper, Video) - Facebook
# Computer Vision
- Categorizing Listing Photos at Airbnb - Airbnb
- Amenity Detection and Beyond — New Frontiers of Computer Vision at Airbnb - Airbnb
- Powered by AI: Advancing product understanding and building new shopping experiences - Facebook
- New AI Research to Help Predict COVID-19 Resource Needs From X-rays (Paper, Model) - Facebook
- Creating a Modern OCR Pipeline Using Computer Vision and Deep Learning - Dropbox
- How we Improved Computer Vision Metrics by More Than 5% Only by Cleaning Labelling Errors - Deepomatic
- A Neural Weather Model for Eight-Hour Precipitation Forecasting (Paper) - Google
- Machine Learning-based Damage Assessment for Disaster Relief (Paper) - Google
- RepNet: Counting Repetitions in Videos (Paper) - Google
- Converting Text to Images for Product Discovery (Paper) - Amazon
- How Disney Uses PyTorch for Animated Character Recognition - Disney
- Image Captioning as an Assistive Technology (Video) - IBM
- AI for AG: Production machine learning for agriculture - Blue River
- AI for Full-Self Driving at Tesla - Tesla
- On-device Supermarket Product Recognition - Google
- Using Machine Learning to Detect Deficient Coverage in Colonoscopy Screenings (Paper) - Google
- Shop The Look: Building a Large Scale Visual Shopping System at Pinterest (Paper, Video) - Pinterest
- Developing Real-Time, Automatic Sign Language Detection for Video Conferencing (Paper) - Google
- Vision-based Price Suggestion for Online Second-hand Items (Paper) - Alibaba
- Making machines recognize and transcribe conversations in meetings using audio and video - Microsoft
- An Efficient Training Approach for Very Large Scale Face Recognition (Paper) - Alibaba
# Reinforcement Learning
- Deep Reinforcement Learning for Sponsored Search Real-time Bidding (Paper) - Alibaba
- Dynamic Pricing on E-commerce Platform with Deep Reinforcement Learning (Paper) - Alibaba
- Budget Constrained Bidding by Model-free Reinforcement Learning in Display Advertising (Paper) - Alibaba
- Productionizing Deep Reinforcement Learning with Spark and MLflow - Zynga
- Deep Reinforcement Learning in Production Part 1, Part 2 - Zynga
- Building AI Trading Systems - Denny Britz
- Reinforcement Learning for On-Demand Logistics - DoorDash
- Reinforcement Learning to Rank in E-Commerce Search Engine (Paper) - Alibaba
# Anomaly Detection
- Detecting Performance Anomalies in External Firmware Deployments - Netflix
- Detecting and Preventing Abuse on LinkedIn using Isolation Forests (Code) - LinkedIn
- Preventing Abuse Using Unsupervised Learning - LinkedIn
- The Technology Behind Fighting Harassment on LinkedIn - LinkedIn
- Uncovering Insurance Fraud Conspiracy with Network Learning (Paper) - Ant Financial
- How Does Spam Protection Work on Stack Exchange? - Stack Exchange
- Auto Content Moderation in C2C e-Commerce - Mercari
- Blocking Slack Invite Spam With Machine Learning - Slack
- Cloudflare Bot Management: Machine Learning and More - Cloudflare
- Anomalies in Oil Temperature Variations in a Tunnel Boring Machine - SENER
- Using Anomaly Detection to Monitor Low-Risk Bank Customers - Rabobank
- Fighting fraud with Triplet Loss - OLX Group
- Facebook is Now Using AI to Sort Content for Quicker Moderation (Alternative) - Facebook
- How AI is getting better at detecting hate speech Part 1, Part 2, Part 3, Part 4 - Facebook
- Deep Anomaly Detection with Spark and Tensorflow (Hopsworks Video) - Swedbank, Hopsworks
# Graph
- Building The LinkedIn Knowledge Graph - LinkedIn
- Retail Graph — Walmart’s Product Knowledge Graph - Walmart
- Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations - Uber
- AliGraph: A Comprehensive Graph Neural Network Platform (Paper) - Alibaba
- Scaling Knowledge Access and Retrieval at Airbnb - Airbnb
- Contextualizing Airbnb by Building Knowledge Graph - Airbnb
- Traffic Prediction with Advanced Graph Neural Networks - DeepMind
- SimClusters: Community-Based Representations for Recommendations (Paper, Video) - Twitter
- Metapaths guided Neighbors aggregated Network for Heterogeneous Graph Reasoning (Paper) - Alibaba
- Graph Intention Network for Click-through Rate Prediction in Sponsored Search (Paper) - Alibaba
- JEL: Applying End-to-End Neural Entity Linking in JPMorgan Chase (Paper) - JPMorgan Chase
- Graph Convolutional Neural Networks for Web-Scale Recommender Systems (Paper) - Pinterest
# Optimization
- How Trip Inferences and Machine Learning Optimize Delivery Times on Uber Eats - Uber
- Next-Generation Optimization for Dasher Dispatch at DoorDash - DoorDash
- Matchmaking in Lyft Line (Part 1) (Part 2) (Part 3) - Lyft
- The Data and Science behind GrabShare Carpooling (PAPER NEEDED) - Grab
- Optimization of Passengers Waiting Time in Elevators Using Machine Learning - Thyssen Krupp AG
- Think Out of The Package: Recommending Package Types for E-commerce Shipments (Paper) - Amazon
- Optimizing DoorDash’s Marketing Spend with Machine Learning - DoorDash
# Information Extraction
- Unsupervised Extraction of Attributes and Their Values from Product Description (Paper) - Rakuten
- Information Extraction from Receipts with Graph Convolutional Networks - Nanonets
- Using Machine Learning to Index Text from Billions of Images - Dropbox
- Extracting Structured Data from Templatic Documents (Paper) - Google
- AutoKnow: self-driving knowledge collection for products of thousands of types (Paper, Video) - Amazon
- One-shot Text Labeling using Attention and Belief Propagation for Information Extraction (Paper) - Alibaba
# Weak Supervision
- Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale (Paper)
Google - Osprey: Weak Supervision of Imbalanced Extraction Problems without Code (Paper)
Intel - Overton: A Data System for Monitoring and Improving Machine-Learned Products (Paper)
Apple - Bootstrapping Conversational Agents with Weak Supervision (Paper)
IBM
# Generation
- Better Language Models and Their Implications (Paper)
OpenAI - Language Models are Few-Shot Learners (Paper) (GPT-3 Blog post)
OpenAI - Image GPT (Paper, Code)
OpenAI - Deep Learned Super Resolution for Feature Film Production (Paper)
Pixar - Unit Test Case Generation with Transformers
Microsoft
# Audio
- Improving On-Device Speech Recognition with VoiceFilter-Lite (Paper)
Google - The Machine Learning Behind Hum to Search
Google
# Validation and A/B Testing
- The Reusable Holdout: Preserving Validity in Adaptive Data Analysis (Paper)
Google - Twitter Experimentation: Technical Overview
Twitter - Experimenting to Solve Cramming
Twitter - Building an Intelligent Experimentation Platform with Uber Engineering
Uber - Analyzing Experiment Outcomes: Beyond Average Treatment Effects
Uber - Under the Hood of Uber’s Experimentation Platform
Uber - Announcing a New Framework for Designing Optimal Experiments with Pyro (Paper) (Paper)
Uber - Enabling 10x More Experiments with Traveloka Experiment Platform
Traveloka - Large Scale Experimentation at Stitch Fix (Paper)
Stitch Fix - Multi-Armed Bandits and the Stitch Fix Experimentation Platform
Stitch Fix - Experimentation with Resource Constraints
Stitch Fix - Modeling Conversion Rates and Saving Millions Using Kaplan-Meier and Gamma Distributions (Code)
Better - It’s All A/Bout Testing: The Netflix Experimentation Platform
Netflix - Computational Causal Inference at Netflix (Paper)
Netflix - Key Challenges with Quasi Experiments at Netflix
Netflix - Constrained Bayesian Optimization with Noisy Experiments (Paper)
Facebook - Detecting Interference: An A/B Test of A/B Tests
LinkedIn - Making the LinkedIn experimentation engine 20x faster
LinkedIn - Our Evolution Towards T-REX: The Prehistory of Experimentation Infrastructure at LinkedIn
LinkedIn - How to Use Quasi-experiments and Counterfactuals to Build Great Products
Shopify - Improving Experimental Power through Control Using Predictions as Covariate
DoorDash - Supporting Rapid Product Iteration with an Experimentation Analysis Platform
DoorDash - Improving Online Experiment Capacity by 4X with Parallelization and Increased Sensitivity
DoorDash - Leveraging Causal Modeling to Get More Value from Flat Experiment Results
DoorDash - Iterating Real-time Assignment Algorithms Through Experimentation
DoorDash - Running Experiments with Google Adwords for Campaign Optimization
DoorDash - Spotify’s New Experimentation Platform (Part 1) (Part 2)
Spotify - Overlapping Experiment Infrastructure: More, Better, Faster Experimentation (Paper)
Google - Experimentation Platform at Zalando: Part 1 - Evolution
Zalando - Scaling Airbnb’s Experimentation Platform
Airbnb - Designing Experimentation Guardrails
Airbnb - Reliable and Scalable Feature Toggles and A/B Testing SDK at Grab
Grab - Meet Wasabi, an Open Source A/B Testing Platform (Code)
Intuit - Building Pinterest’s A/B Testing Platform
Pinterest
# Model Management
- Runway - Model Lifecycle Management at Netflix
Netflix - Overton: A Data System for Monitoring and Improving Machine-Learned Products (Paper)
Apple - Managing ML Models @ Scale - Intuit’s ML Platform
Intuit - Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions
Comcast
# Efficiency
- GrokNet: Unified Computer Vision Model Trunk and Embeddings For Commerce (Paper)
Facebook - Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks (Paper)
Uber - How We Scaled Bert To Serve 1+ Billion Daily Requests on CPUs
Roblox
# Ethics
- Building Inclusive Products Through A/B Testing (Paper)
LinkedIn - LiFT: A Scalable Framework for Measuring Fairness in ML Applications (Paper)
LinkedIn
# Infra
- Reengineering Facebook AI’s Deep Learning Platforms for Interoperability
Facebook - Elastic Distributed Training with XGBoost on Ray
Uber
# MLOps Platforms
- Managing ML Models @ Scale - Intuit’s ML Platform
Intuit - Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions
Comcast - Big Data Machine Learning Platform at Pinterest
Pinterest - Real-time Machine Learning Inference Platform at Zomato
Zomato - Meet Michelangelo: Uber’s Machine Learning Platform
Uber - Building Flexible Ensemble ML Models with a Computational Graph
DoorDash - LyftLearn: ML Model Training Infrastructure built on Kubernetes
Lyft
# Practices
- Practical Recommendations for Gradient-Based Training of Deep Architectures (Paper)
Yoshua Bengio - Machine Learning: The High Interest Credit Card of Technical Debt (Paper) (Paper)
Google - Rules of Machine Learning: Best Practices for ML Engineering
Google - On Challenges in Machine Learning Model Management
Amazon - Machine Learning in Production: The Booking.com Approach
Booking - 150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com (Paper)
Booking - Successes and Challenges in Adopting Machine Learning at Scale at a Global Bank
Rabobank - Challenges in Deploying Machine Learning: a Survey of Case Studies (Paper)
Cambridge - Continuous Integration and Deployment for Machine Learning Online Serving and Models
Uber - Tuning Model Performance
Uber - Reengineering Facebook AI’s Deep Learning Platforms for Interoperability
Facebook - The problem with AI developer tools for enterprises
Databricks
# Team Structure
- Engineers Shouldn’t Write ETL: A Guide to Building a High Functioning Data Science Department
Stitch Fix - Beware the Data Science Pin Factory: The Power of the Full-Stack Data Science Generalist
Stitch Fix - Cultivating Algorithms: How We Grow Data Science at Stitch Fix
Stitch Fix - Analytics at Netflix: Who We Are and What We Do
Netflix
# Fails
- 160k+ High School Students Will Graduate Only If a Model Allows Them to
International Baccalaureate - When It Comes to Gorillas, Google Photos Remains Blind
Google - An Algorithm That ‘Predicts’ Criminality Based on a Face Sparks a Furor
Harrisburg University - It's Hard to Generate Neural Text From GPT-3 About Muslims
OpenAI - A British AI Tool to Predict Violent Crime Is Too Flawed to Use
United Kingdom - More in awful-ai
P.S., Want a summary of ML advancements? Get up to speed with survey papers 👉ml-surveys