# applied-ml

Curated papers, articles, and blogs on data science & machine learning in production. ⚙️

Figuring out how to implement your ML project? Learn how other organizations did it:

  • How the problem is framed 🔎 (e.g., personalization as recsys vs. search vs. sequences)
  • What machine learning techniques worked ✅ (and sometimes, what didn't ❌)
  • Why it works, the science behind it with research, literature, and references 📂
  • What real-world results were achieved (so you can better assess ROI ⏰💰📈)

P.S. Want a summary of ML advancements? 👉 ml-surveys

Table of Contents

  1. Data Quality
  2. Data Engineering
  3. Data Discovery
  4. Feature Stores
  5. Classification
  6. Regression
  7. Forecasting
  8. Recommendation
  9. Search & Ranking
  10. Embeddings
  11. Natural Language Processing
  12. Sequence Modelling
  13. Computer Vision
  14. Reinforcement Learning
  15. Anomaly Detection
  16. Graph
  17. Optimization
  18. Information Extraction
  19. Weak Supervision
  20. Generation
  21. Audio
  22. Validation and A/B Testing
  23. Model Management
  24. Efficiency
  25. Ethics
  26. MLOps Platforms
  27. Practices
  28. Team Structure
  29. Fails

# Data Quality

  1. Monitoring Data Quality at Scale with Statistical Modeling `Uber`
  2. An Approach to Data Quality for Netflix Personalization Systems `Netflix`
  3. Automating Large-Scale Data Quality Verification (Paper) `Amazon`
  4. Meet Hodor — Gojek’s Upstream Data Quality Tool `Gojek`
  5. Reliable and Scalable Data Ingestion at Airbnb `Airbnb`
  6. Data Management Challenges in Production Machine Learning (Paper) `Google`
  7. Improving Accuracy By Certainty Estimation of Human Decisions, Labels, and Raters (Paper) `Facebook`

# Data Engineering

  1. Zipline: Airbnb’s Machine Learning Data Management Platform `Airbnb`
  2. Sputnik: Airbnb’s Apache Spark Framework for Data Engineering `Airbnb`
  3. Unbundling Data Science Workflows with Metaflow and AWS Step Functions `Netflix`
  4. How DoorDash is Scaling its Data Platform to Delight Customers and Meet Growing Demand `DoorDash`
  5. Revolutionizing Money Movements at Scale with Strong Data Consistency `Uber`
  6. Zipline - A Declarative Feature Engineering Framework `Airbnb`
  7. Real-time Data Infrastructure at Uber `Uber`

# Data Discovery

  1. Amundsen — Lyft’s Data Discovery & Metadata Engine `Lyft`
  2. Open Sourcing Amundsen: A Data Discovery And Metadata Platform (Code) `Lyft`
  3. Amundsen: One Year Later `Lyft`
  4. Using Amundsen to Support User Privacy via Metadata Collection at Square `Square`
  5. Discovery and Consumption of Analytics Data at Twitter `Twitter`
  6. Democratizing Data at Airbnb `Airbnb`
  7. Databook: Turning Big Data into Knowledge with Metadata at Uber `Uber`
  8. Turning Metadata Into Insights with Databook `Uber`
  9. Metacat: Making Big Data Discoverable and Meaningful at Netflix (Code) `Netflix`
  10. DataHub: A Generalized Metadata Search & Discovery Tool (Code) `LinkedIn`
  11. DataHub: Popular Metadata Architectures Explained `LinkedIn`
  12. How We Improved Data Discovery for Data Scientists at Spotify `Spotify`
  13. How We’re Solving Data Discovery Challenges at Shopify `Shopify`
  14. Nemo: Data discovery at Facebook `Facebook`
  15. Apache Atlas: Data Governance and Metadata Framework for Hadoop (Code) `Apache`
  16. Collect, Aggregate, and Visualize a Data Ecosystem's Metadata (Code) `WeWork`
  17. Exploring Data at Netflix (Code) `Netflix`

# Feature Stores

  1. Introducing Feast: An Open Source Feature Store for Machine Learning (Code) `Gojek`
  2. Feast: Bridging ML Models and Data `Gojek`
  3. Building a Scalable ML Feature Store with Redis, Binary Serialization, and Compression `DoorDash`
  4. Building Riviera: A Declarative Real-Time Feature Engineering Framework `DoorDash`
  5. Michelangelo Palette: A Feature Engineering Platform at Uber `Uber`
  6. Optimal Feature Discovery: Better, Leaner Machine Learning Models Through Information Theory `Uber`
  7. Distributed Time Travel for Feature Generation `Netflix`
  8. Fact Store at Scale for Netflix Recommendations `Netflix`
  9. The Architecture That Powers Twitter's Feature Store `Twitter`
  10. Building the Activity Graph, Part 2 (Feature Storage Section) `LinkedIn`
  11. Rapid Experimentation Through Standardization: Typed AI features for LinkedIn’s Feed `LinkedIn`
  12. Accelerating Machine Learning with the Feature Store Service `Condé Nast`
  13. Building a Feature Store `Monzo Bank`
  14. Zipline: Airbnb’s Machine Learning Data Management Platform `Airbnb`
  15. ML Feature Serving Infrastructure at Lyft `Lyft`
  16. Butterfree: A Spark-based Framework for Feature Store Building (Code) `QuintoAndar`

# Classification

  1. High-Precision Phrase-Based Document Classification on a Modern Scale (Paper) `LinkedIn`
  2. Chimera: Large-scale Classification using Machine Learning, Rules, and Crowdsourcing (Paper) `WalmartLabs`
  3. Large-scale Item Categorization for e-Commerce (Paper) `DianPing`, `eBay`
  4. Large-scale Item Categorization in e-Commerce Using Multiple Recurrent Neural Networks (Paper) `NAVER`
  5. Categorizing Products at Scale `Shopify`
  6. Learning to Diagnose with LSTM Recurrent Neural Networks (Paper) `Google`
  7. Discovering and Classifying In-app Message Intent at Airbnb `Airbnb`
  8. How We Built the Good First Issues Feature `GitHub`
  9. Teaching Machines to Triage Firefox Bugs `Mozilla`
  10. Testing Firefox More Efficiently with Machine Learning `Mozilla`
  11. Using ML to Subtype Patients Receiving Digital Mental Health Interventions (Paper) `Microsoft`
  12. Prediction of Advertiser Churn for Google AdWords (Paper) `Google`
  13. Scalable Data Classification for Security and Privacy (Paper) `Facebook`
  14. Uncovering Online Delivery Menu Best Practices with Machine Learning `DoorDash`
  15. Using a Human-in-the-Loop to Overcome the Cold Start Problem in Menu Item Tagging `DoorDash`

# Regression

  1. Using Machine Learning to Predict Value of Homes On Airbnb `Airbnb`
  2. Using Machine Learning to Predict the Value of Ad Requests `Twitter`
  3. Open-Sourcing Riskquant, a Library for Quantifying Risk (Code) `Netflix`
  4. Solving for Unobserved Data in a Regression Model Using a Simple Data Adjustment `DoorDash`

# Forecasting

  1. Forecasting at Uber: An Introduction `Uber`
  2. Engineering Extreme Event Forecasting at Uber with RNN `Uber`
  3. Transforming Financial Forecasting with Data Science and Machine Learning at Uber `Uber`
  4. Under the Hood of Gojek’s Automated Forecasting Tool `Gojek`
  5. BusTr: Predicting Bus Travel Times from Real-Time Traffic (Paper, Video) `Google`
  6. Retraining Machine Learning Models in the Wake of COVID-19 `DoorDash`
  7. Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflow (Paper, Code) `Atlassian`
  8. Greykite: A flexible, intuitive, and fast forecasting library `LinkedIn`

# Recommendation

  1. Amazon.com Recommendations: Item-to-Item Collaborative Filtering (Paper) `Amazon`
  2. Temporal-Contextual Recommendation in Real-Time (Paper) `Amazon`
  3. P-Companion: A Framework for Diversified Complementary Product Recommendation (Paper) `Amazon`
  4. Recommending Complementary Products in E-Commerce Push Notifications (Paper) `Alibaba`
  5. Deep Interest with Hierarchical Attention Network for Click-Through Rate Prediction (Paper) `Alibaba`
  6. Behavior Sequence Transformer for E-commerce Recommendation in Alibaba (Paper) `Alibaba`
  7. TPG-DNN: A Method for User Intent Prediction with Multi-task Learning (Paper) `Alibaba`
  8. PURS: Personalized Unexpected Recommender System for Improving User Satisfaction (Paper) `Alibaba`
  9. SDM: Sequential Deep Matching Model for Online Large-scale Recommender System (Paper) `Alibaba`
  10. Multi-Interest Network with Dynamic Routing for Recommendation at Tmall (Paper) `Alibaba`
  11. Controllable Multi-Interest Framework for Recommendation (Paper) `Alibaba`
  12. MiNet: Mixed Interest Network for Cross-Domain Click-Through Rate Prediction (Paper) `Alibaba`
  13. ATBRG: Adaptive Target-Behavior Relational Graph Network for Effective Recommendation (Paper) `Alibaba`
  14. Session-based Recommendations with Recurrent Neural Networks (Paper) `Telefonica`
  15. How 20th Century Fox uses ML to predict a movie audience (Paper) `20th Century Fox`
  16. Deep Neural Networks for YouTube Recommendations `YouTube`
  17. Personalized Recommendations for Experiences Using Deep Learning `TripAdvisor`
  18. E-commerce in Your Inbox: Product Recommendations at Scale `Yahoo`
  19. Product Recommendations at Scale (Paper) `Yahoo`
  20. Powered by AI: Instagram’s Explore recommender system `Facebook`
  21. Netflix Recommendations: Beyond the 5 stars (Part 1, Part 2) `Netflix`
  22. Learning a Personalized Homepage `Netflix`
  23. Artwork Personalization at Netflix `Netflix`
  24. To Be Continued: Helping you find shows to continue watching on Netflix `Netflix`
  25. Calibrated Recommendations (Paper) `Netflix`
  26. Marginal Posterior Sampling for Slate Bandits (Paper) `Netflix`
  27. Food Discovery with Uber Eats: Recommending for the Marketplace `Uber`
  28. Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations `Uber`
  29. How Music Recommendation Works — And Doesn’t Work `Spotify`
  30. Music recommendation at Spotify `Spotify`
  31. Recommending Music on Spotify with Deep Learning `Spotify`
  32. For Your Ears Only: Personalizing Spotify Home with Machine Learning `Spotify`
  33. Reach for the Top: How Spotify Built Shortcuts in Just Six Months `Spotify`
  34. Explore, Exploit, and Explain: Personalizing Explainable Recommendations with Bandits (Paper) `Spotify`
  35. Contextual and Sequential User Embeddings for Large-Scale Music Recommendation (Paper) `Spotify`
  36. The Evolution of Kit: Automating Marketing Using Machine Learning `Shopify`
  37. Using Machine Learning to Predict what File you Need Next (Part 1) `Dropbox`
  38. Using Machine Learning to Predict what File you Need Next (Part 2) `Dropbox`
  39. Personalized Recommendations in LinkedIn Learning `LinkedIn`
  40. A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 1) `LinkedIn`
  41. A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 2) `LinkedIn`
  42. Learning to be Relevant: Evolution of a Course Recommendation System (PAPER NEEDED) `LinkedIn`
  43. Building a Heterogeneous Social Network Recommendation System `LinkedIn`
  44. How TikTok recommends videos #ForYou `ByteDance`
  45. A Meta-Learning Perspective on Cold-Start Recommendations for Items (Paper) `Twitter`
  46. Lessons Learned Addressing Dataset Bias in Model-Based Candidate Generation (Paper) `Twitter`
  47. Zero-Shot Heterogeneous Transfer Learning from RecSys to Cold-Start Search Retrieval (Paper) `Google`
  48. Improved Deep & Cross Network for Feature Cross Learning in Web-scale LTR Systems (Paper) `Google`
  49. Self-supervised Learning for Large-scale Item Recommendations (Paper) `Google`
  50. Mixed Negative Sampling for Learning Two-tower Neural Networks in Recommendations (Paper) `Google`
  51. Personalized Channel Recommendations in Slack `Slack`
  52. Deep Retrieval: End-to-End Learnable Structure Model for Large-Scale Recommendations (Paper) `ByteDance`
  53. Future Data Helps Training: Modeling Future Contexts for Session-based Recommendation (Paper) `Tencent`
  54. Using AI to Help Health Experts Address the COVID-19 Pandemic `Facebook`
  55. A Case Study of Session-based Recommendations in the Home-improvement Domain (Paper) `Home Depot`
  56. Balancing Relevance and Discovery to Inspire Customers in the IKEA App (Paper) `IKEA`
  57. Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time (Paper) `Pinterest`
  58. How we use AutoML, Multi-task learning and Multi-tower models for Pinterest Ads `Pinterest`
  59. Multi-task Learning for Related Products Recommendations at Pinterest `Pinterest`
  60. Improving the Quality of Recommended Pins with Lightweight Ranking `Pinterest`
  61. Personalized Cuisine Filter Based on Customer Preference and Local Popularity `DoorDash`

# Search & Ranking

  1. Amazon Search: The Joy of Ranking Products (Paper, Video, Code) `Amazon`
  2. Why Do People Buy Seemingly Irrelevant Items in Voice Product Search? (Paper) `Amazon`
  3. Semantic Product Search (Paper) `Amazon`
  4. QUEEN: Neural query rewriting in e-commerce (Paper) `Amazon`
  5. How Lazada Ranks Products to Improve Customer Experience and Conversion `Lazada`
  6. Using Deep Learning at Scale in Twitter’s Timelines `Twitter`
  7. Machine Learning-Powered Search Ranking of Airbnb Experiences `Airbnb`
  8. Applying Deep Learning To Airbnb Search (Paper) `Airbnb`
  9. Managing Diversity in Airbnb Search (Paper) `Airbnb`
  10. Improving Deep Learning for Airbnb Search (Paper) `Airbnb`
  11. Ranking Relevance in Yahoo Search (Paper) `Yahoo`
  12. An Ensemble-based Approach to Click-Through Rate Prediction for Promoted Listings at Etsy (Paper) `Etsy`
  13. Learning to Rank Personalized Search Results in Professional Networks (Paper) `LinkedIn`
  14. Entity Personalized Talent Search Models with Tree Interaction Features (Paper) `LinkedIn`
  15. In-session Personalization for Talent Search (Paper) `LinkedIn`
  16. The AI Behind LinkedIn Recruiter Search and recommendation systems `LinkedIn`
  17. Learning Hiring Preferences: The AI Behind LinkedIn Jobs `LinkedIn`
  18. Quality Matches Via Personalized AI for Hirer and Seeker Preferences `LinkedIn`
  19. Understanding Dwell Time to Improve LinkedIn Feed Ranking `LinkedIn`
  20. Ads Allocation in Feed via Constrained Optimization (Paper, Video) `LinkedIn`
  21. Talent Search and Recommendation Systems at LinkedIn (Paper) `LinkedIn`
  22. AI at Scale in Bing `Microsoft`
  23. Query Understanding Engine in Traveloka Universal Search `Traveloka`
  24. The Secret Sauce Behind Search Personalisation `Gojek`
  25. Food Discovery with Uber Eats: Building a Query Understanding Engine `Uber`
  26. Neural Code Search: ML-based Code Search Using Natural Language Queries `Facebook`
  27. Bayesian Product Ranking at Wayfair `Wayfair`
  28. COLD: Towards the Next Generation of Pre-Ranking System (Paper) `Alibaba`
  29. Globally Optimized Mutual Influence Aware Ranking in E-Commerce Search (Paper) `Alibaba`
  30. Graph Intention Network for Click-through Rate Prediction in Sponsored Search (Paper) `Alibaba`
  31. Reinforcement Learning to Rank in E-Commerce Search Engine (Paper) `Alibaba`
  32. Aggregating Search Results from Heterogeneous Sources via Reinforcement Learning (Paper) `Alibaba`
  33. Cross-domain Attention Network with Wasserstein Regularizers for E-commerce Search `Alibaba`
  34. Understanding Searches Better Than Ever Before (Paper) `Google`
  35. Shop The Look: Building a Large Scale Visual Shopping System at Pinterest (Paper, Video) `Pinterest`
  36. Driving Shopping Upsells from Pinterest Search `Pinterest`
  37. GDMix: A Deep Ranking Personalization Framework (Code) `LinkedIn`
  38. Bringing Personalized Search to Etsy `Etsy`
  39. Building a Better Search Engine for Semantic Scholar `Allen Institute for AI`
  40. Query Understanding for Natural Language Enterprise Search (Paper) `Salesforce`
  41. How We Used Semantic Search to Make Our Search 10x Smarter `Tokopedia`
  42. Powering Search & Recommendations at DoorDash `DoorDash`
  43. Things Not Strings: Understanding Search Intent with Better Recall `DoorDash`
  44. Query Understanding for Surfacing Under-served Music Content (Paper) `Spotify`
  45. How We Built A Context-Specific Bidding System for Etsy Ads `Etsy`
  46. Query2vec: Search query expansion with query embeddings `GrubHub`
  47. Embedding-based Retrieval in Facebook Search (Paper) `Facebook`
  48. Towards Personalized and Semantic Retrieval for E-commerce Search via Embedding Learning (Paper) `JD`
  49. MOBIUS: Towards the Next Generation of Query-Ad Matching in Baidu’s Sponsored Search `Baidu`
  50. Pre-trained Language Model based Ranking in Baidu Search (Paper) `Baidu`

# Embeddings

  1. Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba (Paper) `Alibaba`
  2. Embeddings@Twitter `Twitter`
  3. Listing Embeddings in Search Ranking (Paper) `Airbnb`
  4. Understanding Latent Style `Stitch Fix`
  5. Towards Deep and Representation Learning for Talent Search at LinkedIn (Paper) `LinkedIn`
  6. Should we Embed? A Study on Performance of Embeddings for Real-Time Recommendations (Paper) `Moshbit`
  7. Vector Representation Of Items, Customer And Cart To Build A Recommendation System (Paper) `Sears`
  8. Machine Learning for a Better Developer Experience `Netflix`
  9. Announcing ScaNN: Efficient Vector Similarity Search (Paper, Code) `Google`
  10. Personalized Store Feed with Vector Embeddings `DoorDash`

# Natural Language Processing

  1. Abusive Language Detection in Online User Content (Paper) `Yahoo`
  2. How Natural Language Processing Helps LinkedIn Members Get Support Easily `LinkedIn`
  3. Building Smart Replies for Member Messages `LinkedIn`
  4. DeText: A deep NLP Framework for Intelligent Text Understanding (Code) `LinkedIn`
  5. Smart Reply: Automated Response Suggestion for Email (Paper) `Google`
  6. Gmail Smart Compose: Real-Time Assisted Writing (Paper) `Google`
  7. SmartReply for YouTube Creators `Google`
  8. Using Neural Networks to Find Answers in Tables (Paper) `Google`
  9. A Scalable Approach to Reducing Gender Bias in Google Translate `Google`
  10. Assistive AI Makes Replying Easier `Microsoft`
  11. AI Advances to Better Detect Hate Speech `Facebook`
  12. A State-of-the-Art Open Source Chatbot (Paper) `Facebook`
  13. A Highly Efficient, Real-Time Text-to-Speech System Deployed on CPUs `Facebook`
  14. Deep Learning to Translate Between Programming Languages (Paper, Code) `Facebook`
  15. Deploying Lifelong Open-Domain Dialogue Learning (Paper) `Facebook`
  16. Introducing Dynabench: Rethinking the way we benchmark AI `Facebook`
  17. Dynaboard: Moving Beyond Accuracy to Holistic Model Evaluation in NLP (Code) `Facebook`
  18. Goal-Oriented End-to-End Conversational Models with Profile Features in a Real-World Setting (Paper) `Amazon`
  19. How Gojek Uses NLP to Name Pickup Locations at Scale `Gojek`
  20. Give Me Jeans not Shoes: How BERT Helps Us Deliver What Clients Want `Stitch Fix`
  21. The State-of-the-art Open-Domain Chatbot in Chinese and English (Paper) `Baidu`
  22. PEGASUS: A State-of-the-Art Model for Abstractive Text Summarization (Paper, Code) `Google`
  23. Photon: A Robust Cross-Domain Text-to-SQL System (Paper) (Demo) `Salesforce`
  24. GeDi: A Powerful New Method for Controlling Language Models (Paper, Code) `Salesforce`
  25. Applying Topic Modeling to Improve Call Center Operations `RICOH`
  26. WIDeText: A Multimodal Deep Learning Framework `Airbnb`

# Sequence Modelling

  1. Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction (Paper) `Alibaba`
  2. Search-based User Interest Modeling with Sequential Behavior Data for CTR Prediction (Paper) `Alibaba`
  3. Deep Learning for Electronic Health Records (Paper) `Google`
  4. Deep Learning for Understanding Consumer Histories (Paper) `Zalando`
  5. Continual Prediction of Notification Attendance with Classical and Deep Networks (Paper) `Telefonica`
  6. Using Recurrent Neural Network Models for Early Detection of Heart Failure Onset (Paper) `Sutter Health`
  7. Doctor AI: Predicting Clinical Events via Recurrent Neural Networks (Paper) `Sutter Health`
  8. How Duolingo uses AI in every part of its app `Duolingo`
  9. Leveraging Online Social Interactions For Enhancing Integrity at Facebook (Paper, Video) `Facebook`

# Computer Vision

  1. Categorizing Listing Photos at Airbnb `Airbnb`
  2. Amenity Detection and Beyond — New Frontiers of Computer Vision at Airbnb `Airbnb`
  3. Powered by AI: Advancing product understanding and building new shopping experiences `Facebook`
  4. New AI Research to Help Predict COVID-19 Resource Needs From X-rays (Paper, Model) `Facebook`
  5. Creating a Modern OCR Pipeline Using Computer Vision and Deep Learning `Dropbox`
  6. How we Improved Computer Vision Metrics by More Than 5% Only by Cleaning Labelling Errors `Deepomatic`
  7. A Neural Weather Model for Eight-Hour Precipitation Forecasting (Paper) `Google`
  8. Machine Learning-based Damage Assessment for Disaster Relief (Paper) `Google`
  9. RepNet: Counting Repetitions in Videos (Paper) `Google`
  10. Converting Text to Images for Product Discovery (Paper) `Amazon`
  11. How Disney Uses PyTorch for Animated Character Recognition `Disney`
  12. Image Captioning as an Assistive Technology (Video) `IBM`
  13. AI for AG: Production machine learning for agriculture `Blue River`
  14. AI for Full-Self Driving at Tesla `Tesla`
  15. On-device Supermarket Product Recognition `Google`
  16. Using Machine Learning to Detect Deficient Coverage in Colonoscopy Screenings (Paper) `Google`
  17. Shop The Look: Building a Large Scale Visual Shopping System at Pinterest (Paper, Video) `Pinterest`
  18. Developing Real-Time, Automatic Sign Language Detection for Video Conferencing (Paper) `Google`
  19. Vision-based Price Suggestion for Online Second-hand Items (Paper) `Alibaba`
  20. Making machines recognize and transcribe conversations in meetings using audio and video `Microsoft`
  21. An Efficient Training Approach for Very Large Scale Face Recognition (Paper) `Alibaba`

# Reinforcement Learning

  1. Deep Reinforcement Learning for Sponsored Search Real-time Bidding (Paper) `Alibaba`
  2. Dynamic Pricing on E-commerce Platform with Deep Reinforcement Learning (Paper) `Alibaba`
  3. Budget Constrained Bidding by Model-free Reinforcement Learning in Display Advertising (Paper) `Alibaba`
  4. Productionizing Deep Reinforcement Learning with Spark and MLflow `Zynga`
  5. Deep Reinforcement Learning in Production (Part 1, Part 2) `Zynga`
  6. Building AI Trading Systems `Denny Britz`
  7. Reinforcement Learning for On-Demand Logistics `DoorDash`
  8. Reinforcement Learning to Rank in E-Commerce Search Engine (Paper) `Alibaba`

# Anomaly Detection

  1. Detecting Performance Anomalies in External Firmware Deployments `Netflix`
  2. Detecting and Preventing Abuse on LinkedIn using Isolation Forests (Code) `LinkedIn`
  3. Preventing Abuse Using Unsupervised Learning `LinkedIn`
  4. The Technology Behind Fighting Harassment on LinkedIn `LinkedIn`
  5. Uncovering Insurance Fraud Conspiracy with Network Learning (Paper) `Ant Financial`
  6. How Does Spam Protection Work on Stack Exchange? `Stack Exchange`
  7. Auto Content Moderation in C2C e-Commerce `Mercari`
  8. Blocking Slack Invite Spam With Machine Learning `Slack`
  9. Cloudflare Bot Management: Machine Learning and More `Cloudflare`
  10. Anomalies in Oil Temperature Variations in a Tunnel Boring Machine `SENER`
  11. Using Anomaly Detection to Monitor Low-Risk Bank Customers `Rabobank`
  12. Fighting fraud with Triplet Loss `OLX Group`
  13. Facebook is Now Using AI to Sort Content for Quicker Moderation (Alternative) `Facebook`
  14. How AI is getting better at detecting hate speech (Part 1, Part 2, Part 3, Part 4) `Facebook`
  15. Deep Anomaly Detection with Spark and Tensorflow (Hopsworks Video) `Swedbank`, `Hopsworks`

# Graph

  1. Building The LinkedIn Knowledge Graph (opens new window) LinkedIn
  2. Retail Graph — Walmart’s Product Knowledge Graph (opens new window) Walmart
  3. Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations (opens new window) Uber
  4. AliGraph: A Comprehensive Graph Neural Network Platform (opens new window) (Paper (opens new window)) Alibaba
  5. Scaling Knowledge Access and Retrieval at Airbnb (opens new window) Airbnb
  6. Contextualizing Airbnb by Building Knowledge Graph (opens new window) Airbnb
  7. Traffic Prediction with Advanced Graph Neural Networks (opens new window) DeepMind
  8. SimClusters: Community-Based Representations for Recommendations (opens new window) (Paper (opens new window), Video (opens new window)) Twitter
  9. Metapaths guided Neighbors aggregated Network for Heterogeneous Graph Reasoning (opens new window) (Paper (opens new window)) Alibaba
  10. Graph Intention Network for Click-through Rate Prediction in Sponsored Search (opens new window) (Paper (opens new window)) Alibaba
  11. JEL: Applying End-to-End Neural Entity Linking in JPMorgan Chase (opens new window) (Paper (opens new window)) JPMorgan Chase
  12. Graph Convolutional Neural Networks for Web-Scale Recommender Systems (opens new window) (Paper (opens new window)) Pinterest

# Optimization

  1. How Trip Inferences and Machine Learning Optimize Delivery Times on Uber Eats (opens new window) Uber
  2. Next-Generation Optimization for Dasher Dispatch at DoorDash (opens new window) DoorDash
  3. Matchmaking in Lyft Line (Part 1) (opens new window) (Part 2) (opens new window) (Part 3) (opens new window) Lyft
  4. The Data and Science behind GrabShare Carpooling (opens new window) (PAPER NEEDED) Grab
  5. Optimization of Passengers Waiting Time in Elevators Using Machine Learning (opens new window) thyssenkrupp AG
  6. Think Out of The Package: Recommending Package Types for E-commerce Shipments (opens new window) (Paper (opens new window)) Amazon
  7. Optimizing DoorDash’s Marketing Spend with Machine Learning (opens new window) DoorDash

# Information Extraction

  1. Unsupervised Extraction of Attributes and Their Values from Product Description (opens new window) (Paper (opens new window)) Rakuten
  2. Information Extraction from Receipts with Graph Convolutional Networks (opens new window) Nanonets
  3. Using Machine Learning to Index Text from Billions of Images (opens new window) Dropbox
  4. Extracting Structured Data from Templatic Documents (opens new window) (Paper (opens new window)) Google
  5. AutoKnow: self-driving knowledge collection for products of thousands of types (opens new window) (Paper (opens new window), Video (opens new window)) Amazon
  6. One-shot Text Labeling using Attention and Belief Propagation for Information Extraction (opens new window) (Paper (opens new window)) Alibaba

# Weak Supervision

  1. Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale (opens new window) (Paper (opens new window)) Google
  2. Osprey: Weak Supervision of Imbalanced Extraction Problems without Code (opens new window) (Paper (opens new window)) Intel
  3. Overton: A Data System for Monitoring and Improving Machine-Learned Products (opens new window) (Paper (opens new window)) Apple
  4. Bootstrapping Conversational Agents with Weak Supervision (opens new window) (Paper (opens new window)) IBM

# Generation

  1. Better Language Models and Their Implications (opens new window) (Paper (opens new window)) OpenAI
  2. Language Models are Few-Shot Learners (opens new window) (Paper (opens new window)) (GPT-3 Blog post (opens new window)) OpenAI
  3. Image GPT (opens new window) (Paper (opens new window), Code (opens new window)) OpenAI
  4. Deep Learned Super Resolution for Feature Film Production (opens new window) (Paper (opens new window)) Pixar
  5. Unit Test Case Generation with Transformers (opens new window) Microsoft

# Audio

  1. Improving On-Device Speech Recognition with VoiceFilter-Lite (opens new window) (Paper (opens new window)) Google
  2. The Machine Learning Behind Hum to Search (opens new window) Google

# Validation and A/B Testing

  1. The Reusable Holdout: Preserving Validity in Adaptive Data Analysis (opens new window) (Paper (opens new window)) Google
  2. Twitter Experimentation: Technical Overview (opens new window) Twitter
  3. Experimenting to Solve Cramming (opens new window) Twitter
  4. Building an Intelligent Experimentation Platform with Uber Engineering (opens new window) Uber
  5. Analyzing Experiment Outcomes: Beyond Average Treatment Effects (opens new window) Uber
  6. Under the Hood of Uber’s Experimentation Platform (opens new window) Uber
  7. Announcing a New Framework for Designing Optimal Experiments with Pyro (opens new window) (Paper (opens new window)) (Paper (opens new window)) Uber
  8. Enabling 10x More Experiments with Traveloka Experiment Platform (opens new window) Traveloka
  9. Large Scale Experimentation at Stitch Fix (opens new window) (Paper (opens new window)) Stitch Fix
  10. Multi-Armed Bandits and the Stitch Fix Experimentation Platform (opens new window) Stitch Fix
  11. Experimentation with Resource Constraints (opens new window) Stitch Fix
  12. Modeling Conversion Rates and Saving Millions Using Kaplan-Meier and Gamma Distributions (opens new window) (Code (opens new window)) Better
  13. It’s All A/Bout Testing: The Netflix Experimentation Platform (opens new window) Netflix
  14. Computational Causal Inference at Netflix (opens new window) (Paper (opens new window)) Netflix
  15. Key Challenges with Quasi Experiments at Netflix (opens new window) Netflix
  16. Constrained Bayesian Optimization with Noisy Experiments (opens new window) (Paper (opens new window)) Facebook
  17. Detecting Interference: An A/B Test of A/B Tests (opens new window) LinkedIn
  18. Making the LinkedIn experimentation engine 20x faster (opens new window) LinkedIn
  19. Our Evolution Towards T-REX: The Prehistory of Experimentation Infrastructure at LinkedIn (opens new window) LinkedIn
  20. How to Use Quasi-experiments and Counterfactuals to Build Great Products (opens new window) Shopify
  21. Improving Experimental Power through Control Using Predictions as Covariate (opens new window) DoorDash
  22. Supporting Rapid Product Iteration with an Experimentation Analysis Platform (opens new window) DoorDash
  23. Improving Online Experiment Capacity by 4X with Parallelization and Increased Sensitivity (opens new window) DoorDash
  24. Leveraging Causal Modeling to Get More Value from Flat Experiment Results (opens new window) DoorDash
  25. Iterating Real-time Assignment Algorithms Through Experimentation (opens new window) DoorDash
  26. Running Experiments with Google Adwords for Campaign Optimization (opens new window) DoorDash
  27. Spotify’s New Experimentation Platform (Part 1) (opens new window) (Part 2) (opens new window) Spotify
  28. Overlapping Experiment Infrastructure: More, Better, Faster Experimentation (opens new window) (Paper (opens new window)) Google
  29. Experimentation Platform at Zalando: Part 1 - Evolution (opens new window) Zalando
  30. Scaling Airbnb’s Experimentation Platform (opens new window) Airbnb
  31. Designing Experimentation Guardrails (opens new window) Airbnb
  32. Reliable and Scalable Feature Toggles and A/B Testing SDK at Grab (opens new window) Grab
  33. Meet Wasabi, an Open Source A/B Testing Platform (opens new window) (Code (opens new window)) Intuit
  34. Building Pinterest’s A/B Testing Platform (opens new window) Pinterest

# Model Management

  1. Runway - Model Lifecycle Management at Netflix (opens new window) Netflix
  2. Overton: A Data System for Monitoring and Improving Machine-Learned Products (opens new window) (Paper (opens new window)) Apple
  3. Managing ML Models @ Scale - Intuit’s ML Platform (opens new window) Intuit
  4. Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions (opens new window) Comcast

# Efficiency

  1. GrokNet: Unified Computer Vision Model Trunk and Embeddings For Commerce (opens new window) (Paper (opens new window)) Facebook
  2. Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks (opens new window) (Paper (opens new window)) Uber
  3. How We Scaled Bert To Serve 1+ Billion Daily Requests on CPUs (opens new window) Roblox

# Ethics

  1. Building Inclusive Products Through A/B Testing (opens new window) (Paper (opens new window)) LinkedIn
  2. LiFT: A Scalable Framework for Measuring Fairness in ML Applications (opens new window) (Paper (opens new window)) LinkedIn

# Infra

  1. Reengineering Facebook AI’s Deep Learning Platforms for Interoperability (opens new window) Facebook
  2. Elastic Distributed Training with XGBoost on Ray (opens new window) Uber

# MLOps Platforms

  1. Managing ML Models @ Scale - Intuit’s ML Platform (opens new window) Intuit
  2. Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions (opens new window) Comcast
  3. Big Data Machine Learning Platform at Pinterest (opens new window) Pinterest
  4. Real-time Machine Learning Inference Platform at Zomato (opens new window) Zomato
  5. Meet Michelangelo: Uber’s Machine Learning Platform (opens new window) Uber
  6. Building Flexible Ensemble ML Models with a Computational Graph (opens new window) DoorDash
  7. LyftLearn: ML Model Training Infrastructure built on Kubernetes (opens new window) Lyft

# Practices

  1. Practical Recommendations for Gradient-Based Training of Deep Architectures (opens new window) (Paper (opens new window)) Yoshua Bengio
  2. Machine Learning: The High Interest Credit Card of Technical Debt (opens new window) (Paper (opens new window)) (Paper (opens new window)) Google
  3. Rules of Machine Learning: Best Practices for ML Engineering (opens new window) Google
  4. On Challenges in Machine Learning Model Management (opens new window) Amazon
  5. Machine Learning in Production: The Booking.com Approach (opens new window) Booking
  6. 150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com (opens new window) (Paper (opens new window)) Booking
  7. Successes and Challenges in Adopting Machine Learning at Scale at a Global Bank (opens new window) Rabobank
  8. Challenges in Deploying Machine Learning: a Survey of Case Studies (opens new window) (Paper (opens new window)) Cambridge
  9. Continuous Integration and Deployment for Machine Learning Online Serving and Models (opens new window) Uber
  10. Tuning Model Performance (opens new window) Uber
  11. Reengineering Facebook AI’s Deep Learning Platforms for Interoperability (opens new window) Facebook
  12. The problem with AI developer tools for enterprises (opens new window) Databricks

# Team Structure

  1. Engineers Shouldn’t Write ETL: A Guide to Building a High Functioning Data Science Department (opens new window) Stitch Fix
  2. Beware the Data Science Pin Factory: The Power of the Full-Stack Data Science Generalist (opens new window) Stitch Fix
  3. Cultivating Algorithms: How We Grow Data Science at Stitch Fix (opens new window) Stitch Fix
  4. Analytics at Netflix: Who We Are and What We Do (opens new window) Netflix

# Fails

  1. 160k+ High School Students Will Graduate Only If a Model Allows Them to (opens new window) International Baccalaureate
  2. When It Comes to Gorillas, Google Photos Remains Blind (opens new window) Google
  3. An Algorithm That ‘Predicts’ Criminality Based on a Face Sparks a Furor (opens new window) Harrisburg University
  4. It's Hard to Generate Neural Text From GPT-3 About Muslims (opens new window) OpenAI
  5. A British AI Tool to Predict Violent Crime Is Too Flawed to Use (opens new window) United Kingdom
  6. More in awful-ai (opens new window)

P.S., Want a summary of ML advancements? Get up to speed with survey papers 👉ml-surveys (opens new window)