10 Data Problems Every Pipeline Hits (and the One-Liner Fixes)

Source: DEV Community
Every data engineer writes throwaway scripts to fix the same problems. Phone numbers in 15 formats. Dates that aren't dates. "N/A" and "null" and "" all meaning the same thing. The scripts are always slightly different, never reusable, and break when the data changes.

Here are the 10 problems we see in every dataset, what they actually look like, and how to fix each one.

1. Phone numbers in 15 formats

Your CRM has phone numbers entered by humans over a decade. No two look the same.

Before               After
(555) 123-4567       +15551234567
555.123.4567         +15551234567
+1-555-123-4567      +15551234567
5551234567           +15551234567
1 (555) 123-4567     +15551234567

    goldenflow transform contacts.csv

Zero-config mode detects phone columns and normalizes to E.164 automatically. Every downstream system (Twilio, Salesforce, your matching pipeline) expects E.164. Do it once at the source.

2. Mojibake from bad CSV exports

Someone exported from Excel on a Windows machine, someone else opened it on a Mac, and now your text looks