Best practices for developing data-integration pipelines

Isaac Sacolick, October 31, 2018


Data-integration pipeline platforms move data from a source system to a downstream destination system. Because data pipelines can deliver mission-critical data for important business decisions, ensuring their accuracy and performance is essential whether you implement them through scripts, data-integration and ETL (extract, transform, and load) platforms, data-prep technologies, or real-time data-streaming architectures.

When you implement data-integration pipelines, consider several best practices early in the design phase to ensure that the data processing is robust and maintainable. Whether or not you formalize it, there’s an inherent service level in these data pipelines because they can affect whether reports are generated on schedule and whether applications have the latest data for users. There is also an ongoing need for IT to make enhancements to support new data requirements, handle increasing data volumes, and address data-quality issues.

If you’ve worked in IT long enough, you’ve probably seen the good, the bad, and the ugly when it comes to data pipelines. Figuring out why a data-pipeline job failed when it was written as a single, several-hundred-line database stored procedure with no documentation, logging, or error handling is not an easy task. So, when engineering new data pipelines, consider some of these best practices to avoid such ugly results.

Apply modular design principles to data pipelines

As a data-pipeline developer, you should design the architecture of your pipelines so they can adapt to future needs and are easy to diagnose when issues arise. You can do this by modularizing the pipeline into building blocks, with each block handling one processing step and then passing its processed data to the next block. ETL platforms from vendors such as Informatica, Talend, and IBM provide visual programming paradigms that make it easy to develop building blocks into reusable modules that can then be applied to multiple data pipelines.

Moustafa Elshaabiny, a full-stack developer at CharityNavigator.org, has been using IBM DataStage to automate data pipelines. He says that “building our data pipeline in a modular way and parameterizing key environment variables has helped us both identify and fix issues that arise quickly and efficiently. Modularity makes narrowing down a problem much easier, and parametrization makes testing changes and rerunning ETL jobs much faster.”
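For teams that build pipelines in scripts instead of a visual ETL tool, the same two ideas (one block per processing step, and key settings read from the environment) might look something like the Python sketch below. It is only an illustration under stated assumptions: the block names, the record fields, and the PIPELINE_MIN_AMOUNT and PIPELINE_CURRENCY variables are hypothetical and are not taken from DataStage or from any pipeline described in this article.

```python
import logging
import os
from dataclasses import dataclass
from typing import Callable, Dict, List, Mapping

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

Record = Dict[str, object]


@dataclass(frozen=True)
class Config:
    """Parameters that differ between environments (all names are hypothetical)."""
    min_amount: float
    currency: str

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Config":
        # Reading settings from the environment lets a test or a rerun
        # change behavior without editing any block.
        return cls(
            min_amount=float(env.get("PIPELINE_MIN_AMOUNT", "0")),
            currency=env.get("PIPELINE_CURRENCY", "USD"),
        )


Step = Callable[[List[Record], Config], List[Record]]


def drop_incomplete(records: List[Record], cfg: Config) -> List[Record]:
    """Block 1: discard records that lack an amount."""
    return [r for r in records if r.get("amount") is not None]


def filter_small(records: List[Record], cfg: Config) -> List[Record]:
    """Block 2: keep records at or above the configured threshold."""
    return [r for r in records if float(r["amount"]) >= cfg.min_amount]


def tag_currency(records: List[Record], cfg: Config) -> List[Record]:
    """Block 3: annotate each record with the configured currency."""
    return [{**r, "currency": cfg.currency} for r in records]


def run_pipeline(records: List[Record], steps: List[Step], cfg: Config) -> List[Record]:
    """Run the blocks in order, logging each one so a failure points at a single step."""
    for step in steps:
        try:
            records = step(records, cfg)
        except Exception:
            log.exception("step %s failed", step.__name__)
            raise
        log.info("step %s produced %d records", step.__name__, len(records))
    return records


STEPS: List[Step] = [drop_incomplete, filter_small, tag_currency]

if __name__ == "__main__":
    sample = [{"id": 1, "amount": 25.0}, {"id": 2, "amount": None}, {"id": 3, "amount": 5.0}]
    print(run_pipeline(sample, STEPS, Config.from_env()))
```

Because each block takes records and a configuration and returns records, a block can be tested on its own or reused in another pipeline, and the per-step log lines narrow a failure down to one function. Passing a plain dictionary to Config.from_env is one way to try different settings without touching the real environment.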

