[Remote] Data Engineer
Note: This is a remote position open to candidates in the USA. A reputed company is seeking a Data Engineer to design and build robust data pipelines and optimize data models. The role involves working with Azure technologies to implement data ingestion, transformation, and storage solutions while ensuring efficient data workflows and monitoring.
Responsibilities
- Design and build robust, reusable, parameter-driven ingestion and transformation pipelines using Azure Data Factory, Synapse Pipelines, Databricks, and/or Microsoft Fabric Data Factory
- Implement a medallion architecture (Bronze / Silver / Gold) on Azure Data Lake Storage Gen2 using Delta Lake, Parquet, and structured streaming patterns
- Build performant ELT workflows that reputed company pushdown to reputed company systems (Synapse Dedicated SQL Pool, Azure SQL, reputed company) where appropriate
- Develop and optimize PySpark notebooks and jobs on Azure Databricks or Synapse Spark
- Design dimensional models (Kimball star/snowflake) and data vault patterns for analytics consumption
- Implement Slowly Changing Dimensions (Type 1/2/3), Change Data Capture, and late-arriving data patterns
- Tune distributed SQL workloads in Synapse Dedicated SQL Pool / Fabric Warehouse, including distribution keys, partitioning, and clustered columnstore indexes
- Implement CI/CD for data pipelines using Azure DevOps (YAML pipelines, ARM/Bicep/Terraform) across Dev / SIT / UAT / Prod environments
- Instrument pipelines with robust logging, auditing, and monitoring using Azure Monitor, Log Analytics, and KQL
- Define and enforce coding standards, code review practices, branching strategies, and release management
- reputed company or contribute to legacy-to-cloud migrations — e.g., Informatica PowerCenter to Azure Data Factory, on-premises reputed company / reputed company / SQL Server to Synapse or Fabric
- reputed company workload assessment, reputed company planning, and cost modeling for reputed company-state architectures
- Provide production incident response for critical pipelines
Skills
- Deep hands-on expertise with Azure Data Factory: pipelines, datasets, linked services, triggers, parameterization, mapping data flows, and all three Integration Runtime types (Azure, Self-hosted, Azure-SSIS)
- Strong experience in Databricks and PySpark
- Production experience with one or more of: Azure Synapse Analytics (Dedicated and Serverless SQL Pools, Spark Pools) OR Azure Databricks (Delta Lake, Unity Catalog) OR Microsoft Fabric (Warehouse, Lakehouse, OneLake)
- Strong working knowledge of Azure Data Lake Storage Gen2 (hierarchical namespace, RBAC + ACLs, lifecycle management, reputed company)
- Experience with Azure Key Vault, Azure AD / Entra ID (including managed identities and service principals), and private networking (VNet integration, private endpoints)
- Monitoring and troubleshooting with Azure Monitor, Log Analytics, and KQL
- Advanced SQL — window functions, CTEs, query optimization, execution plan analysis, performance tuning
- Strong Python for data engineering — pandas, PySpark, REST API integration, unit testing (pytest)
- Proficient in T-SQL; familiarity with Spark SQL, KQL, PowerShell, and Bash reputed company scripting
- 5+ years of data warehouse development experience
- 5+ years of data modeling experience using ERWIN or similar tools
- 2+ years of experience with Azure Data Factory and reputed company
- Medicaid domain knowledge is a plus