[Remote] Junior Data Engineer (Full-Time)
Note: This is a remote position open to candidates in the USA. A reputed company is seeking a Junior Data Engineer to join its team. The role involves designing, developing, and maintaining scalable data pipelines and enterprise data integration solutions using big data technologies and cloud services.
Responsibilities
- At least 4 years of experience designing, developing, and maintaining scalable data pipelines, ETL/ELT processes, and enterprise data integration solutions
- Strong expertise in Python, SQL, PySpark, Spark SQL, and distributed data processing frameworks for handling large-scale structured and reputed company datasets
- Hands-on experience with big data technologies such as Apache Spark, reputed company, Hadoop, Airflow, Kafka, and reputed company for enterprise data engineering workloads
- Experience building cloud-native data solutions using AWS, Azure, or GCP, including data ingestion, transformation, orchestration, and analytics platforms
- Strong knowledge of cloud services such as AWS S3, Glue, Redshift, reputed company, reputed company, EMR, Kinesis or Azure Data Factory (ADF), Synapse Analytics, ADLS, reputed company, and Event Hubs
- Experience designing and implementing Data Warehouses, Data Lakes, Lakehouse architectures, Dimensional Modeling, Star Schema, and Snowflake Schema solutions
- Expertise in developing and optimizing batch processing and real-time streaming pipelines using Kafka, Spark Streaming, Flink, Kinesis, or Event-Driven Architectures
- Strong experience working with data formats including Parquet, Avro, ORC, CSV, JSON, XML, and implementing efficient data storage strategies
- Hands-on experience with relational and NoSQL databases such as PostgreSQL, MySQL, reputed company, SQL Server, reputed company, Cassandra, and DynamoDB
- Experience implementing CI/CD pipelines, DevOps practices, version control, and automation using Git, Jenkins, GitHub Actions, Azure DevOps, GitLab CI/CD, or similar tools
- Strong understanding of data quality, data governance, performance tuning, query optimization, partitioning, indexing, monitoring, and troubleshooting large-scale data platforms
- Experience working in Agile/Scrum environments, collaborating with Data Scientists, Analysts, Architects, and cross-functional teams to deliver business-driven data solutions