Big Data Engineer (PySpark)

Capgemini

  • Singapore
  • Permanent
  • Full-time
  • 17 days ago
Roles and Responsibilities:

  • Develop Spark Scala and PySpark jobs for data transformation and aggregation.
  • Create unit tests for Spark transformations and helper methods (a test sketch appears at the end of this posting).
  • Use Spark and Spark SQL to read Parquet data and create tables in Hive via the Scala API (a sketch appears at the end of this posting).
  • Collaborate closely with the Business Analyst team to review test results and obtain sign-off.
  • Prepare design and operations documentation for future reference.
  • Conduct peer code-quality reviews and act as a gatekeeper for quality checks.
  • Engage in hands-on coding, often in a pair-programming environment.
  • Work within highly collaborative teams to build high-quality code.

Qualifications and Requirements:

  • 4-10 years of experience as a Hadoop data engineer.
  • Strong expertise in Hadoop, Spark, Scala, PySpark, Python, Hive, and Impala.
  • Familiarity with Oracle, Spark Streaming, Kafka, and machine learning (ML).
  • Proficiency in Agile methodologies, CI/CD, Git, Jenkins, and Cloudera Distribution.
  • Solid understanding of data structures, data manipulation, distributed processing, application development, and automation.
  • Experience in the Core Banking or Finance domain is desirable.
  • Cloud experience with AWS is a plus.

Ref: 1810769
Posted on: May 13, 2024
Experience level: Experienced Non-Manager
Contract Type: Permanent Full Time
Location: Singapore, 02, SG
Department: Big Data & Analytics
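As referenced in the responsibilities above, here is a minimal sketch of the Parquet-to-Hive pattern using Spark SQL and the Scala API. The input path, column names, and target table (/data/raw/transactions, analytics.daily_totals, and so on) are illustrative assumptions, not details from the role.

```scala
import org.apache.spark.sql.SparkSession

object ParquetToHive {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark SQL persist managed tables to the metastore.
    val spark = SparkSession.builder()
      .appName("parquet-to-hive")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical input path, for illustration only.
    val df = spark.read.parquet("/data/raw/transactions")

    // Register a temp view so the aggregation can be written in Spark SQL.
    df.createOrReplaceTempView("transactions")
    val daily = spark.sql(
      """SELECT account_id, to_date(event_ts) AS event_date, SUM(amount) AS total
        |FROM transactions
        |GROUP BY account_id, to_date(event_ts)""".stripMargin)

    // Persist the aggregate as a Hive table for downstream consumers.
    daily.write.mode("overwrite").saveAsTable("analytics.daily_totals")

    spark.stop()
  }
}
```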
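And a sketch of a unit test for a Spark transformation. ScalaTest with a local-mode SparkSession is an assumption here, since the posting names no test framework; the spec name, columns, and expected values are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.scalatest.funsuite.AnyFunSuite

class DailyTotalsSpec extends AnyFunSuite {
  // A local master runs the transformation in-process; no cluster needed.
  private lazy val spark = SparkSession.builder()
    .master("local[*]")
    .appName("daily-totals-test")
    .getOrCreate()

  test("sums amounts per account") {
    import spark.implicits._
    // Hypothetical input rows and schema, for illustration only.
    val input = Seq(("a1", 10.0), ("a1", 5.0), ("a2", 2.0))
      .toDF("account_id", "amount")

    val totals = input.groupBy("account_id").sum("amount")
      .collect()
      .map(row => row.getString(0) -> row.getDouble(1))
      .toMap

    assert(totals == Map("a1" -> 15.0, "a2" -> 2.0))
  }
}
```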
