Lead, Data Engineer
CPP Investments
- Mumbai, Maharashtra
- Permanent
- Full-time
- Diverse and inspiring colleagues and approachable leaders
- Stimulating work in a fast-paced, intellectually challenging environment
- Accelerated exposure and responsibility
- Global career development opportunities
- Being motivated every day by CPP Investments' important social purpose and unshakable principles
- A deeply rooted culture of Integrity, Partnership and High Performance
- Design solutions aligned with long-term architecture and technology strategy using Amazon Web Services (AWS) for Cloud development
- Participate in the development life cycle from start to completion: requirements analysis, development, testing, and deployment
- Work with data analysts to select and acquire non-traditional datasets, and to access, clean, and pre-process data as required by use cases
- Select appropriate datasets and data representation methods
- Collaborate with data analysts to build the necessary data pipelines
- Build, train, validate and test models using criteria relevant to business objectives
- Work in a fast-paced environment collaborating with data analysts, data engineers and architects
- Ensure architecture will support the requirements of CPP Investments' business
- Prepare, transform, combine and manage structured and unstructured data for use by CPP Investments' business users
- Recommend ways to improve data reliability, efficiency and quality
- Define and shape CPP Investments' future technology and research process
- University degree in Engineering or Computer Science preferred
- Proven ability to work independently as well as to perform effectively in a team-oriented and open-concept environment
- Deep experience working with big data, including cleaning, transforming, cataloguing, and mapping
- Experience with cloud technology best practices for distributing and analyzing big data in the cloud (formatting/partitioning/etc.)
- Experience with ETL pipelines, managing multiple datasets, and providing the necessary support
- Familiarity with data lakes built on S3/Redshift
- Exposure to big data workflows and analytics tools (Spark/EMR/Databricks/Cassandra)
- Deep proficiency in Python with experience using Spark, Pandas or PySpark
- An understanding of CI/CD pipelines and experience with DevOps
- Experience building flexible solutions that can adapt quickly to changing requirements
- Ability to work in an entrepreneurial environment and be a self-starter
- Interest in the financial industry
- Proven attention to accuracy and detail, highly organized with the ability to prioritize and multi-task
- Exceptional written and verbal communication skills and interpersonal skills
- Ability to work in a high-performing culture with time-sensitive deadlines
- Personable; interacts easily with all personality types and at all levels, with a high degree of professionalism
- Exemplify our Guiding Principles of Integrity, Partnership, and High Performance