SME Data Engineer
ManTech
- Ashburn, VA
- Permanent
- Full-time
- Developing large-volume data sets sourced from a multitude of relational (Oracle) tables to allow data analysts and scientists to construct training sets for machine learning models, recurring reports, and dashboards.
- Working closely with the government client and technical teams (DBAs, etc.) to assist in creating, managing, and optimizing scheduled Extract, Transform, and Load (ETL) jobs and workflows.
- Applying data analysis, problem solving, investigation, and creative thinking to manage very large datasets used in a variety of formats for varying analytical products.
- Assisting with the implementation of data migration pipelines from on-premises to cloud/non-relational storage platforms.
- Responding to data queries and analysis requests from various groups within the organization; creating and publishing regularly scheduled and/or ad hoc reports as needed.
- Researching and documenting data definitions and provenance for all subject areas and primary data sets supporting the core business applications.
- Responsible for data engineering source code control using GitLab.
- Experience with relational databases and knowledge of query tools, BI tools (such as Power BI or OBIEE), and data analysis tools.
- Experience with the Hadoop ecosystem, including HDFS, YARN, Hive, and Pig, and with batch-oriented and streaming distributed processing frameworks such as Spark, Kafka, or Storm.
- Strong experience automating ETL jobs via UNIX/Linux shell scripts and cron jobs.
- Demonstrate a strong practical understanding of data warehousing from a production relational database environment.
- Strong experience using analytic functions within Oracle or similar capabilities within non-relational database systems (MongoDB, Cassandra, etc.).
- Experience with the Atlassian suite of tools, such as Jira and Confluence.
- Knowledge of Continuous Integration/Continuous Delivery (CI/CD) tools.
- Must be able to multitask efficiently and work comfortably in an ever-changing data environment.
- Must work well in a team environment as well as independently.
- Excellent verbal/written communication and problem-solving skills; ability to communicate information to a variety of groups at different technical skill levels.
- 5+ years of experience developing, maintaining, and optimizing complex Oracle PL/SQL packages that aggregate transactional data for consumption by data science/machine learning applications.
- 10+ years of experience working in large (80+ TB), complex data warehousing environments. Must have full life cycle experience in design, development, deployment, and monitoring.
- Experience with one or more relational database systems such as Oracle, MySQL, Postgres, or SQL Server, with heavy emphasis on Oracle.
- Experience architecting data engineering pipelines and data lakes within cloud services (AWS, GCP, etc.).
- Experience with Amazon S3, Redshift, EMR and Scala.
- Experience migrating on-premises legacy database objects and data to the Amazon S3 cloud environment.
- Strong experience converting JSON documents to targets such as Parquet, Postgres, and Redshift.
- Experience or familiarity with data science/machine learning, including development experience with supervised and unsupervised learning on structured and unstructured datasets.
- HS Diploma/GED and 20+ years
- BS/BA and 12+ years
- MS/MA/MBA and 9+ years
- PhD/Doctorate and 7+ years

Certification:
- None
- Must be a U.S. citizen with the ability to obtain DHS CBP Suitability and a Top Secret clearance
- DHS Suitability required before start
- The person in this position needs to occasionally move about inside the office to access file cabinets, office machinery, or to communicate with co-workers, management, and customers, which may involve delivering presentations.