Data Engineer

MKolinski

Job Overview
We are looking for a self-motivated Data Engineer to join a growing team of mission-oriented analysts and developers. The hire will be responsible for expanding and optimizing our data and data pipeline architecture, and for optimizing data flow from multiple operational domains across multiple networks. The Data Engineer will support software developers, database architects, and data analysts, and will ensure that a consistent, optimal data delivery architecture is maintained across ongoing projects. The candidate must be self-directed and comfortable supporting the data needs of multiple projects and diverse data products.

Responsibilities

  • Create and maintain optimal data pipeline architecture
  • Manage the data flow pipeline across three networks
  • Prepare data in XML format for transfer through a cross-domain solution (CDS) to other networks, in accordance with security requirements and policy
  • Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources via REST and SOAP/WSDL APIs
  • Create NiFi data flows that move data into Accumulo via Kafka (a sketch of this kind of ETL step follows this list)
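
To make the pipeline responsibilities above concrete, here is a minimal Python sketch of one such ETL step, assuming the `requests` and `kafka-python` libraries: pull JSON records from a REST source, wrap each record as XML suitable for CDS transfer, and publish it to a Kafka topic that a NiFi flow could route into Accumulo. The endpoint URL, topic name, and record fields are hypothetical placeholders, not part of any actual MKolinski system.

```python
# Minimal ETL sketch (illustrative only): the endpoint URL, record fields,
# and Kafka topic below are hypothetical placeholders.
import xml.etree.ElementTree as ET

import requests
from kafka import KafkaProducer  # kafka-python


def fetch_records(url: str) -> list:
    """Pull JSON records from a REST data source."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return resp.json()


def to_xml(record: dict) -> bytes:
    """Wrap one record as XML so a cross-domain solution can inspect it."""
    root = ET.Element("record")
    for key, value in record.items():
        ET.SubElement(root, str(key)).text = str(value)
    return ET.tostring(root, encoding="utf-8")


def main() -> None:
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    # Hypothetical source endpoint standing in for an operational data source.
    for record in fetch_records("https://example.internal/api/v1/events"):
        # Publish XML bytes to a topic a NiFi flow can route on to Accumulo.
        producer.send("ingest.events.xml", to_xml(record))
    producer.flush()


if __name__ == "__main__":
    main()
```

In a production flow, NiFi would typically own the routing, back-pressure, and provenance tracking; a script like this stands in only for the extraction and formatting step.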

Qualifications / Skills

Applicants must be U.S. citizens and hold a TS clearance.

  • Applicants should also have a demonstrated understanding of, and experience with, relevant software and tools, including big data tools such as NiFi, Kafka, Spark, and Hadoop, as well as relational SQL and NoSQL databases, including Postgres and Accumulo
  • Graph technology such as Neo4j is a bonus, but not required
  • Familiarity with the Hortonworks DataFlow (HDF) framework
  • Experience building and optimizing ‘big data’ data pipelines, architectures, and data sets.
  • Strong analytic skills related to working with unstructured datasets.
  • Ability to build processes supporting data transformation, data structures, and metadata management.
  • A successful history of manipulating, processing, and extracting value from large, disconnected datasets.
  • Working knowledge of message queuing, stream processing, and highly scalable ‘big data’ data stores.
  • Candidates should have experience using the following software and tools:
    • Experience with big data tools: Hadoop, Spark, Kafka, etc.
    • Experience with relational SQL and NoSQL databases, including Postgres and Accumulo
    • Experience with stream-processing systems: Storm, Spark Streaming, etc. (see the sketch after this list)
    • Experience with object-oriented and functional scripting languages: Python, Java, C++, Scala; familiarity with Jupyter notebooks and MATLAB is a plus
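
As a brief illustration of the stream-processing experience listed above, the following PySpark Structured Streaming sketch consumes the Kafka topic used in the earlier ETL example. The broker address and topic name remain hypothetical placeholders, and running it assumes the spark-sql-kafka connector package is available to Spark.

```python
# Minimal stream-processing sketch using Spark Structured Streaming (PySpark).
# The broker address and topic are hypothetical; a real job would parse the
# XML payloads and land results in a durable store such as Accumulo.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Subscribe to the topic the ETL sketch above publishes to.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "ingest.events.xml")
    .load()
)

# Kafka rows arrive as binary key/value pairs; cast the payload to a string
# before any downstream parsing or aggregation.
payloads = events.selectExpr("CAST(value AS STRING) AS xml_payload")

# Console sink for demonstration purposes only.
query = (
    payloads.writeStream
    .format("console")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```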

Education / Training / Experience

  • Bachelor’s Degree in Computer Science or a related, relevant field
  • 3-5 years of experience building and optimizing data pipelines, architectures and data sets
  • CompTIA Security+ certification (preferred)
  • Vendor-specific certifications are not required but are beneficial. The most relevant is the Cloudera Certified Professional (CCP) Data Engineer certification, which demonstrates proven experience with ETL tools and analytics.
