Overview

We are looking to connect with Big Data Engineers who will work on collecting, storing, processing, and analyzing large data sets. The primary focus will be on choosing optimal solutions for these purposes, then implementing, maintaining, and monitoring them. Responsibilities also include integrating these solutions with the architecture used across the organisation.

Responsibilities include:

  1. Selecting and integrating any Big Data tools and frameworks required to provide requested capabilities
  2. Implementing ETL processes
  3. Monitoring performance and advising on any necessary infrastructure changes
  4. Defining data retention policies

Required skills and experience:

  1. Proficient understanding of distributed computing principles
  2. Management of a Hadoop cluster, with all included services
  3. Ability to resolve any ongoing issues with operating the cluster
  4. Proficiency with Hadoop v2, MapReduce, and HDFS
  5. Experience building stream-processing systems using solutions such as Storm or Spark Streaming (see the sketch after this list)
  6. Good knowledge of Big Data querying tools, such as Pig, Hive, and Impala
  7. Experience with Spark
  8. Experience with integration of data from multiple data sources
  9. Experience with NoSQL databases, such as HBase, Cassandra, or MongoDB
  10. Knowledge of various ETL techniques and frameworks, such as Flume
  11. Experience with various messaging systems, such as Kafka or RabbitMQ
  12. Experience with Big Data ML toolkits, such as Mahout, SparkML, or H2O
  13. Good understanding of Lambda Architecture, along with its advantages and drawbacks
  14. Experience with Hadoop distributions such as Cloudera, MapR, or Hortonworks
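
As a rough illustration of the stream-processing work referenced in item 5 above, the following is a minimal sketch of a Spark Structured Streaming job that reads events from Kafka and maintains windowed counts. The application name, broker address, topic name, and aggregation are hypothetical placeholders, and the snippet assumes the spark-sql-kafka connector is available on the cluster.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, count, window

    # Hypothetical job name; adjust to the actual cluster and deployment.
    spark = (
        SparkSession.builder
        .appName("clickstream-counts-sketch")
        .getOrCreate()
    )

    # Read a stream of raw events from Kafka (broker and topic are placeholders;
    # requires the spark-sql-kafka connector on the classpath).
    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker-1:9092")
        .option("subscribe", "clickstream")
        .load()
    )

    # Kafka delivers keys and values as bytes; cast the value to a string for parsing.
    parsed = events.selectExpr("CAST(value AS STRING) AS raw_event", "timestamp")

    # Example transformation: count events in 5-minute windows.
    counts = (
        parsed
        .groupBy(window(col("timestamp"), "5 minutes"))
        .agg(count("*").alias("events"))
    )

    # Write the running aggregates out; the console sink is used here purely for illustration.
    query = (
        counts.writeStream
        .outputMode("complete")
        .format("console")
        .trigger(processingTime="1 minute")
        .start()
    )

    query.awaitTermination()

In practice a job like this would write to HDFS, HBase, or another durable sink rather than the console, and would typically run alongside batch jobs as the speed layer of a Lambda Architecture of the kind mentioned above.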