available
Rafael d.
Data Engineer

United Kingdom-London

13+ years experience

805€/day

MY EXPERIENCE

Santander Bank

Mar 2017

Big Data Developer

PROJECTS:

SMART SEARCH
Tasks:
* Currently working on this project, developed entirely in Spark with Scala following a TDD approach with the ScalaTest (FlatSpec) framework.
* Loading data from Hive tables with billions of records.
* Storing information in Parquet files to be loaded on a different Hadoop cluster.

INSTANT PRICING
Tasks:
* Development of a system to gather the data needed to calculate insurance premium quotes for every client of the bank, using Sparktacus (Santander's official framework based on Spark and Scala).
Achievements:
* Managing tens of millions of records successfully.

OMRS NRT DASHBOARD
Achievements and tasks:
* Near Real Time (NRT) ingestion of XML data generated by a retail banking application and presented on a dashboard based on MicroStrategy technology, using the following technologies in a Cloudera distribution:
* NRT Ingestion with Flume using Kafka as a channel and as a sink in a multi-hop Flume architecture.
* Development of a Flume custom interceptor for parsing XML files in Java.
* Development in Spark and SparkSQL Scala APIs of a scheduled job for compacting the small files stored in HDFS by Flume to protect NameNode memory.
* Enabling data visualization for MicroStrategy Impala connector using Hive and Impala.
* Unit testing of Java and Scala code following a TDD approach with the ScalaTest (FlatSpec) framework.
* Saving the license costs of the previous systems while providing better reliability and performance.

Using Agile/Scrum methodologies and Git for version control in every project.

Innovery Spain

Feb 2016 - Mar 2017

Big Data Developer

PROJECTS:

ILMS. INNOVERY LOG MANAGEMENT SYSTEM
Achievements and tasks:
* Ingestion, archiving, processing and querying of logs from cybersecurity systems, using the following technologies on the Cloudera CDH 5.7.0 distribution:
* Log archiving and log integrity check processes with Spark and SparkSQL using RDDs and DataFrames.
* Ingestion with Flume into HBase using a CEF log format interceptor and a custom HBase serializer.
* Indexing of HBase stored data with Cloudera Search.
* Data visualization with HUE Search / custom application.
* Job scheduling and integration with the Oozie REST API.

GOLDEN CONTROL PANEL. DEVELOPMENT OF THE FIRST BIG DATA NEAR REAL TIME ARCHITECTURE FOR SANTANDER BANK IN CHILE
Achievements and tasks:
* Ingestion of execution data from Data Warehouse and other critical systems with Flume into Cloudera Search (near real time) and Hive (batch) in a lambda architecture.
* Batch calculation of statistics from the ingested data using Hive queries scheduled in Oozie.
* Storing data in Cloudera Search (Solr) to show the statistics and the execution data in near real time by means of a dashboard built in Spring Boot.
* Whole project deployed on Cloudera CDH 5.5.5 distribution.

DEVELOPMENT OF A PoC FOR THE TOP TIER ITALIAN BANKING FIRM POSTE ITALIANE
Achievements and tasks:
* Replacement of the bank's current financial compliance system with one based on big data. The aim is to comply with FATCA rules and to have a single source of information and processes for reporting.
* The technical proposal was based on Talend for Big Data.
* Use of Sqoop, HBase, HDFS and MapReduce technologies embedded in Talend.

IVDF. DEVELOPMENT OF A SYSTEM FOR INGESTING AND UPDATING SOFTWARE VULNERABILITIES, STORING THEM IN MONGODB

Using Agile/Scrum methodologies, Git for version control, and AWS and Docker Hadoop clusters for development in every project.

ServiWeb

Nov 2012 - Jan 2016

IT Team Leader


AIRBUS

Mar 2008 - Oct 2012

Database Administration Team Leader

Achievements and tasks:
* DBA for the object-oriented database DOORS.

Indra Sistemas

Apr 2007 - Apr 2008

Systems Engineer

Achievements and tasks:
* Became an expert in the object-oriented database IBM Rational DOORS.

IBM

Jan 2006 - Dec 2006

Trainee

Achievements and tasks:
* Winner of the "Personal Achievement Recognition Programme" of IBM.

Lloyds Banking Group

Mar 2019 - Dec 2006

Data Engineer

Working on the development of the IFRS 17 regulatory project, using Spark Streaming, Kafka, HBase and Apache Phoenix.

Isban UK

Mar 2017 - Mar 2018

Big Data Developer Contractor

Working as a freelance contractor on Big Data projects.

PROJECT OMRS NRT DASHBOARD
Achievements and tasks:
- Near Real Time (NRT) ingestion of XML data generated by a retail banking application and presented on a dashboard based on MicroStrategy technology, using the following technologies in a Cloudera distribution:
▪ NRT ingestion with Flume using Kafka as a channel and as a sink in a multi-hop Flume architecture.
▪ Development of a Flume custom interceptor for parsing XML files in Java.
▪ Development in Spark and SparkSQL Scala APIs of a scheduled job for compacting the small files stored in HDFS by Flume to protect NameNode memory.
▪ Enabling data visualization for MicroStrategy Impala connector using Hive and Impala.
▪ Unit testing of Java and Scala code following TDD and BDD approaches with the ScalaTest (FlatSpec) framework.

- Saving the license costs of the previous systems, with better reliability and performance.
- Agile/Scrum methodologies, Git for version control.

Licenses & Certifications

Machine Learning With Big Data - Coursera - Apr 2016

MY TESTS

Spark and Scala quiz (medium level): 15/20
