About

I'm Leonardo Benitez

With over 7 years of experience in data science and software development, I have worked on diverse projects across various industries (such as agriculture and digital marketing) and for companies of different sizes (including agile startups, research labs, and renowned enterprises like BMW and HDI).
I build my solutions mainly with Python and Azure services, adhering to state-of-the-art practices such as infrastructure-as-code and automated testing. Additionally, I have excellent knowledge of Databricks, Snowflake, and AWS. Even though I have done a lot of "classical Machine Learning" (including deep networks and all sorts of things that were cutting edge a few years ago), nowadays I mostly develop systems involving large foundation models and generative AI.

Download my resume

lsBenitezPereira@gmail.com

Portfolio

Check out a few of my works

Finance

Financial copilot

Our client wished to enhance their web-based product with an integrated custom copilot. Working closely with them for several months, we integrated our solution with their entire ecosystem, implementing a copilot with access to the client's Snowflake data warehouse, internal REST APIs, and other data sources. Our solution used autonomous agents built with Semantic Kernel and PromptFlow, incorporating PaaS offerings such as AzureML and Azure OpenAI. Through this work, we optimized the client's processes and enhanced the functionality of their products.

Open-source

Vision-unlearning library

I'm the main contributor to the open-source library Vision Unlearning, which provides a standard interface for unlearning algorithms, datasets, metrics, and evaluation methodologies commonly used in Machine Unlearning for vision-related tasks. Coordinating with several researchers, we designed and developed novel machine unlearning algorithms for generative models, with a strong focus on efficiency and scalability. Furthermore, we are developing a comprehensive benchmark to evaluate unlearning methods on generative models, analyzing their impact across various aspects such as concept entanglement, generalization, and fairness.

Academic publications:

Novel Machine Unlearning Method for Image Generation, Conference Tudományos Diákkör

IUN-INT: A Benchmark for Interference in Generative Model Unlearning, in review process

Presentation in the Machine Learning Week Europe

E-commerce

Corporate data lake

The project revolves around setting up a comprehensive data infrastructure for a large e-commerce company. This entails implementing Terraform code to establish a data ecosystem in Azure, following the best practices of the Microsoft Cloud Adoption Framework. Key tasks include developing data ingestion mechanisms via Azure Data Factory and Azure Functions, ensuring seamless integration of on-premises data into the data lake. Additionally, the project focuses on organizing the logical schema and serving layer using Azure Synapse. Although my involvement doesn't extend to building Power BI dashboards, my responsibilities encompass various aspects of DevOps, including pipeline implementation and Terraform development. Furthermore, I contribute to the design of Synapse tables and the development of data ingestion pipelines within Azure Data Factory.

IT Security

Cyber Security Center

We enhanced the productivity of our Security Operations Center service offering by building a set of automations to facilitate the analysis and resolution of security incidents, along with monthly reports generated automatically for each customer. Furthermore, we implemented strategies to ingest streaming data from on-premises firewalls into our time-series database and integrated with third-party data providers to gather open-source intelligence information. Our tech stack included Azure Functions, Cosmos DB, and OpenAI, among others.

Agriculture 4.0

Pest detection

The project focused on detecting and eliminating weeds in sugar beet plantation fields. I implemented Microsoft IoT Edge modules to read video camera data, perform inference using machine learning models, and upload the data to the cloud. I also wrote the Azure DevOps pipelines for continuous integration and delivery of the solution. Additionally, we trained and deployed a machine learning model using Azure Machine Learning to detect diseases in crops from pictures of the plants.

Insurance

Internal AI chat

The project delivered an internal chatbot for thousands of employees, tailored for them to interact with company documents. Essentially, it's a self-hosted version of ChatGPT, customized with enterprise controls and additional features such as integrated RAG with internal documents. The tech stack includes Cloud Foundry, GitLab pipelines, FastAPI, Azure Cognitive Search, and Azure OpenAI. Tasks involve implementing data ingestion and retrieval workflows to enhance the generative chat functionality.

Medical Imaging

Brain decoding

This ambitious decade-long research project used cutting-edge brain imaging techniques to understand how sensory information is represented in the brain. I participated in their ML-related research lines by architecting and training a deep Conditional Generative Adversarial Network (cGAN) to decode brain activity with minimal preprocessing. This work paved the way for new biomedical methods, such as developing a fully data-driven method for generating functional maps in sensory regions from high-resolution functional ultrasound imaging (fUSI).

Marketing

Mining internet texts

I worked on a project to enhance an existing NLP data processing pipeline, mainly training text classification models using large language models like BERT. I also implemented a high-performance web scraping system using AWS ECS. Additionally, I set up a test pipeline to compare the performance of AWS OpenSearch and AWS Athena, analyzing how we could refactor our pipeline to handle five times more input data than before.

IT-Support

Capivara

We considerably improved the resolution of support tickets by developing a semantic layer above our database of historical tickets, allowing us to retrieve past tickets that could be used to solve new problems. We also created a system to redirect incoming tickets to the best-suited employee based on the ticket subject and the employee's skills. Additionally, we enriched the ticket metadata with information about its language, named entities, and other relevant details.

Academic publications:

Machine Learning for Classification of IT Support Tickets, IEEE CyMaEn

Manufacturing

Corporate data lake

This project involved creating an Azure-based data lake to store data generated in a vehicle factory. We defined the architecture of the data lake and developed connectors to ingest and transform data from production systems into the cloud. Our system could parse a variety of data formats (PDF, XML, etc.) into a standard schema in our object-oriented database. Users in Quality Management could then search for quality-related data based on specified parameters. The project involved ingesting files generated by factory machines into Azure Blob Storage, processing them using Azure Functions, saving them to a Cosmos DB database, and providing a Streamlit web interface for querying and retrieving the parsed content.

Smart Animal Agriculture

Barn 4.0

This was an applied research project for the dairy industry, developed at Häme University of Applied Sciences, Finland. My role in the project was to apply computer vision techniques to recognize the behavior of the cows: whether they were aggressive, how they were interacting, whether they were stressed, and so on. I also helped to cross-reference this information with other data sources (weather, milk quality, etc.) to analyze what stresses them.

Academic publications:

Deep learning image recognition of cow behavior and an open data set acquired near an automatic milking robot, Journal Agricultural and Food Science

Integrando gerenciamento energético e bem-estar animal utilizando aprendizado de máquina e visão computacional, International Conference on Information Resources Management

Presentation in the HAMK Beyond seminar

Digital Advertising

Fellow

The project aimed to deliver contextual advertising for digital television. We developed an edge application to recommend ads based on user behavior and preferences, enabling programmatic advertising for the main television groups in Brazil. I worked with large-scale databases, handling tens of millions of requests per day, and improved the applications that consumed this data, enhancing the reliability and availability of our services.

Modern work

O365 analytics

The project involved understanding which software and licenses each user was using by ingesting data from Microsoft's O365 Unified Logs. Additionally, we summarized the licenses used by each user and provided this aggregated information through a REST API with OAuth authentication. The tech stack included Log Analytics, Azure Functions, and FastAPI.

Energy Management Systems

SmartIFSC

Our lab built an end-to-end Energy Management System for public buildings, based on the ISO 50001 standard, successfully saving about 10% of the energy usage of several Brazilian universities. My role there was to develop ML models for consumption forecasting and Knowledge Discovery from Databases, among other tasks. I also helped to develop the web interface, as well as the hardware and firmware used to collect the data.

Academic publications:

Modeling of Energy Management Systems using Artificial Intelligence, IEEE International Systems Conference

Energy management web portal prototype (PGEN) for public institutions, IEEE Chilecon

Development of a phasor measurement unit prototype applied to Brazilian and Chilean electrical systems, IEEE Chilecon

Mining Industry

Lóris Industrial Intelligence

This project was a stillborn startup: we developed IoT + AI solutions for the mining industry, monitoring grinder machines and aiming to optimize the process. We made great progress technically, but no real money... well, I was the CEO, so I learned a lot about project management, business, and marketing.

Academic publications:

Machine Learning Applied to Energy Efficiency of Large Consumers, IEEE Chilecon

Extração de conhecimento de uma base de dados real sobre mineração, SEPEI seminar

One of the winners of the IFSC Innovation Challenge