Tomás Gonçalves

Welcome to my online resume!

I'm Tomás, a data scientist and consultant based in Copenhagen. This site gives you a better sense of who I am and what I do beyond the CV.

Let's chat!

This chatbot is trained on my background. Ask me about my work, studies, or projects. Instead of reading through everything, you can just ask!

Demo: KYC Compliance Review

I built a KYC compliance review agent as the primary proof of concept. KYC review is the kind of document-heavy, judgment-intensive workflow that financial services firms run manually today.

A compliance officer uploads a customer document. The agent reviews it for completeness, consistency, and risk flags, then returns a structured verdict (Approve / Escalate / Refer). No client data ever reaches the model.
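A minimal sketch of what a structured verdict could look like. The class names, the field names, and the rule that maps findings to Approve / Escalate / Refer are all hypothetical and only illustrate the idea; they are not the demo's actual logic.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    APPROVE = "Approve"
    ESCALATE = "Escalate"
    REFER = "Refer"


@dataclass
class ReviewResult:
    """Findings from one document review (hypothetical structure)."""
    missing_fields: list[str] = field(default_factory=list)
    inconsistencies: list[str] = field(default_factory=list)
    risk_flags: list[str] = field(default_factory=list)

    @property
    def verdict(self) -> Verdict:
        # Assumed rule, for illustration only:
        # any risk flag escalates, gaps or contradictions are referred,
        # and a file with no findings is approved.
        if self.risk_flags:
            return Verdict.ESCALATE
        if self.missing_fields or self.inconsistencies:
            return Verdict.REFER
        return Verdict.APPROVE


clean = ReviewResult()
incomplete = ReviewResult(missing_fields=["proof_of_address"])
print(clean.verdict.value, incomplete.verdict.value)
```

Keeping the findings and the verdict in one object means the reviewer sees why a file was not approved, not just the outcome.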

This demo runs against two examples:

Example 1 - Clean file
Example 2 - Incomplete

Disclaimer: This is a proof of concept built independently. It is not in production and has no organizational affiliation.

Passion Projects

SurrogateAI

Problem: The explosion of AI agents and LLMs creates a real problem for European organisations: the most powerful models run on US cloud infrastructure, and feeding them real client data creates immediate GDPR exposure. For financial services firms handling KYC documents, client records, and transaction data, this is not a theoretical risk. It is a hard blocker on AI adoption.

Solution: I built a pipeline that automatically detects and replaces sensitive information (names, financial identifiers, health data, government IDs) with realistic-looking surrogates before anything reaches the model. The LLM reasons over fake-but-coherent data, and the response is reconstructed back to the original values before the user ever sees it. The redaction layer is completely invisible. The pipeline is task-agnostic and can be applied to any workflow that requires an LLM to reason over sensitive data.

Architecture: Redaction and Reversible Anonymization
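The redact-then-restore round trip can be sketched as follows. This is an illustration under stated assumptions, not the SurrogateAI implementation: the two regex patterns, the function names, and the use of simple placeholder tokens (instead of realistic-looking surrogates) are all simplifications chosen to keep the example short.

```python
import re

# Hypothetical detectors; a real pipeline would cover many more entity types.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{10,30}\b"),
}


def redact(text: str) -> tuple[str, dict[str, str]]:
    """Replace sensitive values with tokens; return the text and a token -> original map."""
    mapping: dict[str, str] = {}   # token -> original value
    seen: dict[str, str] = {}      # original value -> token, so repeats stay consistent

    def make_sub(label: str):
        def _sub(match: re.Match) -> str:
            original = match.group(0)
            if original not in seen:
                token = f"<{label}_{len(seen) + 1}>"
                seen[original] = token
                mapping[token] = original
            return seen[original]
        return _sub

    for label, pattern in PATTERNS.items():
        text = pattern.sub(make_sub(label), text)
    return text, mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Swap every token in the text back to its original value."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text


prompt = "Contact ana@example.com about account DK5000400440116243."
safe_prompt, mapping = redact(prompt)
print(safe_prompt)                    # only this version would be sent to the model
print(restore(safe_prompt, mapping))  # original values are put back afterwards
```

In the full flow, `restore` would be applied to the model's response rather than to the prompt, so the user only ever sees the original values.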

Lisbon Real Estate Market Tool

Problem: Lisbon's real estate market has been growing at one of the fastest rates in Europe: prices are up year on year, inventory is tight, and the best deals are absorbed before they are widely seen. I am Portuguese, from Lisbon, and at some point I want to buy a flat back home. Doing that from Copenhagen, with no local network and no agent working in my corner, means I am always a step behind. Most property portals show you what is already in front of everyone else. I wanted to see more.

Solution: I built a full-stack intelligence platform that scrapes 35+ Portuguese property portals daily, deduplicates and normalises the data into a structured database, and surfaces it through an interactive dashboard. The system tracks every listing across sources, detects price changes over time, identifies properties priced below their neighbourhood median, and flags private sellers listing without agents.

Architecture: Automated Scraping and Market Intelligence Pipeline

Demo:

2,200+ active listings aggregated daily across 35+ sources
Interactive map with price-coded clustering vs neighbourhood median
Per-property valuation with good deal, fair market, and overpriced ranges
Price history tracking per listing
Comparable properties and neighbourhood-level market analysis
Best deals table: properties 10%+ below the local median
Saved search alerts for new listings and price drops
Automatic detection of private sellers listing without agents
LLM-powered assistant to explore listings and find opportunities in natural language

Disclaimer: This is a proof of concept built independently. It is not in production and has no organizational affiliation.
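The below-median check behind the Lisbon tool's best deals table (properties 10%+ below the local median) can be sketched roughly like this. The field names, the choice of price per square metre as the comparison basis, and the sample numbers are assumptions made for illustration; they are not taken from the actual platform.

```python
from collections import defaultdict
from statistics import median


def best_deals(listings, threshold=0.10):
    """Return listings priced at least `threshold` below their neighbourhood median."""
    # Group prices by neighbourhood and compute each local median.
    by_area = defaultdict(list)
    for item in listings:
        by_area[item["neighbourhood"]].append(item["price_per_m2"])
    medians = {area: median(prices) for area, prices in by_area.items()}

    deals = []
    for item in listings:
        discount = 1 - item["price_per_m2"] / medians[item["neighbourhood"]]
        if discount >= threshold:
            deals.append({**item, "discount": round(discount, 3)})
    # Largest discount first.
    return sorted(deals, key=lambda d: d["discount"], reverse=True)


# Made-up sample data, for illustration only.
sample = [
    {"id": "a1", "neighbourhood": "Alfama", "price_per_m2": 5000},
    {"id": "a2", "neighbourhood": "Alfama", "price_per_m2": 6000},
    {"id": "a3", "neighbourhood": "Alfama", "price_per_m2": 4000},
    {"id": "b1", "neighbourhood": "Arroios", "price_per_m2": 4000},
    {"id": "b2", "neighbourhood": "Arroios", "price_per_m2": 4100},
]
for deal in best_deals(sample):
    print(deal["id"], deal["discount"])
```

Using the median rather than the mean keeps a few luxury listings from distorting what counts as a normal price in a neighbourhood.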

Academic Projects

Master Thesis: Detecting Refugee-Related Misinformation on Social Media

Context: Misinformation about refugees doesn't stay online. It shapes political narratives, reinforces prejudices, and in cases like the Southport riots and the Springfield pets controversy, spills into real-world violence. Despite the stakes, no peer-reviewed research specifically targeting refugee-related misinformation detection existed.

Project: The thesis was carried out in collaboration with UNHCR. The research question was: how can fine-tuned transformer-based models detect misinformation about refugees on social media? UNHCR already had a hate speech monitoring tool in place and was exploring whether misinformation detection could be added as a complementary function. This thesis was a first step toward making that feasible.

Data: No ready-made dataset existed for this problem, so we built one. We collected and combined 16 misinformation datasets spanning X posts, Reddit, news articles, and fact-checked claims. On top of that, we manually reviewed and labeled 250 refugee-related X posts sourced through Meltwater, creating a validation dataset that didn't exist before.

Methodology: Two modeling phases. First, benchmarking transformer-based models including ModernBERT, Llama 3.1, Gemma 2, and DeepSeek across four dataset variations. Second, taking the best-performing model and using it as a teacher in a knowledge distillation process, training a smaller ModernBERT-large student model to replicate its behavior at a fraction of the size.
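For readers unfamiliar with knowledge distillation, here is a minimal sketch of one common formulation: the student is trained on a mix of the teacher's softened output distribution and the true label. The function names, the temperature, and the weighting are illustrative assumptions, and the thesis may have used a different distillation setup.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term.

    soft: KL(teacher || student) on temperature-softened distributions,
          scaled by temperature**2 as is conventional.
    hard: cross-entropy of the student against the true label.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    hard = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * soft + (1 - alpha) * hard


# A student that agrees with the teacher is penalised less than one that disagrees.
print(distillation_loss([2.0, 0.0], [2.0, 0.0], label=0))
print(distillation_loss([0.0, 2.0], [2.0, 0.0], label=0))
```

The soft term lets the student learn from how confident the teacher is across classes, which carries more signal than the hard label alone.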

Results: The distilled model reached 0.772 accuracy and 0.832 F1-score, matching the teacher at 1/20th of its size. Strong on emotionally charged language, weaker on sarcasm and negation.

Curious about the full research? Read the thesis here.

Check out my GitHub Repository here

Other Academic Projects

Melanoma Detection Analysis

I classified skin lesions as benign, malignant, or undetected using SVM and CNN models, comparing their accuracy and efficiency for automating melanoma diagnosis from image data.

Check out my GitHub Repository here

Movie Genre Classification Models

Can a model read a plot summary and guess the genre? Using BERT, RNN, and Naive Bayes on IMDb data, I tested how well NLP can replace manual genre tagging.

Check out my GitHub Repository here

Portugal GDP Forecasting

COVID-19 hit Portugal's GDP hard. I used ARIMA and ETS models on Federal Reserve Economic Data (1995 to 2023) to measure the impact and forecast recovery trends.

Check out my GitHub Repository here
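As a small illustration of the forecasting idea, the sketch below implements simple exponential smoothing, the most basic member of the ETS family. It is not the model from the project, and the series is made-up index data rather than actual GDP figures; `alpha` is an assumed smoothing weight.

```python
def simple_exponential_smoothing(series, alpha=0.5):
    """Return the one-step-ahead forecast after smoothing the whole series.

    Each new observation pulls the running level toward itself by a
    fraction `alpha`; the final level is the forecast for the next period.
    """
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


# Illustrative index values with a sudden drop and a partial recovery.
toy_series = [100, 102, 90, 96]
print(simple_exponential_smoothing(toy_series))
```

A higher `alpha` reacts faster to a shock such as the 2020 drop, while a lower one smooths it out; full ETS and ARIMA models add trend, seasonality, and autocorrelation terms on top of this idea.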

Experience

AI & Data Consultant
[EY]

Remediation project for a large Danish bank: Part of an external remediation engagement addressing incorrect penalty interest on housing loans.

Translated legal and regulatory requirements into data-driven calculation logic.
Analyzed and transformed complex loan and payment data using Python and SQL to enable accurate remediation calculations.
Collaborated with legal, business and technical SMEs and supported quality assurance deliverables.

Member of the proof-of-concept delivery team: Building data and AI-enabled prototypes to showcase capabilities and grow the client project portfolio.

Copenhagen, DK

2025 - Present

Data Analyst (SA)
[Everllence]

Contributed to the Mail Sorting Automation project, utilizing Machine Learning (ML) and Natural Language Processing (NLP) techniques. Developed and trained the WHAT model to classify emails into categories, enhancing operational efficiency and accuracy.

Designed and deployed a Power BI dashboard for Robotic Process Automation (RPA) initiatives, resulting in an automation value of €2.1 million and saving 26,000 manual hours. This tool facilitated clear and quantifiable insights into the financial and operational benefits of automation for stakeholders.

Architected and maintained SQL databases, implementing best practices for data integrity, security, and accessibility.

Copenhagen, DK

2023 - 2025

Technology Consultant
[KPMG]

Provided expertise and guidance on IFRS-17 and Solvency II regulations, enhancing compliance processes and increasing report accuracy by an average of 23% across several major financial institutions.

Collaborated with clients to optimize their data management systems, achieving a 12% reduction in data handling time and minimizing processing errors, thereby boosting operational efficiency and improving data reliability.

Lisbon, PT

2022 - 2023

Language Skills

Portuguese: Native Language

English: Level C1 Certified

Spanish: Level A2 Certified

Danish: Level A1 Certified

Skills

Certification

Databricks Academy - Databricks Fundamentals Accreditation

Technical Skills

Programming Languages: Python, R, SQL

Data Platforms: Databricks

AI & LLM Tools: AI Agents, Prompt Engineering, LLM-assisted Development

Data Analysis: Pandas, NumPy, Excel

Machine Learning: Scikit-learn, TensorFlow, Keras

Data Visualization: Power BI, Tableau, Matplotlib

Tomás Gonçalves

Phone Number: +45 55258648

tomasggoncalves0@hotmail.com