I am Hadi Abdine, an engineer at the MBZUAI France lab. I hold a Ph.D. in computer science, data and AI from Institut Polytechnique de Paris. I completed my doctoral studies at LIX, École Polytechnique, with the DaSciM team
under the supervision of Prof. Michalis Vazirgiannis.
My current research focuses on natural language processing, pretrained language models, and their applications.
Before joining LIX as a Ph.D. candidate, I graduated with a Master's degree in data science
from Institut Polytechnique de Paris, an engineering degree in data science from Telecom Paris
and an engineering degree in computer science and telecommunication from the Lebanese University, Faculty of Engineering 1.
Dedicated to the field of Natural Language Processing (NLP) and the advancements facilitated by large language models, I am deeply passionate about the intersection of technology and linguistics.
My research focuses on diverse NLP applications using transformer-based language models and LLMs. This involves semantic, political, legal, and bioinformatics applications
(e.g. generating free-text descriptions of protein function from 3D structures and amino acid sequences).
As an AI/NLP researcher, I am primarily responsible for training and fine-tuning large and small language models, with a particular focus on Arabic dialects. My work involves designing and implementing data collection pipelines, curating high-quality datasets for both pretraining and instruction fine-tuning, and developing robust evaluation frameworks to assess model performance across a range of NLP tasks such as summarization, translation, and sentiment analysis. I led the development of dialect-specific models, including Nile-Chat for Egyptian Arabic and Atlas-Chat for Moroccan Darija, optimizing them for dialogue and generative capabilities. My role also includes experimenting with and applying advanced techniques such as LoRA, Direct Preference Optimization (DPO), and multilingual alignment to enhance the performance and adaptability of these models.
Distributed word representations are widely used to achieve high performance in many natural language processing tasks. In this project, we crawled a large French corpus and used it to train static French word embeddings (Word2Vec). These word embeddings achieved the highest performance on natural language understanding tasks among all static French word embeddings. This work was published at CNIA 2022 [PDF]. All the resources and code are published here.
The main objective of this internship was to design and develop ECG monitoring software running on a Raspberry Pi 3.
The main objective of this internship was to develop a drawing library, a face detection tool, and the FaceVerter tool for the social app "Docomix" in Java.