Projects






  1. LLMs are huge in terms of memory footprint (a few GB to hundreds of GB), yet edge devices do not have hundreds of GB of memory available. Moreover, they mostly use shared-memory systems (shared DRAM), where CPUs, GPUs, and other accelerators all access and update data through the same memory. LLMs are becoming common in many applications, but when multiple applications use the shared memory at the same time, they slow each other down. We demonstrated this behavior for deep neural networks (DNNs) in an earlier study using the GPU and DLA (deep learning accelerator) on NVIDIA Jetson edge devices. To eliminate such memory limitations and unexpected slowdowns, we develop efficient memory management mechanisms for LLMs on edge devices, as sketched below.
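
    The sketch below is a minimal illustration of one such mechanism, assuming the weights are stored as per-block .npy shards (the file layout and shard names are hypothetical): memory-mapping the shards lets the OS page weights in on demand instead of holding the whole model resident in the shared DRAM.

```python
# Minimal sketch: keep LLM weight shards on disk and memory-map them so the OS
# pages them in on demand, instead of loading everything into the shared DRAM.
# The shard layout (one .npy file per block) is a hypothetical example.
import numpy as np

# Create one dummy weight shard so the example is self-contained.
np.save("block_0.npy", np.random.randn(4096, 4096).astype(np.float16))

class MmapWeightStore:
    """Lazily memory-maps weight shards; pages are fetched only when touched."""

    def __init__(self, shard_paths):
        # mmap_mode="r" avoids copying the arrays into resident memory up front.
        self.shards = {name: np.load(path, mmap_mode="r")
                       for name, path in shard_paths.items()}

    def get(self, name):
        return self.shards[name]

store = MmapWeightStore({"block_0": "block_0.npy"})
w0 = store.get("block_0")        # no bulk copy yet
rows = np.asarray(w0[:8])        # only the touched pages enter DRAM
print(rows.shape, w0.dtype)
```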







  2. TakeNote is a product that helps meeting secretaries work faster. It includes several modules that leverage Large Language Models (LLMs) and state-of-the-art research in audio processing. TakeNote can transcribe meetings to text, identify speakers in a dialogue, summarize the contents, and perform semantic search over the content, as sketched below.
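
    A minimal sketch of two of these stages, assuming openai-whisper and sentence-transformers are used (the model names, audio file, and query are placeholders, not the product's actual configuration); speaker identification and summarization are omitted:

```python
# Minimal sketch of the transcription and semantic-search stages; "meeting.wav"
# and the model names are placeholders, not the product's real configuration.
import whisper
from sentence_transformers import SentenceTransformer, util

# 1) Transcribe a meeting recording to text segments.
asr = whisper.load_model("base")
segments = asr.transcribe("meeting.wav")["segments"]
texts = [s["text"] for s in segments]

# 2) Semantic search over the transcript segments.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
corpus_emb = embedder.encode(texts, convert_to_tensor=True)
query_emb = embedder.encode("decisions about the release date", convert_to_tensor=True)
hits = util.semantic_search(query_emb, corpus_emb, top_k=3)[0]
for hit in hits:
    print(texts[hit["corpus_id"]], hit["score"])
```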







  3. This project aims to help art reviewers speed up their reviewing process. The in-house framework checks submissions against more than 20 quality standards of the Image Stock platform. It covers up to 30k images per day and cuts operating costs by up to 50%; in several cases it can replace the human decision entirely (a rough sketch of the aggregation idea follows below).

    Achievements: An AI solution that supports reviewers in evaluating photographs

    Challenges: More than 20 standards, including tiny objects in photos, an enormous number of domains, etc.

    Organization: Pixta Vietnam
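
    A rough sketch of the aggregation idea referenced above, with hypothetical checker names and thresholds standing in for the real per-standard models:

```python
# Minimal sketch: run every review standard, then aggregate into a decision.
# The checker names, thresholds, and escalation rule are illustrative only.
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class StandardCheck:
    name: str
    score_fn: Callable[[bytes], float]  # model wrapper returning a [0, 1] score
    threshold: float

def review(image: bytes, checks: List[StandardCheck]) -> Tuple[str, List[str]]:
    failed = [c.name for c in checks if c.score_fn(image) < c.threshold]
    if not failed:
        return "accept", []
    # Borderline cases are escalated to a human reviewer instead of auto-rejecting.
    return ("reject" if len(failed) > 3 else "needs_human"), failed

# Example with dummy scorers standing in for the real models.
checks = [StandardCheck("sharpness", lambda img: 0.9, 0.5),
          StandardCheck("no_logo", lambda img: 0.2, 0.5)]
print(review(b"...", checks))
```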






  4. This is a product of ClientScan, where I work as an AI advisor and a system engineer. It is a full-package Face Recognition (FR) solution, including cloud-based registration, an extensive database, and an on-edge face recognition module. We collaborated with advisors from Oxford University to develop a light, fast FR model that works with a massive number of identities (>10k) at an accuracy of >99.4% while handling up to 30 FPS; the matching step is sketched below.
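
    A minimal sketch of the matching step mentioned above, assuming the FR model outputs L2-normalized embeddings; the gallery size, embedding dimension, and threshold are illustrative, not the product's actual settings:

```python
# Minimal sketch: identify a query face by cosine similarity against a gallery
# of registered identities. Sizes and the threshold are illustrative only.
import numpy as np

def identify(query_emb, gallery, ids, threshold=0.4):
    """Return the best-matching identity, or None if no gallery face is close enough."""
    sims = gallery @ query_emb          # cosine similarity (embeddings are normalized)
    best = int(np.argmax(sims))
    return (ids[best], float(sims[best])) if sims[best] >= threshold else (None, float(sims[best]))

# Hypothetical gallery of 10k registered identities with 512-d embeddings.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(10_000, 512)).astype(np.float32)
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)
query = gallery[42] + 0.01 * rng.normal(size=512).astype(np.float32)
query /= np.linalg.norm(query)
print(identify(query, gallery, list(range(10_000))))
```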





  5. This PIXTA project, started in November 2022, builds on the current advances of Stable Diffusion (SD) to create stock-quality photos for commercial use. We want to add more content to the SD model by finetuning it, and we also want to steer SD toward responsible AI (a rough sketch of the serving setup follows below).

    Achievements: An all-in-one app including txt2img, img2img, inpaint, outpaint, upscale, and image variation

    Challenges: Hosting several models in one app while reducing heavy computation

    Organization: PIXTA Vietnam
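
    A rough sketch of the serving setup referenced above, assuming the diffusers library; the checkpoint name and prompts are placeholders, and the finetuned PIXTA model and the remaining modes (inpaint, outpaint, upscale) are not shown:

```python
# Minimal sketch: serve two SD modes from one set of weights with diffusers.
# The checkpoint name and prompts are placeholders, not the production setup.
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

txt2img = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

# Reuse the same components so both modes share one copy of the weights,
# one way to keep memory down when hosting several modes in a single app.
img2img = StableDiffusionImg2ImgPipeline(**txt2img.components)

photo = txt2img("studio photo of a ceramic coffee cup, softbox lighting").images[0]
variant = img2img(prompt="same cup on a marble table", image=photo, strength=0.6).images[0]
variant.save("variant.png")
```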






  6. Learning-to-Rank re-ranks the results of PIXTA Inc.'s search system to boost the company's revenue while providing the most relevant results to end users. We build a machine learning model from billions of user interaction records. The project follows MLOps practices to automatically collect data, train, and evaluate models (a rough sketch of the ranking model follows below).

    Achievements: Defined a business-driven feature set, applied an active learning model, and increased monthly revenue by up to 6% (30,000 USD)

    Challenges: Complicated pricing structure, huge data volume, and highly biased user behavior

    Organization: PIXTA Vietnam
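
    A rough sketch of the ranking model referenced above, using LightGBM's LambdaRank objective on synthetic data; the features, group sizes, and relevance labels are illustrative, not the production feature set:

```python
# Minimal sketch of a learning-to-rank re-ranker with LightGBM (LambdaRank).
# Features, group sizes, and labels are synthetic placeholders.
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))            # e.g. price, CTR, recency, text match...
y = rng.integers(0, 4, size=1000)         # graded relevance from user interactions
groups = [50] * 20                        # 20 search queries, 50 candidates each

ranker = lgb.LGBMRanker(objective="lambdarank", n_estimators=200)
ranker.fit(X, y, group=groups)

# Re-rank the candidates of one query by predicted relevance.
scores = ranker.predict(X[:50])
reranked = np.argsort(-scores)
print(reranked[:10])
```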






  7. Tag Suggestion is a project in which my team and I enhance the user experience on the Pixta Image Stock platform by providing an AI tool that tags photos as users upload them. Beyond this scope, we use the tool to annotate any kind of photo in our database for training. Today, the model can annotate up to 25,000 classes in a single run (a rough sketch of the tagging head follows below).

    Achievements:

              In 2020, I built the baseline with ResNeSt

              Applied Meshed-Memory, a mixed CNN & Transformer architecture

              In 2021, I proposed removing the CNN in favor of a Vision Transformer

    Challenges: The number of classes is huge (25k and growing), and the model requires big data to train (more than 5 million samples)

    Organization: PIXTA Vietnam
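
    A rough sketch of the tagging head referenced above, assuming a ViT backbone with one sigmoid output per tag (PyTorch/torchvision); the 25k class count matches the description, everything else is illustrative:

```python
# Minimal sketch: multi-label tag suggestion with a ViT backbone.
# Only the 25k class count comes from the project; the rest is illustrative.
import torch
import torch.nn as nn
from torchvision.models import vit_b_16

NUM_TAGS = 25_000

backbone = vit_b_16(weights=None)
backbone.heads = nn.Linear(backbone.hidden_dim, NUM_TAGS)  # replace classifier head

criterion = nn.BCEWithLogitsLoss()        # multi-label: one sigmoid per tag

images = torch.randn(4, 3, 224, 224)      # dummy batch
targets = torch.zeros(4, NUM_TAGS)
targets[:, :3] = 1.0                      # a few positive tags per image

logits = backbone(images)
loss = criterion(logits, targets)
loss.backward()

# At inference, tags above a per-class threshold are suggested.
suggested = (torch.sigmoid(logits[0]) > 0.5).nonzero().flatten()
print(suggested)
```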






  8. The model is based on a large Face Recognition backbone and trained in a multi-task learning manner so that it recognizes both the age and the emotion of the models in a photo. We also apply context-aware recognition, extracting both facial and contextual features, to enhance the model's accuracy (a rough sketch of the multi-task heads follows below).

    Achievements: With the new solution, our model improves the quality of age and emotion labels

    Challenges: No big public dataset is available, and our own data is noisy

    Organization: PIXTA Vietnam
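
    A rough sketch of the multi-task heads referenced above, assuming the FR backbone yields a 512-d face embedding and a separate encoder yields a 512-d context embedding; the dimensions, emotion classes, and loss weighting are illustrative:

```python
# Minimal sketch: multi-task age regression + emotion classification on top of
# fused face and context embeddings. Dimensions and losses are illustrative.
import torch
import torch.nn as nn

class AgeEmotionHead(nn.Module):
    def __init__(self, face_dim=512, ctx_dim=512, num_emotions=7):
        super().__init__()
        fused = face_dim + ctx_dim
        self.age = nn.Sequential(nn.Linear(fused, 256), nn.ReLU(), nn.Linear(256, 1))
        self.emotion = nn.Sequential(nn.Linear(fused, 256), nn.ReLU(), nn.Linear(256, num_emotions))

    def forward(self, face_emb, ctx_emb):
        z = torch.cat([face_emb, ctx_emb], dim=1)   # context-aware fusion
        return self.age(z).squeeze(1), self.emotion(z)

head = AgeEmotionHead()
face_emb, ctx_emb = torch.randn(8, 512), torch.randn(8, 512)
age_pred, emo_logits = head(face_emb, ctx_emb)

# Multi-task loss: regression for age, classification for emotion.
age_target, emo_target = torch.rand(8) * 80, torch.randint(0, 7, (8,))
loss = nn.functional.l1_loss(age_pred, age_target) + nn.functional.cross_entropy(emo_logits, emo_target)
loss.backward()
```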






  9. We have a big facial dataset without any annotations. To speed up the identity assignment process, we use face recognition to automatically assign IDs, clustering hundreds of thousands of unlabelled faces in the database (a rough sketch of the clustering step follows below).

    Achievements: Hundreds of thousands of unlabelled faces were assigned identities

    Challenges: Too many classes to handle; faces vary in emotion and attributes; non-human faces appear in the data; etc.

    Organization: PIXTA Vietnam
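
    A rough sketch of the clustering step referenced above, assuming faces are already embedded by the FR model; the embedding size and DBSCAN parameters are illustrative and would need tuning on the real data:

```python
# Minimal sketch: assign identities by clustering face embeddings with DBSCAN.
# The synthetic embeddings, eps, and min_samples are illustrative only.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
centers = rng.normal(size=(50, 512))                      # 50 synthetic identities
embeddings = np.repeat(centers, 20, axis=0) + 0.05 * rng.normal(size=(1000, 512))
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

# Cosine DBSCAN: faces closer than eps are merged into one identity;
# label -1 marks faces left unassigned (noise, non-human faces, etc.).
clusterer = DBSCAN(eps=0.2, min_samples=3, metric="cosine")
identities = clusterer.fit_predict(embeddings)

n_ids = len(set(identities)) - (1 if -1 in identities else 0)
print(f"assigned {n_ids} identities, {np.sum(identities == -1)} faces left unlabelled")
```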






  10. In 2018, I joined Pi School in Rome, Italy, to develop a proof-of-concept Customer Care Chatbot, the first voice bot in the energy field in Italy. My sponsor was Sorgenia, an innovative, young, green energy company.

    Achievements: The first voice bot in the energy field in Italy

    Challenges: A real voice chatbot requires a lot of data to develop

    Organization: Pi School






  11. In 2017, I started working on a project at Panasonic R&D Center Vietnam to build the core conversational system. The project aimed to deliver the core NLP solution for Panasonic systems in appliances, autonomous and smart devices, HR, and more. The core includes speaker identification, speech recognition, a multi-domain conversational system, and wake-up word recognition. We also applied this solution at a healthcare center for elderly people living alone in Kyoto, Japan.

    Achievements: A full conversational system

    Challenges: Limited data in Japanese; too many features in one conversational system

    Organization: Panasonic R&D Center Vietnam