This module is designed to automatically remove license headers and commented-out code from source files. It will support multiple programming languages and can be easily extended to accommodate new ones. The module will be integrated into the data preprocessing pipeline to generate clean code datasets for training code LLMs. In progress
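A minimal sketch of how such a cleaner could look, not the module's actual implementation. All names here (`LINE_COMMENT`, `LICENSE_HINT`, `strip_license_header`, `looks_like_python_code`, `strip_commented_code`) are hypothetical. It assumes that a license header is a leading run of line comments mentioning copyright or licensing, and it detects commented-out code for Python only, by checking whether the comment text parses as a Python statement.

```python
import ast
import re

# Hypothetical per-language table of line-comment prefixes; add an entry to
# support a new language. Block comments (/* ... */) are not handled here.
LINE_COMMENT = {"python": "#", "c": "//", "java": "//"}

# Assumption: a header is a license header if it mentions one of these words.
LICENSE_HINT = re.compile(r"copyright|license|spdx", re.IGNORECASE)


def strip_license_header(source: str, language: str) -> str:
    """Drop the leading comment block if it looks like a license header."""
    prefix = LINE_COMMENT[language]
    lines = source.splitlines()
    end = 0
    # Walk the leading run of comment lines and blank lines.
    # Note: a shebang line would be treated as part of this run.
    while end < len(lines) and (
        lines[end].strip().startswith(prefix) or not lines[end].strip()
    ):
        end += 1
    if LICENSE_HINT.search("\n".join(lines[:end])):
        lines = lines[end:]
    return "\n".join(lines)


def looks_like_python_code(text: str) -> bool:
    """Heuristic: the text parses as Python and is more than a bare word."""
    text = text.strip()
    if not text:
        return False
    try:
        tree = ast.parse(text)
    except SyntaxError:
        return False
    # A lone word such as "TODO" parses as a bare name; do not count it as code.
    return not all(
        isinstance(node, ast.Expr)
        and isinstance(node.value, (ast.Name, ast.Constant))
        for node in tree.body
    )


def strip_commented_code(source: str) -> str:
    """Remove Python comment lines whose content parses as Python code."""
    kept = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#") and looks_like_python_code(stripped.lstrip("#")):
            continue
        kept.append(line)
    return "\n".join(kept)


sample = (
    "# Copyright 2024 Example Corp\n"
    "# Licensed under the Apache License\n"
    "\n"
    "import os\n"
    "# x = compute(1)\n"
    "# explain the next line\n"
    "print(os.name)\n"
)
clean = strip_commented_code(strip_license_header(sample, "python"))
print(clean)
```

The heuristic misses commented-out fragments that do not parse on their own, such as a lone `if x:` line, so a production version would likely need to group consecutive comment lines or use a language-aware parser.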
The objective of this research is to find a way to detect finger movements from the wrist with the help of sensors, then use those movements to recognize the sign-language gesture being made and convert it to audio. In progress
Read and implement papers on quantum graph neural networks (GNNs). In progress
Connected assets are assets that are jointly involved in performing one or more business processes. If a vulnerability or threat on one asset is exploited, this connectedness means it could cause failures or disruptions in the assets connected to it, or in the entire network. The first objective is to build a generalised model of the threat posture of connected assets. A second objective is to research how the risk score of an asset varies with the number and nature of the assets connected to it. In progress
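To make the second objective concrete, here is a toy sketch of one way a risk score could depend on an asset's connections. It is an illustrative assumption, not the project's model: it combines an asset's own risk with its neighbours' risks in a noisy-OR style, and the function name `connected_risk`, the `spread` parameter, and the example assets are all hypothetical.

```python
from typing import Dict, List


def connected_risk(
    base: Dict[str, float],
    links: Dict[str, List[str]],
    spread: float = 0.5,
) -> Dict[str, float]:
    """Combine each asset's own risk with risk spreading from its neighbours.

    base   : standalone risk per asset, each in [0, 1]
    links  : asset -> list of directly connected assets
    spread : assumed fraction of a neighbour's risk that carries over a link
    """
    scores = {}
    for asset, own in base.items():
        # Probability that the asset stays safe from its own risk...
        safe = 1.0 - own
        # ...and from each neighbour's risk, treated as independent events.
        for neighbour in links.get(asset, []):
            safe *= 1.0 - spread * base[neighbour]
        scores[asset] = 1.0 - safe
    return scores


base = {"db": 0.2, "web": 0.5, "hr": 0.1}
links = {"db": ["web"], "web": ["db"], "hr": []}
scores = connected_risk(base, links)
print(scores)
```

Under this assumption an isolated asset keeps its standalone score, while each added connection can only raise the score, and a riskier neighbour raises it more. That captures both the number and the nature of connected assets in the simplest possible way.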
Populate and enrich a benchmark of tabular data tasks for evaluating LLMs with our evaluation framework. The primary work includes selecting tasks and datasets, writing data loaders, and preparing task cards with input/output details, pre-processing steps, and prompts for the tasks. Test the data processing pipeline using our framework and evaluate a selected set of tasks with LLMs. The benchmark standardizes the evaluation of LLMs on tabular data tasks. The goal is to add a variety of tabular data tasks and make the benchmark a rich resource. In progress
Develop techniques that use state-of-the-art AI methods to enrich the experience of using LLMs. Skills required: Python, ML, DL, LLMs, Hugging Face, and exposure to UI development. In progress
The work involves setting up existing open-source APIs at large scale and testing them with open-source testing tools, for comparative evaluation against an API testing technology. In progress
1. Create a solution for measuring the interestingness of any dataset (typically a table). 2. Devise a way to rank different datasets by this interestingness measure. 3. Visualize and explain to an end user what is interesting about the dataset. In progress
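The three steps above can be sketched with a deliberately simple stand-in measure. The assumption here, which is not from the project itself, is that a table is more interesting when its columns carry more information, measured as Shannon entropy normalized by the row count. The names `column_entropy`, `interestingness`, `rank_tables`, and `explain` are hypothetical, and the explanation step is reduced to a per-column score list rather than a visualization.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Table = Dict[str, List[object]]  # column name -> column values


def column_entropy(values: List[object]) -> float:
    """Shannon entropy of a column, scaled to [0, 1] by log2(row count)."""
    n = len(values)
    counts = Counter(values)
    if n <= 1 or len(counts) <= 1:
        return 0.0  # empty, single-row, or constant columns carry no information
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return entropy / math.log2(n)


def interestingness(table: Table) -> float:
    """Step 1: score a table as the mean entropy of its columns."""
    if not table:
        return 0.0
    return sum(column_entropy(col) for col in table.values()) / len(table)


def rank_tables(tables: Dict[str, Table]) -> List[str]:
    """Step 2: order table names from most to least interesting."""
    return sorted(tables, key=lambda name: interestingness(tables[name]), reverse=True)


def explain(table: Table) -> List[Tuple[str, float]]:
    """Step 3: list columns with their scores, most informative first."""
    per_column = [(name, column_entropy(col)) for name, col in table.items()]
    return sorted(per_column, key=lambda item: item[1], reverse=True)


tables = {
    "constant": {"a": [1, 1, 1, 1], "b": ["x", "x", "x", "x"]},
    "varied": {"a": [1, 2, 3, 4], "b": ["x", "y", "x", "y"]},
}
print(rank_tables(tables))
print(explain(tables["varied"]))
```

Entropy alone would rate a column of unique row identifiers as maximally interesting, so a realistic measure would need further signals such as correlations between columns, outliers, or skew.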