Aarushi Jain | Computer Science Enthusiast

Hi, I'm Aarushi Jain.

Passionate software engineer with big data and backend development expertise.

About

I am a highly motivated Computer Science professional with a passion for cutting-edge technologies and problem-solving. I hold a Master's degree in Computer Science from Columbia University and am actively seeking software developer opportunities where I can apply these skills to real-world projects.

Languages: Java, Scala, SQL, Python, C, C++, HTML/CSS
Databases: MySQL, Cassandra, Amazon DynamoDB, Amazon OpenSearch
Other Tools: AWS, Jenkins, Kafka, Spark, React.js, Spring Boot, GitHub, Hadoop, Pig

Experience

UnitedHealth Group

Software Engineer

Developed Spring Boot (Java) applications that process real-time healthcare data from Kafka streams, backed by Cassandra and MySQL databases, and deployed them to OpenShift through Jenkins CI/CD pipelines in collaboration with cross-functional teams.
Collaborated with offshore teams to migrate Pig modules to Spark-Scala applications, cutting processing time for healthcare provider data inventory management by 60%, and introduced automated testing that eliminated the need for manual quality assurance.
Conducted a proof of concept (POC) on alert mechanisms and dashboards using Splunk and Grafana to monitor the performance of Spring Boot applications.
Created SQL scripts to support backend logic and shell scripts to execute Spark-Scala, Python, and SQL modules across various environments.

July 2019 - December 2021 | Haryana, IN

PowerGrid Corporation of India

Power Systems Intern

Analyzed the daily power consumption of the areas served by the substation and compiled the findings in a report.
Based on the report, proposed a solution to optimize solar energy utilization by mitigating the duck curve problem using Particle Swarm Optimisation and blockchain-based networking, decreasing wastage by 30%. Implemented the model in MATLAB and evaluated its performance using RMSE.

June 2017 - August 2017 | Delhi, IN

Reliance Industries Limited

Market Research Intern

Conducted market research on the applications of geotextiles in India.

December 2017 - January 2018 | Delhi, IN

PwC

Software Development Intern

Built a mechanism to automate the income tax return filing system using in-house software.
Took the initiative to conduct a POC on designing dynamic Excel sheets that automate tasks with Excel macros.

June 2017 - August 2017 | Delhi, IN

Projects

                
Dining Concierge Chatbot
Serverless, microservice-driven chatbot using AWS services

Accomplishments
Tools: OpenSearch, DynamoDB, Amazon Lex, Lambda Functions, Amazon SES, Amazon SQS
Goal: Develop a serverless, microservice-driven web application that sends restaurant suggestions given a set of preferences provided to the chatbot
Methodology: Managed restaurant data from the Yelp API by storing it in a DynamoDB table and an OpenSearch index. Configured Amazon Lex with custom intents and used a Lambda function (LF1) as a code hook to shape the bot's responses. A second Lambda function (LF2) queries the DynamoDB table and OpenSearch index and delivers personalized restaurant suggestions to users through SES (Simple Email Service).

Iterative Set Expansion
Implementation of the Iterative Set Expansion (ISE) algorithm using SpanBERT and GPT-3

Accomplishments
Tools: Python, Beautiful Soup, spaCy, SpanBERT, OpenAI GPT-3 API
Goal: Implement the Iterative Set Expansion (ISE) algorithm for information extraction from web pages, using a seed query, extraction confidence threshold, and desired number of tuples to extract.
Methodology: Implemented text preprocessing for relevant documents, including sentence splitting and entity pair extraction with spaCy. Converted entity pairs into the input formats expected by GPT-3 and SpanBERT, and used the spaCy and OpenAI APIs to feed that input to the pre-trained models. Added a tuple to the list of extracted tuples whenever the predicted relation matched the target relation.
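
The control flow of ISE can be illustrated with a short sketch. This is not the project's code: `search` and `extract` are hypothetical callables standing in for the web search and the SpanBERT/GPT-3 extraction steps, and the rule of turning the most confident unused tuple into the next query is an assumption about how the expansion proceeds.

```python
def iterative_set_expansion(seed_query, search, extract, threshold, k, max_rounds=10):
    """Collect up to k relation tuples, starting from seed_query.

    search(query) -> iterable of (url, text) pairs          (hypothetical)
    extract(text) -> iterable of (tuple, confidence) pairs  (hypothetical)
    """
    tuples = {}            # extracted tuple -> best confidence seen so far
    used_queries = set()
    seen_urls = set()
    query = seed_query

    for _ in range(max_rounds):
        used_queries.add(query)
        for url, text in search(query):
            if url in seen_urls:       # never process the same page twice
                continue
            seen_urls.add(url)
            for tup, conf in extract(text):
                # Keep only confident extractions; retain the best score per tuple.
                if conf >= threshold and conf > tuples.get(tup, 0.0):
                    tuples[tup] = conf

        if len(tuples) >= k:
            break

        # Next query: the most confident tuple not yet used as a query.
        ranked = sorted(tuples.items(), key=lambda kv: -kv[1])
        query = next(
            (" ".join(t) for t, _ in ranked if " ".join(t) not in used_queries),
            None,
        )
        if query is None:              # nothing new to ask, so stop early
            break

    return sorted(tuples.items(), key=lambda kv: -kv[1])[:k]
```

Passing the search and extraction steps in as functions keeps the loop independent of any particular search API or model.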

Information Retrieval System
                  Information retrieval system to improve the search results returned by Google
                
Accomplishments
Tools: Python, NLTK (Natural Language Toolkit), GCP
Goal: Develop an information retrieval system that exploits user-provided relevance feedback to improve the search results returned by Google
Methodology: Implemented preprocessing to standardize both the query and the Google search results. Developed a query expansion algorithm that computes unigram- and bigram-based expansions, incorporating tokenization, stop word removal, and features such as term frequency, inverse document frequency, proximity, and part-of-speech tagging.
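
A simplified sketch of relevance-feedback query expansion follows. It covers only unigram term frequency and inverse document frequency, omitting the bigram, proximity, and part-of-speech features mentioned above. The stop word list, the smoothed IDF formula, and the function names are illustrative assumptions, not the project's implementation.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption; a real system would use a fuller one).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "is", "to", "for", "on"}


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def expand_query(query, relevant_docs, non_relevant_docs, num_new_terms=2):
    """Append the highest-scoring new terms from the relevant documents to the query."""
    query_terms = tokenize(query)
    rel_tokens = [tokenize(d) for d in relevant_docs]
    all_tokens = rel_tokens + [tokenize(d) for d in non_relevant_docs]
    n_docs = len(all_tokens)

    # Document frequency is computed over every result the user judged.
    df = Counter(term for doc in all_tokens for term in set(doc))

    scores = Counter()
    for doc in rel_tokens:
        for term, tf in Counter(doc).items():
            if term in query_terms:
                continue                      # only score terms not already in the query
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1   # smoothed IDF
            scores[term] += tf * idf

    # Sort by score, breaking ties alphabetically so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return query_terms + [term for term, _ in ranked[:num_new_terms]]
```

Terms that recur across the documents marked relevant accumulate the highest scores, so they are the ones appended to the next query.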

Photo Album Web Application
                  Photo Album Web Application using AWS services
                
Accomplishments
Tools: OpenSearch, DynamoDB, Amazon Lex, Lambda Functions, Amazon SES, Amazon SQS
Goal: Implement a photo web application that can be searched using natural language through both text and voice
Methodology: Built the application around an OpenSearch instance ("photos") and an S3 bucket for photo storage. An "index-photos" Lambda function runs Rekognition-based label detection and indexes the results in OpenSearch. Set up an Amazon Lex bot with a "SearchIntent" to handle natural language search queries and built a RESTful API with API Gateway for direct photo uploads and search requests.
                    
Association Rule Mining
Python application to generate association rules of interest
                
Accomplishments
Goal: Implement the Apriori algorithm to generate association rules of interest from a given dataset.
Methodology: Used the 311 service requests dataset available on the NYC Open Data site. Cleaned the dataset by removing irrelevant columns, duplicates, and trivial data. Implemented the Apriori algorithm by generating frequent itemsets with support greater than the minimum support. Generated high-confidence association rules from the frequent itemsets, using the minimum confidence as the threshold.
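
The two stages described above, frequent itemset generation and rule generation, can be sketched as follows. This is a minimal illustration rather than the project's code: the function names are hypothetical, transactions are assumed to be plain sets of items, and both thresholds are treated as inclusive ("at least"), which is also an assumption.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        level = {c: s for c in current if (s := support(c)) >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) for rules meeting min_confidence."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                # confidence(lhs -> rhs) = support(lhs and rhs) / support(lhs)
                conf = supp / frequent[lhs]
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, supp, conf))
    return rules
```

The lookup `frequent[lhs]` is safe because every subset of a frequent itemset is itself frequent, so it was recorded at an earlier level.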