Hi, I'm Aarushi Jain.

Passionate software engineer with big data and backend development expertise.

About

I am a Computer Science professional with a strong interest in emerging technologies and problem-solving. I hold a Master's degree in Computer Science from Columbia University and am actively seeking software developer opportunities where I can apply my skills to real-world projects.

  • Languages: Java, Scala, SQL, Python, C, C++, HTML/CSS
  • Databases: MySQL, Cassandra, Amazon DynamoDB, Amazon OpenSearch
  • Other Tools: AWS, Jenkins, Kafka, Spark, React.js, Spring Boot, GitHub, Hadoop, Pig

Experience

Software Engineer
  • Developed and deployed Spring Boot (Java) applications that process real-time healthcare data from Kafka streams, backed by Cassandra and MySQL databases, and shipped them to OpenShift through Jenkins CI/CD pipelines in collaboration with cross-functional teams.
  • Collaborated with offshore teams to migrate Pig modules to Spark-Scala applications, cutting processing time for healthcare provider data inventory management by 60%, and introduced automated testing that eliminated the need for manual quality assurance.
  • Conducted a proof of concept (POC) on alert mechanisms and dashboards using Splunk and Grafana tools to monitor the performance of Spring Boot applications.
  • Created SQL scripts to support backend logic and shell scripts to execute Spark-Scala, Python, and SQL modules across various environments.
  • Tools: Java, Spring Boot, Kafka, Cassandra, MySQL, Splunk, Grafana, Spark, Scala, SQL
July 2019 - December 2021 | Haryana, IN

PowerGrid Corporation of India

Power Systems Intern
  • Analyzed daily power consumption in the areas served by the substation and compiled a report
  • Based on the report, proposed a solution to optimize solar energy utilization by mitigating the duck curve problem using Particle Swarm Optimisation and blockchain-based networking, decreasing wastage by 30%. Implemented the model in MATLAB and evaluated its performance using RMSE
June 2017 - August 2017 | Delhi, IN

Reliance Industries Limited

Market Research Intern
  • Conducted market research on the applications of geotextiles in India
December 2017 - January 2018 | Delhi, IN

PwC

Software Development Intern
  • Built a mechanism to automate the income tax return filing system using in-house software
  • Took the initiative to conduct a POC on designing dynamic Excel sheets that automate tasks using Excel macros
June 2017 - August 2017 | Delhi, IN

Projects

Dining Concierge Chatbot

Serverless, microservice-driven chatbot using AWS services

Accomplishments
  • Tools: OpenSearch, DynamoDB, Amazon Lex, Lambda Functions, Amazon SES, Amazon SQS
  • Goal: Develop a serverless, microservice-driven web application that sends restaurant suggestions given a set of preferences provided to the chatbot
  • Methodology: Stored restaurant data from the Yelp API in a DynamoDB table and an OpenSearch index. Implemented Amazon Lex with customized intents and used a Lambda function (LF1) as a code hook to shape the bot's responses. Used a second Lambda function (LF2) to query the DynamoDB table and OpenSearch index and deliver personalized restaurant suggestions to users through SES (Simple Email Service)
Iterative Set Expansion

Implement the Iterative Set Expansion (ISE) algorithm using SpanBERT and GPT-3

Accomplishments
  • Tools: Python (Programming Language), Beautiful Soup, spaCy, SpanBERT, OpenAI GPT-3 API
  • Goal: Implement the Iterative Set Expansion (ISE) algorithm for information extraction from web pages, using a seed query, extraction confidence threshold, and desired number of tuples to extract.
  • Methodology: Implemented text preprocessing for relevant documents, including sentence splitting and entity pair extraction using spaCy. Converted entity pairs into the input formats expected by GPT-3 and SpanBERT and fed them to the pre-trained models through spaCy and the OpenAI API. Added a tuple to the list of extracted tuples if the predicted relation matched the target relation.
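
The loop described above can be sketched roughly as follows. This is a minimal illustration and not the project's code: `search` and `extract` are hypothetical callables standing in for the web search step and the SpanBERT/GPT-3 extraction step, and re-querying with the highest-confidence unused tuple is an assumption about how the seed query gets expanded.

```python
def iterative_set_expansion(seed_query, k, threshold, search, extract, max_rounds=10):
    """Collect at least k tuples by repeatedly searching and extracting.

    search(query)  -> iterable of (url, text) pairs      (hypothetical helper)
    extract(text)  -> iterable of (tuple, confidence)    (hypothetical helper)
    """
    tuples = {}            # extracted tuple -> best confidence seen so far
    used_queries = set()   # queries already issued
    seen_urls = set()      # pages already processed
    query = seed_query

    for _ in range(max_rounds):
        used_queries.add(query)
        for url, text in search(query):
            if url in seen_urls:
                continue
            seen_urls.add(url)
            for tup, conf in extract(text):
                # Keep only confident extractions, remembering the best score.
                if conf >= threshold and conf > tuples.get(tup, 0.0):
                    tuples[tup] = conf

        if len(tuples) >= k:
            break

        # Assumed expansion rule: query with the best tuple not used yet.
        ranked = sorted(tuples.items(), key=lambda kv: -kv[1])
        query = next(
            (" ".join(t) for t, _ in ranked if " ".join(t) not in used_queries),
            None,
        )
        if query is None:   # nothing new to ask; stop early
            break

    return sorted(tuples.items(), key=lambda kv: -kv[1])
```

Passing the search and extraction steps in as functions keeps the control flow testable without network access or model weights.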
Information Retrieval System

Information retrieval system to improve the search results returned by Google

Accomplishments
  • Tools: Python (Programming Language), NLTK (Natural Language Toolkit), GCP
  • Goal: Develop an information retrieval system that exploits user-provided relevance feedback to improve the search results returned by Google
  • Methodology: Implemented preprocessing techniques for query standardization and Google search results. Developed a query expansion algorithm that computed unigram and bigram-based expansions, incorporating tokenization, stop word removal, and features such as term frequency, inverse document frequency, proximity, and part-of-speech tagging.
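
A simplified sketch of the relevance-feedback idea follows. It is an illustration only, not the project's algorithm: it scores unigrams by term frequency times a smoothed inverse document frequency and leaves out the bigram, proximity, and part-of-speech features mentioned above. The stop word list, the function names, and the smoothing formula are assumptions made for the example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption, not the project's list).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "for", "on"}


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and drop stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def expand_query(query, relevant_docs, all_docs, n_new_terms=2):
    """Append the highest-scoring new terms from relevant documents to the query."""
    query_terms = tokenize(query)
    doc_tokens = [set(tokenize(d)) for d in all_docs]
    n_docs = len(all_docs)

    # Term frequency summed over the documents the user marked relevant.
    tf = Counter()
    for doc in relevant_docs:
        tf.update(tokenize(doc))

    scores = {}
    for term, freq in tf.items():
        if term in query_terms:
            continue  # only score terms not already in the query
        df = sum(term in tokens for tokens in doc_tokens)
        idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed IDF
        scores[term] = freq * idf

    # Highest score first; ties broken alphabetically for determinism.
    best = sorted(scores, key=lambda t: (-scores[t], t))[:n_new_terms]
    return query_terms + best
```

For example, with the query "per se" and feedback marking restaurant pages as relevant, the term "restaurant" would rank highest because it recurs across the relevant results.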
Photo Album Web Application

Photo Album Web Application using AWS services

Accomplishments
  • Tools: OpenSearch, DynamoDB, Amazon Lex, Lambda Functions, Amazon SES, Amazon SQS
  • Goal: Implement a photo web application that can be searched using natural language through both text and voice
  • Methodology: Implemented a Photo Album Web Application with an OpenSearch instance "photos" and an S3 bucket for photo storage, using the "index-photos" Lambda function for Rekognition-based label detection and OpenSearch indexing. Set up Amazon Lex bot "SearchIntent" for handling natural language search queries and built a RESTful API with API Gateway for direct photo uploads and search requests.
Association Rule Mining

Python application to generate association rules of interest

Accomplishments
  • Goal: Implement the Apriori algorithm to generate association rules of interest from a given dataset.
  • Methodology: Used the 311 service requests dataset available on the NYC Open Data site. Cleaned the dataset by removing irrelevant columns, duplicates, and trivial data. Implemented the Apriori algorithm by generating frequent itemsets with support greater than the minimum support. Generated high-confidence association rules from the frequent itemsets, using the minimum confidence as the threshold.
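
The two stages described above (frequent itemsets, then rules) can be sketched as below. This is a minimal illustration on in-memory sets rather than the project's code; the function names are hypothetical, thresholds are treated as inclusive, and rules are limited to a single item on the right-hand side, all of which are assumptions for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1 candidates: every single item.
    current = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while current:
        # Keep candidates that meet the threshold (assumed inclusive).
        level = {c: s for c in current if (s := support(c)) >= min_support}
        frequent.update(level)
        k += 1
        # Join: union pairs of frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, confidence, support) rules with one item on the right."""
    rules = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue
        for rhs in itemset:
            lhs = itemset - {rhs}
            confidence = supp / frequent[lhs]  # lhs is frequent by downward closure
            if confidence >= min_confidence:
                rules.append((lhs, frozenset([rhs]), confidence, supp))
    return rules
```

The prune step relies on the downward-closure property: an itemset can only be frequent if all of its subsets are, which keeps the candidate set small at each level.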

Skills

Languages

Databases

Other Tools

Education

Columbia University in the City of New York

NY, USA

Degree: Master of Science in Computer Science

    Relevant Coursework:

    • Cloud Computing and Big Data
    • Advanced Databases
    • Analysis of Algorithms
    • Introduction to Databases
    • User Interface Design

Delhi Technological University

New Delhi, India

Degree: B.Tech in Electrical and Electronics Engineering

    Relevant Coursework:

    • Data Structures and Algorithms
    • Analysis of Algorithms
    • Introduction to C
    • Numerical Engineering and Optimization Methods

Contact