Hi, I’m Clarissa!

DATA ENGINEER | SOFTWARE ENGINEER | AWS SOLUTIONS ARCHITECT

About Clarissa Sobota

I am a versatile and adaptable engineer with education and experience in software engineering, IT, data analytics, and civil engineering. I am a lifelong learner, and at any given time I am working on a new degree or certification.


My interests include data analytics and engineering, software development, process improvement, cloud solutions architecture, and developing software/data tools.

EDUCATION

University of Nevada, Reno

Bachelor of Science

Civil Engineering

2006

Norwich University

Master of Civil Engineering

Structural Engineering

2011

DePaul University

Master of Science

Software Engineering

2016

University of Illinois Urbana-Champaign

Master of Business Administration

2021

Colorado State University

Master of Science

Machine Learning and AI

2025 (projected)

CERTIFICATIONS

Udacity Nanodegree

Data Engineering

Issued 2023

AWS Associate

SysOps Administrator

Issued 2017

ID: WN3KDETCHE4112WR

AWS Associate

Developer

Issued 2019

ID: JN9S2EJ2BJFQ1Y32

AWS Associate

Solutions Architect

Issued 2019

ID: 39LTXVD2H1Q4QW96

Astronomer Certification Apache Airflow

Issued 2021

Databricks Associate Dev Apache Spark 3.0

Issued 2021

Credential ID: 37123361

MIT

Big Data and Social Analytics

Issued 2016

IBM

Big Data Foundations - L1

Issued 2021

UIUC Specialization

Value Chain Management

Issued 2021

UIUC Specialization

Business Analytics

Issued 2022

UIUC Specialization

Digital Marketing

Issued 2022

NASM

Personal Trainer Certification

Issued 2022

Credential ID: 1220944119

PERSONAL PROJECTS

Ducky Counter

(Windows Store - 2013)

Counting game geared towards helping toddlers learn how to count.

Ducky Block Blaster

(Windows Store - 2013)

Game geared towards helping kids practice basic addition and subtraction.

Ducky Number Matcher

(Windows Store - 2013)

Matching game geared towards helping kids with numbers and counting.

Prime Number Learner

(Windows Store - 2013)

Game geared towards helping kids with prime numbers.

Duck Juggler

(Windows Store - 2013)

Game where you control a circus seal and earn a living by juggling ducks!

Ducky Multiplication

(Windows Store - 2013)

Math game geared towards helping kids learn multiplication up to 9x9.

Ducky Color Counter

(Windows Store - 2013)

Counting game geared towards helping kids learn how to count.

Duck Dodger

(Windows Store - 2013)

You are a rubber duck swimming down a river and you have to avoid obstacles.

Ducky Math

(Windows Store - 2013)

Math game geared towards helping kids learn addition and subtraction.

Kiddie Counting

(Amazon Store - 2015)

Game designed to help toddlers practice counting from 1 to 10.

Macro Master

(Amazon PartyRock - 2024)

App that uses generative AI to help users create nutrition macros.

PROFESSIONAL PROJECTS

PDF Text Extraction Pipeline

Text is extracted from PDFs that may contain PHI after they are uploaded to an S3 bucket, then saved to .txt files for processing on an EC2 instance.

The components of this project were all built in AWS using S3, EventBridge, Lambda, SQS, SNS, Textract, and EC2. Main features include:

  • A Lambda is scheduled to check the client S3 bucket daily for new files. If new files exist, their file paths are pushed to an SQS queue.
  • A Lambda is scheduled to check the queue for new files and send them to Textract for processing.
  • When Textract completes, a notification containing the Textract Job ID is sent to SNS. This triggers the Lambda to check the new-file queue again for further files to process, and also pushes a message to a separate queue used to retrieve the Textract results.
  • A Lambda is scheduled to check the Textract results queue and extract the results to a text file that is saved to S3.
  • A cron job on the EC2 instance checks the S3 bucket for new result files and copies them to the instance for processing.
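
As a rough illustration of the Textract completion step above, the sketch below parses a completion notification delivered through SNS and builds the message that might go onto the results queue. The function name `results_queue_messages`, the output message shape, and the sample values are hypothetical. The field names follow the documented SNS-to-Lambda event and Textract notification layouts, and no AWS calls are made.

```python
import json


def results_queue_messages(sns_event):
    """Return one message per successful Textract job in an SNS-triggered Lambda event."""
    messages = []
    for record in sns_event.get("Records", []):
        # SNS delivers the Textract notification as a JSON string.
        notification = json.loads(record["Sns"]["Message"])
        if notification.get("Status") != "SUCCEEDED":
            continue  # failed jobs would need separate handling
        location = notification.get("DocumentLocation", {})
        messages.append({
            "job_id": notification["JobId"],
            "bucket": location.get("S3Bucket"),
            "key": location.get("S3ObjectName"),
        })
    return messages


# Hypothetical sample event, for illustration only.
sample_event = {
    "Records": [
        {"Sns": {"Message": json.dumps({
            "JobId": "job-123",
            "Status": "SUCCEEDED",
            "DocumentLocation": {"S3Bucket": "client-bucket", "S3ObjectName": "in/a.pdf"},
        })}},
        {"Sns": {"Message": json.dumps({"JobId": "job-456", "Status": "FAILED"})}},
    ]
}
print(results_queue_messages(sample_event))
```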

While the above pipeline works with daily schedules, it could be adapted to use event notifications instead.
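
A minimal sketch of the event-driven variant, assuming the bucket is configured to send S3 event notifications to a Lambda: the handler logic reads the bucket and key from each record instead of listing the bucket on a schedule. The function name `new_pdf_paths` and the sample bucket and key are hypothetical, and pushing the paths to SQS is left out.

```python
from urllib.parse import unquote_plus


def new_pdf_paths(s3_event):
    """Return s3:// paths for the PDF objects named in an S3 event notification."""
    paths = []
    for record in s3_event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.lower().endswith(".pdf"):
            paths.append(f"s3://{bucket}/{key}")
    return paths


# Hypothetical sample event, for illustration only.
sample_s3_event = {
    "Records": [
        {"s3": {"bucket": {"name": "client-bucket"}, "object": {"key": "uploads/patient+form.pdf"}}},
        {"s3": {"bucket": {"name": "client-bucket"}, "object": {"key": "uploads/readme.txt"}}},
    ]
}
print(new_pdf_paths(sample_s3_event))
```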

Data Transformation Pipeline

Data transformation project in which data was extracted from a database, transformed, and exported into Athena as well as customer-ready files. The primary components of this project include AWS services (EMR, Athena, S3, Glue, MWAA / Airflow) and Spark. Main features include:

  • Daily extraction of data from a Postgres database: Using a scheduled Airflow job, the specified tables were extracted from a Postgres database and saved as Parquet files in S3.
  • Transformation of extracted data: Using manually triggered Airflow DAGs, the S3 data was transformed to meet specific business rules.
  • Loading of data: As part of the manually triggered Airflow DAG, the transformed data was loaded into Athena tables and was also exported to S3 as Parquet files.
  • Dynamic DAG creation: The end users wanted the ability to specify the categories and datasets that could be run. The user input is read in when the DAG is triggered, and the tasks are generated using Python.
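
The dynamic task generation described above can be sketched in plain Python, independent of Airflow. Here the user input is assumed to be a dictionary, such as the JSON configuration supplied when a DAG is triggered. The function `plan_tasks`, the `categories` and `datasets` keys, the task ID format, and the sample catalog are all hypothetical.

```python
def plan_tasks(conf, available):
    """Map a user's category/dataset selection to an ordered list of task IDs.

    conf: user input, e.g. {"categories": [...], "datasets": {category: [...]}}
    available: catalog of runnable datasets per category
    """
    tasks = []
    # With no explicit selection, fall back to every known category.
    for category in conf.get("categories", sorted(available)):
        if category not in available:
            raise ValueError(f"unknown category: {category}")
        requested = conf.get("datasets", {}).get(category, available[category])
        for dataset in requested:
            if dataset not in available[category]:
                raise ValueError(f"unknown dataset: {category}/{dataset}")
            tasks.append(f"transform__{category}__{dataset}")
    return tasks


# Hypothetical catalog and selection, for illustration only.
catalog = {"sales": ["orders", "refunds"], "traffic": ["speed"]}
selection = {"categories": ["sales"], "datasets": {"sales": ["orders"]}}
print(plan_tasks(selection, catalog))
```

In an Airflow DAG, each returned ID would typically become one task; validating the input first keeps a typo in the trigger configuration from silently running nothing.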

Traffic Data Reporting

Automated creation of graphs and trends revolving around speed/volume traffic data. Main projects include:

  • Comparison of speed data provided by different data sources. For this project, I extracted several weeks of data from two PostgreSQL databases and generated comparison graphs for the peak hours of each day.
  • Analysis of speed camera data to identify traffic trends. The goal was to determine whether morning traffic trends could be used to predict afternoon traffic trends. I extracted several months of traffic data for several cameras from PostgreSQL databases and created reports using graphs generated with Python.
  • Analysis of bus data extracted from both Oracle and MongoDB data sources. The goal was to compare timestamps to calculate data-writing latency and to determine whether any data was mismatched between the two sources.
  • Monitoring of traffic events, device inventory, traffic speed/volume evaluation, user interactions, and device status using Tableau, BIRST, Crystal Reports, Grafana, and SAP Business Intelligence.
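
The two-source comparison in the bus data analysis above can be illustrated with a small sketch. It assumes each source has already been reduced to a mapping of record ID to ISO 8601 timestamp; the function name `compare_sources` and the sample records are hypothetical, and the database queries are left out.

```python
from datetime import datetime


def compare_sources(source_a, source_b):
    """Compare two {record_id: ISO timestamp} mappings.

    Returns the per-record latency in seconds (source_b minus source_a)
    and the sorted IDs that appear in only one of the two sources.
    """
    shared = source_a.keys() & source_b.keys()
    latency = {
        rid: (
            datetime.fromisoformat(source_b[rid]) - datetime.fromisoformat(source_a[rid])
        ).total_seconds()
        for rid in shared
    }
    missing = sorted(source_a.keys() ^ source_b.keys())
    return latency, missing


# Hypothetical records, for illustration only.
first_source = {"bus-1": "2020-01-01T08:00:00", "bus-2": "2020-01-01T08:05:00"}
second_source = {"bus-1": "2020-01-01T08:00:03", "bus-3": "2020-01-01T08:07:00"}
latency, missing = compare_sources(first_source, second_source)
print(latency, missing)
```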

E-Learning Site

The company had an E-Learning site which was developed using Moodle and was architected using EC2, S3, CloudFront, RDS (PostgreSQL), Elastic Load Balancer, EBS, and Route 53. When we updated the site to HTTPS, the course files, which were SCORM files, became slow to load. To overcome this, I developed an approach, similar to SCORM Cloud, for serving SCORM files through CloudFront. The files were stored in S3 and were served through CloudFront using expiring signed URLs generated with PHP within the site.
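
The original URLs were generated in PHP. As a partial illustration in Python, the sketch below builds the canned policy that an expiring CloudFront signed URL is based on and applies CloudFront's URL-safe base64 substitutions. The RSA signing step with the private key is deliberately left out because it needs a cryptography library. The function names, the sample URL, and the expiry value are hypothetical.

```python
import base64
import json


def canned_policy(resource_url, expires_epoch):
    """Build the compact JSON canned policy for one resource and expiry time."""
    policy = {
        "Statement": [{
            "Resource": resource_url,
            "Condition": {"DateLessThan": {"AWS:EpochTime": expires_epoch}},
        }]
    }
    # The policy is serialized without whitespace before it is signed.
    return json.dumps(policy, separators=(",", ":"))


def cloudfront_b64(data):
    """Base64-encode bytes, then replace the characters that are invalid in a URL query string."""
    encoded = base64.b64encode(data).decode("ascii")
    return encoded.replace("+", "-").replace("=", "_").replace("/", "~")


# Hypothetical values, for illustration only.
policy = canned_policy("https://d111.cloudfront.net/course/index.html", 1700000000)
print(policy)
print(cloudfront_b64(b"\xfb\xff"))
```

In a complete implementation, the policy would be signed with the key pair's private key and the encoded signature appended to the URL together with the `Expires` and `Key-Pair-Id` query parameters.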

Process Improvement Through VBA in Excel

Creation of several Excel macros and worksheets using VBA. Notable projects include:

  • Math model management tool: The purpose of this tool was to automate the creation of C++ header files using values extracted from Excel worksheets. Math models for slot games often spanned upwards of a hundred worksheets and, if set up manually in a game simulator, could take several days. This tool reduced the creation time to about 10 minutes.
  • Game simulation test comparisons: The purpose of this tool was to iterate through folders of game simulation results (sometimes hundreds of files), open each of those files, copy the data, and paste the data into a single spreadsheet. The tool would also format the data and perform statistical calculations on it. This tool reduced the comparison worksheet creation time from several hours to about 10 minutes.
  • Engineering comments parser: The purpose of this tool was to iterate through folders that contained several Excel worksheets containing comments pertaining to engineering projects. This tool would open each of these files, copy and paste the data into a master workbook, sort the comments into different tabs by related engineering discipline, sort the data by comment source, and then format the master workbook.
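
The header generation idea behind the math model management tool can be sketched briefly. The original was written in VBA; Python stands in here, and the worksheet values are assumed to have already been read into a dictionary. The function name `render_header`, the include guard, and the constant names are hypothetical.

```python
def render_header(guard, values):
    """Render a C++ header that defines one constexpr constant per named value."""
    lines = [f"#ifndef {guard}", f"#define {guard}", ""]
    for name, value in values.items():
        # Integers map to int; everything else is emitted as double.
        ctype = "int" if isinstance(value, int) else "double"
        lines.append(f"constexpr {ctype} {name} = {value};")
    lines += ["", f"#endif  // {guard}"]
    return "\n".join(lines) + "\n"


# Hypothetical model values, for illustration only.
header = render_header("MATH_MODEL_H", {"REEL_COUNT": 5, "TARGET_RTP": 0.96})
print(header)
```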

Automated Slot Game Testing

Creation of a tool that generates game input and expected results from a math model into an XML file. That file is then loaded into another tool which runs on the same server on which the simulated game is built. When the game simulation is launched, the values from the XML are loaded and the results from the game are compared against the expected values. A text file is then generated outlining whether each test case passed or failed. These files are then parsed by one of the Excel tools listed above.
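
A minimal sketch of this workflow, using Python's standard XML library: test cases are written to XML, then compared against actual results to produce pass/fail lines. The element names (`tests`, `case`, `input`, `expected`), the function names, and the sample cases are hypothetical and do not reflect the original tool's file format.

```python
import xml.etree.ElementTree as ET


def build_test_file(cases):
    """Serialize (game_input, expected) pairs into an XML string."""
    root = ET.Element("tests")
    for case_id, (game_input, expected) in enumerate(cases, start=1):
        case = ET.SubElement(root, "case", id=str(case_id))
        ET.SubElement(case, "input").text = game_input
        ET.SubElement(case, "expected").text = str(expected)
    return ET.tostring(root, encoding="unicode")


def report(xml_text, actual_results):
    """Compare expected values in the XML with actual results keyed by case ID."""
    lines = []
    for case in ET.fromstring(xml_text).iter("case"):
        case_id = case.get("id")
        expected = case.findtext("expected")
        # A missing result counts as a failure.
        status = "PASS" if str(actual_results.get(case_id)) == expected else "FAIL"
        lines.append(f"case {case_id}: {status}")
    return lines


# Hypothetical cases and results, for illustration only.
xml_text = build_test_file([("7,7,7", 100), ("1,2,3", 0)])
result_lines = report(xml_text, {"1": 100, "2": 5})
print("\n".join(result_lines))
```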

Feel free to reach out. I’m always open to chatting about data!

LinkedIn

Email

clarissa.sobota@gmail.com