About Me

During the day I'm a Data Scientist in the making, and the rest of the time I work on my pet software projects. I am currently pursuing my PhD in Computer Science at North Carolina State University. Before my PhD, I finished my Master's at NC State, did an internship at Facebook in 2015 and worked as a Full Stack Software Engineer at CrowdChat. My prime areas of interest are Machine Learning, Algorithms and Web Development. I go on treks whenever I get a long weekend, and I am a massive foodie.

Contact Details

George V Mathew
(614) 535-8678


North Carolina State University

PhD in Computer Science May 2016 - May 2019

I work with Dr Tim Menzies on software engineering problems. I work on estimating software effort, optimizing requirements engineering problems and analyzing trends in software engineering publications.
I am also the teaching assistant for Model Based Automated Software Engineering and I think my students like me :P.

North Carolina State University

Master's in Computer Science August 2014 - May 2016

I was part of the AI4SE lab, which works on applying Artificial Intelligence techniques to Software Engineering. I am also an active member of the BIGState club, where we come up with Machine Learning solutions to different real-world problems. I had a summer internship at Facebook in 2015. My Master's was funded by NASA for my work on estimating software effort for their space programs. I graduated with a GPA of 4.0/4.0.

Amrita School Of Engineering

B.Tech in Electronics & Instrumentation July 2008 - May 2012

Graduated with a university silver medal. Topped my department with a GPA of 3.95/4. Apart from working on Electronics and Instrumentation, I also worked on projects in image processing and pattern recognition using Machine Learning approaches. I also developed a basic home automation system based on an 8086 microcontroller in my junior year. Our senior year project, a digital sphygmomanometer, was adjudged one of the best for the academic year.


North Carolina State University

Research Assistant January 2015 - Present

  • Guided by Dr Tim Menzies, a pioneer in automated software engineering, search-based software engineering and data mining in software engineering
  • Software effort estimation based on dimensionality synthesis and clustering techniques. Alternative approaches such as outlier elimination, synthetic data generation and feature-weighted estimation are also under study
  • Statistical tests to measure the correctness of estimation techniques. ANOVA, A12, bootstrap and Cliff's delta are some of the methods adopted
  • Research papers en route. Keeping my fingers crossed :-P

LexisNexis Risk

Software Engineering Intern May 2017 - August 2017

  • Implemented Gradient Boosting Trees in Enterprise Control Language (ECL).
  • Benchmarked the ECL implementation of Gradient Boosting Trees against the scikit-learn implementation.
  • Developed a common search platform for both legal and academic documents.


Facebook

Software Engineering Intern May 2015 - August 2015

  • Worked on parsing Presto queries and estimating the optimal regions in which to run queries that span multiple data centers.
  • Automated the import of missing data across data centers based on the queries.

NCSU Libraries

Student Programmer September 2014 - December 2014

  • Worked on ImageViewer, an online image slideshow. It is built on a Django stack and the front end is powered by jQuery and Bootstrap
  • Developed a Google Calendar interface to store the presentation details for Cedar. Cedar is an event scheduler for various displays used in the NCSU Libraries. It is built on a Django stack and uses PostgreSQL for storage

CrowdSpots (VDP IT Solutions)

Software Engineer October 2013 - July 2014

  • Developed crowdchat.net, a hashtag-based chat platform. It enables people from different social media networks to communicate on a common hashtag. It was built on a Node.js and Redis stack. Bootstrap, jQuery and client-side Jade were used to develop the user interface
  • Worked on platform.crowdchat.net, a data analytics platform that helps you connect with people and subscribe to their activities on Twitter. The application was built on a Java and MySQL stack and successfully scaled to hold more than 250 GB of analyzed data. Full-text search was provided using Apache Solr

Payoda Technologies

Software Engineer June 2012 - September 2013

  • Created a REST-based module in AppViewX (a product of Payoda Technologies) to add a device to MongoDB (a NoSQL database).
  • Created an aggregation script on MongoDB using map-reduce to aggregate statistics periodically.
  • Created a topological view to show the hierarchy of an application in a device using jQuery.
  • Displayed load balancers by their global coordinates using the Google Maps API.


Trends in Software Engineering

What's cooking in Software Engineering?

  • Topic modeling analysis of the abstracts and titles from 9291 papers published in 11 top-ranked SE conferences between 1993 and 2013.
  • Analyzing similarities between conferences and topics using hierarchical clustering.
  • Studying change in publication trends in different topics over the years.
  • Influence of the Program Committee in different Software Engineering conferences.

Optimizing Requirements Engineering Models

It's not that hard. Or is it?

  • Optimizing for different objectives using multi-objective optimization algorithms.
  • Ranking decisions using a Bayesian confidence ranker.
  • Clustering similar decisions using a linear clusterer.
  • Suggesting choices to users on what decisions they should be taking.

Software Effort Estimation

What's the cost?

  • Modelling software engineering projects.
  • Working with high-dimensional data with few observations.
  • Projecting high-dimensional data onto fewer dimensions using FastMap.
  • Using machine learning approaches to estimate effort in the lower-dimensional space.
  • Optimizing learners using multi-objective algorithms.


Trends in Topics in Software Engineering.
"SHORT"er Reasoning About Larger Requirements Models.
Trends in Topics at SE Conferences (Preliminary Version).
Negative Results for Software Effort Estimation.
Improving and Expanding NASA Software Cost Estimation Methods.
NASA Software Cost Estimation Model: An Analogy Based Estimation Method.
Digital Sphygmomanometer.



Optima

Optima is a repository of multi-objective optimization algorithms implemented in Python. Numerous mathematical test problems are also implemented. It is used by research groups at NCSU and Hokkaido University.


region.io

region.io is an online bookmark manager that helps users save all their favourite webpages, videos and PDF documents and create neat little widgets that they can come back to and view later. It is built on a Node.js and MongoDB stack. The front-end templating is done in Jade. Full-text search is provided using Elasticsearch. I've even built a Google Chrome extension to help users store bookmarks in a single click.

Sentiment Analysis

We developed a Bayesian sentiment analyser for Twitter feeds. We used the Stanford Parser to tokenize the words. We used MapReduce for the aggregation in our business logic. We experimented with different preprocessing techniques, such as stemming and handling emoticons using regular expressions.

Digital Sphygmomanometer

We developed an automated Digital Sphygmomanometer for our senior project requirement. It used 4th-order non-inverting filters to measure the blood pressure, and the blood pressure calculator was built on an 8086 microcontroller chip. We obtained 95% accuracy.