About Me

I am a Ph.D. student in the Department of Computer Science at North Carolina State University, working with Dr. Kathryn Stolee. Before my Ph.D., I finished my Master's at NC State, did an internship at Facebook in 2015 and worked as a Full Stack Software Engineer at CrowdChat. My prime areas of interest are Search Based Software Engineering, Programming Languages, Machine Learning and Web Development.

Contact Details

George Mathew
(614) 535-8678
george <dot> meg91 <at> gmail <dot> com


  North Carolina State University

PhD in Computer Science May 2016 - May 2019

I work with Dr. Kathryn Stolee on studying cross-language source code similarity and its applications. We demonstrated the use of dynamic source code similarity measures in clone detection across statically and dynamically typed programming languages.
Earlier, I worked with Dr. Tim Menzies on software engineering problems. We worked on estimating software effort, optimizing requirements engineering problems and analyzing trends in software engineering publications.

  North Carolina State University

Masters in Computer Science August 2014 - May 2016

I was part of the AI4SE lab, which works on applying Artificial Intelligence techniques to Software Engineering. I am also an active member of the BIGState club, where we come up with Machine Learning solutions to different real-world problems. I had a summer internship at Facebook in 2015. My master's was funded by NASA for my work on estimating software effort for their space programs. I graduated with a GPA of 4.0/4.0.

  Amrita School Of Engineering

B.Tech in Electronics & Instrumentation July 2008 - May 2012

Graduated with a university silver medal. Topped my department with a GPA of 3.95/4. Apart from working on Electronics and Instrumentation, I also worked on projects in Image Processing and Pattern Recognition using Machine Learning approaches. I also developed a basic home automation system based on an 8086 microcontroller in my junior year. Our senior year project, a digital sphygmomanometer, was adjudged one of the best for the academic year.


  North Carolina State University

Research Assistant January 2018 - Present

  • Guided by Dr. Kathryn Stolee, a top researcher making innovative contributions to our knowledge of program analysis, human aspects of software engineering, and their intersection.
  • We use dynamic source code similarity based on dynamic profiling of segmented source code to develop a language-agnostic approach to detect similar code. More details in our ICSE 2020 presentation.
  • Limitations of dynamic profiling, such as poor scalability and low recall, are overcome by augmenting it with static similarity approaches based on the context and structure of source code.
  • Research papers are under review and en route.

  North Carolina State University

Research Assistant January 2015 - December 2017

  • Guided by Dr. Tim Menzies, a pioneer in automated software engineering, search-based software engineering and data mining in software engineering.
  • Software Effort Estimation based on dimensionality synthesis and clustering techniques. Alternative approaches such as outlier elimination, synthetic data generation and feature-weighted estimation are also under study.
  • Statistical tests to measure correctness of estimation techniques. ANOVA, A12, Bootstrap and Cliff's Delta are some of the methods adopted.



Software Engineering Intern May 2019 - August 2019

  • Developed policies that can automatically identify and ban bad actors on WhatsApp. This was based on existing signals or new signals monitored from the platform.
  • Created a dashboard that periodically highlights the bad actors and their actions based on different signals.
  • Deployed these services across different data centers and monitored their progress.


Software Engineering Intern May 2018 - August 2018

  • Optimized the selection of projects for macOS builds of Microsoft Office based on usage and history. This resulted in faster setup prior to the build and more efficient utilization of space.
  • Implemented a logging framework to profile different modules of the selection and build framework.
  • Developed dashboards for telemetry of the build process in Power BI.

  LexisNexis Risk

Software Engineering Intern May 2017 - August 2017

  • Implemented Gradient Boosting Trees in Enterprise Control Language (ECL).
  • Benchmarked the ECL implementation of GB Trees against the scikit-learn implementation.
  • Developed a common search platform for both legal and academic documents.
  • The research article was published in Elsevier Connect.
  • The research poster was adjudged one of the top three posters.


Software Engineering Intern May 2015 - August 2015

  • Worked on parsing Presto queries and estimating optimal regions to run queries that span different data centers.
  • Automated the import of missing data across multiple data centers based on queries.

  NCSU Libraries

Student Programmer September 2014 - December 2014

  • Worked on ImageViewer, an online image slideshow. It is built on a Django stack and the front end is powered by jQuery and Bootstrap.
  • Developed a Google Calendar interface to store the presentation details for Cedar. Cedar is an event scheduler for various displays used in the NCSU Libraries. It is built on a Django stack and uses PostgreSQL for storage.


  CrowdSpots (VDP IT Solutions)

Software Engineer October 2013 - July 2014

  • Developed crowdchat.net, a hashtag-based chat platform. It enables people from different social media networks to communicate on a common hashtag. It was built using a Node.js and Redis stack. Bootstrap, jQuery and client-side Jade were used to develop the user interface.
  • Worked on platform.crowdchat.net, a data analytics platform that helps you connect with people and subscribe to their activities on Twitter. The application was built on a Java and MySQL stack and successfully scaled to hold more than 250 GB of analyzed data. Full text search was provided using Apache Solr.

  Payoda Technologies

Software Engineer June 2012 - September 2013

  • Created a REST-based module in AppViewX to add a device into MongoDB.
  • Created an aggregation script on MongoDB using map-reduce to aggregate statistics periodically.
  • Created a topological view to show the hierarchy of an application in a device using jQuery.
  • Displayed a load balancer based on global coordinates using the Google Maps API.


  Cross Language Source Code Similarity

How can I learn from code across languages?

  • We developed SLACC, the first technique to detect cross-language code clones across statically and dynamically typed languages.
  • SLACC is fully automated and it does not require annotations or manual effort such as seeding test inputs or input files.
  • We are currently working on combining static and dynamic similarity measures for code search.
  • Supported languages include Java, Python, Haskell and R.
  • Publication at the International Conference on Software Engineering (ICSE) 2020.

  Trends in Software Engineering

What's cooking in Software Engineering?

  • Topic modeling analysis of the abstracts and titles from 9291 papers published in 11 top-ranked SE conferences between 1993 and 2013.
  • Analyzing similarities between conferences and topics using hierarchical clustering.
  • Studying change in publication trends in different topics over the years.
  • Influence of the Program Committee in different Software Engineering conferences.
  • Publications in the International Conference on Software Engineering 2017, IEEE Transactions on Software Engineering 2018 and Journal First at Automated Software Engineering 2019.

  Optimizing Requirements Engineering Models

It's not that hard. Or is it?

  • Optimizing for different objectives using multi-objective optimization algorithms.
  • Ranking decisions using a Bayesian confidence ranker.
  • Clustering similar decisions using a linear clusterer.
  • Suggesting choices to users on what decisions they should be taking.
  • Publications in IEEE Requirements Engineering 2017 and ICSAW 2017.

  Software Effort Estimation

What's the cost?

  • Modelling software engineering projects.
  • Working with high-dimensional data with few observations.
  • Projecting high-dimensional data onto fewer dimensions using FastMap.
  • Using machine learning approaches to estimate on lower dimensions.
  • Optimizing learners using multi-objective algorithms.
  • Publications in Empirical Software Engineering 2017, AAAI 2016 and NASA 2015


SLACC: Simion-based Language Agnostic Code Clones
George Mathew, Chris Parnin, Kathryn Stolee
Software Engineering’s Top Topics, Trends, and Researchers
George Mathew, Tim Menzies
Finding trends in software research
George Mathew, Amritanshu Agrawal, Tim Menzies
Better Metrics for Ranking SE Researchers
George Mathew, Tim Menzies
Data-driven search-based software engineering
Vivek Nair, Amritanshu Agrawal, Jianfeng Chen, Wei Fu, George Mathew, Tim Menzies, Leandro Minku, Markus Wagner, Zhe Yu
Hyperparameter optimization for effort estimation
Tianpei Xia, Rahul Krishna, Jianfeng Chen, George Mathew, Xipeng Shen, Tim Menzies
Negative Results for Software Effort Estimation
Tim Menzies, Ye Yang, George Mathew, Barry Boehm, Jairus Hihn
Trends in Topics at SE Conferences (1993-2013)
George Mathew, Amritanshu Agrawal, Tim Menzies
Using stakeholder preferences to make better architecture decisions
Neil Ernst, John Klein, George Mathew, Tim Menzies
The NASA analogy software cost model: A web-based cost analysis tool
Jairus Hihn, Michael Saing, Elinor Huntington, James Johnson, Tim Menzies, George Mathew
"SHORT"er Reasoning About Larger Requirements Models
George Mathew, Tim Menzies, Neil Ernst, John Klein
Impacts of Bad ESP (Early Size Predictions) on Software Effort Estimation
George Mathew, Tim Menzies, Jairus Hihn
Improving and Expanding NASA Software Cost Estimation Methods
Jairus Hihn, Leora Juster, James Johnson, Tim Menzies, George Mathew
NASA Software Cost Estimation Model: An Analogy Based Estimation Method
Jairus Hihn, Leora Juster, Tim Menzies, George Mathew, James Johnson
Digital Sphygmomanometer
George Mathew, KG Praveen, P Pravin, M Sandeep, Sharanya Ramesh, L Vidhya



Optima is a repository of multi-objective optimization algorithms implemented in Python. Numerous mathematical test problems are also implemented. It is used by research groups at NCSU and Hokkaido University.


region.io is an online bookmark manager that helps users save their favourite webpages, videos and PDF documents, and creates neat little widgets for them to come back and view later. It is built on a Node.js and MongoDB stack. The front end templating is done in Jade. Full text search is provided using Elasticsearch. I've even built a Google Chrome extension to help users store bookmarks in a single click.

Sentiment Analysis

We developed a Bayesian sentiment analyser for Twitter feeds. We used the Stanford Parser to tokenize the words. We used MapReduce to aggregate our business logic. We experimented with different preprocessing techniques like stemming and handling emoticons using regular expressions.

Digital Sphygmomanometer

We developed an automated Digital Sphygmomanometer for our senior project requirement. It used 4th-order non-inverting filters for the measurement, and the blood pressure calculator was built on an 8086 microcontroller chip. We obtained 95% accuracy.