Network Data Science Lab

AS Relationship Inference

Akmal Khan, Usama Ejaz, Kim Hyun-chul

Studying the Internet requires a basic knowledge of its topology and of the business relationships between Autonomous Systems (ASes). Organizations do not publicly disclose the contractual agreements behind these relationships, so they have to be inferred. There are three basic relationship types between ASes: Customer-to-Provider (C2P), Peer-to-Peer (P2P), and Sibling-to-Sibling (S2S). There are also complex relationships, such as hybrid and partial-transit relationships. Many inference algorithms have been proposed over the last two decades, each with its own accuracy and limitations. Most of them are heuristic, and a few are based on machine learning or deep learning. We are working on AS relationship inference.
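To make the inference task concrete, here is a minimal sketch of a toy degree-ratio heuristic. It is an illustration only: it is not the lab's method and not any specific published algorithm. The function name `infer_relationships`, the `ratio` threshold, and the sample paths are all assumptions, and the AS numbers come from the range reserved for documentation. The sketch ignores sibling, hybrid, and partial-transit relationships.

```python
from collections import defaultdict


def infer_relationships(as_paths, ratio=2.0):
    """Label each AS link 'c2p' or 'p2p' with a toy degree-ratio rule (hypothetical)."""
    # Build the undirected AS graph from adjacent hops in every path.
    neighbors = defaultdict(set)
    for path in as_paths:
        for a, b in zip(path, path[1:]):
            if a != b:  # skip repeated hops caused by AS-path prepending
                neighbors[a].add(b)
                neighbors[b].add(a)
    degree = {asn: len(ns) for asn, ns in neighbors.items()}

    relationships = {}
    for a, ns in neighbors.items():
        for b in ns:
            if a >= b:
                continue  # visit each undirected link once
            low, high = sorted((a, b), key=lambda asn: degree[asn])
            if degree[high] >= ratio * degree[low]:
                # Much better connected AS is assumed to be the provider.
                relationships[(low, high)] = "c2p"  # key is (customer, provider)
            else:
                relationships[(a, b)] = "p2p"
    return relationships


# Hypothetical AS paths: two well-connected ASes, each with two stub customers.
paths = [
    [64496, 64500, 64501, 64498],
    [64497, 64500, 64501, 64499],
]
rels = infer_relationships(paths)
for link, label in sorted(rels.items()):
    print(link, label)
```

A rule this simple mislabels many real links, which is one reason the literature has moved to richer heuristics and learning-based approaches.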

Automated OSS Security-Related Text Classification

Peter Shin, Muhmmad Ali Hamza, Han Jong-woo, Lee So-hyun, Kim Hyun-chul

The lack of reliable sources of detailed information on vulnerabilities in open-source software is one of the biggest obstacles to maintaining a secure software supply chain and an effective vulnerability management process. Large organizations run legacy software systems that depend on many software libraries. Tracking these dependencies and identifying newly discovered vulnerabilities in legacy systems is technically challenging and costly. The goal of this research project is to show which parts of a software developer's code are vulnerable, allowing the developer to quickly repair or replace them. Our aim is to detect and track security-related bugs, apply patches, and perform version mapping. We use advanced AI (NLP) algorithms for automatic classification.

Explainable Hate Speech Detection Chatbot System

Lee So-hyun, Kim Jung-in, Kim Hyun-chul

With the development of social media, hate and discrimination have become rampant. Hate speech detection is one of the most important problems studied so far in Natural Language Processing research. Previous work has focused only on detecting hate speech. However, explaining why a deep neural network classifies a sentence or document as hate speech is essential when applying it in practice, for example in regulating hate speech online. Therefore, we build a hate speech detection model by applying explainable artificial intelligence. In addition, we propose an AI chatbot plug-in system that detects hate speech in an explainable manner.

Crowdfunding Scam Project Detection

Lee Seung-hun, Wafa Shafqat, Sherish Malik, Kim Hyun-chul

Crowdfunding sites, with their recent explosive growth, are equally attractive platforms for swindlers and scammers. Although the growing number of articles on crowdfunding scams indicates that fraud threats are accelerating, little is known about scamming practices and patterns. The key contribution of this research is to discover hidden clues in the text by exploring linguistic features that distinguish scam campaigns from non-scams.
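As a rough illustration of what "linguistic features" of campaign text can look like, the sketch below computes a handful of surface statistics. The feature set, the `FIRST_PERSON` word list, the function name `linguistic_features`, and the sample sentence are all assumptions chosen for the example; they are not the features used in this research.

```python
import re

# Hypothetical word list for a first-person pronoun feature.
FIRST_PERSON = {"i", "me", "my", "we", "our", "us"}


def linguistic_features(text):
    """Return a few simple surface features of a campaign text (illustrative only)."""
    words = re.findall(r"[a-z']+", text.lower())
    n = len(words)
    if n == 0:
        return {
            "word_count": 0,
            "type_token_ratio": 0.0,
            "avg_word_length": 0.0,
            "first_person_rate": 0.0,
            "exclamations": text.count("!"),
        }
    return {
        "word_count": n,
        # Vocabulary richness: distinct words divided by total words.
        "type_token_ratio": len(set(words)) / n,
        "avg_word_length": sum(len(w) for w in words) / n,
        "first_person_rate": sum(w in FIRST_PERSON for w in words) / n,
        "exclamations": text.count("!"),
    }


# Made-up campaign sentence, not taken from any real project.
sample = "We promise huge rewards! Back us now, we ship next week!"
features = linguistic_features(sample)
print(features)
```

Feature vectors of this kind could then be fed to any standard classifier to separate scam campaigns from non-scams.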

Crowdfunding Success Prediction

Lee Seung-hun, Lee Kang-hee, Kim Hyun-chul

Despite the huge success of crowdfunding platforms, the average project success rate is 41%, and it has been decreasing. Hence, identifying the factors that lead to successful fundraising and predicting a project's probability of success has been one of the most important challenges in crowdfunding. This work is the first attempt to use only in-band project content (text) contained in the Campaign, Updates, and Comments sections of a crowdfunding project for success prediction, without combining it with any out-of-band project metadata or statistically derived numeric features. By adopting (i) a sequence-to-sequence (seq2seq) deep neural network model with sentence-level attention and (ii) a Hierarchical Attention-based Network (HAN) model, we demonstrate that our proposed model achieves state-of-the-art performance in predicting campaign success, as high as 89-91%. We also show that our method achieves 76% accuracy on average on the very first day of project launch, using campaign main text data only.

Traffic Classification using Deep Learning

Lee Seung-hun, Lee Kang-hee, Kim Hyun-chul

As Deep Learning (DL) algorithms have rapidly become a methodology of choice in various domains, they have recently also entered the field of Internet traffic classification, demonstrating impressive results. Most research so far has focused on improving the accuracy of classification systems, yet there has been little attempt to provide (i) a systematic comparison of the various DL algorithms used and (ii) an analysis of where the higher accuracy comes from, particularly in comparison with traditional machine learning algorithms such as C4.5. To fill this gap, we conduct experiments with four DL algorithms proposed for traffic classification: CNN, LSTM, Stacked Auto-Encoder (SAE), and Hierarchical Attention Networks (HAN). Further, we propose to leverage and visualize hierarchical attention layers to highlight which parts of the traffic packet traces were most informative for accurate classification, which provides hints about why (and how) DL algorithms achieve state-of-the-art accuracy. We view this paper as a first step towards answering the aforementioned "why" question, which is critical to understanding the real benefit and contribution of deep learning to Internet traffic classification and to advancing its state of the art.
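The idea of reading attention weights as "which part mattered most" can be sketched with a single attention-pooling step. This is a simplified illustration, not the model from the paper: a full HAN stacks several such layers and learns a projection before scoring, whereas here the scores are plain dot products. The function name `attention_pool`, the packet vectors, and the context vector are made-up assumptions, not trained values.

```python
import math


def attention_pool(vectors, context):
    """Softmax attention over a sequence of vectors (single level, illustrative)."""
    # Score each vector by its dot product with the context vector.
    scores = [sum(x * c for x, c in zip(v, context)) for v in vectors]
    # Numerically stable softmax turns scores into weights that sum to 1.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The pooled representation is the attention-weighted sum of the inputs.
    pooled = [
        sum(w * v[d] for w, v in zip(weights, vectors))
        for d in range(len(context))
    ]
    return weights, pooled


# Hypothetical 2-dimensional encodings of three packets in one flow.
packets = [[0.1, 0.0], [2.0, 1.0], [0.3, 0.2]]
context = [1.0, 1.0]  # stands in for a learned context vector

weights, pooled = attention_pool(packets, context)
most_informative = max(range(len(weights)), key=weights.__getitem__)
print("attention weights:", [round(w, 3) for w in weights])
print("most informative packet index:", most_informative)
```

Plotting `weights` per packet (or per byte at a lower level) is the kind of visualization that hints at which parts of a trace drove the classification.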