
Health and Behavior

Deep Learning for Unstructured Data Analytics and Mining

Led by Xingquan Zhu, Ph.D.

Xingquan Zhu is a Full Professor in the Department of Electrical Engineering and Computer Science at Florida Atlantic University. His research interests include artificial intelligence, machine learning, and bioinformatics. Since 2000, he has published over 300 refereed journal and conference papers in these fields. Dr. Zhu is an IEEE Fellow (Class of 2023), recognized for contributions to data mining for big data analytics and network representation learning.

PROJECT

Natural-language text is common in many applications, such as system event logs, social media posts, and medical reports. Analyzing such unstructured data is inherently challenging because, traditionally, researchers manually devise and implement features based on what they expect to be useful. This approach has led to modest improvements in performance, as seen with the inclusion of domain-specific features such as emoticons in tweets. In this research project, we propose developing and applying deep learning algorithms to massive text data sources to uncover the complex linguistic structures and hierarchies underlying the text, enabling us to learn a better representation of tweet sentiment data and train better sentiment classifiers. Once trained, these classifiers can be applied to a wide range of social media research topics, including election prediction and public opinion polling. Additionally, these techniques can be applied to other text classification domains.
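To make the traditional approach concrete, the sketch below shows what manually devised tweet features might look like. It is only an illustration: the emoticon lists, the feature names, and the `handcrafted_features` helper are hypothetical and are not taken from the project itself.

```python
# Hypothetical hand-crafted features for a tweet.
# The emoticon lists below are illustrative assumptions, not a project resource.
POSITIVE_EMOTICONS = (":)", ":-)", ":D")
NEGATIVE_EMOTICONS = (":(", ":-(", ":'(")


def handcrafted_features(tweet: str) -> dict:
    """Map a tweet to a small dictionary of manually chosen features."""
    tokens = tweet.split()  # naive whitespace tokenization
    return {
        "n_tokens": len(tokens),
        "n_positive_emoticons": sum(tok in POSITIVE_EMOTICONS for tok in tokens),
        "n_negative_emoticons": sum(tok in NEGATIVE_EMOTICONS for tok in tokens),
        "n_exclamations": tweet.count("!"),
    }


features = handcrafted_features("Great debate tonight :) :D !")
print(features)
```

Every feature here encodes a guess about what matters for sentiment. A deep learning model would instead be expected to learn its own representation from the minimally preprocessed text.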

The goal of using deep learning for text classification is to learn high-level, abstract ideas from low-level, minimally preprocessed text. The primary challenges of this endeavor are data volume, data quality, and computational cost. Because complex relations must be learned from low-level representations, large volumes of relevant training data are required so that each relationship occurs often enough for its pattern to be learned. Deep learning is also sensitive to poor data quality and will learn any bias inherent in the data. Additionally, deep neural networks can contain tens of millions of trainable parameters, and the optimization algorithms that tune these parameters are computationally expensive in both the number of calculations required and memory use.
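The scale of the parameter count can be illustrated with a rough back-of-the-envelope calculation. The sketch below assumes a hypothetical classifier (a word-embedding table followed by one hidden layer and a three-class output); all layer sizes are made-up values chosen for illustration, not the project's actual architecture. The memory estimate further assumes 32-bit floats and an Adam-style optimizer, which keeps two extra values per parameter in addition to the weight and its gradient.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Trainable parameters in a fully connected layer: weights plus biases."""
    return n_in * n_out + n_out


# Hypothetical model dimensions (assumptions for illustration only).
vocab_size = 100_000   # distinct tokens in the vocabulary
embed_dim = 300        # size of each word embedding
hidden_units = 512     # width of the single hidden layer
n_classes = 3          # e.g. negative / neutral / positive

embedding_params = vocab_size * embed_dim          # 30,000,000 on its own
hidden_params = dense_params(embed_dim, hidden_units)
output_params = dense_params(hidden_units, n_classes)
total_params = embedding_params + hidden_params + output_params

# Assumed training footprint per parameter: weight, gradient, and the
# two moment estimates kept by an Adam-style optimizer (4 copies in total).
BYTES_PER_FLOAT32 = 4
COPIES_PER_PARAM = 4
training_bytes = total_params * BYTES_PER_FLOAT32 * COPIES_PER_PARAM

print(f"trainable parameters: {total_params:,}")
print(f"approx. training memory for parameters: {training_bytes / 2**20:.1f} MiB")
```

Even in this small model the embedding table alone accounts for tens of millions of parameters. The estimate covers parameter storage only and excludes activations and the training data, which add further memory cost.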