Industry-relevant data comes in many forms, from structured sources (e.g., databases) to unstructured sources (e.g., natural language documents intended to be read by humans). Access to all this data is only as useful as the methods available for evaluating it and using it to make decisions.
This research program employs natural language processing, machine learning, and statistics to extract as much information as possible from both unstructured and structured data sources. In particular, it will focus on developing novel approaches for extracting information from data while taking into account its structure and semantics.