Unified processing of natural language and relational data

Date

2022-09-01

Abstract

This work outlines a method for performing natural language tasks within a relational framework. It uses PostgreSQL as the relational database and relies on its extensibility to support word embeddings without leaving the database. The system can be extended to incorporate other natural language processing (NLP) techniques, such as latent Dirichlet allocation (LDA) or modern models such as BERT. Combining NLP with relational operations makes it possible to extract data from text and analyze it through the same interface used for general data analysis, so richer information can be gathered from existing sources and made available through one standard interface. The declarative nature of SQL also allows NLP techniques to be applied in a more ad hoc manner. Two case studies on the DBLP dataset demonstrate the value of this integration: building an LDA model and augmenting its topic labels for greater descriptiveness, and applying preexisting models for semantic analysis.
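
The core idea is to run embedding comparisons inside the database so they compose with ordinary relational operations. The following is a minimal sketch of that idea under stated assumptions. It uses Python's built-in sqlite3 module as a self-contained stand-in for PostgreSQL and does not reproduce the PostgreSQL extension mechanism the work itself relies on. The `papers` and `embeddings` tables, the `cosine_similarity` SQL function, the helper names, and the three-dimensional vectors are all hypothetical toy data, not DBLP records or trained embeddings.

```python
import math
import sqlite3
import struct


def pack(vec):
    """Serialize a float vector as a little-endian float64 BLOB."""
    return struct.pack(f"<{len(vec)}d", *vec)


def unpack(blob):
    """Inverse of pack: decode a BLOB back into a tuple of floats."""
    return struct.unpack(f"<{len(blob) // 8}d", blob)


def cosine_similarity(a_blob, b_blob):
    """Cosine similarity of two packed vectors; 0.0 if either has zero norm."""
    a, b = unpack(a_blob), unpack(b_blob)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_demo_db():
    """Create an in-memory database with hypothetical papers and toy embeddings."""
    conn = sqlite3.connect(":memory:")
    # Expose the Python function to SQL so queries can call it directly.
    conn.create_function("cosine_similarity", 2, cosine_similarity)
    conn.executescript(
        """
        CREATE TABLE papers (id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
        CREATE TABLE embeddings (paper_id INTEGER REFERENCES papers(id), vec BLOB);
        """
    )
    papers = [
        (1, "Query optimization for joins", 2019),
        (2, "Topic models for text", 2021),
        (3, "Neural word embeddings", 2022),
    ]
    # Made-up 3-dimensional vectors standing in for real title embeddings.
    vectors = {1: [0.9, 0.1, 0.0], 2: [0.1, 0.8, 0.3], 3: [0.0, 0.6, 0.8]}
    conn.executemany("INSERT INTO papers VALUES (?, ?, ?)", papers)
    conn.executemany(
        "INSERT INTO embeddings VALUES (?, ?)",
        [(pid, pack(vec)) for pid, vec in vectors.items()],
    )
    return conn


def rank_papers(conn, query_vec, min_year):
    """Join, filter, and rank by semantic similarity in one SQL statement."""
    return conn.execute(
        """
        SELECT p.title, cosine_similarity(e.vec, ?) AS score
        FROM papers AS p
        JOIN embeddings AS e ON e.paper_id = p.id
        WHERE p.year >= ?
        ORDER BY score DESC
        """,
        (pack(query_vec), min_year),
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for title, score in rank_papers(conn, [0.0, 0.7, 0.7], 2020):
        print(f"{score:.3f}  {title}")
```

Because the similarity function is registered with the database, a single declarative query can join, filter by year, and rank by semantic similarity. In PostgreSQL, the analogous step would be a user-defined function or extension rather than sqlite3's `create_function`.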

Keywords

Query language, Database, Natural language processing, Embedding vectors, Text processing