Browsing by Author "Maraj, Amit"

Contextual topics: advancing text segmentation through pre-trained models and contextual keywords
(2024-09-01) Maraj, Amit; Vargas Martin, Miguel; Makrehchi, Masoud
Text Segmentation (TS) is a Natural Language Processing task that aims to divide paragraphs and bodies of text into topical, semantically aligned blocks. This can play an important role in creating structured, searchable text-based representations after digitizing paper-based documents. Traditionally, TS has been approached with sub-optimal feature engineering efforts and heuristic modelling. In this work, we explore novel supervised training procedures with a labeled text corpus along with a neural Deep Learning model for improved predictions. Results are evaluated with the Pk and WindowDiff metrics and show performance improvements beyond any previous unsupervised TS systems evaluated on similar datasets. The proposed system utilizes Bidirectional Encoder Representations from Transformers (BERT) as an encoding mechanism, which feeds several downstream layers with a final classification output layer, and even shows promise for improved results with future iterations of BERT. It is also found that infusing sentence embeddings with unsupervised features, such as those gathered from Latent Dirichlet Allocation (LDA), provides results comparable to current state-of-the-art (SOTA) TS systems. In addition, unsupervised features derived from LDA give the proposed system the ability to generalize better than previous supervised systems in the space. Furthermore, it is shown that with the use of novel language models such as Generative Pre-trained Transformers (GPT) for text augmentation, training data can be multiplied while continuing to see performance improvements. Although the proposed systems are supervised in nature, they include a threshold variable that can be tuned so that the system predicts segment boundaries more frequently or more sparingly, further bolstering its practical usability.
Due to the increasing competition in the supervised TS space, competitive systems often come from larger research companies with more available resources (e.g., Google, Meta). However, unsupervised TS has been relatively unexplored in comparison with supervised efforts, since it is much more challenging to build a generalizable TS system. To this end, strong word and sentence embeddings are used to create an unsupervised TS system called “Coherence”, which blends the best of pre-trained models and unsupervised features to generalize across various datasets while achieving competitive results in the space. Since Coherence is unsupervised, inference is quick and requires no upfront investment (i.e., this technique can be picked up and applied to a domain without the need for fine-tuning).
Do extroverts create stronger passwords?
(2018-04-01) Maraj, Amit; Vargas Martin, Miguel
We investigate the relationship between personality types and the strength of created and selected passwords. For this purpose, we conducted an experiment on Amazon’s Mechanical Turk with 510 participants. Participants were given a pre-questionnaire that included, among others, three binary questions: “Password Awareness”, “Security Training” and “Account Hijacking”, which were used to predict participants’ exposure to passwords in the past. Our results suggest that participants with higher levels of Extroversion tend to create stronger passwords if they were not required to change an online account password in the past (e.g., due to a security incident). In contrast, participants with lower levels of Extroversion tend to create stronger passwords (though not significantly) if they had been required to change an online account password in the past. These results indicate that there is a distinct relationship between the Extroversion personality dimension and the way we create passwords, whether it be in a familiar situation or not. Though password strength, as investigated, is the criterion of the aforementioned tests, it is worth mentioning that Extroversion cannot be deemed a predictor in this domain. We also investigated the relationship between personality and several password characteristics, such as the total length and the number of letters, digits, and symbols used within a password. To this end, we note that for participants who had to change an online account password for the first time, Extroversion was directly correlated with creating and selecting shorter passwords, Openness was directly correlated with creating passwords containing fewer letters but more numbers and symbols, and Conscientiousness was directly correlated with creating passwords containing fewer symbols.
These results indicate that there is a distinct correlation between the construction of passwords and personality when participants are required to change an online account password for the first time. This thesis presents the detailed observations and findings from our experiment, discusses potential considerations for contradictions, and identifies related future research.