Design and implementation of a component-based distributed system for text mining in social networks

dc.contributor.advisor: Mahmoud, Qusay H.
dc.contributor.author: Huang, Yu
dc.date.accessioned: 2016-12-14T20:32:06Z
dc.date.accessioned: 2022-03-29T19:59:31Z
dc.date.available: 2016-12-14T20:32:06Z
dc.date.available: 2022-03-29T19:59:31Z
dc.date.issued: 2016
dc.degree.discipline: Electrical and Computer Engineering [en]
dc.degree.level: Master of Engineering (MEng) [en]
dc.description.abstract: This report presents the design and implementation of a component-based distributed system for text mining in social networks. The system consists of three main types of components: data collection, data processing, and data visualization. The report explores three candidate architectures (simple linear, message feedback, and Kafka-centric) and provides an implementation of each. The final system adopts the Kafka-centric architecture, in which all components are connected through Kafka brokers. In terms of functionality, the data collection components collect data from Twitter and produce messages to the Kafka brokers. The data processing components contain a series of basic text mining topologies. The data visualization components, built on JavaScript libraries, present results on web pages and allow users to interact with graphs and charts. To improve the scalability and performance of text mining, the project uses the Apache Storm framework to implement the data processing components. In this report, we evaluate the availability of Kafka and Storm, the rates of the data collection components, and the performance of the data processing components. The experimental results demonstrate that our system is available and scalable, and that its component-based structure enables it to be extended easily. [en]
dc.description.sponsorship: University of Ontario Institute of Technology [en]
dc.identifier.uri: https://hdl.handle.net/10155/690
dc.language.iso: en [en]
dc.subject: Text mining [en]
dc.subject: Social networks [en]
dc.subject: Distributed system [en]
dc.subject: Apache Storm [en]
dc.subject: Apache Kafka [en]
dc.title: Design and implementation of a component-based distributed system for text mining in social networks [en]
dc.type: Master's Project [en]

Files

Original bundle

Name: Huang_Yu.pdf
Size: 1.66 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.61 KB
Format: Plain Text