Real-time Sentiment Analysis for Twitter

Recently, Martin Illecker (one of my master's students) finished his studies in information systems. In his master's thesis, he developed a sentiment analysis approach for tweets that combines sentiment lexica with machine learning algorithms. In particular, we make use of Part-of-Speech tagging, which delivers important information for creating the vectors that represent the tweets. These vectors are subsequently classified by an SVM for the final sentiment detection. The figure below depicts the workflow behind the approach; please consult the master's thesis for details.
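
To make the vector-building step a bit more concrete, here is a minimal sketch of how a POS-tagged tweet could be turned into a fixed-length feature vector from lexicon scores and POS tag frequencies. The lexicon entries, the tag set, and the chosen features are all hypothetical and only serve as an illustration; the features actually used in the thesis are not reproduced here.

```python
from collections import Counter

# Hypothetical toy sentiment lexicon: word -> polarity score in [-1, 1].
LEXICON = {"great": 0.8, "love": 0.9, "bad": -0.7, "awful": -0.9}

# Assumed POS tag set; the order fixes the vector dimensions.
POS_TAGS = ["NOUN", "VERB", "ADJ", "ADV"]


def tweet_to_vector(tagged_tokens):
    """Map a POS-tagged tweet [(token, tag), ...] to a feature vector."""
    # Polarity scores of all tokens found in the lexicon.
    scores = [LEXICON[tok.lower()] for tok, _ in tagged_tokens
              if tok.lower() in LEXICON]
    tag_counts = Counter(tag for _, tag in tagged_tokens)
    n = len(tagged_tokens) or 1  # avoid division by zero for empty tweets

    lexicon_features = [
        sum(s for s in scores if s > 0),  # total positive score
        sum(s for s in scores if s < 0),  # total negative score
        max(scores, default=0.0),         # highest score in the tweet
        min(scores, default=0.0),         # lowest score in the tweet
    ]
    # Relative frequency of each POS tag in the tweet.
    pos_features = [tag_counts[tag] / n for tag in POS_TAGS]
    return lexicon_features + pos_features


tagged = [("I", "PRON"), ("love", "VERB"), ("this", "DET"),
          ("great", "ADJ"), ("phone", "NOUN")]
vector = tweet_to_vector(tagged)
print(vector)
```

Vectors of this kind, together with their labels, could then be handed to any SVM implementation (for example `sklearn.svm.SVC` from scikit-learn) for training and classification.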

Storm Topology

By using this approach, we are able to compete with the winners of the SemEval 2013 competition. The best F-value obtained in 2013 was 0.69 by team NRC-Canada, followed by 0.6527 by team GU-MLT-LT. Our approach reaches an F-value of 0.6685, which would have placed it second in the competition.
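
For readers unfamiliar with the measure, here is a small sketch of how such an F-value can be computed. It assumes that the score is the average of the F1 scores of the positive and the negative class, which is how I understand the SemEval 2013 Twitter task to be scored; the labels and predictions are made-up toy data, not results from the thesis.

```python
def f1(gold, pred, label):
    """F1 score of a single class, computed from gold and predicted labels."""
    pairs = list(zip(gold, pred))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f_value(gold, pred):
    """Average of the F1 scores of the positive and the negative class."""
    return (f1(gold, pred, "positive") + f1(gold, pred, "negative")) / 2


# Toy data for illustration only.
gold = ["positive", "positive", "negative", "neutral", "negative", "positive"]
pred = ["positive", "negative", "negative", "neutral", "positive", "positive"]
score = f_value(gold, pred)
print(round(score, 4))
```

Note that under this assumption the neutral class only enters the score indirectly, through the false positives and false negatives it causes for the other two classes.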

Besides the quality of the sentiment detection, Martin also focused on the throughput of the detection algorithm. He used Apache Storm to parallelize all of the tasks described above (hence the name of the approach, Senti-Storm). We deployed the system on an Amazon c3.8xlarge EC2 instance and were able to detect the sentiment of 27,876 tweets per second using ten nodes. This allows us to perform sentiment detection on more than 2 billion tweets per day. Our approach can therefore detect the sentiment of tweets in real time while still achieving high prediction quality.
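
The daily figure follows directly from the measured per-second throughput, assuming that rate is sustained around the clock:

```python
TWEETS_PER_SECOND = 27_876      # throughput reported above
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Assumes the per-second rate holds for a full day.
tweets_per_day = TWEETS_PER_SECOND * SECONDS_PER_DAY
print(f"{tweets_per_day:,} tweets per day")
```

This works out to roughly 2.4 billion tweets per day.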

 

WebScience Conference and Twitter Cybercrime

A couple of weeks ago, I attended the WebScience conference in Bloomington, Indiana. I had submitted a position paper on how to proceed with research on cybercrime and cyberwarfare in the context of Twitter to the Cybercrime / Cyberwarfare workshop, and it was accepted.

My paper is titled “Cybercrime on Twitter: Shifting the User Back into Focus”. The main point I am trying to make in this publication is that research on cybercrime and fraud on Twitter has mostly dealt with (i) how to detect spam and hacked accounts or (ii) how cyber-criminals behave. However, the hacked user, a central element in this field, is mostly neglected in today’s research. Therefore, I propose to shift the focus back onto the user and his or her needs. From my perspective, this requires the following points:

  • Study user behavior: we have to understand users in order to prevent future hacks and to provide better support mechanisms (e.g., understand how a user’s trust in a social network changes after he or she has been hacked, or how a user perceives the risk of being hacked)
  • Support the user: this point is about how to inform and support the user with regard to hacks. It is closely tied to the previous point, as it requires a deeper understanding of the user. Supporting the user covers both preventing hacks by raising awareness and giving users a better understanding of, e.g., how to recover a hacked account.
  • Get the big picture: we must not focus only on single aspects of security and user analysis, but have to regain an understanding of the user's experience and perception of fraud on Twitter as a whole.
  • Get interdisciplinary: in order to see and analyze users and their behavior from different perspectives, we have to get interdisciplinary. Therefore, we have to team up with psychologists, social scientists, data mining experts and human-computer interaction specialists.
  • Work together: this is all about fostering cooperation 🙂 (e.g., sharing source code, experiences and also data between researchers)

If you are interested in tackling any of the above points with me, just get in touch 🙂

  • [PDF] E. Zangerle and G. Specht, “Cybercrime on Twitter: Shifting the User Back into Focus,” in Proceedings of the WebScience Cybercrime / Cyberwar Workshop, co-located with WebSci14, 2014.
    [Bibtex]
    @InProceedings{cybercrime14,
    author={Eva Zangerle and G\"unther Specht},
    title = {{Cybercrime on Twitter: Shifting the User Back into Focus}},
    year = {2014},
    booktitle = {Proceedings of the WebScience Cybercrime / Cyberwar Workshop, co-located with WebSci14},
    note = {published online at http://webscience-cybercrime-workshop.blogs.usj.edu.lb/2014/05/25/accepted-presentations/}
    }

Publication News

Yeeeha, our paper “Sorry, I was hacked”—A Classification of Compromised Twitter Accounts has been accepted at the ACM Symposium on Applied Computing 🙂 In particular, it has been accepted for the Social Network and Media Analysis (SONAMA) track. Our work analyses Twitter users whose accounts have been compromised and aims to understand how these users deal with the compromise. I’m really looking forward to presenting our work at Dongguk University, Gyeongju, Korea. If you are interested in this work, you can find the PDF in the publications section, or simply contact me 🙂

Our two journal articles have (finally) been published too. The first article elaborates on how to support and guide users during collaborative content creation and appeared in the Journal on Future Generation Computer Systems (impact factor 1.978). The second article is about how text similarity measures influence the quality of hashtag recommendations for tweets and is published in Springer’s Social Network Analysis and Mining Journal.
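
To give an idea of where a text similarity measure enters such a recommender, here is a minimal sketch: past tweets are ranked by their similarity to the new tweet, and the hashtags of the most similar ones are suggested. This is an illustrative scheme under my own simplifying assumptions (Jaccard similarity on word sets, toy tweets, hypothetical function names), not the exact setup evaluated in the article; the `similarity` parameter marks the place where other measures could be swapped in.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercased set of words in a tweet, ignoring hashtags."""
    words = re.findall(r"#?\w+", text.lower())
    return {w for w in words if not w.startswith("#")}


def hashtags(text):
    """All hashtags of a tweet, lowercased."""
    return re.findall(r"#\w+", text.lower())


def jaccard(a, b):
    """Jaccard similarity of two word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(new_tweet, past_tweets, similarity=jaccard, k=2, n=3):
    """Suggest up to n hashtags taken from the k most similar past tweets."""
    query = tokens(new_tweet)
    ranked = sorted(past_tweets,
                    key=lambda t: similarity(query, tokens(t)),
                    reverse=True)
    votes = Counter(tag for t in ranked[:k] for tag in hashtags(t))
    return [tag for tag, _ in votes.most_common(n)]


# Toy data for illustration only.
past = [
    "Storm cluster is processing tweets in real time #storm #bigdata",
    "New MySQL release with full-text search #mysql",
    "Processing tweets with a Storm topology #storm",
]
suggested = recommend("Processing tweets in real time with Storm", past)
print(suggested)
```

Replacing `jaccard` with another function of two word sets changes the ranking of the past tweets and therefore the suggested hashtags, which is the kind of effect the article looks at.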

Furthermore, my dissertation has been featured in Datenbank-Spektrum, the Journal of the German Computer Society.

  • [DOI] E. Zangerle, “Dissertationen: Leveraging Recommender Systems for the Creation and Maintenance of Structure within Collaborative Social Media Platforms,” Datenbank-Spektrum, vol. 13, iss. 3, p. 239, 2013.
    [Bibtex]
    @article{dbspektrum,
    title = {Dissertationen: Leveraging Recommender Systems for the Creation and Maintenance of Structure within Collaborative Social Media Platforms},
    author = {Eva Zangerle},
    journal = {Datenbank-Spektrum},
    volume = {13},
    number = {3},
    year = {2013},
    pages = {239},
    doi = {10.1007/s13222-013-0138-6},
    }
  • [PDF] [DOI] E. Zangerle, W. Gassler, and G. Specht, “On the impact of text similarity functions on hashtag recommendations in microblogging environments,” Social Network Analysis and Mining, vol. 3, iss. 4, pp. 889-898, 2013.
    [Bibtex]
    @article{snam,
    year={2013},
    issn={1869-5450},
    journal={Social Network Analysis and Mining},
    volume={3},
    number={4},
    doi={10.1007/s13278-013-0108-x},
    title={On the impact of text similarity functions on hashtag recommendations in microblogging environments},
    url={http://dx.doi.org/10.1007/s13278-013-0108-x},
    publisher={Springer Vienna},
    author={Zangerle, Eva and Gassler, Wolfgang and Specht, G\"unther},
    pages={889-898},
    language={English},
    note = {(The final publication is available at link.springer.com.)}
    }
  • [PDF] E. Zangerle and G. Specht, ““Sorry, I was hacked"—A Classification of Compromised Twitter Accounts,” in Proceedings of the 29th ACM Symposium on Applied Computing, Gyeongju, Korea, 2014, pp. 587-593.
    [Bibtex]
    @inproceedings{sac14,
    author = {Eva Zangerle and G\"unther Specht},
    title = {{“Sorry, I was hacked"---A Classification of Compromised Twitter Accounts}},
    publisher = {ACM},
    year = {2014},
    booktitle = {Proceedings of the 29th ACM Symposium on Applied Computing},
    address = {Gyeongju, Korea},
    pages = {587--593},
    note = {(acceptance rate: 24%)}
    }
  • [DOI] W. Gassler, E. Zangerle, and G. Specht, “Guided Curation of Semistructured Data in Collaboratively-built Knowledge Bases,” Journal on Future Generation Computer Systems, vol. 31, pp. 111-119, 2014.
    [Bibtex]
    @article{fgcs,
    author = {Wolfgang Gassler and Eva Zangerle and G\"unther Specht},
    title = {{Guided Curation of Semistructured Data in Collaboratively-built Knowledge Bases}},
    journal = {Journal on Future Generation Computer Systems},
    publisher = {Elsevier Science Publishers},
    year = {2014},
    note = {impact factor 1.978.},
    url = {http://www.sciencedirect.com/science/article/pii/S0167739X13001076},
    pages = {111-119},
    volume = {31},
    doi = {10.1016/j.future.2013.05.008},
    }

The second, revised edition is here, now with a NoSQL section!

The second, revised edition: MySQL 5.6 - Das umfassende Handbuch

After an intensive revision phase, the time has finally come! We are pleased to present the second edition of our book, which covers the latest MySQL version, 5.6. Besides many small adjustments, we of course also cover all the new features of MySQL 5.6, such as the brand-new NoSQL interface and the new full-text index of the InnoDB engine.

Further information at Galileo or Amazon

New in the second edition, you will learn:

  • how to operate MySQL efficiently and with high performance via the NoSQL interface
  • how a full-text index can now (finally) also be used for text search on InnoDB tables
  • how to use MySQL with server-side JavaScript via Node.js
  • how to secure your system even better and more conveniently with the new security features
  • how to track down performance bottlenecks in your system with the extended Performance Schema and shift into high gear

News

Just added two new publications on the publications page:

  • Eva Zangerle, Wolfgang Gassler, and Günther Specht. On the impact of text similarity functions on hashtag recommendations in microblogging environments. Social Network Analysis and Mining, 2013. to appear.
  • Wolfgang Gassler, Eva Zangerle, and Günther Specht. Guided Curation of Semistructured Data in Collaboratively-built Knowledge Bases. Journal on Future Generation Computer Systems, 2013. impact factor 1.978, to appear.