THE BIG FESTIVAL ABOUT SMALL CITIES
Tom Tom champions civic innovation, creativity, and entrepreneurship in America’s hometowns.

Thursday, April 11 • 10:30am - 11:05am
Featured Student Research Lightning Talks: Collective Biographies of Women - A Deep Learning Approach to Paragraph Annotation

The Collective Biographies of Women Project (CBW) investigates cultural representations of women through a large corpus of British and American biographical texts from the nineteenth and twentieth centuries. The texts belong to the collective biography genre, in which each volume contains several chapter-length biographies of different women organized around a common theme. The CBW project is supported by Dr. Alison Booth, Director of Scholars Lab at the University of Virginia.
CBW seeks to annotate these biographies at the paragraph level, labeling each paragraph along a set of literary-critical dimensions. These dimensions are defined in a controlled vocabulary known as BESS (Biographical Elements and Structure Schema). The BESS vocabulary consists of tags such as Stage of Life, Persona, Event, Topos, and Discourse; the Event tag, for example, has labels such as marriage, birth, and death. Ultimately, the goal of the project is to develop a complete annotated corpus drawn from 1,270 known books, comprising around 13,000 chapters about roughly 8,000 women.
Each paragraph in a biography will be classified with its corresponding BESS annotation. Textual features such as bag-of-words counts and TF-IDF weights, along with linguistic features such as semantic and syntactic parameters, will serve as model inputs. Our initial approach uses a variety of machine learning models, including logistic regression, tree-based models, and SVMs. The results of these models will serve as the baseline against which the deep learning results are compared.
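As a rough illustration of the textual features mentioned above, the following minimal sketch computes bag-of-words counts and TF-IDF weights using only the Python standard library. The toy paragraphs, the simple tokenizer, and the function names are assumptions made for illustration; the project's actual feature pipeline is not described here.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-letters (a deliberately simple tokenizer)."""
    cleaned = "".join(ch.lower() if ch.isalpha() else " " for ch in text)
    return cleaned.split()


def tfidf_vectors(paragraphs):
    """Return (sorted vocabulary, one {word: tf-idf weight} dict per paragraph)."""
    bags = [Counter(tokenize(p)) for p in paragraphs]  # bag-of-words counts
    n_docs = len(bags)

    # Document frequency: in how many paragraphs each word appears.
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag.keys())

    vectors = []
    for bag in bags:
        total = sum(bag.values())
        vec = {}
        for word, count in bag.items():
            tf = count / total                       # term frequency
            idf = math.log(n_docs / doc_freq[word])  # inverse document frequency
            vec[word] = tf * idf
        vectors.append(vec)
    return sorted(doc_freq), vectors


# Hypothetical toy paragraphs, not taken from the CBW corpus.
paragraphs = [
    "She was born in London in the spring.",
    "Her marriage took place in London.",
    "Her death was mourned by the nation.",
]
vocab, vectors = tfidf_vectors(paragraphs)
```

Words that occur in only one paragraph (such as "born") receive a higher weight than words shared across paragraphs (such as "london"). Vectors like these could then be fed to a baseline classifier such as logistic regression.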
With baseline scores from the machine learning models in hand, the biographies are then annotated using a recurrent neural network approach. A multi-layered bidirectional LSTM model will be used to capture the context and theme of a paragraph and identify the corresponding BESS annotation. Every word will be initialized with its pretrained GloVe embedding, which will then be trained further so that its meaning aligns with the context of the biographies. Different architectures will be tried and tested before selecting the one best suited to this use case. Because biographies have received little attention in Natural Language Processing, arriving at an optimal architecture should be a challenging task.
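The architecture described above could be sketched roughly as follows in PyTorch. This is only one plausible reading of the description, not the project's actual model: the class name, layer sizes, and label count are hypothetical, and a random matrix stands in for the pretrained GloVe vectors.

```python
import torch
import torch.nn as nn


class ParagraphTagger(nn.Module):
    """Multi-layer bidirectional LSTM that maps a paragraph to label logits."""

    def __init__(self, pretrained_embeddings, hidden_size, num_layers, num_labels):
        super().__init__()
        # freeze=False lets the pretrained vectors keep training on the corpus.
        self.embedding = nn.Embedding.from_pretrained(
            pretrained_embeddings, freeze=False
        )
        self.lstm = nn.LSTM(
            input_size=pretrained_embeddings.size(1),
            hidden_size=hidden_size,
            num_layers=num_layers,
            bidirectional=True,
            batch_first=True,
        )
        # Forward and backward final states are concatenated, hence 2 * hidden.
        self.classifier = nn.Linear(2 * hidden_size, num_labels)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len). Padding is not handled in this sketch,
        # so all paragraphs in a batch are assumed to have the same length.
        embedded = self.embedding(token_ids)
        _, (h_n, _) = self.lstm(embedded)
        # h_n has shape (num_layers * 2, batch, hidden); the last two entries
        # are the top layer's forward and backward final hidden states.
        summary = torch.cat([h_n[-2], h_n[-1]], dim=1)
        return self.classifier(summary)


# Random stand-in for GloVe vectors (hypothetical: 100 words, 50 dimensions).
fake_glove = torch.randn(100, 50)
model = ParagraphTagger(fake_glove, hidden_size=32, num_layers=2, num_labels=5)

batch = torch.randint(0, 100, (4, 12))  # 4 paragraphs of 12 token ids each
logits = model(batch)                   # one score per label per paragraph
```

The choice of loss (for example, cross-entropy for a single label per paragraph, or a per-label sigmoid loss if paragraphs can carry several labels) is left open here, since the description does not specify it.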
The next objective is to find the common events associated with the woman in each biography. This is done by drawing a parallel with Market Basket Analysis, which estimates the probability that an item is present in a cart given that another item is present. Here, a cart is represented by a paragraph and the items are the words in that paragraph. All non-trivial words with a high probability of being associated with each woman are identified, which is useful for getting a quick summary of her life events. Together, these objectives identify what each paragraph in a biography is about and which life events are important for each woman.
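The Market Basket analogy can be made concrete with a small sketch: each paragraph becomes a basket of non-trivial words, and for a given woman's name the code computes the confidence of the rule "name implies word", that is, the fraction of paragraphs mentioning her that also contain the word. The toy paragraphs, the stopword list, and the function names are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical stopword list standing in for "trivial" words.
STOPWORDS = {"was", "in", "the", "and", "her", "of"}


def to_basket(paragraph):
    """Turn a paragraph into a set of non-trivial lowercase words."""
    return {w for w in paragraph.lower().split() if w not in STOPWORDS}


def rule_confidence(baskets, antecedent):
    """confidence(antecedent -> word) for every word co-occurring with it."""
    containing = [b for b in baskets if antecedent in b]
    if not containing:
        return {}
    counts = Counter()
    for basket in containing:
        counts.update(basket - {antecedent})
    return {word: n / len(containing) for word, n in counts.items()}


# Hypothetical toy paragraphs, not taken from the CBW corpus.
paragraphs = [
    "Mary was born in the village",
    "Mary and her marriage in the village",
    "The marriage of Mary",
    "Anne was born in the city",
]
baskets = [to_basket(p) for p in paragraphs]

mary_rules = rule_confidence(baskets, "mary")
# Words ranked by how often they accompany "mary" (ties broken alphabetically).
top_words = sorted(mary_rules, key=lambda w: (-mary_rules[w], w))
```

In this toy data, "marriage" and "village" each appear in two of the three paragraphs that mention "mary", so they rank above "born". A real analysis would likely also apply a minimum support threshold to filter out rare words.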

You need this ticket from Eventbrite to sign up: Applied Machine Learning Conference.

Speakers

Sakshi Jawarani

Graduate Student, University of Virginia, Data Science Institute
I'm currently pursuing a graduate degree in Data Science at the University of Virginia. I look forward to using the technical competencies gained through my program and applying them to real-world business problems. I believe that, like the human brain, data too has limitless potential and as... Read More →

Murugesan Ramakrishnan

Graduate Student, Data Science Institute, University of Virginia
Murugesan is currently a graduate student at the Data Science Institute, University of Virginia. His research interests lie in solving problems in Natural Language Processing and Computer Vision using Deep Learning techniques. He also has considerable experience in consulting solving... Read More →

Varshini Sriram

Graduate Student, Data Science Institute, University of Virginia
Varshini is currently a graduate student at the University of Virginia pursuing her Master's in Data Science. Her research interests include Natural Language Processing and Computer Vision. She is currently working on the Collective Biographies of Women project at UVA where the goal... Read More →

Thursday April 11, 2019 10:30am - 11:05am EDT
Violet Crown: Theater 4, 200 W Main St, Charlottesville, VA 22902, USA