
Steve Slota

317 days ago
The South Big Data Hub Community Engagement, Diversity and Partnerships Working Group has identified use cases as an essential way to educate new entrants to data science or HPC and to help them decide which cyber-infrastructure tools are appropriate for a given project. As a first step, the working group seeks to collect tangible use cases from different domains, forms of cyber-infrastructure, and software tools.
 
Text mining in Jobs and Industry: Trey Grainger, Lucidworks (formerly CareerBuilder)
Text mining in Business: Matt DeAngelis, Georgia State
Text mining in Urban Planning: Gordon Zhang / Subhro Guhathakurta, Georgia Tech
Text mining in Health: Charity Hilton / Jon Duke, MD, Georgia Tech GTRI/CSE/CS
Text mining in Education: Renata Rawlings-Goss, South Big Data Innovation Hub
 
Participants:
Renata Rawlings-Goss, co-Executive Director, South Hub
Suranga Edirisinghe, HPC Facilitator, GSU
Stephen Slota, University of California, Irvine/Socio-Technical Team
Andrew S. Hoffman, HCDE/U. Washington/Socio-Technical Team
Jinfeng Zhang, Florida State Univ - Statistics/Text Mining
Giti Javidi, University of South Florida 
Melissa Cragin, Midwest Big Data Innovation Hub
Trey Grainger, Lucidworks
Matt DeAngelis, Georgia State
Gordon Zhang, Georgia Tech
 
Notes: (Feel free to add questions, links or ideas below) 
Trey: Looking into advanced use of Solr to automate some of the common features and integrations. 
What is next: Working toward a text mining spectrum from keyword search to self-learning
 
Matt: Value-relevant information (such as business executive tone) in regulatory reports and conference calls.
 
Tools: Is there a link to Apache UIMA?
 
Gordon: How to identify charming neighborhoods? Looking at Twitter and news articles?
Ge Z Collect feedback from Twitter users in that neighborhood and count how many people show positive and negative attitudes toward it. If the positive feedback outnumbers the negative, we consider the neighborhood a charming one.
News articles are used only to create filters for retrieving an initial set of possibly related data.
Renata R Models to test the outcome of policy changes on neighborhoods
Ge Z We are going to develop a predictive model in which objective measures of the neighborhood's characteristics are the independent variables and the attitude of the Twitter users is the dependent variable.
Renata R App to choose a neighborhood
Ge Z It is still under development.
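
A minimal sketch of the counting rule described above, for illustration only. It assumes each Twitter user's attitude has already been labeled "positive" or "negative" by some earlier step; the function names and the input layout are hypothetical and not taken from the project.

```python
from collections import Counter


def is_charming(labels):
    """Return True when positive labels outnumber negative ones.

    `labels` holds one attitude label per Twitter user (assumed to be
    "positive" or "negative"; any other label is ignored by the rule).
    """
    counts = Counter(labels)
    return counts["positive"] > counts["negative"]


def charming_neighborhoods(feedback):
    """Apply the rule per neighborhood.

    `feedback` is an iterable of (neighborhood, label) pairs,
    a hypothetical layout chosen for this sketch.
    """
    by_neighborhood = {}
    for neighborhood, label in feedback:
        by_neighborhood.setdefault(neighborhood, []).append(label)
    return {name: is_charming(labels) for name, labels in by_neighborhood.items()}
```

Under this rule a tie counts as not charming, since the notes require more positive than negative feedback.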
 
Charity: Health Data
Validation Tools
Integrating structured clinical test data with machine learning and prediction
Tools: Solr, Hadoop (HBase) and Spark
Clinical-specific tools: cTAKES (UIMA), MedDRA - Family History Discovery.
Text mining: Finite State Machines, n-grams, LDA (Bedrock)
What Next: 
Automate phenotype generation
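
For readers new to the techniques listed above, n-grams are the simplest to show. This is a minimal sketch assuming whitespace-tokenized text; the function name `ngrams` is hypothetical and is not taken from any of the tools mentioned.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Example: bigrams from a short phrase split on whitespace.
bigrams = ngrams("family history of diabetes".split(), 2)
```

If the text has fewer than n tokens, the result is an empty list.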
 
What are the benefits of Solr vs UIMA?
UIMA: extracting information out of documents
Solr: Searching and indexing
Question: Is Solr in-memory, or can you run the service on modest hardware?
Answer from Trey: You can run the system on modest hardware with a little more lag for querying.
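
To make "searching and indexing" concrete for newcomers, here is a toy inverted index in plain Python. It only illustrates the general idea and is not Solr's API or implementation; `build_index` and `search` are hypothetical names, and tokenization is assumed to be lowercase whitespace splitting.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

Indexing is done once up front; each query then becomes a few set lookups rather than a scan over every document.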
 
 
