ARCC RESEARCH PROJECTS

Read below about current and past projects that have involved partnerships with UW researchers as well as with outside organizations and industry. Projects involve a wide array of computational tools and methodologies and span a diverse range of fields, from athletics to medicine to oil & gas to the humanities.

Artificial Intelligence/Computer Vision/Machine Learning

Athlete Identification and Tracking using Computer Vision

PROJECT COLLABORATOR: UW Athletics

This project aims to revolutionize athlete identification and tracking in various sports at the University of Wyoming by developing an advanced system that leverages the power of computer vision and deep learning. The system comprises three distinct stages to achieve accurate and real-time player detection.

In the first stage, two state-of-the-art computer vision models, Grounding DINO (a zero-shot, open-set object detector) and YOLOv8 (a real-time object detection and image segmentation model), are employed for player detection. These models can annotate images automatically and track players in real time, forming a robust foundation for subsequent processing.

The second stage centers around the precise localization of uniform numbers on the detected players using another convolutional neural network. This critical step ensures reliable association between players and their respective uniform numbers, essential for accurate player identification.

The final stage employs a third Convolutional Neural Network (CNN) to classify the integer value of the uniform number. Once identified, this information is mapped to the corresponding player roster name, culminating in a comprehensive and efficient player identification system. Furthermore, the system continually tracks the positions of the players during games, enabling continuous monitoring and analysis of player performance.
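The three stages described above can be sketched as a small pipeline. This is only an illustrative outline, not the project's code: the function name `identify_players`, the `PlayerSighting` record, and the three stage callables are hypothetical placeholders for the detector, the number-localization network, and the number classifier, and the roster entries are invented for the demo.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


@dataclass
class PlayerSighting:
    player_box: Box
    number: Optional[int]  # None when no uniform number was found
    name: Optional[str]    # None when the number is not on the roster


def identify_players(
    frame: object,
    detect_players: Callable[[object], List[Box]],          # stage 1 (hypothetical)
    locate_number: Callable[[object, Box], Optional[Box]],  # stage 2 (hypothetical)
    classify_number: Callable[[object, Box], int],          # stage 3 (hypothetical)
    roster: Dict[int, str],
) -> List[PlayerSighting]:
    """Run the three stages on one frame and map numbers to roster names."""
    sightings = []
    for player_box in detect_players(frame):
        number_box = locate_number(frame, player_box)
        if number_box is None:
            # Number hidden or unreadable: keep the detection, leave identity open.
            sightings.append(PlayerSighting(player_box, None, None))
            continue
        number = classify_number(frame, number_box)
        sightings.append(PlayerSighting(player_box, number, roster.get(number)))
    return sightings


# Stand-in stages so the sketch runs without any trained models.
demo_roster = {12: "Player A", 7: "Player B"}
demo = identify_players(
    frame=None,
    detect_players=lambda f: [(0, 0, 50, 100), (60, 0, 110, 100)],
    locate_number=lambda f, box: (10, 20, 30, 40) if box[0] == 0 else None,
    classify_number=lambda f, box: 12,
    roster=demo_roster,
)
for sighting in demo:
    print(sighting)
```

Passing the stages in as callables keeps the outline independent of any particular model library, so each network could be swapped or retrained without touching the mapping logic.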

By harnessing computer vision and deep learning techniques, the proposed approach streamlines athlete identification and tracking for the University of Wyoming. Beyond identification, the system has the ability to provide valuable insights into player performance.

Predicting Cancer Progression using Inverse Reinforcement Learning (IRL)

PROJECT COLLABORATORS:

We are building an AI medical software prototype to predict cancer progression using genomic data. Our prior implementation of the Pop-Up Restaurant Inverse Reinforcement Learning has been successfully tested in colorectal cancer, but takes too long to train and uses impractical amounts of computational resources. We are now introducing an efficient DNA encoding method based on bidirectional encoder representations from transformers, which will significantly reduce the computational load and permit the use of larger training datasets, enabling higher quality of prediction. This approach will be applicable to other cancers, COVID variant prediction, and any other process of mutation-driven cellular evolution. This work is utilizing ARCC cluster resources and the Oracle for Research Cloud.

Image Recognition & Coordinate Tracking of Mice in Behavioral Experiments

PROJECT COLLABORATORS:

We are improving the performance of image recognition software that tracks the physical position of mice during behavioral experiments. The experimenter simultaneously records large-scale neural activity in vivo through a miniscope mounted in the animal’s skull. The scope, wires, and mounting bar obscure the recording camera's field of view, which prevents the software from identifying the mouse correctly. We are pursuing two parallel approaches to remedy this: improving the current MATLAB code so that it recognizes mice more reliably, and experimenting with Python-based code that is purpose-built to recognize behaving mice in experimental videos. Once the recognition issues are resolved, the resulting behavioral trajectories will be combined with the simultaneous recordings of neuronal activity to establish the mechanisms by which the activity of individual neurons or neural ensembles encodes the animal’s behavior. This will aid our understanding of the neural circuit mechanisms in the medial prefrontal cortex (mPFC) that underlie depression, autism, and dementia, all of which produce deficits in social behavior.

Optical Character Recognition (OCR)

Sentiment Analysis of the Beatles as a Cultural Phenomenon

PROJECT COLLABORATORS:

This project uses large-scale optical character recognition and sentiment analysis to digitize text from historical newspapers and extract information about a particular topic: public attitudes towards the Beatles, a highly popular British music group from the 1960s. Key steps completed so far include:

  • Obtaining raw data, such as PDFs of historical newspapers mentioning the Beatles, via ProQuest (accessible through the Coe Library proxy), and pre-processed popular culture archives via collaborators at the Coe Library

  • Using the open-source program Tesseract to perform optical character recognition (OCR) on the PDF documents and extract their text

  • Cleaning the data extensively to minimize errors and ensure accurate article dates

  • Conducting sentiment analyses with the Python packages VADER, TextBlob, and SentiWordNet to determine the overall positive or negative emotion expressed in each newspaper article

  • Visualizing changes in sentiment over time with the Matplotlib and Seaborn Python libraries

Future steps include enhanced statistical analysis and expansion of the underlying dataset beyond articles from the New York Times and the Adam Matthew Popular Culture in Britain and America, 1950-1975 sources. Preliminary results are included below. The graph shows variations in sentiment expressed in articles related to the Beatles from the Popular Culture dataset over the duration of their careers, with notable dates indicated by red and pink lines. Positive values indicate positive language (such as “good”, “excellent music”, etc.), while negative values indicate the opposite.
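The aggregation behind a sentiment-over-time graph like the one described above might look roughly like the following. This is a minimal sketch, not the project's code: it assumes each article has already been reduced to a publication date and a single sentiment score between -1 and 1 (for example, a VADER compound score), and the function name and sample values are invented for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def mean_sentiment_by_year(scored_articles):
    """Average per-article sentiment scores within each publication year.

    scored_articles: iterable of (publication_date, score) pairs, where
    score is assumed to lie in [-1, 1] with positive meaning positive language.
    """
    by_year = defaultdict(list)
    for published, score in scored_articles:
        by_year[published.year].append(score)
    # Sort by year so the result can be plotted directly as a time series.
    return {year: mean(scores) for year, scores in sorted(by_year.items())}


# Invented sample data, only to show the shape of the input and output.
sample = [
    (date(1964, 2, 9), 0.5),
    (date(1964, 8, 1), 0.25),
    (date(1970, 4, 10), -0.5),
]
yearly = mean_sentiment_by_year(sample)
print(yearly)
```

The resulting dictionary of year to mean score can be handed to a plotting library as the x and y values of a line chart.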

Oil Well Cards Parsing for the Enhanced Oil Recovery Institute (EORI)

PROJECT COLLABORATORS:

The purpose of this project is to use Optical Character Recognition (OCR) to extract text from a large collection of over 100,000 digitized Oil Well Card files in the form of PDF documents, and to subsequently organize that text into spreadsheet files. The spreadsheet files will be sent to the EORI so that they can keep track of the oil well information found on the PDF documents. Tools utilized include Kitty Ranger and Tesseract.
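The step from OCR text to spreadsheet rows could be sketched as below. The source does not describe the card layout, so this assumes, purely for illustration, that each card contains simple "Label: value" lines; the field names in `FIELDS` and the sample card text are hypothetical, and the CSV output stands in for whatever spreadsheet format is actually delivered.

```python
import csv
import io
import re

# Hypothetical field names; the real card fields are not listed in the source.
FIELDS = ["operator", "well_name", "county"]

# Matches lines such as "Well Name : Smith 1" (label, colon, value).
LINE_PATTERN = re.compile(r"^\s*(?P<label>[A-Za-z ]+?)\s*:\s*(?P<value>.+?)\s*$")


def parse_card(ocr_text):
    """Turn the OCR text of one card into a dict with one entry per field."""
    record = {field: "" for field in FIELDS}
    for line in ocr_text.splitlines():
        match = LINE_PATTERN.match(line)
        if not match:
            continue  # skip lines that do not look like "Label: value"
        key = match.group("label").strip().lower().replace(" ", "_")
        if key in record:
            record[key] = match.group("value")
    return record


def cards_to_csv(card_texts):
    """Write one CSV row per card and return the CSV as a string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for text in card_texts:
        writer.writerow(parse_card(text))
    return buffer.getvalue()


# Invented card text, only to exercise the functions.
sample_card = "Operator: Acme Oil\nWell Name : Smith 1\nsmudged line\nCounty: Albany"
csv_text = cards_to_csv([sample_card])
print(csv_text)
```

Unrecognized lines are skipped rather than guessed at, which leaves blank cells that a reviewer can spot and correct by hand.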

Extracting Data from Radiocarbon Dating Cards of Anthropological Records

PROJECT COLLABORATORS:

Radiocarbon dating was invented nearly 70 years ago and continues to be a crucial method for determining the age of historical objects, fossils, and geological sites. Early records were compiled on notched 5×8-inch cards, which still hold valuable information for modern researchers. Fred Johnson (1904-1994), an archaeologist at the Peabody Museum of Andover Academy, compiled 45,000 such cards covering dates from 1959 to 1972 from all over the world, based on the reports and data published in the journal Radiocarbon. To make this information accessible to today's scientists, the University of Wyoming Libraries digitized the cards and applied Optical Character Recognition (OCR) to the output. Our project focused on correcting and extracting the relevant fields from these records and organizing them for upload to the Canadian Archaeological Radiocarbon Database (CARD). Our Python code automates this process and can be reused for other batches of similar cards.
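Field extraction from the OCR output of a card might be sketched with regular expressions, as below. The patterns are assumptions made for illustration: they suppose a laboratory number written as a short letter code, a hyphen, and digits, and an age written as a number followed by a plus-or-minus uncertainty. The sample text is invented, and the project's actual rules and field list are not given in the source.

```python
import re

# Assumed laboratory-number shape: letter code, hyphen, digits (e.g. "W-1234").
LAB_NUMBER = re.compile(r"\b([A-Z][A-Za-z]{0,3})\s*-\s*(\d{1,5})\b")

# Assumed age shape: "3450 ± 120"; OCR often renders "±" as "+/-" or "+-".
AGE = re.compile(r"(\d{1,3}(?:,\d{3})*|\d+)\s*(?:±|\+/-|\+-)\s*(\d+)")


def extract_fields(card_text):
    """Pull a lab number, age, and uncertainty out of one card's OCR text."""
    lab = LAB_NUMBER.search(card_text)
    age = AGE.search(card_text)
    return {
        "lab_number": f"{lab.group(1)}-{lab.group(2)}" if lab else None,
        "age_bp": int(age.group(1).replace(",", "")) if age else None,
        "error_years": int(age.group(2)) if age else None,
    }


# Invented card text with typical OCR spacing noise.
sample_text = "Sample W - 1234  Charcoal  3,450 +/- 120 B.P."
fields = extract_fields(sample_text)
print(fields)
```

Returning `None` for anything that does not match keeps unreadable cards visible for manual correction instead of silently filling in a wrong value.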

Poster Presented at the 2022 Rocky Mountain Advanced Computing Consortium (RMACC):

Software Development & Performance Optimization

Blockchain Code Improvements for XRP & Ripple

PROJECT COLLABORATORS:

  • XRP

  • Ripple

  • UW Foundation

ARCC is working with XRP & Ripple, a cryptocurrency & blockchain developer, on the following:

  • Hosting a Ripple validator in the ARCC HPC Data Center

  • Debugging C++ software errors for the decentralized XRP cryptocurrency blockchain

  • Evaluating & improving existing encryption methods

  • Using the WebSocket protocol to build a network layer between server & end-user connections

  • Developing & optimizing consensus algorithms that validate peer-to-peer transactions

Constructing a Graphics Processing Unit (GPU) Benchmarking Toolkit

The primary objective of this benchmarking project is to assess the specific ML/AI needs of researchers at the University of Wyoming and equip them with essential resources to enhance their projects. To achieve this, the ARCC Internship program is committed to creating a comprehensive benchmarking toolkit and data library.

The benchmarking toolkit covers a range of GPU hardware commonly used in ML/AI workloads, including the A10, A30, A40, A100, RTX A6000, and V100-32GB. The program gathers extensive performance data by running a variety of ML methods and algorithms on these GPUs, such as language models (e.g., BERT, GPT-2, DNABERT2), image recognition, classification, text-to-speech, and speech-to-text.

For instance, a notable observation emerged when examining disk IOPS performance for GPT-style models during a Docker container-based test of GPT-2. This model was chosen because it represents an early, unoptimized large language model (LLM). While fine-tuning GPT-2 for three epochs on an NVIDIA A100 GPU with the OpenWebText dataset, a consistent decrease in IOPS of approximately 13% was identified from each epoch to the next. This observation sheds light on potential bottlenecks within the data loading pipeline.
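To put the per-epoch figure in perspective, the short calculation below compounds the reported drop across the three epochs. It assumes the roughly 13% decrease is relative to the previous epoch rather than to the first one, which the text does not state explicitly, and it uses IOPS normalized to 1.0 because no absolute values are reported.

```python
def relative_iops_by_epoch(drop_per_epoch, epochs):
    """Return IOPS for each epoch as a fraction of the first epoch's IOPS.

    Assumes the same relative drop is applied from each epoch to the next.
    """
    return [(1.0 - drop_per_epoch) ** epoch for epoch in range(epochs)]


levels = relative_iops_by_epoch(drop_per_epoch=0.13, epochs=3)
for epoch, level in enumerate(levels, start=1):
    print(f"epoch {epoch}: {level:.4f} of first-epoch IOPS")

# Cumulative decline between the first and third epochs.
cumulative_drop = 1.0 - levels[-1]
print(f"cumulative drop: {cumulative_drop:.1%}")
```

Under that assumption the third epoch runs at about 0.87 × 0.87 ≈ 0.757 of the first epoch's IOPS, a cumulative decline of roughly 24%.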

By collating and analyzing such data, the benchmarking project aims to provide researchers with a comprehensive and objective understanding of each GPU's performance characteristics, enabling them to make well-informed decisions when selecting hardware for their specific ML/AI tasks.

One significant aspect of this project is the consideration of emerging technologies. While the ARCC Internship program does not directly incorporate Graphcore hardware into its infrastructure, it dedicates effort to evaluating this novel compute hardware thoroughly. By doing so, the program keeps researchers informed about hardware advancements that may affect their projects.

A key motivation behind this initiative is the commitment to transparency and avoiding marketing embellishments. The ARCC Internship program recognizes that companies may present their GPU technologies with a marketing bias. Therefore, the benchmarking project seeks to provide unbiased and factual data that researchers can rely on for their hardware purchase decisions.

Ultimately, the ARCC Internship program aims to empower researchers and cluster management at the University of Wyoming with the necessary tools and information to optimize their ML/AI projects. By offering transparent and accurate performance data, researchers can make informed choices on which GPU to deploy for specific algorithms, leading to more impactful research outcomes.

Through the development of this benchmarking toolkit, the ARCC Internship program cements itself as a crucial asset to the research community at the University of Wyoming, providing a solid foundation for advancing ML/AI research in the future.

Big Data Analysis using Parallelization

Automating Workflow for Wyoming State Veterinary Lab Pathogen Study

PROJECT COLLABORATORS:

  • PhD Candidate in Molecular Biology: Yasin M. Ahmed

  • Wyoming State Veterinary Laboratory (WSVL)

In this study, an automated genome analysis workflow was developed in order to identify bacterial isolates from infected wildlife samples collected by the Wyoming State Veterinary Lab. The Nextflow platform was utilized to combine individual bioinformatics programs into a single pipeline that was deployed on ARCC’s HPC cluster. The development of this data analysis pipeline in Nextflow allowed for the rapid, efficient, and effortless processing of a large dataset.

Eddy Covariance Atmospheric Measurement

PROJECT COLLABORATORS:

This project analyzes vectors of measurements (wind speed, temperature, pressure) to describe local climate conditions and determine climate change impacts. The team is creating a fully automated computational workflow using Nextflow, designed to be modular, easy to parallelize and maintain, capable of multi-faceted analysis, and operational on a range of computational systems.
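At the core of the eddy covariance method is the covariance between fluctuations in vertical wind speed and fluctuations in a measured quantity such as temperature, averaged over a time window. The sketch below shows only that textbook calculation. It is not the project's workflow, the function name and sample values are invented, and real processing typically adds steps such as coordinate rotation and detrending that are omitted here.

```python
from statistics import fmean


def eddy_covariance(vertical_wind, scalar):
    """Mean product of fluctuations, mean(w' * s'), over one averaging window.

    vertical_wind and scalar are equal-length sequences of simultaneous samples;
    the fluctuation of each sample is its deviation from the window mean.
    """
    if len(vertical_wind) != len(scalar) or len(vertical_wind) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    w_mean = fmean(vertical_wind)
    s_mean = fmean(scalar)
    products = [(w - w_mean) * (s - s_mean) for w, s in zip(vertical_wind, scalar)]
    return fmean(products)


# Invented samples: updrafts coincide with warmer air, so the flux is positive.
wind = [1.0, -1.0, 1.0, -1.0]
temperature = [2.0, 0.0, 2.0, 0.0]
flux = eddy_covariance(wind, temperature)
print(flux)
```

Because each averaging window is computed independently, windows can be processed in parallel, which fits a modular workflow of the kind described above.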
