The Advanced Research Computing Center (ARCC) is looking for dedicated students with a strong work ethic and a professional attitude to engage in a range of research, software development, and system administration projects. We pay well, will send you to conferences, and aim to expose students to advanced technologies and help with internship opportunities in industry.

Potential projects to engage in Fall 2022

Note the projects that have run at ARCC during the Summer of 2022: some of them could still use additional help.

Inverse Reinforcement learning to predict cancer progression

Collaborators:

...

We are building an AI medical software prototype to predict cancer progression using genomic data. Our prior implementation of the Pop-Up Restaurant Inverse Reinforcement Learning has been successfully tested in colorectal cancer, but takes too long to train and uses impractical amounts of computational resources. We are now introducing an efficient DNA encoding method based on bidirectional encoder representations from transformers, which will significantly reduce the computational load and permit the use of larger training datasets, enabling higher quality of prediction. This approach will be applicable to other cancers, COVID variant prediction, and any other process of mutation-driven cellular evolution.

Bioinformatics workflow automation

We will continue working on various projects automating bioinformatics workflows in collaboration with UW faculty and external entities. This will be similar to the microbial pipeline workflow below, as well as these past works:

Automate the ARCC cluster utilization report system

This project builds on the "utilization analysis" work we did in Summer 2022 (see below). Now that we know which measures we want to track, we need an automated system to track them. This will involve building a database of cluster jobs and storage utilization. The current Python scripts will then need to be adapted to mine this database for a specific time frame and produce reports that can be shared with UW faculty and administration, posted on the ARCC website for public consumption, or reported to the State governor's office.
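As a rough illustration of the kind of time-window query such a report system might run, here is a minimal sketch using Python's built-in sqlite3 module. The `jobs` table, its columns, and the sample rows are hypothetical and do not reflect the actual ARCC schema; core-hours are assumed to be cores multiplied by elapsed wall-clock hours.

```python
import sqlite3

# Hypothetical schema: one row per finished cluster job (not the real ARCC layout).
SCHEMA = """
CREATE TABLE jobs (
    job_id      INTEGER PRIMARY KEY,
    user        TEXT NOT NULL,
    department  TEXT NOT NULL,
    cores       INTEGER NOT NULL,
    start_time  TEXT NOT NULL,   -- ISO 8601 timestamps sort correctly as text
    end_time    TEXT NOT NULL
)
"""


def core_hours_by_department(conn, start, end):
    """Return {department: core-hours} for jobs that started in [start, end)."""
    rows = conn.execute(
        """
        SELECT department,
               SUM(cores * (julianday(end_time) - julianday(start_time)) * 24.0)
        FROM jobs
        WHERE start_time >= ? AND start_time < ?
        GROUP BY department
        ORDER BY department
        """,
        (start, end),
    ).fetchall()
    return {dept: round(hours, 2) for dept, hours in rows}


def build_demo_db():
    """Create an in-memory database populated with made-up sample jobs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO jobs (user, department, cores, start_time, end_time) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            ("alice", "Botany", 32, "2022-06-01 08:00:00", "2022-06-01 10:00:00"),
            ("bob", "Mathematics", 16, "2022-06-15 12:00:00", "2022-06-15 13:30:00"),
            ("alice", "Botany", 8, "2022-09-02 09:00:00", "2022-09-02 10:00:00"),
        ],
    )
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    # Report on a summer window; the September job falls outside it.
    print(core_hours_by_department(demo, "2022-06-01", "2022-09-01"))
```

Passing the time frame as query parameters means the same function could serve monthly, semester, or annual reports without changes to the SQL.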

Assist with system administration

We are looking for technically minded students who would like to engage in HPC cluster administration tasks and help us out with hardware installation, wiring, testing; software installation, configuration and testing; user support, etc. Training will be provided.

Code improvements for scientific web applications

ARCC administers a number of research web applications that were written a few years ago and could use code improvements, upgrades to current security standards, and incorporation of best practices. Work will involve investigating how a particular web server is set up with regard to the interaction between HTML and PHP code, database querying, Apache configuration, etc. Once the setup is understood, improvement or re-implementation will proceed under the supervision of ARCC staff and the faculty stakeholders.

VM development for standardized research and training environments

ARCC is engaged with a number of training initiatives on campus that require standardized software environments to teach software coding, such as specific IDEs, conda environments, specific versions of software installed etc. We need to build a range of virtual machines that would support such environments for each class or workshop being administered, so that the VM could be supplied on WyoLearn for asynchronous learning.

Engagement with Pacific Research Platform

Pacific Research Platform (https://pacificresearchplatform.org) gives researchers compute access for very large parallel GPU-based computations. We would like to be able to support UW faculty, staff and students in deploying their workloads on this platform. However, it has a somewhat unusual setup. We need a brave student or three to deploy a project on this platform and figure out how to use it, so as to expand ARCC's capacity to support users on PRP.

Projects that have run in Summer 2022

Automating A Workflow For High Throughput Genomic Analysis Of Wildlife Pathogens In Wyoming

An automated workflow management system makes it possible to orchestrate multistep, complex, time-consuming processes in a well-organized, parallelized, reproducible fashion. In our current study, we developed an automated genome analysis workflow to identify bacterial isolates from infected wildlife samples using the Nextflow platform. For that purpose, individual bioinformatics programs were chained together in a single pipeline deployed on the Teton HPC cluster at the University of Wyoming. The workflow was optimized and benchmarked to run in parallel on very large sample sizes, utilizing a large portion of the cluster for a short amount of time. Whole genome sequencing technologies are becoming robust and inexpensive, yet the cost of computational analysis and the human effort in deploying and maintaining the code are still very significant. Our objective was to develop a data analysis pipeline that can process very large datasets in a rapid, efficient, standardized manner using the high-performance Nextflow platform. This enables the discovery of the microbial groups linked to wildlife diseases researched at the Wyoming State Vet Lab.

Strategies for Correcting and Extracting Fields from Optical Character Recognition Products

Radiocarbon dating was invented nearly 70 years ago, and continues to be a crucial method for determining the age of historical objects, fossils and geological sites. Early records were compiled in the form of notched 5×8-inch cards, which still contain valuable information for modern researchers. Fred Johnson (1904-1994), an archaeologist at the Peabody Museum of Andover Academy, compiled 45,000 such cards for the dates 1959-1972 from all over the world, based on the reports and data published in the journal Radiocarbon. To make this information accessible to scientists in our modern digital world, the University of Wyoming Libraries digitized the cards and applied Optical Character Recognition (OCR) to the output. Our project focused on correcting and extracting the relevant fields from these records and organizing them for upload to the Canadian Archaeological Radiocarbon Database (CARD). Our Python code automates this process and can be reused for other batches of similar cards.
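The correct-then-extract idea can be sketched as follows. This is an illustration only, not the project's actual code: the card layout, the regular expressions, the field names, and the sample text are all assumptions, and real cards would likely need many more rules.

```python
import re

# Letters that OCR commonly substitutes for digits inside numeric fields.
DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5"})

# Hypothetical pattern for an age with uncertainty, e.g. "4500 ± 250" or "45OO +/- 25O".
AGE_PATTERN = re.compile(
    r"([0-9OolIS][0-9OolIS,]*)\s*(?:±|\+/-|\+-)\s*([0-9OolIS]+)"
)

# Hypothetical pattern for a laboratory number, e.g. "W-1234" or "GrN-2056".
LAB_PATTERN = re.compile(r"\b([A-Z][A-Za-z]{0,3})\s*-\s*([0-9OolIS]+)\b")


def fix_digits(text):
    """Replace misread letters with digits and drop thousands separators."""
    return text.translate(DIGIT_FIXES).replace(",", "")


def extract_fields(card_text):
    """Return lab number, age and uncertainty; fields not found stay None."""
    record = {"lab_number": None, "age_bp": None, "error": None}
    lab = LAB_PATTERN.search(card_text)
    if lab:
        record["lab_number"] = f"{lab.group(1)}-{fix_digits(lab.group(2))}"
    age = AGE_PATTERN.search(card_text)
    if age:
        record["age_bp"] = int(fix_digits(age.group(1)))
        record["error"] = int(fix_digits(age.group(2)))
    return record


if __name__ == "__main__":
    # Made-up OCR output with the letter "O" misread in place of zeros.
    print(extract_fields("Sample W-12O4  charcoal, 45OO +/- 25O B.P."))
```

Restricting the digit corrections to spans that already matched a numeric field keeps them from corrupting ordinary words elsewhere on the card.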

Optical character recognition and sentiment analysis of the Beatles as a cultural phenomenon

...

  • Students: Milana Wolff

  • Faculty: Prof. Kent Drummond

  • Collaborating campus units: The Libraries and the English Department

This project leverages large-scale optical character recognition and sentiment analysis to digitize text found in historical newspapers and extract information about a particular topic, namely public attitudes towards the Beatles, a highly popular British music group from the 1960s. At this point in project development, key steps have included: obtaining raw data, such as PDFs of historical newspapers mentioning the Beatles, via ProQuest (accessible through the Coe Library proxy), and pre-processed popular culture archives via collaborators at the Coe Library; using the open-source program Tesseract to perform optical character recognition (OCR) on the PDF documents in order to extract the text therein; extensive data cleaning to minimize errors and ensure accurate dates on articles; conducting sentiment analyses using the Python packages VADER, TextBlob, and SentiWordNet to determine the overall positive/negative emotions expressed in each newspaper article; and visualizing the changes in sentiment over time using the Matplotlib and Seaborn Python libraries. Future steps include enhanced statistical analysis and expansion of the underlying dataset beyond articles from the New York Times and the Adam Matthew Popular Culture in Britain and America, 1950-1975 sources. Preliminary results are included below. The graph shows variations in sentiment expressed in articles related to the Beatles from the Popular Culture dataset over the duration of their careers, with notable dates indicated by red and pink lines. Positive values indicate positive language (such as "good", "excellent music", etc.), while negative values indicate the opposite.
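The scoring-and-aggregation step can be illustrated with a small self-contained sketch. It does not use VADER, TextBlob, or SentiWordNet: a tiny hand-made lexicon stands in for them so the example runs with the standard library alone. The lexicon weights, function names, and sample articles are all made up for illustration.

```python
import re
from collections import defaultdict
from statistics import mean

# Toy lexicon standing in for a real sentiment model; words and weights are made up.
LEXICON = {
    "good": 1.0,
    "excellent": 2.0,
    "great": 1.5,
    "bad": -1.0,
    "awful": -2.0,
    "noisy": -0.5,
}


def score_text(text):
    """Average lexicon weight of the matched words; 0.0 if no word matches."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [LEXICON[w] for w in words if w in LEXICON]
    return mean(hits) if hits else 0.0


def sentiment_by_year(articles):
    """articles: iterable of (year, text) pairs. Returns {year: mean article score}."""
    scores = defaultdict(list)
    for year, text in articles:
        scores[year].append(score_text(text))
    return {year: mean(vals) for year, vals in sorted(scores.items())}


if __name__ == "__main__":
    sample = [
        (1964, "Excellent music and a good show"),
        (1964, "A noisy, awful concert"),
        (1966, "No comment"),
    ]
    print(sentiment_by_year(sample))
```

The per-year averages produced this way are the kind of series that could then be plotted over time, with positive values indicating positive language and negative values the opposite.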

Image recognition and coordinate tracking of mice during behavioral experiments

We are improving the performance of image recognition software that tracks the physical position of mice during behavioral experiments. The experimenter simultaneously records large-scale neural activity in vivo through a miniscope mounted on the animal's skull. The scope, wires and mounting bar obscure the recording camera's field of view, which prevents the software from identifying the mouse correctly. We are pursuing two parallel approaches to remedy this situation: improving the current MATLAB code to help it better recognize mice, and experimenting with Python-based code that is purpose-built to recognize behaving mice in experimental videos. Once the recognition issues are resolved, the resulting mouse behavioral trajectories will be combined with the simultaneous recordings of neuronal activity to establish the mechanisms by which the activity of individual neurons or neural ensembles encodes the animal's behavior. This will aid our understanding of the neural circuit mechanisms of depression, autism, and dementia in the medial prefrontal cortex (mPFC), all of which produce deficits in social behavior.

GPU benchmarking

In this project we are learning how to benchmark the performance of AI workloads at a very deep level, such as I/O movement between CPU and GPU, to and from memory on either chip, as well as CPU and GPU cycles, and disk I/O.

ARCC cluster utilization analysis

In this project we are building a software toolkit to gain insights into the ARCC cluster utilization, in terms of cycle usage per user and per job, disk utilization over time, as well as any discernable patterns such as utilization by department, variation of cycle usage as a function of proposal deadlines or time of year (Summer vs academic year), and similar.

The Advanced Research Computing Center (ARCC) at the University of Wyoming is home to a High Performance Computing (HPC) cluster that provides secure data storage and the capacity for conducting high-level research projects. Our internship program (both paid & unpaid positions are available) began in early 2022 and has grown to its current size of 20 interns working in a variety of capacities. If you are interested in joining the ARCC team, please read on!

...

We accept students from all educational levels!

Whether you are enrolled in an Undergraduate, Master’s, or PhD program, or you are a recent graduate, we encourage you to apply for a position.

ARCC Interns come from a wide variety of academic fields!

You do not need to be a computer scientist to apply for an internship position, as ARCC projects encompass a wide array of academic fields and industries. As long as you have an interest in computational research and are willing to learn, we will train you in the specific skills that you will need to be assigned to a project team. Mechanical engineering? Veterinary science? Anthropology? Music? We are always seeking new ideas toward which we can apply our HPC resources and computational research skillset.

Benefits of Working at ARCC

  • Competitive wages

  • On-the-job training

  • Projects span a wide range of industries & academic fields

  • Team collaboration with other interns and project partners

  • Flexible schedule with remote work option

  • Attend academic and industry conferences

  • Access to scholarship opportunities

  • Utilize your project work for your Master’s or PhD thesis

  • Full-time hours possible during summer and other UW breaks

  • Opportunities for advancement

  • ARCC projects make an impact in the world!

Internship Positions Available in the Areas of:

Research

Join a project team to conduct computational research across a range of fields (see “Types of Projects” description for more details).

Education

Assist ARCC staff in the planning, marketing, & delivery of workshops & trainings offered to UW students & faculty as well as non-UW entities. Serve as Teaching Assistant during the delivery of workshops & trainings on a variety of topics relating to coding & cluster computing, including providing technical support for both instructors and session attendees. Teach occasional lower-level workshops & trainings.

System Administration

Assist the ARCC Infrastructure team with HPC cluster administration, including hardware & software installation, configuration, & testing, as well as user support.

Project Management

Oversee the progress of specific research project teams, including organizing weekly meetings, creating agendas, maintaining meeting notes, and consulting with individual team members and project partners regarding updates and planning for next steps.

Media Relations

Create an online presence for ARCC via social media, including Facebook, Twitter, Instagram, and YouTube. Keep apprised of events and program activities in order to share them with the public via these platforms. Coordinate with ARCC leadership to create photos and videos of workshops and other important events.

New User Support

Engage with new users of the ARCC cluster to offer guidance and support, including resolving issues and providing supervision of individual usage of HPC resources. Host monthly meetings and/or office hours for in-person assistance.

Requirements

  • Team player

  • Willingness to learn new skills

  • Strong work ethic

  • Professional attitude

  • [Paid Positions] Ability to commit to at least 10 hours per week during the school semester

  • [Unpaid Positions] Ability to commit to at least 5 hours per week during the school semester

Preferred Skills

  • Intermediate-level coding in Python

  • Cluster computing background

  • Leadership skills

Types of Projects

ARCC projects are broad-reaching, tackling everything from biomedical research to blockchain development to video analysis. We have done work in the fields of biology, anthropology, English, mathematics & statistics, biomedical engineering, animal science, computer engineering, botany, and athletics. Our partners are entities across a multitude of industries, including oil & gas, health care, agriculture, technology, animal & wildlife, finance, education, biotechnology, entertainment, and energy.

For more specifics on some of our past and current projects, CLICK HERE!

Interested in learning more or applying for future positions?

Contact UW Advanced Research Computing Center for more information at arcc-info@uwyo.edu.