Bioinformatics and Computational Biology Blog: 2011

Monday, September 12, 2011

Protein Sequence Similarity Search Using the blastp Tool

Algorithm:

1. Go to the NCBI website (http://www.ncbi.nlm.nih.gov/) and select the Protein database from the drop-down menu.

2. Enter the name or accession ID of the query protein sequence.

3. Get the sequence in FASTA format.

4. Open the BLAST page (http://blast.ncbi.nlm.nih.gov/Blast.cgi) and select blastp (protein blast) for protein sequence similarity.

5. Paste your query protein sequence into the box provided.

6. Enter a job title and leave the parameters at their defaults (they can be modified as per your search requirements).

7. Click the "BLAST" button to start the database search.

8. Get the result.

Interpretation:

The result page includes the following:

Total score:

Maximum score:

It gives the hits in descending order of their similarity.

E-value:

Essentially, the E-value describes the random background noise that exists for matches between sequences.

It is a parameter that describes the number of hits one can "expect" to see just by chance when searching a database of a particular size. It decreases exponentially with the score (S) that is assigned to a match between two sequences.

The E-value is used as a convenient way to create a significance threshold for reporting results. When the E-value threshold is increased from the default value of 10, a longer list with more low-scoring hits can be reported. On the other hand, a lower threshold will result in a shorter list of higher-quality hits.
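The exponential relationship between E-value and score can be illustrated with the Karlin-Altschul estimate E = K * m * n * exp(-lambda * S), where m is the query length and n is the database length. The sketch below is only an illustration: the values of K and lambda, the sequence lengths, and the hit names are assumptions chosen for the example, not values reported by an actual NCBI search.

```python
import math


def expected_hits(score, query_len, db_len, k=0.041, lam=0.267):
    """Karlin-Altschul estimate E = K * m * n * exp(-lambda * S).

    k and lam are assumed example values; a real search reports its own.
    """
    return k * query_len * db_len * math.exp(-lam * score)


def filter_hits(hits, threshold=10.0):
    """Keep (name, e_value) pairs whose E-value is at or below the threshold."""
    return [(name, e) for name, e in hits if e <= threshold]


# Hypothetical search: 300-residue query against 10 million database residues.
for s in (40, 60, 80):
    print(s, expected_hits(s, 300, 10_000_000))

# Hypothetical hit list showing how the threshold changes the report length.
hits = [("hit_A", 1e-30), ("hit_B", 0.5), ("hit_C", 8.0), ("hit_D", 40.0)]
print(filter_hits(hits))          # default threshold of 10
print(filter_hits(hits, 0.001))   # stricter threshold, shorter list
print(filter_hits(hits, 100.0))   # looser threshold, longer list
```

Raising the threshold admits more low-scoring hits, while lowering it keeps only the matches least likely to have arisen by chance.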

Saturday, September 10, 2011

Basic Local Alignment Search Tool (BLAST)

The comparison of nucleotide or protein sequences from the same or different organisms is a very powerful tool in molecular biology. By finding similarities between sequences, scientists can infer the function of newly sequenced genes, predict new members of gene families, and explore evolutionary relationships. Now that whole genomes are being sequenced, sequence similarity searching can be used to predict the location and function of protein-coding and transcription-regulation regions in genomic DNA.

BLAST is the tool most frequently used for calculating sequence similarity. It comes in variations for use with different query sequences against different databases.

BLAST uses heuristics (experience-based techniques) to align a query sequence with all sequences in a database. The objective is to find high-scoring ungapped segments among related sequences. The existence of such segments above a given threshold indicates pairwise similarity beyond random chance, which helps to discriminate related sequences from unrelated sequences in a database.

Variants of BLAST:

Blastx: Search protein database using a translated nucleotide query.

Tblastn: Search translated nucleotide database using a protein query.

Tblastx: Search translated nucleotide database using a translated nucleotide query.

Protein blast: Search protein database using a protein query.

Nucleotide blast: Search a nucleotide database using a nucleotide query.

Explanation of the BLAST algorithm with an example:

1. Take a query protein sequence: VRDMKLTYS

2. Break the query into overlapping words of three residues, which BLAST uses to search its word database.

3. Suppose one of these three-residue words, DMK, finds matches in the database.

Query ............DMK DMK DMK DMK............

Database ............DMK DTK DHK DML............

4. Calculate the sum of the match scores for each word pair using the BLOSUM62 matrix.

Query ............DMK DMK DMK DMK............

Database ............DMK DTK DHK DML............

Sum of scores 15 12 10 10

5. Find the database sequence corresponding to the highest-scoring word match and extend the alignment in both directions.

Query ............VR DMK LTYS............

Database ............VK DMK LTRS............

6. Determine the high-scoring segment whose score is above a threshold (minimum required) score.

Query ............V R D M K L T Y S............

Database ............V K D M K L T R S............

2 3 15 1 -1 -3 2

total score : 19
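The word-splitting and scoring steps above can be sketched in a few lines of Python. This is a minimal illustration, not how BLAST itself is implemented: it takes the query as VRDMKLTYS (as in the alignment of steps 5 and 6) and hard-codes only the BLOSUM62 entries needed for these residue pairs. The scores printed in the example above appear to be illustrative. With the standard BLOSUM62 values assumed here, the same words score 16, 10, 9 and 9, and the full ungapped alignment scores 33.

```python
# Subset of BLOSUM62 covering only the residue pairs used in this example.
# Values are hard-coded from the standard matrix; other pairs raise KeyError.
BLOSUM62_SUBSET = {
    ("V", "V"): 4, ("D", "D"): 6, ("M", "M"): 5, ("K", "K"): 5,
    ("L", "L"): 4, ("T", "T"): 5, ("S", "S"): 4,
    ("R", "K"): 2, ("M", "T"): -1, ("M", "H"): -2,
    ("K", "L"): -2, ("Y", "R"): -2,
}


def pair_score(a, b):
    """Look up a substitution score; the matrix is symmetric."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]


def words(seq, size=3):
    """Split a sequence into overlapping words of the given size."""
    return [seq[i:i + size] for i in range(len(seq) - size + 1)]


def ungapped_score(s1, s2):
    """Sum the substitution scores of two equal-length aligned segments."""
    if len(s1) != len(s2):
        raise ValueError("segments must have the same length")
    return sum(pair_score(a, b) for a, b in zip(s1, s2))


query = "VRDMKLTYS"
print(words(query))  # the overlapping three-residue words of the query

# Score the query word DMK against each database word from step 3.
for hit in ("DMK", "DTK", "DHK", "DML"):
    print(hit, ungapped_score("DMK", hit))

# Score the extended ungapped alignment from step 6.
print(ungapped_score(query, "VKDMKLTRS"))
```

The exact match DMK/DMK scores highest, so it is the seed that would be extended in both directions.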

Web Address for blast tool:

http://blast.ncbi.nlm.nih.gov/Blast.cgi

Friday, September 9, 2011

RasMol: A Protein structure visualization tool

RasMol is a molecular graphics program intended for the visualisation of proteins, nucleic acids and small molecules. The program is aimed at display, teaching and generation of publication quality images.

The program reads in molecular coordinate files and interactively displays the molecule on the screen in a variety of representations and colour schemes. Supported input file formats include Protein Data Bank (PDB), Tripos Associates' Alchemy and Sybyl Mol2 formats, Molecular Design Limited's (MDL) Mol file format, Minnesota Supercomputer Center's (MSC) XYZ (XMol) format, CHARMm format, CIF format and mmCIF format files.

The loaded molecule can be shown as wireframe bonds, cylinder 'Dreiding' stick bonds, alpha-carbon trace, space-filling (CPK) spheres, macromolecular ribbons (either smooth shaded solid ribbons or parallel strands), hydrogen bonding and dot surface representations.

Atoms may also be labelled with arbitrary text strings. Alternate conformers and multiple NMR models may be specially coloured and identified in atom labels. Different parts of the molecule may be represented and coloured independently of the rest of the molecule or displayed in several representations simultaneously.

The displayed molecule may be rotated, translated, zoomed and z-clipped (slabbed) interactively using either the mouse, the scroll bars, the command line or an attached dial box.

RasMol can read a prepared list of commands from a 'script' file (or via inter-process communication) to allow a given image or viewpoint to be restored quickly. RasMol can also create a script file containing the commands required to regenerate the current image. Finally, the rendered image may be written out in a variety of formats including either raster or vector PostScript, GIF, PPM, BMP, PICT, Sun rasterfile or as a MolScript input script or Kinemage.

This software is freely available.

Download link: http://www.openrasmol.org/#Software

Thursday, September 8, 2011

AutoDock: Protein-ligand binding tool

AutoDock Vina is a new open-source program for drug discovery, molecular docking and virtual screening, offering multi-core capability, high performance, enhanced accuracy and ease of use. AutoDock Vina has been designed and implemented by Dr. Oleg Trott in the Molecular Graphics Lab at The Scripps Research Institute. Vina uses the PDBQT molecular structure file format used by AutoDock. PDBQT files can be generated (interactively or in batch mode) and viewed using MGLTools. The structures of the molecules being docked and the specification of the search space, including the binding site, are required as input. A usage summary of the available options can be obtained by running vina --help.

How to use AutoDock?

The receptor and ligand are required in PDBQT format, which is obtained by following the given menu procedure in AutoDockTools (part of MGLTools):

1. File → Read Molecule → (select receptor)

2. Edit → Hydrogens → Add → All Hydrogens → OK

3. Color → By Atom Type → All Geometries → OK

4. Select → Select From String → Add → Dismiss

5. Edit → Delete → Delete Atom Set → Continue

6. Edit → Hydrogens → Add → Polar Only → OK

7. File → Save As → Write PDB → OK

8. Ligand → Input → Open → (select ligand) → OK

9. Ligand → Torsion Tree → Detect Root

10. Ligand → Torsion Tree → Show Root Expansion

11. Ligand → Torsion Tree → Choose Root → Done

12. Ligand → Set Number of Torsions → (enter 6) → Dismiss

13. Ligand → Output → Save as PDBQT → Save

14. Grid → Macromolecule → Choose → OK → Save (as PDBQT)

15. Grid → Grid Box → (center on macromolecule) → File → Close Saving Current

How to run AutoDock Vina?

To run AutoDock Vina, use the following command at the command prompt:

vina.exe --receptor name.pdbqt --ligand name.pdbqt --center_x value --center_y value --center_z value --size_x value --size_y value --size_z value --out filename.pdbqt --log filename.txt

At the end of the run, a log file and an output file are obtained. The log file contains the energy values and RMSD values. The output file contains the docked molecule, which can be visualized using PyMOL.
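When several ligands have to be docked, it can help to assemble this command from a script. The following is a minimal sketch that uses only the options listed in the command above. The function names, file names and box values are hypothetical, and the call that launches Vina is left to the user because it requires vina.exe to be installed.

```python
import subprocess


def build_vina_command(receptor, ligand, center, size, out, log, exe="vina.exe"):
    """Assemble the Vina command line as a list of arguments.

    center and size are (x, y, z) tuples describing the search box.
    """
    cx, cy, cz = center
    sx, sy, sz = size
    return [
        exe,
        "--receptor", receptor,
        "--ligand", ligand,
        "--center_x", str(cx),
        "--center_y", str(cy),
        "--center_z", str(cz),
        "--size_x", str(sx),
        "--size_y", str(sy),
        "--size_z", str(sz),
        "--out", out,
        "--log", log,
    ]


def run_vina(command):
    """Run Vina and raise an error if it exits with a non-zero status."""
    subprocess.run(command, check=True)


# Hypothetical file names and box coordinates, for illustration only.
cmd = build_vina_command(
    "receptor.pdbqt", "ligand.pdbqt",
    center=(10.0, 12.5, -3.0), size=(20, 20, 20),
    out="docked.pdbqt", log="docked_log.txt",
)
print(" ".join(cmd))
# run_vina(cmd)  # uncomment on a machine where Vina is installed
```

Passing the arguments as a list avoids quoting problems when file names contain spaces.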

For further help, go to this link:

autodock.scripps.edu/.../tutorial

To download the software, use the link provided on the right side of the page, in the bioinformatics software section.

Wednesday, September 7, 2011

Research opportunities in Bioinformatics

Biology easily has 500 years of exciting problems to work on. By developing techniques for analyzing sequence data and related structures, we can attempt to understand the molecular basis of life. Bioinformatics is a relatively new interdisciplinary science. It relates to the use of information technology in the field of molecular biology and involves the application of computer technology to the analysis and management of biological data. For example, various methods such as graph theory, general network analysis techniques, Boolean networks and the Petri net formalism are used to explore different biochemical systems. Various international universities offer courses and projects at the bachelor's, master's and doctoral levels to gain expertise in the field.

Currently at Frankfurt University, the bioinformatics department is using such methods to model metabolic networks, signal transduction networks, and gene-regulatory networks, with the major focus on network validation techniques and network reduction approaches. Because of the size and complexity of biochemical networks, models of them have to be investigated for their correctness and completeness.

The scope of bioinformatics research lies in algorithm development to solve problems of molecular biology. It ranges from protein structure analysis to computational systems biology.

Computational systems biology involves:

Qualitative and quantitative modeling of biochemical processes in metabolic networks, in signal transduction networks and in gene regulatory networks.

Adaptation and extension of Petri net and Boolean techniques for exploring biochemical systems/pathways, modelling of cell communication.

Structural Bioinformatics involves:

Protein structure topology, protein structure comparison.

Protein-Ligand interactions, protein-protein interactions.

Structural aspects of alternative splice sites.

Some other career areas that fall within the scope of bioinformatics include:

Biological image data analysis. Sequence analysis for specific sites within a genome. Protein interaction and pathway study.

Homology modeling for analyzing the relationship between structure and function of a protein. Molecular modeling of small molecules for drug target information. Multiple network cluster study.

These approaches are reflected in the main aims of the field, which are to understand and organize the information associated with biological molecules on a large scale. As a result, bioinformatics has not only provided greater depth to biological investigations, but added the dimension of breadth as well. In this way, we are able to examine individual systems in detail and also compare them with those that are related in order to uncover common principles that apply across many systems and highlight unusual features that are unique to some. Currently, universities around the world that offer courses in Biological Sciences typically also offer Bioinformatics courses, as the wet lab and the in silico lab go hand in hand.

Tuesday, September 6, 2011

Career Avenues in Bioinformatics

Biology in the 21st century is no longer purely laboratory based, but is becoming an information-based science as well. Bioinformatics is an emerging discipline which combines the latest advances in genetics and biochemistry with the powerful tools of computer science and statistical analysis. It aims at the collection, storage, analysis and merging of biological data using computer technology. It is an interdisciplinary area, and today a large number of pharmaceutical and life sciences firms are increasingly placing emphasis on investment in IT-related research to gain competitive advantage by reducing the time needed to solve complex problems. The field of bioinformatics is rapidly evolving and growing. New methods of storing and accessing data are needed for biologists to make efficient use of the data. Bioinformatics is a highly specialized technical field. A bioinformatician should be well versed in biology and must possess specialized skills in the IT domain. The field lays a lot of stress on information technology for biological data processing, which includes learning programming languages and technologies such as Java, Perl, R, Oracle and .NET.

Individuals with the skills to work on the interface between computer science and molecular biology are in high demand in biotechnology, healthcare and pharmaceutical industries, government and universities; "bioinformatics specialist" is one of the top 30 new and emerging occupations. The career prospects in the field have been steadily increasing with more and more use of information technology. Bioinformatics professionals have multiple career options in all sectors of Biotechnology, pharmaceutical and biomedical sciences, in research institutions and also in the IT industry. Career opportunities in this field encompass a broad range of professions, from molecular modeling and molecular bioscience research, to software development and database management, to biostatistics and disease-related informatics.

Some of the specific career areas that fall within the scope of bioinformatics involve database design and maintenance, sequence assembly and analysis, proteomics (study of protein structure and function), pharmacology (computer-aided drug design), clinical pharmacology, computational chemistry, bio-analytics, software development, etc. Recently advertised positions include bioinformatics programmer, associate bioinformatics scientist, software analyst, biostatistician, scientific applications manager, genome analyst, bioinformatics analyst, software engineer, and many others. Companies like Biocon, Reliance, Satyam, Accenture, Accelrys, IBM Life Sciences and Silicon Genetics offer good employment opportunities to bioinformatics candidates.
- Lanwin Lobo

Monday, September 5, 2011

Bioinformatics: Feeling the forward Momentum

The past few years of life science research have involved collecting hordes of data. A crucial challenge in the future of bioinformatics involves putting that data to work. Now life scientists hope to plan large experiments, collect loads of data, analyze it, compare data between experiments, and eventually combine all of that information to improve basic theories, biotechnology, and medicine. To realize this, though, life scientists need tools to generate data, keep track of it, run it in models, and more. A series of new techniques and tools will help all biologists feel this forward momentum in bioinformatics. A growing list of novel tools, including hardware and software, creates new power for exploring applied and theoretical life sciences.

Computing in Clusters

First, computers participate in data analysis, ranging from accessing high-throughput data and sequencing single nucleotide polymorphisms to analyzing microarrays and proteomics experiments. The biggest challenge is reducing the dimensionality of these data so that scientists can understand them. To do that, computing should combine data mining with biological insight. Second, computers can run in silico models that test biological theories. Advances in computing rely on tightly coupled clusters of processors. Many processors can be connected with high levels of communication between them to work as a team. Tightly coupled clusters work very well for many applications, including simulating molecular biology, chemical kinetics, protein folding and so on.

Software for Sequencing

Bioinformatics software plays a variety of roles in the general field of sequencing, including assembling genomes and identifying genes and regulatory elements. Software also helps investigators analyze similarities and differences between genes and organisms. Several dozen companies, including DNASTAR, InforMax and Nonlinear Dynamics, create software for manipulating genes and DNA sequences. These programs perform many tasks: sequence assembly and finishing, primer design, gene discovery and annotation, sequence pair and family alignment with phylogeny, restriction site analysis and mapping, and protein structure analysis.

Integrated Analysis

Proteomics also requires new approaches to bioinformatics. Protein studies often include data from a wide variety of experiments, including mass spectrometry, protein chips, and two-dimensional gel electrophoresis. As a result, scientists need tools that keep track of data and relate one data set to another. Companies like Amersham Biosciences, Bio-Rad, and Oxford Glycosciences offer those very products. They provide data collection and analysis from various applications, including sequencing, microarrays and proteomics.

Sunday, September 4, 2011

Latest Studies in Bioinformatics

Technological advances have made our lives much easier. We can predict disease, identify genes, and discover cells with amazing properties and also visualize them.

Predict cancer in me!

Scientists have developed a new technology that detects distinct genetic changes differentiating cancer patients from healthy individuals and could serve as a future cancer predisposition test. The research team has created a design for a new DNA microarray that allows them to measure the 2 million microsatellites (short, repetitive DNA sequences) found within the human genome using 300,000 probes.

Found it! (New marker for biliary atresia identified)

Biliary atresia (BA) is a congenital disease leading to blockage of the tubes carrying bile from the liver to the gall bladder. Researchers have identified the RRAS gene and its related MAPK pathway as playing a vital role in the pathogenesis and serving as a novel prognostic marker for biliary atresia. Microarray technology, which allows the simultaneous analysis of thousands of transcripts within a single experiment, has been used to study the mechanism. Some studies have been performed to investigate the gene expression profiles of livers from BA patients. However, none of them was designed to identify genes that play a key role in the pathogenesis and prognosis of BA.

Gene behind rare skin cancer that heals itself discovered.

Scientists from the Institute of Medical Biology (IMB) have identified the gene behind a rare skin cancer, which grows rapidly for a few weeks before healing spontaneously. The peculiar behaviour of this rare self-healing cancer, called multiple self-healing squamous epithelioma (MSSE), was discovered to be caused by a fault in the gene called TGFBR-1, which is a key component of a signalling pathway that can also be impaired in other cancers. MSSE patients with faulty TGFBR-1 develop many small tumors, but at some point there is a switch in behaviour and the tumors lacking TGFBR-1 start to shrink and heal by themselves. The research was published in Nature Genetics today (ANI).

Scientists develop 3D imaging of individual living cells

A team of scientists at Arizona State University is working to build a next-generation, 3-dimensional imaging microscope, called a "Cell-CT" scanner, that will perform functional computed tomography (CT) imaging of individual living cells to provide a transformative view of biological structural and functional interrelationships at the single-cell level. The Cell-CT scanner may enable, for the first time, rapid 3D spatial localization of proteins and assessment of their concentration in subcellular compartments and microdomains, providing powerful insights concerning relationships between cell structure and function in disease. This would enable scientists to gain new insights into the metabolic pathways of diseases such as cancer.

Join ResearchGate to connect with scientists all over the world:

http://www.researchgate.net/profile/Jitendra_Gupta3/
You will need to create a login.

Get any book from any field in one place!

Go to www.library.nu and create a login there (login is free of cost), then confirm your login.
Now start downloading as many books as you want.

Saturday, September 3, 2011

Scope of Bioinformatics in Diverse Fields

It’s exciting. Any career that integrates biology with computers falls under the field of bioinformatics. It is also called computational biology. Bioinformatics is the application of statistics and computer science to the field of molecular biology. For example, it makes use of statistical methods such as hypothesis testing and estimation, Poisson processes, Markov models and hidden Markov models to search for patterns within a set of biological data. Such patterns can be used to determine diagnostic biomarkers for a particular disease, to measure the efficacy of a particular medical treatment, to compare DNA sequences for similarity in order to define relatedness, such as between man and mouse, to determine what biological responses are presented by surviving versus dying patients, and to predict biological pathways. All these applications help to improve the quality of human life.

Bioinformaticians are not computer programmers; they are scientists who use computers to analyze huge volumes of information. From a simple point of view, both of the “omics” fields make use of bioinformatics to analyze huge amounts of data: genomics for genes and proteomics for proteins.

In almost every field of biology and chemistry there are huge numbers of machines that collect large amounts of 2D, 3D, and even 4D+ data. Techniques that collect images include microarrays, crystallography, NMR (after processing), electron microscopy, and many others. In electron microscopy, specifically, image processing is the “rate-limiting step.” Basically, the way electron microscopy works is that you take many (thousands, millions, billions of) images of complex macromolecular assemblies on a surface. They are oriented randomly (e.g. rotations in x, y, and z are random), so automated programs need to collect the images and classify which ones look similar, assuming that similar-looking images have similar x/y/z rotation values. Based on this information, a 3D structure is then generated.

MRI, CT, PET and all the other medical imaging techniques also generate huge amounts of data, and are more complicated because of movement (heart beating, blood flow, patient movement, etc.).

One application of bioinformatics to biochemistry is protein structure determination. Right now, an important area of bioinformatics is “trans-membrane domain prediction”. Using knowledge of the properties of the various amino acids in a protein, as well as hidden Markov models, some programs can predict how a putative gene’s protein product might weave itself back and forth through a cell membrane.
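To give a feel for how a hidden Markov model can mark membrane-spanning stretches, here is a toy two-state sketch decoded with the Viterbi algorithm. Everything in it is an assumption made for illustration: the two states, the set of residues treated as hydrophobic, and all the probabilities are invented, and real trans-membrane predictors use much richer models trained on known proteins.

```python
import math

HYDROPHOBIC = set("AVLIMFWC")  # assumed residue classes, for illustration only

STATES = ("O", "M")  # O = outside the membrane, M = membrane-spanning
START = {"O": 0.8, "M": 0.2}                                   # assumed
TRANS = {"O": {"O": 0.9, "M": 0.1}, "M": {"O": 0.1, "M": 0.9}}  # assumed
EMIT = {"O": {"h": 0.3, "p": 0.7}, "M": {"h": 0.8, "p": 0.2}}   # assumed


def viterbi(sequence):
    """Return the most probable state path (one letter per residue)."""
    obs = ["h" if aa in HYDROPHOBIC else "p" for aa in sequence]
    if not obs:
        return ""
    # score[s] = best log-probability of any path ending in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for symbol in obs[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(EMIT[s][symbol]))
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state to recover the whole path.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


# Hypothetical sequence: polar ends flanking a hydrophobic stretch.
seq = "KDEKSLLIVFLLIVFKDEKS"
print(seq)
print(viterbi(seq))
```

Because switching states is made unlikely, the model labels the whole hydrophobic run as one membrane segment instead of flipping state residue by residue.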

Also, there is an application of small molecule interactions, where one looks at how small drug molecules will bind to proteins or other drug targets. This process is called docking and contributes to the field of drug discovery.

As a cutting-edge bioinformatician, one can apply a vast knowledge of biochemistry to design programs that determine membrane interaction and small molecule binding. Thus, these areas make ‘Bioinformatics’ a very vibrant and dynamic field of study, with a wide variety of useful applications.

- Jitendra Gupta
